How to split male and female voices from an audio file (in C++ or Java)

audio speech-recognition speech

11,186

Solution 1

This is potentially a very complicated question, and it is similar to writing your own speech recognition (or identification) algorithm.

You would start by converting the audio into the frequency domain, which is done by running a Fast Fourier Transform (FFT) over short slices of the signal.

For each slice in time that you transform, the FFT gives you a list of frequencies and their amplitudes. You then need to detect the fundamental tone by analysing the harmonics. The 2nd and 3rd harmonics will usually be clearest. It is very hard to work out which harmonic is which, especially with background noise and the natural differences between people's voices in terms of which harmonics are loudest. Once you have an estimate of the fundamental tone, you can try to guess from it whether the speaker is male or female.
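As a rough illustration of estimating the fundamental for one slice, here is a minimal sketch. Note that it swaps in time-domain autocorrelation instead of the FFT harmonic analysis described above, because it is shorter to show. The class name `PitchEstimator`, the 0.9 peak threshold, the 60 to 400 Hz search range and the synthetic test signal are all assumptions made for this example, not anything from a particular library.

```java
final class PitchEstimator {

    /**
     * Estimates the fundamental frequency (Hz) of one audio frame using
     * autocorrelation, searching only pitches between minHz and maxHz.
     * Returns -1 when the frame is silent or shows no positive correlation.
     */
    static double estimateF0(double[] frame, double sampleRate, double minHz, double maxHz) {
        int minLag = Math.max(1, (int) Math.floor(sampleRate / maxHz));
        int maxLag = Math.min((int) Math.ceil(sampleRate / minHz), frame.length - 1);
        if (minLag > maxLag) return -1;

        // r[lag] = average product of the frame with itself shifted by lag samples.
        double[] r = new double[maxLag + 1];
        double globalMax = 0;
        for (int lag = minLag; lag <= maxLag; lag++) {
            double sum = 0;
            for (int i = 0; i + lag < frame.length; i++) {
                sum += frame[i] * frame[i + lag];
            }
            r[lag] = sum / (frame.length - lag);
            globalMax = Math.max(globalMax, r[lag]);
        }
        if (globalMax <= 0) return -1;

        // Take the first peak that is nearly as strong as the best one (assumed
        // 0.9 threshold); this avoids reporting two or three periods as the pitch.
        for (int lag = minLag; lag <= maxLag; lag++) {
            if (r[lag] >= 0.9 * globalMax) {
                while (lag < maxLag && r[lag + 1] > r[lag]) lag++; // climb to the local peak
                return sampleRate / lag;
            }
        }
        return -1; // not reached: the global maximum always passes the test above
    }

    /** Hypothetical test signal: a fundamental plus two weaker harmonics. */
    static double[] syntheticVoice(double f0, double sampleRate, int length) {
        double[] out = new double[length];
        for (int i = 0; i < length; i++) {
            double t = i / sampleRate;
            out[i] = Math.sin(2 * Math.PI * f0 * t)
                   + 0.5 * Math.sin(2 * Math.PI * 2 * f0 * t)
                   + 0.25 * Math.sin(2 * Math.PI * 3 * f0 * t);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] frame = syntheticVoice(120, 16000, 2048);
        System.out.println("Estimated pitch in Hz: " + estimateF0(frame, 16000, 60, 400));
    }
}
```

The resolution is limited to whole-sample lags, so the estimate is only approximate, and real speech with noise will need smoothing across neighbouring frames.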

Keep in mind that during many parts of speech, such as unvoiced consonants ('s', 't', etc.), there is no tone, just noise. Your detector will need to be smart enough to recognise those parts and skip them.
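One common way to skip the toneless parts is to look at each frame's loudness and zero-crossing rate: noisy sounds such as 's' change sign far more often than a voiced vowel. The sketch below shows that idea only; the class name `VoicingDetector`, the helper signals and the threshold values (0.01 RMS, 0.15 crossings per sample) are assumptions for illustration and would need tuning on real recordings.

```java
import java.util.Random;

final class VoicingDetector {

    /** Fraction of adjacent sample pairs whose signs differ (0 to 1). */
    static double zeroCrossingRate(double[] frame) {
        if (frame.length < 2) return 0;
        int crossings = 0;
        for (int i = 1; i < frame.length; i++) {
            if ((frame[i - 1] >= 0) != (frame[i] >= 0)) crossings++;
        }
        return (double) crossings / (frame.length - 1);
    }

    /** Root-mean-square level of the frame. */
    static double rms(double[] frame) {
        if (frame.length == 0) return 0;
        double sum = 0;
        for (double s : frame) sum += s * s;
        return Math.sqrt(sum / frame.length);
    }

    /** A frame counts as voiced if it is loud enough and does not cross zero too often. */
    static boolean isLikelyVoiced(double[] frame, double minRms, double maxZcr) {
        return rms(frame) >= minRms && zeroCrossingRate(frame) <= maxZcr;
    }

    /** Hypothetical stand-in for a voiced sound: a pure tone. */
    static double[] tone(double hz, double sampleRate, int length) {
        double[] out = new double[length];
        for (int i = 0; i < length; i++) {
            out[i] = 0.5 * Math.sin(2 * Math.PI * hz * i / sampleRate);
        }
        return out;
    }

    /** Hypothetical stand-in for a sound like 's': seeded white noise. */
    static double[] hiss(long seed, int length) {
        Random random = new Random(seed);
        double[] out = new double[length];
        for (int i = 0; i < length; i++) {
            out[i] = 0.5 * (2 * random.nextDouble() - 1);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("tone voiced? " + isLikelyVoiced(tone(150, 16000, 1024), 0.01, 0.15));
        System.out.println("hiss voiced? " + isLikelyVoiced(hiss(42, 1024), 0.01, 0.15));
    }
}
```

Only frames that pass a check like this would then be handed to the pitch analysis.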

Hope that sets you in the right general direction.

Note: if the two voices are simultaneous and you want to separate them cleanly, then this won't help you. I don't believe anyone alive has solved such a problem.

Solution 2

One tool that makes this possible is LIUM spkdiarization. Written in Java and available under the GPL, it is a speaker diarization tool that uses statistical models for male, female and child voices. Luckily for you, the models are provided, so you can use it without having to tag recordings and train the models yourself.

See the scripting page of the LIUM wiki for examples; search the page for "gender".

Solution 3

I think this is already possible. I just started taking an online Machine Learning course from Stanford University with Professor Andrew Ng, and during the first lecture he shows a demo in which an audio recording of two overlapping voices is processed and the individual voices are extracted (the same is done with a person speaking over background music). Apparently it uses an unsupervised learning algorithm that allows it to extract the two underlying patterns. You may want to look into that course; one version of it is here: http://www.academicearth.org/courses/machine-learning

Solution 4

I would start by saying this is impossible. Speech recognition is really, really hard.

You're not clear in your question - are the voices overlapping? If so, splitting them up will be absurdly difficult.

If they are separate, your best bet is to collect a large set of samples of male and female voices and look for common characteristics (and a way to identify them programmatically). If the samples aren't recorded cleanly (if they have background noise), things get even more complicated.

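You may get away with using the average pitch: male voices are generally deeper than female ones (typical adult fundamentals are roughly 85 to 155 Hz for men and 165 to 255 Hz for women, with some overlap between individuals).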
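A minimal sketch of the average-pitch idea, assuming you already have one pitch estimate in Hz per frame (with non-positive values for frames where no pitch was found). The class name `PitchLabeler` and the 165 Hz cut-off used in the example are assumptions for illustration; a median is used rather than a mean so that a few wildly wrong frames do not dominate.

```java
import java.util.Arrays;

final class PitchLabeler {

    enum Label { LIKELY_MALE, LIKELY_FEMALE, UNKNOWN }

    /**
     * Labels a speech segment from its per-frame pitch estimates in Hz.
     * Non-positive (and NaN) entries mark frames with no pitch and are skipped.
     */
    static Label classify(double[] f0PerFrame, double thresholdHz) {
        double[] voiced = Arrays.stream(f0PerFrame).filter(f -> f > 0).sorted().toArray();
        if (voiced.length == 0) return Label.UNKNOWN;

        int mid = voiced.length / 2;
        double median = (voiced.length % 2 == 1)
                ? voiced[mid]
                : (voiced[mid - 1] + voiced[mid]) / 2;

        // Assumption: pitches below the threshold are treated as a deeper, likely male voice.
        return median < thresholdHz ? Label.LIKELY_MALE : Label.LIKELY_FEMALE;
    }

    public static void main(String[] args) {
        double[] segment = {110, -1, 120, 115, 0, 300}; // made-up values, one outlier
        System.out.println(classify(segment, 165));
    }
}
```

A single fixed threshold will misclassify some speakers, so treat the result as a guess rather than a decision.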


Author by

thomasrutter

Web application developer well versed in Javascript, PHP, MySQL, Debian GNU/Linux, and stuff. Creator of the Neon Javascript framework and a site explaining settings on your Android phone.

Updated on June 04, 2022

Comments

thomasrutter almost 2 years

I want to differentiate between the male and female voices in an audio file and separate them. As output, I want the two voices separated. Can you please help me out, and can the coding be done in Java or C++?
- thomasrutter about 15 years
  
  Are they both talking at the same time? I.e., is this about separating two voices speaking over one another, or just determining which one is speaking at any given time?
Amit Patil about 15 years

+1. Just to back up the others, splitting simultaneous voices is a Hard Problem which even best-of-breed audio processors out there still can't solve with any great reliability.
thomasrutter about 15 years

Yeah I like that idea, a statistical approach. You could have it learn the more it identifies correctly.
ılǝ over 11 years

Useful reference. If you have seen the presentation, could you provide some outline on how the algorithm works? Is this a sort of "training" over some samples?
buley over 11 years

It was this lecture that drove me searching to find this Stackoverflow question, so I'm stuck in a reference loop. This is obviously a non-trivial problem and I've yet to see a discussion of an implementation outside of Ng's lecture. I believe Ng mentions in this lecture that he's an SPSS guy but I'd like to attempt this in R.