How do I process the microphone input in real-time?

Both DirectSound and the Wave API ultimately give you buffers filled with audio data that you can process. The size of these buffers can be varied, but realistically you will need to keep latency to less than 10 ms for useful real-time processing. That means processing the data within 10 ms of it arriving in the buffer, minus the time between it arriving at the audio hardware and reaching the buffer, which will depend on the driver. For this reason I would recommend processing no more than 5 ms of data at a time.
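
To put numbers on that (the 44.1 kHz, 16-bit mono format below is just an assumption for the sake of the example), a 5 ms chunk is only a couple of hundred samples:

    #include <cstdio>

    int main() {
        // Assumed capture format for the example: 44.1 kHz, 16-bit, mono.
        const int sampleRate    = 44100;  // samples per second
        const int bytesPerFrame = 2;      // 16-bit mono
        const int chunkMs       = 5;      // process 5 ms at a time

        const int samplesPerChunk = sampleRate * chunkMs / 1000;  // 220 samples
        const int bytesPerChunk   = samplesPerChunk * bytesPerFrame;

        std::printf("%d ms at %d Hz = %d samples (%d bytes) per chunk\n",
                    chunkMs, sampleRate, samplesPerChunk, bytesPerChunk);
        return 0;
    }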

The main architectural difference between the two is that with DirectSound you allocate a circular buffer which is then filled by the DirectSound audio driver whereas the Wave API takes a queue of pre-allocated WAVEHDR buffers which are filled, returned to the app and then recycled. There are various notification methods for both APIs, such as window messages or events. However, for low-latency processing it's probably advisable to maintain a dedicated streaming thread and wait for new data to arrive.
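
To make that concrete, here is a minimal sketch of the Wave API approach with a dedicated streaming thread waiting on an event. Error handling and cleanup are omitted, and the 44.1 kHz 16-bit mono format and ~5 ms buffers are assumptions for the example:

    #include <windows.h>
    #include <mmsystem.h>   // waveIn* API; link with winmm.lib

    const int kSampleRate       = 44100;
    const int kNumBuffers       = 4;
    const int kSamplesPerBuffer = kSampleRate * 5 / 1000;   // ~5 ms per buffer

    // Placeholder for your real-time analysis (pitch detection, etc.).
    void Process(const short* samples, int count) { /* ... */ }

    int main() {
        WAVEFORMATEX fmt = {};
        fmt.wFormatTag      = WAVE_FORMAT_PCM;
        fmt.nChannels       = 1;
        fmt.nSamplesPerSec  = kSampleRate;
        fmt.wBitsPerSample  = 16;
        fmt.nBlockAlign     = fmt.nChannels * fmt.wBitsPerSample / 8;
        fmt.nAvgBytesPerSec = fmt.nSamplesPerSec * fmt.nBlockAlign;

        // The driver signals this event each time it returns a buffer.
        HANDLE event = CreateEvent(NULL, FALSE, FALSE, NULL);

        HWAVEIN wi;
        waveInOpen(&wi, WAVE_MAPPER, &fmt, (DWORD_PTR)event, 0, CALLBACK_EVENT);

        // Pre-allocate a queue of WAVEHDR buffers and hand them to the driver.
        short   data[kNumBuffers][kSamplesPerBuffer];
        WAVEHDR hdr[kNumBuffers] = {};
        for (int i = 0; i < kNumBuffers; ++i) {
            hdr[i].lpData         = (LPSTR)data[i];
            hdr[i].dwBufferLength = sizeof(data[i]);
            waveInPrepareHeader(wi, &hdr[i], sizeof(WAVEHDR));
            waveInAddBuffer(wi, &hdr[i], sizeof(WAVEHDR));
        }
        waveInStart(wi);

        // Streaming loop: wait for a filled buffer, process it, recycle it.
        for (;;) {
            WaitForSingleObject(event, INFINITE);
            for (int i = 0; i < kNumBuffers; ++i) {
                if (hdr[i].dwFlags & WHDR_DONE) {
                    Process(data[i], hdr[i].dwBytesRecorded / sizeof(short));
                    hdr[i].dwFlags &= ~WHDR_DONE;
                    waveInAddBuffer(wi, &hdr[i], sizeof(WAVEHDR));  // recycle
                }
            }
        }
    }

The key point is the recycling: each WAVEHDR goes straight back to the driver with waveInAddBuffer once it has been processed, so the driver never starves for buffers.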

For various reasons I would recommend DirectSound over the Wave API for new development - it will certainly be easier to achieve lower latency.
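
For comparison, a compressed sketch of the DirectSoundCapture version, reading the circular buffer from a streaming thread. Again error handling is omitted, the format and ~100 ms buffer size are assumptions, and a production version would use IDirectSoundNotify position notifications rather than the simple Sleep() poll shown here:

    #include <windows.h>
    #include <dsound.h>     // DirectSoundCapture; link with dsound.lib
    #include <vector>

    void Process(const short* samples, int count) { /* your analysis */ }

    void CaptureLoop() {
        // 44.1 kHz, 16-bit, mono (assumed format for the example).
        WAVEFORMATEX fmt = { WAVE_FORMAT_PCM, 1, 44100, 88200, 2, 16, 0 };

        IDirectSoundCapture8* dsc = NULL;
        DirectSoundCaptureCreate8(NULL, &dsc, NULL);

        // One circular buffer holding ~100 ms; we drain it a few ms at a time.
        DSCBUFFERDESC desc = {};
        desc.dwSize        = sizeof(desc);
        desc.dwBufferBytes = fmt.nAvgBytesPerSec / 10;   // ~100 ms
        desc.lpwfxFormat   = &fmt;

        IDirectSoundCaptureBuffer* buf = NULL;
        dsc->CreateCaptureBuffer(&desc, &buf, NULL);
        buf->Start(DSCBSTART_LOOPING);

        DWORD readCursor = 0;
        std::vector<short> chunk;
        for (;;) {
            DWORD capturePos = 0, readPos = 0;
            buf->GetCurrentPosition(&capturePos, &readPos);

            // Bytes safely readable between our cursor and the read position.
            DWORD avail = (readPos + desc.dwBufferBytes - readCursor)
                          % desc.dwBufferBytes;
            if (avail == 0) { Sleep(1); continue; }

            // Lock may return two pointers because the buffer is circular.
            void *p1, *p2; DWORD n1, n2;
            buf->Lock(readCursor, avail, &p1, &n1, &p2, &n2, 0);
            chunk.assign((short*)p1, (short*)p1 + n1 / 2);
            if (p2) chunk.insert(chunk.end(), (short*)p2, (short*)p2 + n2 / 2);
            buf->Unlock(p1, n1, p2, n2);

            readCursor = (readCursor + avail) % desc.dwBufferBytes;
            Process(chunk.data(), (int)chunk.size());
        }
    }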

Whichever method you choose to do the capturing, once you have your data you simply pass it to your processing algorithm and wait for the next buffer to be ready. As long as you can process the data faster than it arrives, you'll have your (pseudo) real-time analysis.
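
During development it's worth checking that constraint directly. A hypothetical wrapper (TimedProcess is not part of either API, just an illustration) can time each chunk against the duration of audio it represents:

    #include <windows.h>
    #include <cstdio>

    void Process(const short* samples, int count) { /* your analysis */ }

    // Hypothetical helper: warn whenever processing a chunk takes longer
    // than the audio it contains, i.e. the analysis is falling behind.
    void TimedProcess(const short* samples, int count, int sampleRate) {
        LARGE_INTEGER freq, t0, t1;
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&t0);

        Process(samples, count);

        QueryPerformanceCounter(&t1);
        double elapsedMs = 1000.0 * (t1.QuadPart - t0.QuadPart) / freq.QuadPart;
        double chunkMs   = 1000.0 * count / sampleRate;
        if (elapsedMs > chunkMs)
            std::printf("Falling behind: %.2f ms to process %.2f ms of audio\n",
                        elapsedMs, chunkMs);
    }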

There are also alternative APIs that may be more suitable. Have a look at ASIO, Kernel Streaming (for XP only - I wouldn't bother) and, new in Vista, the Core Audio APIs.

Comments

  • recursive9 almost 2 years

    I'm starting to create a proof of concept for an idea I have, and at this point, I need some guidance as to how I should begin.

    I need to sample the microphone input, and process that signal in real-time (think Auto-Tune, but working live), as opposed to "recording" for a while.

    What I'm doing is "kind of" a "mic input to MIDI converter", so it needs to respond quite fast.

    I investigated a bit online, and apparently the way to go is either DirectSound or the WaveIn* API functions. Now, according to what I read, the WaveIn APIs will let me fill a buffer of a certain size, which is fine for recording and post-processing, but I'm wondering... How do I do real-time processing?

    Do I use 10 ms buffers and keep a circular 50 ms or 100 ms array myself, with a function that triggers the analysis every 10 ms? (It would have access to the latest 100 ms of input, of which only 10 ms are new.)
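
    Something like this rough sketch is what I have in mind (the 44.1 kHz format and the Analyze name are just placeholders):

        #include <cstring>

        const int kSampleRate    = 44100;
        const int kHopSamples    = kSampleRate / 100;   // 10 ms = 441 samples
        const int kWindowSamples = kHopSamples * 10;    // 100 ms window

        short window[kWindowSamples] = {};

        // Placeholder for the actual analysis (pitch -> MIDI, etc.).
        void Analyze(const short* samples, int count) { /* ... */ }

        // Called every time a fresh 10 ms buffer arrives from the capture API.
        void OnNewBuffer(const short* hop /* kHopSamples samples */) {
            // Shift the window left by one hop and append the new data...
            std::memmove(window, window + kHopSamples,
                         (kWindowSamples - kHopSamples) * sizeof(short));
            std::memcpy(window + kWindowSamples - kHopSamples, hop,
                        kHopSamples * sizeof(short));
            // ...then analyze the latest 100 ms, of which only 10 ms are new.
            Analyze(window, kWindowSamples);
        }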

    Am I missing something here?

    Also, how is this done with DirectSound? Does it give me any improved capabilities over the regular Win32 APIs?