PCM audio amplitude values?

iphone android audio audio-recording pcm

16,571

Solution 1

Think of the surface of the microphone. When it's silent, the surface is motionless at position zero. When you talk, the air around your mouth vibrates. Vibrations are spring-like and move in both directions: back and forth, up and down, or in and out. The vibrations in the air cause the microphone surface to vibrate as well, moving up and down. When it moves down, that might be sampled as a positive value. When it moves up, that might be sampled as a negative value. (Or it could be the opposite.) When you stop talking, the surface settles back to the zero position.

What numbers you get from your PCM recording data depends on the gain of the system. With common 16-bit samples, the range is -32768 to 32767 for the largest possible excursion of a vibration that can be recorded without distortion, clipping, or overflow. Usually the gain is set a bit lower so that the maximum values aren't right on the edge of distortion.

ADDED:

8-bit PCM audio is often an unsigned data type with a range of 0 to 255, where a value of 128 indicates "silence". So you have to add or subtract this bias, as well as scale by about 256, to convert between 8-bit and 16-bit PCM audio waveforms.
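The bias-and-scale conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `u8_to_s16` and `s16_to_u8` are made up for this example, and it assumes unsigned 8-bit samples with silence at 128 and signed 16-bit samples with silence at 0.

```python
def u8_to_s16(sample: int) -> int:
    """Convert one unsigned 8-bit PCM sample (0..255) to signed 16-bit."""
    if not 0 <= sample <= 255:
        raise ValueError("unsigned 8-bit sample must be in 0..255")
    # Remove the 128 bias so silence becomes 0, then scale by 256.
    return (sample - 128) * 256


def s16_to_u8(sample: int) -> int:
    """Convert one signed 16-bit PCM sample to unsigned 8-bit (0..255)."""
    if not -32768 <= sample <= 32767:
        raise ValueError("signed 16-bit sample must be in -32768..32767")
    # Scale down by 256 (arithmetic shift floors toward -infinity),
    # then add the 128 bias back.
    return (sample >> 8) + 128


if __name__ == "__main__":
    print(u8_to_s16(128))    # silence in 8-bit maps to 0
    print(s16_to_u8(20000))  # 20000 // 256 = 78, plus the 128 bias
```

Note that the round trip loses the low 8 bits of each 16-bit sample, which is where the extra quantization noise of 8-bit audio comes from.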

Solution 2

The raw numbers are an artefact of the quantization process used to convert an analog audio signal into digital. It makes more sense to think of an audio signal as a vibration around 0, extending as far as +1 and -1 for maximum excursion of the signal. Outside that, you get clipping, which distorts the harmonics and sounds terrible.

However, computers don't work all that well in terms of fractions, so 65,536 discrete integer values (for signed 16-bit samples, -32768 to +32767) are used to map that range. In most applications like this, +32767 is considered the maximum positive excursion of the microphone's or speaker's diaphragm. There is no correlation between a sample point and a sound pressure level, unless you start factoring in the characteristics of the recording (or playback) circuits.
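As a rough sketch of that mapping, the following Python converts between signed 16-bit integers and the -1 to +1 view of the signal. The function names are hypothetical, and dividing by 32768 is one common convention assumed here; some software divides by 32767 instead.

```python
def s16_to_float(sample: int) -> float:
    """Map a signed 16-bit sample to the range [-1.0, 1.0)."""
    return sample / 32768.0


def float_to_s16(value: float) -> int:
    """Map a float back to signed 16-bit, clipping anything out of range."""
    # Values beyond +/-1.0 cannot be represented, so they are clipped,
    # which is the distortion described above.
    clipped = max(-1.0, min(1.0, value))
    scaled = round(clipped * 32768)
    # +1.0 would scale to 32768, one past the largest 16-bit value.
    return max(-32768, min(32767, scaled))


if __name__ == "__main__":
    print(s16_to_float(-32768))  # most negative sample
    print(float_to_s16(0.5))     # half of full scale
    print(float_to_s16(2.0))     # out of range, so it clips
```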

(BTW, 16-bit audio is very standard and widely used. It is a good balance of signal-to-noise ratio and dynamic range. 8-bit is noisy unless you do some funky non-standard scaling.)

Solution 3

Lots of good answers here, but they don't directly address your questions in an easy to read way.

What exactly are the units for the amplitude values? The values are signed 16-bit, so they must range from -32K to +32K. But what do these values represent? Decibels?

The values have no unit. They simply represent a number that has come out of an analog-to-digital converter. The numbers from the A/D converter are a function of the microphone and pre-amplifier characteristics.
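Although the raw values carry no physical unit, a level relative to the largest representable value can still be computed. That relative measure is commonly called dBFS (decibels relative to full scale); it is not mentioned in the question and is introduced here only as an illustration. The function name `peak_dbfs` is hypothetical, and the sketch assumes signed 16-bit samples with full scale taken as 32768. The result says nothing about sound pressure at the microphone.

```python
import math

FULL_SCALE = 32768  # assumed full-scale magnitude for signed 16-bit samples


def peak_dbfs(samples) -> float:
    """Return the peak level of the samples in dB relative to full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        # log10(0) is undefined; digital silence is treated as -infinity dBFS.
        return float("-inf")
    return 20.0 * math.log10(peak / FULL_SCALE)


if __name__ == "__main__":
    # A signal peaking at half of full scale is about 6 dB below full scale.
    print(round(peak_dbfs([0, 16384, -12000]), 2))
```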

If I use 8-bit values, then the values must range from -128 to +128. How would that get mapped to the volume/"loudness" of the 16-bit values? Would you just use a 16-to-1 quantisation mapping?

I don't understand this question. If you are recording 8-bit audio, your values will be 8-bits. Are you converting 8-bit audio to 16-bit?

Why are there negative values? I would think that complete silence would result in values of 0

The diaphragm on a microphone vibrates in both directions and as a result creates positive and negative voltages. A value of 0 is silence as it indicates that the diaphragm is not moving. See how microphones work

For more details on how sound is represented digitally, see here.

Solution 4

Why are there negative values? I would think that complete silence would result in values of 0

The diaphragm on a microphone vibrates in both directions and as a result creates positive and negative voltages. A value of 0 is silence as it indicates that the diaphragm is not moving. See how microphones work

Small clarification: The position of the diaphragm is being recorded. Silence occurs when there is no vibration, when there is no change in position. So the vibration you are seeing is what is pushing the air and creating changes in air pressure over time. The air is no longer being pushed at the top and bottom peaks of any vibration, so the peaks are when silence occurs. The loudest part of the signal is when the position changes the fastest which is somewhere in the middle of the peaks. The speed with which the diaphragm moves from one peak to another determines the amount of pressure that's generated by the diaphragm. When the top and bottom peaks are reduced to zero (or some other number they share) then there is no vibration and no sound at all. Also as the diaphragm slows down so that there's a greater space of time between peaks, there is less sound pressure being generated or recorded.

I recommend the Yamaha Sound Reinforcement Handbook for more in-depth reading. A grasp of the basic ideas of calculus also helps in understanding audio and vibration.

Solution 5

The 16-bit numbers are the A/D converter values from your microphone (you knew this). Know also that the amplifier between your microphone and the A/D converter has an Automatic Gain Control (AGC). The AGC actively changes the amplification of the microphone signal to prevent too much voltage from hitting the A/D converter (usually < 2 V DC). Also, there is DC voltage decoupling, which sets the input signal in the middle of the A/D converter's range (say 1 V DC).

So, when there is no sound hitting the microphone, the AGC amplifier sends a flat-line 1.0 Vdc signal to the A/D converter. When sound waves hit the microphone, it creates a corresponding AC voltage wave. The AGC amp takes the AC voltage wave, centers it at 1.0 Vdc, and sends it to the A/D converter. The A/D samples the signal (measures the voltage, say, 44,000 times per second) and outputs signed 16-bit values for the voltage. So -32,768 = 0.0 Vdc and +32,767 is just under 2.0 Vdc, with each step worth 2.0 V / 65,536, or about 30.5 microvolts. A value of +100 is about 1.00305 Vdc and -100 is about 0.99695 Vdc at the A/D converter.

Positive values are above 1.0 Vdc; negative values are below 1.0 Vdc.
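The sample-to-voltage relationship in this answer can be sketched as below. Everything numeric here is an assumption taken from the answer's example rather than a property of any real device: a 0 to 2.0 V input span, a 1.0 V bias, and a signed 16-bit converter with 65,536 steps covering -32768 to 32767. The function name `sample_to_volts` is hypothetical.

```python
V_SPAN = 2.0    # assumed A/D input span in volts (0.0 V to 2.0 V)
V_BIAS = 1.0    # assumed DC bias that corresponds to a sample value of 0
STEPS = 65536   # number of distinct values in a 16-bit sample

VOLTS_PER_STEP = V_SPAN / STEPS  # about 30.5 microvolts per step


def sample_to_volts(sample: int) -> float:
    """Return the assumed A/D input voltage for a signed 16-bit sample."""
    if not -32768 <= sample <= 32767:
        raise ValueError("signed 16-bit sample must be in -32768..32767")
    return V_BIAS + sample * VOLTS_PER_STEP


if __name__ == "__main__":
    for s in (-32768, -100, 0, 100, 32767):
        print(s, round(sample_to_volts(s), 6))
```

With these assumptions, a sample of 0 sits at the 1.0 V bias, the most negative sample sits at 0.0 V, and the most positive sample sits one step below 2.0 V.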

Note, most audio systems use a log formula to curve the audio wave logarithmically, so a human ear can better hear it. In digital audio systems (with ADCs), digital signal processing puts this curve on the signal. DSP chips are big business; TI has made a fortune using them for all kinds of applications, not just audio processing. DSPs can apply very complicated math to a real-time stream of data that would choke an iPhone's ARM7 processor. Say you are sending 2 MHz pulses to an array of 256 ultrasound sensor/receivers; you get the idea.


Author by

stackoverflowuser2010

Updated on June 07, 2022

Comments

stackoverflowuser2010 almost 2 years
I am starting out with audio recording using my Android smartphone.

I successfully saved voice recordings to a PCM file. When I parse the data and print out the signed, 16-bit values, I can create a graph like the one below. However, I do not understand the amplitude values along the y-axis.
1. What exactly are the units for the amplitude values? The values are signed 16-bit, so they must range from -32K to +32K. But what do these values represent? Decibels?
2. If I use 8-bit values, then the values must range from -128 to +128. How would that get mapped to the volume/"loudness" of the 16-bit values? Would you just use a 16-to-1 quantisation mapping?
3. Why are there negative values? I would think that complete silence would result in values of 0.
If someone can point me to a website with information on what's being recorded, I would appreciate it. I found webpages on the PCM file format, but not what the data values are.
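For reference, the parsing step described in the question might look like the sketch below. It is only an illustration under stated assumptions: the data is headerless (raw) PCM, mono, with signed 16-bit little-endian samples, and the function name `parse_pcm16` is made up for this example. A recording with a different byte order or channel layout would need a different format string.

```python
import struct


def parse_pcm16(raw: bytes) -> list[int]:
    """Decode raw little-endian signed 16-bit PCM bytes into integers."""
    count = len(raw) // 2  # two bytes per sample; ignore a trailing odd byte
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    return list(struct.unpack("<%dh" % count, raw[: count * 2]))


if __name__ == "__main__":
    # Three samples: silence, the largest positive value, the most negative value.
    data = b"\x00\x00\xff\x7f\x00\x80"
    print(parse_pcm16(data))
```

To read a file, the bytes could be obtained with `open(path, "rb").read()` and passed to the function, where `path` stands for wherever the recording was saved.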
stackoverflowuser2010 about 13 years

So is the y-axis (amplitude) unitless? If that's the case, is it then up to the playback application to interpret the value and play the signal at a given volume?
hotpaw2 about 13 years

@stackoverflowuser2010 : Unless you have a calibrated mic and recording path with a known gain, the amplitude values are assumed unitless. e.g. on playback, someone will be playing with a volume knob to some arbitrary setting (whatever "sounds good"), and an AGC likely added some unknown gain to the record path as well.
stackoverflowuser2010 about 13 years

I still don't understand what happens at the recording end. There seem to be a lot of variables with the microphone and the analog-to-digital converter (ADC) to map noises to the PCM range (whether it's +1 to -1 or +32K to -32K). Consider two sound sources: normal speech around 40 dB and a jet engine around 120 dB. Who decides that the PCM's +32K value maps to 120 dB or 40 dB? Is this a setting decided by the sound engineer or ADC unit or what? Is there a way in Android/iPhone to set this range mapping?
hotpaw2 about 13 years

Yes, there are a lot of variables. The gain may be decided by an undocumented AGC (automatic gain control) algorithm.
Deva about 13 years

@stackoverflowuser2010 You're confusing a digital world and an analog world. There is no correlation between a sample value and a decibel level. Basically, if a microphone and recording circuit is designed to record up to 40 dB, then a sample level of +32k (or -32k) is going to be close to the peak level of a 40 dB sound. Move the microphone further away from the source, and the same peak will digitize rather lower than +32k. And that still ignores the effects of an AGC circuit!
Deva about 13 years

@stackoverflowuser2010 Likewise, a digital signal swinging between +32k and -32k might be 25 dB in my headphones, but I can easily make it 90 dB by using a powerful audio amplifier. But it's still peaking at 32k.
stackoverflowuser2010 about 13 years

Regarding my question about 8-bit audio, my intention was to ask about how the true audible level gets mapped to the 8-bit range. If a lawnmower at 80 dB produces a 16-bit amplitude value of +20K, would that same 80 dB noise produce an 8-bit amplitude value of, say, +78?
Error 454 about 13 years

Ideally, yes (depending on whether the audio is signed/unsigned). The answer depends on the specified range of the microphone you are using and how well they've tuned the pre-amplifier to map the dynamic range of voltage levels coming out of it.
Bob C about 8 years

Excellent! Thanks for this. I'm not so concerned about the units, but I was unclear on what a "negative" value meant. Explaining it in terms of vibration makes perfect sense. So a -1.0f value is just as "loud" as a +1.0f value? But in terms of "vibration frequency," if you tried to mix the two values, they'd cancel out and give you (essentially) silence.
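The cancellation mentioned in this comment can be illustrated with a small sketch. It assumes samples are floats in the -1.0 to +1.0 range and that mixing means adding corresponding samples; the helper names `invert` and `mix` are hypothetical.

```python
import math


def invert(signal):
    """Flip the sign of every sample (a polarity inversion)."""
    return [-x for x in signal]


def mix(a, b):
    """Mix two signals of equal length by adding corresponding samples."""
    return [x + y for x, y in zip(a, b)]


if __name__ == "__main__":
    # One cycle of a sine wave sampled at 8 points.
    wave = [math.sin(2 * math.pi * k / 8) for k in range(8)]
    flipped = invert(wave)

    # Both versions swing equally far from zero...
    print(max(abs(x) for x in wave) == max(abs(x) for x in flipped))
    # ...but adding them sample by sample leaves only zeros.
    print(all(s == 0.0 for s in mix(wave, flipped)))
```

On its own, the inverted wave has the same peak excursion as the original; only the sum of the two is silent.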