From Amplitude or FFT to dB

python signal-processing fft wav decibel

15,218

A decibel meter measures a signal's mean power. So from your time signal recording you can calculate the mean signal power with:

chunk_size = 44100
num_chunk  = len(signal) // chunk_size
sn = []
for chunk in range(0, num_chunk):
  sn.append(np.mean(signal[chunk*chunk_size:(chunk+1)*chunk_size]**2))

Then the corresponding mean signal power in decibels is simply given by:

logsn = 10*np.log10(sn)

A equivalent relationship could also be obtained for a frequency domain signal with the use of Parseval's theorem, but in your case would require unecessary FFT computations (this relationship is mostly useful when you already have to compute the FFT for other purposes).

Note however that depending on what you compare there may be some (hopefully small) discrepancies. For example the use of non-linear amplifier and speakers would affect the relationship. Similarly ambient noises would add to the measured power by the decibel meter.

15,218

Author by

Selim Turkoglu

Updated on June 15, 2022

Comments

Selim Turkoglu almost 2 years

I've a Python code which performs FFT on a wav file and plot the amplitude vs time / amplitude vs freq graphs. I want to calculate dB from these graphs (they are long arrays). I do not want to calculate exact dBA, I just want to see a linear relationship after my calculations. I've dB meter, I will compare it. Here is my code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import print_function
import scipy.io.wavfile as wavfile
import scipy
import scipy.fftpack
import numpy as np
from matplotlib import pyplot as plt

fs_rate, signal = wavfile.read("output.wav")
print ("Frequency sampling", fs_rate)
l_audio = len(signal.shape)
print ("Channels", l_audio)
if l_audio == 2:
    signal = signal.sum(axis=1) / 2
N = signal.shape[0]
print ("Complete Samplings N", N)
secs = N / float(fs_rate)
print ("secs", secs)
Ts = 1.0/fs_rate # sampling interval in time
print ("Timestep between samples Ts", Ts)
t = scipy.arange(0, secs, Ts) # time vector as scipy arange field / numpy.ndarray
FFT = abs(scipy.fft(signal))
FFT_side = FFT[range(N//4)] # one side FFT range
freqs = scipy.fftpack.fftfreq(signal.size, t[1]-t[0])
fft_freqs = np.array(freqs)
freqs_side = freqs[range(N//4)] # one side frequency range
fft_freqs_side = np.array(freqs_side)

makespositive = signal[44100:]*(-1)
logal = np.log10(makespositive)

sn1 = np.mean(logal[1:44100])
sn2 = np.mean(logal[44100:88200])
sn3 = np.mean(logal[88200:132300])
sn4 = np.mean(logal[132300:176400])

print(sn1)
print(sn2)
print(sn3)
print(sn4)

abs(FFT_side)
for a in range(500):
    FFT_side[a] = 0

plt.subplot(311)
p1 = plt.plot(t[44100:], signal[44100:], "g") # plotting the signal
plt.xlabel('Time')
plt.ylabel('Amplitude')

plt.subplot(312)
p1 = plt.plot(t[44100:], logal, "r") # plotting the signal
plt.xlabel('Time')
plt.ylabel('Amplitude')

plt.subplot(313)
p3 = plt.plot(freqs_side, abs(FFT_side), "b") # plotting the positive fft spectrum
plt.xlabel('Frequency (Hz)')
plt.ylabel('Count single-sided')
plt.show()

First plot is amplitude vs time, second one is logarithm of previous graph and the last one is FFT. In sn1,sn2 part I tried to calculate dB from signal. First I took log and then calculated mean value for each second. It did not give me a clear relationship. I also tried this and did not worked.

import numpy as np
import matplotlib.pyplot as plt
import scipy.io.wavfile as wf

fs, signal = wf.read('output.wav')  # Load the file
ref = 32768  # 0 dBFS is 32678 with an int16 signal

N = 8192
win = np.hamming(N)                                                       
x = signal[0:N] * win                             # Take a slice and multiply by a window
sp = np.fft.rfft(x)                               # Calculate real FFT
s_mag = np.abs(sp) * 2 / np.sum(win)              # Scale the magnitude of FFT by window and factor of 2,
                                                  # because we are using half of FFT spectrum
s_dbfs = 20 * np.log10(s_mag / ref)               # Convert to dBFS
freq = np.arange((N / 2) + 1) / (float(N) / fs)   # Frequency axis
plt.plot(freq, s_dbfs)
plt.grid(True)

So which steps should I perform? (Sum/mean all freq amplitudes then take log or reverse, or perform it for signal etc.)

import numpy as np
import matplotlib.pyplot as plt
import scipy.io.wavfile as wf

fs, signal = wf.read('db1.wav') 

signal2 = signal[44100:]
chunk_size = 44100
num_chunk  = len(signal2) // chunk_size
sn = []
for chunk in range(0, num_chunk):
    sn.append(np.mean(signal2[chunk*chunk_size:(chunk+1)*chunk_size].astype(float)**2))

print(sn)

logsn = 20*np.log10(sn)

print(logsn)

Output:

[4.6057844427695475e+17, 5.0025315250895744e+17, 5.028593412665193e+17, 4.910948397471887e+17]
[353.26607217 353.98379668 354.02893044 353.82330741]

Selim Turkoglu over 5 years

I added the plot of first graph. According to your code the first and last part of the sn is positive, middle ones are negative: [16105357.619081633, -3698520.4044557824, -4771604.623208617, 48875571.17558957]. First and last second is silence (around 33 dB), between these there are vacuum cleaner (around 55 dB). I do not understand why first and last value is not same or close. Also because of negativity, I faced with error: [16105357.619081633, -3698520.4044557824, -4771604.623208617, 48875571.17558957]
SleuthEye over 5 years

sn being the mean of squared quantities (thus positive) shouldn't be negative. Assuming you've used my code exactly as-is (i.e. without typos), then it may be that signal contains integer values which overflow during the squaring operation. You could try (signal[...].astype(float))**2.
Selim Turkoglu over 5 years

You can check the edit in first entry. I just added .astype(float) to the previous one. Still I'm performing it on same wav file. Results are too similar that I cannot distinguish silence and vacuum cleaner. @SleuthEye
Selim Turkoglu over 5 years

Additionally we always checked mean value. Bu at silence amplitude is around -0.5, at noise amplitude is around between -0.25 -0.75. So mean value of silence -0.5, but mean value of noise is also -0.5 from (-0.25-0.75)/2. I know you said RMS but mean confused me @SleuthEye
SleuthEye over 5 years

It looks like you have a fairly large bias in your input signal. I'm not sure exactly where it's coming from, but typical audio usually oscillate around 0. You'd have better results once you figure out where that bias comes from and remove it (else you could remove the mean value of the signal over smallish windows of say 100ms).
Selim Turkoglu over 5 years

I fixed that bias issue (it was adding -5x10^8 so I substracted). My output is now: [2.3161987933791843e+17, 2.5553999430996653e+17, 2.568668790355783e+17, 2.4620444302688246e+17] [347.29551662 348.14917762 348.19416218 347.82591772] Also you can check new graph of first code @SleuthEye
SleuthEye over 5 years

From new graph, with a maximum amplitude of ~2e8, the mean squared value should be less than ~4e16, so your output of ~2.5e+17 does not seem to match.
Selim Turkoglu over 5 years

I presented the exact code and plot. Why I faced with such that issue? Also mean squared values are close for both silence and noise situation. Additionally first I can took the abs(signal) then performed loop. sn is still negative without astype. Is it okay? @SleuthEye
SleuthEye over 5 years

In one case you are loading db1.wav and another output.wav I assume that might be the reason for the difference. What does print(signal2.dtype) give you?
Selim Turkoglu over 5 years

int32. wav files are same, I just changed the name because I tried different versions to ensure I wrote correct. Overflowing issue may cause this problem as you pointed before since Python is dealing with huge numbers. @SleuthEye
SleuthEye over 5 years

Right now with chunks of about 1sec you end up capturing some of the noisier part of the signal in the first quieter chunk which reduces the difference between the noisy and quiet part. Try splitting the signal in smaller chunks.
Selim Turkoglu over 5 years

For 8820: [2.139055863683949e+17, 2.249777252317077e+17, 2.3267567615403248e+17, 2.3808157320083248e+17, 2.484 5883573462458e+17, 2.5547951221054723e+17, 2.5414628804703328e+17, 2.5556553717743168e+17, 2.5664029 974492352e+17, 2.558683343698968e+17, 2.591194857540977e+17, 2.5686734537040586e+17, 2.5666697693860 944e+17, 2.5736359217607843e+17, 2.543169949387e+17, 2.5226561521156406e+17, 2.4434635201936656e+17, 2.448911185499071e+17, 2.4480967355009898e+17, 2.4470945580347578e+17] @SleuthEye
Selim Turkoglu over 5 years

[346.60444254 347.04279043 347.33501969 347.53451567 347.90508892 348.14712157 348.10167542 348.15004578 348.18649708 348.16033084 348.27000146 348.19417795 348.18739991 348.21094219 348.10750766 348.03716117 347.76011719 347.7794607 347.7765715 347.77301502]
SleuthEye over 5 years

Out of curiosity what's np.min & np.max on those same chunks as float?
Selim Turkoglu over 5 years

Chunk size:22050 max 60201654.0 min 6186645.5 max 262774628.0 min -233976502.0 max 281736633.5 min -273703238.5 max 268182806.0 min -250401337.5 max 314386002.5 min -255943050.5 max 282537845.0 min -248732147.0 max 295691067.5 min -236847509.5 max 20141079.0 min -5764759.5 Every max,min represents half second respectively. @SleuthEye
Selim Turkoglu over 5 years

By playing with signal (mono/stereo) I got new results on same wav file. Chunk is still 22050. [1000700752140674.9, 1660613646741485.5, 5472748741534938.0, 5354087369440628.0, 5607701170444393.0, 5266715634192847.0, 1259943910814627.2, 54102395761750.03] [300.00608452 304.40537205 314.7641102 314.57370908 314.97569725 314.4307974 302.00702424 274.66432994] As I understood; around 300 is silence, around 315 is vacuum cleaner. Not a significant difference but may work to distinguish If you say results seem appropriate. @SleuthEye
SleuthEye over 5 years

Other than the fact that I had incorrectly provided you with the formula 20*np.log10() instead of 10*np.log10(), these values are more in line with what I'd expect from the time signal plots you've shown. If the mono/stereo affected the result, I'd guess you probably had overflow when mixing the channels. If that's the case, you could fix the mixing with signal = signal.astype(float).sum(axis=1) / 2
Selim Turkoglu over 5 years

I corrected according to your reply [1000700752140674.9, 1660613646741485.5, 5472748741534938.0, 5354087369440628.0, 5607701170444393.0, 5266715634192847.0, 1259943910814627.2, 54102395761750.03] [150.00304226 152.20268603 157.3820551 157.28685454 157.48784862 157.2153987 151.00351212 137.33216497] In my previous answer I wrote signal.sum(axis=1) / 2 now added float part and changed multiplexer to 10. Seems fine and same. In output 137 is silence, 157 is noise. And the actual difference was around 20 dB which is so nice. Thanks a lot
WinEunuuchs2Unix over 3 years

I'm curious if you guys ever made sample output of end result?