From Amplitude or FFT to dB
A decibel meter measures a signal's mean power. So from your time signal recording you can calculate the mean signal power with:
chunk_size = 44100
num_chunk = len(signal) // chunk_size
sn = []
for chunk in range(0, num_chunk):
sn.append(np.mean(signal[chunk*chunk_size:(chunk+1)*chunk_size]**2))
Then the corresponding mean signal power in decibels is simply given by:
logsn = 10*np.log10(sn)
A equivalent relationship could also be obtained for a frequency domain signal with the use of Parseval's theorem, but in your case would require unecessary FFT computations (this relationship is mostly useful when you already have to compute the FFT for other purposes).
Note however that depending on what you compare there may be some (hopefully small) discrepancies. For example the use of non-linear amplifier and speakers would affect the relationship. Similarly ambient noises would add to the measured power by the decibel meter.
Selim Turkoglu
Updated on June 15, 2022Comments
-
Selim Turkoglu almost 2 years
I've a Python code which performs FFT on a wav file and plot the amplitude vs time / amplitude vs freq graphs. I want to calculate dB from these graphs (they are long arrays). I do not want to calculate exact dBA, I just want to see a linear relationship after my calculations. I've dB meter, I will compare it. Here is my code:
#!/usr/bin/env python # -*- coding: utf-8 -*- from __future__ import print_function import scipy.io.wavfile as wavfile import scipy import scipy.fftpack import numpy as np from matplotlib import pyplot as plt fs_rate, signal = wavfile.read("output.wav") print ("Frequency sampling", fs_rate) l_audio = len(signal.shape) print ("Channels", l_audio) if l_audio == 2: signal = signal.sum(axis=1) / 2 N = signal.shape[0] print ("Complete Samplings N", N) secs = N / float(fs_rate) print ("secs", secs) Ts = 1.0/fs_rate # sampling interval in time print ("Timestep between samples Ts", Ts) t = scipy.arange(0, secs, Ts) # time vector as scipy arange field / numpy.ndarray FFT = abs(scipy.fft(signal)) FFT_side = FFT[range(N//4)] # one side FFT range freqs = scipy.fftpack.fftfreq(signal.size, t[1]-t[0]) fft_freqs = np.array(freqs) freqs_side = freqs[range(N//4)] # one side frequency range fft_freqs_side = np.array(freqs_side) makespositive = signal[44100:]*(-1) logal = np.log10(makespositive) sn1 = np.mean(logal[1:44100]) sn2 = np.mean(logal[44100:88200]) sn3 = np.mean(logal[88200:132300]) sn4 = np.mean(logal[132300:176400]) print(sn1) print(sn2) print(sn3) print(sn4) abs(FFT_side) for a in range(500): FFT_side[a] = 0 plt.subplot(311) p1 = plt.plot(t[44100:], signal[44100:], "g") # plotting the signal plt.xlabel('Time') plt.ylabel('Amplitude') plt.subplot(312) p1 = plt.plot(t[44100:], logal, "r") # plotting the signal plt.xlabel('Time') plt.ylabel('Amplitude') plt.subplot(313) p3 = plt.plot(freqs_side, abs(FFT_side), "b") # plotting the positive fft spectrum plt.xlabel('Frequency (Hz)') plt.ylabel('Count single-sided') plt.show()
First plot is amplitude vs time, second one is logarithm of previous graph and the last one is FFT. In sn1,sn2 part I tried to calculate dB from signal. First I took log and then calculated mean value for each second. It did not give me a clear relationship. I also tried this and did not worked.
import numpy as np import matplotlib.pyplot as plt import scipy.io.wavfile as wf fs, signal = wf.read('output.wav') # Load the file ref = 32768 # 0 dBFS is 32678 with an int16 signal N = 8192 win = np.hamming(N) x = signal[0:N] * win # Take a slice and multiply by a window sp = np.fft.rfft(x) # Calculate real FFT s_mag = np.abs(sp) * 2 / np.sum(win) # Scale the magnitude of FFT by window and factor of 2, # because we are using half of FFT spectrum s_dbfs = 20 * np.log10(s_mag / ref) # Convert to dBFS freq = np.arange((N / 2) + 1) / (float(N) / fs) # Frequency axis plt.plot(freq, s_dbfs) plt.grid(True)
So which steps should I perform? (Sum/mean all freq amplitudes then take log or reverse, or perform it for signal etc.)
import numpy as np import matplotlib.pyplot as plt import scipy.io.wavfile as wf fs, signal = wf.read('db1.wav') signal2 = signal[44100:] chunk_size = 44100 num_chunk = len(signal2) // chunk_size sn = [] for chunk in range(0, num_chunk): sn.append(np.mean(signal2[chunk*chunk_size:(chunk+1)*chunk_size].astype(float)**2)) print(sn) logsn = 20*np.log10(sn) print(logsn)
Output:
[4.6057844427695475e+17, 5.0025315250895744e+17, 5.028593412665193e+17, 4.910948397471887e+17] [353.26607217 353.98379668 354.02893044 353.82330741]
-
Selim Turkoglu over 5 yearsI added the plot of first graph. According to your code the first and last part of the sn is positive, middle ones are negative:
[16105357.619081633, -3698520.4044557824, -4771604.623208617, 48875571.17558957]
. First and last second is silence (around 33 dB), between these there are vacuum cleaner (around 55 dB). I do not understand why first and last value is not same or close. Also because of negativity, I faced with error:[16105357.619081633, -3698520.4044557824, -4771604.623208617, 48875571.17558957]
-
SleuthEye over 5 years
sn
being the mean of squared quantities (thus positive) shouldn't be negative. Assuming you've used my code exactly as-is (i.e. without typos), then it may be that signal contains integer values which overflow during the squaring operation. You could try(signal[...].astype(float))**2
. -
Selim Turkoglu over 5 yearsYou can check the edit in first entry. I just added
.astype(float)
to the previous one. Still I'm performing it on same wav file. Results are too similar that I cannot distinguish silence and vacuum cleaner. @SleuthEye -
Selim Turkoglu over 5 yearsAdditionally we always checked mean value. Bu at silence amplitude is around -0.5, at noise amplitude is around between -0.25 -0.75. So mean value of silence -0.5, but mean value of noise is also -0.5 from (-0.25-0.75)/2. I know you said RMS but mean confused me @SleuthEye
-
SleuthEye over 5 yearsIt looks like you have a fairly large bias in your input signal. I'm not sure exactly where it's coming from, but typical audio usually oscillate around 0. You'd have better results once you figure out where that bias comes from and remove it (else you could remove the mean value of the signal over smallish windows of say 100ms).
-
Selim Turkoglu over 5 yearsI fixed that bias issue (it was adding -5x10^8 so I substracted). My output is now:
[2.3161987933791843e+17, 2.5553999430996653e+17, 2.568668790355783e+17, 2.4620444302688246e+17] [347.29551662 348.14917762 348.19416218 347.82591772]
Also you can check new graph of first code @SleuthEye -
SleuthEye over 5 yearsFrom new graph, with a maximum amplitude of ~2e8, the mean squared value should be less than ~4e16, so your output of ~2.5e+17 does not seem to match.
-
Selim Turkoglu over 5 yearsI presented the exact code and plot. Why I faced with such that issue? Also mean squared values are close for both silence and noise situation. Additionally first I can took the abs(signal) then performed loop. sn is still negative without astype. Is it okay? @SleuthEye
-
SleuthEye over 5 yearsIn one case you are loading
db1.wav
and anotheroutput.wav
I assume that might be the reason for the difference. What doesprint(signal2.dtype)
give you? -
Selim Turkoglu over 5 years
int32
. wav files are same, I just changed the name because I tried different versions to ensure I wrote correct. Overflowing issue may cause this problem as you pointed before since Python is dealing with huge numbers. @SleuthEye -
SleuthEye over 5 yearsRight now with chunks of about 1sec you end up capturing some of the noisier part of the signal in the first quieter chunk which reduces the difference between the noisy and quiet part. Try splitting the signal in smaller chunks.
-
Selim Turkoglu over 5 yearsFor 8820:
[2.139055863683949e+17, 2.249777252317077e+17, 2.3267567615403248e+17, 2.3808157320083248e+17, 2.484 5883573462458e+17, 2.5547951221054723e+17, 2.5414628804703328e+17, 2.5556553717743168e+17, 2.5664029 974492352e+17, 2.558683343698968e+17, 2.591194857540977e+17, 2.5686734537040586e+17, 2.5666697693860 944e+17, 2.5736359217607843e+17, 2.543169949387e+17, 2.5226561521156406e+17, 2.4434635201936656e+17, 2.448911185499071e+17, 2.4480967355009898e+17, 2.4470945580347578e+17]
@SleuthEye -
Selim Turkoglu over 5 years
[346.60444254 347.04279043 347.33501969 347.53451567 347.90508892 348.14712157 348.10167542 348.15004578 348.18649708 348.16033084 348.27000146 348.19417795 348.18739991 348.21094219 348.10750766 348.03716117 347.76011719 347.7794607 347.7765715 347.77301502]
-
SleuthEye over 5 yearsOut of curiosity what's
np.min
&np.max
on those same chunks as float? -
Selim Turkoglu over 5 yearsChunk size:22050
max 60201654.0 min 6186645.5 max 262774628.0 min -233976502.0 max 281736633.5 min -273703238.5 max 268182806.0 min -250401337.5 max 314386002.5 min -255943050.5 max 282537845.0 min -248732147.0 max 295691067.5 min -236847509.5 max 20141079.0 min -5764759.5
Every max,min represents half second respectively. @SleuthEye -
Selim Turkoglu over 5 yearsBy playing with signal (mono/stereo) I got new results on same wav file. Chunk is still 22050.
[1000700752140674.9, 1660613646741485.5, 5472748741534938.0, 5354087369440628.0, 5607701170444393.0, 5266715634192847.0, 1259943910814627.2, 54102395761750.03] [300.00608452 304.40537205 314.7641102 314.57370908 314.97569725 314.4307974 302.00702424 274.66432994]
As I understood; around 300 is silence, around 315 is vacuum cleaner. Not a significant difference but may work to distinguish If you say results seem appropriate. @SleuthEye -
SleuthEye over 5 yearsOther than the fact that I had incorrectly provided you with the formula
20*np.log10()
instead of10*np.log10()
, these values are more in line with what I'd expect from the time signal plots you've shown. If the mono/stereo affected the result, I'd guess you probably had overflow when mixing the channels. If that's the case, you could fix the mixing withsignal = signal.astype(float).sum(axis=1) / 2
-
Selim Turkoglu over 5 yearsI corrected according to your reply
[1000700752140674.9, 1660613646741485.5, 5472748741534938.0, 5354087369440628.0, 5607701170444393.0, 5266715634192847.0, 1259943910814627.2, 54102395761750.03] [150.00304226 152.20268603 157.3820551 157.28685454 157.48784862 157.2153987 151.00351212 137.33216497]
In my previous answer I wrotesignal.sum(axis=1) / 2
now added float part and changed multiplexer to 10. Seems fine and same. In output 137 is silence, 157 is noise. And the actual difference was around 20 dB which is so nice. Thanks a lot -
WinEunuuchs2Unix over 3 yearsI'm curious if you guys ever made sample output of end result?