Trying to get the frequencies of a .wav file in Python

17,098

Solution 1

If you'd like to detect pitch of a sound (and it seems you do), then in terms of Python libraries your best bet is aubio. Please consult this example for implementation.

import sys
from aubio import source, pitch

win_s = 4096
hop_s = 512 

s = source(your_file, samplerate, hop_s)
samplerate = s.samplerate

tolerance = 0.8

pitch_o = pitch("yin", win_s, hop_s, samplerate)
pitch_o.set_unit("midi")
pitch_o.set_tolerance(tolerance)

pitches = []
confidences = []

total_frames = 0
while True:
    samples, read = s()
    pitch = pitch_o(samples)[0]
    pitches += [pitch]
    confidence = pitch_o.get_confidence()
    confidences += [confidence]
    total_frames += read
    if read < hop_s: break

print("Average frequency = " + str(np.array(pitches).mean()) + " hz")

Be sure to check docs on pitch detection methods.

I also thought you might be interested in estimation of mean frequency and some other audio parameters without using any special libraries. Let's just use numpy! This should give you much better insight into how such audio features can be calculated. It's based off specprop from seewave package. Check docs for meaning of computed features.

import numpy as np

def spectral_properties(y: np.ndarray, fs: int) -> dict:
    spec = np.abs(np.fft.rfft(y))
    freq = np.fft.rfftfreq(len(y), d=1 / fs)
    spec = np.abs(spec)
    amp = spec / spec.sum()
    mean = (freq * amp).sum()
    sd = np.sqrt(np.sum(amp * ((freq - mean) ** 2)))
    amp_cumsum = np.cumsum(amp)
    median = freq[len(amp_cumsum[amp_cumsum <= 0.5]) + 1]
    mode = freq[amp.argmax()]
    Q25 = freq[len(amp_cumsum[amp_cumsum <= 0.25]) + 1]
    Q75 = freq[len(amp_cumsum[amp_cumsum <= 0.75]) + 1]
    IQR = Q75 - Q25
    z = amp - amp.mean()
    w = amp.std()
    skew = ((z ** 3).sum() / (len(spec) - 1)) / w ** 3
    kurt = ((z ** 4).sum() / (len(spec) - 1)) / w ** 4

    result_d = {
        'mean': mean,
        'sd': sd,
        'median': median,
        'mode': mode,
        'Q25': Q25,
        'Q75': Q75,
        'IQR': IQR,
        'skew': skew,
        'kurt': kurt
    }

    return result_d

Solution 2

I felt the OPs frustration - it shouldnt be so hard to find how to get values of the sprectrogram instead of seeing the spectrogram image if someone needs to:

#!/usr/bin/env python

import librosa
import sys
import numpy as np
import matplotlib.pyplot as plt
import librosa.display

np.set_printoptions(threshold=sys.maxsize)

filename = 'filename.wav'
Fs = 44100
clip, sample_rate = librosa.load(filename, sr=Fs)

n_fft = 1024  # frame length 
start = 0 

hop_length=512

#commented out code to display Spectrogram
X = librosa.stft(clip, n_fft=n_fft, hop_length=hop_length)
#Xdb = librosa.amplitude_to_db(abs(X))
#plt.figure(figsize=(14, 5))
#librosa.display.specshow(Xdb, sr=Fs, x_axis='time', y_axis='hz') 
#If to pring log of frequencies  
#librosa.display.specshow(Xdb, sr=Fs, x_axis='time', y_axis='log')
#plt.colorbar()

#librosa.display.waveplot(clip, sr=Fs)
#plt.show()

#now print all values 

t_samples = np.arange(clip.shape[0]) / Fs
t_frames = np.arange(X.shape[1]) * hop_length / Fs
#f_hertz = np.arange(N / 2 + 1) * Fs / N       # Works only when N is even
f_hertz = np.fft.rfftfreq(n_fft, 1 / Fs)         # Works also when N is odd

#example
print('Time (seconds) of last sample:', t_samples[-1])
print('Time (seconds) of last frame: ', t_frames[-1])
print('Frequency (Hz) of last bin:   ', f_hertz[-1])

print('Time (seconds) :', len(t_samples))

#prints array of time frames 
print('Time of frames (seconds) : ', t_frames)
#prints array of frequency bins
print('Frequency (Hz) : ', f_hertz)

print('Number of frames : ', len(t_frames))
print('Number of bins : ', len(f_hertz))

#This code is working to printout frame by frame intensity of each frequency
#on top line gives freq bins
curLine = 'Bins,'
for b in range(1, len(f_hertz)):
    curLine += str(f_hertz[b]) + ','
print(curLine)

curLine = ''
for f in range(1, len(t_frames)):
    curLine = str(t_frames[f]) + ','
    for b in range(1, len(f_hertz)): #for each frame, we get list of bin values printed
        curLine += str("%.02f" % np.abs(X[b, f])) + ','
        #remove format of the float for full details if needed
        #curLine += str(np.abs(X[b, f])) + ','
        #print other useful info like phase of frequency bin b at frame f.
        #curLine += str("%.02f" % np.angle(X[b, f])) + ',' 
    print(curLine)

Solution 3

This answer is quite late, but you could try this:

(Note: I deserve very little credit for this since I got most of it from other SO posts and this great article on FFT using Python: https://realpython.com/python-scipy-fft/)

import numpy as np
from scipy.fft import *
from scipy.io import wavfile


def freq(file, start_time, end_time):

    # Open the file and convert to mono
    sr, data = wavfile.read(file)
    if data.ndim > 1:
        data = data[:, 0]
    else:
        pass

    # Return a slice of the data from start_time to end_time
    dataToRead = data[int(start_time * sr / 1000) : int(end_time * sr / 1000) + 1]

    # Fourier Transform
    N = len(dataToRead)
    yf = rfft(dataToRead)
    xf = rfftfreq(N, 1 / sr)

    # Uncomment these to see the frequency spectrum as a plot
    # plt.plot(xf, np.abs(yf))
    # plt.show()

    # Get the most dominant frequency and return it
    idx = np.argmax(np.abs(yf))
    freq = xf[idx]
    return freq

This code can work for any .wav file, but it may be slightly off since it only returns the most dominant frequency, and also because it only uses the first channel of the audio (if not mono).

If you want to learn more about how the Fourier transform works, check out this video by 3blue1brown with a visual explanation: https://www.youtube.com/watch?v=spUNpyF58BY

Share:
17,098
Dante Falzone
Author by

Dante Falzone

I am currently employed as a senior C++ developer at Intelligent Technology Advance Design in Omaha. I've been programming for fun in C, C++, Rust, and Python since high school, as well as dabbling in many other languages including Java and Assembly. My first programming language was Python. I worked for six months as a professional doing full-stack development, mostly in Python. I used Django and Bootstrap on Apache for web development plus AWS Lambda for some of the backend.

Updated on June 14, 2022

Comments

  • Dante Falzone
    Dante Falzone about 2 years

    I know that questions about .wav files in Python have been just about beaten to death, but I am extremely frustrated as no one's answer seems to be working for me. What I'm trying to do seems relatively simple to me: I want to know exactly what frequencies there are in a .wav file at given times. I want to know, for example, "from the time n milliseconds to n + 10 milliseconds, the average frequency of the sound was x hertz". I have seen people talking about Fourier transforms and Goertzel algorithms, as well as various modules, that I can't seem to figure out how to get to do what I've described. I've tried looking up such things as "find frequency of a wav file in python" about twenty times to no avail. Can someone please help me?

    What I'm looking for is a solution like this pseudocode, or at least one that will do something like what the pseudocode is getting at:

    import some_module_that_can_help_me_do_this as freq
    
    file = 'output.wav'
    start_time = 1000  # Start 1000 milliseconds into the file
    end_time = 1010  # End 10 milliseconds thereafter
    
    print("Average frequency = " + str(freq.average(start_time, end_time)) + " hz")
    

    Please assume (as I'm sure you can tell) that I'm an idiot at math. This is my first question here so be gentle

  • Dante Falzone
    Dante Falzone over 5 years
    Thanks, but there appears to be a problem. When I run the code as you have it here (with the appropriate print statement for the freq call), it gives me the error ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). So I tried changing line 10 to if data[i].all() < 0 and data[i+1].all() > 0: and then if data[i].any() < 0 and data[i+1].any() > 0: , but both times I just got 0.0 no matter what times I specified for start/end.
  • vitalious
    vitalious over 5 years
    That's odd, can you share the file and the start/end time you're passing to this function?
  • vitalious
    vitalious over 5 years
    Which version of Python are you running?
  • Dante Falzone
    Dante Falzone over 5 years
    Running Python 3.7. Here's the version that raises a ValueError: https://pastebin.com/aT4F4bg4 And here's one version that returns 0.0 no matter what start/end times I use: https://pastebin.com/CB9TwbLT Here's one that also only returns 0.0: https://pastebin.com/rCSw5Rc4
  • Dante Falzone
    Dante Falzone over 5 years
    The first one is essentially exactly what you've written, the other two are me trying to fix the ValueError by doing what the error message told me to do (i.e., put .any() and .all() onto the data arrays)
  • vitalious
    vitalious over 5 years
    @kitty4537 Something is off about that data array. It probably has nested arrays for some reason. Look into the structure of the array. It should contain just the values of the sound sampling taken at the sample rate.
  • Lukasz Tracewski
    Lukasz Tracewski over 5 years
    This algorithm can work only in case of a single frequency present in the signal. Essentially it's a simplified way of calculating zero crossings rate.
  • Dante Falzone
    Dante Falzone over 5 years
    Worked really well, but I had to tweak a few things. It first gave me an error saying confidence was not defined, and I wasn't sure how to calculate it, so I just hardcoded it to 0.8. Then it gave me a concatenation type error because np.array(pitches).mean() doesn't return a string, so I str()ed it and now it works perfectly. Thank you
  • Lukasz Tracewski
    Lukasz Tracewski over 5 years
    Thanks @kitty4537 for the effort to point out problems. Fixed. As an extras I have thrown in code that takes you through calculation of various spectral features. Enjoy!
  • ChucK
    ChucK over 2 years
    This worked for me, thank you.
  • developer01
    developer01 over 2 years
    Hey! This is super helpful. What if I have multiple frequencies in the file and I'm not super sure where they start and end? Also, what do I do for DTMF tones? I feed in a DTMF 1 tone into this program and I get 1209 Hz in return, but I also need the 697 Hz too. Please help as I am at a loss of where to go from here.
  • developer01
    developer01 over 2 years
    How should one go about it if you have DTMF tones? @LukaszTracewski
  • Lukasz Tracewski
    Lukasz Tracewski over 2 years
    @developer01 It's not my answer, but I don't think the freq function would work beyond the simplest case.