Extracting pitch features from an audio file
Solution 1
You can map frequencies to musical notes:
$$n = 69 + 12 \log_2\left(\frac{f}{C_p}\right)$$
with $n$ being the MIDI note number to be calculated, $f$ the frequency and $C_p$ the chamber pitch (in modern music 440.0 Hz is common).
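As a minimal sketch, the frequency-to-note mapping can be written in a few lines of Python. It assumes the standard equal-temperament relation (MIDI note 69 is the chamber pitch, 12 semitones per octave); the function names `freq_to_midi` and `midi_to_name` and the sharp-only note spelling are illustrative choices, not part of any library.

```python
import math

# Pitch-class names, index 0 = C (sharps only, an arbitrary spelling choice).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def freq_to_midi(freq_hz, chamber_pitch_hz=440.0):
    """Return the (possibly fractional) MIDI note number for a frequency in Hz."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 69.0 + 12.0 * math.log2(freq_hz / chamber_pitch_hz)


def midi_to_name(midi_number):
    """Round to the nearest semitone and return a name such as 'A4'."""
    nearest = int(round(midi_number))
    octave = nearest // 12 - 1  # common convention: MIDI 60 is C4
    return f"{NOTE_NAMES[nearest % 12]}{octave}"


print(midi_to_name(freq_to_midi(440.0)))  # A4
```

The fractional part of the MIDI number tells you how far the frequency is from the nearest equal-tempered note (0.01 corresponds to one cent).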
As you may know a single frequency doesn't make a musical pitch. "Pitch" arises from the sensation of the fundamental of harmonic sounds, i.e. sounds that mainly consist of integer multiples of one single frequency (= the fundamental).
If you want to have Chroma Features in Python, you can use the Bregman Audio-Visual Information Toolbox. Note that chroma features don't give you information about the octave of a pitch, so you just get information about the pitch class.
from bregman.suite import Chromagram
audio_file = "mono_file.wav"
F = Chromagram(audio_file, nfft=16384, wfft=8192, nhop=2205)
F.X # all chroma features
F.X[:,0] # one feature
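If installing a toolbox is not an option, the idea behind a chroma vector can be sketched with numpy alone. This is a deliberately crude illustration, not how Bregman computes its chromagram: it takes a single frame, maps every FFT bin to its nearest MIDI note, and sums the magnitudes per pitch class. The function name `chroma_from_frame` and the `fmin` cutoff are assumptions made for this example.

```python
import numpy as np


def chroma_from_frame(frame, sample_rate, chamber_pitch_hz=440.0, fmin=27.5):
    """Fold the magnitude spectrum of one frame into 12 pitch classes (0 = C)."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))  # reduce spectral leakage
    magnitudes = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    keep = freqs >= fmin  # drop DC and very low bins
    midi = 69.0 + 12.0 * np.log2(freqs[keep] / chamber_pitch_hz)
    pitch_classes = np.round(midi).astype(int) % 12

    chroma = np.zeros(12)
    np.add.at(chroma, pitch_classes, magnitudes[keep])  # sum per pitch class

    peak = chroma.max()
    return chroma / peak if peak > 0 else chroma  # scale strongest class to 1


# Example: a 440 Hz sine should be dominated by pitch class 9 (A).
sr = 22050
t = np.arange(4096) / sr
chroma = chroma_from_frame(np.sin(2 * np.pi * 440.0 * t), sr)
print(int(np.argmax(chroma)))
```

To get features for a whole file you would apply this to successive overlapping frames and stack the resulting vectors, which is roughly what the `nfft`/`nhop` parameters above control.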
The general problem of extracting pitch information from audio is called pitch detection.
Solution 2
You can try reading the literature on pitch detection, which is quite extensive. Generally autocorrelation-based methods seem to work pretty well; frequency-domain or zero-crossing methods are less robust (so FFT doesn't really help much). A good starting point may be to implement one of these two algorithms:
YAAPT, from: Stephen A. Zahorian and Hongbing Hu, "A spectral-temporal method for robust fundamental frequency tracking", J. Acoust. Soc. Am. 123, 4559 (2008). http://bingweb.binghamton.edu/~hhu1/paper/Zahorian2008spectral.pdf and MATLAB code here: http://ws2.binghamton.edu/zahorian/yaapt.htm
YIN, from: De Cheveigné, A., Kawahara, H. "YIN, a fundamental frequency estimator for speech and music", J. Acoust. Soc. Am. 111, 1917-1930 (2002). http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf
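To give a feel for the autocorrelation idea before reading the papers, here is a bare-bones sketch with numpy. It is plain autocorrelation peak picking, not YIN or YAAPT (both add normalization, thresholds and tracking on top), and the function name and the `fmin`/`fmax` search range are assumptions chosen for illustration.

```python
import numpy as np


def estimate_f0_autocorr(frame, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency of one frame, or None if it is silent."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()  # remove any DC offset

    # Autocorrelation for non-negative lags only.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if acf[0] <= 0:
        return None  # no energy in the frame

    # Search only the lags that correspond to plausible pitches.
    lag_min = max(1, int(sample_rate / fmax))
    lag_max = min(int(sample_rate / fmin), len(acf) - 1)
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / lag


# Example: a 220 Hz sine sampled at 8 kHz.
sr = 8000
t = np.arange(2048) / sr
print(estimate_f0_autocorr(np.sin(2 * np.pi * 220.0 * t), sr))
```

Because the lag is an integer number of samples, the estimate is quantized; interpolating around the peak is one common refinement.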
As for off-the-shelf solutions, check out aubio: it is C code with a Python wrapper, and offers several pitch-extraction algorithms, including YIN and multiple-comb.
Solution 3
If you're willing to use 3rd party libraries (at least as a reference for how other people accomplished this):
Extracting musical information from sound, a presentation from PyCon 2012, shows how to use the EchoNest Python API.
Here is the relevant EchoNest documentation:
- Track API Methods
- Detailed Analyze Documentation
Relevant excerpt:
pitch content is given by a “chroma” vector, corresponding to the 12 pitch classes C, C#, D to B, with values ranging from 0 to 1 that describe the relative dominance of every pitch in the chromatic scale. For example a C Major chord would likely be represented by large values of C, E and G (i.e. classes 0, 4, and 7). Vectors are normalized to 1 by their strongest dimension, therefore noisy sounds are likely represented by values that are all close to 1, while pure tones are described by one value at 1 (the pitch) and others near 0.
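The normalization described in the excerpt can be illustrated with a small numpy sketch. The raw energy values below are made up for the example, and `normalize_by_strongest` and `dominant_classes` are hypothetical helper names, not part of the EchoNest API.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def normalize_by_strongest(raw):
    """Scale a 12-element vector so that its largest entry becomes 1."""
    raw = np.asarray(raw, dtype=float)
    peak = raw.max()
    return raw / peak if peak > 0 else raw


def dominant_classes(chroma, threshold=0.5):
    """Return the names of all pitch classes at or above the threshold."""
    return [PITCH_CLASSES[i] for i, value in enumerate(chroma) if value >= threshold]


# Invented raw energies for something like a C major chord:
# strong C, E and G (classes 0, 4 and 7), weak everything else.
raw = np.full(12, 0.2)
raw[[0, 4, 7]] = [3.0, 2.4, 2.7]

chroma = normalize_by_strongest(raw)
print(dominant_classes(chroma))  # ['C', 'E', 'G']
```

With this scaling, a vector whose raw energies are all similar (as for noise) ends up with every value close to 1, which matches the behavior the excerpt describes.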
EchoNest does the analysis on their servers. They provide free API keys for non-commercial use.
If EchoNest is not an option, I would look at the open-source aubio project. It has python bindings, and you can examine the source to see how they accomplished pitch detection.
Ada Xu
Updated on September 15, 2022
Comments
-
Ada Xu over 1 year
I am trying to extract pitch features from an audio file, which I would use for a classification problem. I am using Python (scipy/numpy) for classification.
I think I can get frequency features using scipy.fft, but I don't know how to approximate musical notes using frequencies. I researched a bit and found that I need to get chroma features, which map frequencies to 12 bins for the notes of a chromatic scale. I think there's a chroma toolbox for MATLAB, but I don't think there's anything similar for Python.
How should I go forward with this? Could anyone also suggest reading material I should look into?
-
Ada Xu over 10 years
Thanks a lot... Could you also recommend reading material or books on pitch detection or the application of DSP to music in general?
-
Frank Zalkow over 10 years
As a general introduction to a wide range of computer music issues, C. Roads' The Computer Music Tutorial (1994, Cambridge: MIT Press) is a very accessible and comprehensive (>1000 pages) reference book. For me the first part of M. Müller's Information Retrieval for Music and Motion (2007, Berlin, Heidelberg: Springer) was great (less comprehensive, more up-to-date, more technical). If you are interested in a particular topic, the proceedings of ISMIR are a rich seam of information. Others may give you other (and better?) references. I'd be interested too.
-
Ada Xu over 10 years
Thanks... I'll look into them.
-
Wyrmwood over 10 years
Pitch IS the fundamental frequency. The harmonics comprise the timbre (pronounced tamber). For example, a flute and a violin can play the same pitch (fundamental frequency), but their timbre is the harmonic frequency characteristics that make them sound different.
-
Frank Zalkow over 10 years
I think pitch and timbre are not "physical-acoustical" facts, but rather psychoacoustical effects. That's why I wanted to stress that "pitch" arises from the sensation of the fundamental and is not the fundamental itself. Would you agree with that?
-
Alex I over 10 years
I have to agree with Frank Zalkow here. Non-harmonic/non-periodic sounds, even modulated noise bursts, can have perceived pitch, so the fundamental frequency is clearly not everything.
-
Ada Xu over 10 years
Thanks a lot :) About aubio, I am finding it a little difficult to implement the examples on this page: aubio.org/doc/latest/examples.html. I can't find the methods they've used in their examples in the library, and there isn't enough documentation.
-
Ada Xu over 10 years
Thanks. Interesting vid :)