Difference between a mel-spectrogram and an MFCC
Solution 1
To get the MFCC, compute the DCT of the mel-spectrogram. The mel-spectrogram is often log-scaled beforehand.
The MFCC is a very compact representation, often using just 13 or 20 coefficients instead of the 32-64 bands of a mel-spectrogram. The MFCC coefficients are also somewhat more decorrelated, which can be beneficial with simpler models such as Gaussian Mixture Models. With lots of data and strong classifiers like Convolutional Neural Networks, the mel-spectrogram can often perform better.
Solution 2
I think jonnor's answer is not exactly correct. There are two steps:
1. Take the log of the mel-spectrogram.
2. Compute the DCT of the log values.
Moreover, taking the log seems to be "the main part" when training neural networks: https://qr.ae/TWtPLD
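For illustration, the two steps above can be sketched with NumPy alone. This is a minimal sketch, not a reference implementation: the helper names dct_matrix and mfcc_from_mel are hypothetical, the input is random data standing in for a real mel-spectrogram, and the natural log with a small eps is an assumption (libraries often use a dB scale instead).

```python
import numpy as np

def dct_matrix(n_bands):
    """Orthonormal DCT-II matrix of shape (n_bands, n_bands); row k is basis k."""
    n = np.arange(n_bands)
    k = n[:, None]
    basis = np.cos(np.pi * k * (n + 0.5) / n_bands)
    scale = np.full(n_bands, np.sqrt(2.0 / n_bands))
    scale[0] = np.sqrt(1.0 / n_bands)
    return scale[:, None] * basis

def mfcc_from_mel(mel_power, n_mfcc=13, eps=1e-10):
    """mel_power: array of shape (n_bands, n_frames) with non-negative values."""
    # Step 1: take the log of the mel-spectrogram (eps avoids log(0)).
    log_mel = np.log(mel_power + eps)
    # Step 2: DCT along the mel-band axis, keeping only the first n_mfcc rows.
    return dct_matrix(mel_power.shape[0])[:n_mfcc] @ log_mel

# Stand-in data (assumption): 40 mel bands, 100 frames of random power values.
rng = np.random.default_rng(0)
mel_power = rng.random((40, 100)) + 0.1

mfcc = mfcc_from_mel(mel_power)
print(mel_power.shape, "->", mfcc.shape)  # (40, 100) -> (13, 100)
```

The truncation to 13 rows is where the compression comes from: each frame goes from 40 band values to 13 coefficients.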
Solution 3
A key difference is that the mel-spectrogram has the semantics of a spectrum, whereas the MFCC is in a sense a 'spectrum of a spectrum'. The real question is thus: what is the purpose of applying the DCT to the mel-spectrogram? That question has good answers here and there.
Note that in the meantime librosa also has an mfcc function. And looking at its implementation basically confirms that it is
- calling melspectrogram,
- converting its output to logs (via power_to_db),
- taking the DCT of the frequencies, as if they were a signal,
- truncating the new 'spectrum of spectrum' after the first n_mfcc coefficients.