Difference between mel-spectrogram and an MFCC

spectrogram mfcc librosa

17,295

Solution 1

To get MFCCs, compute the DCT of the mel-spectrogram. The mel-spectrogram is usually log-scaled first.

MFCC is a very compressible representation, often using just 13 or 20 coefficients instead of the 32-64 bands of a mel-spectrogram. The MFCCs are also somewhat more decorrelated, which can be beneficial for simpler models such as Gaussian Mixture Models. With lots of data and strong classifiers like Convolutional Neural Networks, the mel-spectrogram can often perform better.

Solution 2

I think jonnor's answer is not exactly correct. There are two steps:
1. Take the log of the mel-spectrogram.
2. Compute the DCT of the log values.
Moreover, taking the log seems to be "the main part" when training a neural network: https://qr.ae/TWtPLD
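
The two steps above can be sketched for a single frame using only the Python standard library. This is a minimal illustration, not librosa's code: the helper names (dct_ii, mfcc_from_mel_frame), the decibel-style log scaling, the eps floor, and the toy mel_frame values are all assumptions made for the example.

```python
import math

def dct_ii(values):
    """Orthonormal DCT-II of a list of floats."""
    n = len(values)
    out = []
    for k in range(n):
        s = sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
        # Orthonormal scaling: the k = 0 term is weighted differently.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def mfcc_from_mel_frame(mel_power, n_mfcc=13, eps=1e-10):
    """Hypothetical helper: log-scale one mel frame, then DCT, then truncate."""
    # Step 1: take the log of the mel-band powers (here in decibels).
    log_mel = [10.0 * math.log10(max(p, eps)) for p in mel_power]
    # Step 2: DCT over the mel bands, keeping only the first n_mfcc coefficients.
    return dct_ii(log_mel)[:n_mfcc]

# Toy frame with 32 mel bands (made-up values, not real audio).
mel_frame = [1.0 + 0.5 * math.sin(0.3 * b) ** 2 for b in range(32)]
coeffs = mfcc_from_mel_frame(mel_frame)
print(len(coeffs))
```

Truncating to n_mfcc coefficients is what makes the representation compact: the 32 band values of the frame are summarised by 13 numbers.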

Solution 3

A key difference is that the mel-spectrogram has the semantics of a spectrum, whereas the MFCC is, in a sense, a 'spectrum of a spectrum'. The real question is thus: what is the purpose of applying the DCT to the mel-spectrogram? That question has good answers elsewhere.

Note that librosa now also has an mfcc function. Looking at its implementation basically confirms that it works by:
1. calling melspectrogram,
2. converting its output to a log scale (via power_to_db),
3. taking the DCT along the frequency axis, as if the frequencies were a signal,
4. truncating the new 'spectrum of a spectrum' after the first n_mfcc coefficients.

monadoboi

CS Student at University of Bristol.

Updated on May 16, 2022

Comments

monadoboi about 2 years

I'm using the librosa library to convert music segments into mel-spectrograms to use as inputs for my neural network, as shown in the docs here.

How is this different from MFCCs, if at all? Are there any advantages or disadvantages to using either?