Get the amplitude data from an MP3 audio file using Python


An MP3 file is compressed audio (plus tags and other metadata). All you need to do is decode it with an MP3 decoder. The decoder gives you the complete audio data you need for further processing.

How do you decode MP3? I am surprised how few tools are available for Python, although I found a good one in this question. It is called pydub, and I hope I can use a sample snippet from its author (I updated it with more info from the wiki):

from pydub import AudioSegment

sound = AudioSegment.from_mp3("test.mp3")

# get raw audio data as a bytestring
raw_data = sound.raw_data
# get the frame rate
sample_rate = sound.frame_rate
# get the number of bytes in one sample
sample_size = sound.sample_width
# get the number of channels
channels = sound.channels

Note that raw_data is held in memory at this point ('on air' ;)), so nothing has been written to disk. How you use the gathered data is up to you, but this module seems to give you everything you need.
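To go from the decoded bytes to an amplitude spectrum, the values above can be fed into NumPy. The following is a minimal sketch rather than a definitive implementation: `amplitude_spectrum` is a hypothetical helper name, and it assumes 16-bit little-endian signed samples with interleaved channels (other sample widths would need a different dtype).

```python
import numpy as np


def amplitude_spectrum(raw_data, sample_rate, sample_width, channels):
    """Return (frequencies in Hz, magnitudes) for interleaved 16-bit PCM bytes."""
    # Assumption: 2-byte little-endian signed samples.
    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit samples")

    # Interpret the bytes as integers without copying them.
    samples = np.frombuffer(raw_data, dtype="<i2")

    # One row per frame, one column per channel.
    frames = samples.reshape(-1, channels)

    # Average the channels into a single mono signal.
    mono = frames.mean(axis=1)

    # Magnitude of the real-input FFT, scaled by the signal length.
    magnitudes = np.abs(np.fft.rfft(mono)) / len(mono)
    frequencies = np.fft.rfftfreq(len(mono), d=1.0 / sample_rate)
    return frequencies, magnitudes
```

With the pydub snippet above, the call would be `amplitude_spectrum(raw_data, sample_rate, sample_size, channels)`, and the two returned arrays can be passed straight to a plotting library.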

Author by

Nik391

Love to solve different software engineering problems, including machine learning, artificial intelligence, Python, Django, JavaScript, iOS development, full-stack development, and a lot more.

Updated on June 18, 2022

Comments

  • Nik391
    Nik391 about 2 years

    I have an MP3 file and I want to plot the amplitude spectrum of that audio sample. I know this is very easy with a WAV file, since there are a lot of Python packages for handling the WAV format. However, I do not want to convert the file to WAV, store it, and then use it. What I am trying to achieve is to get the amplitude of an MP3 file directly. Even if I have to convert it to WAV, the script should do so on the fly at runtime, without actually storing the file in the database. I know we can convert the file as follows:

    from pydub import AudioSegment
    sound = AudioSegment.from_mp3("test.mp3")
    sound.export("temp.wav", format="wav")
    

    This creates temp.wav, as it is supposed to, but can we just use the content without storing the actual file?
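If a downstream library insists on WAV input, a WAV container can be assembled in memory instead of on disk. The sketch below uses only the standard-library `wave` and `io` modules. `pcm_to_wav_bytes` is a hypothetical helper name, and its arguments are assumed to be already-decoded PCM bytes together with the frame rate, sample width, and channel count that describe them.

```python
import io
import wave


def pcm_to_wav_bytes(raw_data, frame_rate, sample_width, channels):
    """Wrap raw PCM bytes in a WAV container held entirely in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(frame_rate)
        wav_file.writeframes(raw_data)
    # The wave module does not close a file object it did not open itself,
    # so the buffer can still be read after the with-block.
    return buffer.getvalue()
```

Wrapping the returned bytes in `io.BytesIO(...)` gives a file-like object that WAV readers can open without any temporary file.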

  • Nik391
    Nik391 almost 8 years
    That's excellent. That's exactly what I needed: the raw audio data.
  • P.hunter
    P.hunter over 6 years
    @Nik391 Can you please tell me how you managed to get the amplitude from that raw data? That would be extremely helpful to me.
  • Nik391
    Nik391 over 6 years
    @PaulNicolashunter The raw data returned is a byte string. You just need to convert it to integers using NumPy, with something like np.fromstring(raw_data, dtype=np.int16) (np.frombuffer in current NumPy, where np.fromstring is deprecated for binary input).
  • P.hunter
    P.hunter over 6 years
    @Nik391 So what I'm getting is that the string ('raw_data'), which is in a unicode format, represents the amplitude per second, right? And converting it to a NumPy array gives us the integer representation of amplitude per second. Am I right?
  • Jacek
    Jacek over 6 years
    You need sample_size and channels to interpret raw_data as a sound wave. Each frame is channels*sample_size bytes long. So if the audio is mono (channels = 1) and sample_size = 2 bytes, you take the first 2 bytes of raw_data, make a 2-byte integer out of them, and you get the amplitude of the first frame.
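The slicing described in this comment can be sketched with plain Python. `first_frame` is a hypothetical helper name, and the sketch assumes little-endian signed samples, which is an assumption about the decoded data rather than something stated in the thread.

```python
def first_frame(raw_data, sample_size, channels):
    """Return the first frame as a list with one integer amplitude per channel."""
    frame_size = channels * sample_size
    frame = raw_data[:frame_size]
    return [
        # Assumption: samples are little-endian signed integers.
        int.from_bytes(frame[offset:offset + sample_size], "little", signed=True)
        for offset in range(0, frame_size, sample_size)
    ]
```

For mono audio the list holds a single value; for stereo it holds one value per channel.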
  • P.hunter
    P.hunter over 6 years
    So if channels is 2, the audio is stereo, and sample_size is the sample width? And since I have 2 channels, I have to take the first 2 bytes of my raw_data. How am I supposed to achieve that? Isn't raw_data the data of all the frames?
  • Jacek
    Jacek over 6 years
    If _ is a sample and you have 3 channels, then the song |_ _ _| |_ _ _| |_ _ _| has 6 samples, 3 frames. Each _ is sample_size bytes long. If sample_size = 2 bytes, then my song is 12 bytes long, and played at sample_rate = 6 Hz it will have a duration of 1 second.
  • Jacek
    Jacek over 6 years
    Yes, channels = 2 means the audio is stereo. Each frame holds the data to send to each channel, so the channels are always in sync.
  • Jacek
    Jacek over 6 years
    "how i'm supposed to achieve that?" That is a matter for another question: how to deal with byte strings in Python. Maybe this can help: stackoverflow.com/questions/22824539/…
  • P.hunter
    P.hunter over 6 years
    Thanks mate. It was *9 samples, 3 frames, because I see 9 _ there, and that means the song is 18 bytes long if the sample size is 2 bytes, right? And what about sample_width? Does it have any connection with it?
  • Jacek
    Jacek over 6 years
    Yes, my bad, it has 9 samples of course, and is 18 bytes long if sample_size = 2. sample_size is sample_width here.
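The corrected arithmetic from this exchange can be checked with a few lines of Python. `describe_pcm` is a hypothetical helper name, and the duration calculation assumes the rate counts frames per second, as pydub's frame_rate does. Under that assumption the three-frame song lasts one second at a rate of 3 frames per second.

```python
def describe_pcm(num_bytes, sample_size, channels, frame_rate):
    """Return the frame count, sample count, and duration of a PCM byte string."""
    frame_size = sample_size * channels
    frames, leftover = divmod(num_bytes, frame_size)
    if leftover:
        raise ValueError("byte count is not a whole number of frames")
    return {
        "frames": frames,
        "samples": frames * channels,
        # Assumption: frame_rate is frames per second.
        "duration_seconds": frames / frame_rate,
    }


# 3 channels, 2-byte samples, 18 bytes in total, 3 frames per second.
print(describe_pcm(18, sample_size=2, channels=3, frame_rate=3))
```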