Reading a wav file with scipy and librosa in python


Solution 1

This sounds like a quantization problem. If the samples in the wave file are stored as floats and librosa is performing a straight cast to int, any value with a magnitude less than 1 will be truncated to 0. More than likely, this is why sig is an array of all zeros. The float must be scaled to map it into the range of an int. For example,

>>> import numpy as np
>>> a = np.random.randn(10)
>>> a
array([-0.04250369,  0.244113  ,  0.64479281, -0.3665814 , -0.2836227 ,
       -0.27808428, -0.07668698, -1.3104602 ,  0.95253315, -0.56778205])

Convert a to type int without scaling

>>> a.astype(int)
array([ 0,  0,  0,  0,  0,  0,  0, -1,  0,  0])

Convert a to int with scaling for 16-bit integer

>>> b = (a* 32767).astype(int)
>>> b
array([ -1392,   7998,  21127, -12011,  -9293,  -9111,  -2512, -42939,
        31211, -18604])

Convert scaled int back to float

>>> c = b/32767.0
>>> c
array([-0.04248177,  0.24408704,  0.64476455, -0.36655782, -0.28360851,
       -0.27805414, -0.0766625 , -1.31043428,  0.9525132 , -0.56776635])

c and a are only equal to about 3 or 4 decimal places due to the quantization to int.
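The size of that round-trip error can be bounded. The following is a minimal sketch, assuming truncation toward zero as `astype(int)` performs it; the helper name `roundtrip_error` is made up for illustration and is not part of any library.

```python
import numpy as np

def roundtrip_error(a, scale=32767):
    """Largest absolute error after a float -> int -> float round trip."""
    b = (a * scale).astype(int)   # truncates toward zero
    c = b / float(scale)          # back to float
    return np.max(np.abs(a - c))

rng = np.random.default_rng(0)    # fixed seed so the run is repeatable
a = rng.standard_normal(10)
err = roundtrip_error(a)

# Truncation loses less than one integer step, i.e. less than 1/32767
print(err < 1.0 / 32767)
```

One integer step at this scale is roughly 3e-5, which is consistent with agreement to about 4 decimal places.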

If librosa is returning a float, you can scale it by 32767 (2**15 - 1) and cast it to an int to get the same range of values that the scipy wave reader returns. Since librosa returns floats, the values most likely lie within a much smaller range, such as [-1, +1], than a 16-bit integer, which lies in [-32768, +32767]. So you need to scale one to get the ranges to match. For example,

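import librosa
# sr=None keeps the file's native sample rate (librosa's default resamples to 22050 Hz)
sig, rate = librosa.load(spec_file, sr=None, mono=True)
sig = sig * 32767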
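The scaling step can also be wrapped in small helpers that clip and cast explicitly. This is only a sketch: the names `float_to_int16` and `int16_to_float` are hypothetical, and the factor 32767 follows the choice made above (some tools divide by 32768 instead).

```python
import numpy as np

def float_to_int16(x):
    """Map floats in [-1.0, 1.0] to int16, clipping anything outside that range."""
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    return (x * 32767).astype(np.int16)

def int16_to_float(x):
    """Map int16 samples back to float32 in roughly [-1.0, 1.0]."""
    return np.asarray(x, dtype=np.float32) / 32767.0

samples = np.array([0.0, 0.5, -0.5, 1.0, -1.3104602])
ints = float_to_int16(samples)
print(ints)   # the last value is clipped to -1.0 before scaling
```

Clipping matters because a value such as -1.3104602 scales to about -42939, which does not fit in a 16-bit integer.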

Solution 2

  • If you do not want to do the quantization yourself, you could let the pylab.specgram function do it for you. You can look inside the function and see how it uses vmin and vmax.

  • It is not completely clear from your post (at least to me) what you want to achieve, as there is neither a sample input file nor a script from you. In any case, to check whether the spectrogram of a wave file differs significantly depending on whether the read function returns the signal data as float32 or int, I tested the following 3 functions.

Python Script:

_wav_file_ = "africa-toto.wav"

def spectogram_librosa(_wav_file_):
    import librosa
    import pylab
    import numpy as np
    
    (sig, rate) = librosa.load(_wav_file_, sr=None, mono=True,  dtype=np.float32)
    pylab.specgram(sig, Fs=rate)
    pylab.savefig('spectrogram3.png')

def graph_spectrogram_wave(wav_file):
    import wave
    import pylab
    import numpy as np
    def get_wav_info(wav_file):
        wav = wave.open(wav_file, 'r')
        frames = wav.readframes(-1)
        # np.frombuffer replaces the deprecated binary mode of fromstring
        sound_info = np.frombuffer(frames, dtype='int16')
        frame_rate = wav.getframerate()
        wav.close()
        return sound_info, frame_rate
    sound_info, frame_rate = get_wav_info(wav_file)
    pylab.figure(num=3, figsize=(10, 6))
    pylab.title('spectrogram pylab with wav_file')
    pylab.specgram(sound_info, Fs=frame_rate)
    pylab.savefig('spectrogram2.png')


def graph_wavfileread(_wav_file_):
    import matplotlib.pyplot as plt
    from scipy import signal
    from scipy.io import wavfile
    import numpy as np   
    sample_rate, samples = wavfile.read(_wav_file_)   
    frequencies, times, spectrogram = signal.spectrogram(samples,sample_rate,nfft=1024)
    plt.pcolormesh(times, frequencies, 10*np.log10(spectrogram))
    plt.ylabel('Frequency [Hz]')
    plt.xlabel('Time [sec]')
    plt.savefig("spectogram1.png")
    

spectogram_librosa(_wav_file_)
#graph_wavfileread(_wav_file_)
#graph_spectrogram_wave(_wav_file_)
  • Running each of the three functions produced the following 3 outputs:

[Three spectrogram images, one per read function]

Apart from minor differences in size and intensity, these look quite similar regardless of the read method, library, or data type. That makes me wonder for what purpose the outputs need to be 'exactly' the same, and how exact they should be.

  • I do find it strange, though, that librosa.load() offers a dtype parameter but works only with float values anyway. Googling this led me only to this issue, which wasn't much help, and this issue, which says that this is how librosa will stay, as internally it seems to use only floats.
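One possible reason the pictures differ mostly in intensity: multiplying a signal by a constant shifts every bin of its log-power spectrum by the same number of dB. The sketch below illustrates this with plain NumPy on synthetic noise. The helper `power_db` and the stand-in signals are assumptions made for illustration, not the output of librosa or scipy.

```python
import numpy as np

def power_db(x, nfft=256):
    """Power spectrum of the first nfft samples, in dB."""
    spectrum = np.fft.rfft(x[:nfft])
    return 10 * np.log10(np.abs(spectrum) ** 2)

rng = np.random.default_rng(0)
sig_float = rng.uniform(-1.0, 1.0, size=256)  # stand-in for float data in [-1, 1]
sig_scaled = sig_float * 32767                # same signal on a 16-bit integer scale

offset = power_db(sig_scaled) - power_db(sig_float)
expected = 20 * np.log10(32767)               # about 90.3 dB
print(np.allclose(offset, expected))
```

If the color limits are chosen automatically from the data, a constant offset like this should leave the picture looking the same; with fixed limits (for example explicit vmin and vmax), the colors would shift.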
Author: Jose Ramon

Updated on June 07, 2022

Comments

  • Jose Ramon
    Jose Ramon almost 2 years

    I am trying to load a .wav file in Python using the scipy library. My final objective is to create the spectrogram of that audio file. The code for reading the file can be summarized as follows:

    import scipy.io.wavfile as wav
    (sig, rate) = wav.read(_wav_file_)
    

    For some .wav files I am receiving the following error:

    WavFileWarning: Chunk (non-data) not understood, skipping it. WavFileWarning) ** ValueError: Incomplete wav chunk.

    Therefore, I decided to use librosa to read the files:

    import librosa
    (sig, rate) = librosa.load(_wav_file_, sr=None)
    

    That works properly in all cases. However, I noticed a difference in the colors of the spectrogram: the figure was exactly the same, but the colors were somehow inverted. More specifically, when I kept the same function for calculating the spectrograms and changed only the way I read the .wav file, this difference appeared. Any idea what could cause that? Is there a default difference between the way the two approaches read the .wav file?

    EDIT:

    (rate1, sig1) = wav.read(spec_file) # rate1 = 16000
    sig, rate = librosa.load(spec_file) # rate 22050
    sig = np.array(α*sig, dtype = "int16") 
    

    Something that almost worked was to multiply sig by a constant α, the ratio between the max value of the signal from scipy's wav.read and the max value of the signal from librosa. The sample rates were still different, though.

  • Jose Ramon
    Jose Ramon over 5 years
    But in order to do the scaling I need to find the min and max of each dataset, right? That is kind of impossible for me.
  • fstop_22
    fstop_22 over 5 years
    The scale factor is more than likely constant. If it was not, the volume of the wave file would be changing for each block of data that is read.
  • Jose Ramon
    Jose Ramon over 5 years
    I want to read the audio and then calculate the spectrogram following these examples: haythamfayek.com/2016/04/21/…. I noticed that with librosa and scipy's wav.read there is a difference in the resulting colors.
  • fstop_22
    fstop_22 over 5 years
    What is the source of the data and what is being used to create the wav file?
  • Jose Ramon
    Jose Ramon over 5 years
    Unfortunately, I do not have access to this information. The wav files are part of a database that can be found here: zenodo.org/record/1188976#.XFmYoVxKi73
  • Jose Ramon
    Jose Ramon over 5 years
    Yes, I tried that, but in a somewhat arbitrary way. I just scaled after finding the max and min in both cases for 20-30 wav files.
  • Jose Ramon
    Jose Ramon over 5 years
    That way the spectrograms are kind of similar. There is still one remaining issue, I guess: the two approaches give different sample rates. I thought the sample rate was defined by the .wav file. However, it seems that is not the case.
  • fstop_22
    fstop_22 over 5 years
    The sample rate should be the same. What values are you getting for the sample rates?
  • Jose Ramon
    Jose Ramon over 5 years
    For scipy's wav.read it is 16000, while for librosa it is 22050.
  • fstop_22
    fstop_22 over 5 years
    It sounds like librosa and scipy are using different formats for wav file.