Determining Bit-Depth of a wav file
Solution 1
I highly recommend the soundfile module (but mind you, I'm very biased because I wrote a large part of it).
There you can open your file as a soundfile.SoundFile object, which has a subtype attribute that holds the information you are looking for.
In your case that would probably be 'PCM_16' or 'PCM_24'.
Solution 2
Essentially the same answer as the one from Matthias, but with copy-pastable code.
Requirements
pip install soundfile
Code
import soundfile as sf
ob = sf.SoundFile('example.wav')
print('Sample rate: {}'.format(ob.samplerate))
print('Channels: {}'.format(ob.channels))
print('Subtype: {}'.format(ob.subtype))
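If you need the bit depth as a number rather than as a subtype string, one option is a small lookup table. This is only a sketch: the helper name bits_from_subtype is hypothetical and not part of soundfile, and the table only covers the linear PCM and float subtypes; anything else returns None.

```python
# Hypothetical helper (not part of soundfile): map a subtype string,
# as reported by SoundFile.subtype, to a bit depth.
SUBTYPE_BITS = {
    'PCM_S8': 8,
    'PCM_U8': 8,
    'PCM_16': 16,
    'PCM_24': 24,
    'PCM_32': 32,
    'FLOAT': 32,
    'DOUBLE': 64,
}


def bits_from_subtype(subtype):
    """Return the bit depth for a known subtype, or None if it is not in the table."""
    return SUBTYPE_BITS.get(subtype)
```

With the SoundFile object from the code above, this would be called as bits_from_subtype(ob.subtype).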
Explanation
- Channels: Usually 2, meaning you have one left speaker and one right speaker.
- Sample rate: Audio signals are analog, but we want to represent them digitally, which means discretizing them in value and in time. The sample rate is how many values we get per second; the unit is Hz. The sample rate needs to be at least double the highest frequency in the original sound, otherwise you get aliasing. Human hearing ranges from ~20 Hz to ~20 kHz, so you can cut off anything above 20 kHz. A sample rate much higher than 40 kHz therefore adds little for playback, which is why 44.1 kHz and 48 kHz are so common.
- Bit-depth: The higher the bit-depth, the more dynamic range can be captured. Dynamic range is the difference between the quietest and loudest volume of an instrument, part or piece of music. Typical values are 16 bit and 24 bit. A bit-depth of 16 bit has a theoretical dynamic range of about 96 dB, whereas 24 bit has about 144 dB (source).
- Subtype: PCM_16 means 16 bit depth, where PCM stands for Pulse-Code Modulation.
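The 96 dB and 144 dB figures in the bit-depth item follow from the number of quantization levels: for linear PCM the theoretical dynamic range is 20 * log10(2 ** bits), which works out to roughly 6 dB per bit. A minimal sketch (the function name dynamic_range_db is made up for illustration):

```python
import math


def dynamic_range_db(bit_depth):
    """Theoretical dynamic range of linear PCM: 20 * log10(2 ** bit_depth)."""
    return 20 * math.log10(2 ** bit_depth)


for bits in (8, 16, 24):
    print('{} bit: {:.1f} dB'.format(bits, dynamic_range_db(bits)))
```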
Alternative
If you are only looking for a command-line tool, I can recommend MediaInfo:
$ mediainfo example.wav
General
Complete name : example.wav
Format : Wave
File size : 83.2 MiB
Duration : 8 min 14 s
Overall bit rate mode : Constant
Overall bit rate : 1 411 kb/s
Audio
Format : PCM
Format settings : Little / Signed
Codec ID : 1
Duration : 8 min 14 s
Bit rate mode : Constant
Bit rate : 1 411.2 kb/s
Channel(s) : 2 channels
Sampling rate : 44.1 kHz
Bit depth : 16 bits
Stream size : 83.2 MiB (100%)
user3535074
Updated on June 22, 2022
Comments
- user3535074 almost 2 years
I am looking for a fast, preferably standard library mechanism to determine the bit-depth of wav file e.g. '16-bit' or '24-bit'.
I am using a subprocess call to Sox to get a plethora of audio metadata, but a subprocess call is very slow, and the only information I can currently get reliably from Sox is the bit-depth.
The built-in wave module does not have a function like "getbitdepth()" and is also not compatible with 24-bit wav files. I could use a 'try except' to access the file's metadata using the wave module (if it works, manually record that it is 16-bit) and, on except, call sox instead (where sox will perform the analysis to accurately record its bit-depth). My concern is that this approach feels like guesswork. What if an 8-bit file is read? I would be manually assigning 16-bit when it is not.
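For what it's worth, the wave module does expose the sample width in bytes through getsampwidth(), and multiplying it by 8 gives the bit depth, so 8-bit and 16-bit files need not be guessed. The sketch below assumes plain PCM files; wave raises wave.Error for encodings it does not understand (non-PCM data and, on older Python versions, WAVE_FORMAT_EXTENSIBLE headers), in which case the hypothetical helper wav_bit_depth returns None so that a fallback such as sox can take over.

```python
import wave


def wav_bit_depth(path):
    """Return the bit depth of a PCM wav file, or None if wave cannot parse it."""
    try:
        with wave.open(path, 'rb') as wav:
            # getsampwidth() returns the sample width in bytes
            return wav.getsampwidth() * 8
    except (wave.Error, EOFError):
        # Not a wav file, truncated, or an encoding wave does not support
        return None
```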
scipy.io.wavfile is also not compatible with 24-bit audio, so it creates a similar issue.
This tutorial is really interesting and even includes some really low-level (low-level for Python at least) scripting examples to extract information from wav file headers - unfortunately these scripts don't work for 16-bit audio.
Is there any way to simply (and without calling sox) determine what bit-depth the wav file I'm checking has?
The wave header parser script I'm using is as follows:
import struct


def print_wave_header(f):
    '''
    Function takes an audio file path as a parameter and
    returns a dictionary of metadata parsed from the header.
    Assumes a canonical 44-byte PCM header ("fmt " directly followed by "data").
    '''
    r = {}  # the results of the header parse
    r['path'] = f
    with open(f, "rb") as fin:  # "r" flag - read, "b" flag - binary
        r["ChunkID"] = fin.read(4)  # First four bytes are ChunkID, which must be b"RIFF"
        # '<I' treats the 4 bytes as a little-endian unsigned 32-bit integer;
        # the [0] is needed because struct.unpack returns a tuple
        ChunkSize = struct.unpack('<I', fin.read(4))[0]  # Total size of file in bytes - 8
        r["TotalSize"] = ChunkSize + 8
        r["DataSize"] = r["TotalSize"] - 44  # Number of bytes of data
        r["Format"] = fin.read(4)  # b"WAVE"
        r["SubChunk1ID"] = fin.read(4)  # b"fmt "
        r["SubChunk1Size"] = struct.unpack('<I', fin.read(4))[0]  # Should be 16 (PCM)
        # '<H' treats the 2 bytes as a little-endian unsigned 16-bit integer
        r["AudioFormat"] = struct.unpack('<H', fin.read(2))[0]  # Should be 1 (PCM)
        r["NumChannels"] = struct.unpack('<H', fin.read(2))[0]  # 1 for mono, 2 for stereo
        r["SampleRate"] = struct.unpack('<I', fin.read(4))[0]  # e.g. 44100 (CD sampling rate)
        r["ByteRate"] = struct.unpack('<I', fin.read(4))[0]  # SampleRate * BlockAlign
        r["BlockAlign"] = struct.unpack('<H', fin.read(2))[0]  # NumChannels * bytes per sample
        r["BitsPerSample"] = struct.unpack('<H', fin.read(2))[0]  # 16 for CD audio
        r["SubChunk2ID"] = fin.read(4)  # b"data"
        r["SubChunk2Size"] = struct.unpack('<I', fin.read(4))[0]  # Number of data bytes
        # First five values, read as signed 16-bit integers (-32768 to 32767);
        # this is only meaningful for 16-bit audio
        for i in range(1, 6):
            r["S%d" % i] = struct.unpack('<h', fin.read(2))[0]
    return r
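The script above reads fixed offsets and unpacks the first samples as 16-bit values, so it depends on a canonical 44-byte header. If only the bit depth is needed, a more tolerant variant is to walk the RIFF chunks until the "fmt " chunk is found and read its BitsPerSample field, without touching the sample data. The following is a minimal sketch under that assumption; read_bits_per_sample is a hypothetical name, and the function has only been reasoned through for plain PCM files.

```python
import struct


def read_bits_per_sample(path):
    """Return the BitsPerSample field from the 'fmt ' chunk of a RIFF/WAVE file."""
    with open(path, "rb") as fin:
        riff_header = fin.read(12)
        if len(riff_header) < 12:
            raise ValueError("file too short to be a wav file")
        riff_id, _riff_size, wave_id = struct.unpack("<4sI4s", riff_header)
        if riff_id != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            chunk_header = fin.read(8)
            if len(chunk_header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"fmt ":
                fmt = fin.read(16)
                if len(fmt) < 16:
                    raise ValueError("truncated 'fmt ' chunk")
                # AudioFormat, NumChannels, SampleRate, ByteRate,
                # BlockAlign, BitsPerSample (all little-endian)
                fields = struct.unpack("<HHIIHH", fmt)
                return fields[5]
            # Skip this chunk; chunks are padded to an even number of bytes
            fin.seek(chunk_size + (chunk_size & 1), 1)
```

Because nothing is unpacked from the data chunk, the same function handles 8-bit, 16-bit and 24-bit files alike.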