How to compare two lossless audio files?


Solution 1

I'd try converting them both to WAV and comparing their checksums.

ffmpeg -i file1.m4a file1.wav
ffmpeg -i file2.flac file2.wav
md5sum file1.wav
md5sum file2.wav
rm file?.wav

Compare the MD5 sums produced. If they match, congratulations! Your files contain the same data. If they don't match, post the output of those commands here, and I'll take a look. A mismatch could come from a difference in sample rate, bit depth, or metadata written into the WAV header (there ought not to be one... but there may be, I don't know.)

Note that the ffmpeg commands will generate comparatively large intermediate files.
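
One way to avoid the intermediate files is to stream headerless PCM straight into md5sum. This is a minimal sketch, not part of the original answer: the function names pcm_md5 and same_pcm and the PCM_FMT variable are made up for illustration, and it assumes 16-bit sources unless PCM_FMT is set (for example to s32le for higher bit depths).

```shell
# Hash the decoded audio of one file without writing a WAV to disk.
# PCM_FMT is a hypothetical knob for this sketch; s16le matches 16-bit sources.
pcm_md5() (
    fmt="${PCM_FMT:-s16le}"
    # -vn drops any video stream (e.g. cover art); -f/-c:a select raw PCM output.
    sum=$(ffmpeg -loglevel error -nostdin -i "$1" -vn \
              -f "$fmt" -c:a "pcm_$fmt" - | md5sum | cut -d' ' -f1)
    # This is the MD5 of empty input: ffmpeg produced no samples, so fail.
    [ "$sum" != "d41d8cd98f00b204e9800998ecf8427e" ] || exit 1
    printf '%s\n' "$sum"
)

# Exit status: 0 = same samples, 1 = different, 2 = decoding failed.
same_pcm() (
    a=$(pcm_md5 "$1") || exit 2
    b=$(pcm_md5 "$2") || exit 2
    [ "$a" = "$b" ]
)
```

Usage would look like `same_pcm file1.m4a file2.flac && echo identical`. Because the output has no WAV header, metadata copied from the source cannot affect the checksum.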

Solution 2

You can use the hash muxer to generate a checksum of the decoded media. No need to convert files, and it is unaffected by metadata or other factors that can cause a standalone sum tool to report false differences.

Example to compare WAV → FLAC. Because FLAC is lossless the hashes should be the same:

$ ffmpeg -loglevel error -i input.wav output.flac

$ ffmpeg -loglevel error -i input.wav -map 0 -f hash -
  SHA256=c1acb198952f5c341190ffb62eeafe4f10c8f48c67a188e25087471a74eaa957

$ ffmpeg -loglevel error -i output.flac -map 0 -f hash -
  SHA256=c1acb198952f5c341190ffb62eeafe4f10c8f48c67a188e25087471a74eaa957

  • There are many available hash algorithms to choose from. Some are faster than others. You can select an algorithm with the -hash option, such as -hash md5.

  • -map 0 is used in the examples to include all streams in the checksum. Without it the default stream selection behavior will only choose one stream per stream type. If you want to exclude/include specific streams then do so with the -map option with stream specifiers. For example, to exclude all video use negative mapping with -map -0:v, or to only include audio use -map 0:a, or to only include the third audio stream use -map 0:a:2.

  • The streamhash muxer is similar to hash, but it will output a hash per stream, such as one for video and one for audio. Again, it also will use the default stream selection behavior unless you add -map.

  • If you want to compare each individual frame/packet then use the framehash muxer.
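
The options above can be wrapped in a small helper that hashes two files and reports whether they match. This is only a sketch under stated assumptions: audio_hash and compare_audio are illustrative names, and it maps just the audio streams with -map 0:a, since an M4A or FLAC file may carry cover art as a video stream that would otherwise change the hash. As far as I know the hash muxer converts audio to 16-bit PCM before hashing by default, so sources with a higher bit depth may need an explicit codec such as -c:a pcm_s24le.

```shell
# Print the hash of the decoded audio streams, e.g. "SHA256=...".
# The second argument selects the algorithm (hypothetical default: sha256).
audio_hash() {
    ffmpeg -loglevel error -nostdin -i "$1" -map 0:a \
        -f hash -hash "${2:-sha256}" -
}

# Exit status: 0 = hashes match, 1 = hashes differ, 2 = hashing failed.
compare_audio() (
    h1=$(audio_hash "$1" "${3:-}") || exit 2
    h2=$(audio_hash "$2" "${3:-}") || exit 2
    [ -n "$h1" ] && [ -n "$h2" ] || exit 2
    # Show each hash next to its file name, then compare them.
    printf '%s  %s\n%s  %s\n' "$h1" "$1" "$h2" "$2"
    [ "$h1" = "$h2" ]
)
```

For example, `compare_audio input.wav output.flac md5` prints both hashes and returns success only if they are equal.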


Author: Determinant

Updated on September 18, 2022

Comments

  • Determinant
    Determinant almost 2 years

    I have an M4A file which is also converted to a FLAC file. I'd like to see if the conversion is lossless, namely, whether the output to pcm from M4A is exactly identical to the one from FLAC decoding.

    I assume there's a way to use FFmpeg or Libav to produce some "raw" output and compare them?

  • Determinant
    Determinant over 11 years
    It seems that the output size of ffmpeg -y -i in.m4a -ac 2 -ar 48000 -acodec flac out.flac differs from that of ffmpeg -y -i in.m4a -acodec flac out.flac. I have no idea what's going on during the conversion, or what the subtle parameters do. Could you explain a little bit?
  • Determinant
    Determinant over 11 years
    And the former command is copied from a forum, I guess the file size has something to do with the number "48000", right?
  • thirtythreeforty
    thirtythreeforty over 11 years
    Yup. See, the -ar 48000 says to use 48000 samples per second. If that is different than the source's number of samples per second, ffmpeg interpolates (sticks additional values in between), and that makes the resulting file different. If you just let ffmpeg autodetect everything, it tries to change as little as it can.
  • Determinant
    Determinant over 11 years
    One more question, is wav file the raw file standard? I mean are there any other alternative raw formats besides wav in PC area?
  • thirtythreeforty
    thirtythreeforty over 11 years
    I suppose there are two well-known uncompressed standards: WAV and AIFF. AIFF is used a lot by Apple's systems; the rest of everybody uses WAV.
  • Determinant
    Determinant over 11 years
    Thx! Fast and clear answer.
  • slhck
    slhck over 11 years
    @ymfoi WAV is not a raw file standard per se. WAV files are just containers and therefore can contain different audio codecs. In this case it will be PCM audio (pulse-code modulated), which is lossless. But there can also be compressed codecs inside a WAV file: en.wikipedia.org/wiki/Wav#WAV_file_compression_codecs_compared
  • Determinant
    Determinant over 11 years
    @slhck so, are there any methods to extract raw PCM data from the decoder?
  • slhck
    slhck over 11 years
    @ymfoi FFmpeg will choose 16-bit PCM by default, so you already get uncompressed, "unaltered" audio (unless your source used more bit depth like 32 bit; in that case you could specify -c:a pcm_s32le, for example).
  • Determinant
    Determinant over 11 years
    @slhck I see. Can I say that the WAV file I've got from FFmpeg consists of several chunks of raw PCM data with some additional information, and that there are other similar formats like WAV, say AIFF, etc.?
  • slhck
    slhck over 11 years
    @ymfoi That is correct. WAV, like AIFF, are just containers that indeed store so-called "chunks" of audio data.
  • Determinant
    Determinant over 11 years
    @slhck Thx~ Now I have a clearer understanding of WAV, which has been a mystery to me for years.
  • thirtythreeforty
    thirtythreeforty over 11 years
    @slhck thanks. I had forgotten to make that distinction. In 99% of the cases you'll see, however, PCM is the only stream format you'll see in a WAV container.
  • user
    user over 11 years
    +1 This is good also because it completely avoids the issue of metadata in the uncompressed file, which otherwise could make identical-audio files differ.
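
The sample-rate point raised in the comments can be checked before comparing hashes: if the two files report different sample rates or channel counts, the decoded samples cannot match. This is a rough sketch using ffprobe. The names audio_params and same_params are made up for illustration, and only the first audio stream is inspected.

```shell
# Print sample rate and channel count of the first audio stream,
# one key=value pair per line (e.g. "sample_rate=44100").
audio_params() {
    ffprobe -v error -select_streams a:0 \
        -show_entries stream=sample_rate,channels \
        -of default=noprint_wrappers=1 "$1"
}

# Exit status: 0 = same parameters, 1 = different, 2 = probing failed.
same_params() (
    p1=$(audio_params "$1") || exit 2
    p2=$(audio_params "$2") || exit 2
    [ -n "$p1" ] && [ "$p1" = "$p2" ]
)
```

Running `same_params in.m4a out.flac || echo "parameters differ"` would flag a conversion that was made with a mismatched -ar or -ac value.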