Bash function to compare two binary files

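#!/bin/bash
# Compare the two files given as arguments; diff exits 0 when they are identical.
# The arguments are quoted so that file names containing spaces work.
diff -u "$1" "$2" > /dev/null
if [[ $? -eq 0 ]]; then
    echo "They are equal!"
else
    echo "They aren't equal!"
fi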

Of course, I didn't check whether the arguments are empty or not. You can do that yourself.
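The missing argument check could look something like the sketch below. The function name `compare_files` and the exit-status convention (0 for equal, 1 for different, 2 for bad arguments) are assumptions made for illustration; they are not part of the original script.

```shell
#!/bin/bash

# Hypothetical helper: compare_files FILE1 FILE2
# Returns 0 if the files are equal, 1 if they differ, 2 on bad arguments.
compare_files() {
    # Exactly two arguments are required.
    if [ "$#" -ne 2 ]; then
        echo "usage: compare_files file1 file2" >&2
        return 2
    fi

    # Both arguments must name readable regular files.
    local f
    for f in "$1" "$2"; do
        if [ ! -f "$f" ] || [ ! -r "$f" ]; then
            echo "compare_files: '$f' is not a readable regular file" >&2
            return 2
        fi
    done

    # diff -q only reports whether the files differ; its one-line
    # message is discarded because we print our own.
    if diff -q -- "$1" "$2" > /dev/null; then
        echo "They are equal!"
    else
        echo "They aren't equal!"
        return 1
    fi
}
```

Calling `compare_files a.bin b.bin` then prints one of the two messages, and the caller can also branch on the return status.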

Enjoy that....

JohnyMoraes
Author by

JohnyMoraes

Updated on September 18, 2022

Comments

  • JohnyMoraes
    JohnyMoraes almost 2 years

    I need a function to compare 2 binary files. Here are the requirements:

    • 2 files, not 3 or 4
    • files can't be assumed to exist
    • avoid running checksum (CRC/MD5/SHA/...) until one must
    • if running multiple checksums, do so from least expensive to most expensive (order above)
    • print out meaningful error messages
    • usage: binary_compare_two_files file1 file2

    Here's what I have so far. I think it can be done much better than this. How?

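    #!/bin/bash
    
    # Returns 0 if the files match, 1 if they differ, 2 on usage or file errors.
    function binary_compare_two_files() {
    
      local REQUIRED_ARGUMENTS=2
    
      local n_arguments="$#"
    
      if [ "${n_arguments}" -ne "${REQUIRED_ARGUMENTS}" ]; then
        printf 'Invalid number of arguments. Required: %d, supplied: %d\n' \
          "${REQUIRED_ARGUMENTS}" "${n_arguments}" >&2
        echo 'usage: binary_compare_two_files file1 file2' >&2
        return 2
      fi
    
      local file1="${1}"
      local file2="${2}"
    
      if [ ! -f "${file1}" ] || [ ! -f "${file2}" ]; then
        echo 'Invalid arguments. Both arguments need to refer to existing files.' >&2
        return 2
      fi
    
      # Cheap check first: files of different sizes cannot be identical.
      # (stat -f "%z" is the BSD/macOS form; GNU stat uses -c "%s".)
      local file1_size file2_size
      file1_size=$(stat -f "%z" "${file1}")
      file2_size=$(stat -f "%z" "${file2}")
    
      # Return a fixed status: exit statuses wrap modulo 256, so returning the
      # size difference could report 0 ("equal") for files of different sizes.
      if [ "${file1_size}" -ne "${file2_size}" ]; then
        return 1
      fi
    
      # md5 -q is the BSD/macOS tool; GNU systems provide md5sum instead.
      local file1_md5 file2_md5
      file1_md5=$(md5 -q "${file1}")
      file2_md5=$(md5 -q "${file2}")
    
      if [ "${file1_md5}" != "${file2_md5}" ]; then
        return 1
      fi
    
      return 0
    }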
    

    I have opted not to use diff/bdiff because I am not sure whether those stat and check for sizes first... I would need to look at the src.

    • enzotib
      enzotib almost 12 years
      Is this homework? Otherwise, why not use standard tools, like cmp or diff?
    • JohnyMoraes
      JohnyMoraes almost 12 years
      Not homework, just trying to learn some Bash scripting as I work night shifts at a toll booth. Why not diff? Its description reads "Compare files line by line." and that sounds inefficient to me. Running stat first seems instantaneous rather than "line by line".
    • JohnyMoraes
      JohnyMoraes almost 12 years
      I don't have money for school, but there's plenty of stuff online to learn from... plus SO! :) Let me download the source of diff and see what it does...
    • Mat
      Mat almost 12 years
      @Robottinosino: why are you not using cmp?
    • terdon
      terdon almost 12 years
      diff works for binary files: diff a b gives Binary files a and b differ. cmp may well be better. You definitely don't need a script for this.
    • daisy
      daisy almost 12 years
      What do you want? What's wrong with diff? Implementing it in bash is nearly impossible if you don't use other binary programs, but C and scripting languages like Perl make these problems easier to solve. Have you googled it?
    • cheshirecatalyst
      cheshirecatalyst almost 12 years
      Comparing files based on checksum is not very reliable. An infinite number of input files will have the same checksum. You need to compare all of the bytes.
    • Stabledog
      Stabledog almost 12 years
      Also, it takes a lot more processing to calculate the md5 sum of a binary stream than to simply compare two streams. So this solution is not only risky, but inefficient.
    • JohnyMoraes
      JohnyMoraes almost 12 years
      Interesting. Well, I learned something today! Thanks! BTW, @Stabledog, md5 is because I may want to look for duplicates in a dir tree so I need to keep track of the hashes as I go along to detect pairs to go back to and bit-by-bit scan... Does it make sense?
    • Stabledog
      Stabledog almost 12 years
      That's fine as a way to correlate duplicates, and in fact the odds of an MD5 value collision (two different files in the same tree producing the same MD5) are quite low. We're just addressing the question posed here... 'diff' is simple, efficient, and gets the job done just fine.
    • Gilles 'SO- stop being evil'
      Gilles 'SO- stop being evil' almost 12 years
      To compare binary files, use cmp (not diff, which may not cope with binary files efficiently or at all). If you want to learn how to do it, look at the source. If you want to look for duplicates, computing checksums ahead of time might be a good strategy; there's already fdupes for that.
    • Gilles 'SO- stop being evil'
      Gilles 'SO- stop being evil' almost 12 years
      @JimParis While it is mathematically true that an infinite number of files have the same MD5 checksum, it does not happen accidentally: MD5 collisions have to be specially crafted. If maliciously crafted collisions are a concern, it would be better to use an algorithm with no known collisions (such as SHA-256; practical collisions have since been demonstrated for SHA-1 as well).
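The duplicate-hunting strategy discussed in this thread (group files by checksum, then confirm byte for byte) could be sketched roughly as follows. The function name `find_duplicates` is hypothetical, and the sketch assumes bash 4 or later (for associative arrays) plus GNU `md5sum`, `find -print0`, and `sort -z`; on BSD/macOS, `md5 -q` would take the place of `md5sum`.

```shell
#!/bin/bash

# Hypothetical helper: find_duplicates DIR
# Prints "first == later" for each file whose bytes match an earlier file.
# Assumes bash 4+ and GNU md5sum / sort -z.
find_duplicates() {
    local dir=$1 file sum
    declare -A seen   # checksum -> first file seen with that checksum

    while IFS= read -r -d '' file; do
        sum=$(md5sum < "$file") || continue
        sum=${sum%% *}            # keep only the hex digest

        if [[ -z ${seen[$sum]+x} ]]; then
            # First file with this checksum: remember it.
            seen[$sum]=$file
        elif cmp -s -- "${seen[$sum]}" "$file"; then
            # Same checksum; cmp confirms the bytes really match.
            printf '%s == %s\n' "${seen[$sum]}" "$file"
        fi
    done < <(find "$dir" -type f -print0 | sort -z)
}
```

The checksum only narrows down the candidates here; `cmp -s` makes the final decision, so a checksum collision cannot produce a false match. For real use, the existing `fdupes` tool mentioned above already covers this ground.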
  • enzotib
    enzotib almost 12 years
    1) diff has a -q option, which reports only whether the files differ instead of generating the full diff. 2) You could use if diff ... directly instead of testing $?; it is more terse.
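The terser form suggested here, combined with the advice above to prefer cmp for binary files, might look like the following sketch. The function name `describe_comparison` is an assumption for illustration. `cmp -s` prints nothing for differing files and reports the result through its exit status: 0 for identical, 1 for different, greater than 1 for trouble such as a missing file.

```shell
#!/bin/bash

# Hypothetical helper: describe_comparison FILE1 FILE2
# Tests cmp directly in the if condition instead of inspecting $? afterwards.
describe_comparison() {
    local status
    if cmp -s -- "$1" "$2"; then
        echo "They are equal!"
        return 0
    else
        # In the else branch, $? still holds cmp's exit status.
        status=$?
    fi

    if [ "$status" -eq 1 ]; then
        echo "They aren't equal!"
    else
        echo "Could not compare '$1' and '$2'" >&2
    fi
    return "$status"
}
```

Separating status 1 from higher values keeps a missing or unreadable file from being reported as a mere difference.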