Bash function to compare two binary files
#!/bin/bash
diff -u "$1" "$2" > /dev/null
if [[ $? -eq 0 ]]; then
    echo "They are equal!"
else
    echo "They aren't equal!"
fi
Of course, I didn't check whether the arguments are empty or not. You can do that yourself.
Enjoy that....
Author: JohnyMoraes
Updated on September 18, 2022
Comments
-
JohnyMoraes almost 2 years
I need a function to compare 2 binary files. Here are the requirements:
- 2 files, not 3 or 4
- files can't be assumed to exist
- avoid running checksum (CRC/MD5/SHA/...) until one must
- if running multiple checksums, do so from least expensive to most expensive (order above)
- print out meaningful error messages
- usage: binary_compare_two_files file1 file2
Here's what I have so far. I think it can be done much better than this. How?
#!/bin/bash
function binary_compare_two_files() {
    REQUIRED_ARGUMENTS=2
    n_arguments="$#"
    if [ ! "${n_arguments}" -eq $REQUIRED_ARGUMENTS ]; then
        printf 'Invalid number of arguments. Required: %d, supplied: %d\n' \
            $REQUIRED_ARGUMENTS $n_arguments
        echo 'usage: binary_compare_two_files file1 file2'
        return
    fi
    file1="${1}"
    file2="${2}"
    if [ ! -f "${file1}" -o ! -f "${file2}" ]; then
        echo 'Invalid arguments. Both arguments need to refer to existing files.'
        return
    fi
    file1_size=$(stat -f "%z" "${file1}")
    file2_size=$(stat -f "%z" "${file2}")
    if [ ! ${file1_size} -eq ${file2_size} ]; then
        return $((file1_size - file2_size))
    fi
    file1_md5=$(md5 -q "${file1}")
    file2_md5=$(md5 -q "${file2}")
    if [ ! "${file1_md5}" == "${file2_md5}" ]; then
        return -1
    fi
    return 0
}
I have opted not to use diff/bdiff because I am not sure whether those stat and check for sizes first... I would need to look at the src.
-
enzotib almost 12 years
Is it homework? Otherwise, why not use standard tools, like cmp or diff?
-
JohnyMoraes almost 12 years
Not homework, just trying to learn some Bash scripting as I work night shifts at a toll booth. Why not diff? The description reads "Compare files line by line." and that sounds inefficient to me. Checking with stat first seems instantaneous rather than "line by line".
-
JohnyMoraes almost 12 years
I don't have money from school but there's plenty of stuff online to learn... plus SO! :) Lemme download the source for diff and see what that does...
-
Mat almost 12 years
@Robottinosino: why are you not using cmp?
-
terdon almost 12 years
diff works for binary files: diff a b gives "Binary files a and b differ". cmp may well be better. You definitely don't need a script for this.
-
daisy almost 12 years
What do you want? What's wrong with diff? Implementing this in pure Bash is nearly impossible unless you call other binary programs, but C and scripting languages like Perl make these problems easier to solve. Have you googled it?
-
cheshirecatalyst almost 12 years
Comparing files based on checksum is not very reliable. An infinite number of input files will have the same checksum. You need to compare all of the bytes.
-
Stabledog almost 12 years
Also, it takes a lot more processing to calculate the md5 sum of a binary stream than to simply compare two streams. So this solution is not only risky, but inefficient.
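The comments so far favor a direct byte comparison over hashing. What follows is a minimal sketch of the asker's binary_compare_two_files along those lines, not a definitive implementation. It assumes a cmp that supports -s (POSIX specifies that option). The exit codes 0/1/2 and the wording of the messages are choices made for this sketch, not something stated in the thread. Some cmp implementations (GNU cmp among them, as far as I know) already compare file sizes before reading data when run with -s on regular files, which would cover the asker's stat concern, but that is an assumption worth checking against your own system.

```shell
#!/bin/bash
# Sketch: compare two files byte by byte with cmp instead of hashing them.
# Exit status (chosen for this sketch): 0 = identical, 1 = different,
# 2 = usage error or a missing file.
binary_compare_two_files() {
    if [ "$#" -ne 2 ]; then
        printf 'Invalid number of arguments. Required: 2, supplied: %d\n' "$#" >&2
        echo 'usage: binary_compare_two_files file1 file2' >&2
        return 2
    fi
    local f
    for f in "$1" "$2"; do
        if [ ! -f "$f" ]; then
            printf 'Not an existing regular file: %s\n' "$f" >&2
            return 2
        fi
    done
    # cmp -s prints nothing; it exits 0 when the files are identical,
    # 1 when they differ, and greater than 1 on errors.
    cmp -s -- "$1" "$2"
}
```

Usage would look like: binary_compare_two_files a.bin b.bin && echo same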
-
JohnyMoraes almost 12 years
Interesting. Well, I learned something today! Thanks! BTW, @Stabledog, md5 is because I may want to look for duplicates in a dir tree, so I need to keep track of the hashes as I go along to detect pairs to go back to and bit-by-bit scan... Does it make sense?
-
Stabledog almost 12 years
That's fine as a way to correlate duplicates, and in fact the odds of an MD5 value collision (two different files in the same tree producing the same MD5) are quite low. We're just addressing the question posed here... 'diff' is simple, efficient, and gets the job done just fine.
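The "hash to find candidates, then scan byte by byte" idea discussed here can be sketched roughly as follows. This is an illustration under stated assumptions, not a replacement for a dedicated tool. The function name find_duplicate_pairs is hypothetical. It uses the POSIX cksum utility (CRC plus size) rather than md5 as the cheap first pass, and it assumes file names contain no newlines and no leading or trailing whitespace. Only neighbouring candidates in the sorted list are compared, so a rare CRC collision between different files could hide a genuine pair.

```shell
#!/bin/bash
# Sketch: list pairs of identical files under a directory.
# cksum prints "<crc> <size> <path>"; equal crc and size marks two files
# as candidates, and cmp -s confirms each candidate pair byte by byte.
find_duplicate_pairs() {
    local dir=$1 prev_key='' prev_path='' crc size path key
    # LC_ALL=C gives a plain byte-order sort, so equal "crc size" prefixes
    # always end up on adjacent lines.
    find "$dir" -type f -exec cksum {} + | LC_ALL=C sort |
    while read -r crc size path; do
        key="$crc $size"
        if [ "$key" = "$prev_key" ] && cmp -s -- "$prev_path" "$path"; then
            printf '%s\t%s\n' "$prev_path" "$path"
        fi
        prev_key=$key
        prev_path=$path
    done
}
```

Each output line holds two paths separated by a tab. The checksum only narrows the search; the final verdict still comes from comparing all of the bytes.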
-
Gilles 'SO- stop being evil' almost 12 years
To compare binary files, use cmp (not diff, which may not cope with binary files efficiently or at all). If you want to learn how to do it, look at the source. If you want to look for duplicates, computing checksums ahead of time might be a good strategy; there's already fdupes for that.
-
Gilles 'SO- stop being evil' almost 12 years
@JimParis While it is mathematically true that an infinite number of files have the same MD5 checksum, it does not happen accidentally: MD5 collisions have to be specially crafted. It would be better to use an algorithm with no known practical collisions (such as SHA-256), in case maliciously crafted collisions are an issue.
-
enzotib almost 12 years
1) diff has a -q option, so you don't need to redirect the output. 2) You could use if diff instead of using $?; it is more terse.