How do I compare two files containing several md5 checksums to determine changed files?


Solution 1

I want to know if this can be done using md5sum --check? (since it normally checks for any changes in only 1 MD5 file).

No, it can't.

md5sum --check reads the path in the second column of each line of its input file, recomputes that file's MD5 checksum, and checks it against the checksum in the first column; if you want to compare the checksums in the two files directly, you'll have to compare the text files themselves.
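To make that behaviour concrete, here is a minimal sketch, assuming GNU md5sum. The file names a.txt, b.txt and sums.md5 are made up for illustration and are not from the question.

```shell
# Work in a throwaway directory; all file names here are illustrative.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

printf 'first\n'  > a.txt
printf 'second\n' > b.txt

# Record one "checksum  path" line per file.
md5sum a.txt b.txt > sums.md5

# Modify one file, then let md5sum re-hash every path listed in sums.md5.
printf 'changed\n' > b.txt
check_status=0
check_output=$(LC_ALL=C md5sum --check sums.md5 2>/dev/null) || check_status=$?

# One verdict line per listed path; the exit status is non-zero on any mismatch.
printf '%s\n' "$check_output"
```

Note that md5sum only ever looks at one checksum list here and hashes the files on disk itself; it never compares two lists with each other.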

Using paste + AWK you could do:

paste file1 file2 | awk '{x = $1 == $3 ? "OK" : "FALSE"; print $2" "x}'
  • paste file1 file2: joins line N of file1 on line N of file2;
  • awk '{x = $1 == $3 ? "OK" : "FALSE"; print $2" "x}': if the first field is equal to the third field (i.e. the MD5 sums match), assigns "OK" to x, otherwise assigns "FALSE" to x and prints the second field (i.e. the filename) followed by the value of x.
% cat file1
5f31caf675f2542a971582442a6625f6 /root/md5filescreator/hash1.txt
4efe4ba4ba9fd45a29a57893906dcd30 /root/md5filescreator/hash2.txt
1364cdba38ec62d7b711319ff60dea01 /root/md5filescreator/hash3.txt
% cat file2
163559001ec29c4bbbbe96344373760a /root/md5filescreators/hash1.txt
4efe4ba4ba9fd45a29a57893906dcd30 /root/md5filescreators/hash2.txt
1364cdba38ec62d7b711319ff60dea01 /root/md5filescreators/hash3.txt
% paste file1 file2 | awk '{x = $1 == $3 ? "OK" : "FALSE"; print $2" "x}'
/root/md5filescreator/hash1.txt FALSE
/root/md5filescreator/hash2.txt OK
/root/md5filescreator/hash3.txt OK

Solution 2

A simple way of checking this would be to see which lines are not duplicated across both files:

sort file1 file2 | uniq --unique

uniq --unique prints those lines which haven't appeared again. Accordingly, those files whose hashes match will have duplicated lines, and won't appear in the output. To simply test if any output is produced, use grep:

sort file1 file2 | uniq --unique | grep -q .
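In a script, the exit status of that pipeline can drive an if statement. The following is only a sketch: the two lists and their /data paths are hypothetical, and uniq -u is used as the short form of uniq --unique.

```shell
tmpdir=$(mktemp -d)

# Hypothetical lists with identical paths; only the checksum of b.txt differs.
printf '%s\n' \
  '5f31caf675f2542a971582442a6625f6  /data/a.txt' \
  '4efe4ba4ba9fd45a29a57893906dcd30  /data/b.txt' > "$tmpdir/old.md5"
printf '%s\n' \
  '5f31caf675f2542a971582442a6625f6  /data/a.txt' \
  '1364cdba38ec62d7b711319ff60dea01  /data/b.txt' > "$tmpdir/new.md5"

# grep -q . succeeds as soon as it reads a non-empty line,
# i.e. when at least one line has no identical partner.
if sort "$tmpdir/old.md5" "$tmpdir/new.md5" | uniq -u | grep -q .; then
    verdict="lists differ"
else
    verdict="lists match"
fi
echo "$verdict"
```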

In this case, since the directories are different, a bit more processing is needed:

awk -F/ '{print $1, $NF}' file1 file2 | sort | uniq --unique | awk '!a[$2]++{print $2}'

Or, entirely in awk:

awk -F/ 'FNR == NR {hash[$NF] = $1; next} hash[$NF] != $1 {print $NF}' file1 file2

In both cases, you get just the filenames whose hashes differ.
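If the OK/FALSE report from the question is wanted instead, the same lookup idea can be adapted. This is a sketch rather than a tested solution for huge inputs: it assumes paths contain no spaces, it keeps one array entry per line of file1 in memory, and it silently skips names that appear only in file2 (such as the xyz.txt line mentioned in the comments).

```shell
tmpdir=$(mktemp -d)

# Sample data from the question, plus the extra xyz.txt line from the comments.
cat > "$tmpdir/file1" <<'EOF'
5f31caf675f2542a971582442a6625f6  /root/md5filescreator/hash1.txt
4efe4ba4ba9fd45a29a57893906dcd30  /root/md5filescreator/hash2.txt
1364cdba38ec62d7b711319ff60dea01  /root/md5filescreator/hash3.txt
EOF
cat > "$tmpdir/file2" <<'EOF'
163559001ec29c4bbbbe96344373760a  /root/md5filescreators/hash1.txt
4efe4ba4ba9fd45a29a57893906dcd30  /root/md5filescreators/hash2.txt
ab6d1089231ec655831db196bef4a729  /root/md5filescreator/xyz.txt
1364cdba38ec62d7b711319ff60dea01  /root/md5filescreators/hash3.txt
EOF

report=$(awk '
    # Key every line by the base name of its path (text after the last "/").
    { name = $2; sub(/.*\//, "", name) }
    # First file: remember the checksum recorded for each base name.
    FNR == NR { sum[name] = $1; next }
    # Second file: report only names that also occur in the first file.
    name in sum { print name, (sum[name] == $1 ? "OK" : "FALSE") }
' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$report"
```

Because each line is matched by file name rather than by line position, the order of the lines in the two lists does not matter and no sorting is needed.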

Author by

swapedoc

NIT-B

Updated on September 18, 2022

Comments

  • swapedoc
    swapedoc almost 2 years

    I have two files MD1 and MD2.

    MD1 contains md5sums:

    5f31caf675f2542a971582442a6625f6  /root/md5filescreator/hash1.txt
    4efe4ba4ba9fd45a29a57893906dcd30  /root/md5filescreator/hash2.txt
    1364cdba38ec62d7b711319ff60dea01  /root/md5filescreator/hash3.txt
    

    where hash1, hash2 and hash3 are three files present in folder md5filescreator.

    Similarly MD2 contains:

    163559001ec29c4bbbbe96344373760a  /root/md5filescreators/hash1.txt
    4efe4ba4ba9fd45a29a57893906dcd30  /root/md5filescreators/hash2.txt
    1364cdba38ec62d7b711319ff60dea01  /root/md5filescreators/hash3.txt
    

    where these files are in folder md5filescreators.

    I want to compare the checksums in md5filescreator with the corresponding file's checksum in md5filescreators.

    The shell script should return OK for files with same checksums and FALSE for those which are not, along with the file names.

    Can this be done using md5sum --check (since it normally checks for any changes in only 1 MD5 file)?

  • swapedoc
    swapedoc over 8 years
    Nice solution. However, what if I have some extra lines in file 2?

    163559001ec29c4bbbbe96344373760a  /root/md5filescreators/hash1.txt
    4efe4ba4ba9fd45a29a57893906dcd30  /root/md5filescreators/hash2.txt
    ab6d1089231ec655831db196bef4a729  /root/md5filescreator/xyz.txt
    1364cdba38ec62d7b711319ff60dea01  /root/md5filescreators/hash3.txt

    In that case hash3 will be compared with xyz and not with hash3. How should that case be approached? Just curious. (The real number of files is >10^9, so sorting may not be beneficial.)
  • swapedoc
    swapedoc over 8 years
    I don't understand how this will work. NF=4 means 4 fields, right?
  • kos
    kos over 8 years
    @swapedoc Never mind, I misunderstood what you were saying. Right now only sorting comes to mind. For example paste <(sort -k 2.1 file1) <(sort -k 2.1 file2) | awk 'NF == 4 && $1 == $3 {print $2": OK"; next} {print $2": FALSE"}' should work... I don't think there's a way other than sorting though. And yes, NF == 4 is an additional condition needed in order to print : OK (i.e. if there are fewer than 4 fields, the file is missing from either file1 or file2).
  • swapedoc
    swapedoc over 8 years
    Actually, I want to compare files with the same names located in different folders. The names are the same because the files in folder 1 go through a device and the output is stored in folder 2. To check the integrity, I generate 2 checksum files, MD1 and MD2, for all the files in folder 1 and folder 2 respectively. Since these files have the same data, they should match and have the same name. But the output folder also contains some more files. That is the trouble. :(
  • kos
    kos over 8 years
    @swapedoc Have you tried paste <(sort -k 2.1 file1) <(sort -k 2.1 file2) | awk 'NF == 4 && $1 == $3 {print $2": OK"; next} {print $2": FALSE"}'? How does it perform? Another way would be grepping (or matching with AWK) each single file listed in file1 in file2, it should be less expensive than sorting both files.
  • swapedoc
    swapedoc over 8 years
    I will let you know tomorrow about the performance. Anyway, thanks for the clever answer :)
  • kos
    kos over 8 years
    @swapedoc No problem. Let me know, I'm curious. :)
  • swapedoc
    swapedoc over 8 years
    Well the new script also failed
  • kos
    kos over 8 years
    @swapedoc How exactly? (You can join the chat and ping me there).
  • swapedoc
    swapedoc over 8 years
    I have pinged you in chat