How do I compare two files containing several md5 checksums to determine changed files?
Solution 1
I want to know if this can be done using md5sum --check (since it normally checks for any changes in only 1 MD5 file)?
No, it can't. md5sum --check reads the path of each file from the second column of its input file and checks that file's MD5 checksum against the checksum reported in the first column; if you want to compare the checksums in the two files directly, you'll have to compare the text files.
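To make that behaviour concrete, here is a minimal sketch, assuming GNU coreutils md5sum. The scratch directory and the file contents are hypothetical and only serve the illustration; the point is that md5sum --check re-hashes the file found at the recorded path rather than comparing two checksum lists.

```shell
# Hypothetical scratch directory and file, for illustration only.
dir=$(mktemp -d)
printf 'hello\n' > "$dir/hash1.txt"

# Record the checksum, then verify the file on disk against the recorded list.
md5sum "$dir/hash1.txt" > "$dir/MD1"
before=$(LC_ALL=C md5sum --check "$dir/MD1")

# Change the file: the recomputed checksum no longer matches the recorded one.
# md5sum exits non-zero on a mismatch, hence the "|| true".
printf 'changed\n' > "$dir/hash1.txt"
after=$(LC_ALL=C md5sum --check "$dir/MD1" 2>/dev/null) || true

printf '%s\n%s\n' "$before" "$after"
```

The first line printed should end in ": OK" and the second in ": FAILED", each prefixed with the path stored in the checksum list.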
Using paste + AWK you could do:
paste file1 file2 | awk '{x = $1 == $3 ? "OK" : "FALSE"; print $2" "x}'
- paste file1 file2: joins line N of file1 with line N of file2;
- awk '{x = $1 == $3 ? "OK" : "FALSE"; print $2" "x}': if the first field is equal to the third field (i.e. the MD5 sums match), assigns "OK" to x, otherwise assigns "FALSE" to x, and prints the second field (i.e. the filename) followed by the value of x.
% cat file1
5f31caf675f2542a971582442a6625f6 /root/md5filescreator/hash1.txt
4efe4ba4ba9fd45a29a57893906dcd30 /root/md5filescreator/hash2.txt
1364cdba38ec62d7b711319ff60dea01 /root/md5filescreator/hash3.txt
% cat file2
163559001ec29c4bbbbe96344373760a /root/md5filescreators/hash1.txt
4efe4ba4ba9fd45a29a57893906dcd30 /root/md5filescreators/hash2.txt
1364cdba38ec62d7b711319ff60dea01 /root/md5filescreators/hash3.txt
% paste file1 file2 | awk '{x = $1 == $3 ? "OK" : "FALSE"; print $2" "x}'
/root/md5filescreator/hash1.txt FALSE
/root/md5filescreator/hash2.txt OK
/root/md5filescreator/hash3.txt OK
Solution 2
A simple way of checking this would be to see which lines are not duplicated across both files:
sort file1 file2 | uniq --unique
uniq --unique prints only those lines which are not repeated. Accordingly, those files whose hashes match will have duplicated lines, and won't appear in the output. To simply test if any output is produced, use grep:
sort file1 file2 | uniq --unique | grep -q .
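As a rough sketch of how that exit status could drive a script: the function name lists_differ, the scratch files and the /data paths below are hypothetical. It assumes GNU uniq (for the --unique long option) and that both lists record identical paths.

```shell
# Succeeds (exit status 0) when at least one line occurs only once
# across both checksum lists, i.e. when some checksum or path differs.
lists_differ() {
    sort "$1" "$2" | uniq --unique | grep -q .
}

# Hypothetical sample lists that record the same paths.
work=$(mktemp -d)
cat > "$work/old.md5" <<'EOF'
5f31caf675f2542a971582442a6625f6  /data/hash1.txt
4efe4ba4ba9fd45a29a57893906dcd30  /data/hash2.txt
EOF
cat > "$work/new.md5" <<'EOF'
163559001ec29c4bbbbe96344373760a  /data/hash1.txt
4efe4ba4ba9fd45a29a57893906dcd30  /data/hash2.txt
EOF

if lists_differ "$work/old.md5" "$work/new.md5"; then
    echo "checksums differ"
else
    echo "checksums identical"
fi
```

Because whole lines are compared, any difference in the recorded path also counts as a mismatch.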
In this case, since the directories are different, a bit more processing is needed:
awk -F/ '{print $1, $NF}' file1 file2 | sort | uniq --unique | awk '!a[$2]++{print $2}'
Or, entirely in awk:
awk -F/ 'FNR == NR {hash[$NF] = $1; next} hash[$NF] != $1 {print $NF}' file1 file2
In both cases, you get just the filenames whose hashes differ.
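If the OK/FALSE report asked for in the question is wanted instead, the same lookup-by-name idea can be sketched as below. This is only one possible approach, not part of either solution: the function name compare_by_name and the scratch files are hypothetical, and it assumes paths without whitespace, base names that are unique within each list, and a non-empty first list. Entries that appear only in the second list are skipped, which is a design choice rather than a requirement.

```shell
compare_by_name() {
    awk '
        # First list: remember the checksum recorded for each base name.
        FNR == NR { n = split($2, p, "/"); sum[p[n]] = $1; next }
        # Second list: report only names that also occur in the first list.
        {
            n = split($2, p, "/")
            if (p[n] in sum)
                print p[n], (sum[p[n]] == $1 ? "OK" : "FALSE")
        }
    ' "$1" "$2"
}

# Sample data taken from this thread; xyz.txt exists only in the second list.
lists=$(mktemp -d)
cat > "$lists/MD1" <<'EOF'
5f31caf675f2542a971582442a6625f6 /root/md5filescreator/hash1.txt
4efe4ba4ba9fd45a29a57893906dcd30 /root/md5filescreator/hash2.txt
1364cdba38ec62d7b711319ff60dea01 /root/md5filescreator/hash3.txt
EOF
cat > "$lists/MD2" <<'EOF'
163559001ec29c4bbbbe96344373760a /root/md5filescreators/hash1.txt
4efe4ba4ba9fd45a29a57893906dcd30 /root/md5filescreators/hash2.txt
ab6d1089231ec655831db196bef4a729 /root/md5filescreator/xyz.txt
1364cdba38ec62d7b711319ff60dea01 /root/md5filescreators/hash3.txt
EOF

compare_by_name "$lists/MD1" "$lists/MD2"
```

No sorting is involved, but the whole first list is held in memory, which may matter for very large lists.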
Comments
-
swapedoc almost 2 years
I have two files, MD1 and MD2. MD1 contains md5sums:
5f31caf675f2542a971582442a6625f6 /root/md5filescreator/hash1.txt
4efe4ba4ba9fd45a29a57893906dcd30 /root/md5filescreator/hash2.txt
1364cdba38ec62d7b711319ff60dea01 /root/md5filescreator/hash3.txt
where hash1, hash2 and hash3 are three files present in folder md5filescreator.
Similarly, MD2 contains:
163559001ec29c4bbbbe96344373760a /root/md5filescreators/hash1.txt
4efe4ba4ba9fd45a29a57893906dcd30 /root/md5filescreators/hash2.txt
1364cdba38ec62d7b711319ff60dea01 /root/md5filescreators/hash3.txt
where these files are in folder md5filescreators.
I want to compare the checksums in md5filescreator with the corresponding file's checksum in md5filescreators. The shell script should return OK for files with the same checksums and FALSE for those which differ, along with the file names.
Can this be done using md5sum --check (since it normally checks for any changes in only 1 MD5 file)?
-
swapedoc over 8 years
Nice solution; however, what if I have some extra lines in file 2?
163559001ec29c4bbbbe96344373760a /root/md5filescreators/hash1.txt
4efe4ba4ba9fd45a29a57893906dcd30 /root/md5filescreators/hash2.txt
ab6d1089231ec655831db196bef4a729 /root/md5filescreator/xyz.txt
1364cdba38ec62d7b711319ff60dea01 /root/md5filescreators/hash3.txt
In that case hash3 will be compared with xyz and not hash3. How would you approach that case? Just curious. (The real number of files is >10^9, so sorting may not be beneficial, perhaps.)
-
swapedoc over 8 years
I don't understand how this will work. NF == 4 means 4 fields, right?
-
kos over 8 years
@swapedoc Never mind, I misunderstood what you were saying. Right now only sorting occurs to me. For example
paste <(sort -k 2.1 file1) <(sort -k 2.1 file2) | awk 'NF == 4 && $1 == $3 {print $2": OK"; next} {print $2": FALSE"}'
should work... I don't think there's a way other than sorting, though. And yes, NF == 4 is an additional condition needed in order to print : OK (i.e. if there are fewer than 4 fields, the file is missing from either file1 or file2).
-
swapedoc over 8 years
Actually, I want to compare files with the same names located in different folders. The names are the same because the files in folder 1 go through a device and the output is stored in folder 2. To check the integrity I generate 2 checksum files, MD1 and MD2, for all the files in folder 1 and folder 2 respectively. Since these files have the same data, they should match and have the same name. But the output folder also contains some more files. That is the trouble. :(
-
kos over 8 years
@swapedoc Have you tried this?
paste <(sort -k 2.1 file1) <(sort -k 2.1 file2) | awk 'NF == 4 && $1 == $3 {print $2": OK"; next} {print $2": FALSE"}'
How does it perform? Another way would be grepping (or matching with AWK) each single file listed in file1 in file2; it should be less expensive than sorting both files.
-
swapedoc over 8 years
I will let you know tomorrow about the performance. Anyway, thanks for the clever answer :)
-
kos over 8 years
@swapedoc No problem. Let me know, I'm curious. :)
-
swapedoc over 8 years
Let us continue this discussion in chat.
-
swapedoc over 8 years
Well, the new script also failed.
-
kos over 8 years
@swapedoc How exactly? (You can join the chat and ping me there.)
-
swapedoc over 8 years
I have pinged you in chat.