How to automate comparison of md5sum hash values for a large number of files


Solution 1

For example, I have a file called test_binary.

The MD5 sum of test_binary is ef7ab26f9a3b2cbd35aa3e7e69aad86c

To check it automatically, run this:

$ md5sum -c <<<"ef7ab26f9a3b2cbd35aa3e7e69aad86c *path/to/file/test_binary"
path/to/file/test_binary: OK

or

$ echo "595f44fec1e92a71d3e9e77456ba80d1  filetohashA.txt" | md5sum -c -

Quote from man

   -c, --check
          read MD5 sums from the FILEs and check them

Quote from wiki

Note: There must be two spaces between each md5sum value and filename to be compared. Otherwise, the following error will result: "no properly formatted MD5 checksum lines found".

Link to wiki

You can also read the MD5 hashes from a file:

$ md5sum -c md5sum_formatted_file.txt

It expects a file in this format:

<md5sum_checksum><space><space><file_name>
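If the hashes arrive in some other layout, they can be rewritten into this format before checking. The sketch below assumes a tab-separated list (hash, tab, file name); the file names a.txt, b.txt and loose_list.txt are hypothetical and created only for the demo.

```shell
# Throwaway directory with hypothetical demo files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'alpha\n' > a.txt
printf 'beta\n' > b.txt

# Simulate a loosely formatted list: "<hash><TAB><name>" per line.
for f in a.txt b.txt; do
    printf '%s\t%s\n' "$(md5sum "$f" | cut -d' ' -f1)" "$f"
done > loose_list.txt

# Rewrite each line as: checksum, two spaces, file name.
awk -F'\t' '{ printf "%s  %s\n", $1, $2 }' loose_list.txt > md5sum_formatted_file.txt

# Verify; the exit status is 0 only if every listed file matches.
if md5sum -c md5sum_formatted_file.txt; then
    format_check="ok"
else
    format_check="failed"
fi
```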

About the * and <space> after the MD5 hash: there is a short note in the man page:

 When  checking,  the
       input  should  be a former output of this program.  The default mode is
       to print a line with checksum, a character indicating input  mode  ('*'
       for binary, space for text), and name for each FILE.
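To see the two markers side by side, you can hash the same file in both modes. This is a small sketch assuming GNU coreutils md5sum on Linux, where text mode is the default; demo.dat is a hypothetical file created for the demo.

```shell
# Throwaway directory with a hypothetical demo file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'data\n' > demo.dat

text_line=$(md5sum demo.dat)        # default (text) mode
binary_line=$(md5sum -b demo.dat)   # -b / --binary mode

printf '%s\n%s\n' "$text_line" "$binary_line"

# Columns 1-32 hold the hash, column 33 a space, column 34 the mode character.
text_marker=$(printf '%s\n' "$text_line" | cut -c34)
binary_marker=$(printf '%s\n' "$binary_line" | cut -c34)

# Both forms are accepted when fed back to md5sum -c.
if printf '%s\n%s\n' "$text_line" "$binary_line" | md5sum -c --status -; then
    both_verify="yes"
else
    both_verify="no"
fi
```

On Linux the hash itself is the same in both modes; only the marker character differs.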

And here is a link to the Stack Overflow answer that explains why we sometimes need to distinguish binary files from text files.


Solution 2

One possibility is to use the utility cfv

sudo apt-get install cfv

CFV supports many types of hashes, and both testing and hash file creation.

# List the files
$ ls
test.c
# Create a hash file
$ cfv -tmd5 -C
temp.md5: 1 files, 1 OK.  0.001 seconds, 302.7K/s
# Test the hash file
$ cfv -tmd5 -T
temp.md5: 1 files, 1 OK.  0.001 seconds, 345.1K/s
# Display the hash file
$ cat *.md5
636564b0b10b153219d6e0dfa917d1e3 *test.c

Solution 3

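Yes, for this command the checksum must be followed by a space and then either an asterisk * or a second space before the file name. Take a look at this example, which uses the asterisk.

This is the binary file, and let's say the correct md5sum value is exampleofcorrectmd5value00000000 (a placeholder standing in for 32 hexadecimal characters)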

[root@Linux update]# ls -lh
total 137M
-rw-r--r-- 1 root root 137M Nov  5 13:01 binary-file.run.tgz
[root@Linux update]# 

   -c, --check
          read MD5 sums from the FILEs and check them

If the md5sum value matches the binary file, you'll get this output

[root@Linux ~]# md5sum -c <<< "exampleofcorrectmd5value00000000 *binary-file.run.tgz"
binary-file.run.tgz: OK
[root@Linux ~]# 

And this is when the md5sum value doesn't match

[root@Linux update]# md5sum -c <<< "exampleofwrongmd5value0000000000 *binary-file.run.tgz"
binary-file.run.tgz: FAILED
md5sum: WARNING: 1 of 1 computed checksum did NOT match
[root@Linux update]# 

With only a single space and no asterisk *, you'll get the following error message even though the MD5 value is correct

[root@Linux ~]# md5sum -c <<< "exampleofcorrectmd5value00000000 binary-file.run.tgz" 
md5sum: standard input: no properly formatted MD5 checksum lines found
[root@Linux ~]# 

You'll also get the same error message if the md5sum value is not exactly 32 hexadecimal characters long. In this example, it has only 31 characters.

[root@Linux ~]# md5sum -c <<< "exampleofmd5valuelessthan32char *binary-file.run.tgz" 
md5sum: standard input: no properly formatted MD5 checksum lines found
[root@Linux ~]# 
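Because a malformed hash produces the same "no properly formatted" error as a wrong separator, it can help to check the hash string first. Here is a minimal sketch in plain POSIX shell; the function name is_md5_hex is made up for illustration.

```shell
# Succeeds only if the argument is exactly 32 hexadecimal characters.
is_md5_hex() {
    case $1 in
        *[!0-9a-fA-F]*) return 1 ;;   # contains a non-hexadecimal character
    esac
    [ "${#1}" -eq 32 ]                # must be exactly 32 characters long
}

# The first candidate is the MD5 of empty input; the second is the 31-character example.
for candidate in d41d8cd98f00b204e9800998ecf8427e exampleofmd5valuelessthan32char; do
    if is_md5_hex "$candidate"; then
        echo "$candidate: looks like an MD5 sum"
    else
        echo "$candidate: not 32 hexadecimal characters"
    fi
done
```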

Solution for many files

If you have many files and want to automate the process, you can follow these steps:

user@Ubuntu:~$ ls -lh
total 12K
-rw-rw-r-- 1 user user 4 Nov  5 14:54 file-a
-rw-rw-r-- 1 user user 4 Nov  5 14:54 file-b
-rw-rw-r-- 1 user user 4 Nov  5 14:54 file-c
user@Ubuntu:~$ 

Generate the md5sum for each file and save the results to md5sum.txt

user@Ubuntu:~$ md5sum * | tee md5sum.txt
0bee89b07a24ae27c83fc3d5951213c1  file-a
1b2297c171a9a450d184871ccf6c9ad4  file-b
7f4d13d9b0b6ac086fd68637067435c5  file-c
user@Ubuntu:~$ 

To check md5sum for all files, use the following command.

user@Ubuntu:~$ md5sum -c md5sum.txt 
file-a: OK
file-b: OK
file-c: OK
user@Ubuntu:~$ 

Here is an example of what happens when the md5sum value doesn't match the file. In this case, I'm going to modify the content of file-b

user@Ubuntu:~$ echo "new data" > file-b 
user@Ubuntu:~$ 

This is the resulting error message. Hope this helps.

user@Ubuntu:~$ md5sum -c md5sum.txt 
file-a: OK
file-b: FAILED
file-c: OK
md5sum: WARNING: 1 computed checksum did NOT match
user@Ubuntu:~$ 
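
For a larger tree, the same idea can be scripted so that only failures are reported. This is a sketch assuming GNU coreutils md5sum and find; the directory data, the file names and manifest.md5 are hypothetical. The manifest is written outside the hashed directory so that it does not list itself.

```shell
# Throwaway directory tree with hypothetical demo files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p data/sub
printf 'aaa\n' > data/file-a
printf 'bbb\n' > data/sub/file-b

# Hash every regular file under data/, including subdirectories.
find data -type f -exec md5sum {} + > manifest.md5

# --quiet prints nothing for files that verify; the exit status is 0 if all match.
if md5sum -c --quiet manifest.md5; then
    first_check="ok"
else
    first_check="failed"
fi

# Change one file, then collect the names of the files that no longer match.
printf 'new data\n' > data/sub/file-b
failed_files=$(LC_ALL=C md5sum -c --quiet manifest.md5 2>/dev/null \
    | sed -n 's/: FAILED$//p' || true)
echo "changed files: $failed_files"
```

LC_ALL=C keeps the FAILED marker untranslated so that sed can match it.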

Author: sourav c.

I have been with Ubuntu since Lucid (Ubuntu 9.04), and am a part-time system administrator and web developer. I like Python, C, and shell scripts. Alumnus of the Department of Physics, IIT Guwahati.

Updated on September 18, 2022

Comments

  • sourav c.
    sourav c. almost 2 years

    I can check md5sum hash of a file from a terminal as,

    $ md5sum my_sensitive_file
    8dad53cfc973c59864b8318263737462  my_sensitive_file
    

    But the difficult part is comparing the hash value with the exact one.

    It is hard for a human to compare the 32-character output with the original hash value for a large number of files. The job is very monotonous and leaves a lot of room for errors.

    Is it possible to automate the comparing process, preferably in CLI?

  • jobin
    jobin about 10 years
    Is the asterisk necessary?
  • c0rp
    c0rp about 10 years
    Interesting question. I always use it with *, but the wiki says it should be two spaces. I will look into it...
  • c0rp
    c0rp about 10 years
    @souravc OK, I found information about the *, will update soon
  • sourav c.
    sourav c. about 10 years
    Thanks for your answer, but it requires installing another utility. On the other hand, it supports other formats as well, which is good to know. In the present context I will go with the other answer. Anyway, +1 from me.
  • c0rp
    c0rp about 10 years
    @Jobin I added information about the * to the answer
  • jobin
    jobin about 10 years
    That makes sense. +1'd
  • O. R. Mapper
    O. R. Mapper about 7 years
    At least on the command line, one space seems to work just as well.
  • Csabi Vidó
    Csabi Vidó almost 4 years
    cfv has been removed from the repositories in 20.04. An alternative is rhash. The program may eventually be ported to Python 3 github.com/cfv-project/cfv/issues/8