Can you use OpenSSL to generate an md5 or sha hash on a directory of files?
Solution 1
You could recursively generate all the hashes, concatenate the hashes into a single file, then generate a hash of that file.
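A minimal sketch of that approach (hashes.md5 is an arbitrary name; the sort matters so that both copies hash their files in the same order, and the list file is written outside the tree so it doesn't end up hashing itself):

$ find . -type f | sort | tr '\n' '\0' | xargs -0 openssl dgst -md5 -r > ../hashes.md5
$ openssl dgst -md5 ../hashes.md5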
Solution 2
OpenSSL can't compute a single cumulative hash across all of them directly, but you can archive and compress them first, then hash the result:
$ tar -czpf archive1.tar.gz folder1/
$ tar -czpf archive2.tar.gz folder2/
$ openssl md5 archive1.tar.gz archive2.tar.gz
To recursively hash each file instead:
$ find . -type f -exec openssl md5 {} +
Solution 3
You'll probably want the digest output in coreutils format (identical to md5sum -b).
The command could then be:
find . -path '*/.svn' -prune -o -type f -print | sort | tr '\n' '\0' | xargs -0 openssl dgst -md5 -r
(find must emit newline-separated paths here, since sort works line by line before tr converts the separators to NULs for xargs -0)
or, with the output redirected to a file:
find . -path '*/.svn' -prune -o -type f -print | sort | tr '\n' '\0' | xargs -0 openssl dgst -md5 -r > ../mydigest.md5
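Since -r emits coreutils-style lines, the resulting digest file can also be checked later with md5sum itself. A sketch, assuming GNU coreutils and that you run it from the directory the find was started in:

md5sum -c ../mydigest.md5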
Solution 4
It's better to list a hash for each file, and check each hash. If you make a hash from all files, and one of them becomes corrupt, you won't know which one is corrupt. But if you list hashes for every file, a script can tell you when any hash doesn't match (which will tell you that a file is corrupt or changed).
Also, recursive hashing with find is simpler than so much piping:
find . -type f -print0 | xargs -0 openssl dgst -sha256 -r >> hashes.sha256
You'll want to append the output via >>, because xargs will invoke openssl several times, but only as often as it needs to process all files (not e.g. one invocation per file). -r is for coreutils hash file syntax. You don't want to use OpenSSL's -out with xargs, as it will overwrite the file on each invocation. Additionally, you may want to capture STDERR in case OpenSSL can't read/open some files: 2>> error.log
If storage isn't a bottleneck, you can use the -P n argument of xargs to run several OpenSSL processes in parallel (not recommended for hard drives).
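As a sketch, a parallel run followed by the per-file check described above could look like this (the batch size 64 and process count 4 are arbitrary; -n 64 forces smaller batches so that several processes can actually run at once, and sha256sum -c understands the coreutils-style lines that -r produces, reporting FAILED for each file that changed):

find . -type f -print0 | xargs -0 -n 64 -P 4 openssl dgst -sha256 -r >> hashes.sha256 2>> error.log
sha256sum -c hashes.sha256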
Note: GNU coreutils (md5sum etc.) can use OpenSSL's libcrypto for hashing. But you may still want to use OpenSSL directly if your coreutils are very outdated: support for hardware SHA acceleration was only added to OpenSSL relatively recently. SHA-1/SHA-256 can be faster than MD5 even without acceleration, and are definitely in the gigabit/s range with it.
Solution 5
Doing an md5 sum on the tar would never work unless all of the metadata (creation date, etc.) was identical as well, because tar stores that as part of its archive.
I would probably do an md5 sum of the contents of all of the files:
find folder1 -type f | sort | tr '\n' '\0' | xargs -0 cat | openssl md5
find folder2 -type f | sort | tr '\n' '\0' | xargs -0 cat | openssl md5
Alexander, updated on September 17, 2022
Comments
- Alexander, almost 2 years ago: I'm interested in storing an indicator of file / directory integrity between two archived copies of directories. It's around 1TB of data stored recursively on hard drives. Is there a way using OpenSSL to generate a single hash for all the files that can be used as a comparison between two copies of the data, or at a later point to verify the data has not changed?
- Alexander, over 14 years ago: 1TB of data - no room to tar them. Is there a way to recursively generate hashes of all files?
- John T, over 14 years ago: Yes, added it to my answer.
- akira, over 14 years ago: Nice tar idea, but not always applicable. The 'find' method is better in general. If there is 'no room' for the tarball: % tar -cf - folder | openssl md5
- Victor Rocheron, over 10 years ago: For a single command, something like md5 -q <(find . -type f 2>/dev/null | xargs md5 -q | sort) works well in Bash and doesn't require a temp file. Alter it if your system uses md5sum instead of md5. Also be aware that sort can behave differently on different platforms, which will affect the final checksum if the order is different. Add flags like ! -name ".DS_Store" to the find component to ignore certain files, like the .DS_Store files on Mac OS X that can throw off the checksum since they're generated by the OS.