SHA-256 or MD5 for file integrity

143,502

Solution 1

Both SHA-256 and MD5 are hashing algorithms. They take your input data, in this case your file, and output a 256-bit or 128-bit number, respectively. This number is a checksum. No encryption takes place: a hash cannot be reversed, because an infinite number of inputs can result in the same hash value, although in reality collisions are rare.
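As a minimal sketch of what "hashing a file" looks like in practice, the snippet below uses Python's standard hashlib module. The function name file_digest and the chunk size are illustrative choices, not something from the question; the file is read in chunks so that large files do not have to fit in memory.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`.

    `algorithm` is any name hashlib accepts, e.g. "md5" or "sha256".
    The file is read in `chunk_size` pieces (1 MiB here, an arbitrary choice).
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

An MD5 digest comes back as 32 hex characters (128 bits) and a SHA-256 digest as 64 hex characters (256 bits).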

SHA256 takes somewhat more time to calculate than MD5, according to this answer.
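Relative speed depends on the CPU, the runtime, and the library build, so it is worth measuring rather than assuming. The following is a rough sketch of such a measurement with Python's hashlib; the helper name throughput_mb_per_s, the buffer size, and the repeat count are all illustrative assumptions, and the numbers it prints say nothing about .NET or other runtimes.

```python
import hashlib
import time


def throughput_mb_per_s(algorithm, size_mb=16, repeats=3):
    """Hash an in-memory buffer and return the best observed speed in MB/s."""
    data = b"\0" * (size_mb * 1024 * 1024)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(algorithm, data).digest()
        best = min(best, time.perf_counter() - start)
    return size_mb / best


if __name__ == "__main__":
    for name in ("md5", "sha256"):
        print(f"{name}: {throughput_mb_per_s(name):.0f} MB/s")
```

Hashing a buffer in memory leaves disk I/O out of the picture; for a backup program reading from slow storage, the hash may not be the bottleneck at all.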

Offhand, I'd say that MD5 would probably be suitable for what you need.

Solution 2

Every answer seems to suggest that you need a secure hash for the job, but secure hashes pay a performance cost for properties that exist to resist attackers and, depending on your needs, they may not be the best solution.

There are algorithms specifically designed to hash files as fast as possible for integrity checks and comparison (MurmurHash, xxHash, ...). Obviously these are not designed for security, as they don't meet the requirements of a secure hash algorithm (i.e. randomness), but they have low collision rates for large messages. These features make them ideal if you are looking for speed rather than security.

Examples of these algorithms and a comparison can be found in this excellent answer: Which hashing algorithm is best for uniqueness and speed?.

As an example, we at our Q&A site use murmur3 to hash the images uploaded by the users so we only store them once even if users upload the same image in several answers.
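The store-once idea above can be sketched as a content-addressed store: the hash of the uploaded bytes is the key, so a second upload of the same image maps to the existing entry. This is only an illustration, not the site's actual implementation. Python's standard library has no murmur3, so the sketch swaps in hashlib.blake2b with a 16-byte digest as a stand-in; the class name DedupStore and its methods are hypothetical.

```python
import hashlib


class DedupStore:
    """In-memory store that keeps each distinct blob once, keyed by content hash."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Stand-in for murmur3: BLAKE2b truncated to 128 bits (an assumption
        # made for this sketch, since murmur3 is not in the standard library).
        key = hashlib.blake2b(data, digest_size=16).hexdigest()
        # setdefault stores the blob only if the key is not present yet.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)
```

Each answer would then reference the returned key instead of holding its own copy of the image.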

Solution 3

To 1): Yes, on most CPUs, SHA-256 is only about 40% as fast as MD5.

To 2): I would argue for a different algorithm than MD5 in such a case. I would definitely prefer an algorithm that is considered safe, although this is more of a feeling. Cases where it matters would be contrived rather than realistic: for example, if your backup system encounters a demonstration of an attack on an MD5-based certificate, you are likely to have two files with different data but identical MD5 checksums. In all other cases it doesn't matter, because MD5 checksums collide (that is, the same checksum for different data) virtually only when the collision is provoked intentionally. I'm not an expert on the various hashing (checksum-generating) algorithms, so I cannot suggest another algorithm; hence this part of the question is still open. Suggested further reading is Cryptographic Hash Function - File or Data Identifier on Wikipedia. Further down on that page there is also a list of cryptographic hash algorithms.

To 3): MD5 is an algorithm to calculate checksums. A checksum calculated using this algorithm is then called an MD5 checksum.

Solution 4

The underlying MD5 algorithm is no longer deemed secure. Thus, while md5sum is well suited for identifying known files in situations that are not security related, it should not be relied on if there is a chance that files have been purposefully and maliciously tampered with. In the latter case, the use of a newer hashing tool such as sha256sum is highly recommended.

So, if you are simply looking to check for file corruption or file differences, and the source of the file is trusted, MD5 should be sufficient. If you are looking to verify the integrity of a file coming from an untrusted source, or from a trusted source over an unencrypted connection, MD5 is not sufficient.
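For the untrusted-source case, a check against a published SHA-256 value might look like the sketch below, using Python's hashlib and hmac modules. The function name matches_sha256 is hypothetical, and the sketch assumes the expected hex string was obtained over a channel you trust; a checksum fetched over the same unencrypted connection as the file protects against corruption only.

```python
import hashlib
import hmac


def matches_sha256(path, expected_hex, chunk_size=1 << 20):
    """Return True if the file's SHA-256 digest equals `expected_hex`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    # Normalise whitespace and case, then compare in constant time.
    return hmac.compare_digest(h.hexdigest(), expected_hex.strip().lower())
```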

Another commenter noted that Ubuntu and others use MD5 checksums. Ubuntu has moved to PGP and SHA256, in addition to MD5, but the documentation of the stronger verification strategies is more difficult to find. See the HowToSHA256SUM page for more details.

Solution 5

  1. No, it's slower, but not that much slower.
  2. For a backup program, it may be necessary to have something even faster than MD5.

All in all, I'd say that MD5 in addition to the file name is absolutely safe. SHA-256 would just be slower and harder to handle because of its size.

You could also use something less secure than MD5 without any problem. If nobody tries to hack your file integrity this is safe, too.
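One way to read "MD5 in addition to the file name" is a manifest that maps each file name to its digest, so that two backup runs can be compared. The sketch below is an illustration under that assumption; build_manifest and changed_files are hypothetical helper names, and it is suitable only for detecting accidental changes, not tampering.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to `root`) to its MD5 hex digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # read_bytes() loads the whole file; fine for a sketch,
            # but large files would be better hashed in chunks.
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def changed_files(old, new):
    """Return names that were added, removed, or whose digest differs."""
    names = old.keys() | new.keys()
    return sorted(name for name in names if old.get(name) != new.get(name))
```

Comparing the manifest from the previous run with a fresh one yields the list of files that need to be copied again.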

Author by

Dave

Updated on August 05, 2020

Comments

  • Dave
    Dave almost 4 years

    I know that SHA-256 is favored over MD5 for security, etc., but, if I am to use a method to only check file integrity (that is, nothing to do with password encryption, etc.), is there any advantage of using SHA-256?

    Since MD5 is 128-bit and SHA-256 is 256-bit (therefore twice as big)...

    1. Would it take up to twice as long to encrypt?

    2. Where time is not of essence, like in a backup program, and file integrity is all that is needed, would anyone argue against MD5 for a different algorithm, or even suggest a different technique?

    3. Does using MD5 produce a checksum?

  • Dave
    Dave over 11 years
    Thank you, but as an off topic question, are you saying that encryption must produce a unique 'code'/'id' like a GUID?
  • Dave
    Dave over 11 years
    Thank you, but the problem is I don't know what else I could use! I'm not asking for a recommendation and I am happy to research other approaches, but can you suggest something other than MD5/SHA256?
  • Dave
    Dave over 11 years
    Ah, and the link you have provided also shows other algorithms. I guess I need to find which ones are available to .NET now and find the quickest. Thank you
  • Paul Manta
    Paul Manta over 11 years
    @DaveRook How else would you then decrypt the message?
  • Dave
    Dave over 11 years
    @PaulManta - I have no idea, as I'm only doing this for integrity I've never actually considered encryption like this, but this is brilliant to know. Thank you.
  • dandan78
    dandan78 over 11 years
    @dave I think there's a bit of confusion on the topic because SHA is called a cryptographic hash. What that means (I'm far from an expert) is that you can use it to hash passwords. That way, if an attacker gets your user password file, he can't use the hashes to reconstruct the original passwords. Encryption is different in that it is meant to be reversible.
  • Genesis Rock
    Genesis Rock over 11 years
    Depends on the programming language and runtime environment you're using.
  • SaidbakR
    SaidbakR over 11 years
    @DaveRook In addition, if you look around at famous websites such as Sun, Ubuntu and others, you may notice that they supply MD5 checksums for file integrity. This may support its value for such tasks.
  • Rob
    Rob almost 10 years
    @Dave you mentioned below that you were looking for a list of hash functions. Have a look at the Wikipedia article on hash functions: en.wikipedia.org/wiki/List_of_hash_functions
  • Kris Craig
    Kris Craig over 6 years
    I know I'm late to the party, but thank you for making this point! If you are using checksums to verify that an attacker isn't screwing with your files, MD5 is a terrible idea. If the attacker knows what they're doing, they could theoretically find the right collision that'd enable them to execute their code without altering the file's checksum, thus eluding any checksum-based security verification. SHA algos perform well enough on modern CPUs and scale a lot better (as file sizes get bigger, so too must our checksums). If you're dealing with large HD video files, I'd use SHA-512.
  • Yossi Sternlicht
    Yossi Sternlicht almost 2 years
    After running benchmarks on .NET 6, it now appears that SHA-256 is faster than MD5, so this answer is outdated