What is the value of MD5 checksums if the MD5 hash itself could potentially also have been manipulated?

security download md5sum

9,071

Solution 1

I have heard this is to allow [...] for any malicious changes to be detected also.

Well you heard wrong, then. MD5 (or SHA or whatever) checksums are provided (next to download links, specifically) only for verifying a correct download. The only thing they aim to guarantee is that you have the same file as the server. Nothing more, nothing less. If the server is compromised, you’re SOL. It’s really as simple as that.
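For illustration, checking "the same file as the server" could look roughly like the sketch below. The function names `file_digest` and `matches_published` are made up for this example, and SHA-256 is only a default assumption; pass `"md5"` to reproduce the check described here.

```python
import hashlib
import hmac


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of the file at `path`, reading it in chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large downloads do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published(path: str, published_hex: str, algorithm: str = "sha256") -> bool:
    """Compare the local file's digest with the value shown next to the download link."""
    actual = file_digest(path, algorithm)
    # Normalise whitespace and case before comparing the two hex strings.
    return hmac.compare_digest(actual, published_hex.strip().lower())
```

A match only says the local copy equals whatever the published value was computed from. It says nothing about who published that value.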

Solution 2

The solution used by some package management systems such as dpkg is to sign the hash: use the hash as input to one of the public key signing algorithms. See http://www.pgpi.org/doc/pgpintro/#p12

If you have the public key of the signatory, you can verify the signature, which proves the hash is unmodified. This just leaves you with the problem of getting the right public key in advance. That said, if someone tampers with the key distribution once, they also have to tamper with everything you might later verify with that key; otherwise you'll spot that something strange is going on.
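To make "sign the hash" concrete, here is a toy sketch using textbook RSA on a SHA-256 digest. Everything in it is an assumption made for illustration: the key is built from two small Mersenne primes, there is no padding, and the names `sign` and `verify` are invented. It is not how dpkg or PGP implement signatures and must not be used for real security; it needs Python 3.8+ for the modular inverse.

```python
import hashlib

# Toy key pair built from two known Mersenne primes. Far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (modular inverse, Python 3.8+)


def digest_as_int(data: bytes) -> int:
    """Hash the data and reduce the digest so it fits below the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Signer side: raise the hash to the private exponent."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Downloader side: only the public values N and E are needed."""
    return pow(signature, E, N) == digest_as_int(data)
```

Someone who can replace the file but does not hold `D` cannot produce a signature that `verify` accepts for the altered file, which is the property an unsigned checksum lacks.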

Solution 3

Your assumption is correct. There is an exception, though: when the server providing the file and the page showing the hash are not managed by the same entity. In that case the software developer may want to say, "Hey people, download this from that place, but only trust it if the hash = xxxx." (This might be useful for CDNs, for example.) I guess this was the reason someone did it in the first place. Then others just followed, thinking how cool it would be to show the hash, without considering how little use it is when both the file and the hash are in the same location.

Having said this, it is worth what it is. Don't assume too much about security, as others have already stated. If and only if you can absolutely trust the original hash, then the file is good. Otherwise an attacker with enough motivation and knowledge can tamper with both the file and the hash, even if these are on different servers and managed by different entities.

Solution 4

Sometimes the checksums are provided securely, but the download is not. Since MD5 is broken, the security that MD5 checksums provide is weaker than that of more secure checksums, but before MD5 was broken, a securely provided MD5 (e.g. one that was signed with PGP or GPG or Gatekeeper, or fetched over HTTPS) that matched the MD5 of the download was strong evidence that the download received was the one the server was making available.

I have been writing about the lamentable lack of secure checksums for years, here.

Users shouldn't download untrusted executables over untrusted networks and run them, because of the risk of MITM attacks. See, e.g. "Insecurities within automatic update systems" by P. Ruissen, R. Vloothuis.

2014 Addendum: No, it's NOT wrong "that checksums posted on web pages are used to detect malicious modifications," because this IS a role they can perform. They do help protect against accidental corruption, and if served over HTTPS or with a verified signature (or better yet, both) help protect against malicious corruption! I have obtained checksums over HTTPS and verified that they matched HTTP downloads many times.
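A rough sketch of that workflow: fetch the checksum over HTTPS, fetch the file over any transport, and compare. The function names, the injectable `fetcher` parameter and the use of SHA-256 are assumptions of this example, not part of any particular site's process.

```python
import hashlib
import hmac
from urllib.parse import urlsplit
from urllib.request import urlopen


def fetch(url: str) -> bytes:
    """Default fetcher: read the whole response body."""
    with urlopen(url) as response:
        return response.read()


def download_verified(file_url: str, checksum_url: str, fetcher=fetch) -> bytes:
    """Return the file bytes only if they match a checksum obtained over HTTPS."""
    # The checksum is the trust anchor, so refuse to fetch it over plain HTTP.
    if urlsplit(checksum_url).scheme != "https":
        raise ValueError("checksum must be fetched over HTTPS")
    # Accept either a bare hex digest or the "digest  filename" layout.
    expected = fetcher(checksum_url).decode("ascii").split()[0].lower()
    payload = fetcher(file_url)  # may be plain HTTP, a mirror, or a CDN
    actual = hashlib.sha256(payload).hexdigest()
    if not hmac.compare_digest(actual, expected):
        raise ValueError("downloaded file does not match the trusted checksum")
    return payload
```

The security comes from the HTTPS channel carrying the checksum; the file itself can travel over an untrusted path.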

Nowadays, binaries are often distributed with signed, automatically verified hashes, yet even this is not perfectly secure.

Excerpt from above link: "The KeRanger application was signed with a valid Mac app development certificate; therefore, it was able to bypass Apple’s Gatekeeper protection." ... "Apple has since revoked the abused certificate and updated XProtect antivirus signature, and Transmission Project has removed the malicious installers from its website. Palo Alto Networks has also updated URL filtering and Threat Prevention to stop KeRanger from impacting systems."

"The two KeRanger infected Transmission installers were signed with a legitimate certificate issued by Apple. The developer listed this certificate is a Turkish company with the ID Z7276PX673, which was different from the developer ID used to sign previous versions of the Transmission installer. In the code signing information, we found that these installers were generated and signed on the morning of March 4."

2016 Addenda:

@Cornstalks: Re. your comment below: Wrong. As currently noted at the collision attack Wikipedia article you link to, "In 2007, a chosen-prefix collision attack was found against MD5" and "the attacker can choose two arbitrarily different documents, and then append different calculated values that result in the whole documents having an equal hash value." Thus, even if the MD5 is provided securely and an attacker can't modify it, an attacker still CAN use a chosen-prefix collision attack with a chosen-prefix containing malware, which means MD5 is NOT secure for crypto purposes. This is largely why US-CERT said MD5 "should be considered cryptographically broken and unsuitable for further use."

A couple more things: CRC32 is a checksum. MD5, SHA, etc. are more than checksums; they're intended to be secure hashes. That means they're supposed to be very resistant to collision attacks. Unlike a checksum, a securely communicated secure hash protects against a man-in-the-middle (MITM) attack where the MITM is between the server and the user. It doesn't protect against an attack where the server itself is compromised. To protect against that, people typically rely on something like PGP, GPG, Gatekeeper, etc.
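To illustrate the gap between a checksum and a secure hash, the sketch below forges a CRC32 match: given any altered content, it computes four bytes to append so that the CRC32 equals that of the original. It relies on CRC32 being affine over GF(2) for a fixed message length. The function name and sample byte strings are invented for the example.

```python
import hashlib
import zlib


def forge_crc32_suffix(data: bytes, target_crc: int) -> bytes:
    """Return 4 bytes that, appended to `data`, make its CRC32 equal `target_crc`."""
    base = zlib.crc32(data + bytes(4))
    # Maps highest set bit -> (CRC difference, suffix bits that cause it).
    basis = {}
    for bit in range(32):
        # Effect on the CRC of flipping one bit of the 4-byte suffix.
        vec = zlib.crc32(data + (1 << bit).to_bytes(4, "little")) ^ base
        mask = 1 << bit
        while vec:  # Gaussian elimination over GF(2)
            pivot = vec.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (vec, mask)
                break
            vec ^= basis[pivot][0]
            mask ^= basis[pivot][1]
    # Express the required CRC change as a combination of suffix bit flips.
    want, suffix_bits = target_crc ^ base, 0
    while want:
        vec, mask = basis[want.bit_length() - 1]
        want ^= vec
        suffix_bits ^= mask
    return suffix_bits.to_bytes(4, "little")


original = b"legitimate installer bytes"
tampered = b"malicious installer bytes"
forged = tampered + forge_crc32_suffix(tampered, zlib.crc32(original))

same_crc = zlib.crc32(forged) == zlib.crc32(original)
same_sha256 = hashlib.sha256(forged).digest() == hashlib.sha256(original).digest()
```

Here `same_crc` comes out true while `same_sha256` does not: the forgery costs a few dozen CRC evaluations, whereas no comparable shortcut is known for SHA-256.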

Solution 5

This really is a problem. Showing checksums on the same site as the file to download is insecure: a person who can change the file can also change the checksum. The checksum should be published through a completely separate system, but this is hardly feasible, because there is no safe way to tell the user where the checksum can be found.

A possible solution is the use of signed files.

(BTW: MD5 is unsafe anywhere and shouldn't be used anymore.)


Austin ''Danger'' Powers

Updated on September 18, 2022

Comments

Austin ''Danger'' Powers almost 2 years

Downloads on websites sometimes have an MD5 checksum, allowing people to confirm the integrity of the file. I have heard this is to allow not only corrupted files to be instantly identified before they cause a problem but also for any malicious changes to be easily detected.

I follow the logic as far as file corruption is concerned but if someone deliberately wants to upload a malicious file, then they could generate a corresponding MD5 checksum and post that on the download site along with the altered file. This would deceive anyone downloading the file into thinking it was unaltered.

How can MD5 checksums provide any protection against deliberately altered files if there is no way of knowing if the checksum itself has been compromised?
- Austin ''Danger'' Powers over 9 years
  
  Why would a file need to be the same size after it's been altered? I'm saying a file could be changed, a new hash generated for the malicious version... then the hash posted on the website could be replaced with the new one by the malicious entity.
- fixer1234 over 9 years
  
  Most download sites give the file size and often the creation date. I suppose those could also be altered on the web site. However, wouldn't the web site owner detect all of the hacking to the site?
- Austin ''Danger'' Powers over 9 years
  
  If we are relying on the website host noticing subtle timestamp discrepancies instead of the MD5 hash acting as a seal of authenticity... then the protection provided by the checksum has pretty much evaporated.
- fixer1234 over 9 years
  
  I'm referring to things like logs of site access rather than noticing subtle content change, although the web page could have its own hash known to the site owner.
- Austin ''Danger'' Powers over 9 years
  
  The point is that people accessing the site have no way of knowing how proactive the website host is in checking those logs. The MD5 checksum is supposed to provide a way for people to check the integrity of their own downloads, without relying on the actions of any other parties.
- Kinnectus over 9 years
  
  MD5 doesn't generate against file contents so there will never be a way of checking file integrity - apart from corruption, but this often results in a different file size so the hash will be different. If a malicious file has a valid hash then there'll be no way of telling at this stage.
- Matt Nordhoff over 9 years
  
  @BigChris I'm not sure what you mean, but it sounds wrong. Cryptographic hash algorithms like MD5 are completely about the message data. Two random messages of the same length will almost certainly have different hashes.
- Austin ''Danger'' Powers over 9 years
  
  @MattNordhoff exactly. If an MD5 checksum isn't generated based on file data, then what is it based on?
- user over 9 years
  
  @MattNordhoff Where "almost certainly" = 2^(n/2) where n is the number of bits in the output hash value. Birthday attacks.
- el.pescado - нет войне over 9 years
  
  Sometimes hashes are published on first-party server whereas actual downloads are hosted on third-party mirrors and/or CDNs.
- that other guy over 9 years
  
  It's said that encryption is all about leverage -- instead of hiding the entire file, you can just hide a tiny key. Cryptographic hashing is the same way -- instead of verifying the entire file, you can just verify a tiny key.
Wayne Jhukie over 9 years

"MD5 checksum of MD5 checksums" is known as a Merkle Tree.
David Spillett over 9 years

+1. They are primarily used to protect against accidental corruption (network transfer errors, bad sectors on disk, and so on). To protect against malicious corruption the checksum needs to come from a trusted unconnected location. The same with PGP/GPG/similar signed messages: they only completely assure the content if you trust where you obtained the public key from.
David Spillett over 9 years

Regarding gpg: remember that this has similar problems if you don't entirely trust the public key hasn't been replaced by a compromised one and the content signed with the private key corresponding to that.
ratchet freak over 9 years

MD5 is in fact blatantly broken against malicious changes assuming the original file was prepared for it.
atk over 9 years

You might also want to add to your answer that digital signatures address this limitation (assuming that you trust the certificate/certifying authority).
supercat over 9 years

Hashes can protect against malicious alteration if one knows via some independent source what the expected hash of a trustworthy version of a file should be. The value of having the web site list the hash values of its files doesn't lie in letting people who download files from a site check the hash of the downloaded file against the same site, but rather in letting people who know from some other source the hash of the file they want, know whether the file in question will match it before they download it. BTW, one thing I'd like to see...
supercat over 9 years

...would be a form of URL/URI that included an expected hash value (probably SHA rather than MD5), and would specify that a browser should only accept a file if the hash matches what's specified. In cases where the same large file will need to be accessed by many people, giving all of those people a URL via https:// but having them download the file from a proxy could be more efficient than having them all use https:// directly from the source.
cpast over 9 years

@supercat That's what I meant by "keep people from messing with the checksum" -- something must be securely transferred, and if that's the checksum then the checksum can help protect against maliciously tampering with the file.
cpast over 9 years

It's even worse than this -- if someone can tamper with your traffic to/from the server, then even if the server isn't compromised they can modify both file and checksum that you receive.
supercat over 9 years

An MD5 checksum transmitted via some path other than a file itself would provide protection against tampering unless the file was deliberately created to facilitate such tampering. By contrast, something like CRC32 would provide almost no protection against tampering even if the original source of the file was trustworthy and the CRC32 was delivered securely.
Giacomo1968 over 9 years

@pjc50 Thanks! Edited the answer to reference that.
Charles Duffy over 9 years

It only takes one person who checks the certificate chain to cause a scandal. If you think the NSA would use a government CA key for MITM rather than using one stolen from a privately held or -- even better -- foreign certificate authority (thus providing plausible deniability), I have a bridge to sell you.
Mark K Cowan over 9 years

I was suggesting the possibility that it could be used to target a MITM at a particular user. As to whether it's actually likely, that's for the tin-foil people to debate.
Charles Duffy over 9 years

I'm not questioning whether a targeted MITM is likely. I'm questioning whether being careless enough to use a readily traced and attributed CA key to perform it is likely. Particularly for a sufficiently high-value target, outbound 'net traffic is liable to be recorded in enough detail to include metadata up to and including the public part of the SSL handshake, so even if the user doesn't look, their security staff or automated infrastructure might do so in retrospective analysis.
Daniel B over 9 years

@cpast Yeah, so what? Like I said, MD5 (by itself) isn’t about security at all.
cpast over 9 years

@DanielB So it doesn't guarantee that you have the same file as the server.
cpast over 9 years

To expand: If it did guarantee that you had the same file as the server had, it would be a legitimate security measure, because it would mean you don't have to trust the network. That's exactly what the MACs in TLS do -- prove that what you got is what the server sent, but TLS can't do anything about a compromised server either. If a good hash is transmitted over a trusted connection, it can provide security (which is derived from the trusted connection); if it's sent over the same connection as the file, then it's useless because it's no more tamper-resistant than the file itself was.
WHO's NoToOldRx4CovidIsMurder over 9 years

This is wrong. Sometimes the checksums are provided securely, but the download is not. Since MD5 is broken, the security that MD5 checksums provide is weaker than that of more secure checksums, but before MD5 was broken, a securely provided MD5 (e.g. one that was signed or sent over HTTPS) that matched the MD5 of the download was strong evidence that the download received was the one the server was making available. I'll add an answer with more detail below now.
Breakthrough over 9 years

I like this answer because it highlights a fundamental part of a checksum - it's simply one metric, of many, to check the validity of a file's contents. If the network itself is untrusted, it's not that unfeasible to imagine one replacing the MD5-hashes and patching binaries on the fly (as we've already seen on some Tor exit nodes)... Of course then, MD5 provides no protection against deliberately modified files because you're already placing your trust in the provider of said files to begin with.
Cornstalks over 9 years

MD5 isn't totally broken: the attack on it is a collision attack, not a preimage attack (which would be much, much worse). If the MD5 is provided securely and an attacker can't modify it, then an attacker can't use a collision attack (and must use a preimage attack), which means MD5 is still pretty secure for that purpose. MD5 is worth being phased out because of its collision vulnerability, but it doesn't have a (known) preimage vulnerability so it's not totally broken. Just half broken.
Raestloz over 9 years

@MatthewElvey MD5 guarantees you get the correct file, it does not guarantee that you get a safe file. If the file itself is malicious from the very beginning, you're screwed, because MD5 does not tell you anything about the file itself
matpop over 9 years

@Raestloz Your observation goes beyond the actual question. If ever a well known and trusted organization starts distributing malicious software (and we all know it never happened, don't we? :) then it would (should) soon become untrusted!
matpop over 9 years

+1! But... Is a signed hash really just as secure (trustable) as an unsigned hash fetched over https (ssl/tls)? I think it's still preferable that the hash itself is signed anyway...
Raestloz over 9 years

@matpop actually, I was addressing the question. Read carefully: the question posits a "what if" situation where a maliciously modified program is posted along with its (already malicious) MD5. MD5 is a security measure of "this is correct", not "this is safe", therefore in this case MD5 is worthless ("what is the value?" the title says), unless during the transmission somebody altered it to include even more malicious software, but you're already screwed anyway. Thus I responded to Matthew, who says that this answer is wrong (it's not)
matpop over 9 years

@Raestloz In the first place you wrote: "If the file itself is malicious from the very beginning"... Sorry, but to me those words do not mean the same thing as a deliberately ALTERED file. The OP knows that hashes are not part of any antivirus system and can't be used for malware detection. That said, MatthewElvey is right, this answer is somewhat radical and incomplete, as it misses an important exception: if you can verify a digital signature of the hash sum, then the hash can also be used to demonstrate that the downloaded file is UNALTERED (afaik though, HTTPS itself is insufficient).
Raestloz over 9 years

@matpop sorry, but my words cannot be any more exact. If a file has been deliberately altered (say, an innocent exe that has been altered to execute malicious code) AND uploaded as a new entry, the malicious file is a separate entity from the original, innocent file and is therefore malicious from the very beginning. This is different from an innocent file that gets intercepted and altered in transit between server and downloading client.
Raestloz over 9 years

@matpop OP is asking "what good is MD5 if it isn't generated from the original, unaltered file to begin with?", the answer is "worthless". This answer highlights that MD5 can only tell you that you and the server have the exact same file, and nothing more. It doesn't inherently carry any security benefit. The security "benefit" of knowing the file has been tampered with is a byproduct. In the meantime, perhaps we should move this to chat?
matpop over 9 years

@Raestloz Thanks for your reply :) We clearly disagree on terms to be used but almost think the same. Though the "security benefit" of hashing can be considered a "byproduct", IMO it's important to mention that hashing is actually a fundamental part of the process of authentication (you may want to have a little look at my brief answer to get what I mean). Still our comments add something so let's not move to chat for now.
matpop over 9 years

@Raestloz You may admit however that this answer doesn't provide any "explanation and context" and there are better answers here that got far fewer upvotes.
matpop over 9 years

@Raestloz Not to have the last word, seriously. Don't hate me, allow me one last comment, too. 1. It seems you keep putting authentication and "malware detection" on the same level; the first one is really possible with (signed) hash sums! 2. One does not simply (!) say that the only aim of hashes is to guarantee that you have the same file as the server; the fact that hashes are most of the times provided without signature does not make such statement valid in general. It's not "really as simple as that". Cheers. NRN
Daniel B over 9 years

Well, I guess you’ll all be delighted to know my answer was specifically about checksums next to download links (or in a .*sum file), which I believe is what the question is about. It’s certainly not about Authenticode and the like. ;)
Number945 almost 5 years

@DanielB, if I know the MD5 checksum of, say, a Firefox download, can I download that file from untrusted sources (instead of the official website) and be assured that, if the checksum of this file matches the checksum of the same file on the official Firefox website, I don't need to worry? I am assuming the Firefox website is not hacked.
Daniel B almost 5 years

@BreakingBenjamin Generally, yes. MD5 and SHA1 however are not suitable for this anymore.