Does an MD5 match between a torrent file and an original file from a provider mean the torrent file is safe?


Solution 1

if the MD5 I download matches the MD5 on the distribution's website, does that mean the file has not been tampered with?

It probably hasn't been tampered with. MD5 does have known attacks against it, so it's not as good as something like SHA-1. For something like verifying that you have the correct file, MD5 is OK.
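As a rough illustration of that check, here is a minimal Python sketch that computes a file's MD5 and compares it with the value published on the website. The function names `file_md5` and `matches_published` are made up for this example, and the file is read in chunks only so that a large ISO doesn't have to fit in memory.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read() until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare the computed digest with the one copied from the website."""
    # Published checksums are sometimes upper-case or padded with whitespace.
    return file_md5(path) == published_hex.strip().lower()
```

The result is only as trustworthy as the place the published value came from, so it should be copied from the distribution's own site rather than from the page that hosts the torrent.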

Where MD5 is sort of not OK is when it's used for stored password hashes. Websites and such typically don't want to store your password, but a hash of it, and then compare a hash of the password you enter at login with the stored hash. In this way, they don't know your password. However, if a hacker gets a list of accounts with these hashes, they may be able to generate a password that, when put through MD5, generates the same hash (this is called a hash collision). I'm not sure how any of this works, really, just that with MD5 hash collisions are possible and easier to find than they were in the past.
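The store-a-hash-and-compare flow can be sketched roughly as follows. This is an assumption-laden illustration, not how any particular website does it: it swaps MD5 for salted PBKDF2-SHA256 from Python's standard library, and `make_record` and `check_password` are hypothetical names.

```python
import hashlib
import hmac
import os


def make_record(password, iterations=200_000):
    """Return (salt, iterations, digest); the password itself is never stored."""
    salt = os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, iterations, digest


def check_password(candidate, record):
    """Hash the login attempt the same way and compare it with the stored digest."""
    salt, iterations, stored = record
    digest = hashlib.pbkdf2_hmac(
        "sha256", candidate.encode("utf-8"), salt, iterations
    )
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(digest, stored)
```

The iteration count here is an arbitrary placeholder; the point is only that the site keeps the salt and digest and recomputes the digest at login.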

BitTorrent uses SHA-1 to "know" what file it's downloading and whether the pieces that peers trade are good or not. So it is unlikely that a malicious peer could inject bad data into the swarm: even if one peer knew how to break SHA-1, most other peers would have to cooperate as well. You can be relatively confident that no one in the BitTorrent swarm can corrupt the download, assuming you have the correct .torrent file and it hasn't been maliciously modified to point to a different tracker or to carry different hashes.
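A simplified sketch of that per-piece check is below. It assumes the original (v1) .torrent layout, where the file is cut into fixed-size pieces and the .torrent carries one 20-byte SHA-1 digest per piece. Parsing the .torrent itself is left out, and `piece_hashes` and `piece_is_valid` are illustrative names rather than part of any client.

```python
import hashlib


def piece_hashes(data, piece_length):
    """SHA-1 digest of each fixed-size piece, as the torrent creator would compute."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def piece_is_valid(index, piece, expected_hashes):
    """What a downloader does: hash the received piece and compare with the list."""
    return hashlib.sha1(piece).digest() == expected_hashes[index]
```

A piece that fails this check is discarded and requested again, which is why a peer sending bad data is only a nuisance as long as the hash list itself is genuine.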

Solution 2

Yes it is likely okay.

I use MD5 a lot in my everyday sysadmin work, so I did a lot of research on its security issues. I found some great links on it and also summed up the security concerns and uses of hashes in a blog post.

The gist of it is that there are three main things to consider:

1) It is possible for two different files to have the same hash. In the published findings, both files need to be created by the attacker, i.e. the creator of the files.

2) Also, the hash cannot be targeted. You cannot pick a hash and then design a file to match it. Instead, the creator has to create two different files with identical hashes, and the actual hash value itself cannot be "chosen" ahead of time.

3) Lastly, there is a finite number of hashes, so by pure dumb luck it's possible that two random files will have the same hash (called a "collision"). One of the strengths of a good hashing algorithm is that it avoids collisions.
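Point 3 can be made concrete with a toy experiment. MD5 has 2^128 possible outputs, so an accidental match between two unrelated files is astronomically unlikely; the sketch below keeps only the first 2 bytes of the digest (65,536 possible values) purely so that a "dumb luck" collision shows up quickly. The truncation and the function names are assumptions for illustration, not anything from the linked articles.

```python
import hashlib
from itertools import count


def truncated_md5(data, nbytes):
    """First `nbytes` bytes of the MD5 digest: a deliberately tiny hash space."""
    return hashlib.md5(data).digest()[:nbytes]


def find_truncated_collision(nbytes=2):
    """Hash "0", "1", "2", ... until two different inputs share a truncated digest."""
    seen = {}
    for i in count():
        msg = str(i).encode("ascii")
        tag = truncated_md5(msg, nbytes)
        if tag in seen:
            # Two distinct inputs, same (truncated) hash.
            return seen[tag], msg
        seen[tag] = msg
```

Note that this finds *some* pair that collides, not an input matching a hash chosen in advance, which is the same distinction point 2 draws for the real attacks.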

Taking the above into account: if you use the hash from the vendor, you know with a high degree of confidence that it is the actual hash to check against.

Here are some references for you:

A fantastic article about the security of hashing is right here: http://blog.codinghorror.com/speed-hashing/

A very in-depth article about the security vulnerability of hashing (it even shows an example of creating two different files with identical hashes): http://www.win.tue.nl/hashclash/SoftIntCodeSign/

My own collection of info on security and use of hash http://geekswing.com/geek/the-magic-of-hash-and-i-mean-of-the-md5-and-sha-1-vintage/


Ng Zhong Qin
Author by

Ng Zhong Qin

Updated on September 18, 2022

Comments

  • Ng Zhong Qin
    Ng Zhong Qin over 1 year

    I wanted to download a certain distro of Linux. However, it is only available for download via BitTorrent. I'm not too keen on BitTorrent. If I do use it, however, and the MD5 I download matches the MD5 on the distribution's website, does that mean the file has not been tampered with? That it matches exactly what the provider has and, assuming the original is "safe," the torrent-downloaded file is as well?

    • Frank Thomas
      Frank Thomas almost 10 years
      There is a very high probability that the file is exactly as it was when published. Other than that, no one can say, with as broken a hash algorithm as MD5 is these days. It is not likely that any one peer could have injected data in such a way that the overall hash is identical but the data is different (it would be almost impossible to find a collision that would do that). Safety aside, I would assume the downloaded version matches the original. I would also point out that BitTorrent isn't a bad thing. Some bad files are distributed that way but....
    • ganesh
      ganesh almost 10 years
      Within reason: Yes, it would be safe. MD5 is not 100% reliable, so using SHA-1 or something more modern would be better, but it would be very hard to tamper with an ISO in such a way that you could make it unsafe. 2) As a nitpick: just checking the MD5 does not mean it is safe. I can create a virus, torrent it, and give you the correct checksums. ;)
  • Ajedi32
    Ajedi32 almost 10 years
    I was under the impression that MD5 is considered safe for guarding against corrupted files, but not against files that have been intentionally tampered with.
  • LawrenceC
    LawrenceC almost 10 years
    It's best to prefer SHA-1 or better.
  • Alexis Beingessner
    Alexis Beingessner almost 10 years
    Would the other peers have to cooperate, though? I'm not certain, but I believe that you don't download a particular block from multiple peers, you request multiple entire blocks from each peer (having trouble checking this, though). The torrent protocol doesn't include a hash of the whole file, just hashes of the individual blocks, and potentially hashes of those hashes. If an attacker did somehow break SHA-1 (which would be an exceptional feat), they could send tampered blocks all day long and no one would notice or complain.
  • LawrenceC
    LawrenceC almost 10 years
    It's true, I looked into it and the .torrent file doesn't have a hash of the entire file. So yes, I guess peers don't have to cooperate if they have broken SHA-1. Don't know why I thought it did have a hash of the whole file. I think it should... Though I don't know, hashing something like 100 GB of data would take too long.
  • Alexis Beingessner
    Alexis Beingessner almost 10 years
    I was surprised too. I had thought that there was actually an entire tree of hashes down from the individual blocks up to the whole file, but it turns out that if there is any tree involved, it's just a merkle tree, which is just hashes of the hashes. I had also thought that magnet links use the file's whole hash as the URI, but it turns out that it's just another hash of the relevant torrent's hashes. I'm disappointed, I had thought torrents had more robust checksumming that would require creating a whole hierarchy of hash collisions -- infeasible even for MD5. It's "only" as secure as SHA-1.
  • ComFreek
    ComFreek almost 10 years
    "Collisions are not an issue for password hashing", see this post (esp. the section titled "Collisions and MD5").
  • LawrenceC
    LawrenceC almost 10 years
    Maybe it's rainbow tables I'm thinking of.