Can two different firmware files have the same md5 sum?

Solution 1

What the existing answers fail to point out is why a collision is deemed to be vanishingly unlikely in this case.

MD5, like any hashing algorithm, was deliberately designed so that changing just a handful of characters changes the hash. For an accidental change to cycle back around to the same hash, the differences would have to line up in an extraordinarily unlikely way. That's because a core purpose of a checksum is to detect single-bit (or few-bit) errors; in this problem domain, you want the smaller changes to definitely trigger a hash change. Flawed though we now know MD5 to be against deliberately crafted collisions, that property still holds for accidental changes.

So, unless the new version of firmware is completely different, and unless you just witnessed a once-in-a-million phenomenon, the odds are huge that you simply received the old version again. Congratulations, because this is the hash-check process working precisely as intended. :)
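
One way to confirm that the two downloads really are the same file is to compare them byte for byte in addition to hashing them. The following is a minimal sketch using only the Python standard library; the helper names `md5_of` and `same_bytes` are made up for illustration.

```python
import filecmp
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a, path_b):
    """Compare two files by content; shallow=False forces a real byte comparison."""
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Calling `same_bytes("old.bin", "new.bin")` (hypothetical file names) returns `True` only when the contents are identical. If that is the case, the matching MD5 sums are expected and no collision is involved.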

Solution 2

Of course. MD5's collision vulnerability is well known (see Crypto.SE, Wikipedia). Though it is still a low-probability event, it can be done. However, in your case, I'd suspect a mistake in copying the files.

Solution 3

Generally, two files can have the same md5 hash only if their contents are exactly the same. Even a single bit of variation will generate a completely different hash value.
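
As a rough illustration of that sensitivity, the sketch below flips one bit in a block of stand-in data and counts how many of the 128 digest bits change. The data is an invented placeholder, not real firmware, and the helper names are made up for this example.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of an in-memory byte string."""
    return hashlib.md5(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Placeholder data standing in for a firmware image.
original = b"firmware image, version 1.0" * 1000
changed = flip_bit(original, 0)

print(md5_hex(original))
print(md5_hex(changed))
print(differing_bits(md5_hex(original), md5_hex(changed)), "of 128 digest bits differ")
```

For a well-mixing hash, roughly half of the digest bits are expected to differ after a single-bit change, although the exact count depends on the input.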

There is one caveat, though: An md5 sum is 128 bits (16 bytes). Since the number of different possible file contents is infinite, and the number of different possible md5 sums is finite, there is a possibility (though small probability in most cases) of collision of hashes. In other words, two different files can produce the same sum when hashed with md5.

Because of this, it's better in some cases to use a higher bit hash (more possible different outputs), to reduce the (already low) probability of an accidental hash collision, and increase the difficulty of creating a deliberate hash collision through brute force.

Examples of higher bit hashes include the SHA-2 family of hashes, especially sha256, sha384, or sha512 (which has the longest output of the three). For these, the number after sha indicates the number of bits the corresponding hash algorithm generates.
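
The output sizes can be checked directly with Python's `hashlib`; this is only a small sketch, and the helper name `digest_bits` is made up for illustration.

```python
import hashlib


def digest_bits(name: str) -> int:
    """Return the digest length in bits for a hashlib algorithm name."""
    return hashlib.new(name).digest_size * 8


for name in ("md5", "sha256", "sha384", "sha512"):
    print(name, digest_bits(name))
```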

Solution 4

All of the above answers ignore the most important detail:

An MD5 checksum is defined to have 128 bits. That means there are only 2^128 different MD5 values. How many different firmware images are possible? Well, that depends on how big they are, and it depends on what percentage of random byte sequences could be considered valid firmware. Chances are, though, that there are more than 2^128 possible firmware images.

A lot more, which means there must be duplicates.

But, the chance of any given firmware image matching a given MD5 checksum is only 1 in 2^128 which is a very small number.

VERY small.

Like, the chance of any two developers accidentally creating different images that have the same MD5 checksum at any time during the existence of human civilization is too small for you to worry about.

That's accidentally. Deliberately is a different question. If you're working for the NSA, then 128 bits is not going to be enough bits of security to satisfy your bosses, and MD5 has known vulnerabilities that make it weaker than 128 bits.

But if you were working for the NSA, then you probably already knew that.
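
To put a number on the accidental case, here is a back-of-the-envelope sketch. It assumes an idealized hash whose 128-bit outputs are uniformly random (which says nothing about deliberately crafted collisions), and the figure of 10^12 images is an arbitrary illustrative choice rather than an estimate of how many firmware images exist.

```python
from fractions import Fraction


def collision_upper_bound(n_items: int, bits: int = 128) -> Fraction:
    """Union bound on P(any two of n_items random hashes collide).

    Each of the n*(n-1)/2 pairs collides with probability 1 / 2**bits,
    so the chance that at least one pair collides is at most their sum.
    """
    pairs = n_items * (n_items - 1) // 2
    return Fraction(pairs, 2 ** bits)


# Hypothetical: a trillion distinct images hashed with an ideal 128-bit hash.
print(float(collision_upper_bound(10 ** 12)))
```

Even with that many images, the bound stays many orders of magnitude below one in a million.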

Author: minto

Updated on September 18, 2022

Comments

  • minto
    minto almost 2 years

    Can two different binary files have the same md5 sum? One binary firmware file has a different version number and is marked as revised, with a small bug fixed. But both files have the same md5 sum. I would assume that the revised file can't have the same md5 sum. Can this be a mistake?

  • minto
    minto over 7 years
    If one firmware file was revised, and some changes were made, then the md5 sum can't be the same in any case. I used diff -q <file1> <file2> to compare the files, and it shows the files are identical.
  • ski
    ski over 7 years
    "Two files can have the same md5 hash only if their contents are exactly the same, even a single bit of variation would generate a completely different hash value." – This is wrong. Because of the Pigeonhole Principle, there are in fact an infinite number of files which have the same hash.
  • heemayl
    heemayl over 7 years
    @JörgWMittag Check the caveat paragraph right after :)
  • muru
    muru over 7 years
    @minto assuming you didn't make a mistake in downloading the files, it must be a mistake at the server end (or they silently replaced the buggy firmware version as well)
  • Rmano
    Rmano over 7 years
    @minto you can check whether the two files are the same with diff file1 file2. If the command is silent, then they are the same.
  • Wildcard
    Wildcard over 7 years
    heemayl, @JörgWMittag is right. I've suggested an edit which improves this already very good answer to be precisely accurate and no longer subject to nitpicks. ;) (I hope.) :)
  • heemayl
    heemayl over 7 years
    @Wildcard Approved, hope so. Although I thought the second paragraph covered that, I like your wording better. Thanks.
  • goo
    goo over 7 years
    For binary (non-text) files, use cmp rather than diff.
  • Milan
    Milan over 7 years
    "I'd suspect a mistake in copying the files" - The chances of this happening by accident are basically 0. If two files have the same md5, either they are the same file, or one was intentionally (maliciously) altered to have the same MD5 as the other.
  • muru
    muru over 7 years
    @BlueRaja-DannyPflughoeft I disagree. I have downloaded files and accidentally saved the same file to different filenames (when trying to download them again, because the original download didn't work out for some reason). Human error is also entirely possible on the server side.
  • SQB
    SQB over 7 years
    Perhaps the fix didn't alter these files, so they remained the same between versions. This, of course, supposes there's a set of files to be downloaded. Alternatively, "they" may have patched the previous version as well, because of the bug's severity.
  • David Foerster
    David Foerster about 7 years
    "once-in-a-million" is an incredible understatement. 2^128 is about 10^38.5, so it's closer to one-in-a-duodecillion, an event so unlikely that it's not worth considering within the (projected) lifetime of our solar system.
  • Lightness Races in Orbit
    Lightness Races in Orbit about 7 years
    @DavidFoerster: "One-in-a-million" is an English idiom, meaning "very rare". It is not intended to be mathematically accurate.
  • David Foerster
    David Foerster about 7 years
    I'm aware of that, but I wanted to highlight that the chance is, in fact, about 32 orders of magnitude lower than the proverbial one-millionth, which matters to illustrate the situation.
  • WinEunuuchs2Unix
    WinEunuuchs2Unix over 6 years
    +1 =9, 1 more and you get a trinket at 10 :)
  • Lightness Races in Orbit
    Lightness Races in Orbit over 6 years
    MD5's collision vulnerability isn't particularly relevant; this has more to do with the fundamental fact that MD5, by design and like all hashes, has collisions. The vulnerability only plays into it when you suspect someone has been deliberately crafting collisions, but there is no reason to think that's in play here.
  • Lightness Races in Orbit
    Lightness Races in Orbit over 6 years
    Also I think you misunderstood BlueRaja's comment; that is exactly what he or she was agreeing with.
  • muru
    muru over 6 years
    @LightnessRacesinOrbit now that I read it again, yes I did misinterpret "this" in that comment.
  • Lightness Races in Orbit
    Lightness Races in Orbit over 6 years
    @muru: In your defence, I also read it backwards the first time :) It was a little cryptically worded in that sense.