How can it be impossible to "decrypt" an MD5 hash?

22,061

Solution 1

Basically it's because the output of MD5 contains less information than the input. This is basically what distinguishes a hash algorithm from an encryption algorithm.

Here's a simple example: imagine an algorithm to compute the hash of a 10-digit number. The algorithm is "return the last 2 digits." If I take the hash of 8023798734, I get 34, but if all you had is the 34, you would have no way to tell what the original number is because the hashing algorithm discarded 8 digits worth of information. It's similar with MD5, except that the hash is computed via a complex procedure instead of just chopping off part of the data.

So then how can one hash be better than another? For one thing, different hash algorithms can be more or less resistant to collisions (when two inputs produce the same output). The probability of a collision is inversely related to the number of possible hash outputs. Collisions are an undesirable feature of hashes because if your data changes, you want the hash to change too, so one way to get a better hash algorithm is to use a hash with more possible outputs. In the digits example above, taking the last 4 digits instead of the last 2 digits reduces the probability of a collision with a given hash (technically called a preimage) to 1 in 10000 instead of 1 in 100, so it's more likely that all the 10-digit numbers in whatever set you have will have different hash values.

There's also the issue of cryptographic security. When you want to use a hash to make sure that some data is not tampered with, it's desirable that whoever's doing the tampering can't predict what inputs will produce a given output. If they could, they would be able to alter the input data in such a way that the output (the hash) remains the same. Going back to the digits example again, let's say I'm going to email you the number 1879483129 and it is critically important that this number gets to you unaltered. I might call you up and tell you the hash of the number, which would be 29, but since the "last 2 digits" algorithm is not cryptographically secure, a nefarious hacker could change the number en route to, say, 5555555529 and you wouldn't know the difference.

It's been shown that MD5 is not cryptographically secure (and SHA-1 is also compromised). That means that it is possible to find different inputs which correspond to any given output. It's still a fine algorithm for protecting against random bit flips and the like, but if there's a chance someone might want to intentionally corrupt your data, you should really use something more secure, like SHA-256 or greater, probably as part of an HMAC scheme.

Solution 2

I just can't understand how you convert something to one thing using some algorithm, and there being no way to convert it back using the algorithm in reverse.

You can turn a cow into hamburger, but you cannot turn hamburger into a cow.

The transformation reduces data that exists by destroying it, and that data cannot be recovered.

Solution 3

Here's a parallel:

Add up the ages of everyone in your family. Only keep the last two digits.

Now tell me everyone's ages based on that one number.

Solution 4

Think about this:

I have a numeric string, say it's "12345678".

I have a hash algorithm, it just returns the sum of all the single numbers, let's call it f()

so, f("12345678") = 1 + 2+ .. + 8 = 36.

Then the question:

known f(x) = 36, is it possible to get the original value of x ?

We can't, because f() is an algorithm causes information loss.

The MD5 is a hash algorithm like f(), but much more complexer.

Solution 5

Hmm, not to be rude, but it seems to me that all the answers about "less information coming out than going in" miss the point.

The main use of MD5, and similar cryptographic hash codes, is to encrypt passwords. In that case, I don't care whether it's possible to reconstruct the original string. All I care is whether I can construct any string that would hash to the same value.

Take a simplified example: Suppose our hash algorithm was "take the last two digits". So if my password is "12345678", the hash code is "78". Is there any way to go from "78" back to "12345678"? No. But if I'm hacking passwords, I don't care if I know what your original password was. I just want a password to let me in. So if I knew this was the algorithm, I'd say great, I'll use the password "99978". It hashes to "78", so the password validation algorithm will pass it, and I'm in.

Obviously MD5 is much more difficult to reverse, even in this "anything that will hash to the right value" sense, then a simplistic algorithm like "take the last two digits". But is it literally impossible? That puzzles me, too. So sure, information is discarded along the way. But couldn't I reverse to to an "any" value by filling in any random value at any point where information is discarded? I haven't looked at the actual algorithm for MD5. I presume it's not something easy to reverse, like change all the pluses to minues or something trivial like that, or somebody would have done it a long time ago. From the fact that there are millions of hackers out there who have tried to crack this, even if it is theoretically possible, it must be incredibly difficult.

Share:
22,061

Related videos on Youtube

Rob
Author by

Rob

Wannabe programmer

Updated on December 22, 2020

Comments

  • Rob
    Rob over 3 years

    Possible Duplicate:
    How come MD5 hash values are not reversible?

    I was reading a question about MD5, and it made me remember something that boggles me. Very simple question, and I'm sorry if it's not a good one. I just can't understand how you convert something to one thing using some algorithm, and there being no way to convert it back using the algorithm in reverse.

    So how is this possible?

    Also, since multiple strings can create the same MD5 hash, due to it being less data than the input string, how would any other hashing system be any better?

  • Sebastian Paaske Tørholm
    Sebastian Paaske Tørholm about 14 years
    Technically, it might be the case that there for some hashes may be only a finite amount of different plaintexts that hash to it, may it not?
  • Rob
    Rob about 14 years
    Edited my original post, please read the new section at the bottom.
  • ig0774
    ig0774 about 14 years
    @Sebastian P: Not really. Given an (theoretically infinite) set of inputs, MD5 should generate a (theoretically infinite) set of outputs. (It is theoretically impossible, given a non-1:1 hashing function and an infinite number of inputs, that any given hash aligns to a finite number of hashes; this is an implication of infinite sets, rather than an implication of MD5, i.e. there are a fixed number of valid MD5 sums, 2 ^ 128; given an infinite set of values with an MD5 sum, there are an infinite number of collisions for each hash).
  • David Z
    David Z about 14 years
    @ig0774: I guess technically it's possible to construct a weird hash algorithm that maps only a finite set of inputs to one output. In an extreme example, hash(x) = x<64 ? x : 63 for a 6-bit hash of non-negative integers x. But as far as hashing algorithms that are actually used in practice, I would be very surprised to find one that mapped a finite number of inputs to one output. (There are probably mathematical proofs about this for the major algorithms)
  • ig0774
    ig0774 about 14 years
    @David: I guess you are correct about hash algorithms in general (and such an algorithm might meet some technical need or another). My point was about MD5 in particular.
  • ig0774
    ig0774 about 14 years
    +1 for the most inspired analogy I've seen. Mmm... hash functions and hamburgers.
  • Remus Rusanu
    Remus Rusanu about 14 years
    Of course you can turn hamburgers into cows, just as you can turn scrambled eggs into chicken. You feed hamburgers to a calf and grow it into a full cow. It's inefficient, you waste a lot of tasty hamburger in the process, but it works.
  • Devanshu Mevada
    Devanshu Mevada about 14 years
    Even Google can help: md5crack.com
  • Nathan Ernst
    Nathan Ernst about 14 years
    @Remus, and then you run a non-insignificant chance of mad cow disease. @Rob, would potatoes into hashbrowns be less offensive? Myself, I like cows, too. They taste great. +1 to @tangurena for initial analogy!
  • David Z
    David Z about 14 years
    The original question was about the existence of algorithms for which it's not possible to recover the original input given the output, and MD5 definitely qualifies as one. As for your idea of simply finding any input that will hash to a given output, it's definitely possible if you want to build a lookup table, but that takes massive amounts of computing power. I don't think you could just run the algorithm in reverse, filling in random data as necessary, because MD5 scrambles bits at every stage. (I used to know the algorithm but I've long forgotten it)
  • Rob
    Rob about 14 years
    Its not offensive, I just really like cows =[ Not a vegetarian or anything, you're right, they DO taste great. But still =[
  • Sebastian Paaske Tørholm
    Sebastian Paaske Tørholm about 14 years
    @ig0774: There are a fixed number of MD5 hashes, yes, and an infinite number of texts that can be hashed by the algorithm, yes. I am just curious as to why you say that each of these 2^128 hashes is guaranteed to have an infinite number of plaintexts that hash to that hash; that fact is not obvious to me. It is clear that it would be a desirable property that most likely is fulfilled in MD5, but I am curious about wherein the proof of the statement lies.
  • Jay
    Jay about 14 years
    I agree these answerws are technically correct, I was just trying to point out that they are not really relevant to the real problem of protecting passwords. To take it to a ridiculous extreme, an algorithm that threw out ALL input data and mapped everything to the string "foobar" would make it impossible to have any clue what the original input was, but it would be pretty useless as a password-encrypting algorithm!
  • Jay
    Jay about 14 years
    Anyway, as I say, I haven't looked at the MD5 algorithm. No matter how it scrambles bits, I would think that, IN PRINCIPLE, you could always fill in any discarded bits with, say, constant zeros. I presume, though, that the nature of the function is such that you can't simply reverse each step. An algorithm that said multiply by 2 and add 5 could easily be reversed by subtracting 5 and dividing by 2. But more complex manipulations might be difficult to reverse step-by-step like that. Someday I'll have to actually study the algorithm. When I'm sitting around with nothing to do.
  • Thomas Pornin
    Thomas Pornin about 14 years
    Actually CRC-32 is not that fast; on some systems (e.g. ARM), some hash functions (MD4, Panama...) turn out to be faster than CRC-32. CRC-32 is very fast in hardware, as a circuit, but when implemented in software, one must use look-up tables and many RISC processors are rather uncomfortable with tables.
  • Randolpho
    Randolpho about 14 years
    And just like hamburgers, hashes frequently require salt.
  • Peter Štibraný
    Peter Štibraný over 13 years
    Cryptographic hash functions must have several properties: 1) it is easy to compute the hash value for any given message, 2) it is infeasible to find a message that has a given hash, 3) it is infeasible to modify a message without hash being changed, 4) it is infeasible to find two different messages with the same hash. (source: en.wikipedia.org/wiki/Cryptographic_hash_function) There are other hash functions, which don't have these properties, and subsequently, they are not used in cryptography.
  • Peter Štibraný
    Peter Štibraný over 13 years
    In other words, it's not "literally impossible" (and some people are trying), but it's not trivial either. Check out that wikipedia page, and you will find that some crypto hash functions that were in use previously, were successfully attacked (e.g. there are ways to find another message with same hash), and therefore are not used anymore. (Look for "Collision attacks" and "Preimage attacks").
  • Michael
    Michael over 13 years
    Surely you could use the DNA avaiable in the hamburger to recreate your cow? The problem is, once it starts growing the environmental conditions (how it's fed and raised etc) may cause it to have physical differences to your original cow - but it would be genetically identical.
  • Kalle Richter
    Kalle Richter over 7 years
    You know that f(x_1x_2_x_3...)=1+2+3+... because you know how MD5 works from - let's say Wikipedia. That's not an explanation!
  • Colin Niu
    Colin Niu over 7 years
    @KarlRichter You are right, I should mention this: "The MD5 is a hash algorithm like f(), but much more complexer."
  • Mateus Viccari
    Mateus Viccari about 7 years
    Someone would say it's age.