Java hashcode reverse calculation

12,475

Solution 1

Hash functions are not reversible functions. As proof, consider the Pigeonhole principle, you can only store values in the integer range in a hashCode but there are an infinite number of Strings. Therefore, there must be multiple strings with the same hashCode (and there are).

Solution 2

Multiplying a big number by 31 leads to integer overflow, which cannot be reversed using division by 31.

But hold on though: arithmetic using int in Java is equivalent to arithmetic modulo 232. Since 31 is odd it has a multiplicative inverse modulo 232. Therefore multiplication by 31 can be reversed with multiplication by the said "inverse." That is:

int inverse = -1108378657;
// This will print the hashCode of mlkjihgfedcb
System.out.println(("mlkjihgfedcba".hashCode() - (int) 'a') * inverse); 

Solution 3

The specific reason this fails is

1687846330 * 31 + 97 = 52323236327

The number 52,323,236,327 is much larger than what an int can hold, so information is lost off the left-hand end. That information cannot be recovered by inverting the hashCode() algorithm.

If you do the arithmetic you'll see that the hash code you got for the second string is exactly 52323236327 mod 231

More important, the mapping from objects to hash codes is intended to be opaque and non-reversible. Just because it's implemented in a specific way today does not mean the next version of the library has to do it the same way. The only guarantee is that the possibility of collision is extremely low (but non-zero).

Share:
12,475

Related videos on Youtube

peter
Author by

peter

Updated on September 16, 2022

Comments

  • peter
    peter over 1 year

    I am trying to test out how String hashcode works by adding a new character and do the reverse to subtract it out, however it didn't give me the right answer when i did the calculation.

    For example

    int PRIME = 31;
    
    //this will print 1687846330 
    System.out.println("mlkjihgfedcb".hashCode());   
    
    //this will print 783628775 which equals to "mlkjihgfedcba".hashCode();
    System.out.println(("mlkjihgfedcb".hashCode() * PRIME + (int) 'a')); 
    
    //this will print 25278344 which doesn't equals to "mlkjihgfedcb".hashCode()
    System.out.println(("mlkjihgfedcba".hashCode() - (int) 'a') / PRIME); 
    

    I am wondering did i do the math right on my last step above ? overflow shouldn't matter for the calculation right ?

    • BillRobertson42
      BillRobertson42 over 10 years
      It's not reversible. e.g. stackoverflow.com/questions/9406775/…
    • Amir Afghani
      Amir Afghani over 10 years
      this is wrong on so many levels
    • BillRobertson42
      BillRobertson42 over 10 years
      If two different strings can produce the same result, you could not know which one of those strings produced that hash code, so you could not reverse the calculation.
  • peter
    peter over 10 years
    how do we get or calculate the inverse number ?
  • Joni
    Joni over 10 years
    You can use the extended Euclidean algorithm, or use Hensel's lemma. The latter is easiest to implement: first set x = a = 31 and then repeat x = x*(2-a*x) until it converges, up to five times.
  • supercat
    supercat over 10 years
    I don't know any way the behavior of the string's hash function could change without breaking any switch statements that use a string argument, since the compiled code includes hardcoded hash values for the strings.
  • Joni
    Joni over 5 years
    But you don't have to calculate it, it's -1108378657