Java hashcode reverse calculation
Solution 1
Hash functions are not reversible functions. As proof, consider the Pigeonhole principle, you can only store values in the integer range in a hashCode but there are an infinite number of Strings. Therefore, there must be multiple strings with the same hashCode (and there are).
Solution 2
Multiplying a big number by 31 leads to integer overflow, which cannot be reversed using division by 31.
But hold on though: arithmetic using int
in Java is equivalent to arithmetic modulo 232. Since 31 is odd it has a multiplicative inverse modulo 232. Therefore multiplication by 31 can be reversed with multiplication by the said "inverse." That is:
int inverse = -1108378657;
// This will print the hashCode of mlkjihgfedcb
System.out.println(("mlkjihgfedcba".hashCode() - (int) 'a') * inverse);
Solution 3
The specific reason this fails is
1687846330 * 31 + 97 = 52323236327
The number 52,323,236,327 is much larger than what an int
can hold, so information is lost off the left-hand end. That information cannot be recovered by inverting the hashCode()
algorithm.
If you do the arithmetic you'll see that the hash code you got for the second string is exactly 52323236327 mod 231
More important, the mapping from objects to hash codes is intended to be opaque and non-reversible. Just because it's implemented in a specific way today does not mean the next version of the library has to do it the same way. The only guarantee is that the possibility of collision is extremely low (but non-zero).
Related videos on Youtube
peter
Updated on September 16, 2022Comments
-
peter over 1 year
I am trying to test out how
String
hashcode works by adding a new character and do the reverse to subtract it out, however it didn't give me the right answer when i did the calculation.For example
int PRIME = 31; //this will print 1687846330 System.out.println("mlkjihgfedcb".hashCode()); //this will print 783628775 which equals to "mlkjihgfedcba".hashCode(); System.out.println(("mlkjihgfedcb".hashCode() * PRIME + (int) 'a')); //this will print 25278344 which doesn't equals to "mlkjihgfedcb".hashCode() System.out.println(("mlkjihgfedcba".hashCode() - (int) 'a') / PRIME);
I am wondering did i do the math right on my last step above ? overflow shouldn't matter for the calculation right ?
-
BillRobertson42 over 10 yearsIt's not reversible. e.g. stackoverflow.com/questions/9406775/…
-
Amir Afghani over 10 yearsthis is wrong on so many levels
-
BillRobertson42 over 10 yearsIf two different strings can produce the same result, you could not know which one of those strings produced that hash code, so you could not reverse the calculation.
-
-
peter over 10 yearshow do we get or calculate the inverse number ?
-
Joni over 10 yearsYou can use the extended Euclidean algorithm, or use Hensel's lemma. The latter is easiest to implement: first set
x = a = 31
and then repeatx = x*(2-a*x)
until it converges, up to five times. -
supercat over 10 yearsI don't know any way the behavior of the string's hash function could change without breaking any
switch
statements that use a string argument, since the compiled code includes hardcoded hash values for the strings. -
Joni over 5 yearsBut you don't have to calculate it, it's -1108378657