How can I generate a long hash of a String?
Solution 1
long has 64 bits. A String of length 9 has 72 bits even at 8 bits per character (144 bits in Java, where each char is 16 bits). By the pigeonhole principle, you cannot get a unique hash for every 9-character string in a long.
If you still want a long hash: you can take two standard [different!] hash functions for String->int, hash1() and hash2(), and calculate: hash(s) = 2^32 * hash1(s) + hash2(s)
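A minimal sketch of that combination, under some assumptions: hash1() is taken to be String.hashCode(), and hash2() is an FNV-1a-style hash over the string's UTF-16 code units. The class name and the choice of hash2() are illustrative only; any second, independent String->int hash would do. The low half is masked so that a negative hash2() value does not overwrite the high half through sign extension.

```java
public final class CombinedLongHash {
    private CombinedLongHash() {}

    // hash1: the standard String.hashCode() (assumed choice).
    static int hash1(String s) {
        return s.hashCode();
    }

    // hash2: FNV-1a-style 32-bit hash over UTF-16 code units
    // (illustrative choice, not prescribed by the answer).
    static int hash2(String s) {
        int h = 0x811C9DC5;          // 32-bit FNV offset basis
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x01000193;         // 32-bit FNV prime
        }
        return h;
    }

    // hash1 fills the upper 32 bits, hash2 (treated as unsigned) the lower 32.
    static long hash(String s) {
        return ((long) hash1(s) << 32) | (hash2(s) & 0xFFFFFFFFL);
    }
}
```

Two strings collide under this scheme only if they collide under both hash1() and hash2() at the same time.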
Solution 2
This code will calculate pretty good hash:
import java.nio.charset.StandardCharsets;
import java.util.UUID;
String s = "some string";
long hash = UUID.nameUUIDFromBytes(s.getBytes(StandardCharsets.UTF_8)).getMostSignificantBits();
Solution 3
Why don't you have a look at the hashCode() function of String, and just adapt it to use long values instead?
Btw. if there was a way to create a unique ID for each String, then you would have found a compression algorithm that would be able to pack every String into 8 bytes (not possible by definition).
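One way the adaptation of String.hashCode() could look is sketched below. It keeps the same recurrence, h = 31 * h + c, but accumulates into a long; the class name is hypothetical, and this is only one possible reading of the suggestion.

```java
public final class LongStringHash {
    private LongStringHash() {}

    // Same recurrence as String.hashCode(), but accumulated in a long,
    // so the value wraps at 64 bits instead of 32.
    static long hash(String s) {
        long h = 0L;
        for (int i = 0; i < s.length(); i++) {
            h = 31L * h + s.charAt(i);
        }
        return h;
    }
}
```

Because the multiplier is still 31, short strings that collide under hashCode(), such as "Aa" and "BB", collide here as well; a larger odd multiplier is a possible variation.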
Solution 4
There are many answers, try the following:
- http://stackoverflow.com/questions/415953/generate-md5-hash-in-java EDIT: removed, I've missed the long requirement. Mea culpa.
- http://en.wikipedia.org/wiki/Perfect_hash_function
Or, as suggested before, check out the sources.
PS. One more technique is to maintain a dictionary of strings: since you're unlikely to get 2^64 strings any time soon, you can have a perfect mapping. Note though that this mapping may well become a major bottleneck.
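The dictionary idea might be sketched as follows, assuming an in-memory map and sequentially assigned ids. The class and method names are hypothetical, and persistence across restarts (which storing ids in a database would require) is left out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

public final class StringIdRegistry {
    private final ConcurrentMap<String, Long> ids = new ConcurrentHashMap<>();
    private final AtomicLong next = new AtomicLong();

    // Returns the id already assigned to s, or assigns the next free one.
    // Distinct strings always get distinct ids, so there are no collisions.
    public long idFor(String s) {
        return ids.computeIfAbsent(s, k -> next.getAndIncrement());
    }
}
```

Every lookup goes through the shared map, which is where the bottleneck mentioned above would show up.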
Riduidel
Updated on July 09, 2022
Comments
-
Riduidel almost 2 years
I have a Java application in which I want to generate long ids for strings (in order to store those strings in neo4j). In order to avoid data duplication, I would like to generate an id for each string, stored in a long integer, which should be unique for each string. How can I do that?
-
Sumit A about 6 years
What could be the percentage of collision?
-
Ran about 6 years
It uses MD5; I am not sure about the percentage.
-
Repoker almost 5 years
Logged in just to upvote this. Maybe not the most correct, but practical.
-
Olivier Giniaux almost 2 years
A character in Java is UTF-16, thus a String of length 9 is 144 bits. Also, using two hash functions makes no sense. What you want is simply a hashing algorithm that can operate on data of variable length, such as MD5 for instance. For sure there will be collisions, but the collision rate will stay close to the minimum achievable with 64 bits of cardinality.