How can I generate a long hash of a String?


Solution 1

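A Java long has 64 bits. A String of length 9 has at least 72 bits, even if each character took only 8 bits (Java chars are actually 16-bit UTF-16 code units, so it is 144 bits). By the pigeonhole principle, you therefore cannot map every 9-character string to a unique long.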

If you still want a long hash, you can take two standard (different!) String->int hash functions, hash1() and hash2(), and compute hash(s) = 2^32 * hash1(s) + hash2(s), i.e. hash1 in the upper 32 bits and hash2 in the lower 32 bits.
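A minimal Java sketch of that combination, assuming String.hashCode() as hash1 and a 32-bit FNV-1a over the UTF-8 bytes as hash2. FNV-1a is a choice made for this sketch, not part of the answer; any second, independent String->int hash would do. The OR with the masked low word treats hash2 as unsigned, which still puts hash1 in the upper and hash2 in the lower 32 bits.

```java
import java.nio.charset.StandardCharsets;

public class LongHash {
    // hash1: Java's built-in String.hashCode()
    static int hash1(String s) {
        return s.hashCode();
    }

    // hash2: 32-bit FNV-1a over the UTF-8 bytes (an assumption of this sketch)
    static int hash2(String s) {
        int h = 0x811C9DC5;              // FNV-1a 32-bit offset basis
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);             // xor in the unsigned byte
            h *= 0x01000193;             // FNV 32-bit prime, wraps modulo 2^32
        }
        return h;
    }

    // hash1 in the upper 32 bits, hash2 (masked to unsigned) in the lower 32 bits
    static long hash(String s) {
        return ((long) hash1(s) << 32) | (hash2(s) & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(hash("some string")));
    }
}
```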

Solution 2

This code calculates a fairly good 64-bit hash (UUID.nameUUIDFromBytes() is based on MD5):

import java.nio.charset.StandardCharsets;
import java.util.UUID;
String s = "some string";
long hash = UUID.nameUUIDFromBytes(s.getBytes(StandardCharsets.UTF_8)).getMostSignificantBits();
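Note that nameUUIDFromBytes() builds a version-3 UUID and overwrites the 4 version bits inside the most significant half, so only 60 of those 64 bits come from MD5. A sketch that instead takes the first 8 bytes of the MD5 digest directly via java.security.MessageDigest, assuming UTF-8 as the encoding (Md5LongHash and md5Long are hypothetical names):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5LongHash {
    // First 8 bytes of MD5(UTF-8 bytes of s), read as a big-endian long.
    static long md5Long(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong(); // ByteBuffer is big-endian by default
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support MD5
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Long("some string"));
    }
}
```

Apart from those 4 version bits, the result matches the UUID-based value for the same bytes.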

Solution 3

Why don't you look at the hashCode() method of String and simply adapt it to use long values instead?
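Adapting it could look like this sketch (LongStringHash and longHashCode are hypothetical names): the same h = 31 * h + c recurrence that String.hashCode() uses, accumulated in a long instead of an int.

```java
public class LongStringHash {
    // Same recurrence as String.hashCode(), but in 64-bit arithmetic.
    static long longHashCode(String s) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // wraps modulo 2^64 on overflow
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(longHashCode("some string"));
    }
}
```

Because the arithmetic only wraps modulo 2^64 instead of 2^32, the low 32 bits of the result equal s.hashCode(). With the small multiplier 31, short strings barely reach the upper bits, so a larger odd multiplier may spread values better; that is a suggestion, not something the answer proposes.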

By the way, if there were a way to create a unique ID for every String, you would have found a compression algorithm able to pack every String into 8 bytes, which is impossible by definition.

Solution 4

There are many answers; try the following:

Or, as suggested before, check out the sources.

PS. One more technique is to maintain a dictionary of strings: since you are unlikely to store 2^64 strings any time soon, you can have a perfect mapping. Note, though, that this mapping may itself become a major bottleneck.
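A minimal in-memory sketch of such a dictionary, assuming ids are handed out sequentially starting from 0 (StringDictionary and idFor are hypothetical names). ConcurrentHashMap.computeIfAbsent applies the mapping function at most once per key, so concurrent callers get the same id for the same string. Persisting the mapping is not covered here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class StringDictionary {
    private final Map<String, Long> ids = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    // Returns the existing id for s, or atomically assigns the next free one.
    public long idFor(String s) {
        return ids.computeIfAbsent(s, k -> nextId.getAndIncrement());
    }

    public static void main(String[] args) {
        StringDictionary d = new StringDictionary();
        System.out.println(d.idFor("foo") + " " + d.idFor("bar") + " " + d.idFor("foo"));
    }
}
```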

Author: Riduidel

Updated on July 09, 2022

Comments

  • Riduidel (almost 2 years ago)

    I have a Java application in which I want to generate long ids for strings (in order to store those strings in Neo4j). To avoid data duplication, I would like to generate an id for each string, stored in a long integer, which should be unique for each string. How can I do that?

  • Sumit A (about 6 years ago)
    What is the probability of a collision?
  • Ran (about 6 years ago)
    It uses MD5; I am not sure about the percentage.
  • Repoker (almost 5 years ago)
    Logged in just to upvote this. Maybe not the most correct, but practical.
  • Olivier Giniaux (almost 2 years ago)
    A character in Java is a UTF-16 code unit (16 bits), so a String of length 9 is 144 bits. Also, using two hash functions makes no sense. What you want is simply a hashing algorithm that operates on data of variable length, such as MD5. There will certainly be collisions, but their rate will stay close to the minimum possible with 64 bits of output.
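To put a rough number on the collision percentage asked about in the comments, the birthday-bound approximation p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)) can be evaluated directly. This assumes the hash behaves like a uniformly random function, an idealization for MD5 or any other hash; CollisionEstimate is a hypothetical name.

```java
public class CollisionEstimate {
    // Birthday-bound approximation of P(at least one collision)
    // among n strings hashed uniformly into 2^bits values.
    static double collisionProbability(long n, int bits) {
        double space = Math.pow(2, bits);
        double x = (double) n * (n - 1) / (2 * space);
        return -Math.expm1(-x); // 1 - exp(-x), accurate even for tiny x
    }

    public static void main(String[] args) {
        System.out.println(collisionProbability(1_000_000_000L, 64));
        System.out.println(collisionProbability(1_000_000_000L, 60));
    }
}
```

For one billion strings this gives roughly 2.7% with 64 effective bits and about 35% with 60 (for example, the UUID's most significant half, where 4 bits are fixed by the version field).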