Creating a hash from several Java string objects


Solution 1

Here is a simple implementation using the Objects class, available since Java 7.

@Override
public int hashCode()
{
    return Objects.hash(this.variable1, this.variable2);
}

Solution 2

Definitely don't use plain addition due to its linearity properties, but you can modify your code just slightly to achieve very good dispersion.

public String hash(String[] values) {
  long result = 17;
  for (String v:values) result = 37*result + v.hashCode();
  return String.valueOf(result);
}
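
To make the problem with plain addition concrete, here is a minimal sketch comparing it with the multiply-and-add loop above. The class and method names (`CombineDemo`, `additive`, `polynomial`) are made up for illustration; only `String.hashCode()` is standard API.

```java
// Hypothetical demo class; the names are illustrative, not from the original answer.
class CombineDemo {
    // Plain addition: the result does not depend on the order of the elements.
    static long additive(String[] values) {
        long result = 0;
        for (String v : values) {
            result += v.hashCode();
        }
        return result;
    }

    // Multiply-and-add, as in the answer above: each position gets a different weight.
    static long polynomial(String[] values) {
        long result = 17;
        for (String v : values) {
            result = 37 * result + v.hashCode();
        }
        return result;
    }

    public static void main(String[] args) {
        String[] ab = {"a", "b"};
        String[] ba = {"b", "a"};
        System.out.println(additive(ab) == additive(ba));     // true: order is lost
        System.out.println(polynomial(ab) == polynomial(ba)); // false: order matters
    }
}
```

With addition, any reordering of the same strings collides; the multiplier is what makes the position of each element count.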

Solution 3

It doesn't provide a 64 bit hash, but given the title of the question it's probably worth mentioning that since Java 1.7 there is java.util.Objects#hash(Object...).
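
A short sketch of how that looks in practice. `Objects.hash(Object...)` is documented as equivalent to `Arrays.hashCode(Object[])`, so for a `String[]` you can call the latter directly. The class and method names below are hypothetical, and the result is still only a 32-bit int.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical demo class; only Objects.hash and Arrays.hashCode are real API.
class ObjectsHashDemo {
    // Varargs form: convenient when the values are separate fields.
    static int viaObjects(String first, String second) {
        return Objects.hash(first, second);
    }

    // Array form: convenient when the values are already in a String[].
    static int viaArrays(String[] values) {
        return Arrays.hashCode(values);
    }

    public static void main(String[] args) {
        // Both calls combine the element hash codes the same way.
        System.out.println(viaObjects("a", "b") == viaArrays(new String[]{"a", "b"})); // true
    }
}
```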

Solution 4

Watch out for creating weaknesses when you combine hash methods (here, Java's hashCode() and your own combining step). I did a little research on cascaded ciphers, and this is an example of the same problem: the addition might interfere with the internals of hashCode().

The internals of hashCode() look like this:

        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }

so adding the hash codes together means the last characters of all the strings in the array are simply added, which does nothing to improve the randomness (and String.hashCode() is already weak enough as a hash function).
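
A minimal sketch of that effect, assuming the documented recurrence for `String.hashCode()` (h = 31*h + c for each char). `StringHashDemo`, `polyHash` and `sumOfHashCodes` are hypothetical names used only for this illustration.

```java
// Hypothetical demo class; it re-implements the documented String.hashCode() recurrence.
class StringHashDemo {
    // h = 31*h + c for each character, starting from 0.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Plain addition of the per-string hash codes, as in the question.
    static long sumOfHashCodes(String[] values) {
        long sum = 0;
        for (String v : values) {
            sum += v.hashCode();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(polyHash("hello") == "hello".hashCode()); // true
        // Two short strings that share a hash code.
        System.out.println("Aa".hashCode() == "BB".hashCode());      // true
        // Swapping the last characters between strings leaves the sum unchanged.
        System.out.println(sumOfHashCodes(new String[]{"ab", "cd"})
                == sumOfHashCodes(new String[]{"ad", "cb"}));        // true
    }
}
```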

If you want better pseudorandomness, take a look at the FNV hash algorithm. It is a fast, simple, non-cryptographic hash that is widely used for hash tables.

It goes like this:

    long hash = 0xCBF29CE484222325L;
    for(String s : strings)
    {
        hash ^= s.hashCode();
        hash *= 0x100000001B3L;
    }

^ This is not the actual implementation of FNV as it takes ints as input instead of bytes, but I think it works just as well.
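
For comparison, here is a sketch of the byte-oriented variant (64-bit FNV-1a) applied to each string's UTF-8 bytes. The separator byte between strings is my own assumption, added so that {"ab", "c"} and {"a", "bc"} feed different byte sequences; it is not part of FNV itself. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of 64-bit FNV-1a over bytes; not a tuned implementation.
class Fnv1aSketch {
    static final long OFFSET_BASIS = 0xCBF29CE484222325L;
    static final long PRIME = 0x100000001B3L;
    // Assumption: 0xFF never occurs in valid UTF-8, so it can mark string boundaries.
    static final int SEPARATOR = 0xFF;

    // Core FNV-1a step: XOR in one byte, then multiply by the prime.
    static long update(long hash, byte[] data) {
        for (byte b : data) {
            hash ^= (b & 0xFF); // mask so the byte is treated as unsigned
            hash *= PRIME;
        }
        return hash;
    }

    // Hashes all strings in order, with a separator after each one.
    static long hash(String[] strings) {
        long hash = OFFSET_BASIS;
        for (String s : strings) {
            hash = update(hash, s.getBytes(StandardCharsets.UTF_8));
            hash ^= SEPARATOR;
            hash *= PRIME;
        }
        return hash;
    }

    public static void main(String[] args) {
        // Single-string check against the published FNV-1a test value for "a".
        long a = update(OFFSET_BASIS, "a".getBytes(StandardCharsets.UTF_8));
        System.out.println(Long.toHexString(a)); // af63dc4c8601ec8c
        System.out.println(Long.toHexString(hash(new String[]{"ab", "c"})));
    }
}
```

This does more work per string than XOR-ing in `s.hashCode()`, but it does not inherit the collisions of `String.hashCode()`.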

Solution 5

First, a hash code is typically numeric, e.g. an int. Moreover, your version of the hash function computes a number and then returns its string representation, which IMHO does not make sense.

I'd improve your hash method as follows:

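public int hash(String[] values) {
    // int, not long: the method returns int, so a long result would not compile
    int result = 0;
    for (String v : values) {
        // Same multiply-and-add scheme that String.hashCode() uses per character
        result = result * 31 + v.hashCode();
    }
    return result;
}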

Take a look at the hashCode() implementation in the class java.lang.String.

Author by

PNS

Updated on June 26, 2022

Comments

  • PNS almost 2 years

    What would be the fastest and most robust (in terms of uniqueness) way of implementing a method like

    public abstract String hash(String[] values);
    

    The values[] array has 100 to 1,000 members, each of which has a few dozen characters, and the method needs to be run about 10,000 times/sec on a different values[] array each time.

    Should a long string be built using a StringBuilder buffer and then a hash method invoked on the buffer contents, or is it better to keep invoking the hash method for each string from values[]?

    Obviously a hash of at least 64 bits is needed (e.g., MD5) to avoid collisions, but is there anything simpler and faster that could be done, at the same quality?

    For example, what about

    public String hash(String[] values)
    {
        long result = 0;
    
        for (String v:values)
        {
            result += v.hashCode();
        }
    
        return String.valueOf(result);
    }
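
On the StringBuilder question: a digest can be fed one string at a time, so no concatenated buffer is needed. Below is a minimal sketch, assuming MD5 through `java.security.MessageDigest` is acceptable and that a 0xFF separator byte (my own assumption, it never occurs in valid UTF-8) is enough to keep string boundaries distinct. `DigestSketch` is a hypothetical class name, and whether this meets the 10,000 calls/sec target would have to be measured.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: incremental MD5 over the strings, returned as lowercase hex.
class DigestSketch {
    static String hash(String[] values) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard algorithm name, so this should not normally happen.
            throw new IllegalStateException("MD5 not available", e);
        }
        for (String v : values) {
            md.update(v.getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0xFF); // assumed separator between strings
        }
        byte[] digest = md.digest();
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(Character.forDigit((b >> 4) & 0xF, 16));
            hex.append(Character.forDigit(b & 0xF, 16));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println(hash(new String[]{"first", "second"}));
    }
}
```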