How to calculate the hash code of a string by hand?

35,160

Solution 1

Take a look at the source code of java.lang.String.

/**
 * Returns a hash code for this string. The hash code for a
 * <code>String</code> object is computed as
 * <blockquote><pre>
 * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
 * </pre></blockquote>
 * using <code>int</code> arithmetic, where <code>s[i]</code> is the
 * <i>i</i>th character of the string, <code>n</code> is the length of
 * the string, and <code>^</code> indicates exponentiation.
 * (The hash value of the empty string is zero.)
 *
 * @return  a hash code value for this object.
 */
public int hashCode() {
    int h = hash;
    int len = count;
    if (h == 0 && len > 0) {
        int off = offset;
        char val[] = value;
        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}

Solution 2

Most hash functions of this sort calculate the hash value modulo some large number (e.g. a large prime). This avoids overflows and keeps the range of values returned by the function within a specified range. But this also means an infinite range of input values will get a hash value from a finite set of possible values (i.e. [0,modulus)), hence the problem of hash collisions.

In this case, the code would look something like this:

   public int hash(String x){
        int hashcode=0;
        int MOD=10007;
        int shift=29;
        for(int i=0;i<x.length();i++){
            hashcode=((shift*hashcode)%MOD+x.charAt(i))%MOD;
        }
        return hashcode; 
    }

Exercise for the reader:

See the code for the hashCode function for java.util.String. Can you see why it does not use a modulus explicitly?

Solution 3

The following statements will find the string hashCode

String str="Hi";

int a = str.hashCode();//returns 2337

Let's check how exactly its calculated

HashCode = s[0]*31(n-1) + s[1]*31(n-2) + .. s(n-2)

As we all know that the character at position 0 is H, Character at position 1 is i, and the string length is 2.

==> H*31(2-1) + i*31(2-2)

As we all know that, ASCII code of H is 72, and i is 105. It means,

==> 72 * 31 + 105 * 1 (Anything Power 0 is 1)

==> 2232 + 105 = 2337

Source: https://www.tutorialgateway.org/find-string-hashcode-in-java/

Share:
35,160
thomascirca
Author by

thomascirca

Updated on July 09, 2022

Comments

  • thomascirca
    thomascirca almost 2 years

    I was wondering how to calculate the hash code for a given string by hand. I understand that in Java, you can do something like:

    String me = "What you say what you say what?";  
    long whatever = me.hashCode();
    

    That's all good and dandy, but I was wondering how to do it by hand. I know the given formula for calculating the hash code of a string is something like:

    S0 X 31 ^ (n-1) + S1 X 31 ^ (n-2) + .... + S(n-2) X 31 + S(n-1)  
    

    Where S indicates the character in the string, and n is the length of the string. Using 16 bit unicode then, the first character from string me would be computed as:

    87 X (31 ^ 34)
    

    However, that creates an insanely large number. I can't imagine adding all the characters together like that. So, in order to calculate the lowest-order 32 bits result then, what would I do? Long whatever from above equals -957986661 and I'm not how to calculate that?

  • thomascirca
    thomascirca almost 14 years
    I get the basic idea (I can compute small strings) but when the string gets large I'm unsure as to what to do.
  • Vishy
    Vishy almost 14 years
    @user458346, the size of the string isn't important. This is the values of using a loop, it doesn't matter how long the loop is, it does get any more complicated.
  • jjczopek
    jjczopek almost 14 years
    I can't see... Could You explain it?
  • MAK
    MAK almost 14 years
    @jjczopek: Notice that x%2^n = x&(2^n-1). So if you did arithmetic modulo 2^n, you just need to keep the last n bits of your value, discarding any higher bits. Now think what happens when you just use an int to represent your value. Any arithmetic you do will result in only the last 32 bits being left over. Voila, you have arithmetic modulo 2^32.
  • deceleratedcaviar
    deceleratedcaviar over 12 years
    Right. How did you not see that jjczopek >_<.
  • paradox
    paradox over 4 years
    where is offset been initiated?