Java: simplest integer hash

15,166

Solution 1

I did some experimentation (see test program below); computation of 2^n in Galois fields, and floor(A*sin(n)) both did very well to produce a sequence of "random" bits. I tried multiplicative congruential random number generators and some algebra and CRC (which is analogous of k*n in Galois fields), none of which did well.

The floor(A*sin(n)) approach is the simplest and quickest; the 2^n calculation in GF32 takes approx 64 multiplies and 1024 XORs worstcase, but the periodicity of output bits is extremely well-understood in the context of linear-feedback shift registers.

package com.example.math;

public class QuickHash {
    interface Hasher
    {
        public int hash(int n); 
    }
    static class MultiplicativeHasher1 implements Hasher
    {
        /* multiplicative random number generator
         * from L'Ecuyer is x[n+1] = 1223106847 x[n] mod (2^32-5)
         * http://dimsboiv.uqac.ca/Cours/C2012/8INF802_Hiv12/ref/paper/RNG/TableLecuyer.pdf
         */
        final static long a = 1223106847L;
        final static long m = (1L << 32)-5;
        /*
         * iterative step towards computing mod m
         *   (j*(2^32)+k) mod (2^32-5)
         * = (j*(2^32-5)+j*5+k) mod (2^32-5)
         * = (j*5+k) mod (2^32-5)
         * repeat twice to get a number between 0 and 2^31+24 
         */
        private long quickmod(long x)
        {
            long j = x >>> 32;
            long k = x & 0xffffffffL;
            return j*5+k;
        }
        // treat n as unsigned before computation
        @Override public int hash(int n) {
            long h = a*(n&0xffffffffL);
            long h2 = quickmod(quickmod(h));            
            return (int) (h2 >= m ? (h2-m) : h2);
        }       
        @Override public String toString() { return getClass().getSimpleName(); } 
    }

    /** 
     * computes (2^n) mod P where P is the polynomial in GF2
     * with coefficients 2^(k+1) represented by the bits k=31:0 in "poly";
     * coefficient 2^0 is always 1
     */
    static class GF32Hasher implements Hasher
    {
        static final public GF32Hasher CRC32 = new GF32Hasher(0x82608EDB, 32);

        final private int poly;
        final private int ofs;
        public GF32Hasher(int poly, int ofs) {
            this.ofs = ofs;
            this.poly = poly;
        }

        static private long uint(int x) { return x&0xffffffffL; }
        // modulo GF2 via repeated subtraction
        int mod(long n) {
            long rem = n;
            long q = uint(this.poly);
            q = (q << 32) | (1L << 31);
            long bitmask = 1L << 63;
            for (int i = 0; i < 32; ++i, bitmask >>>= 1, q >>>= 1)
            {
                if ((rem & bitmask) != 0)
                    rem ^= q;
            }
            return (int) rem;
        }
        int mul(int x, int y)
        {
            return mod(uint(x)*uint(y));
        }
        int pow2(int n) {
            // compute 2^n mod P using repeated squaring
            int y = 1;
            int x = 2;
            while (n > 0)
            {
                if ((n&1) != 0)
                    y = mul(y,x);
                x = mul(x,x);
                n = n >>> 1;
            }
            return y;
        }
        @Override public int hash(int n) {
            return pow2(n+this.ofs);
        }
        @Override public String toString() { 
            return String.format("GF32[%08x, ofs=%d]", this.poly, this.ofs); 
        }
    }
    static class QuickHasher implements Hasher
    {
        @Override public int hash(int n) {
            return (int) ((131111L*n)^n^(1973*n)%7919);
        }
        @Override public String toString() { return getClass().getSimpleName(); }       
    }

    // adapted from http://www.w3.org/TR/PNG-CRCAppendix.html
    static class CRC32TableHasher implements Hasher
    {
        final private int table[];
        static final private int polyval = 0xedb88320;

        public CRC32TableHasher() 
        {           
            this.table = make_table();
        }

        /* Make the table for a fast CRC. */
        static public int[] make_table()
        {
            int[] table = new int[256]; 
            int c;
            int n, k;

            for (n = 0; n < 256; n++) {
                c = n;
                for (k = 0; k < 8; k++) {
                    if ((c & 1) != 0)
                        c = polyval ^ (c >>> 1);
                    else
                        c = c >>> 1;
                }
                table[n] = (int) c;
            }
            return table;
        }

        public int iterate(int state, int i)
        {
            return this.table[(state ^ i) & 0xff] ^ (state >>> 8);
        }
        @Override public int hash(int n) {
            int h = -1;
            h = iterate(h, n >>> 24); 
            h = iterate(h, n >>> 16); 
            h = iterate(h, n >>> 8); 
            h = iterate(h, n); 
            return h ^ -1;
        }
        @Override public String toString() { return getClass().getSimpleName(); } 
    }

    static class TrigHasher implements Hasher
    {       
        @Override public String toString() { return getClass().getSimpleName(); }
        @Override public int hash(int n) { 
            double s = Math.sin(n);
            return (int) Math.floor((1<<31)*s);
        }       
    }

    private static void test(Hasher hasher) {
        System.out.println(hasher+":");
        for (int i = 0; i < 64; ++i)
        {
            int h = hasher.hash(i);
            System.out.println(String.format("%08x -> %08x   %%2 = %d", 
                    i,h,(h&1)));
        }
        for (int i = 0; i < 256; ++i)
        {
            System.out.print(hasher.hash(i) & 1);
        }
        System.out.println();       
        analyzeBits(hasher);
    }

    private static void analyzeBits(Hasher hasher) {
        final int N = 65536;
        final int maxrunlength=32;
        int[][] runs = {new int[maxrunlength], new int[maxrunlength]};
        int[] count = new int[2];
        int prev = -1;
        System.out.println("Run length test of "+N+" bits");
        for (int i = 0; i < maxrunlength; ++i)
        {
            runs[0][i] = 0;
            runs[1][i] = 0;
        }
        int runlength_minus1 = 0;
        for (int i = 0; i < N; ++i)
        {
            int b = hasher.hash(i) & 0x1;
            count[b]++;
            if (b == prev)
                ++runlength_minus1;
            else if (i > 0)
            {
                ++runs[prev][runlength_minus1];
                runlength_minus1 = 0;
            }
            prev = b;
        }
        ++runs[prev][runlength_minus1];

        System.out.println(String.format("%d zeros, %d ones", count[0], count[1]));
        for (int i = 0; i < maxrunlength; ++i)
        {
            System.out.println(String.format("%d runs of %d zeros, %d runs of %d ones", runs[0][i], i+1, runs[1][i], i+1));         
        }
    }

    public static void main(String[] args) {
        Hasher[] hashers = {
            new MultiplicativeHasher1(), 
            GF32Hasher.CRC32, 
            new QuickHasher(),
            new CRC32TableHasher(),
            new TrigHasher()
        };
        for (Hasher hasher : hashers)
        {
            test(hasher);
        }
    }
}

Solution 2

HashMap, as well as Guava's hash-based utilities, use the following method on hashCode() results to improve bit distributions and defend against weaker hash functions:

  /*
   * This method was written by Doug Lea with assistance from members of JCP
   * JSR-166 Expert Group and released to the public domain, as explained at
   * http://creativecommons.org/licenses/publicdomain
   * 
   * As of 2010/06/11, this method is identical to the (package private) hash
   * method in OpenJDK 7's java.util.HashMap class.
   */
  static int smear(int hashCode) {
    hashCode ^= (hashCode >>> 20) ^ (hashCode >>> 12);
    return hashCode ^ (hashCode >>> 7) ^ (hashCode >>> 4);
  }

Solution 3

So, I read this question, thought hmm this is a pretty math-y question, it's probably out of my league. Then, I ended up spending so much time thinking about it that I actually believe I've got the answer: No function can satisfy the criteria that f(n) & 1 is non-periodic for consecutive values of n.

Hopefully someone will tell me how ridiculous my reasoning is, but until then I believe it's correct.

Here goes: Any binary integer n can be represented as either 1...0 or 1...1, and only the least significant bit of that bitmap will affect the result of n & 1. Further, the next consecutive integer n + 1 will always contain the opposite least significant bit. So, clearly any series of consecutive integers will exhibit a period of 2 when passed to the function n & 1. So then, is there any function f(n) that will sufficiently distribute the series of consecutive integers such that periodicity is eliminated?

Any function f(n) = n + c fails, as c must end in either 0 or 1, so the LSB will either flip or stay the same depending on the constant chosen.

The above also eliminates subtraction for all trivial cases, but I have not taken the time to analyze the carry behavior yet, so there may be a crack here.

Any function f(n) = c*n fails, as the LSB will always be 0 if c ends in 0 and always be equal to the LSB of n if c ends in 1.

Any function f(n) = n^c fails, by similar reasoning. A power function would always have the same LSB as n.

Any function f(n) = c^n fails, for the same reason.

Division and modulus were a bit less intuitive to me, but basically, the LSB of either option ends up being determined by a subtraction (already ruled out). The modulus will also obviously have a period equal to the divisor.

Unfortunately, I don't have the rigor necessary to prove this, but I believe any combination of the above operations will ultimately fail as well. This leads me to believe that we can rule out any transcendental function, because these are implemented with polynomials (Taylor series? not a terminology guy).

Finally, I held out hope on the train ride home that counting the bits would work; however, this is actually a periodic function as well. The way I thought about it was, imagine taking the sum of the digits of any decimal number. That sum obviously would run from 0 through 9, then drop to 1, run from 1 to 10, then drop to 2... It has a period, the range just keeps shifting higher the higher we count. We can actually do the same thing for the sum of the binary digits, in which case we get something like: 0,1,1,2,2,....5,5,6,6,7,7,8,8....

Did I leave anything out?

TL;DR I don't think your question has an answer.

Solution 4

[SO decided to convert my "trivial answer" to comment. Trying to add little text to it to see if it can be fooled]

Unless you need the ranger of hashing function to be wider..

The NumberOfSetBits function seems to vary quite a lot more then the hashCode, and as such seems more appropriate for your needs. Turns out there is already a fairly efficient algorithm on SO.

See Best algorithm to count the number of set bits in a 32-bit integer.

Share:
15,166
Jason S
Author by

Jason S

Updated on June 18, 2022

Comments

  • Jason S
    Jason S almost 2 years

    I need a quick hash function for integers:

     int hash(int n) { return ...; }
    

    Is there something that exists already in Java?

    The minimal properties that I need are:

    • hash(n) & 1 does not appear periodic when used with a bunch of consecutive values of n.
    • hash(n) & 1 is approximately equally likely to be 0 or 1.