What is Folding technique in hashing and how to implement it?

35,607

Solution 1

Following the answers from Tony and Sumeet, I did some more research on digit folding and decided to implement the technique explained by Robert Lafore in his Data Structures book.

For example, suppose you want to hash 10-digit machine numbers. If the array size is 1,000, you would divide the 10-digit number into three groups of three digits and one group of one-digit. In out example in question machine number is 424-124-9675, so you would calculate a key value of 424 + 124 + 967 + 5 = 1520. You can use the % operator to trim such sums so the highest index is 999. In this case, 1520 % 1000 = 520.

If the array size is 100, you would need to break the 10-digit key into five two-digit numbers: 42 + 41 + 24 + 96 + 75 = 278, and 278 % 100 = 78.

It’s easier to imagine how this works when the array size is a multiple of 10. However, for best results it should be a prime number.

Here's the Java code of digit folding technique I implemented:

public class DigitFolder {
    public static void main(String[] args) {
        int hashVal = hashFunc(424124967);
        System.out.println(hashVal);
    }
    public static int hashFunc(int key) {
        int arraySize = 1021;
        int keyDigitCount = getDigitCount(key);
        int groupSize = getDigitCount(arraySize);
        int groupSum = 0;
        String keyString = Integer.toString(key);
        int i;
        for (i = 0; i < keyString.length(); i += groupSize) {
            if (i + groupSize <= keyString.length()) {
                String group = keyString.substring(i, i + groupSize);
                groupSum += Integer.parseInt(group);
            }
        }
        // There is no remaining part if count is divisible by groupsize.
        if (keyDigitCount % groupSize != 0) {
            String remainingPart = 
                   keyString.substring(i - groupSize, keyString.length());
            groupSum += Integer.parseInt(remainingPart);
        }
        return groupSum % arraySize;
    }
    public static int getDigitCount(int n) {
        int numDigits = 1;
        while (n > 9) {
            n /= 10;
            numDigits++;
        }
        return numDigits;
    }
}

I found the group making method here. But it makes groups right to left. So, I used String#subString() method to make left to right groups.

Solution 2

There are 2 types of folding methods used Fold shift and Fold boundary.

Fold Shift

You divide the key in parts whose size matches the size of required address. The parts are simply added to get the required address.

Key:123456789 and size of required address is 3 digits.

123+456+789 = 1368. To reduce the size to 3, either 1 or 8 is removed and accordingly the key would be 368 or 136 respectively.

Fold Boundary

You again divide the key in parts whose size matches the size of required address.But now you also applying folding, except for the middle part,if its there.

Key:123456789 and size of required address is 3 digits

321 (folding applied)+456+987 (folding applied) = 1764(discard 1 or 4)

Solution 3

Given 424-124-9675, you decide where you want to break it into groups. For example:

  • every 3 digits from left: hash = 424 + 124 + 967 + 5

  • every 3 digits from right: hash = 675 + 249 + 241 + 4

  • where the dashes are: hash = 424 + 124 + 9675

It's a terribly weak way to hash though - very collision prone.

Share:
35,607
Yogesh Umesh Vaity
Author by

Yogesh Umesh Vaity

A lifelong learner. I'm a refactoring enthusiast. I constantly think about the right names for the things in my code and always look forward to abstracting the complex functionality. I like unit tests! Love to see those beautiful green checkmarks ✅ ticking while the tests are being run! Always eager to learn new stacks. I'm also fascinated with various software architectures and patterns.

Updated on April 18, 2020

Comments

  • Yogesh Umesh Vaity
    Yogesh Umesh Vaity about 4 years

    I heard at a data structures seminar that we can break a key into groups of digits and then do the addition of groups. This ensures that all the digits contribute the hash code. The number of digits in a group correspond to the size of the array.

    For example, I have a machine number say 424-124-9675, how do I make the hash function using the Folding technique?

  • Yogesh Umesh Vaity
    Yogesh Umesh Vaity about 8 years
    Thank for answer +1 for you, that sounds interesting. But I didn't understand how did you derive 321? why not 123? also what is the effect of discarding 1 or 4?
  • Yogesh Umesh Vaity
    Yogesh Umesh Vaity about 8 years
    Hey thanks for answer +1 to you. But the group size depends on the size of our table right? So, for example if we have 1000 size of table, we divide into groups of 3, and when we have size 100 we make group of 2. So it may not be as terrible right?
  • Sumeet
    Sumeet about 8 years
    @YogeshUmeshVaity If its 123 then there will be no difference between 2 techniques.
  • Sumeet
    Sumeet about 8 years
    @YogeshUmeshVaity The effect of 1 or 4 is simply getting the right index. if you have an array of length 999, then definitely an index of size 4 digits is invalid. The address basically means the index of the array at which you would store the key.
  • Tony Delroy
    Tony Delroy about 8 years
    @YogeshUmeshVaity: there are lots of options for grouping... you could make it depend on your table, and that would work ok if your table size was a power of ten, or if say you'd used three digit groups you could still use the mod % operator to pick from a smaller number of buckets, e.g. if your sum-of-groups was 123, and your table size 37, use 123 % 37.