How to efficiently hash the ip-address

20,574

Interesting, that such an interesting question did not have any interesting answer (sorry for tautology).

If you see it as a theoretical matter, then this link is what you need (there is even a superfast hash function written for you and ready to go):

http://www.kfki.hu/~kadlec/sw/netfilter/ct3/

Practical matter may be different. If your hash table is of reasonable size, you will have to handle collisions anyway (with linked lists). So ask yourself what use case will take place in the end? If your code will run within some secluded ecosystem, and the IP address is a-b-c-d, c and d are the most volatile numbers and d won't be null (assuming you don't handle networks), so a hash table of 64K buckets, and cd as a hash may well be satisfactory?

Another use case - TCP connection tracking where a client use ephemeral port that is assigned by kernel randomly (isn't it ideal for hashing?). The problem is the limited range: something like 32768-61000 which renders least significant byte more random than most significant byte. So you can XOR the most significant byte with the most volatile byte in IP address that can be zerro (c) and use it as a hash in your 64K table.

Share:
20,574
Abhishek Gupta
Author by

Abhishek Gupta

Updated on July 27, 2022

Comments

  • Abhishek Gupta
    Abhishek Gupta almost 2 years

    This is an interview question. I thought about some solution like multiway- hashing but could not find some thing elegant. Please suggest some good method.

    Question: You have 10 million IP addresses. (IPv4 4 byte addresses). Create a hash function for these IP addresses.

    Hint: Using the IP's themselves as a key is a bad idea because there will be a lot of wasted space

  • gatopeich
    gatopeich over 8 years
    Cost can be very important for hashing. MD5 and SHA are costly hashes for something as small as an IP address.
  • Aman Verma
    Aman Verma about 7 years
    I don't think that's a tautology, more like a conduplicatio