More than 4 billion key value pairs in Redis?

Solution 1

You can store 4 billion items in Redis without any particular performance degradation, but you need enough memory for it: the whole dataset must fit in RAM.

The optimal ways to implement this kind of query with Redis have been described here:

store ip ranges in Redis

and here:

Redis or Mongo for determining if a number falls within ranges?

So the complexity of the optimal solution depends on whether the IP ranges can overlap.
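For the simpler, non-overlapping case, the usual idea is to index each range by its end value and fetch the first entry whose end is at or above the queried IP. The sketch below illustrates that idea in plain Python with `bisect` standing in for a Redis sorted set (where the equivalent query would be `ZRANGEBYSCORE key <ip> +inf LIMIT 0 1`). The ranges, labels, and function names are made up for illustration and are not taken from the linked answers.

```python
import bisect
import ipaddress


def to_int(ip):
    """Convert a dotted-quad IPv4 string to its integer value."""
    return int(ipaddress.IPv4Address(ip))


# Hypothetical, non-overlapping ranges: (start, end, label), sorted by range end.
RANGES = sorted(
    [
        (to_int("10.0.0.0"), to_int("10.0.0.255"), "isp-a"),
        (to_int("10.0.2.0"), to_int("10.0.3.255"), "isp-b"),
    ],
    key=lambda r: r[1],
)
ENDS = [end for _, end, _ in RANGES]


def lookup(ip):
    """Return the label of the range containing ip, or None if no range does."""
    n = to_int(ip)
    # Index of the first range whose end is >= n.
    i = bisect.bisect_left(ENDS, n)
    if i == len(RANGES):
        return None
    start, _, label = RANGES[i]
    # The candidate only matches if n is not below its start (ranges may have gaps).
    return label if start <= n else None
```

Because the ranges do not overlap, a single candidate is enough; overlapping ranges would need a different structure, since several ranges could contain the same address.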

Solution 2

I believe it's the wrong way to do it.

Keep the IP mapping as integer ranges (From IP - To IP, converted to decimal) and query your subject IP against them using a traditional DB or a NoSQL store that is strong at range comparisons.
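As a rough sketch of this approach, the example below uses an in-memory SQLite database. The table name `ip_ranges`, its columns, and the sample rows are assumptions for illustration only; any relational database with an index on the range end should behave similarly.

```python
import ipaddress
import sqlite3


def ip_to_int(ip):
    """Convert a dotted-quad IPv4 string to its integer value."""
    return int(ipaddress.IPv4Address(ip))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ip_ranges (start_ip INTEGER, end_ip INTEGER, isp TEXT)")
# An index on the range end lets the lookup below use an index seek.
conn.execute("CREATE INDEX idx_ip_ranges_end ON ip_ranges (end_ip)")
conn.executemany(
    "INSERT INTO ip_ranges VALUES (?, ?, ?)",
    [
        (ip_to_int("10.0.0.0"), ip_to_int("10.0.0.255"), "isp-a"),
        (ip_to_int("10.0.2.0"), ip_to_int("10.0.3.255"), "isp-b"),
    ],
)


def find_isp(ip):
    """Return the ISP whose range contains ip, or None (assumes no overlaps)."""
    n = ip_to_int(ip)
    row = conn.execute(
        "SELECT start_ip, isp FROM ip_ranges "
        "WHERE end_ip >= ? ORDER BY end_ip LIMIT 1",
        (n,),
    ).fetchone()
    # The nearest range end may belong to a range that starts after n.
    if row is None or row[0] > n:
        return None
    return row[1]
```

Fetching only the first row with `end_ip >= ?` and checking its start afterwards avoids a two-sided `BETWEEN` scan, which tends to be slower on large range tables.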

Solution 3

Just use geodis. It already does IP to country/location lookups and stores that data efficiently for you. You are free to use it only for data loading and query the data directly from Redis yourself.

Solution 4

The approach we use for fast Geo-IP resolution is to take all the IP ranges, break them at the /24 boundary (the first three octets), and store one record per /24 holding all the matches for those addresses. This gives you at most about 16 million keys (2^24) and O(1) access. If you can tolerate the client-side complexity of breaking up the stored record, it's performant without taking up lots of RAM.

In more detail:

  • take all ranges, and break them by their first 24 bits.
    • The range 128.100.60.0-128.100.60.9 becomes one record, <128.100.60 | 0 9 | (...recA...)>
    • The range 128.100.60.10 - 128.100.62.80 would become <128.100.60 | 10 255 | (...recB...)>, <128.100.61 | 0 255 | (...recB...)>, and <128.100.62 | 0 80 | (...recB...)>.
  • combine all the records with the same prefix into a hash stored under that prefix, with each field named by the top of its sub-range. So
    • key 128.100.60: {9: {...recA...}, 255: {...recB...}}
    • key 128.100.61: {255: {...recB...}}
    • key 128.100.62: {80: {...recB...}, ...}

To retrieve a specific IP, retrieve the compound record by its 24-bit key, and return the first result whose sub-key is greater than or equal to the last octet. If I looked up 128.100.60.20, I would find that 9 was too small, but that 255 was large enough, and so return recB.
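The splitting and lookup steps above can be sketched as follows. This is a minimal in-memory illustration, not the answerer's actual code: a Python dict stands in for the Redis keys, each piece also keeps its low octet so gaps inside a /24 can be detected, and the function names are hypothetical.

```python
import ipaddress
from collections import defaultdict


def to_int(ip):
    """Convert a dotted-quad IPv4 string to its integer value."""
    return int(ipaddress.IPv4Address(ip))


def split_by_24(start_ip, end_ip, record):
    """Yield (prefix, low_octet, high_octet, record), one piece per /24 block."""
    start, end = to_int(start_ip), to_int(end_ip)
    for prefix in range(start >> 8, (end >> 8) + 1):
        low = (start & 0xFF) if prefix == (start >> 8) else 0
        high = (end & 0xFF) if prefix == (end >> 8) else 255
        yield prefix, low, high, record


def build_index(ranges):
    """Group the pieces by 24-bit prefix, sorted by the top of each sub-range."""
    index = defaultdict(list)
    for start_ip, end_ip, record in ranges:
        for prefix, low, high, rec in split_by_24(start_ip, end_ip, record):
            index[prefix].append((high, low, rec))
    for pieces in index.values():
        pieces.sort(key=lambda p: (p[0], p[1]))
    return index


def lookup(index, ip):
    """Return the record covering ip, or None if no stored range contains it."""
    n = to_int(ip)
    last = n & 0xFF
    for high, low, rec in index.get(n >> 8, []):
        # First piece whose top is >= the last octet is the only candidate.
        if high >= last:
            return rec if low <= last else None
    return None


index = build_index([
    ("128.100.60.0", "128.100.60.9", "recA"),
    ("128.100.60.10", "128.100.62.80", "recB"),
])
print(lookup(index, "128.100.60.20"))  # recB
```

In Redis, each prefix would typically become one key holding the small per-/24 structure, so a lookup costs a single round trip followed by a scan of at most a handful of entries on the client.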

This is a common strategy for doing range joins (even spatial joins!) in things like Hadoop: partition on some reasonable chunk, and then index on one end of the range.

Author: harshsinghal (Twig in a Random Forest)

Updated on June 04, 2022

Comments

  • harshsinghal
    harshsinghal almost 2 years

    I am trying to store IP numbers in Redis along with associated ISP information. I have MaxMind data, and the CSV files contain start and end numbers for each ISP.

    When querying in SQL, I can check whether an IP (after converting it to a number) falls within a range and get the associated ISP.

    I was thinking of expanding all the ranges into individual numbers and storing them all as key-value pairs in Redis for faster lookup. This will result in approximately 4 billion key-value pairs in the Redis store. I have done this for a few hundred million key-value pairs, but I am looking for advice/suggestions on moving to 4 billion pairs in Redis. Are there any performance issues I must be aware of, or are there ways I can do this better?

    Thank you for all the suggestions.

    UPDATE: Thanks to the suggestions below I could get this working. Thought I'd share the Python code (quick and dirty) for this here :

    import ipaddress

    import pymysql
    import redis

    # Load the ranges from MySQL, ordered by the end of each range.
    conn = pymysql.connect(host='localhost', user='user', password='password', database='foo')
    cur = conn.cursor()
    cur.execute('select startipnum, endipnum, isp from wiki.ipisp order by endipnum;')
    result = cur.fetchall()
    conn.close()

    # decode_responses=True makes redis-py return str instead of bytes.
    r = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)
    for ispctr, (startipnum, endipnum, isp) in enumerate(result, start=1):
        namefield = str(ispctr)
        # One hash per range, keyed by a running counter.
        r.hset(namefield, mapping={'ispname': isp, 'fromval': startipnum, 'toval': endipnum})
        # Index each hash by the END of its range in a sorted set.
        r.zadd('ispmaxindex', {namefield: endipnum})

    ipstotest = ['23.23.23.23', '24.96.185.10', '203.59.91.235', '188.66.105.50', '99.98.163.93']
    for ip in ipstotest:
        ipnum = int(ipaddress.IPv4Address(ip))
        # First range whose end is >= ipnum.
        matches = r.zrangebyscore('ispmaxindex', ipnum, '+inf', start=0, num=1)
        record = r.hgetall(matches[0]) if matches else {}
        # Ranges may have gaps, so confirm the IP is not below the range start.
        if record and int(record['fromval']) <= ipnum:
            print(record['ispname'])
        else:
            print('no match for', ip)