save IP address in mongoDB

12,375

Solution 1

Definitely save IP addresses as numbers, if you don't mind the extra bit of work that it takes, especially if you need to do queries on the addresses and you have large tables/collections.

Here's why:

Storage

  • An IPv4 address is 4 bytes if stored as unsigned integer.
  • An IPv4 address varies between 10 bytes and 18 bytes when written out as a string in dotted octed form. (Let's assume the average is 14 bytes.)

That is 7-15 bytes for the characters, plus 2-3 bytes if you're using a variable length string type, which varies based on the database you're using. If you have a fixed length string representation available, then you must use a 15-character fixed width field.

Disk storage is cheap, so that's not a factor in most use cases. Memory, however, is not as cheap, and if you have a large table/collection and you want to do fast queries, then you need an index. The 2-3x storage penalty of string encoding drastically reduces the amount of records you can index while still keeping the index resident in memory.

  • An IPv6 address is 16 bytes if stored as an unsigned integer. (Likely as multiple 4 or 8 byte integers, depending on your platform.)
  • An IPv6 address ranges from 6 bytes to 42 bytes when encoded as a string in abbreviated hex notation.

On the low end, a loop back address (::1) is 3 bytes plus the variable length string overhead. On the high end, an address like 2002:4559:1FE2:1FE2:4559:1FE2:4559:1FE2 uses 39 bytes plus the variable length string overhead.

Unlike with IPv4, it's not safe to assume the average IPv6 string length will be mean of 6 and 42, because the number of addresses with a significant number of consecutive zeroes is a very small fraction of the overall IPv6 address space. Only some special addresses, like loopback and autoconf addresses, are likely to be compressible in this way.

Again, this is a storage penalty of >2x for string encoding versus integer encoding.

Network Math

Do you think routers store IP addresses as strings? Of course they don't.

If you need to do network math on IP addresses, the string representation is a hassle. E.g. if you want to write a query that searches for all addresses on a specific subnet ("return all records with an IP address in 10.7.200.104/27", you can easily do this by masking an integer address with an integer subnet mask. (Mongo doesn't support this particular query, but most RDBMS do.) If you store addresses as strings, then your query will need to convert each row to an integer, then mask it, which is several orders of magnitude slower. (Bitwise masking for an IPv4 address can be done in a few CPU cycles using 2 registers. Converting a string to an integer requires looping over the string.)

Similarly, range queries ("return all records all records between 192.168.1.50 and 192.168.50.100") with integer addresses will be able to use indexes, whereas range queries on string addresses will not.

The Bottom Line

It takes a little bit more work, but not much (there are a million aton() and ntoa() functions out there), but if you're building something serious and solid and you want to future-proof it against future requirements and the possibility of a large dataset, you should store IP addresses as integers, not strings.

If you're doing something quick and dirty and don't mind the possibility of remodeling in the future, then use strings.

For the OP's purpose, if you are optimizing for speed and space and you don't think you want to query it often, then why use a database at all? Just print IP addresses to a file. That would be faster and more storage efficient than storing it in a database (with associated API and storage overhead).

Solution 2

An efficient way to save a ip address as a int. If you want to tag a ip with cidr filter, a demo here:

> db.getCollection('iptag').insert({tags: ['office'], hostmin: 2886991873, hostmax: 2887057406, cidr: '172.20.0.0/16'})
> db.getCollection('iptag').insert({tags: ['server'], hostmin: 173867009, hostmax: 173932542, cidr: '10.93.0.0/16'})
> db.getCollection('iptag').insert({tags: ['server'], hostmin: 173932545, hostmax: 173998078, cidr: '10.94.0.0/16'})

Create tags index.

> db.getCollection('iptag').ensureIndex(tags: 1)

Filter ip with cidr range. ip2int('10.94.25.32') == 173938976.

> db.getCollection('iptag').find({hostmin: {$lte: 173938976}, hostmax: {$gte: 173938976}})
Share:
12,375

Related videos on Youtube

Salvador Dali
Author by

Salvador Dali

I am a Software Engineer in the Google Search Growth team. I use Tensorflow and TFX to analyze search data and Go to write data pipelines. This is my personal profile which has absolutely nothing to do with my employer.

Updated on September 15, 2022

Comments

  • Salvador Dali
    Salvador Dali over 1 year

    Currently in order to save an IP address I am converting it to number and store it in the collection. Basically I am doing this for logging purposes. This means that I care to store information as fast as possible and with smallest amount of space.

    I will be rarely using it for querying.

    My ideas that

    • Storing as strings is for sure inefficient.
    • Storing as 4 digits will be slower and will take more space.

    Nonetheless I think that this is an adequate method, but is there a better one for my purpose?

    • Joe
      Joe about 11 years
      4 ints won't work for IPv6 addresses. 4 digits will not take more space than a string. Honestly, you have to decide whether convertign from the source string or the space loss is more important and decide based on that.
    • WiredPrairie
      WiredPrairie about 11 years
      Do you need to query the resulting structure? MongoDB may not be the best choice for logging if it's competing with other database write operations. Try options and see how they perform. Look at the stats for the collection (docs.mongodb.org/manual/reference/collection-statistics) to see how large the average documents are. You may also want to do some in memory buffering rather than writing many tiny individual documents.
  • Salvador Dali
    Salvador Dali over 10 years
    I think you have not read my question. I know that I can store them this way and I already written it in the question. I am looking for a more elaborate answer then just a one-liner you can store them as an integer.
  • coffee_machine
    coffee_machine over 10 years
    I did read it. You're question sounds like 'I did it the best way, but what else would be better?'. So basically my answer says 'yes I think that's the best way'. And there's no need to write a book for such a simple answer.
  • Salvador Dali
    Salvador Dali almost 10 years
    there is absolutely no point of doing this, because most of the languages have the native function that is doing something really similar. Also the question was not about how to convert IP to integer.
  • user1063287
    user1063287 over 3 years
    anyone know of any good resources that document the best/recommended way to store ip addresses in mongodb? the question states that the user does it, but i'd like to know how they do it. searches for "how to store ip address in mongodb?" etc mainly show results related to whitelisting ip's. thank you.