IPv6 address validation and canonicalization

Solution 1

On POSIX systems you can use inet_pton and inet_ntop in combination to do canonicalization. You will still have to do your own CIDR parsing. Fortunately, I believe the only valid CIDR syntax for IPv6 is the /number_of_bits notation, so that's fairly easy.
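As a rough illustration of that combination, here is a minimal Python sketch that splits off an optional /number_of_bits suffix by hand and round-trips the address through inet_pton and inet_ntop. The function name canonicalize_cidr and its return shape are made up for this example, and the exact output text comes from the platform's inet_ntop.

```python
import socket

def canonicalize_cidr(text):
    """Return (canonical_address, prefix_len_or_None); raise ValueError if invalid.

    Hypothetical helper for illustration; not part of any library.
    """
    addr_part, sep, prefix_part = text.partition("/")
    prefix_len = None
    if sep:
        # Only the /number_of_bits form is accepted here.
        if not (prefix_part.isascii() and prefix_part.isdigit()):
            raise ValueError("prefix length must be a decimal number")
        prefix_len = int(prefix_part)
        if prefix_len > 128:
            raise ValueError("prefix length must be between 0 and 128")
    try:
        # Text -> 16 raw bytes; rejects anything that is not an IPv6 literal.
        packed = socket.inet_pton(socket.AF_INET6, addr_part)
    except OSError as exc:
        raise ValueError("not a valid IPv6 address: %r" % addr_part) from exc
    # 16 raw bytes -> the platform's canonical text form.
    return socket.inet_ntop(socket.AF_INET6, packed), prefix_len
```

Calling canonicalize_cidr("2001:0DB8:0:0:0:0:0:1/64") should hand back the compressed lowercase address together with 64; an address without a suffix yields None for the prefix length.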

The other issue you will run into is the lack of support for interface specifications. For link-local addresses, you will see things like %eth0 on the end to specify which link they are local to. getaddrinfo will parse that, but inet_pton won't.

One strategy you could go for is using getaddrinfo to parse and inet_ntop to canonicalize.

getaddrinfo is available for Windows. inet_pton and inet_ntop aren't. Fortunately, it isn't too hard to write code to produce a canonical-form IPv6 address. It will require two passes, though, because the rule for zero compression is to collapse the longest run of all-zero groups, and the first such run if several are equally long. Also, the IPv4 form (e.g. ::127.0.0.1) is only used for ::IPv4 or ::ffff:IPv4.
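A minimal sketch of such a two-pass formatter, in Python, might look like the following. The name format_ipv6 is invented for this example. The choice to leave a single zero group uncompressed follows the RFC 5952 recommendation and is an assumption layered on top of the rule described above. The IPv4 form is left out to keep the sketch short.

```python
def format_ipv6(packed):
    """Format 16 raw bytes as compressed IPv6 text (hypothetical helper)."""
    if len(packed) != 16:
        raise ValueError("an IPv6 address is exactly 16 bytes")
    groups = [int.from_bytes(packed[i:i + 2], "big") for i in range(0, 16, 2)]

    # Pass 1: find the longest run of zero groups.
    # The strict '>' comparison keeps the first run when two runs tie.
    best_start, best_len = -1, 0
    run_start, run_len = -1, 0
    for i, group in enumerate(groups):
        if group == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0

    # Pass 2: emit lowercase hex without leading zeros, replacing the chosen run.
    # Assumption: a lone zero group is not compressed (RFC 5952 recommendation).
    if best_len < 2:
        return ":".join("%x" % g for g in groups)
    head = ":".join("%x" % g for g in groups[:best_start])
    tail = ":".join("%x" % g for g in groups[best_start + best_len:])
    return head + "::" + tail
```

For example, the bytes of 2001:db8:0:0:1:0:0:1 contain two zero runs of equal length, so only the first one is collapsed.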

I have no Windows machine to test with, but from the documentation it appears that Python on Windows supports inet_pton and inet_ntop in its socket module.

Writing your own routine for producing a canonical form might not be a bad idea, since even if your canonical form isn't the same as everybody else's, as long as it's valid other people can parse it. But I would under no circumstances write a routine of your own to parse IPv6 addresses.

My advice above is good for Python, C, and C++. I know little or nothing about how to solve this problem in Java or JavaScript.

EDIT: I have been examining getaddrinfo and its counterpart, getnameinfo. These are in almost all ways better than inet_pton and inet_ntop. They are thread safe, and you can pass them options (AI_NUMERICHOST in getaddrinfo's case, and NI_NUMERICHOST in getnameinfo's case) to keep them from doing any kind of DNS queries. Their interface is a little complex and reminds me of an ugly Windows interface in some respects, but it's fairly easy to figure out what options to pass to get what you want. I heartily recommend them both.
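Here is a small Python sketch of that pairing, using the socket module's wrappers around the two calls. The function name parse_numeric_ipv6 is made up for illustration, and the exact text returned depends on the platform's resolver library, so treat this as a starting point rather than a guaranteed canonicalizer.

```python
import socket

def parse_numeric_ipv6(text):
    """Parse an IPv6 literal without DNS and return its numeric text form.

    Hypothetical helper; raises socket.gaierror if text is not a numeric address.
    """
    # AI_NUMERICHOST makes getaddrinfo reject anything that would need a lookup.
    infos = socket.getaddrinfo(
        text, None,
        family=socket.AF_INET6,
        type=socket.SOCK_STREAM,
        flags=socket.AI_NUMERICHOST,
    )
    sockaddr = infos[0][4]  # (host, port, flowinfo, scope_id)
    # NI_NUMERICHOST keeps getnameinfo from attempting a reverse lookup.
    host, _service = socket.getnameinfo(
        sockaddr, socket.NI_NUMERICHOST | socket.NI_NUMERICSERV
    )
    return host
```

A link-local literal with an interface suffix such as %eth0 can be passed the same way, provided that interface exists on the machine running the code.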

Solution 2

In Java, you could use

InetAddress.getByName(IP)

and then check for exceptions thrown by it to validate IPv6 addresses.

You could also use the Sun proprietary API, if that's OK with you. This will not perform a DNS lookup. (Sun might change or remove it without notice since it's their proprietary API; the compiler emits a warning about this when you build code that uses it.)

boolean sun.net.util.IPAddressUtil.isIPv6LiteralAddress(String IP)

Solution 3

In Java, the Guava library has utility functions for validating IPv6 (and IPv4) in the com.google.common.net.InetAddresses class. http://goo.gl/RucRU

Solution 4

I wrote javascript-ipv6 for this very purpose. It currently powers v6decode.com.

Here's a short example of the API:

var address = new v6.Address("::ffff:7b2d:4359/64");

if (address.isValid()) {
   // Do something if the address is valid
}

console.log(address.correctForm());         // "::ffff:7b2d:4359"
console.log(address.canonicalForm());       // "0000:0000:0000:0000:0000:ffff:7b2d:4359"
console.log(address.v4Form());              // "::ffff:123.45.67.89"
console.log(address.subnetMask);            // "64"
console.log(address.possibleAddresses(96)); // "4,294,967,296"

Solution 5

I use a regular expression when OS support may not be available. Regular expressions are available in most languages, including C/C++/Java/Python/Perl/bash/.... The following Python code builds the RE at startup. The resulting RE source is a humdinger, but once compiled by the re engine it is as fast as native code.

import re

# Dotted-quad IPv4: four decimal octets (0-255, no leading zeros).
PAT_IP4 = r'\.'.join([r'(?:\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])']*4)
RE_IP4 = re.compile(PAT_IP4+'$')
# IPv6: one alternative per possible position of "::".
# Use .match() so the pattern is anchored at the start of the string.
RE_IP6 = re.compile(
    '(?:%(hex4)s:){6}%(ls32)s$'
    '|::(?:%(hex4)s:){5}%(ls32)s$'
    '|(?:%(hex4)s)?::(?:%(hex4)s:){4}%(ls32)s$'
    '|(?:(?:%(hex4)s:){0,1}%(hex4)s)?::(?:%(hex4)s:){3}%(ls32)s$'
    '|(?:(?:%(hex4)s:){0,2}%(hex4)s)?::(?:%(hex4)s:){2}%(ls32)s$'
    '|(?:(?:%(hex4)s:){0,3}%(hex4)s)?::%(hex4)s:%(ls32)s$'
    '|(?:(?:%(hex4)s:){0,4}%(hex4)s)?::%(ls32)s$'
    '|(?:(?:%(hex4)s:){0,5}%(hex4)s)?::%(hex4)s$'
    '|(?:(?:%(hex4)s:){0,6}%(hex4)s)?::$'
    % {
        # Last 32 bits: two hex groups or an embedded IPv4 address.
        'ls32': r'(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|%s)'%PAT_IP4,
        # One 16-bit group: 1 to 4 hex digits.
        'hex4': r'[0-9a-f]{1,4}'
    }, re.IGNORECASE)

Author: Ramy Mohamed

Updated on June 04, 2022

Comments

  • Ramy Mohamed
    Ramy Mohamed about 2 years

    What libs have you used for that? How compatible are they with one another? Or did you write your own parsing routine?

    I'm particularly interested in mutually-compatible implementations for Java, C++, Python, and JavaScript, which support:

    • zero compression ("::")
    • IPv4-mapped addresses ("::ffff:123.45.67.89")
    • canonicalization (including to the short form, for human readability)
    • CIDR-style netmasks (like "/64" at the end)
  • Omnifarious
    Omnifarious almost 15 years
    I don't think getaddrinfo supports parsing subnet bit-slicing notation. It also doesn't support canonicalization.
  • Ramy Mohamed
    Ramy Mohamed almost 15 years
    Thanks for the idea. Unfortunately that method does a DNS lookup before throwing the exception.
  • vpram86
    vpram86 almost 15 years
    OK, I looked into the source code of InetAddress. It's using the sun.net.util.IPAddressUtil class, which is a Sun proprietary API. But if that's OK with you, you could use it. Use the static method isIPv6LiteralAddress(String IP). It returns true or false.
  • Ramy Mohamed
    Ramy Mohamed almost 15 years
    IPAddressUtil works great. I guess I can cope with canonicalization too, via textToNumericFormatV*(String) and some custom formatting back to text. +1 and I wish I could accept more than one answer.
  • vpram86
    vpram86 almost 15 years
    Wow. Glad I could help! :) No problem at all!
  • Omnifarious
    Omnifarious almost 15 years
    I would like to comment that after playing with getaddrinfo myself, the OSX 10.4 and glibc-2.10.1 versions both have interesting bugs. The OSX version has much worse bugs.
  • Jason R. Coombs
    Jason R. Coombs over 14 years
    Windows does support inet_pton and inet_ntop for Vista and later (see msdn.microsoft.com/en-us/library/cc805843%28VS.85%29.aspx). However, Python 2.6 and 3.0 do not (the docs indicate "Availability: Unix").
  • Zan Lynx
    Zan Lynx over 14 years
    Watch out for speed in Windows. For some reason the Windows inet_pton and inet_ntop are horribly horribly slow. Or they were in my experience on Vista and Server 2008.
  • Neil McGuigan
    Neil McGuigan over 10 years
    You can use Guava's InetAddresses to avoid the lookup
  • Ron Maupin
    Ron Maupin almost 9 years
    The output for first and last usable addresses is incorrect. IPv6, unlike IPv4, can use all the addresses in a subnet. A standard IPv6 subnet is /64 (a few special cases use other mask lengths), and the usable addresses are from <subnet>:: to <subnet>:ffff:ffff:ffff:ffff. There is no reservation for the subnet (<subnet>::), and IPv6 doesn't have the concept of broadcast, so no broadcast address exists at <subnet>:ffff:ffff:ffff:ffff.
  • LukeSkywalker
    LukeSkywalker almost 9 years
    Thanks Ron for pointing this out.
  • Ron Maupin
    Ron Maupin over 8 years
    @Omnifarious, you wrote, "But I would under no circumstances write a routine of your own to parse IPv6 addresses." I did exactly that fairly easily using RegEx, and it accepts all legal forms of IPv6 addressing (compressed, uncompressed, hybrid compressed, hybrid uncompressed, and upper or lower case versions of each, with or without the /<mask length>). The reverse routine will return the RFC 5952 "official" canonical representation of IPv6 addresses, including hybrid for /96 addresses.