IPv6 address validation and canonicalization
Solution 1
On POSIX systems you can use inet_pton and inet_ntop in combination to do canonicalization. You will still have to do your own CIDR parsing. Fortunately, I believe the only valid CIDR syntax for IPv6 is the /number_of_bits notation, so that's fairly easy.
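A minimal Python sketch of that round trip, assuming the socket module exposes inet_pton and inet_ntop on your platform. The function name canonicalize_ipv6 is hypothetical, and the prefix check only enforces the /number_of_bits form with a value from 0 to 128:

```python
import socket


def canonicalize_ipv6(text):
    """Return (canonical_address, prefix_length_or_None); raise ValueError if invalid."""
    # Split off an optional "/number_of_bits" suffix before parsing the address.
    address, slash, bits = text.partition("/")
    prefix_len = None
    if slash:
        # Accept only plain ASCII decimal digits in the range 0..128.
        if not (bits.isascii() and bits.isdigit()) or int(bits) > 128:
            raise ValueError(f"invalid prefix length: {bits!r}")
        prefix_len = int(bits)
    try:
        # Text -> 16 packed bytes; the platform parser does the validation.
        packed = socket.inet_pton(socket.AF_INET6, address)
    except (OSError, ValueError) as exc:
        raise ValueError(f"invalid IPv6 address: {address!r}") from exc
    # Packed bytes -> text; the platform formatter picks the output form.
    return socket.inet_ntop(socket.AF_INET6, packed), prefix_len
```

The exact output text is whatever the platform's inet_ntop produces, so treat it as canonical for that platform rather than as a guarantee across systems.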
The other issue you will run into is the lack of support for interface specifications. For link-local addresses, you will see things like %eth0 on the end to specify which link they are local to. getaddrinfo will parse that, but inet_pton won't.
One strategy you could go for is using getaddrinfo to parse and inet_ntop to canonicalize.
getaddrinfo is available for Windows. inet_pton and inet_ntop aren't. Fortunately, it isn't too hard to write code to produce a canonical-form IPv6 address. It will require two passes, though, because the rule for zero compression is to replace the longest run of zeros, and the first such run if several are tied. Also, the IPv4 form (e.g. ::127.0.0.1) is only used for ::IPv4 or ::ffff:IPv4.
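The two-pass zero compression could be sketched in Python roughly as below. The name format_ipv6 is hypothetical, the input is assumed to be the eight 16-bit groups already parsed into integers, and the sketch assumes a lone zero group is left uncompressed (as RFC 5952 requires). It does not emit the dotted IPv4 forms.

```python
def format_ipv6(groups):
    """Format eight 16-bit integers as compressed, lowercase IPv6 text."""
    if len(groups) != 8 or any(not 0 <= g <= 0xFFFF for g in groups):
        raise ValueError("expected eight 16-bit groups")

    # Pass 1: find the longest run of zero groups; on a tie the first run wins
    # because a later run only replaces the best one if it is strictly longer.
    best_start, best_len = -1, 0
    run_start, run_len = -1, 0
    for i, g in enumerate(groups):
        if g == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0

    hextets = [format(g, "x") for g in groups]  # no leading zeros, lowercase

    # Pass 2: emit the text, replacing the chosen run with "::".
    if best_len < 2:  # assumption: a single zero group is not compressed
        return ":".join(hextets)
    head = ":".join(hextets[:best_start])
    tail = ":".join(hextets[best_start + best_len:])
    return head + "::" + tail
```

For example, the groups 2001:db8:0:0:1:0:0:1 contain two zero runs of equal length, and only the first one is replaced.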
I have no Windows machine to test with, but from the documentation it appears that Python on Windows supports inet_pton and inet_ntop in its socket module.
Writing your own routine for producing a canonical form might not be a bad idea, since even if your canonical form isn't the same as everybody else's, as long as it's valid other people can parse it. But I would under no circumstances write a routine of your own to parse IPv6 addresses.
My advice above is good for Python, C, and C++. I know little or nothing about how to solve this problem in Java or Javascript.
EDIT: I have been examining getaddrinfo and its counterpart, getnameinfo. These are in almost all ways better than inet_pton and inet_ntop. They are thread safe, and you can pass them options (AI_NUMERICHOST in getaddrinfo's case, and NI_NUMERICHOST in getnameinfo's case) to keep them from doing any kind of DNS queries. Their interface is a little complex and reminds me of an ugly Windows interface in some respects, but it's fairly easy to figure out what options to pass to get what you want. I heartily recommend them both.
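Here is a rough Python sketch of that pairing, using the socket module's wrappers around the two calls. The function name canonicalize_numeric_host is hypothetical, and the returned text is whatever the platform's getnameinfo produces:

```python
import socket


def canonicalize_numeric_host(text):
    """Parse and re-format an IPv6 literal without triggering a DNS lookup."""
    try:
        # AI_NUMERICHOST makes getaddrinfo reject anything that is not a
        # numeric address instead of trying to resolve it as a host name.
        infos = socket.getaddrinfo(
            text,
            None,
            family=socket.AF_INET6,
            type=socket.SOCK_STREAM,
            flags=socket.AI_NUMERICHOST,
        )
    except (socket.gaierror, UnicodeError, ValueError) as exc:
        raise ValueError(f"not a numeric IPv6 address: {text!r}") from exc
    sockaddr = infos[0][4]  # (host, port, flowinfo, scope_id)
    # NI_NUMERICHOST / NI_NUMERICSERV keep getnameinfo from doing reverse
    # lookups; it just formats the address and port as text.
    host, _service = socket.getnameinfo(
        sockaddr, socket.NI_NUMERICHOST | socket.NI_NUMERICSERV
    )
    return host
```

A CIDR suffix still has to be split off before the call, since getaddrinfo does not parse it.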
Solution 2
In Java, you could use InetAddress.getByName(IP) and then check for exceptions thrown by it to validate IPv6 addresses.
You could also use the Sun proprietary API, if that's OK for you. This will not perform a DNS lookup. (Sun might change or remove it without notice, since it's their proprietary API. The compiler prints a warning to that effect when you compile code that uses it.)
boolean sun.net.util.IPAddressUtil.isIPv6LiteralAddress(String IP)
Solution 3
In Java, the Guava library has utility functions for validating IPv6 (and IPv4) in the com.google.common.net.InetAddresses class. http://goo.gl/RucRU
Solution 4
I wrote javascript-ipv6 for this very purpose. It currently powers v6decode.com.
Here's a short example of the API:
var address = new v6.Address("::ffff:7b2d:4359/64");
if (address.isValid()) {
  // Do something if the address is valid
}
console.log(address.correctForm()); // "::ffff:7b2d:4359"
console.log(address.canonicalForm()); // "0000:0000:0000:0000:0000:ffff:7b2d:4359"
console.log(address.v4Form()); // "::ffff:123.45.67.89"
console.log(address.subnetMask); // "64"
console.log(address.possibleAddresses(96)); // "4,294,967,296"
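The values in those comments can be cross-checked with a little arithmetic. This is only a sketch using Python's standard ipaddress module, which is an assumption of mine and not part of javascript-ipv6:

```python
import ipaddress

address = ipaddress.IPv6Address("::ffff:7b2d:4359")

# The low 32 bits, 7b2d:4359, are the four bytes 0x7b, 0x2d, 0x43, 0x59.
low32 = int(address) & 0xFFFFFFFF
dotted = ".".join(str(byte) for byte in low32.to_bytes(4, "big"))

# A /96 prefix leaves 128 - 96 = 32 free bits, so it spans 2**32 addresses.
addresses_in_96 = 2 ** (128 - 96)

print(dotted)                    # 123.45.67.89
print(f"{addresses_in_96:,}")    # 4,294,967,296
```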
Solution 5
I use a regular expression when OS support may not be available. Regular expressions are available in most languages, including C, C++, Java, Python, Perl, and bash. The following Python code builds the RE at startup. The resulting RE source is a humdinger, but once compiled by the re engine it is as fast as native code.
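import re

# Dotted-quad IPv4: four decimal octets, each in the range 0-255.
PAT_IP4 = r'\.'.join([r'(?:\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])']*4)
RE_IP4 = re.compile(PAT_IP4+'$')
# IPv6: the first alternative is the full eight-group form; each of the
# others allows "::" at one position. Use with RE_IP6.match(text).
RE_IP6 = re.compile(
    '(?:%(hex4)s:){6}%(ls32)s$'
    '|::(?:%(hex4)s:){5}%(ls32)s$'
    '|(?:%(hex4)s)?::(?:%(hex4)s:){4}%(ls32)s$'
    '|(?:(?:%(hex4)s:){0,1}%(hex4)s)?::(?:%(hex4)s:){3}%(ls32)s$'
    '|(?:(?:%(hex4)s:){0,2}%(hex4)s)?::(?:%(hex4)s:){2}%(ls32)s$'
    '|(?:(?:%(hex4)s:){0,3}%(hex4)s)?::%(hex4)s:%(ls32)s$'
    '|(?:(?:%(hex4)s:){0,4}%(hex4)s)?::%(ls32)s$'
    '|(?:(?:%(hex4)s:){0,5}%(hex4)s)?::%(hex4)s$'
    '|(?:(?:%(hex4)s:){0,6}%(hex4)s)?::$'
    % {
        # Last 32 bits: either two hex groups or a dotted-quad IPv4 tail.
        'ls32': r'(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|%s)'%PAT_IP4,
        # One group of one to four hex digits.
        'hex4': r'[0-9a-f]{1,4}'
    }, re.IGNORECASE)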
Comments
-
Ramy Mohamed about 2 years
What libs have you used for that? How compatible are they with one another? Or did you write your own parsing routine?
I'm particularly interested in mutually-compatible implementations for Java, C++, Python, and JavaScript, which support:
- zero compression ("::")
- IPv4-mapped addresses ("::ffff:123.45.67.89")
- canonicalization (including to the short form, for human readability)
- CIDR-style netmasks (like "/64" at the end)
-
Omnifarious almost 15 years
I don't think getaddrinfo supports parsing subnet bit-slicing notation. It also doesn't support canonicalization.
-
Ramy Mohamed almost 15 years
Thanks for the idea. Unfortunately that method does a DNS lookup before throwing the exception.
-
vpram86 almost 15 years
OK, I looked into the source code of InetAddress. It uses the sun.net.util.IPAddressUtil class, which is a Sun proprietary API. But if that's OK for you, you could use it. Use the static method isIPv6LiteralAddress(String IP). It returns true or false.
-
Ramy Mohamed almost 15 years
IPAddressUtil works great. I guess I can cope with canonicalization too, via textToNumericFormatV*(String) and some custom formatting back to text. +1 and I wish I could accept more than one answer.
-
vpram86 almost 15 years
Wow. Glad I could help! :) No prob at all!
-
Omnifarious almost 15 years
I would like to comment that after playing with getaddrinfo myself, the OSX 10.4 and glibc-2.10.1 versions both have interesting bugs. The OSX version has much worse bugs.
-
Jason R. Coombs over 14 years
Windows does support inet_pton and inet_ntop for Vista and later (see msdn.microsoft.com/en-us/library/cc805843%28VS.85%29.aspx). However, Python 2.6 and 3.0 do not (the docs indicate Availability: Unix).
-
Zan Lynx over 14 years
Watch out for speed in Windows. For some reason the Windows inet_pton and inet_ntop are horribly, horribly slow. Or they were in my experience on Vista and Server 2008.
-
Neil McGuigan over 10 years
You can use Guava's InetAddresses to avoid the lookup.
-
Ron Maupin almost 9 years
The output for first and last usable addresses is incorrect. IPv6, unlike IPv4, can use all the addresses in a subnet. A standard IPv6 subnet is /64 (a few special cases use other mask lengths), and the usable addresses are from <subnet>:: to <subnet>:ffff:ffff:ffff:ffff. There is no reservation for the subnet (<subnet>::), and IPv6 doesn't have the concept of broadcast, so no broadcast address exists at <subnet>:ffff:ffff:ffff:ffff.
-
LukeSkywalker almost 9 years
Thanks Ron for pointing this out.
-
Ron Maupin over 8 years
@Omnifarious, you wrote, "But I would under no circumstances write a routine of your own to parse IPv6 addresses." I did exactly that fairly easily using RegEx, and it accepts all legal forms of IPv6 addressing (compressed, uncompressed, hybrid compressed, hybrid uncompressed, and upper or lower case versions of each, with or without the /<mask length>). The reverse routine will return the RFC 5952 "official" canonical representation of IPv6 addresses, including hybrid for /96 addresses.