Extract Server Name Indication (SNI) from TLS client hello

21,266

Solution 1

I did this in sniproxy, examining a TLS client hello packet in Wireshark while reading that RFC is a pretty good way to go. It's not too hard, just lots of variable length fields you have to skip past and check checking if you have the correct element type.

I'm working on my tests right now, and have this annotated sample packet that might help:

const unsigned char good_data_2[] = {
    // TLS record
    0x16, // Content Type: Handshake
    0x03, 0x01, // Version: TLS 1.0
    0x00, 0x6c, // Length (use for bounds checking)
        // Handshake
        0x01, // Handshake Type: Client Hello
        0x00, 0x00, 0x68, // Length (use for bounds checking)
        0x03, 0x03, // Version: TLS 1.2
        // Random (32 bytes fixed length)
        0xb6, 0xb2, 0x6a, 0xfb, 0x55, 0x5e, 0x03, 0xd5,
        0x65, 0xa3, 0x6a, 0xf0, 0x5e, 0xa5, 0x43, 0x02,
        0x93, 0xb9, 0x59, 0xa7, 0x54, 0xc3, 0xdd, 0x78,
        0x57, 0x58, 0x34, 0xc5, 0x82, 0xfd, 0x53, 0xd1,
        0x00, // Session ID Length (skip past this much)
        0x00, 0x04, // Cipher Suites Length (skip past this much)
            0x00, 0x01, // NULL-MD5
            0x00, 0xff, // RENEGOTIATION INFO SCSV
        0x01, // Compression Methods Length (skip past this much)
            0x00, // NULL
        0x00, 0x3b, // Extensions Length (use for bounds checking)
            // Extension
            0x00, 0x00, // Extension Type: Server Name (check extension type)
            0x00, 0x0e, // Length (use for bounds checking)
            0x00, 0x0c, // Server Name Indication Length
                0x00, // Server Name Type: host_name (check server name type)
                0x00, 0x09, // Length (length of your data)
                // "localhost" (data your after)
                0x6c, 0x6f, 0x63, 0x61, 0x6c, 0x68, 0x6f, 0x73, 0x74,
            // Extension
            0x00, 0x0d, // Extension Type: Signature Algorithms (check extension type)
            0x00, 0x20, // Length (skip past since this is the wrong extension)
            // Data
            0x00, 0x1e, 0x06, 0x01, 0x06, 0x02, 0x06, 0x03,
            0x05, 0x01, 0x05, 0x02, 0x05, 0x03, 0x04, 0x01,
            0x04, 0x02, 0x04, 0x03, 0x03, 0x01, 0x03, 0x02,
            0x03, 0x03, 0x02, 0x01, 0x02, 0x02, 0x02, 0x03,
            // Extension
            0x00, 0x0f, // Extension Type: Heart Beat (check extension type)
            0x00, 0x01, // Length (skip past since this is the wrong extension)
            0x01 // Mode: Peer allows to send requests
};

Solution 2

For anyone interested, this is a tentative version of the C/C++ code. It has worked so far. The function returns the position of the server name in a the byte array containing the Client Hello and the length of the name in the len parameter.

char *get_TLS_SNI(unsigned char *bytes, int* len)
{
    unsigned char *curr;
    unsigned char sidlen = bytes[43];
    curr = bytes + 1 + 43 + sidlen;
    unsigned short cslen = ntohs(*(unsigned short*)curr);
    curr += 2 + cslen;
    unsigned char cmplen = *curr;
    curr += 1 + cmplen;
    unsigned char *maxchar = curr + 2 + ntohs(*(unsigned short*)curr);
    curr += 2;
    unsigned short ext_type = 1;
    unsigned short ext_len;
    while(curr < maxchar && ext_type != 0)
    {
        ext_type = ntohs(*(unsigned short*)curr);
        curr += 2;
        ext_len = ntohs(*(unsigned short*)curr);
        curr += 2;
        if(ext_type == 0)
        {
            curr += 3;
            unsigned short namelen = ntohs(*(unsigned short*)curr);
            curr += 2;
            *len = namelen;
            return (char*)curr;
        }
        else curr += ext_len;
    }
    if (curr != maxchar) throw std::exception("incomplete SSL Client Hello");
    return NULL; //SNI was not present
}

Solution 3

I noticed that the domain is always prepend by two zero bytes and one length byte. Maybe it's unsigned 24 bit integer, but I can't test it, as my DNS server won't allow domain names beyond 77 characters.

Upon that knowledge I came up with this (Node.js) code.

function getSNI(buf) {
  var sni = null
    , regex = /^(?:[a-z0-9-]+\.)+[a-z]+$/i;
  for(var b = 0, prev, start, end, str; b < buf.length; b++) {
    if(prev === 0 && buf[b] === 0) {
      start = b + 2;
      end   = start + buf[b + 1];
      if(start < end && end < buf.length) {
        str = buf.toString("utf8", start, end);
        if(regex.test(str)) {
          sni = str;
          continue;
        }
      }
    }
    prev = buf[b];
  }
  return sni;
}

This code looks for a sequence of two zero bytes. If it finds one, it assumes the following byte is a length parameter. It checks if the length is still in the boundary of the buffer and if so reads the byte sequence as UTF-8. Later on, one could RegEx the array and extract the domain.

Works amazingly well! Still, I noticed something odd.

'�\n�\u0014\u0000�\u0000�\u00009\u00008�\u000f�\u0005\u0000�\u00005�\u0007�\t�\u0011�\u0013\u0000E\u0000D\u0000f\u00003\u00002�\f�\u000e�\u0002�\u0004\u0000�\u0000A\u0000\u0005\u0000\u0004\u0000/�\b�\u0012\u0000\u0016\u0000\u0013�\r�\u0003��\u0000\n'
'\u0000\u0015\u0000\u0000\u0012test.cubixcraft.de'
'test.cubixcraft.de'
'\u0000\b\u0000\u0006\u0000\u0017\u0000\u0018\u0000\u0019'
'\u0000\u0005\u0001\u0000\u0000'

Always, no matter what subdomain I choose, the domain is targeted twice. It seems like the SNI field is nested inside another field.

I am open to suggestions and improvements! :)

I turned this into a Node module, for everyone, who cares: sni.

Share:
21,266
buschtoens
Author by

buschtoens

Ember.js expert &amp; Frontend Software Architect at Clark Germany.

Updated on January 17, 2020

Comments

  • buschtoens
    buschtoens over 4 years

    How would you extract the Server Name Indication from a TLS Client Hello message. I'm curently struggling to understand this very cryptic RFC 3546 on TLS Extensions, in which the SNI is defined.

    Things I've understood so far:

    • The host is utf8 encoded and readable when you utf8 enocde the buffer.
    • Theres one byte before the host, that determines it's length.

    If I could find out the exact position of that length byte, extracting the SNI would be pretty simple. But how do I get to that byte in the first place?