Parse HTTP Request Line In C

Solution 1

A more elegant solution.

#include <stdio.h>
#include <string.h>

int parse(const char* line)
{
    /* Find out where everything is */
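    const char *start_of_path = strchr(line, ' ');
    if (start_of_path == NULL)
        return -1;                /* no space after the method */
    start_of_path++;
    const char *start_of_query = strchr(start_of_path, '?');
    if (start_of_query == NULL)
        return -1;                /* no query string present */
    const char *end_of_query = strchr(start_of_query, ' ');
    if (end_of_query == NULL)
        return -1;                /* no space before the HTTP version */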

    /* Get the right amount of memory (+1 for the null terminator) */
    char path[start_of_query - start_of_path + 1];
    char query[end_of_query - start_of_query + 1];

    /* Copy the strings into our memory */
    strncpy(path, start_of_path,  start_of_query - start_of_path);
    strncpy(query, start_of_query, end_of_query - start_of_query);

    /* Null terminators (because strncpy does not provide them) */
    path[sizeof(path) - 1] = 0;
    query[sizeof(query) - 1] = 0;

    /* Print */
    printf("%s\n", query);
    printf("%s\n", path);
    return 0;
}

int main(void)
{
    parse("GET /path/script.cgi?field1=value1&field2=value2 HTTP/1.1");
    return 0;
}
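
A variant of the same idea writes into caller-supplied buffers instead of printing, and tolerates a request line with no query string. This is only a sketch under stated assumptions, not the answer author's code: the name split_target, its parameters, and its return convention are hypothetical and chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original answer): copies the path and the
 * query (including the leading '?') of a request line into caller buffers.
 * Returns 0 on success, -1 if the line is malformed or a buffer is too small. */
int split_target(const char *line, char *path, size_t path_size,
                 char *query, size_t query_size)
{
    /* The request target sits between the first and second space. */
    const char *start = strchr(line, ' ');
    if (start == NULL)
        return -1;
    start++;
    const char *end = strchr(start, ' ');
    if (end == NULL)
        return -1;

    /* The query starts at the first '?' inside the target, if there is one. */
    const char *qmark = memchr(start, '?', (size_t)(end - start));
    if (qmark == NULL)
        qmark = end;              /* no query: it becomes the empty string */

    size_t path_len = (size_t)(qmark - start);
    size_t query_len = (size_t)(end - qmark);
    if (path_len >= path_size || query_len >= query_size)
        return -1;                /* not enough room for the terminator */

    memcpy(path, start, path_len);
    path[path_len] = '\0';
    memcpy(query, qmark, query_len);
    query[query_len] = '\0';
    return 0;
}
```

Called with the request line used in main above and two 64-byte buffers, it fills path with /path/script.cgi and query with ?field1=value1&field2=value2. Fixed-size caller buffers avoid the variable-length arrays, at the cost of rejecting targets that do not fit.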

Solution 2

I wrote some functions in C a while back that manually parse C strings up to a delimiter, similar to std::getline in C++.

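#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// Trims leading and trailing whitespace and collapses consecutive whitespace from the provided cstring into destination char*. WARNING: ensure 0 < size <= sizeof(destination)
void Trim(char* destination, char* source, int size)
{
    bool trim = true;
    int index = 0;
    int i;
    for (i = 0; i < size; ++i)
    {
        if (source[i] == '\n' || source[i] == '\0')
            break;
        else if (source[i] != ' ' && source[i] != '\t')
        {
            destination[index++] = source[i];
            trim = false;
        }
        else if (trim)
            continue;
        else
        {
            if (index > 0 && destination[index - 1] != ' ')
                destination[index++] = ' ';
        }
    }

    // Drop the single trailing space the collapsing step may leave behind
    if (index > 0 && destination[index - 1] == ' ')
        --index;
    // Always terminate, truncating if the input filled the whole buffer
    if (index >= size)
        index = size - 1;
    destination[index] = '\0';
}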

// Parses text up to the provided delimiter (or a space, newline, or end of string) into the destination char*. WARNING: ensure size <= sizeof(destination)
void ParseUpToSymbol(char* destination, char* source, int size, char delimiter)
{
    int index = 0;
    int i;
    for (i = 0; i < size - 1; ++i)
    {
        if (source[i] != delimiter && source[i] != '\n' && source[i] != '\0' && source[i] != ' ')
        {
            destination[index++] = source[i];
        }
        else
        {
            break;
        }
    }
    // Terminate even if no delimiter was found within size - 1 characters
    destination[index] = '\0';

    Trim(destination, destination, size);
}

Then you could parse your c-string with something along these lines:

char* buffer = (char*)malloc(64);
char* temp = (char*)malloc(256);
strcpy(temp, "GET /path/script.cgi?field1=value1&field2=value2 HTTP/1.1");
Trim(temp, temp, 256);
ParseUpToSymbol(buffer, temp, 64, '?');
temp = temp + strlen(buffer) + 1;
Trim(temp, temp, 256);

The code above trims any leading and trailing whitespace from the target string, in this case "GET /path/script.cgi?field1=value1&field2=value2 HTTP/1.1", and then stores the first parsed token in the variable buffer. Because ParseUpToSymbol also stops at a space, running this the first time should put the word "GET" inside of buffer. The line "temp = temp + strlen(buffer) + 1" advances the temp char-pointer past that token and its delimiter, so you can call ParseUpToSymbol again with the remaining part of the string. If you were to call it again, you should get the absolute path leading up to the first question mark. From there you could repeat the call with '&' as the delimiter to get each individual field, or change the delimiter to a space and get the entire query string portion of the URL. Keep a copy of the pointer that malloc returned if you intend to free it later, because temp no longer points to the start of the allocation. I think you get the idea. This is just one of many solutions, of course.
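
The repeated-call idea described above can also be sketched with the standard strcspn function in place of the hand-written helpers. The name next_field and its signature are hypothetical, introduced only for illustration; they are not part of the answer's code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies characters from *cursor into dest until one of
 * the characters in delims (or the end of the string) is reached, then
 * advances *cursor past that delimiter. Returns the number of characters
 * copied, or -1 if dest is too small (in which case *cursor is unchanged). */
int next_field(const char **cursor, const char *delims,
               char *dest, size_t dest_size)
{
    /* strcspn gives the length of the run that contains no delimiter. */
    size_t len = strcspn(*cursor, delims);
    if (len >= dest_size)
        return -1;

    memcpy(dest, *cursor, len);
    dest[len] = '\0';

    *cursor += len;
    if (**cursor != '\0')
        (*cursor)++;              /* skip the delimiter itself */
    return (int)len;
}
```

With a cursor pointing at the example request line, calling it with the delimiter sets " ", then "? ", then " " yields GET, /path/script.cgi, and field1=value1&field2=value2 in turn. The sketch does not report which delimiter it stopped at, so a line without a '?' needs an extra check by the caller before the third call.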

Author: Ryan

Updated on June 04, 2022

Comments

  • Ryan
    Ryan almost 2 years

    This is the problem that will never end. The task is to parse a request line in a web server -- of indeterminate length -- in C. I pulled the following off of the web as an example with which to work.

    GET /path/script.cgi?field1=value1&field2=value2 HTTP/1.1
    

    I must extract the absolute path: /path/script.cgi and the query: ?field1=value1&field2=value2. I'm told the following functions hold the key: strchr, strcpy, strncmp, strncpy, and/or strstr.

    Here's what has happened so far: I've learned that using functions like strchr and strstr will absolutely allow me to truncate the request line at certain points, but will never allow me to get rid of portions of the request line I do not want, and it doesn't matter how I layer them.

    For example, here's some code that gets me close to isolating the query, but I can't eliminate the HTTP version.

    bool parse(const char* line)
    {
        // request line w/o method
        const char ch = '/';
        char* lineptr = strchr(line, ch);
    
        // request line w/ query and HTTP version
        char ch_1 = '?';
        char* lineptr_1 = strchr(lineptr, ch_1);
    
        // request line w/o query
        char ch_2 = ' ';
        char* lineptr_2 = strchr(lineptr_1, ch_2);
    
        printf("%s\n", lineptr_2);
    
        if (lineptr_2 != NULL)
            return true;
        else
            return false;
    }
    

    Needless to say, I have a similar issue trying to isolate the absolute path (I can ditch the method, but not the ? or anything thereafter), and I see no occasion on which I can use the functions that require me to know a priori how many chars I'd like to copy from one location (usually an array) to another because, when this is run in real time, I will have no clue what the request line will look like in advance. If someone sees something that I am missing and could point me in the right direction, I would be most grateful!

    • h0r53
      h0r53 over 7 years
      That's the beauty of programming, especially in low-level languages such as C. If something you need doesn't exist, you can create it!
  • Ryan
    Ryan over 7 years
    Interesting. There's a lot to unpack here. It looks like you're performing pointer subtraction. I wasn't aware you could do that in C. Should I assume the char array lengths of 60 are arbitrary? Or is there a convention I am unaware of?
  • Dellowar
    Dellowar over 7 years
    I should have added more commentary, my bad. The 60 was arbitrary. You can actually get the right amount of memory by also using pointer subtraction... let me make some edits.
  • h0r53
    h0r53 over 7 years
    To add to this conversation, you can certainly perform pointer arithmetic in C. Each pointer is essentially a memory address, and a pointer into an array references sequential data, so adding 1 to it moves the address forward by the size of the type referenced by the pointer. That being said, only subtract or compare pointers that have the same type and point into the same array; anything else is undefined behavior. The compiler scales integer offsets such as 1, 2, or 3 by the element size for you, so you need not worry about things such as "line+4".
  • h0r53
    h0r53 over 7 years
    I think I should also note that it isn't guaranteed that the start_of_path will be line + 4 (what if someone is performing an HTTP POST?).
  • Ryan
    Ryan over 7 years
    Good point. For my purposes, it will always be GET. Some of the code not shown here checks for things like that and rejects all POST requests. In real life, however, it would have to handle both cases.
  • Dellowar
    Dellowar over 7 years
    @RyanD. just added POST compatibility.
  • h0r53
    h0r53 over 7 years
    @SanchkeDellowar a nice solution you have provided. I like how you referenced a known library with built in capacity to perform the task while my answer shows how someone could write the functions to do it themselves. Nice variety.
  • Ryan
    Ryan over 7 years
    This is really good stuff and both you and @CaitLANJenner gave me quite a bit to think about. Nice change from banging my head against a wall for the last several days and getting nowhere. Thanks so much!!