Parsing text in C

19,959

Solution 1

Edit: You can use pNum-buf to get the length of the alphabetical part of the string, and use strncpy() to copy that into another buffer. Be sure to add a '\0' to the end of the destination buffer. I would insert this code before the pNum++.

// pNum still points at the last space, so len is the length of the words
size_t len = (size_t)(pNum - buf);
strncpy(newBuf, buf, len);
newBuf[len] = '\0';

You could read the entire line into a buffer and then use:

char *pNum;
if ((pNum = strrchr(buf, ' ')) != NULL) {
  pNum++;
}

to get a pointer to the number field.
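Putting the two pieces of this answer together, a minimal sketch might look like the following. The struct entry type, its field sizes, and the split_line() name are assumptions made for illustration; the question only says the words and the number go into "a structure".

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type: one field for the words, one for the number. */
struct entry {
    char words[128];
    int  number;
};

/* Splits e.g. "more words 21" into words = "more words" and number = 21.
   Returns 0 on success, -1 if the line has no space or the words do not fit. */
int split_line(const char *buf, struct entry *out)
{
    const char *pNum = strrchr(buf, ' ');   /* last space in the line */
    if (pNum == NULL)
        return -1;

    size_t len = (size_t)(pNum - buf);      /* length of the alphabetical part */
    if (len >= sizeof out->words)
        return -1;

    strncpy(out->words, buf, len);          /* copies len chars, no terminator */
    out->words[len] = '\0';

    out->number = atoi(pNum + 1);           /* the digits start after the space */
    return 0;
}
```

Because atoi() stops at the first non-digit, a trailing newline left by fgets() does not affect the number.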

Solution 2

fscanf(file, "%s %d", word, &value);

This gets the values directly into a string and an integer, and copes with variations in whitespace and numerical formats, etc.
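To see what this format string does and does not match, here is a small sketch that uses sscanf() on a line already in memory instead of fscanf() on a file. The scan_pair() name and the 64-byte buffer are assumptions for illustration; the 63 width is added so that %s cannot overflow that buffer.

```c
#include <stdio.h>

/* Returns the number of fields sscanf() converted: 2 means both the
   word and the number were read. */
int scan_pair(const char *line, char word[64], int *value)
{
    return sscanf(line, "%63s %d", word, value);
}
```

For a single-word line such as "words   13" this returns 2, whatever the amount of whitespace. For "more words 21" it returns 1, because %s stops at the first whitespace and %d then fails on "words", so checking the return value is what reveals the mismatch.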

Edit

Oops, I forgot that you had spaces between the words. In that case, I'd do the following. (Note that it truncates the original text in 'line'.)

// Scan to find the last space in the line
char *p = line;
char *lastSpace = NULL;
while (*p != '\0')
{
    if (*p == ' ')
        lastSpace = p;
    p++;
}

// No space at all means there is no number field to split off
if (lastSpace == NULL)
    return("parse error");

// Replace the last space in the line with a NUL
*lastSpace = '\0';

// Advance past the NUL to the first character of the number field
lastSpace++;

char *word = line;   // 'line' now ends where the last space used to be
int number = atoi(lastSpace);

You can also solve this with standard library functions such as strrchr(), which finds the last space directly. The loop above does the same job in a single pass over the line, so it should be at least as cheap.
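For comparison, here is a sketch of the standard-library variant. It swaps the hand-written loop for strrchr() and atoi() for strtol(), so that a malformed number field is reported instead of silently read as 0. The parse_line() name and its signature are assumptions for illustration, not part of the original answer.

```c
#include <stdlib.h>
#include <string.h>

/* Splits `line` in place at its last space. On success *word points at the
   text before that space, *number holds the parsed value, and 0 is returned.
   Returns -1 if there is no space or the number field does not parse as a
   whole integer. */
int parse_line(char *line, char **word, long *number)
{
    line[strcspn(line, "\r\n")] = '\0';     /* drop a trailing newline from fgets() */

    char *lastSpace = strrchr(line, ' ');
    if (lastSpace == NULL)
        return -1;

    char *end;
    long value = strtol(lastSpace + 1, &end, 10);
    if (end == lastSpace + 1 || *end != '\0')
        return -1;                          /* empty field or trailing junk */

    *lastSpace = '\0';                      /* terminate the words */
    *word = line;
    *number = value;
    return 0;
}
```

As with the loop version, the caller's buffer is modified, so `line` must be writable (an array filled by fgets(), not a string literal).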

Author by

Admin

Updated on June 11, 2022

Comments

  • Admin
    Admin almost 2 years

    I have a file like this:

    ...
    words 13
    more words 21
    even more words 4
    ...
    

    (General format is a string of non-digits, then a space, then any number of digits and a newline)

    and I'd like to parse every line, putting the words into one field of the structure, and the number into the other. Right now I am using an ugly hack of reading the line while the chars are not numbers, then reading the rest. I believe there's a clearer way.

  • p4bl0
    p4bl0 over 14 years
    That's what I was writing, thanks to Stack Overflow's orange ajaxy alert :-)
  • Rob Jones
    Rob Jones over 14 years
    Heh, I'm usually on the other side of the alert too.
  • Admin
    Admin over 14 years
    That works, but what about the alphabetical part? How do I copy it up to the last space?
  • E.M.
    E.M. over 14 years
    Just looking at the first character of the token isn't a very robust check. I wouldn't trust data from a file that much.
  • Amber
    Amber over 14 years
    Depends on the source of the file. If these are internal files generated by the application (or pre-existing files for which the format is strict and already known), then it's quite possible that a robust check isn't needed.
  • Rob Jones
    Rob Jones over 14 years
    The %s will only match up to the next whitespace character.
  • Jason Williams
    Jason Williams over 14 years
    Duh, I read the example, then read the format description below it and forgot that the format could have multiple spaces. (blush!)
  • Jonathan Leffler
    Jonathan Leffler over 14 years
    Generally, strtok() is not a particularly good way to go about things. Doubly not in a threaded program. Also, if the required storage is 'string possibly containing spaces' plus number, strtok is likely to break things up into too many parts.