Parsing text in C
Solution 1
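Edit: You can use pNum - buf to get the length of the alphabetical part of the string, and use strncpy() to copy that into another buffer. Be sure to add a '\0' to the end of the destination buffer, since strncpy() does not add one when the source is longer than the count, and make sure newBuf has room for at least len + 1 characters. I would insert this code before the pNum++, while pNum still points at the space.
int len = pNum - buf;
strncpy(newBuf, buf, len);
newBuf[len] = '\0';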
You could read the entire line into a buffer and then use:
char *pNum;
if ((pNum = strrchr(buf, ' ')) != NULL) {
    pNum++;
}
to get a pointer to the number field.
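For what it's worth, here is a minimal sketch that puts those pieces together for one line. The struct entry type, the parse_line() name and the 256-byte field size are assumptions made for illustration, not something from the question. It also uses memcpy() for the copy and strtol() for the number, which is one option among several.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type; the field names and sizes are assumptions. */
struct entry {
    char words[256];
    long number;
};

/* Split a line such as "even more words 4" at its last space.
 * Returns 0 on success, or -1 if there is no space, the text does
 * not fit in the words field, or no digits follow the last space. */
int parse_line(const char *buf, struct entry *out)
{
    const char *pNum = strrchr(buf, ' ');
    if (pNum == NULL)
        return -1;

    /* Length of the alphabetical part, not counting the space. */
    size_t len = (size_t)(pNum - buf);
    if (len >= sizeof out->words)
        return -1;

    memcpy(out->words, buf, len);
    out->words[len] = '\0';

    /* strtol() stops at the first non-digit, so a trailing newline
     * left by fgets() is harmless. */
    char *end;
    out->number = strtol(pNum + 1, &end, 10);
    if (end == pNum + 1)
        return -1; /* nothing numeric after the space */

    return 0;
}
```

You would call parse_line() on each buffer filled by fgets() and skip or report any line for which it returns -1.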
Solution 2
fscanf(file, "%s %d", word, &value);
This gets the values directly into a string and an integer, and copes with variations in whitespace and numerical formats, etc.
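To see how that format string behaves on different inputs, here is a small sketch using sscanf() on a line already in memory. The count_fields() helper and the 64-byte buffer are made up for this illustration, and the %63s width is added only to keep the copy inside that buffer.

```c
#include <stdio.h>

/* Hypothetical helper: returns how many fields "%s %d" managed to
 * convert from one line (2 means both the word and the number). */
int count_fields(const char *line)
{
    char word[64];
    int value;

    /* %63s reads one whitespace-delimited token of at most 63 chars;
     * %d then skips whitespace and expects an integer. */
    return sscanf(line, "%63s %d", word, &value);
}
```

With a single word such as "words 13" both conversions succeed. With "more words 21" the %s stops after "more", and the %d then fails on "words", so only one field is converted.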
Edit
Oops, I forgot that you had spaces between the words. In that case, I'd do the following. (Note that it truncates the original text in 'line'.)
// Scan to find the last space in the line
char *p = line;
char *lastSpace = NULL;
while (*p != '\0')
{
    if (*p == ' ')
        lastSpace = p;
    p++;
}
if (lastSpace == NULL)
    return "parse error";
// Replace the last space in the line with a NUL
*lastSpace = '\0';
// Advance past the NUL to the first character of the number field
lastSpace++;
// 'line' now holds only the words; atoi() needs <stdlib.h>
char *word = line;
int number = atoi(lastSpace);
You can also solve this using standard library string functions, but the above is likely to be more efficient as you're only searching for the characters you are interested in.
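Wrapped up as a function, the same idea might look like the sketch below. The split_at_last_space() name and the choice to return a pointer to the number text are assumptions for illustration.

```c
#include <stddef.h>

/* Cut `line` in place at its last space.
 * On success the words remain in `line` and the return value points
 * at the text after the space. Returns NULL if there is no space,
 * in which case `line` is left unchanged. */
char *split_at_last_space(char *line)
{
    char *lastSpace = NULL;

    /* Remember the most recent space seen while walking the string. */
    for (char *p = line; *p != '\0'; p++) {
        if (*p == ' ')
            lastSpace = p;
    }
    if (lastSpace == NULL)
        return NULL;

    /* Overwrite the space with a NUL so `line` ends after the words. */
    *lastSpace = '\0';
    return lastSpace + 1;
}
```

The caller can then pass the returned pointer to atoi() or strtol(). Because the buffer is modified, it must be writable, so a string literal will not do.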
Admin
Updated on June 11, 2022
Comments
-
Admin almost 2 years
I have a file like this:
...
words 13
more words 21
even more words 4
...
(General format is a string of non-digits, then a space, then any number of digits and a newline)
and I'd like to parse every line, putting the words into one field of the structure, and the number into the other. Right now I am using an ugly hack of reading the line while the chars are not numbers, then reading the rest. I believe there's a clearer way.
-
p4bl0 over 14 years
That's what I was writing, thanks to Stack Overflow's orange ajaxy alert :-)
-
Rob Jones over 14 years
Heh, I'm usually on the other side of the alert too.
-
Admin over 14 years
That works, but what about the alphabetical part? How do I copy it up to the last space?
-
E.M. over 14 years
Just looking at the first character of the token isn't a very robust check. I wouldn't trust data from a file that much.
-
Amber over 14 years
Depends on the source of the file. If these are internal files generated by the application (or pre-existing files for which the format is strict and already known), then it's quite possible that a robust check isn't needed.
-
Rob Jones over 14 years
The %s will only match up to the next whitespace character.
-
Jason Williams over 14 years
Duh, I read the example, then read the format description below it and forgot that the format could have multiple spaces. (blush!)
-
Jonathan Leffler over 14 years
Generally, strtok() is not a particularly good way to go about things. Doubly not in a threaded program. Also, if the required storage is 'string possibly containing spaces' plus number, strtok() is likely to break things up into too many parts.