Parsing a comma-separated file in C using fscanf()
Solution 1
OP's
fscanf(input_fp, "%30[^ ,\n\t]%30[^ ,\n\t]%30[^ ,\n\t]", ...
does not consume the ',' nor the '\n' in the text file. Subsequent fscanf() attempts therefore also fail and return 0, which, not being EOF, causes an infinite loop.
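To see the stall in isolation, the sketch below applies OP's format to a string with sscanf(), which follows the same matching rules as fscanf(). The helper name fields_converted and the 31-byte buffers are assumptions made for illustration; they are not from OP's code.

```c
#include <stdio.h>

/* Hypothetical helper: applies OP's original format to a string and
 * returns how many of the three fields sscanf() converted. */
int fields_converted(const char *text)
{
    /* Assumed size: the width of 30 plus 1 byte for the '\0'. */
    char name[31], age[31], occupation[31];

    return sscanf(text, "%30[^ ,\n\t]%30[^ ,\n\t]%30[^ ,\n\t]",
                  name, age, occupation);
}
```

Given "John, 14, Student\n" the helper should return 1, because only the first field converts before the ',' stops the second specifier. Given text that starts at the unconsumed ',' it should return 0, which is the value OP's loop keeps seeing and never mistakes for EOF.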
Although OP requested a fscanf() solution, a fgets()/sscanf() approach better handles potential I/O and parsing errors.
FILE *input_fp;   // assumed to be opened for reading elsewhere
FILE *output_fp;  // assumed to be opened for writing elsewhere
char buf[100];
while (fgets(buf, sizeof buf, input_fp) != NULL) {
  char name[30];  // Ensure this size is 1 more than the width in the scanf format.
  char age_array[30];
  char occupation[30];
  #define VFMT " %29[^ ,\n\t]"
  int n;  // Use to check for trailing junk
  if (3 == sscanf(buf, VFMT "," VFMT "," VFMT " %n", name, age_array,
      occupation, &n) && buf[n] == '\0') {
    // Suspect OP really wants this width to be 1 more
    if (fprintf(output_fp, "%-30s%-30s%-30s\n", name, age_array, occupation) < 0)
      break;
  } else
    break;  // format error
}
fclose(input_fp);
fclose(output_fp);
Rather than call ferror(), check the return values of fgets() and fprintf().
I suspect OP's undeclared field buffers were [30], so I adjusted the scanf() width accordingly.
[edit]
Details about if (3 == sscanf(buf, VFMT "," ...
The condition if (3 == sscanf(...) && buf[n] == '\0') { becomes true when:
1) exactly the 3 "%29[^ ,\n\t]" format specifiers each scan in at least 1 char.
2) buf[n] is the end of the string. n is set via the "%n" specifier. The preceding ' ' in " %n" causes any white-space following the last "%29[^ ,\n\t]" to be consumed. sscanf() then sees "%n", which directs it to store the current offset from the beginning of scanning in the int pointed to by &n.
VFMT "," VFMT "," VFMT " %n" is concatenated by the compiler to
" %29[^ ,\n\t], %29[^ ,\n\t], %29[^ ,\n\t] %n".
I find the former easier to maintain than the latter.
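The concatenation of adjacent string literals can be checked directly. This is a small sketch; the function name formats_match is made up for the example.

```c
#include <string.h>

#define VFMT " %29[^ ,\n\t]"

/* Hypothetical check: returns 1 when the format built from adjacent
 * string literals equals the hand-written single literal. */
int formats_match(void)
{
    const char *pieces = VFMT "," VFMT "," VFMT " %n";
    const char *whole  = " %29[^ ,\n\t], %29[^ ,\n\t], %29[^ ,\n\t] %n";

    return strcmp(pieces, whole) == 0;
}
```

Because the pieces are joined at compile time, using the macro costs nothing at run time compared with writing the long literal out by hand.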
The first space in " %29[^ ,\n\t]" directs sscanf() to scan over (consume and not save) 0 or more white-spaces (' ', '\t', '\n', etc.). The rest directs sscanf() to consume and save 1 to 29 char, none of which may be ' ', ',', '\n' or '\t', then append a '\0'.
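The two conditions can be exercised on their own by wrapping the test in a function. The sketch below assumes a hypothetical helper named parse_record and caller-supplied buffers of at least 30 bytes each; only the format and the "%n" check come from the answer.

```c
#include <stdio.h>

#define VFMT " %29[^ ,\n\t]"

/* Hypothetical helper: returns 1 if line holds exactly three
 * comma-separated fields followed only by white-space, else 0.
 * Each destination buffer must have room for at least 30 chars. */
int parse_record(const char *line, char *name, char *age, char *occupation)
{
    int n = 0;  /* offset where scanning stopped, set by "%n" */

    return 3 == sscanf(line, VFMT "," VFMT "," VFMT " %n",
                       name, age, occupation, &n)
           && line[n] == '\0';  /* reject trailing junk */
}
```

A line such as "John, 14, Student\n" should be accepted, while "John, 14, Student, extra\n" should be rejected: three fields convert, but line[n] is then ',' rather than '\0'. A short line such as "John, 14\n" is rejected earlier because sscanf() returns 2.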
Solution 2
You're not skipping the actual commas and spaces between the values.
Once the first %30[^ ,\n\t] specifier has matched, the input probably contains a comma and a space, which aren't matched by the following thing in the format string.
Add a comma and a space to the format string where they are expected in the input:
while(fscanf(input_fp, "%30[^ ,\n\t], %30[^ ,\n\t], %30[^ ,\n\t]", name, age_array, occupation) == 3)
The ", " after the first and second specifiers is the addition; it makes fscanf() skip the comma and space in the input.
Also, your check of fscanf()'s return value is sub-optimal: before relying on the values having been converted, you should check that the return value equals the number of conversions.
Plus, your use of the backslash line-continuation character is completely pointless and should be removed.
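As a sketch of the return-value check, the loop below stops as soon as fscanf() converts fewer than 3 fields. It also puts a space at the start of the format so that the '\n' left from the previous line is skipped, as one of the comments on this question suggests. The helper name count_records and the 31-byte buffers are assumptions for the example, not part of OP's program.

```c
#include <stdio.h>

/* Hypothetical helper: reads records of the form "a, b, c" from fp
 * and returns how many complete records were converted. */
int count_records(FILE *fp)
{
    /* Assumed size: the width of 30 plus 1 byte for the '\0'. */
    char name[31], age[31], occupation[31];
    int count = 0;

    /* The leading space skips the '\n' left over from the previous line.
     * Comparing against 3 ends the loop on both EOF and a format error. */
    while (fscanf(fp, " %30[^ ,\n\t], %30[^ ,\n\t], %30[^ ,\n\t]",
                  name, age, occupation) == 3)
        count++;

    return count;
}
```

Run over the four-line sample file from the question, this should count 4 records, the header line included. Unlike the fgets()/sscanf() approach, it cannot tell a malformed line from the end of the file without extra checks.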
yadav_vi
Updated on June 13, 2022
Comments
-
yadav_vi almost 2 years
I have a file with data something like this -
Name, Age, Occupation
John, 14, Student
George, 14, Student
William, 23, Programmer
Now, I want to read the data such that each value (e.g. Name, Age etc.) is read as a string.
This is my code snippet -
....
if (!(ferror(input_fp) || ferror(output_fp))) {
    while(fscanf(input_fp, "%30[^ ,\n\t]%30[^ ,\n\t]%30[^ ,\n\t]", name, age_array, occupation) != EOF){
        fprintf(stdout, "%-30s%-30s%-30s\n", name, age_array, occupation);
    }
    fclose(input_fp);
    fclose(output_fp);
}
....
However, this goes into an infinite loop giving some random output.
This is how I understand my input conversion specifiers.
%30[^ ,\n\t] -> read a string that is at the maximum 30 characters long and that DOES NOT include either a space, a comma, a newline or a tab character.
And I am reading 3 such strings.
Where am I going wrong?
-
yadav_vi about 10 years
Can you explain how exactly?
-
yadav_vi about 10 years
I get the part that the first string may contain a ',', but how do I skip the 'non-essential character'? Could you explain the part about fscanf()'s return value? In many examples that I have gone through, this is how the end of file has been checked. (By 'non-essential character' I mean any whitespace or comma.)
-
BLUEPIXY about 10 years
It fails at the input of the second line, after the input of the first line has been completed.
-
chux - Reinstate Monica about 10 years
Needs to consume the final '\n' somewhere. Maybe a space before the first 30, as in " %30[^ ,\n\t]...", or "...\n\t]%*c" at the end.
-
yadav_vi about 10 years
Can you explain this statement:
(3 == sscanf(buf, VFMT "," VFMT "," VFMT " %n", name, age_array, occupation, &n) && buf[n] == '\0')
Also, why is there a space inside " %29[^ ,\n\t]"? I am not able to get the last column either.
-
chux - Reinstate Monica about 10 years
@Vishal Yadav See edit. That space consumes the spaces between "," and "Age". The "last column" problem was likely due to a missing s in "%-30s%-30s%-30\n", which is now corrected.