Parsing a comma separated file using C using fscanf()


Solution 1

OP's

fscanf(input_fp, "%30[^ ,\n\t]%30[^ ,\n\t]%30[^ ,\n\t]", ...

does not consume the ',' or the '\n' in the text file. Once the first field has been read, every subsequent fscanf() attempt stops at the unread ',' and returns 0, which, not being EOF, causes an infinite loop.
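The return values can be observed directly with sscanf(), which follows the same matching rules as fscanf(). This is a minimal sketch: op_fields_converted() is a hypothetical helper, not part of OP's code, and the buffers are sized 31 only so that the width of 30 cannot overflow them.

```c
#include <stdio.h>

/* The OP's format: three scansets with nothing between them. */
#define OP_FMT "%30[^ ,\n\t]%30[^ ,\n\t]%30[^ ,\n\t]"

/* Hypothetical helper: runs the OP's format against a string and
   returns the number of fields sscanf() reports as converted. */
int op_fields_converted(const char *text) {
  char a[31], b[31], c[31];  /* width 30 plus room for the '\0' */
  return sscanf(text, OP_FMT, a, b, c);
}
```

With "John, 14, Student\n" the helper returns 1: the first scanset stores "John" and the second stops immediately at the ','. With ", 14, Student\n", which is what a stream would still hold after that first call, it returns 0. Neither value is EOF, so a loop that tests only != EOF keeps running.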


Although OP requested an fscanf() solution, an fgets()/sscanf() approach handles potential I/O and parsing errors better.

FILE *input_fp;
FILE *output_fp;
char buf[100];
while (fgets(buf, sizeof buf, input_fp) != NULL) {
  char name[30];  // Ensure this size is 1 more than the width in the sscanf() format.
  char age_array[30];
  char occupation[30];
  #define VFMT " %29[^ ,\n\t]"
  int n;  // Use to check for trailing junk

  if (3 == sscanf(buf, VFMT "," VFMT "," VFMT " %n", name, age_array,
      occupation, &n) && buf[n] == '\0') {
    // Suspect OP really wants this width to be 1 more
    if (fprintf(output_fp, "%-30s%-30s%-30s\n", name, age_array, occupation) < 0)
      break;
  } else
    break;  // format error
}
fclose(input_fp);
fclose(output_fp);

Rather than calling ferror(), check the return values of fgets() and fprintf().
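As a rough illustration of that advice, the sketch below copies text from one stream to another and relies only on the return values of fgets() and fprintf(). The name copy_lines() and the 100-byte buffer are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: copies text from in to out, one fgets() chunk
   at a time. Returns the number of chunks copied (one per line when
   every line fits in buf), or -1 if a write fails. */
long copy_lines(FILE *in, FILE *out) {
  char buf[100];
  long chunks = 0;
  while (fgets(buf, sizeof buf, in) != NULL) {  /* NULL: end of file or read error */
    if (fprintf(out, "%s", buf) < 0)            /* negative: write error */
      return -1;
    chunks++;
  }
  return chunks;
}
```

The loop stops as soon as either call reports a problem, so no separate ferror() test is needed before reading or writing.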

I suspect OP's undeclared field buffers were [30], so I adjusted the sscanf() widths to 29 accordingly.


[edit]

Details about if (3 == sscanf(buf, VFMT "," ...

The if (3 == sscanf(...) && buf[n] == '\0') { condition is true when:
1) each of the 3 "%29[^ ,\n\t]" format specifiers scans in at least 1 char, so sscanf() returns exactly 3.
2) buf[n] is the end of the string. n is set via the "%n" specifier. The preceding ' ' in " %n" causes any white-space following the last "%29[^ ,\n\t]" to be consumed. When sscanf() sees "%n", it stores the current offset from the beginning of the scan in the int pointed to by &n.

VFMT "," VFMT "," VFMT " %n" is concatenated by the compiler to
" %29[^ ,\n\t], %29[^ ,\n\t], %29[^ ,\n\t] %n".
I find the former easier to maintain than the latter.

The first space in " %29[^ ,\n\t]" directs sscanf() to scan over (consume and not save) 0 or more white-space characters (' ', '\t', '\n', etc.). The rest directs sscanf() to consume and save 1 to 29 chars, none of which is ' ', ',', '\n' or '\t', then append a '\0'.
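The two conditions above can be wrapped in a small function to try them on individual lines. This is only a sketch: parse_record() is a hypothetical name, and it assumes each destination buffer holds at least 30 chars, matching the width of 29.

```c
#include <stdio.h>

#define VFMT " %29[^ ,\n\t]"

/* Hypothetical helper: returns 1 if line holds exactly three
   comma-separated fields and nothing else, 0 otherwise.
   Each destination buffer must hold at least 30 chars. */
int parse_record(const char *line, char *name, char *age, char *occupation) {
  int n = 0;  /* offset stored by %n once the three fields have matched */
  int fields = sscanf(line, VFMT "," VFMT "," VFMT " %n",
                      name, age, occupation, &n);
  /* line[n] == '\0' rejects trailing junk after the third field */
  return fields == 3 && line[n] == '\0';
}
```

A line such as "John, 14, Student\n" is accepted. "John, 14, Student, extra\n" is rejected because line[n] is ',' rather than '\0', and "John, 14\n" is rejected because only 2 fields convert.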

Solution 2

You're not skipping the actual commas and spaces between the values.

Once the first %30[^ ,\n\t] specifier has matched, the input probably contains a comma and a space, which aren't matched by the following thing in the format string.

Add comma and space to the formatting string where expected in the input:

while(fscanf(input_fp, "%30[^ ,\n\t], %30[^ ,\n\t], %30[^ ,\n\t]", name, age_array, occupation) == 3)
                                    ^             ^
                                    |             |
                                    \             /
                                   add these to make
                                   fscanf() skip them
                                      in the input!

Also, your check of fscanf()'s return value is sub-optimal: before relying on the values to have been converted, you should check that the return value equals the number of conversions.

Plus, your use of the backslash line-continuation character is completely pointless and should be removed.
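Putting the points above together, a loop that compares the return value with the number of conversions might look like the sketch below. Two details go beyond the line shown earlier and are assumptions on my part: a leading space in the format, so that the '\n' left over from the previous record is skipped, and a width of 29, on the assumption that the buffers are 30 chars each. count_records() is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: reads "a, b, c" records from fp with fscanf()
   and returns how many complete records were converted. */
int count_records(FILE *fp) {
  char name[30], age_array[30], occupation[30];
  int count = 0;
  /* The leading space skips the '\n' left over from the previous record;
     width 29 leaves room for the '\0' in each 30-char buffer. */
  while (fscanf(fp, " %29[^ ,\n\t], %29[^ ,\n\t], %29[^ ,\n\t]",
                name, age_array, occupation) == 3) {
    count++;
  }
  return count;
}
```

Comparing against 3 ends the loop both at end of file, where fscanf() returns EOF, and on a malformed record, where it returns 0, 1 or 2.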

Author: yadav_vi


Updated on June 13, 2022

Comments

  • yadav_vi
    yadav_vi almost 2 years

    I have a file with data something like this -

    Name, Age, Occupation
    John, 14, Student
    George, 14, Student
    William, 23, Programmer
    

    Now, I want to read the data such that each value (e.g. Name, Age etc.) is read as a string.
    This is my code snippet -

    ....
    if (!(ferror(input_fp) || ferror(output_fp))) {
        while(fscanf(input_fp, "%30[^ ,\n\t]%30[^ ,\n\t]%30[^ ,\n\t]", 
                    name, age_array, occupation) != EOF){
            fprintf(stdout, "%-30s%-30s%-30s\n", name, age_array, occupation);
        }
        fclose(input_fp);
        fclose(output_fp);
    }
    ....
    

    However, this goes into an infinite loop giving some random output.
    This is how I understand my input conversion specifiers.
    %30[^ ,\n\t] -> read a string that is at most 30 characters long and that
    DOES NOT include a space, a comma, a newline or a tab character.
    And I am reading 3 such strings.
    Where am I going wrong?

  • yadav_vi
    yadav_vi about 10 years
    Can you explain how exactly?
  • yadav_vi
    yadav_vi about 10 years
    I get the part that the first string may contain a ',', but how do I skip the 'non-essential character'? Could you explain the part about fscanf()'s return value? In many examples that I have gone through, this is how the end of file has been checked. (By 'non-essential character' I mean any whitespace or comma.)
  • BLUEPIXY
    BLUEPIXY about 10 years
    It fails on the input of the second line, after the input of the first line has been completed.
  • chux - Reinstate Monica
    chux - Reinstate Monica about 10 years
    The format needs to consume the final '\n' somewhere. Maybe add a space before the first %30, as in " %30[^ ,\n\t]...", or add "%*c" at the end, as in "...\n\t]%*c".
  • yadav_vi
    yadav_vi about 10 years
    Can you explain this statement - (3 == sscanf(buf, VFMT "," VFMT "," VFMT " %n", name, age_array, occupation, &n) && buf[n] == '\0'). Also, why is there a space inside " %29[^ ,\n\t]". I am not able to get the last column too.
  • chux - Reinstate Monica
    chux - Reinstate Monica about 10 years
    @Vishal Yadav See edit. That space consumes the spaces between "," and "Age". The "the last column" problem was likely a missing s in "%-30s%-30s%-30\n", which is now corrected.