How can I move a file pointer to the next line in a file?


A file is a collection of bytes, where the meaning of the bytes depends on the file format.

"Plain text" is a group of many different file formats, with different ways to encode characters (EBCDIC, ASCII, many variations of "extended ASCII", UTF-8, UCS-2, UTF-16, ...) and different ways to represent "end of line" ("\n", "\r\n", "\r").

The first step is to decide whether your software will assume one specific flavor of "plain text" file format (and be broken for everything else, e.g. when someone transfers a file from a different operating system), support multiple file formats with explicit control (e.g. a command line argument so the user can tell it which file format to expect), and/or try to auto-detect the format (e.g. assume UTF-8, which will work for ASCII too, and then auto-detect what "end of line" is, possibly by accepting either "\r" or "\n" and then checking to see if "\n" follows "\r" or if "\r" follows "\n").

The next step is to convert characters from whatever the file format happens to use into some kind of "standard for you" character set (which might or might not be whatever character set the compiler happens to use), while discarding junk (e.g. things like Unicode "byte order marks"), dealing with the possibility of malformed/corrupt data (e.g. a sequence of bytes that is illegal for UTF-8, a byte that is illegal for ASCII, ...) and dealing with unwanted valid characters (NUL, BELL, DELETE, ...).
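As a rough illustration of this step, here is a minimal sketch that assumes the input is meant to be UTF-8 and that your "standard for you" character set is also UTF-8, so no conversion is needed. The names utf8_valid_len(), is_unwanted_control() and clean_utf8() are made up for this example, and silently dropping bad bytes is only one possible policy (you may prefer to report an error instead).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length (1..4) of the well-formed UTF-8 sequence at p, or 0 if malformed
   (bad lead byte, bad continuation, overlong form, surrogate, truncation). */
size_t utf8_valid_len(const unsigned char *p, size_t n)
{
    unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */
    size_t need;                         /* continuation bytes needed */

    if (n == 0) return 0;
    if (p[0] <= 0x7F) return 1;
    if (p[0] >= 0xC2 && p[0] <= 0xDF) {
        need = 1;
    } else if (p[0] >= 0xE0 && p[0] <= 0xEF) {
        need = 2;
        if (p[0] == 0xE0) lo = 0xA0;     /* reject overlong forms */
        if (p[0] == 0xED) hi = 0x9F;     /* reject UTF-16 surrogates */
    } else if (p[0] >= 0xF0 && p[0] <= 0xF4) {
        need = 3;
        if (p[0] == 0xF0) lo = 0x90;     /* reject overlong forms */
        if (p[0] == 0xF4) hi = 0x8F;     /* reject values above U+10FFFF */
    } else {
        return 0;
    }
    if (n < need + 1) return 0;
    if (p[1] < lo || p[1] > hi) return 0;
    for (size_t i = 2; i <= need; i++)
        if (p[i] < 0x80 || p[i] > 0xBF) return 0;
    return need + 1;
}

/* Control characters this sketch chooses to discard (NUL, BELL, DELETE, ...).
   Tab, '\r' and '\n' are kept for the later "end of line" stage. */
int is_unwanted_control(unsigned char c)
{
    return (c < 0x20 && c != '\t' && c != '\r' && c != '\n') || c == 0x7F;
}

/* Copies the acceptable parts of in[0..n) to out (capacity >= n bytes)
   and returns the number of bytes written. */
size_t clean_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    if (n >= 3 && in[0] == 0xEF && in[1] == 0xBB && in[2] == 0xBF)
        i = 3;                           /* skip a leading byte order mark */
    while (i < n) {
        size_t len = utf8_valid_len(in + i, n - i);
        if (len == 0 || (len == 1 && is_unwanted_control(in[i]))) {
            i++;                         /* drop one offending byte */
            continue;
        }
        memcpy(out + o, in + i, len);
        o += len;
        i += len;
    }
    return o;
}
```

This version works on one complete buffer. If you feed it a file in chunks, a multi-byte sequence that is split across two chunks would be treated as malformed, so a streaming version would need to carry the incomplete tail over to the next chunk.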

Immediately after "character set validation, conversion and filtering", you can do "end of line detection" (maybe using a state machine to track "previous character was '\r'" and "previous character was '\n'", and maybe counting white-space characters so that trailing white-space at the end of the line can be removed). Each character is then either stored in the array for later (if it wasn't discarded and wasn't an "end of line") or triggers a call to a "process this line" function (if it was an "end of line"). Also don't forget "end of file": you may reach the end of the file while you're still in the middle of a line (and can handle that by pretending the last line in the file ended with an "end of line" when it didn't).
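The state machine described above could be sketched roughly as follows. This is only an illustration under some assumptions: the names (struct line_splitter, splitter_feed(), splitter_finish()) and the MAX_LINE limit are hypothetical, "\r", "\n", "\r\n" and "\n\r" are all accepted as a single "end of line", and trailing white-space removal is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LINE 256   /* hypothetical limit chosen for this sketch */

enum { FEED_MORE = 0, FEED_LINE = 1, FEED_TOO_LONG = -1 };

struct line_splitter {
    char   buf[MAX_LINE]; /* current line, NUL-terminated when complete */
    size_t len;           /* number of characters stored in buf */
    int    prev_eol;      /* '\r' or '\n' if the previous character ended
                             a line, otherwise 0 */
    int    done;          /* buf holds a line that was already reported */
};

void splitter_init(struct line_splitter *s)
{
    s->len = 0;
    s->prev_eol = 0;
    s->done = 0;
    s->buf[0] = '\0';
}

/* Feed one character. Returns FEED_LINE when s->buf holds a complete line
   (valid until the next call), FEED_MORE otherwise, or FEED_TOO_LONG if
   the character does not fit and was not stored. */
int splitter_feed(struct line_splitter *s, char c)
{
    assert(s != NULL);
    if (s->done) {                  /* start a fresh line */
        s->len = 0;
        s->done = 0;
    }
    if (c == '\r' || c == '\n') {
        if (s->prev_eol != 0 && s->prev_eol != c) {
            s->prev_eol = 0;        /* second half of "\r\n" or "\n\r" */
            return FEED_MORE;
        }
        s->prev_eol = c;
        s->buf[s->len] = '\0';
        s->done = 1;
        return FEED_LINE;
    }
    s->prev_eol = 0;
    if (s->len >= MAX_LINE - 1)
        return FEED_TOO_LONG;
    s->buf[s->len++] = c;
    return FEED_MORE;
}

/* Call once at end of file: reports a final line that had no
   "end of line" after it. */
int splitter_finish(struct line_splitter *s)
{
    assert(s != NULL);
    if (s->done) {
        s->len = 0;
        s->done = 0;
    }
    if (s->len == 0)
        return FEED_MORE;
    s->buf[s->len] = '\0';
    s->done = 1;
    return FEED_LINE;
}
```

The caller feeds every filtered character to splitter_feed() and calls its "process this line" function (manipulate() in the question) whenever FEED_LINE is returned, then calls splitter_finish() once at the end of the file.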

Note that fscanf(inptr, "%c", &in); is a very poor choice here (the library has to parse the format string "%c" on every call, which may end up being where most of your CPU time goes), and fgetc() is a "less awful" alternative. However, all of these functions (fscanf(), fgetc(), fgets(), ...) are of limited use anyway, unless you accept implementation-specific assumptions about which file format "plain text" actually is (and are then broken and wrong for everything else), and calling them once per character tends to be slow. Instead, consider using read() (so that you can process a whole buffer full of bytes and avoid the overhead of C library functions and/or kernel API calls for every single byte), or maybe mmap().
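A minimal sketch of the read() approach is shown below. It assumes a POSIX system (read() is not part of standard C), and the function name count_newlines_fd() and the CHUNK size are made up for this example. Counting '\n' bytes simply stands in for whatever per-byte processing you actually need.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>   /* read(), ssize_t: POSIX, not standard C */

#define CHUNK 4096    /* hypothetical buffer size for this sketch */

/* Reads everything from an open file descriptor in CHUNK-sized pieces
   and counts the '\n' bytes. Returns the count, or -1 on a read error. */
long count_newlines_fd(int fd)
{
    unsigned char buf[CHUNK];
    long count = 0;

    assert(fd >= 0);                 /* caller must pass an open descriptor */
    for (;;) {
        ssize_t got = read(fd, buf, sizeof buf);
        if (got < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            return -1;
        }
        if (got == 0)
            break;                   /* end of file */
        /* One system call per buffer, then a plain loop over the bytes. */
        for (ssize_t i = 0; i < got; i++)
            if (buf[i] == '\n')
                count++;
    }
    return count;
}
```

In a real program the inner loop would hand each byte to the validation and "end of line" stages instead of just counting, and the descriptor would typically come from open("functions.txt", O_RDONLY).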

Finally, you need to make sure that a malicious attacker can't (intentionally) provide a file that has too many characters in a single line. A safety check (e.g. if(i >= MAX) { /* array is full, can't add the next character to the array */ }) is necessary, and can be followed by outputting an error message ("Line too long at line number ...") or by using a dynamically resized array (e.g. use the realloc() function to increase the size of the array).
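The dynamically resized variant might look something like this sketch. The names (struct line_buf, line_buf_push(), line_buf_free()) and the LINE_HARD_LIMIT value are assumptions made for the example. Note that even a growing array should still have some upper limit, otherwise a huge single-line file can exhaust memory.

```c
#include <assert.h>
#include <stdlib.h>

#define LINE_HARD_LIMIT 4096   /* hypothetical cap, including the '\0' */

struct line_buf {
    char  *data;   /* NULL until the first character is pushed */
    size_t len;    /* characters stored, not counting the '\0' */
    size_t cap;    /* bytes currently allocated */
};

/* Appends c and keeps the buffer NUL-terminated.
   Returns 0 on success, -1 if the line is too long or memory ran out;
   on failure the existing contents are left untouched. */
int line_buf_push(struct line_buf *b, char c)
{
    assert(b != NULL);
    if (b->len + 1 >= b->cap) {            /* need room for c and '\0' */
        size_t new_cap = b->cap ? b->cap * 2 : 64;
        char *p;

        if (new_cap > LINE_HARD_LIMIT)
            new_cap = LINE_HARD_LIMIT;
        if (b->len + 1 >= new_cap)
            return -1;                     /* "Line too long" */
        p = realloc(b->data, new_cap);
        if (p == NULL)
            return -1;                     /* old block is still valid */
        b->data = p;
        b->cap = new_cap;
    }
    b->data[b->len++] = c;
    b->data[b->len] = '\0';
    return 0;
}

void line_buf_free(struct line_buf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

Start from a zeroed struct (struct line_buf b = {0};), set b.len = 0 after each line has been processed so the allocation is reused, and call line_buf_free() once at the end. Assigning the result of realloc() to a temporary first avoids leaking the old block when the allocation fails.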

Author: jkm

Updated on September 29, 2020

Comments

  • jkm
    jkm over 3 years

    I am attempting to write a function that reads through a line, saves each character to an array, manipulates that character array, prints the results to another file, then moves on to the next line in the file.

    Some sample input/output would be as follows (the purpose of the program is to find the derivative - but that part of the code is working fine.):

    INPUT:
    x
    4x^4
    91
    sinx
    
    OUTPUT:
    1
    16x^3
    0
    cosx
    

    The function that I have written so far:

    int main(){
    
        FILE *inptr = fopen("functions.txt", "r");
        FILE *outptr = fopen( "derive.txt", "w");
    
        if(inptr)
            derive(inptr,outptr);
    
        return 0;
    }
    
    void derive(FILE *inptr, FILE *outptr){
        int i;
        char in = '0';
        char array[MAX];
    
        while((in = fgetc(inptr)) != EOF){
            for(i = 0; in != '\n'; i++){
                fscanf(inptr, "%c", &in);
                array[i] = in;
            }
            manipulate(array, outptr); // Function that finds the derivative and prints to output file
        }
    }
    

    My question is: How can I move the file pointer inptr to the next line?

  • jpenna
    jpenna about 5 years
    Welcome to Stack Overflow! You should explain your answer better. Even though it may answer the question, you should give more information so the community can benefit more from it. See How to Answer.