Using fread() to read text file into a buffer - why are the values in the buffer not each character's respective ASCII value?


Solution 1

The behaviour is not surprising:

  • You have a file containing 11 characters. sizeof(char) is 1.
  • You allocate an array of 11 ints. sizeof(int) is very likely 4 on your machine.
  • You instruct fread to read up to 11 ints (up to 44 bytes). So the first 4 characters are read as one int and stored in array[0], and the next 4 in array[1].
    • If you had checked the return value of fread, it would have told you that it read only 2 complete elements: the content is 11 bytes, so 2 whole ints fit and the remaining 3 bytes do not make up a full int. Those 3 bytes can still land in array[2] (on a little-endian machine, 6581362 is 'r', 'l', 'd' packed into the low bytes), but the C standard says the value of a partially read element is indeterminate.
  • You then loop over the array and print each int, which is the number built from 4 consecutive characters.
  • In your alternative version you treat the memory as a sequence of chars, so the index advances in 1-byte steps.
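
To see the return-value point in isolation, here is a minimal sketch. The helper name items_read is hypothetical (it is not from the question); it writes some text to a temporary file and reports how many complete elements fread returns for a given element size.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper for illustration: writes `text` to a temporary file,
 * reads it back with fread using elements of `elem_size` bytes, and returns
 * the number of COMPLETE elements fread reports (0 if setup fails). */
size_t items_read(const char *text, size_t text_len, size_t elem_size)
{
    assert(text != NULL && elem_size > 0);

    FILE *fp = tmpfile();
    if (fp == NULL)
        return 0;

    fwrite(text, 1, text_len, fp);
    rewind(fp);

    /* Room for text_len elements, mirroring the over-sized request
     * made in the question. */
    void *buf = calloc(text_len, elem_size);
    size_t got = 0;
    if (buf != NULL)
        got = fread(buf, elem_size, text_len, fp);

    free(buf);
    fclose(fp);
    return got;
}
```

Called as items_read("Hello World", 11, 4), this should report 2 complete elements; with an element size of 1 it should report 11.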

The memory layout basically looks like this:

array[0]
|       array[1]
|       |
1 2 3 4 5 6 7 8 9 10 11
| |
| ((char *)array)[1]
((char *)array)[0]
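
The numbers in the question's output can be reproduced by packing four characters into one 32-bit value. The sketch below assumes a 4-byte int and little-endian byte order (typical for x86); the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterprets the first four bytes at `bytes` as one 32-bit integer in the
 * machine's native byte order, which is effectively what reading array[i]
 * does after fread has filled the buffer. */
uint32_t pack_native(const char *bytes)
{
    uint32_t value;
    assert(bytes != NULL);
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Builds the same value explicitly in little-endian order, independent of
 * the host: the first byte in memory becomes the lowest 8 bits. */
uint32_t pack_little_endian(const char *bytes)
{
    const unsigned char *b = (const unsigned char *)bytes;
    assert(bytes != NULL);
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

pack_little_endian("Hell") gives 1819043144 and pack_little_endian("o Wo") gives 1867980911, matching array[0] and array[1] in the question. On a little-endian machine pack_native returns the same values.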

Solution 2

ftell returns the current value of the stream's position indicator. After seeking to the end of a file opened in binary mode, that value is the file's size in bytes.

You then read the file as a sequence of 4-byte ints, so only the first few elements are filled; the later ones are never written and happen to print as 0. In other words, you are asking for 4 x size bytes from a file that holds only size bytes.

Your array should be of type char.

Something like

char* array = malloc(sizeOfFile * sizeof(char));
if(array == NULL) {
  ...
}

fread(array, sizeof(char), sizeOfFile, filePointer);
// ..

This is just the idea, not complete code. Hope this helps.
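
A fuller version of that idea might look like the sketch below. The function name read_file and its signature are assumptions made for this example, not part of the original answer. It reads with an element size of 1, uses the count returned by fread as the real length, and adds a terminating 0 byte so the buffer can also be printed as a string.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at `path` into a newly allocated
 * buffer and stores the number of bytes actually read in *out_len.
 * Returns NULL on failure. The caller must free() the result. */
unsigned char *read_file(const char *path, size_t *out_len)
{
    assert(path != NULL && out_len != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* Determine the size in bytes via fseek/ftell, as in the question. */
    long size = -1;
    if (fseek(fp, 0, SEEK_END) == 0)
        size = ftell(fp);
    if (size < 0 || fseek(fp, 0, SEEK_SET) != 0) {
        fclose(fp);
        return NULL;
    }

    unsigned char *buf = malloc((size_t)size + 1); /* +1 for the terminator */
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* Element size 1: the return value is the number of bytes read. */
    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);

    buf[got] = '\0'; /* terminate at what was actually read */
    *out_len = got;
    return buf;
}
```

For the question's Hello.txt, the returned length should be 11 and data[0] should be 72 ('H').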

Author: user2809475

Updated on September 26, 2020

Comments

  • user2809475
    user2809475 over 3 years

    First off, this isn't homework. Just trying to understand why I'm seeing what I'm seeing on my screen.

    The stuff below (my own work) currently takes an input file and reads it as a binary file. I want it to store each byte read in an array (for later use). For the sake of brevity the input file (Hello.txt) just contains 'Hello World', without the apostrophes.

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
    
        FILE *input;
        int i, size;
        int *array;
    
        input = fopen("Hello.txt", "rb");
        if (input == NULL) {
            perror("Invalid file specified.");
            exit(-1);
        }
    
        fseek(input, 0, SEEK_END);
        size = ftell(input);
        fseek(input, 0, SEEK_SET);
    
        array = (int*) malloc(size * sizeof(int));
        if (array == NULL) {
            perror("Could not allocate array.");
            exit(-1);
        }
        else {
            input = fopen("Hello.txt", "rb");
            fread(array, sizeof(int), size, input);
            // some check on return value of fread?
            fclose(input);
        }
    
        for (i = 0; i < size; i++) {
            printf("array[%d] == %d\n", i, array[i]);
        }

        return 0;
    }
    

    Why is it that having the print statement in the for loop as it is above causes the output to look like this

    array[0] == 1819043144
    array[1] == 1867980911
    array[2] == 6581362
    array[3] == 0
    array[4] == 0
    array[5] == 0
    array[6] == 0
    array[7] == 0
    array[8] == 0
    array[9] == 0
    array[10] == 0
    

    while having it like this

    printf("array[%d] == %d\n", i, ((char *)array)[i]);
    

    makes the output look like this (decimal ASCII value for each character)

    array[0] == 72
    array[1] == 101
    array[2] == 108
    array[3] == 108
    array[4] == 111
    array[5] == 32
    array[6] == 87
    array[7] == 111
    array[8] == 114
    array[9] == 108
    array[10] == 100
    

    ? If I'm reading it as a binary file and want to read byte by byte, why don't I get the right ASCII value using the first print statement?

    On a related note, what happens if the input file I send in isn't a text document (e.g., jpeg)?

    Sorry if this is an entirely trivial matter, but I can't seem to figure out why.

    • user694733
      user694733
      Why are you opening the input file twice?
  • user2809475
    user2809475 over 10 years
    I guess something else isn't clicking... are you saying that, if the input size is 10 bytes, I'm reading 40 bytes from it? As for the int array... isn't what I'm reading in just a bunch of ints? I thought that was why I was reading it as a binary file. What would happen if I sent in a non-text file and tried to put it in a char array?
  • simpletron
    simpletron over 10 years
    Yes. I guess so. I have updated my answer, you should take a look at my idea.
  • user2809475
    user2809475 over 10 years
    Just edited my last response. I see and understand what you're doing and thought about that approach the first time around, but I'm considering the scenario where a file sent in won't necessarily be comprised of just letters (and/or numbers).
  • simpletron
    simpletron over 10 years
    If so, it gets more complicated. A binary stream is just a sequence of bytes. If you ask for an int, fread takes the next 4 bytes and stores them in the memory you specify; if you ask for a char, it takes the next 1 byte. You could also consider using a text file instead of a binary file; in that case a number is stored as a sequence of chars.
  • Rachael Dawn
    Rachael Dawn about 7 years
    "If you had checked the return of fread" Incredibly useful. I was initializing an fopen, where ftell was returning a value larger than what there was there (in terms of size), which meant that it was reading null characters with a printf. Setting the ReadBuffer[/*output of fread*/] = 0; was the trick for me. Commenting for those who have a similar problem.

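On the non-text case raised in the comments (for example a JPEG): the bytes are read in exactly the same way, but it is safer to view them through unsigned char, since plain char may be signed and a byte such as 0xFF can then print as -1. A minimal sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <stdio.h>

/* Returns the byte at index i of an arbitrary buffer as a value in 0..255,
 * regardless of whether plain char is signed on this platform. */
int byte_value(const void *buf, size_t i)
{
    assert(buf != NULL);
    return ((const unsigned char *)buf)[i];
}

/* Prints every byte of the buffer as decimal and hex, one per line. */
void dump_bytes(const void *buf, size_t len, FILE *out)
{
    for (size_t i = 0; i < len; i++) {
        int v = byte_value(buf, i);
        fprintf(out, "byte[%zu] == %3d (0x%02X)\n", i, v, (unsigned)v);
    }
}
```

For a JPEG, whose data starts with the bytes FF D8, the first two lines would show 255 and 216.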