Reading a large file (greater than 4 GB) in C using the read function causes problems


Solution 1

In the first place, why do you need lseek() in your loop at all? read() already advances the file offset by the number of bytes read.

And, to the topic: on platforms where long is 32 bits, long (and, consequently, chunk) has a maximum value of 2147483647; any value beyond that overflows and effectively becomes negative.

You want to declare chunk as off_t (off_t chunk) and size as size_t. That is the main reason why lseek() fails.

And, again, as other people have noticed, you do not want to free() your buffer inside the loop.

Note also that each iteration overwrites the data you have already read, because the code always reads into the start of buffer. Additionally, read() will not necessarily read as many bytes as you asked for, so it is better to advance chunk by the number of bytes actually read rather than the number you requested.

Taking all of this into account, the correct code should probably look something like this:

// Edited: note comments after the code
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef O_LARGEFILE
#define O_LARGEFILE 0
#endif

int read_from_file_open(char *filename, size_t size)
{
    char *buffer = malloc(size);
    if (buffer == NULL)
    {
        printf("\nAllocation Unsuccessful\n");
        exit(1);
    }

    int fd = open(filename, O_RDONLY | O_LARGEFILE);
    if (fd == -1)
    {
        printf("\nFile Open Unsuccessful\n");
        free(buffer);
        exit(1);
    }

    off_t chunk = 0;
    lseek(fd, 0, SEEK_SET);              /* start from the beginning */
    while ((size_t)chunk < size)
    {
        printf("the size of chunk read is %jd\n", (intmax_t)chunk);

        /* Do not read past the end of the buffer on the last iteration. */
        size_t want = size - (size_t)chunk;
        if (want > 1048576)
            want = 1048576;

        /* read() returns ssize_t; storing the result in an unsigned
           size_t would make the error check below always false. */
        ssize_t readnow = read(fd, buffer + chunk, want);
        if (readnow < 0)
        {
            printf("\nRead Unsuccessful\n");
            free(buffer);
            close(fd);
            return 0;
        }
        if (readnow == 0)                /* hit end of file early */
            break;

        chunk += readnow;                /* advance by what was actually read */
    }

    printf("\nRead Successful\n");

    free(buffer);
    close(fd);
    return 1;
}

I also took the liberty of removing the result variable and all the related logic, since, I believe, it can be simplified away.

Edit: I have noticed that some systems (most notably the BSDs) do not have O_LARGEFILE, since it is not needed there. So I have added an #ifndef guard at the beginning, which makes the code more portable.

Solution 2

The lseek function may fail on big file offsets when off_t is only 32 bits wide. Try using lseek64.

Please check the link to see the associated macros that need to be defined when you use the lseek64 function.

Author by Admin

Updated on July 09, 2022

Comments

  • Admin, almost 2 years

    I have to write C code for reading large files. The code is below:

    int read_from_file_open(char *filename,long size)
    {
        long read1=0;
        int result=1;
        int fd;
        int check=0;
        long *buffer=(long*) malloc(size * sizeof(int));
        fd = open(filename, O_RDONLY|O_LARGEFILE);
        if (fd == -1)
        {
           printf("\nFile Open Unsuccessful\n");
           exit (0);;
        }
        long chunk=0;
        lseek(fd,0,SEEK_SET);
        printf("\nCurrent Position%d\n",lseek(fd,size,SEEK_SET));
        while ( chunk < size )
        {
            printf ("the size of chunk read is  %d\n",chunk);
            if ( read(fd,buffer,1048576) == -1 )
            {
                result=0;
            }
            if (result == 0)
            {
                printf("\nRead Unsuccessful\n");
                close(fd);
                return(result);
            }
    
            chunk=chunk+1048576;
            lseek(fd,chunk,SEEK_SET);
            free(buffer);
        }
    
        printf("\nRead Successful\n");
    
        close(fd);
        return(result);
    }
    

    The issue I am facing is that as long as the size argument passed is less than 264000000 bytes, the function seems able to read: I can see the chunk variable increasing with each cycle.

    When I pass 264000000 bytes or more, the read fails, i.e. according to the check used, read() returns -1.

    Can anyone point me to why this is happening? I am compiling using cc in normal mode, not using DD64.