How to read the content of a file to a string in C?

185,410

Solution 1

I tend to just load the entire file into memory as a raw chunk and do the parsing on my own. That way I have the best control over what the standard library does on multiple platforms.

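This is a stub I use for this. It null-terminates the buffer and skips the read if ftell fails; you may also want to check the return values of fseek and fread (omitted for clarity).

char *buffer = NULL;
long length;
FILE *f = fopen(filename, "rb");

if (f)
{
  fseek(f, 0, SEEK_END);
  length = ftell(f);
  fseek(f, 0, SEEK_SET);
  if (length >= 0)
  {
    buffer = malloc((size_t)length + 1);  /* +1 for the terminating '\0' */
  }
  if (buffer)
  {
    size_t n = fread(buffer, 1, (size_t)length, f);
    buffer[n] = '\0';  /* fread does not null-terminate */
  }
  fclose(f);
}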

if (buffer)
{
  // start to process your data / extract strings here...
}

Solution 2

Another, unfortunately highly OS-dependent, solution is memory mapping the file. The benefits generally include faster reads and reduced memory use, since the application's view and the operating system's file cache can share the same physical memory.

POSIX code would look like this:

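/* needs <fcntl.h>, <sys/mman.h> and <unistd.h> */
int fd = open("filename", O_RDONLY);
off_t len = lseek(fd, 0, SEEK_END);  /* lseek returns off_t, not int */
void *data = mmap(0, len, PROT_READ, MAP_PRIVATE, fd, 0);
/* check fd != -1, len != -1 and data != MAP_FAILED before use */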
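A slightly fuller sketch of the POSIX variant, assuming a regular, non-empty file. The helper name map_file is made up for illustration; it takes the size from fstat rather than lseek, checks each call, and leaves unmapping to the caller. Note that the mapped bytes are not null-terminated, so treat them as a length-delimited buffer rather than a C string.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only.
 * Returns the mapping (release it with munmap(ptr, *len)) or NULL on error. */
void *map_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        /* mmap rejects zero-length mappings, so empty files are an error here */
        close(fd);
        return NULL;
    }

    void *data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (data == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return data;
}
```

A caller would use the pointer like a read-only array of *len bytes and call munmap(ptr, len) when finished.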

Windows, on the other hand, is a little trickier, and unfortunately I don't have a compiler in front of me to test, but the functionality is provided by CreateFileMapping() and MapViewOfFile().

Solution 3

If "read its contents into a string" means that the file does not contain characters with code 0, you can also use the getdelim() function, which either accepts a block of memory and reallocates it if necessary, or allocates the entire buffer for you, and reads the file into it until it encounters a specified delimiter or the end of the file. Just pass '\0' as the delimiter to read the entire file.

This function is available in the GNU C Library (http://www.gnu.org/software/libc/manual/html_mono/libc.html#index-getdelim-994) and has been part of POSIX since POSIX.1-2008.

The sample code might look as simple as

char *buffer = NULL;
size_t len = 0;
ssize_t bytes_read = getdelim(&buffer, &len, '\0', fp);
if (bytes_read != -1) {
  /* Success, now the entire file is in the buffer */
}
free(buffer);  /* getdelim allocated the buffer; release it when done */
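For copy-pasters, here is a minimal sketch of the same idea wrapped in a function. The name slurp_text and its out_len parameter are invented for illustration; the sketch assumes a POSIX.1-2008 system and a file without embedded '\0' bytes, and it reports an empty file the same way as an error because getdelim returns -1 in both cases.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose getdelim() in <stdio.h> */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: read everything up to the first '\0' byte or EOF.
 * Returns a malloc'd, null-terminated string, or NULL on error or empty input. */
char *slurp_text(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "r");
    if (!fp)
        return NULL;

    char *buffer = NULL;  /* NULL with capacity 0 asks getdelim to allocate */
    size_t cap = 0;
    ssize_t n = getdelim(&buffer, &cap, '\0', fp);
    fclose(fp);

    if (n == -1) {        /* error, or EOF before any byte was read */
        free(buffer);
        return NULL;
    }
    if (out_len)
        *out_len = (size_t)n;
    return buffer;        /* caller frees */
}
```

The returned length counts the bytes read, so it can be passed along to APIs that take an explicit size.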

Solution 4

If you are reading special files like stdin or a pipe, you are not going to be able to use fstat to get the file size beforehand. Also, if you are reading a binary file, fgets is going to lose the string size information because of embedded '\0' characters. The best way to read such a file is to use read and realloc:

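#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>

int main () {
    char buf[4096];
    ssize_t n;
    char *str = NULL;
    size_t len = 0;
    while ((n = read(STDIN_FILENO, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EAGAIN || errno == EINTR)
                continue;
            perror("read");
            break;
        }
        char *tmp = realloc(str, len + n + 1);
        if (!tmp) {  /* on failure the old block is still valid */
            perror("realloc");
            break;
        }
        str = tmp;
        memcpy(str + len, buf, n);
        len += n;
        str[len] = '\0';
    }
    if (str)
        printf("%.*s\n", (int)len, str);  /* the precision argument must be an int */
    free(str);
    return 0;
}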
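The commenters' two suggestions for this answer (grow the buffer multiplicatively, and read straight into it instead of copying from an intermediate buf) can be sketched roughly as follows. The function name read_all and the 4096-byte starting capacity are arbitrary choices for illustration, not part of the original answer.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read everything from fd until EOF.
 * Returns a malloc'd, null-terminated buffer (length in *out_len), or NULL. */
char *read_all(int fd, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *str = malloc(cap);
    if (!str)
        return NULL;

    for (;;) {
        /* Always keep one spare byte for the terminating '\0'. */
        ssize_t n = read(fd, str + len, cap - len - 1);
        if (n == 0)
            break;                   /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal, retry */
            free(str);
            return NULL;
        }
        len += (size_t)n;
        if (cap - len - 1 == 0) {    /* buffer full: double the capacity */
            char *tmp = realloc(str, cap * 2);
            if (!tmp) {
                free(str);
                return NULL;
            }
            str = tmp;
            cap *= 2;
        }
    }
    str[len] = '\0';
    *out_len = len;
    return str;
}
```

Doubling keeps the number of realloc calls logarithmic in the input size, at the cost of possibly allocating up to about twice what the data needs.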

Solution 5

Note: This is a modification of the accepted answer above.

Here's a way to do it, complete with error checking.

I've added a size check that gives up when the file is bigger than 1 GiB. I did this because the program puts the whole file into a string, which may use too much RAM and crash a computer. However, if you don't care about that, you can just remove it from the code.

#include <stdio.h>
#include <stdlib.h>

#define FILE_OK 0
#define FILE_NOT_EXIST 1
#define FILE_TOO_LARGE 2
#define FILE_READ_ERROR 3

char * c_read_file(const char * f_name, int * err, size_t * f_size) {
    char * buffer;
    long length;
    FILE * f = fopen(f_name, "rb");
    size_t read_length;

    if (f) {
        fseek(f, 0, SEEK_END);
        length = ftell(f);
        fseek(f, 0, SEEK_SET);

        // ftell returns -1 on failure
        if (length < 0) {
            fclose(f);
            *err = FILE_READ_ERROR;

            return NULL;
        }

        // 1 GiB; best not to load a whole large file in one string
        if (length > 1073741824) {
            fclose(f);
            *err = FILE_TOO_LARGE;

            return NULL;
        }

        buffer = (char *)malloc((size_t)length + 1);

        // an allocation failure is reported as a read error here
        if (!buffer) {
            fclose(f);
            *err = FILE_READ_ERROR;

            return NULL;
        }

        if (length) {
            read_length = fread(buffer, 1, (size_t)length, f);

            if ((size_t)length != read_length) {
                 free(buffer);
                 fclose(f);
                 *err = FILE_READ_ERROR;

                 return NULL;
            }
        }

        fclose(f);

        *err = FILE_OK;
        buffer[length] = '\0';
        *f_size = (size_t)length;
    }
    else {
        *err = FILE_NOT_EXIST;

        return NULL;
    }

    return buffer;
}

And to check for errors:

int err;
size_t f_size;
char * f_data;

f_data = c_read_file("test.txt", &err, &f_size);

if (err) {
    // process error
}
else {
    // process data
    free(f_data);
}
Author by

tkokoszka


Updated on December 19, 2021

Comments

  • tkokoszka
    tkokoszka over 2 years

    What is the simplest way (least error-prone, least lines of code, however you want to interpret it) to open a file in C and read its contents into a string (char*, char[], whatever)?

    • Andy Lester
      Andy Lester over 15 years
      "simplest way" and "least error-prone" are often opposites of each other.
    • Mark Lakata
      Mark Lakata about 10 years
      "simplest way" and "least error prone" are actually synonymous in my book. For example, the answer in C# is string s = File.ReadAllText(filename);. How could that be simpler and more error prone?
  • tkokoszka
    tkokoszka over 15 years
    Awesome, that worked like a charm (and is pretty simple to follow along). Thanks!
  • freespace
    freespace over 15 years
    I would also check the return value of fread, since it might not actually read the entire file due to errors and what not.
  • rmeador
    rmeador over 15 years
    Along the lines of what freespace said, you might want to check to ensure the file isn't huge. Suppose, for instance, that someone decided to feed a 6GB file into that program...
  • tkokoszka
    tkokoszka over 15 years
    Definitely, just like Nils said originally, I'm going to go look up the error codes on fseek, ftell, and fread and act accordingly.
  • dicroce
    dicroce over 15 years
    Seeking to the end just so you can call ftell? Why not just call stat?
  • KPexEA
    KPexEA over 15 years
    like rmeador said, fseek will fail on files >4GB.
  • Nils Pipenbrinck
    Nils Pipenbrinck over 15 years
    True. For large files this solution sucks.
  • Nils Pipenbrinck
    Nils Pipenbrinck over 15 years
    I haven't suggested using stat simply because it's not ANSI C. (At least I think so). Afaik the "recommended" way to get a file-size is to seek to the end and get the file offset.
  • Dan Lenski
    Dan Lenski over 15 years
    This is good and easy... but it will choke if you need to read from a pipe rather than an ordinary file, which is something that most UNIX programs will want to do at some point.
  • ephemient
    ephemient over 15 years
    I've used this before! It works very nicely, assuming the file you're reading is text (does not contain \0).
  • ivan-k
    ivan-k over 9 years
    Since this is a landing page, I would like to point out that fread does not zero-terminate your string. This can lead to some trouble.
  • Clark Gaebel
    Clark Gaebel about 8 years
    This is O(n^2), where n is the length of your file. All solutions with more upvotes than this are O(n). Please don't use this solution in practice, or use a modified version with multiplicative growth.
  • Jake
    Jake about 8 years
    realloc() can extend the existing memory to the new size without copying the old memory to a new larger piece of memory. only if there are intervening calls to malloc() will it need to move memory around and make this solution O(n^2). here, there's no calls to malloc() that happen in between the calls to realloc() so the solution should be fine.
  • soywod
    soywod over 7 years
    As @Manbroski said, the buffer needs to be '\0' terminated. So I would change buffer = malloc (length + 1); and add after fclose : buffer[length] = '\0'; (validated by Valgrind)
  • anthony
    anthony over 7 years
    NICE! Saves a lot of problems when slurping in whole text files. Now if there was a similar ultra simple way of reading a binary file stream until EOF without needing any delimiting character!
  • anthony
    anthony over 7 years
    This will only work with disk-based files. It will fail for named pipes, standard input, or network streams.
  • anthony
    anthony over 7 years
    You could read directly into the "str" buffer (with an appropriate offset), without needing to copy from an intermediate "buf". That technique, however, will generally over-allocate the memory needed for the file contents. Also watch out for binary files: the printf will not handle them correctly, and you probably don't want to print binary anyway!
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com about 7 years
    Make this answer into a nice function with error checking + call example for copy pasters :-)
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com about 7 years
    Ha, also why I came here! But I think you need to either null terminate the string, or return the length which glShaderSource optionally takes.
  • Gerhardh
    Gerhardh over 6 years
    This is no C code. The question is not tagged as C++.
  • BaiJiFeiLong
    BaiJiFeiLong over 6 years
    @Gerhardh So rapid response to the question nine years ago when i am editing! Although the function part is pure C, I am sorry for my will-not-run-on-c answer.
  • Gerhardh
    Gerhardh over 6 years
    This ancient question was listed at the top of active questions. I didn't search for it.
  • Andrew Henle
    Andrew Henle over 6 years
    fseek (f, 0, SEEK_END); is explicitly undefined behavior for a binary stream. 7.21.9.2 The fseek function, paragraph 3: ... A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END. And per footnote 268 of the C standard: Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream...
  • Toby Speight
    Toby Speight about 6 years
    Don't forget to check the return values from those system calls!
  • ivan.ukr
    ivan.ukr almost 6 years
    must use off_t instead of int when calling lseek().
  • ericcurtin
    ericcurtin over 5 years
    I don't think this was ever intended to be a large file solution. Reading GBs of files into a single string is not a good idea. But for smaller files it might be just fine :)
  • ericcurtin
    ericcurtin over 5 years
    This code leaks memory, don't forget to free your malloc'd memory :)
  • user001
    user001 almost 5 years
    Note that if the goal is to stably capture in memory the contents of a file at a given moment in time, this solution should be avoided, unless you are certain that the file being read into memory will not be modified by other processes during the interval over which the map will be used. See this post for more information.
  • Zap
    Zap over 4 years
    How do you separate the lines in the buffer though? Checking for new lines?
  • Jack G
    Jack G almost 4 years
    Please don't allocate all the memory you think you'll need upfront. This is a perfect example of bad design. You should allocate memory as-you-go whenever it is possible to do so. It would be good design if you expect the file to be 10,000 bytes long, your program can't handle a file that's any other size, and you're checking the size and erroring out anyway, but that's not what is going on here. You really should learn how to code C correctly.
  • Pablosproject
    Pablosproject over 3 years
    Just one question: the buffer you allocated with malloc(length +1), is not being freed. Is that something the consumer of this method shall do, or there is no need for free() the allocated memory?
  • Joe Cool
    Joe Cool over 3 years
    if an error has not occurred, free(f_data); should be called. thks for pointing that out
  • user11171
    user11171 over 2 years
    You misspelled "too" in FILE_TO_LARGE