How to read the content of a file to a string in C?
Solution 1
I tend to just load the entire file into memory as one raw chunk and do the parsing on my own. That way I have the best control over what the standard library does on multiple platforms.
This is a stub I use for this. You may also want to check the error codes for fseek, ftell and fread (omitted for clarity).
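char * buffer = 0;
long length;
FILE * f = fopen (filename, "rb");
if (f)
{
  fseek (f, 0, SEEK_END);
  length = ftell (f);
  fseek (f, 0, SEEK_SET);
  if (length >= 0)                  /* ftell returns -1 on failure */
    buffer = malloc (length + 1);   /* one extra byte for the terminating '\0' */
  if (buffer)
  {
    size_t got = fread (buffer, 1, length, f);
    buffer[got] = '\0';             /* fread does not terminate the string for you */
  }
  fclose (f);
}
if (buffer)
{
  // start to process your data / extract strings here...
}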
Solution 2
Another, unfortunately highly OS-dependent, solution is memory mapping the file. The benefits generally include faster reads and reduced memory use, because the application's view and the operating system's file cache can share the same physical memory.
POSIX code would look like this:
int fd = open("filename", O_RDONLY);
off_t len = lseek(fd, 0, SEEK_END);   /* lseek returns off_t, not int */
void *data = mmap(0, len, PROT_READ, MAP_PRIVATE, fd, 0);   /* yields MAP_FAILED on error */
Windows, on the other hand, is a little more tricky, and unfortunately I don't have a compiler in front of me to test, but the functionality is provided by CreateFileMapping() and MapViewOfFile().
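For the POSIX side, the snippet above can be fleshed out roughly as follows. This is only a sketch: the names map_file_readonly and unmap_file are made up for illustration, it uses fstat instead of lseek to get the size, and it treats an empty file as a failure because a zero-length mapping is not allowed. Keep in mind that the mapped bytes are not '\0'-terminated, so the length has to travel with the pointer.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps `path` read-only and stores its size in *len_out.
   Returns NULL on failure (including empty files, which cannot be mapped). */
const char *map_file_readonly(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return data;
}

/* Releases a mapping obtained from map_file_readonly(). */
void unmap_file(const char *data, size_t len)
{
    munmap((void *)data, len);
}
```

A caller would use the returned pointer together with the stored length (for example with memcmp or fwrite) and call unmap_file when done.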
Solution 3
If "read its contents into a string" means that the file does not contain characters with code 0, you can also use the getdelim() function, which either accepts a block of memory and reallocates it if necessary, or allocates the entire buffer for you, and reads the file into it until it encounters a specified delimiter or end of file. Just pass '\0' as the delimiter to read the entire file.
This function is available in the GNU C Library (http://www.gnu.org/software/libc/manual/html_mono/libc.html#index-getdelim-994) and has been part of POSIX since POSIX.1-2008.
The sample code might look as simple as
char* buffer = NULL;
size_t len = 0;
ssize_t bytes_read = getdelim( &buffer, &len, '\0', fp);
if ( bytes_read != -1) {
  /* Success, now the entire file is in the buffer */
}
free(buffer);   /* getdelim allocated the buffer; release it when you are done */
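Wrapped into a function, the getdelim() approach might look like the sketch below. The name read_text_file is hypothetical, and the feature-test macro is there on the assumption that the compiler runs in a strict standard mode where getdelim() would otherwise be hidden. Note that an empty file is reported as failure, since getdelim() returns -1 when it hits end of file before reading anything.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getdelim() in <stdio.h> */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: returns the whole text file as a malloc'd,
   NUL-terminated string, or NULL on failure. An empty file also yields
   NULL because getdelim() reports -1 when it reads nothing before EOF. */
char *read_text_file(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (!fp)
        return NULL;

    char *buffer = NULL;   /* getdelim() allocates when given NULL */
    size_t cap = 0;
    ssize_t n = getdelim(&buffer, &cap, '\0', fp);
    fclose(fp);

    if (n == -1) {
        free(buffer);      /* the buffer may have been allocated even on failure */
        return NULL;
    }
    return buffer;
}
```

The caller owns the returned string and should free() it.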
Solution 4
If you are reading special files like stdin or a pipe, you are not going to be able to use fstat to get the file size beforehand. Also, if you are reading a binary file, fgets is going to lose the string size information because of embedded '\0' characters. The best way to read such a file is to use read and realloc:
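#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
int main () {
  char buf[4096];
  ssize_t n;
  char *str = NULL;
  size_t len = 0;
  while ((n = read(STDIN_FILENO, buf, sizeof buf)) != 0) {
    if (n < 0) {
      if (errno == EAGAIN || errno == EINTR)
        continue;
      perror("read");
      break;
    }
    /* grow the string; keep the old pointer so it is not leaked on failure */
    char *tmp = realloc(str, len + n + 1);
    if (!tmp) {
      perror("realloc");
      break;
    }
    str = tmp;
    memcpy(str + len, buf, n);
    len += n;
    str[len] = '\0';
  }
  if (str)
    printf("%.*s\n", (int)len, str);   /* the precision argument must be an int */
  free(str);
  return 0;
}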
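A variant of the same idea, sketched with fread on a FILE * so it stays within standard C: it reads straight into the growing buffer instead of copying through an intermediate one, and it doubles the capacity whenever the buffer fills up, so the number of reallocations grows only logarithmically with the input size. The function name read_stream and the 4096-byte starting capacity are arbitrary choices for illustration, and overflow of the capacity on truly huge inputs is not handled.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads `f` until EOF into a malloc'd, NUL-terminated
   buffer and stores the byte count in *len_out. Returns NULL on failure. */
char *read_stream(FILE *f, size_t *len_out)
{
    size_t cap = 4096, len = 0;
    char *str = malloc(cap);
    if (!str)
        return NULL;

    for (;;) {
        /* read straight into the free tail, keeping one byte for '\0' */
        size_t n = fread(str + len, 1, cap - len - 1, f);
        len += n;
        if (n == 0)
            break;                       /* EOF or error; ferror() decides below */
        if (len == cap - 1) {            /* buffer full: double the capacity */
            char *tmp = realloc(str, cap * 2);
            if (!tmp) {
                free(str);
                return NULL;
            }
            str = tmp;
            cap *= 2;
        }
    }

    if (ferror(f)) {
        free(str);
        return NULL;
    }
    str[len] = '\0';
    *len_out = len;
    return str;
}
```

Calling read_stream(stdin, &len) covers the pipe case; because the length is returned separately, embedded '\0' bytes in binary input are not a problem as long as the caller uses len rather than strlen.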
Solution 5
Note: This is a modification of the accepted answer above.
Here's a way to do it, complete with error checking.
I've added a size check that bails out when the file is bigger than 1 GiB. I did this because the program puts the whole file into a string, which may use too much RAM and crash a computer. However, if you don't care about that, you can just remove it from the code.
#include <stdio.h>
#include <stdlib.h>
#define FILE_OK 0
#define FILE_NOT_EXIST 1
#define FILE_TOO_LARGE 2
#define FILE_READ_ERROR 3
char * c_read_file(const char * f_name, int * err, size_t * f_size) {
    char * buffer;
    long length;
    FILE * f = fopen(f_name, "rb");
    size_t read_length;

    if (!f) {
        *err = FILE_NOT_EXIST;
        return NULL;
    }

    fseek(f, 0, SEEK_END);
    length = ftell(f);
    fseek(f, 0, SEEK_SET);

    // ftell returns -1 on failure
    if (length < 0) {
        fclose(f);
        *err = FILE_READ_ERROR;
        return NULL;
    }

    // 1 GiB; best not to load a whole large file in one string
    if (length > 1073741824) {
        fclose(f);
        *err = FILE_TOO_LARGE;
        return NULL;
    }

    buffer = (char *)malloc((size_t)length + 1);
    if (!buffer) {
        // no dedicated code for allocation failure; reported as a read error
        fclose(f);
        *err = FILE_READ_ERROR;
        return NULL;
    }

    read_length = fread(buffer, 1, (size_t)length, f);
    fclose(f);
    if (read_length != (size_t)length) {
        free(buffer);
        *err = FILE_READ_ERROR;
        return NULL;
    }

    *err = FILE_OK;
    buffer[length] = '\0';
    *f_size = (size_t)length;
    return buffer;
}
And to check for errors:
int err;
size_t f_size;
char * f_data;
f_data = c_read_file("test.txt", &err, &f_size);
if (err) {
// process error
}
else {
// process data
free(f_data);
}
tkokoszka
Updated on December 19, 2021
Comments
-
tkokoszka over 2 years
What is the simplest way (least error-prone, least lines of code, however you want to interpret it) to open a file in C and read its contents into a string (char*, char[], whatever)?
-
Andy Lester over 15 years
"simplest way" and "least error-prone" are often opposites of each other.
-
Mark Lakata about 10 years
"simplest way" and "least error prone" are actually synonymous in my book. For example, the answer in C# is string s = File.ReadAllText(filename); How could that be simpler and more error prone?
-
-
tkokoszka over 15 years
Awesome, that worked like a charm (and is pretty simple to follow along). Thanks!
-
freespace over 15 years
I would also check the return value of fread, since it might not actually read the entire file due to errors and what not.
-
rmeador over 15 years
Along the lines of what freespace said, you might want to check to ensure the file isn't huge. Suppose, for instance, that someone decided to feed a 6GB file into that program...
-
tkokoszka over 15 years
Definitely, just like Nils said originally, I'm going to go look up the error codes on fseek, ftell, and fread and act accordingly.
-
dicroce over 15 years
Seeking to the end just so you can call ftell? Why not just call stat?
-
KPexEA over 15 years
like rmeador said, fseek will fail on files >4GB.
-
Nils Pipenbrinck over 15 years
True. For large files this solution sucks.
-
Nils Pipenbrinck over 15 years
I haven't suggested using stat simply because it's not ANSI C. (At least I think so). Afaik the "recommended" way to get a file-size is to seek to the end and get the file offset.
-
Dan Lenski over 15 years
This is good and easy... but it will choke if you need to read from a pipe rather than an ordinary file, which is something that most UNIX programs will want to do at some point.
-
ephemient over 15 years
I've used this before! It works very nicely, assuming the file you're reading is text (does not contain \0).
-
ivan-k over 9 years
Since this is a landing page, I would like to point out that fread does not zero-terminate your string. This can lead to some trouble.
-
Clark Gaebel about 8 years
This is O(n^2), where n is the length of your file. All solutions with more upvotes than this are O(n). Please don't use this solution in practice, or use a modified version with multiplicative growth.
-
Jake about 8 years
realloc() can extend the existing memory to the new size without copying the old memory to a new larger piece of memory. Only if there are intervening calls to malloc() will it need to move memory around and make this solution O(n^2). Here, there are no calls to malloc() in between the calls to realloc(), so the solution should be fine.
-
soywod over 7 years
As @Manbroski said, the buffer needs to be '\0' terminated. So I would change the allocation to buffer = malloc (length + 1); and add buffer[length] = '\0'; after fclose (validated by Valgrind).
-
anthony over 7 years
NICE! Saves a lot of problems when slurping in whole text files. Now if there was a similar ultra simple way of reading a binary file stream until EOF without needing any delimiting character!
-
anthony over 7 years
This will only work with disk-based files. It will fail for named pipes, standard input, or network streams.
-
anthony over 7 years
You could read directly into the "str" buffer (with an appropriate offset), without needing to copy from an intermediate "buf". That technique, however, will generally over-allocate the memory needed for the file contents. Also watch out for binary files: the printf will not handle them correctly, and you probably don't want to print binary anyway!
-
Ciro Santilli OurBigBook.com about 7 years
Make this answer into a nice function with error checking + call example for copy pasters :-)
-
Ciro Santilli OurBigBook.com about 7 years
Ha, also why I came here! But I think you need to either null terminate the string, or return the length which glShaderSource optionally takes.
-
Gerhardh over 6 years
This is no C code. The question is not tagged as C++.
-
BaiJiFeiLong over 6 years
@Gerhardh So rapid response to the question nine years ago when i am editing! Although the function part is pure C, I am sorry for my will-not-run-on-c answer.
-
Gerhardh over 6 years
This ancient question was listed at the top of active questions. I didn't search for it.
-
Andrew Henle over 6 years
fseek (f, 0, SEEK_END); is explicitly undefined behavior for a binary stream. 7.21.9.2 The fseek function, paragraph 3: ... A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END. And per footnote 268 of the C standard: Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream...
-
Toby Speight about 6 years
Don't forget to check the return values from those system calls!
-
ivan.ukr almost 6 years
must use off_t instead of int when calling lseek().
-
ericcurtin over 5 years
I don't think this was ever intended to be a large file solution. Reading GBs of files into a single string is not a good idea. But for smaller files it might be just fine :)
-
ericcurtin over 5 years
This code leaks memory, don't forget to free your malloc'd memory :)
-
user001 almost 5 years
Note that if the goal is to stably capture in memory the contents of a file at a given moment in time, this solution should be avoided, unless you are certain that the file being read into memory will not be modified by other processes during the interval over which the map will be used. See this post for more information.
-
Zap over 4 years
How do you separate the lines in the buffer though? Checking for new lines?
-
Jack G almost 4 years
Please don't allocate all the memory you think you'll need upfront. This is a perfect example of bad design. You should allocate memory as-you-go whenever it is possible to do so. It would be good design if you expect the file to be 10,000 bytes long, your program can't handle a file that's any other size, and you're checking the size and erroring out anyway, but that's not what is going on here. You really should learn how to code C correctly.
-
Pablosproject over 3 years
Just one question: the buffer you allocated with malloc(length +1) is not being freed. Is that something the consumer of this method shall do, or is there no need to free() the allocated memory?
-
Joe Cool over 3 years
If an error has not occurred, free(f_data); should be called. Thanks for pointing that out.
-
user11171 over 2 years
You misspelled "too" in FILE_TO_LARGE