How do I write a UTF-8 encoded string to a file in Windows, in C++?
Solution 1
Yes, when you specify that the text file should be encoded in UTF-8, the CRT implicitly assumes that you'll be writing Unicode (wide-character, UTF-16) text to the file. Not doing so wouldn't make sense, because otherwise you wouldn't need UTF-8. This will work properly:
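const wchar_t* x = L"Fool";  // string literals are const
FILE* outFile = fopen("Serialize.txt", "w+,ccs=UTF-8");
// In ccs=UTF-8 mode the CRT expects wide (UTF-16) data and converts it to UTF-8.
fwrite(x, wcslen(x) * sizeof(wchar_t), 1, outFile);
fclose(outFile);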
Or:
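const char* x = "Fool";  // string literals are const
FILE* outFile = fopen("Serialize.txt", "w+,ccs=UTF-8");
// %hs tells the wide printf function that x is a narrow string, so it gets widened first.
fwprintf(outFile, L"%hs", x);
fclose(outFile);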
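If the narrow string already holds UTF-8 bytes, another option (not part of the answer above) is to skip the `ccs=UTF-8` translation entirely and write the bytes in binary mode. The following is a minimal sketch under that assumption. The helper names `write_utf8_bytes` and `read_bytes` are hypothetical, and no BOM is written.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: writes the bytes of `utf8` unchanged.
// Assumes `utf8` already contains valid UTF-8; nothing is converted here.
bool write_utf8_bytes(const char* path, const std::string& utf8) {
    // "wb" = binary mode, so the CRT does no character or newline translation.
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const std::size_t written = std::fwrite(utf8.data(), 1, utf8.size(), f);
    const bool closed = (std::fclose(f) == 0);
    return written == utf8.size() && closed;
}

// Hypothetical helper: reads the whole file back as raw bytes.
std::string read_bytes(const char* path) {
    std::string out;
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return out;
    char buf[256];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0) out.append(buf, n);
    std::fclose(f);
    return out;
}
```

Calling `write_utf8_bytes("Serialize.txt", "Fool")` and then `read_bytes("Serialize.txt")` should give back the same bytes, because neither call involves wide characters.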
Solution 2
It is easy if you use the C++11 standard, because it adds includes with "utf8" conversion support that solve this problem.
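The answer does not show the C++11 route. One way it could look, presumably what the "utf8" includes refer to, is the `std::codecvt_utf8` facet from `<codecvt>`. This is a sketch, not the answerer's code: the function name `write_wide_as_utf8` is hypothetical, and `<codecvt>` has been deprecated since C++17, so compilers may warn.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: writes a wide string to `path` as UTF-8.
bool write_wide_as_utf8(const std::string& path, const std::wstring& text) {
    std::wofstream out;
    // Install the facet before opening so all output goes through it.
    // The locale takes ownership of the facet object.
    out.imbue(std::locale(std::locale::classic(),
                          new std::codecvt_utf8<wchar_t>));
    // Binary mode avoids newline translation on Windows.
    out.open(path.c_str(), std::ios::out | std::ios::binary);
    if (!out) return false;
    out << text;
    out.close();
    return !out.fail();
}
```

Where `wchar_t` is 16 bits wide, as on Windows, `std::codecvt_utf8<wchar_t>` does not handle surrogate pairs. `std::codecvt_utf8_utf16<wchar_t>` is the variant intended for UTF-16 input.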
But if you want to use multi-platform code with older standards, you can use this method to write with streams:
- Read the article about the UTF converter for streams.
- Add stxutif.h to your project from the sources above.
- Open the file in ANSI (narrow, binary) mode and add the BOM to the start of the file, like this:
  std::ofstream fs;
  fs.open(filepath, std::ios::out | std::ios::binary);
  const unsigned char smarker[3] = { 0xEF, 0xBB, 0xBF };  // UTF-8 BOM
  // write() emits exactly 3 bytes; operator<< would read past the array looking for a terminator
  fs.write(reinterpret_cast<const char*>(smarker), sizeof(smarker));
  fs.close();
- Then open the file as UTF and write your content there:
  std::wofstream fs;
  fs.open(filepath, std::ios::out | std::ios::app);
  std::locale utf8_locale(std::locale(), new utf8cvt<false>);
  fs.imbue(utf8_locale);
  fs << .. // Write anything you want...
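The BOM-then-content sequence above can also be sketched without the stxutif.h facet, by encoding code points to UTF-8 by hand and writing everything in a single binary pass. This is a different technique from the one in the list, shown only for illustration. The names `append_utf8` and `write_utf8_with_bom` are hypothetical, and the input is assumed to be valid Unicode code points (surrogates and out-of-range values are not checked).

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: appends the UTF-8 encoding of one code point.
// Assumes cp is a valid Unicode scalar value; no validation is done.
void append_utf8(std::string& out, unsigned long cp) {
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: writes the UTF-8 BOM followed by the encoded text.
bool write_utf8_with_bom(const char* path,
                         const std::vector<unsigned long>& code_points) {
    std::string bytes("\xEF\xBB\xBF");     // UTF-8 BOM
    for (std::size_t i = 0; i < code_points.size(); ++i)
        append_utf8(bytes, code_points[i]);
    std::ofstream fs(path, std::ios::out | std::ios::binary);
    if (!fs) return false;
    fs.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    fs.close();
    return !fs.fail();
}
```

The sketch sticks to pre-C++11 constructs to match the "older standards" case. Writing the BOM and the content in one pass also avoids reopening the file in append mode.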
Franken Wallace
Updated on July 09, 2022
Comments
- Franken Wallace almost 2 years
I have a string that may or may not contain Unicode characters, and I am trying to write it to a file on Windows. Below is a sample bit of code. My problem is that when I fopen the file and read the values back, they are all interpreted as UTF-16 characters.
char* x = "Fool";
FILE* outFile = fopen( "Serialize.pef", "w+,ccs=UTF-8");
fwrite(x,strlen(x),1,outFile);
fclose(outFile);
char buffer[12];
buffer[11]=NULL;
outFile = fopen( "Serialize.pef", "r,ccs=UTF-8");
fread(buffer,1,12,outFile);
fclose(outFile);
The characters are also interpreted as UTF-16 if I open the file in WordPad or similar editors. What am I doing wrong?