How do I write a UTF-8 encoded string to a file on Windows in C++?

Solution 1

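Yes, when you specify that the text file should be encoded in UTF-8, the CRT implicitly assumes that you'll be writing Unicode text to the file, that is, UTF-16 data stored as wchar_t, which it converts to UTF-8 on output. Passing it anything else doesn't make sense, because then you wouldn't need the UTF-8 flag. This will work properly: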

const wchar_t* x = L"Fool";
FILE* outFile = fopen( "Serialize.txt", "w+,ccs=UTF-8");
fwrite(x, wcslen(x) * sizeof(wchar_t), 1, outFile);
fclose(outFile);

Or:

const char* x = "Fool";
FILE* outFile = fopen( "Serialize.txt", "w+,ccs=UTF-8");
fwprintf(outFile, L"%hs", x);
fclose(outFile);
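
To check what actually ended up in the file, it can help to read it back in binary mode and look at the first few bytes. A file created with ccs=UTF-8 should begin with the UTF-8 byte-order mark (EF BB BF). The sketch below is only an illustration: detect_bom is a hypothetical helper name, not a CRT function, and it recognizes only the UTF-8 and UTF-16 marks.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not part of the CRT): reads up to the first three
// bytes of a file in binary mode and names the byte-order mark it finds.
std::string detect_bom(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return "unreadable";

    unsigned char b[3] = {0, 0, 0};
    in.read(reinterpret_cast<char*>(b), sizeof b);
    const std::streamsize n = in.gcount();  // may be fewer than 3 bytes

    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return "UTF-8";
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE) return "UTF-16 LE";
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF) return "UTF-16 BE";
    return "none";
}
```

Calling detect_bom("Serialize.txt") after running either snippet above is a quick way to confirm the encoding without relying on how an editor guesses it.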

Solution 2

It is easy if you use the C++11 standard, because its <codecvt> header provides conversion facets such as std::codecvt_utf8 that solve this problem directly (note that these facets are deprecated as of C++17).
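
A minimal sketch of the C++11 route, assuming the facet meant here is std::codecvt_utf8 from <codecvt>. The function name write_utf8_file is made up for illustration, and no BOM is written.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Sketch: write a wide string to `path` as UTF-8 using the C++11 facet
// std::codecvt_utf8 (deprecated since C++17).
// write_utf8_file is an illustrative name, not a library function.
bool write_utf8_file(const std::string& path, const std::wstring& text) {
    std::wofstream out;
    // Imbue before opening so the conversion applies from the first character.
    // The locale takes ownership of the facet allocated with new.
    out.imbue(std::locale(std::locale::classic(),
                          new std::codecvt_utf8<wchar_t>));
    out.open(path, std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out) return false;

    out << text;
    out.close();
    return !out.fail();
}
```

On Windows, where wchar_t is 16 bits, std::codecvt_utf8<wchar_t> covers only the Basic Multilingual Plane; std::codecvt_utf8_utf16<wchar_t> is the variant that understands surrogate pairs.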

But if you want to use multi-platform code with older standards, you can use this method to write with streams:

  1. Read the article about UTF converter for streams
  2. Add stxutif.h from that article's sources to your project
  3. Open the file as a narrow (byte) stream in binary mode and write the BOM at the start of the file, like this:

    std::ofstream fs;
    fs.open(filepath, std::ios::out|std::ios::binary);
    
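    // UTF-8 byte-order mark
    const char smarker[3] = { '\xEF', '\xBB', '\xBF' };
    
    // Use write() with an explicit length: smarker is not null-terminated,
    // so operator<< would read past the end of the array.
    fs.write(smarker, sizeof smarker);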
    fs.close();
    
  4. Then reopen the file as a wide stream in append mode, imbue it with the UTF-8 facet, and write your content:

    std::wofstream fs;
    fs.open(filepath, std::ios::out|std::ios::app);
    
    std::locale utf8_locale(std::locale(), new utf8cvt<false>);
    fs.imbue(utf8_locale); 
    
    fs << L"Fool"; // Write anything you want...
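
If the text is already held as UTF-8 bytes in a narrow std::string, steps 3 and 4 can be collapsed into a single binary write with no conversion facet at all. This is a different technique from the utf8cvt approach above, shown only as a sketch. The name write_utf8_with_bom is hypothetical, and the function assumes its input is valid UTF-8 without checking it.

```cpp
#include <fstream>
#include <string>

// Illustrative helper (hypothetical name): writes a UTF-8 byte-order mark
// followed by text that is assumed to be UTF-8 encoded already.
bool write_utf8_with_bom(const std::string& path, const std::string& utf8_text) {
    // Binary mode keeps the bytes exactly as given (no newline translation).
    std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out) return false;

    static const char bom[] = {'\xEF', '\xBB', '\xBF'};
    out.write(bom, sizeof bom);  // explicit length, no terminator needed
    out.write(utf8_text.data(),
              static_cast<std::streamsize>(utf8_text.size()));
    out.close();
    return !out.fail();
}
```

Whether to write the BOM at all is a judgment call: many Windows editors use it to recognize UTF-8, while most Unix tools expect UTF-8 files without one.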
    
Author by

Franken Wallace

Updated on July 09, 2022

Comments

  • Franken Wallace, almost 2 years ago

    I have a string that may or may not have Unicode characters in it, and I am trying to write it to a file on Windows. Below is a sample bit of code. My problem is that when I fopen the file and read the values back out on Windows, they are all interpreted as UTF-16 characters.

    char* x = "Fool";
    FILE* outFile = fopen( "Serialize.pef", "w+,ccs=UTF-8");
    fwrite(x,strlen(x),1,outFile);
    fclose(outFile);
    
    char buffer[12];
    buffer[11]=NULL;
    outFile = fopen( "Serialize.pef", "r,ccs=UTF-8");
    fread(buffer,1,12,outFile);
    fclose(outFile);
    

    The characters are also interpreted as UTF-16 if I open the file in WordPad or a similar editor. What am I doing wrong?