What is the most correct way to set the encoding in C++?


Solution 1

I need any Unicode symbol/string to be correctly input and output.

This is certainly possible, although making the Windows command prompt console properly Unicode-aware takes some special magic. I seriously doubt that any of the implementations of the standard library functions are going to do this, unfortunately.

You'll find a number of questions about it on Stack Overflow, but this one is a good one. Basically, the console uses what is called (somewhat erroneously) the "OEM" code page by default. You want to change that to the UTF-8 code page, the value of which is defined by CP_UTF8. To do this, you'll need to call both the SetConsoleCP function (to set the input code page) and the SetConsoleOutputCP function (to set the output code page). The code would look something like this:

if (!SetConsoleCP(CP_UTF8))
{
    // An error occurred; handle it. Call GetLastError() for more information.
    // ...
}
if (!SetConsoleOutputCP(CP_UTF8))
{
    // An error occurred; handle it. Call GetLastError() for more information.
    // ...
}

For extra robustness, you might also want to make sure that the UTF-8 code page is supported first, before trying to set and use it. You would do that by calling the IsValidCodePage function. For example:

if (IsValidCodePage(CP_UTF8))
{
    // We're all good, so set the console code page...
}
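Putting those two pieces together, a small helper might look something like this (SetConsoleToUtf8 is just a name I've made up for this sketch, and it's untested):

bool SetConsoleToUtf8()
{
    // Bail out if the UTF-8 code page isn't available on this system.
    if (!IsValidCodePage(CP_UTF8))
    {
        return false;
    }

    // Set both the input and the output code pages to UTF-8.
    if (!SetConsoleCP(CP_UTF8) || !SetConsoleOutputCP(CP_UTF8))
    {
        // Call GetLastError() here if you need details.
        return false;
    }
    return true;
}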

You will also have to change the font from the default ("Raster Fonts") to something that contains the requisite Unicode character glyphs—e.g., Lucida Console or Consolas (reference). That's trivial to do using the SetCurrentConsoleFontEx function.

Unfortunately, this function does not exist in versions of Windows prior to Vista. If you absolutely need to support these older operating systems, the only thing I know to do is to call the undocumented SetConsoleFont function. Normally, I would advise strongly against using undocumented functions, but I think it's less of a problem here since you would only be using it in old versions of the operating system. You know those aren't going to change. On the newer versions where it is available, you call the supported function. Sample untested code:

bool IsWinVistaOrLater()
{
    OSVERSIONINFOEX osvi = {};  // zero-initialize before querying the version
    osvi.dwOSVersionInfoSize = sizeof(osvi);
    GetVersionEx(reinterpret_cast<LPOSVERSIONINFO>(&osvi));

    if (osvi.dwPlatformId == VER_PLATFORM_WIN32_NT)
    {
        return osvi.dwMajorVersion >= 6;  // Windows Vista is version 6.0
    }
    return false;
}

void SetConsoleToUnicodeFont()
{
    HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);
    if (IsWinVistaOrLater())
    {
        // Call the documented function.
        typedef BOOL (WINAPI * pfSetCurrentConsoleFontEx)(HANDLE, BOOL, PCONSOLE_FONT_INFOEX);
        HMODULE hMod = GetModuleHandle(TEXT("kernel32.dll"));
        pfSetCurrentConsoleFontEx pfSCCFX = (pfSetCurrentConsoleFontEx)GetProcAddress(hMod, "SetCurrentConsoleFontEx");

        CONSOLE_FONT_INFOEX cfix = {};       // zero-initialize; cbSize must be set
        cfix.cbSize       = sizeof(cfix);
        cfix.nFont        = 12;              // index into the console font table
        cfix.dwFontSize.X = 8;               // character width, in pixels
        cfix.dwFontSize.Y = 14;              // character height, in pixels
        cfix.FontFamily   = FF_DONTCARE;
        cfix.FontWeight   = 400;             // normal weight
        lstrcpy(cfix.FaceName, TEXT("Lucida Console"));

        pfSCCFX(hConsole,
                FALSE, /* set font for current window size */
                &cfix);
    }
    else
    {
        // There is no supported function on these older versions,
        // so we have to call the undocumented one.
        typedef BOOL (WINAPI * pfSetConsoleFont)(HANDLE, DWORD);
        HMODULE hMod = GetModuleHandle(TEXT("kernel32.dll"));
        pfSetConsoleFont pfSCF = (pfSetConsoleFont)GetProcAddress(hMod, "SetConsoleFont");
        pfSCF(hConsole, 12);  // same font-table index used above
    }
}

Note that I've left the required error checking as an exercise for the reader. The focus here is on technique and readability; cluttering the code with error handling would only confuse matters.
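To give a rough idea of how these pieces fit together, a caller might look like this (it uses the hypothetical SetConsoleToUtf8 helper sketched earlier and assumes both helpers are defined in the same file; untested, and behaviour can still vary with the compiler and C runtime):

#include <windows.h>
#include <iostream>

int main()
{
    SetConsoleToUtf8();          // hypothetical helper from the sketch above
    SetConsoleToUnicodeFont();   // the function shown above

    // With the console in UTF-8 mode and a TrueType font selected, writing
    // UTF-8 encoded narrow strings should display correctly (the source file
    // itself is saved as UTF-8, as in the question). Stick to characters the
    // chosen font actually has glyphs for.
    std::cout << "Grüße, мир!" << std::endl;
    return 0;
}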

I have no idea how to do any of this on Linux. I suspect it's a lot less work, since people tell me the OS uses UTF-8 internally. Either way, you're on your own for that; making Windows purr is enough work for one answer!
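For what it's worth, on a Linux terminal that is already configured with a UTF-8 locale, the usual advice (only a sketch on my part; I haven't verified it across distributions) is to adopt the environment's locale so that wide-character I/O, which the question uses, converts to UTF-8:

#include <clocale>
#include <iostream>
#include <locale>

int main()
{
    // Adopt the user's locale (e.g. en_US.UTF-8) for both the C and C++
    // layers so that wide-character output is converted to UTF-8.
    std::setlocale(LC_ALL, "");
    std::locale::global(std::locale(""));
    std::wcout.imbue(std::locale());

    std::wcout << L"Привет, мир!" << std::endl;
    return 0;
}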

Solution 2

I just needed to output Unicode text to the console, and the only thing that helped was WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), ...). For input, I assume ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), ...) does the trick.

PS: WriteConsoleW has a limit on the size of the buffer it can write in one call, so you might want to write longer strings in chunks.
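Something along these lines is one way to do that chunking (a sketch of mine, not tested; the 8192-character chunk size is an arbitrary choice, not a documented limit):

#include <windows.h>
#include <algorithm>
#include <string>

// Writes a UTF-16 string to the console in pieces, since WriteConsoleW can
// fail on very large buffers.
bool WriteWideToConsole(const std::wstring& text)
{
    HANDLE hOut = GetStdHandle(STD_OUTPUT_HANDLE);
    if (hOut == INVALID_HANDLE_VALUE)
        return false;

    const std::size_t chunkSize = 8192;  // characters per call; arbitrary
    std::size_t offset = 0;
    while (offset < text.size())
    {
        DWORD toWrite = static_cast<DWORD>(std::min(chunkSize, text.size() - offset));
        DWORD written = 0;
        if (!WriteConsoleW(hOut, text.data() + offset, toWrite, &written, NULL))
            return false;
        offset += written;
    }
    return true;
}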


Comments

  • shau-kote, almost 2 years ago

    What is the best way to set the encoding in C++?

    I'm used to working with Unicode (wchar_t, wstring, wcin, wcout and L"..." literals), and I also save my source files in UTF-8.

    At the moment I use MinGW (on Windows 7) and run my program in the Windows console (cmd.exe), but sometimes I use gcc on GNU/Linux and run the program in a Linux console with UTF-8 encoding.

    I always want my source to compile on both Windows and Linux, and I want all Unicode symbols to be correctly input and output.

    Whenever I ran into the next encoding problem, I googled it and found all sorts of conflicting advice: setlocale(LC_ALL, "") and setlocale(LC_ALL, "xx_XX.UTF-8"), std::setlocale(LC_ALL, "") and std::setlocale(LC_ALL, "xx_XX.UTF-8") from <clocale>, SetConsoleCP() and SetConsoleOutputCP() from <windows.h>, and many, many others.

    In the end I got tired of all this shamanism, so I want to ask you: what is the correct way to set the encoding?