What does "Beta: Use Unicode UTF-8 for worldwide language support" actually do?

Solution 1

You can see it in ProcMon. It seems to set the REG_SZ values ACP, MACCP, and OEMCP in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage to 65001.
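If you would rather check those values from a script than from ProcMon, something like the following might do. It is only a sketch in Python using the standard `winreg` module; the helper names are made up for illustration, and treating "all three values equal 65001" as "the option is on" is my assumption based on the observation above.

```python
import sys

# Key and value names as given in the answer above.
CODE_PAGE_KEY = r"SYSTEM\CurrentControlSet\Control\Nls\CodePage"
VALUE_NAMES = ("ACP", "OEMCP", "MACCP")


def read_code_page_values():
    """Return the three code page values as strings, or None off Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # standard library, available on Windows only

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CODE_PAGE_KEY) as key:
        return {name: winreg.QueryValueEx(key, name)[0] for name in VALUE_NAMES}


def looks_like_utf8_option(values):
    """Assumption: the option is on when all three values read "65001"."""
    return all(values.get(name) == "65001" for name in VALUE_NAMES)


values = read_code_page_values()
if values is None:
    print("Not running on Windows; nothing to read.")
else:
    print(values, looks_like_utf8_option(values))
```

Reading the key does not need elevation, but remember that the running system only picks up a change after a reboot, so the registry and GetACP can disagree until then.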

I'm not entirely sure, but it might be related to the variable gAnsiCodePage in KernelBase.dll, which GetACP reads. If you really want to, you might be able to change it for your program at run time, regardless of the system setting: disassemble GetACP to find the instruction sequence that reads gAnsiCodePage, obtain a pointer to the variable, and update it directly.

(Actually, I see references to an undocumented function named SetCPGlobal that would've done the job, but I can't find that function on my system. Not sure if it still exists.)

Solution 2

Most Windows C APIs come in two different variants:

  • "A" variant that uses 8-bit strings in whatever the system's configured encoding is. This varies depending on the configured country/language. (Microsoft calls the configured encoding the "ANSI Code Page", but it has nothing really to do with ANSI.)
  • "W" variant that uses 16-bit strings in a fixed almost-UTF-16 encoding. (The "almost" is because "unpaired surrogates" are allowed; if you don't know what those are then don't worry about them).
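To make the difference concrete, here is a small sketch in Python (no Windows needed) that mimics what each variant would receive. The helper names `encode_ansi` and `utf16_units` are invented for this illustration; they are not Windows APIs.

```python
def encode_ansi(text, code_page):
    """Bytes an "A" API would see if `code_page` were the configured encoding."""
    # Characters missing from the code page are replaced with "?".
    return text.encode(code_page, errors="replace")


def utf16_units(text):
    """16-bit code units a "W" API would see; lone surrogates are let through."""
    data = text.encode("utf-16-le", errors="surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


word = "café"
for code_page in ("cp1252", "cp1251", "utf-8"):
    # Same text, different bytes, depending on the configured code page.
    print(code_page, encode_ansi(word, code_page))

# The "W" form is the same everywhere.
print([hex(unit) for unit in utf16_units(word)])

# An unpaired surrogate is not valid UTF-16, yet it fits in a 16-bit string.
lone = "\ud83d"
print([hex(unit) for unit in utf16_units(lone)])
```

The "A" bytes change with the code page (and the accented letter is lost entirely under the Cyrillic one), while the "W" code units do not. The last line shows the "almost" part: a lone surrogate is a perfectly storable 16-bit value even though a strict UTF-16 encoder would reject it.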

The official Microsoft advice is not to use the "A" versions, but to ensure your code always uses the "W" variants. That way you're supposed to get consistent behaviour no matter what the user's country/language is configured as.

However, it looks like that checkbox is doing more than one thing. It's clear it's supposed to change the "ANSI Code Page" to 65001, which means UTF-8. It looks like it's also changing font rendering to be more Unicody.

I suggest you detect if GetACP() == 65001, then draw the Unicode version of your strings, otherwise draw the old "0r" version. I'm not sure how you do that from .NET...
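For what it's worth, the check itself is tiny. Below is a sketch of the logic in Python rather than .NET, calling the real GetACP from kernel32 through `ctypes` on Windows. The helper names are hypothetical, the two strings are the ones from the question, and I have not verified that switching on this condition fixes the rendering in every case.

```python
import sys

UTF8_CODE_PAGE = 65001


def get_ansi_code_page():
    """Return GetACP() on Windows, or None on other platforms."""
    if sys.platform != "win32":
        return None
    import ctypes

    return ctypes.windll.kernel32.GetACP()


def pick_button_text(acp):
    """Choose the Unicode string under UTF-8, else the legacy Webdings codes."""
    if acp == UTF8_CODE_PAGE:
        return "\U0001F5D5\U0001F5D9"
    return "0r"


# ascii() keeps the output printable on consoles that lack these glyphs.
print(ascii(pick_button_text(get_ansi_code_page())))
```

Keeping the decision in a function that takes the code page as an argument makes it easy to exercise both branches without changing the system setting and rebooting.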

Solution 3

Please look at this question to see what it solves when it is enabled: How to save to file non-ascii output of program in Powershell?

I also found the explanation written by Ghisler helpful (source):

If you check this option, Windows will use codepage 65001 (Unicode UTF-8) instead of the local codepage like 1252 (Western Latin1) for all plain text files. The advantage is that text files created in e.g. Russian locale can also be read in other locale like Western or Central Europe. The downside is that ANSI-Only programs (most older programs) will show garbage instead of accented characters.
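Both halves of that explanation are easy to reproduce without touching the system setting. The sketch below (Python, runs anywhere) uses a hypothetical `read_as` helper to stand in for a program that assumes a particular code page when reading a text file.

```python
def read_as(data, code_page):
    """Decode file bytes the way a program assuming `code_page` would."""
    return data.decode(code_page, errors="replace")


russian = "Привет"

# Advantage: a file written as UTF-8 reads back the same in any locale.
utf8_bytes = russian.encode("utf-8")
print(read_as(utf8_bytes, "utf-8"))

# Without it: a file written under the Russian code page (1251)
# turns into accented Latin letters under the Western one (1252).
legacy_bytes = russian.encode("cp1251")
print(read_as(legacy_bytes, "cp1251"))
print(read_as(legacy_bytes, "cp1252"))

# Downside: an ANSI-only program that still assumes 1252 misreads UTF-8 text.
print(read_as("café".encode("utf-8"), "cp1252"))
```

The third line printed is "Ïðèâåò" and the last is "cafÃ©", which is the kind of garbage the quote is talking about.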


Here are two ways to enable it, which I think many users will find helpful:

  1. Win+R -> intl.cpl
  2. Administrative tab
  3. Click the Change system locale button.
  4. Enable Beta: Use Unicode UTF-8 for worldwide language support
  5. Reboot

or alternatively via reg file:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage]
"ACP"="65001"
"OEMCP"="65001"
"MACCP"="65001"

Author: Andrew Savinykh

Updated on July 15, 2021

Comments

  • Andrew Savinykh, almost 3 years

    In some Windows 10 builds (insiders starting April 2018 and also "normal" 1903) there is a new option called "Beta: Use Unicode UTF-8 for worldwide language support".

    You can see this option by going to Settings and then: All Settings -> Time & Language -> Language -> "Administrative Language Settings"

    This is what it looks like:

    [image: screenshot of the option]

    When this checkbox is checked I observe some irregularities (below) and I would like to know what exactly this checkbox does and why the below happens.

    Create a brand new Windows Forms application in Visual Studio 2019. On the main form, specify the Paint event handler as follows:

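    private void Form1_Paint(object sender, PaintEventArgs e)
    {
        // A Font wraps a native handle, so dispose it after each paint.
        using (Font buttonFont = new Font("Webdings", 9.25f))
        {
            // "0" and "r" are the Webdings codes for the two glyphs in question.
            TextRenderer.DrawText(e.Graphics, "0r", buttonFont, new Point(), Color.Black);
        }
    }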
    

    Run the program, here is what you will see if the checkbox is NOT checked:

    [image: screenshot of the rendered output]

    However, if you check the checkbox (and reboot as asked) this changes to:

    [image: screenshot of the rendered output]

    You can look up the Webdings font on Wikipedia. According to the character table given there, the Unicode code points for these two characters are "\U0001F5D5\U0001F5D9". If I use them instead of "0r", it works with the checkbox checked, but with the checkbox unchecked it now looks like this:

    [image: screenshot of the rendered output]
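For reference, the two strings have nothing in common at the character level. Here is a quick sketch in Python (the `describe` helper is made up for this illustration) that lists each character's code point, its Unicode name, and how many UTF-16 code units it takes, which is what a .NET string stores.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name, UTF-16 code unit count) per character."""
    rows = []
    for ch in text:
        units = len(ch.encode("utf-16-le")) // 2
        rows.append((f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), units))
    return rows


for label, text in (("legacy", "0r"), ("unicode", "\U0001F5D5\U0001F5D9")):
    for row in describe(text):
        print(label, row)
```

"0r" is two plain ASCII characters that only turn into symbols through the font, whereas the other string is two characters outside the Basic Multilingual Plane, each stored as a surrogate pair. So the first depends on how the font's symbol encoding is interpreted and the second on the renderer finding those code points in the font, which may be why each works in only one of the two configurations.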

    I would like to find a solution that always works, that is, regardless of whether the box is checked or unchecked.

    Can this be done?