UTF-16 file output in cmd.exe
Solution 1
Your code does not work because 10000 is not a Unicode code page; it is Mac Roman. See Code Page Identifiers:
Identifier | .NET name | Additional information
10000 | macintosh | MAC Roman; Western European (Mac)
...
1200 | utf-16 | Unicode UTF-16, little endian byte order (BMP of ISO 10646); available only to managed applications
1201 | unicodeFFFE | Unicode UTF-16, big endian byte order; available only to managed applications
...
12000 | utf-32 | Unicode UTF-32, little endian byte order; available only to managed applications
12001 | utf-32BE | Unicode UTF-32, big endian byte order; available only to managed applications
...
65000 | utf-7 | Unicode (UTF-7)
65001 | utf-8 | Unicode (UTF-8)
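To see why code page 10000 turns the symbols into question marks, here is a minimal Python sketch. It is not what cmd.exe does internally: the mapping from Windows code page identifiers to Python codec names (`CODEPAGE_TO_CODEC`) and the helper `encode_for_codepage` are hypothetical, and the sketch only models what happens when text is forced into a code page that lacks some of its characters.

```python
# Hypothetical mapping from the Windows code page identifiers in the table
# above to the closest Python codec names (Python does not use these numbers).
CODEPAGE_TO_CODEC = {
    10000: "mac_roman",
    1200: "utf-16-le",
    1201: "utf-16-be",
    12000: "utf-32-le",
    12001: "utf-32-be",
    65000: "utf-7",
    65001: "utf-8",
}


def encode_for_codepage(text: str, codepage: int) -> bytes:
    """Encode text for a code page, replacing unmappable characters with '?'."""
    codec = CODEPAGE_TO_CODEC[codepage]
    return text.encode(codec, errors="replace")


if __name__ == "__main__":
    sample = "hell☺ w☻rld♥!"
    # Mac Roman has no smiley or heart characters, so each one becomes b"?".
    print(encode_for_codepage(sample, 10000))  # b'hell? w?rld?!'
    # UTF-16 LE can represent every character, so the round trip is lossless.
    print(encode_for_codepage(sample, 1200).decode("utf-16-le") == sample)  # True
```

The question marks come from the `errors="replace"` handler here; the point is simply that a non-Unicode code page has no slot for these characters, so something has to be substituted.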
I'm not sure why, but when the ☺, ☻ and ♥ characters are pasted, the Command Prompt seems to interpret them as the control characters SOH (Start of Heading, 0x01), STX (Start of Text, 0x02) and ETX (End of Text, 0x03).
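One possible explanation, offered as an assumption rather than confirmed console behaviour: in the console's legacy OEM code page 437, the byte values 0x01, 0x02 and 0x03 are drawn with the ☺, ☻ and ♥ glyphs, so one glyph stands for both a control code and a real Unicode character. The snippet below just lists each Unicode code point next to the control code it visually shares; the `CP437_GLYPH_POSITIONS` table is hard-coded for illustration, not read from Windows.

```python
import unicodedata

# Hard-coded assumption: code page 437 displays these control codes
# with the same glyphs as the Unicode characters below.
CP437_GLYPH_POSITIONS = {"☺": 0x01, "☻": 0x02, "♥": 0x03}
CONTROL_NAMES = {0x01: "SOH", 0x02: "STX", 0x03: "ETX"}


def describe(ch: str) -> str:
    """Describe a character and the control code that shares its CP437 glyph."""
    ctrl = CP437_GLYPH_POSITIONS[ch]
    return (f"U+{ord(ch):04X} {unicodedata.name(ch)} "
            f"looks like 0x{ctrl:02X} {CONTROL_NAMES[ctrl]}")


if __name__ == "__main__":
    for glyph in CP437_GLYPH_POSITIONS:
        print(describe(glyph))
```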
On the plus side, PowerShell seems to handle this properly. Notepad++ automatically opened the resulting text file as "UCS-2 Little Endian", and it displays the correct characters.
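Notepad++ most likely recognised the file from its byte order mark (FF FE for UTF-16 little endian); that is an assumption about its detection, which also uses other heuristics. A minimal BOM sniffer in Python, with the hypothetical helper name `sniff_bom`, looks like this:

```python
import codecs
from typing import Optional

# Longer marks first: the UTF-32 LE mark (FF FE 00 00) begins with the
# UTF-16 LE mark (FF FE), so checking UTF-16 first would misreport it.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading byte order mark, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    utf16_file = codecs.BOM_UTF16_LE + "hell☺ w☻rld♥!".encode("utf-16-le")
    print(sniff_bom(utf16_file))      # utf-16-le
    print(sniff_bom(b"plain ASCII"))  # None
```

A file with no BOM gives no answer here, which is why editors fall back on guessing; and a UTF-16 LE file that starts with U+0000 would be misread as UTF-32 LE by this simple check.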
OK, I've figured out why UTF-8 wasn't working for me: the console font must be set to Lucida Console, because the default Raster Fonts don't support Unicode.
Solution 2
Both 65001.txt and 1200.txt contain the same string, абв™, but in different encodings. The command:
chcp 65001 & type 65001.txt
successfully changes the code page, but displays garbage.
The command:
type 1200.txt
displays the correct characters, but the command
for /f %A in ('type 1200.txt') do echo %A
displays абвT.
So cmd.exe IS able to work with code page 1200 (within some limits), while I can't get any satisfactory results with code page 65001.
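To make "same string, different encoding" concrete, this Python sketch builds the bytes each file would hold. The helper `file_bytes` is hypothetical, and the FF FE byte order mark at the start of 1200.txt is an assumption (it is what usually lets `type` recognise a UTF-16 file), not something stated above.

```python
import codecs

TEXT = "абв™"


def file_bytes(text: str, codepage: int) -> bytes:
    """Bytes a file saved in the given code page would contain (hypothetical helper)."""
    if codepage == 65001:
        return text.encode("utf-8")  # UTF-8, no byte order mark
    if codepage == 1200:
        # UTF-16 LE preceded by the FF FE byte order mark (assumed present).
        return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    raise ValueError(f"unsupported code page: {codepage}")


if __name__ == "__main__":
    for cp in (65001, 1200):
        data = file_bytes(TEXT, cp)
        print(f"{cp}.txt: {len(data)} bytes: {data.hex(' ')}")
    # 65001.txt: 9 bytes: d0 b0 d0 b1 d0 b2 e2 84 a2
    # 1200.txt: 10 bytes: ff fe 30 04 31 04 32 04 22 21
```

In UTF-8 the Cyrillic letters take two bytes each and ™ takes three, so the console has to reassemble variable-length sequences; in UTF-16 LE every one of these characters is a single 2-byte unit.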
0x6B6F77616C74
Updated on September 18, 2022
Comments
0x6B6F77616C74 over 1 year
chcp 10000
echo hell☺ w☻rld♥! >> "UTF-16 file☺☻♥♦♣♠"
OK, it creates the file correctly, but the content has question marks instead of the Unicode characters. How do I fix it?
0x6B6F77616C74 almost 12 years
chcp 1200: the console says "Invalid code page", and the same happens with chcp 1201. Why?
tvdo almost 12 years
@kutacz "Available only to managed applications." Use PowerShell for proper Unicode support, since the UTF-8 option (65001) doesn't seem to work in the Command Prompt for this case.
0x6B6F77616C74 almost 12 years
In the case of UTF-8 (after chcp 65001), it works fine...
tvdo almost 12 years
@kutacz Well, then, use UTF-8. It didn't seem to work for me, but I'm still not sure what you're trying to do. UTF-16 is not possible in the standard Command Prompt. "Managed applications" probably refers to the .NET Framework. PowerShell runs on the .NET Framework, so it works for UTF-16.