UTF-16 file output in cmd.exe

Solution 1

Your code is not correct, as 10000 is not a Unicode code page. See Code Page Identifiers.

10000   macintosh   MAC Roman; Western European (Mac)
...
1200    utf-16      Unicode UTF-16, little endian byte order (BMP of ISO 10646); available only to managed applications
1201    unicodeFFFE Unicode UTF-16, big endian byte order; available only to managed applications
...
12000   utf-32      Unicode UTF-32, little endian byte order; available only to managed applications
12001   utf-32BE    Unicode UTF-32, big endian byte order; available only to managed applications
...
65000   utf-7       Unicode (UTF-7)
65001   utf-8       Unicode (UTF-8)
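
To see why chcp 10000 leads to question marks, here is a minimal Python sketch. It assumes Python's mac_roman, utf-16-le and utf-8 codecs are reasonable stand-ins for Windows code pages 10000, 1200 and 65001. The CODE_PAGES table and the encode_for_code_page helper are illustrative names, not part of any Windows API.

```python
# Illustrative mapping from Windows code page numbers to Python codec names.
# Assumption of this sketch: the Python codecs stand in for the Windows tables.
CODE_PAGES = {
    10000: "mac_roman",
    1200: "utf-16-le",
    65001: "utf-8",
}

def encode_for_code_page(text: str, code_page: int) -> bytes:
    """Encode text, substituting '?' for characters the code page lacks."""
    return text.encode(CODE_PAGES[code_page], errors="replace")

sample = "hell☺ w☻rld♥!"
# Mac Roman has no ☺, ☻ or ♥, so each one is replaced by '?'.
print(encode_for_code_page(sample, 10000))
# The Unicode encodings round-trip the text unchanged.
print(encode_for_code_page(sample, 65001).decode("utf-8") == sample)
```

Any non-Unicode code page behaves the same way for characters outside its repertoire, which is why switching to 10000 cannot help here.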

I'm not sure why, but the Command Prompt seems to interpret the characters ☺, ☻ and ♥ as control characters when pasted, specifically SOH (Start of Heading, 01), STX (Start of Text, 02) and ETX (End of Text, 03).
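
A likely explanation, offered as an assumption rather than something confirmed above: in the legacy OEM code page 437, the glyphs ☺ ☻ ♥ ♦ ♣ ♠ are the display forms of byte values 01 to 06, so a console working in an OEM code page may fold those characters back into control codes. The table in this sketch is written out by hand for illustration (Python's own cp437 codec maps these bytes to control characters, not to the glyphs), and fold_to_control_codes is a hypothetical helper.

```python
# Hand-written table of the CP437 display glyphs for byte values 0x01-0x06.
CP437_GLYPH_TO_BYTE = {
    "☺": 0x01,  # SOH, Start of Heading
    "☻": 0x02,  # STX, Start of Text
    "♥": 0x03,  # ETX, End of Text
    "♦": 0x04,  # EOT, End of Transmission
    "♣": 0x05,  # ENQ, Enquiry
    "♠": 0x06,  # ACK, Acknowledge
}

def fold_to_control_codes(text: str) -> bytes:
    """Replace CP437 glyphs with the control bytes that share their slot."""
    out = bytearray()
    for ch in text:
        if ch in CP437_GLYPH_TO_BYTE:
            out.append(CP437_GLYPH_TO_BYTE[ch])
        else:
            # Anything else outside ASCII becomes '?' in this sketch.
            out.extend(ch.encode("ascii", errors="replace"))
    return bytes(out)

print(fold_to_control_codes("hell☺ w☻rld♥!"))
```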

On the plus side, PowerShell seems to handle this properly. Notepad++ opened the resultant text file as "UCS-2 Little Endian" automatically, and it displays the correct characters.
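
Editors such as Notepad++ usually recognise UTF-16 little endian by the byte order mark FF FE at the start of the file. The sketch below is an assumption about what such a file looks like on disk, not a capture of the PowerShell output; utf16le_with_bom is a hypothetical helper name.

```python
import codecs

def utf16le_with_bom(text: str) -> bytes:
    """Return text as UTF-16 LE bytes preceded by the FF FE byte order mark."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

data = utf16le_with_bom("hell☺ w☻rld♥!")
print(data[:2].hex())         # the byte order mark
print(data.decode("utf-16"))  # the generic utf-16 codec consumes the mark
```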


Ok, I've figured out why UTF-8 wasn't working for me. The font should be set to Lucida Console, since the default Raster Fonts don't have Unicode support.

Solution 2

Both 65001.txt and 1200.txt contain the same string, абв™, but in different encodings (named after the code page each file uses). The command:

chcp 65001 & type 65001.txt 

successfully changes the code page, but displays garbage.
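
For reference, the same four characters occupy different bytes in the two files. A minimal sketch, assuming 65001.txt is plain UTF-8 and 1200.txt is UTF-16 little endian (whether either file starts with a byte order mark is not stated here, so none is added):

```python
text = "абв™"

utf8_bytes = text.encode("utf-8")       # what 65001.txt is assumed to hold
utf16_bytes = text.encode("utf-16-le")  # what 1200.txt is assumed to hold

# Print each encoding as space-separated hex bytes (Python 3.8+).
print(utf8_bytes.hex(" "))
print(utf16_bytes.hex(" "))
```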

The command:

type 1200.txt

displays the correct characters, but the command

for /f %A in ('type 1200.txt') do echo %A

displays абвT.
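
The T in place of ™ looks like a best-fit substitution: for /f presumably receives the output of type as bytes in the console's OEM code page, and a Cyrillic OEM page such as 866 has no ™. This is an assumption, since the active OEM code page is not stated here. The sketch uses Python's cp866 codec and a hypothetical one-entry BEST_FIT table to mimic the effect; Windows uses its own, larger tables.

```python
# Hypothetical best-fit table for illustration only.
BEST_FIT = {"™": "T"}

def to_oem(text: str, codec: str = "cp866") -> bytes:
    """Encode to an OEM code page, falling back to a look-alike, then '?'."""
    out = bytearray()
    for ch in text:
        try:
            out.extend(ch.encode(codec))
        except UnicodeEncodeError:
            # The character is missing from the code page: substitute.
            out.extend(BEST_FIT.get(ch, "?").encode(codec))
    return bytes(out)

print(to_oem("абв™").decode("cp866"))
```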

So cmd.exe IS able to work with codepage 1200 (with some limits) while I can't get any satisfactory results with codepage 65001.

Author: 0x6B6F77616C74

Updated on September 18, 2022

Comments

  • 0x6B6F77616C74
    0x6B6F77616C74 over 1 year
    chcp 10000
    echo hell☺ w☻rld♥! >> "UTF-16 file☺☻♥♦♣♠"
    

    OK, it creates the file correctly, but in the content there are question marks instead of the Unicode characters. How can I fix this?

  • 0x6B6F77616C74
    0x6B6F77616C74 almost 12 years
    With chcp 1200 the console says "Invalid code page", and the same happens with chcp 1201. Why?
  • tvdo
    tvdo almost 12 years
    @kutacz Those code pages are "available only to managed applications". Use PowerShell for proper Unicode support, since the UTF-8 option (65001) doesn't seem to work in the Command Prompt for this case.
  • 0x6B6F77616C74
    0x6B6F77616C74 almost 12 years
    In the case of UTF-8 (after chcp 65001) it works fine...
  • tvdo
    tvdo almost 12 years
    @kutacz Well, then, use UTF-8. It didn't seem to work for me, but I'm still not sure what you're trying to do. UTF-16 is not possible in the standard Command Prompt. Managed applications probably refers to the .NET Framework. PowerShell runs on the .NET Framework, so it works for UTF-16.