Even on Windows 7, can you do a "dir" and see filenames that have Unicode characters?


Solution 1

This is a very old question, but all of the answers given here are wrong.

You will never see Unicode output on the Windows command line (CMD.exe). The reason is that CMD cannot display Unicode. It can, however, display DBCS (Double-Byte Character Set).

If you want to see Japanese output, for example, you have to change your System Locale to Japanese and reboot. Then, you'll be able to see Japanese DBCS (i.e. Shift-JIS) characters on the command line. Windows supports Japanese Shift-JIS, Simplified Chinese, Korean, and Traditional Chinese "Big5" DBCS code pages.
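A quick way to see what "double-byte" means is to encode some text with Python's cp932 codec (Microsoft's Shift-JIS code page). This is only an illustrative sketch: Python is my choice, not something from this thread, and it shows the encoding itself, not how CMD or the System Locale setting behaves.

```python
from typing import List


def byte_lengths(text: str, codepage: str = "cp932") -> List[int]:
    """Number of bytes each character of `text` takes in `codepage`."""
    return [len(ch.encode(codepage)) for ch in text]


if __name__ == "__main__":
    print(byte_lengths("abc"))     # ASCII letters: one byte each
    print(byte_lengths("日本語"))  # kanji: two bytes each (a double-byte sequence)
    print("日本語".encode("cp932").hex(" "))  # the raw Shift-JIS bytes
```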

Incidentally, you can redirect UTF-16 output (which Microsoft inaccurately uses interchangeably with "Unicode") to a file, then open that file in, say, Notepad, and view the Unicode characters. You can also mark and copy the gibberish text from CMD.exe, paste it into Notepad, and see the Unicode characters. In other words, CMD supports Unicode, but it doesn't display Unicode.
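If you want to read such a redirected file programmatically rather than in Notepad, a minimal Python sketch could look like this. It assumes the file is little-endian UTF-16 (the form Windows uses) and strips a byte-order mark only if one is present, since this thread doesn't say whether `cmd /U` writes one. The function name is hypothetical.

```python
import codecs
from typing import List


def read_utf16_listing(data: bytes) -> List[str]:
    """Decode the bytes of a redirected `dir /B` listing saved as UTF-16 LE.

    A leading UTF-16 LE byte-order mark is removed if present.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]
    text = data.decode("utf-16-le")
    # dir /B puts one name per line; splitlines() also handles CRLF endings.
    return [line for line in text.splitlines() if line]


if __name__ == "__main__":
    sample = "日本語.txt\r\nreadme.txt\r\n".encode("utf-16-le")
    print(read_utf16_listing(sample))
```

On a real file you would pass the bytes read in binary mode, e.g. `open("files.txt", "rb").read()`.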

You can find more information in this blog post.

Solution 2

Based on your username, I suspect you mainly work with Asian languages.

Windows tools normally operate in Unicode mode (as you saw by redirecting the output of dir into a file and opening that file with an editor):

  1. The tool does its work.
  2. It outputs Unicode characters.
  3. Another program takes this output and has to display it.

To display any character on the screen, the program from step 3 has to look up the appropriate glyph for the given character code. For example:

  • U+0061 'a' maps to a different glyph in each font (so the 'a' looks different from font to font)

  • U+03A9 'Ω' (Greek capital omega, decimal 937) maps to a different glyph in each font as well
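To check the code points in the two examples above, and to see what those characters look like as UTF-16 (what Windows uses internally), here is a small Python sketch:

```python
def describe(ch: str) -> str:
    """Show a character's Unicode code point and its UTF-16 LE bytes."""
    cp = ord(ch)
    utf16 = ch.encode("utf-16-le").hex(" ")
    return f"{ch!r}: U+{cp:04X} (decimal {cp}), UTF-16 LE bytes {utf16}"


if __name__ == "__main__":
    for ch in ["a", "\u03a9"]:  # Latin small a, Greek capital omega
        print(describe(ch))
```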

This mapping only works if the font has a glyph for the given character. Otherwise the visual result varies: sometimes you see '?', sometimes boxes or diamonds, etc.
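The lookup step can be modeled with a toy "font" that is just the set of code points it has glyphs for. This is a deliberately simplified sketch: real fonts keep this mapping in a character-map (cmap) table, and `LATIN_FONT` is a made-up name for illustration.

```python
from typing import Set

# Placeholder drawn for missing glyphs; U+25A1 WHITE SQUARE, similar to the
# box many renderers show.
MISSING_GLYPH = "\u25a1"

# Hypothetical Latin-only font: covers printable ASCII and nothing else.
LATIN_FONT = set(range(0x20, 0x7F))


def render(text: str, font: Set[int]) -> str:
    """Keep characters the font covers; replace the rest with MISSING_GLYPH."""
    return "".join(ch if ord(ch) in font else MISSING_GLYPH for ch in text)


if __name__ == "__main__":
    print(render("readme.txt", LATIN_FONT))
    print(render("日本語.txt", LATIN_FONT))
```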

Again: dir produces byte sequences, which are sometimes purely in the ASCII range and sometimes in the wider Unicode range (depending on which filenames it finds). It sends these sequences to another program, which is responsible for actually rendering them. To display a sequence, that program has to map it to a glyph, which means searching a font for that glyph. If the font does not have a glyph for the given sequence, the program cannot display the byte sequence produced by, for example, dir.

So the solution to your problem (seeing any Unicode character in the Windows console/terminal) is to give the program a font that has a glyph for (almost) every Unicode character.
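Choosing such a font comes down to coverage: how many of the characters in your filenames the font actually has glyphs for. The sketch below assumes you already know each font's covered code points (the two fonts here are hypothetical placeholders, not real font data) and picks the one that covers the most.

```python
from typing import Dict, Iterable, Set


def coverage(names: Iterable[str], font: Set[int]) -> float:
    """Fraction of distinct characters in `names` that `font` has glyphs for."""
    needed = {ord(ch) for name in names for ch in name}
    if not needed:
        return 1.0
    return len(needed & font) / len(needed)


def best_font(names: Iterable[str], fonts: Dict[str, Set[int]]) -> str:
    """Name of the font that covers the most characters used in `names`."""
    names = list(names)  # a generator could only be scanned once
    return max(fonts, key=lambda f: coverage(names, fonts[f]))


# Hypothetical coverage sets: printable ASCII, optionally plus CJK ideographs.
FONTS = {
    "LatinOnly": set(range(0x20, 0x7F)),
    "LatinPlusCJK": set(range(0x20, 0x7F)) | set(range(0x4E00, 0xA000)),
}

if __name__ == "__main__":
    files = ["readme.txt", "日本語.txt"]
    for font_name, font in FONTS.items():
        print(font_name, coverage(files, font))
    print("best:", best_font(files, FONTS))
```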



Author: GeekAbhiGeek

Updated on September 17, 2022

Comments

  • GeekAbhiGeek
    GeekAbhiGeek over 1 year

    This is somewhat related to the question

    On Windows 7, dir or tree can't show Unicode characters, even when starting cmd with cmd /U.

    Even on Windows 7, the only way I have found to get Unicode into a file is:

    > cmd /U
    > dir /B > files.txt
    

    The file is in "Unicode" (that is what Notepad shows when I open it and choose "Save As"), and if I run dir /B > files.html and open the HTML file in Firefox, it displays correctly with the encoding set to UTF-16 (or UTF-16 LE).

    But if I want to see the names on the screen instead of sending them to a file, it still seems impossible. Is there a way to make it happen? Possibly by somehow telling cmd not to show non-printable characters as "?"?

    Update: I tried cmd.exe, Cygwin's bash on Windows, and PowerShell. They all behave the same, except that if I change "Properties -> Font" to Consolas or Lucida Console, there is some improvement: instead of a question mark, I see either an empty square or a square with a question mark in it.

    The more expensive Mac computers with Mac OS X can do it. The free Ubuntu can do it too.

  • GeekAbhiGeek
    GeekAbhiGeek almost 14 years
    hm, but cmd, Cygwin bash, and PowerShell are all limited to 3 fonts: Raster Fonts, Lucida Console, and Consolas... actually, Windows usually falls back to a Unicode font when it can't display something with the current font... also, if I redirect the output, like dir > file.txt, it is still a question mark in the file, even though it is a "square box" on the screen.
  • akira
    akira almost 14 years
    @Jian Lin: yes, but it is essentially YOUR job to provide a font that contains these glyphs. And even if Windows falls back to "some" font that holds "some" Unicode glyphs... that is not enough to display some of your Asian glyphs (you have problems with the Asian glyphs, right?).
  • akira
    akira almost 14 years
    according to some websites, "Ascender Uni Duo" seems to be the best font (even for "fixed") ascendercorp.de/fonts/multilingual/ascender-uni but maybe you find something better / cheaper en.wikipedia.org/wiki/Unicode_typefaces
  • GeekAbhiGeek
    GeekAbhiGeek almost 14 years
    @akira there are many fonts on Windows 7 that can display the whole Unicode glyph set. But (1) the cmd window won't let you choose any of them. (2) When Windows or the app falls back to a font that can display Unicode, such as Lucida Sans Unicode, it can display almost any Chinese character.
  • GeekAbhiGeek
    GeekAbhiGeek almost 14 years
    Lucida Sans Unicode used to be much larger... now it is about 300 KB on Windows 7. But anyway, even if you set any web browser to use this font or any other font such as Times New Roman, when you go to news.google.com/news?edchanged=1&ned=tw you can still see the Chinese characters if you are using Vista or Win 7. Either the app or, more likely, Windows, when it cannot find the glyph in the current font, goes and finds it in a font that has it.
  • GeekAbhiGeek
    GeekAbhiGeek almost 14 years
    Besides, when I redirect the output using CMD /U and then DIR /B > file.txt, I can see the correct glyphs in Notepad automatically, even with a default English font. So unless Microsoft is saying "we just won't show Unicode characters in Command Prompt; even though people still use it and it is part of Win 7, we will make it behave worse than Notepad"... PowerShell too. Unicode? Out of the question.
  • GeekAbhiGeek
    GeekAbhiGeek almost 14 years
    hm... still won't work... cmd /U, chcp 65001, dir, and dir /B with the font already set to Lucida Console, still the same.
  • ta.speot.is
    ta.speot.is almost 14 years
    You may want to try adding more fonts to the console: support.microsoft.com/kb/247815 and blogs.msdn.com/b/oldnewthing/archive/2007/05/16/2659903.aspx (the latter for some discussion on the issue).
  • akira
    akira almost 14 years
    as you said: cmd.exe only accepts fixed-width fonts. It does not matter whether you can see all the glyphs in your web browser, or in Notepad, or in xyz. If the glyph is not in the font used by cmd.exe, you cannot see it, period. Even if Windows falls back to other (fixed-width) fonts: if the glyph is not in there either, it cannot be displayed. That's why I said: find a fixed-width font for cmd.exe that contains almost all glyphs (such as "Ascender Uni Duo", so I was told).
  • akira
    akira almost 14 years
    and no, you are not using an "English-only" font in Notepad. You are lucky that either the font itself or the fallback provides the glyphs the byte sequences require. Anyhow, in Notepad the default font is not the fixed-width font.
  • akira
    akira almost 14 years
    it all depends on the fonts you are giving the program to render the text. read the support article, good info in it.
  • GeekAbhiGeek
    GeekAbhiGeek almost 14 years
    I think Mac OS X solved it by making each such glyph 2 characters wide, so no characters overlap. It works pretty well, and at least people can see the Unicode filenames. It is not rocket science.
  • Philipp
    Philipp almost 14 years
    It really has nothing to do with the operating system or encodings. The Windows console display simply uses just one font and doesn't look for alternatives if a glyph is missing. OTOH, the Windows text box (which Notepad uses) does look for alternative fonts.
  • Philipp
    Philipp almost 14 years
    @taspeotis: The Windows console always uses Unicode internally, regardless of the codepage setting (which is obsolete anyway and only included for backwards compatibility). It is really just a font problem.
  • Philipp
    Philipp almost 14 years
    @akira: Good answer, I'd just replace “byte sequences” by “16-bit strings” or “UTF-16 strings”, since that is what Windows uses internally.
  • akira
    akira almost 14 years
    @Philipp: I wanted to keep it more generic, since the underlying mechanism is the same on every OS: byte sequence -> look up the glyphs in the font -> rendering.
  • akira
    akira almost 14 years
    I've created a Russian filename, and cmd.exe displayed the glyphs correctly after switching to Lucida. For Asian scripts, I think the OP has to pick a "better", more "Unicode-complete" fixed-width font (even if he does not like that answer :)).
  • GeekAbhiGeek
    GeekAbhiGeek almost 14 years
    Can any of the fonts included with Win 7 be used, such as MingLiU or DFKai-SB?
  • GeekAbhiGeek
    GeekAbhiGeek almost 14 years
    There is no font I can change to and use, even though there are about 12 chinese fonts on Windows 7. The only font I can change to and use is Courier New, which is pure English. The font "Ascender Uni Duo" costs $149 and is almost as expensive as Windows 7 itself. And who knows whether it will work or not...
  • akira
    akira almost 14 years
    i understand your pain. the last time i contacted ascender they were very friendly, just ask them if you can test the font.
  • Danzzz
    Danzzz almost 14 years
    ls in Powershell is actually just an alias for "Get-ChildItem"
  • ta.speot.is
    ta.speot.is almost 14 years
    What's wrong with adding fonts? support.microsoft.com/kb/247815
  • GeekAbhiGeek
    GeekAbhiGeek almost 14 years
    It is fine to add a font. But (1) I already have 7 or 8 default Chinese fonts on the system, which should also cover other Unicode characters I don't care about as much; still, if you say add one more, sure, I can do that. (2) Which one should I add? Is there a free one? Somebody suggested one that costs $149.
  • GeekAbhiGeek
    GeekAbhiGeek over 13 years
    and then? Don't tell me you type Get-ChildItem on the command line every day instead of ls. By analogy, we say we drink water, not dihydrogen monoxide.
  • Jeff
    Jeff over 9 years
    @Philipp it's not just a font problem. The CMD window is an old-school DBCS program. The command line processor itself supports Unicode, but not the display portion. The only way to show Japanese, Chinese, Korean, and Trad. Chinese in the CMD window (or any old-school DBCS UI) is to change the System Locale.
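Philipp's point in the comments (the console sticks to one font, while the text box Notepad uses tries alternative fonts) can be sketched by extending the toy "font as a set of code points" model to a chain of fonts. The coverage sets are hypothetical, and the sketch models only the fallback behavior he describes, not the DBCS/System Locale issue Jeff raises.

```python
from typing import List, Optional, Set

MISSING_GLYPH = "\u25a1"  # box drawn when no font in the chain has the glyph

# Hypothetical coverage sets, not real font data.
LATIN = set(range(0x20, 0x7F))    # printable ASCII
CJK = set(range(0x4E00, 0xA000))  # CJK Unified Ideographs block


def font_for(ch: str, chain: List[Set[int]]) -> Optional[int]:
    """Index of the first font in `chain` with a glyph for `ch`, else None."""
    for index, font in enumerate(chain):
        if ord(ch) in font:
            return index
    return None


def render(text: str, chain: List[Set[int]]) -> str:
    """Draw each character if some font in the chain covers it."""
    return "".join(
        ch if font_for(ch, chain) is not None else MISSING_GLYPH for ch in text
    )


if __name__ == "__main__":
    name = "日本語.txt"
    print(render(name, [LATIN]))       # console-style: one font, no fallback
    print(render(name, [LATIN, CJK]))  # text-box-style: falls back to CJK
```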