Printing unicode to console

11,951

Solution 1

The windows-1252 charset is the problem here. We need to use the UTF-8 charset to print. The following worked for me:

import java.io.PrintStream;
import java.nio.charset.Charset;

public static void main(String[] args) throws Exception {
    Charset utf8Charset = Charset.forName("UTF-8");
    Charset defaultCharset = Charset.defaultCharset();
    System.out.println(defaultCharset);
    // charset is windows-1252

    String unicodeMessage = "\u4e16\u754c\u4f60\u597d\uff01";

    System.out.println(unicodeMessage);
    // string is printed correctly using System.out (世界你好！)

    // Re-decode the UTF-8 bytes with the default charset, as in the question
    byte[] sourceBytes = unicodeMessage.getBytes("UTF-8");
    String data = new String(sourceBytes, defaultCharset.name());

    // Unlike the question's code, this stream encodes as UTF-8
    PrintStream out = new PrintStream(System.out, true, utf8Charset.name());
    out.println(data);
}

Solution 2

You have a number of issues and misunderstandings. Firstly,

byte[] sourceBytes = unicodeMessage.getBytes("UTF-8");
String data = new String(sourceBytes , defaultCharset.name());

data is now full of mojibake: you've decoded UTF-8 as windows-1252. You then print this string through a UTF-8 encoder. System.out then encodes for your console's codepage. It's got three levels of broken.
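
To see the first of those levels in isolation, here is a minimal sketch of the faulty decode step on its own. The names `MojibakeDemo` and `garble` are made up for illustration, and windows-1252 is hard-coded as an assumption to stand in for the default charset reported in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Assumption: windows-1252 stands in for the asker's default charset.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Encodes the text as UTF-8, then decodes those bytes with the wrong charset.
    static String garble(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, CP1252);
    }

    public static void main(String[] args) {
        String message = "\u4e16\u754c\u4f60\u597d\uff01";
        String garbled = garble(message);
        // Each of the 5 characters takes 3 bytes in UTF-8, and a single-byte
        // charset turns every byte into its own char.
        System.out.println(message.length() + " chars in, " + garbled.length() + " chars out");
        // Pure ASCII survives, because UTF-8 and windows-1252 agree on those bytes.
        System.out.println(garble("hello").equals("hello"));
    }
}
```

The five-character message comes back as fifteen unrelated characters, and that is before any print stream has touched it.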

Now, the reason System.out.println(unicodeMessage); works is that you set your locale correctly. Java uses this (the codepage of the console), not defaultCharset, to set up the console.
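
If you want to check which encoding the JVM picked for standard output, the sketch below reads a couple of system properties. Treat it as a probe built on assumptions, not a guaranteed API: as far as I know, `stdout.encoding` is only documented on newer JDKs (19 and later), `sun.stdout.encoding` is an internal property that older JDKs set only when output goes to a real terminal, and the class and method names are invented here.

```java
import java.nio.charset.Charset;

class ConsoleCharsetProbe {
    // Returns the first usable charset named by the given system properties,
    // falling back to the JVM default charset when none of them is usable.
    static Charset firstAvailable(String... propertyNames) {
        for (String name : propertyNames) {
            String value = System.getProperty(name);
            if (value == null || value.isEmpty()) {
                continue;
            }
            try {
                return Charset.forName(value);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: try the next property.
            }
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // Assumption: which of these properties is set depends on the JDK version.
        Charset console = firstAvailable("stdout.encoding", "sun.stdout.encoding");
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("stdout charset:  " + console);
    }
}
```

When the two lines differ, that difference is why System.out can print text that a stream built from defaultCharset cannot.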

The problem you'll face is that the Windows console doesn't support UTF-8. You'll be OK printing characters from your codepage, but not others. Find another solution, such as writing to a file or sending the results to a web page.
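
Writing to a file sidesteps the console entirely, because you pick the encoding yourself. A minimal sketch, assuming a throwaway temporary file (the class and method names are hypothetical):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8FileDemo {
    // Writes the text to a temporary file as UTF-8, reads it back, and cleans up.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("unicode-demo", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String message = "\u4e16\u754c\u4f60\u597d\uff01";
        // true when the text survives the trip through the file unchanged
        System.out.println(roundTrip(message).equals(message));
    }
}
```

In real use you would keep the file and open it in an editor or viewer that understands UTF-8.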

Author by

HAL

Updated on June 14, 2022

Comments

  • HAL
    HAL about 2 years

    I'm trying to create a custom print stream that can print localized messages to the console. I encountered a problem doing this on Windows. Here is what I'm attempting to do

    • I have a unicode string
    • Convert unicode string to bytes using UTF-8 encoding
    • Convert bytes to a new string with console encoding
    • Print new string to console with console encoding

    In this code, I tried to follow the steps above, but it fails miserably. Strangely, the default System.out.println call works correctly. However, I want to use a custom print stream and not rely on the default System.out.

    Can someone explain how I can print unicode to the console using my custom print stream? And why is the default System.out already equipped to print things correctly?

    Here is my code - I compiled it and ran it from the command line. I set my system locale to zh-CN beforehand.

    public static void main(String[] args) throws Exception{
        Charset defaultCharset = Charset.defaultCharset();
        System.out.println(defaultCharset);
        // charset is windows-1252
    
        String unicodeMessage =
                "\u4e16\u754c\u4f60\u597d\uff01";
    
        System.out.println(unicodeMessage);
        // string is printed correctly using System.out (世界你好！)
    
    
        byte[] sourceBytes = unicodeMessage.getBytes("UTF-8");
        String data = new String(sourceBytes , defaultCharset.name());
    
        PrintStream out = new PrintStream(System.out, true, defaultCharset.name());
        out.println(data);
        // prints gibberish: ??–????????????
    }
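
For comparison, a custom stream only needs one encoding step: hand the original string to a `PrintStream` constructed with the target charset and let it do the conversion. The sketch below is not from the thread. `EncodedPrinter` and `wrap` are hypothetical names, and it writes into a `ByteArrayOutputStream` so the resulting bytes can be inspected. Whether a given console can display those bytes is a separate matter.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodedPrinter {
    // Wraps a byte sink in an auto-flushing PrintStream that encodes text
    // with the given charset.
    static PrintStream wrap(OutputStream sink, Charset charset) {
        try {
            return new PrintStream(sink, true, charset.name());
        } catch (UnsupportedEncodingException e) {
            // Should not happen: the name comes from an existing Charset object.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String message = "\u4e16\u754c\u4f60\u597d\uff01";
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = wrap(sink, StandardCharsets.UTF_8);
        out.print(message); // no getBytes / new String detour
        out.flush();
        byte[] expected = message.getBytes(StandardCharsets.UTF_8);
        // true when the stream produced exactly the UTF-8 bytes of the message
        System.out.println(Arrays.equals(sink.toByteArray(), expected));
    }
}
```

To target the console, you would pass System.out as the sink and the console's charset instead of UTF-8, assuming the console can represent the characters at all.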