Java, UTF-8, and Windows console

Solution 1

Try chcp 65001 && start.bat

The chcp command changes the code page, and 65001 is the Win32 code page identifier for UTF-8 under Windows 7 and up. A code page, or character encoding, specifies how to convert a Unicode code point to a sequence of bytes or back again.
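
To make the code page idea concrete, here is a minimal sketch in Java. The class and method names (CodePageDemo, encodeHex, misdecode) are made up for illustration and are not part of the original answer. It shows the same character becoming different bytes under two encodings, and what happens when UTF-8 bytes are read back with the wrong one, which is the kind of mismatch that produces garbled console output.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class CodePageDemo {

    /** Encodes text with the named charset and returns the bytes as hex, such as "C3 A9". */
    static String encodeHex(String text, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(Charset.forName(charsetName))) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Encodes text as UTF-8, then decodes those bytes as ISO-8859-1 (a deliberate mismatch). */
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        System.out.println(encodeHex(eAcute, "UTF-8"));      // two bytes
        System.out.println(encodeHex(eAcute, "ISO-8859-1")); // one byte
        // One character in, two characters out after the mismatched decode.
        System.out.println(misdecode(eAcute).length());
    }
}
```

The string literals use \u escapes so that the example does not depend on the encoding of the source file itself.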

Solution 2

Java on Windows does not write Unicode output to the console by default. I have written a workaround that calls the native API through the JNA library. The method calls WriteConsoleW to write Unicode text to the console.

import com.sun.jna.Native;
import com.sun.jna.Pointer;
import com.sun.jna.ptr.IntByReference;
import com.sun.jna.win32.StdCallLibrary;

/** Unicode console output on the Windows platform.
 * @author Sandy_Yin
 * 
 */
public class Console {
    private static Kernel32 INSTANCE = null;

    public interface Kernel32 extends StdCallLibrary {
        public Pointer GetStdHandle(int nStdHandle);

        public boolean WriteConsoleW(Pointer hConsoleOutput, char[] lpBuffer,
                int nNumberOfCharsToWrite,
                IntByReference lpNumberOfCharsWritten, Pointer lpReserved);
    }

    static {
        String os = System.getProperty("os.name").toLowerCase();
        if (os.startsWith("win")) {
            INSTANCE = (Kernel32) Native
                    .loadLibrary("kernel32", Kernel32.class);
        }
    }

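    public static void println(String message) {
        boolean successful = false;
        if (INSTANCE != null) {
            // -11 is STD_OUTPUT_HANDLE, the standard output handle.
            Pointer handle = INSTANCE.GetStdHandle(-11);
            char[] buffer = message.toCharArray();
            IntByReference lpNumberOfCharsWritten = new IntByReference();
            // WriteConsoleW takes UTF-16 text; it fails when output is
            // redirected to a file or pipe instead of a console.
            successful = INSTANCE.WriteConsoleW(handle, buffer, buffer.length,
                    lpNumberOfCharsWritten, null);
            if (successful) {
                // The message was written natively; add the line break.
                System.out.println();
            }
        }
        if (!successful) {
            // Not on Windows, or the native call failed: use the normal stream.
            System.out.println(message);
        }
    }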
}
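
As a lighter alternative that pairs with Solution 1, the sketch below wraps standard output in a PrintStream that always encodes as UTF-8, instead of relying on the JVM default. This is not the answer author's method. The class name Utf8Stdout and the helper utf8 are hypothetical, and the approach assumes the console has already been switched to code page 65001. Whether the text then renders correctly still depends on the console and its font.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper class, for illustration only.
final class Utf8Stdout {

    /** Wraps a stream in an auto-flushing PrintStream that encodes text as UTF-8. */
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new AssertionError("UTF-8 must be available", e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the console was set to UTF-8 beforehand (chcp 65001).
        PrintStream console = utf8(new FileOutputStream(FileDescriptor.out));
        console.println("\u221a125"); // square root sign followed by 125
    }
}
```

Unlike the WriteConsoleW workaround, this writes plain bytes, so it also works when output is redirected to a file or pipe.
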
Author by

tofcoder

Updated on July 09, 2022

Comments

  • tofcoder
    tofcoder almost 2 years

    We are trying to use Java and UTF-8 on Windows. The application writes logs to the console, and we would like to use UTF-8 for the logs because they are internationalized.

    It is possible to configure the JVM to generate UTF-8 by passing -Dfile.encoding=UTF-8 as an argument to the JVM. It works fine, but the output on a Windows console is garbled.

    We can then set the code page of the console to 65001 (chcp 65001), but in that case the .bat files do not work. When we try to launch our application through our script (named start.bat), absolutely nothing happens. The command simply returns:

    C:\Application> chcp 65001
    Activated code page: 65001
    C:\Application> start.bat
    
    C:\Application>
    

    But without chcp 65001, there is no problem, and the application can be launched.

    Any hints about that?

  • KarolDepka
    KarolDepka over 14 years
    Seems like a step backwards to stick (and modify things) to ISO-8859-1 instead of UTF-8. But you probably had your reasons.
  • Hakanai
    Hakanai about 11 years
    PowerShell still uses the same console, so it is just as old and crap as cmd.exe.
  • Axel Fontaine
    Axel Fontaine about 10 years
    This must be used in conjunction with -Dfile.encoding=UTF-8 to work correctly.
  • Cj1m
    Cj1m over 9 years
    @AxelFontaine I tried using -Dfile.encoding=UTF-8, but when using the square root symbol, the last two digits after the symbol would repeat. E.g., instead of √125 the output would be √12525
  • brady
    brady almost 4 years
    It started supporting it with Windows 7.