Java encoding for Japanese characters

Solution 1

Let's see what your code actually does:

// Encode fileName (held internally as UTF-16) into Shift_JIS.
// bytes now contains the binary Shift_JIS representation of your String.
final byte[] bytes = fileName.getBytes("Shift_JIS");

// Create a new String by decoding bytes as ISO8859_1.
// The bytes are Shift_JIS, so decoding them as ISO8859_1 yields garbage.
new String(bytes,"ISO8859_1");

Java strings use UTF-16 for their internal representation. You cannot specify a target encoding when you create a String, because UTF-16 is fixed. The charset argument names the source encoding of the bytes array, which here must be "Shift_JIS".

fileName is already a valid Java String, so fileNameX should come out correct if you skip the conversion and use fileName directly.
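As a rough illustration of the point above, the following sketch encodes a string to Shift_JIS and decodes it twice: once with the matching charset and once with ISO-8859-1. The class and method names are made up for this example, and it assumes the JDK in use ships the Shift_JIS charset (standard JDK builds do). The sample text is the Japanese part of the file name, written with Unicode escapes so the source file encoding does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question.
class ShiftJisRoundTrip {
    // Assumption: the running JDK provides the Shift_JIS charset.
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // Encode to Shift_JIS and decode with the same charset: the text survives.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(SHIFT_JIS);
        return new String(bytes, SHIFT_JIS);
    }

    // Encode to Shift_JIS but decode as ISO-8859-1: every byte becomes
    // one unrelated Latin-1 character, so the text is corrupted.
    static String misdecode(String text) {
        byte[] bytes = text.getBytes(SHIFT_JIS);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "\u6700\u7D42\u6761\u4EF6" is the kanji sequence from the file name.
        String fileName = "\u6700\u7D42\u6761\u4EF6.pdf";
        System.out.println(fileName.equals(roundTrip(fileName)));  // prints "true"
        System.out.println(fileName.equals(misdecode(fileName)));  // prints "false"
    }
}
```

The round trip is only needed when bytes actually leave the JVM (a file, a socket, an HTTP header). Inside the program, the original String can be used as it is.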

Solution 2

This is a mapping problem between Shift_JIS and Unicode. Shift_JIS does not contain every Unicode character, so characters that have no Shift_JIS equivalent become "?".

The following table shows the result of converting several dash-like characters from Unicode to Shift_JIS.

RESULT  UNICODE
[NG]    U+2012 (FIGURE DASH)
[NG]    U+2013 (EN DASH)
<OK>    U+2014 (EM DASH)
[NG]    U+2015 (HORIZONTAL BAR)
<OK>    U+2212 (MINUS SIGN)
[NG]    U+FF0D (FULLWIDTH HYPHEN-MINUS)

One solution is to replace the unmappable code points with mappable ones before converting:

U+2012,U+2013,U+2015 --> U+2014
U+FF0D               --> U+2212
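The replacement described above could be sketched as follows. This is a minimal example, not the answerer's own code: the class and method names are hypothetical, and the mapping is taken directly from the table in this answer. The canEncode helper assumes the JDK provides the Shift_JIS charset and can be used to check whether any unmappable characters remain.

```java
import java.nio.charset.Charset;

// Hypothetical helper class illustrating the replacement table above.
class DashNormalizer {
    // Assumption: the running JDK provides the Shift_JIS charset.
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // Replace dash-like characters listed as [NG] with their <OK> counterparts.
    static String normalizeDashes(String text) {
        return text
            .replace('\u2012', '\u2014')   // FIGURE DASH -> EM DASH
            .replace('\u2013', '\u2014')   // EN DASH -> EM DASH
            .replace('\u2015', '\u2014')   // HORIZONTAL BAR -> EM DASH
            .replace('\uFF0D', '\u2212');  // FULLWIDTH HYPHEN-MINUS -> MINUS SIGN
    }

    // True if every character of text has a Shift_JIS mapping.
    static boolean canEncode(String text) {
        return SHIFT_JIS.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // File name from the question, assuming the dash is U+FF0D.
        String fileName = "S\uFF0D\u6700\u7D42\u6761\u4EF6.pdf";
        String normalized = normalizeDashes(fileName);
        System.out.println(canEncode(fileName));
        System.out.println(canEncode(normalized));
    }
}
```

Calling normalizeDashes before getBytes keeps the "?" substitution from happening for these four characters. Other unmappable characters would still need their own handling.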
Author: Prasanna

Updated on August 23, 2022

Comments

  • Prasanna (over 1 year ago)

    I have a file name that contains Japanese characters: S-最終条件.pdf. In Java, the file name is also S-最終条件.pdf.

    // Support for Japanese file name
    fileNameX = new String(fileName.getBytes("Shift_JIS"),"ISO8859_1");
    

    The output fileNameX comes out as S?最終条件.pdf, which causes an error. I am trying to stream the file out in PDF format, but the Japanese character "-" is not recognised and an error is thrown while streaming.

    Please help me solve this issue.
    Thanks, Prasanna