How to read a UTF-8 encoded file in Java with Turkish characters

11,676

Solution 1

You appear to be correctly decoding the file data from UTF-8 to UTF-16 strings.

System.out transcodes UTF-16 strings to the default JRE character encoding. If this does not match the encoding used by the device receiving the character data, the output is corrupted. The console must therefore be set to the same encoding as the JRE default. How this is done is device-dependent.
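As a rough sketch of the transcoding step described above (not the asker's code), the following wraps System.out in a PrintStream with an explicit charset. The class and method names (ConsoleEncodingDemo, encodeLikePrintStream) are made up for illustration, and the final println only helps if the receiving console really is set to UTF-8. The sample string is the line from turkish.txt, written with Unicode escapes so the encoding of the source file does not matter.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ConsoleEncodingDemo {

    // The line from turkish.txt, written as Unicode escapes.
    static final String SAMPLE =
            "Assal\u011F\u00E7\u011F\u0131\u0130\u0130\u00F6\u00F6\u015F\u015F";

    // Returns the bytes a PrintStream using the given charset would send to a device.
    static byte[] encodeLikePrintStream(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, charset.name())) {
            out.print(text);
        } catch (UnsupportedEncodingException e) {
            // Cannot happen for a Charset instance that already exists.
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The encoding System.out uses unless told otherwise.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Compare how many bytes each encoding produces for the same string.
        System.out.println("UTF-8 bytes: "
                + encodeLikePrintStream(SAMPLE, StandardCharsets.UTF_8).length);
        System.out.println("ISO-8859-1 bytes: "
                + encodeLikePrintStream(SAMPLE, StandardCharsets.ISO_8859_1).length);

        // Assumption: the console itself is configured for UTF-8.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(SAMPLE);
    }
}
```

Characters that the target charset cannot represent are replaced with ? by PrintStream. That fits the ? marks in the question's output, although which encodings were actually involved there is not stated in the source.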

If you are using a terminal, java.io.Console (obtained via System.console()) does a better job of determining the device encoding.
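A minimal sketch of that suggestion, assuming the program may also run without an attached terminal (for example inside an IDE or with redirected output), where System.console() can return null. The class and helper names are hypothetical.

```java
import java.io.Console;
import java.io.PrintWriter;

class ConsoleWriterDemo {

    // Prefer the Console's writer; fall back to System.out when no console is attached.
    static PrintWriter consoleOrStdout() {
        Console console = System.console();
        if (console != null) {
            return console.writer();
        }
        return new PrintWriter(System.out, true);
    }

    public static void main(String[] args) {
        PrintWriter out = consoleOrStdout();
        // The line from turkish.txt, written as Unicode escapes.
        out.println("Assal\u011F\u00E7\u011F\u0131\u0130\u0130\u00F6\u00F6\u015F\u015F");
        out.flush();
    }
}
```

The fallback branch still goes through the default encoding, so it does not fix a mismatched console; it only keeps the program from failing when no terminal is present.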

Note: it is better to use try-with-resources, or at least try-finally, to close streams, and to use the standard encoding constants (such as StandardCharsets.UTF_8) where available.
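The note above could look roughly like this. It is a sketch, not the asker's code: the class name Utf8ResourceReader and the helper readUtf8 are hypothetical, and IOException is rethrown unchecked only to keep the example short. The question's method also returns str after the loop, at which point it is always null, so this version collects the lines and returns them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

class Utf8ResourceReader {

    // Decodes the stream as UTF-8 and returns its lines joined with '\n'.
    // try-with-resources closes the reader (and the wrapped stream) on every path.
    static String readUtf8(InputStream in) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same classpath resource as in the question.
        InputStream in = Utf8ResourceReader.class.getResourceAsStream("/turkish.txt");
        if (in == null) {
            System.err.println("turkish.txt not found on the classpath");
            return;
        }
        System.out.println(readUtf8(in));
    }
}
```

Decoding this way yields correct UTF-16 strings; whether they display correctly still depends on the console encoding.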

Solution 2

Make sure the console you use to display the output is also set to UTF-8. In Eclipse, for example, go to Run Configurations > Common and set the encoding there.


Author by

Juned Ahsan


Updated on June 15, 2022

Comments

  • Juned Ahsan over 1 year

    I am trying to read a UTF-8 encoded text file that contains some Turkish characters. I have written an Axis-based web service that reads this file and sends the output back as a string. Somehow I am not able to read the characters properly. The code is very simple:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CodingErrorAction;
    
    public class TurkishWebService {
    
        public String generateTurkishString() throws IOException {
            InputStream isr = this.getClass().getResourceAsStream(
                    "/" + "turkish.txt");
    
            BufferedReader in = new BufferedReader(new InputStreamReader(isr,
                    "UTF8"));
            String str;
    
            while ((str = in.readLine()) != null) {
                System.out.println(str);
            }
    
            in.close();
            return str;
        }
    
        public String normalString() {
            System.out.println("webService normal text");
            return "webService normal text";
        }
    
        public static void main(String args[]) throws IOException {
            new TurkishWebService().generateTurkishString();
        }
    }
    

    Here are the contents of turkish.txt, which is just one line:

    Assalğçğıİİööşş
    

    The output on stdout is:

    Assal?τ????÷÷??
    

    Please suggest what I am doing wrong here.