Reading file with bad encoding. CP1252 vs UTF-8


Solution 1

Use the InputStreamReader(InputStream in, String charsetName) constructor and specify the charset explicitly instead of relying on the JVM default.

Reader reader = new InputStreamReader(new ByteArrayInputStream(byteArr), "UTF-8");
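A slightly fuller sketch of the same idea, in case it helps. The class and method names (ExplicitCharsetDemo, decodeUtf8) are made up for illustration, and it uses the StandardCharsets.UTF_8 constant rather than the "UTF-8" string, which avoids the checked UnsupportedEncodingException:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer.
class ExplicitCharsetDemo {

    // Reads the whole byte array as UTF-8, independent of the JVM default charset.
    static String decodeUtf8(byte[] byteArr) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(byteArr), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "Grüße" written with Unicode escapes so the source file's own encoding does not matter.
        byte[] byteArr = "Gr\u00fc\u00dfe".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeUtf8(byteArr).equals("Gr\u00fc\u00dfe"));
    }
}
```

If the text still comes out garbled with an explicit charset, the bytes were probably already damaged before they reached the reader (for example by an earlier read that used the default charset).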

Solution 2

I had exactly the same error and finally solved it by adding this to the JVM startup options:

-Dfile.encoding=UTF8
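
To check whether the option actually took effect, something like the following can be run with and without the flag. This is only a sketch; the class name DefaultCharsetCheck is made up for illustration. Note that on JDK 18 and later the default charset is already UTF-8 (JEP 400), so the flag mainly matters on older JVMs.

```java
import java.nio.charset.Charset;

// Hypothetical helper class for inspecting the JVM's encoding settings.
class DefaultCharsetCheck {

    // Reports the charset used by APIs that take no explicit charset argument,
    // together with the raw value of the file.encoding system property.
    static String describeDefaults() {
        return "defaultCharset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```

The property has to be passed on the command line at startup, e.g. java -Dfile.encoding=UTF-8 DefaultCharsetCheck; changing it with System.setProperty in running code is not a supported way to switch the default.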
Author: Evgeny Mironenko

I like writing code in Java, Kotlin, Spring and React: by day, by night, for fun.

Updated on June 04, 2022

Comments

  • Evgeny Mironenko, almost 2 years ago

    I have a byte array that I wrap in an InputStreamReader and then process.

    Reader reader = new InputStreamReader(new ByteArrayInputStream(byteArr));
    

    The JVM's default encoding is cp1252, but the file I convert to a byte array is encoded in UTF-8 and contains German umlauts. When I pass the byte array to InputStreamReader, Java decodes the umlauts into the wrong characters; for example, ü is rendered as Ã¼. I tried passing "UTF-8" and Charset.forName("UTF-8").newDecoder() to the InputStreamReader constructor, and I also tried re-encoding the strings read from the reader with new String(oldStr.getBytes("cp1252"), "UTF-8"), but neither helped. In the debugger, the reader variable contains a StreamDecoder whose "decoder" field has the value MS1252$Decoder. That may be the key to my problem, but I don't understand how to fix it.
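
The symptom described in this comment can be reproduced in isolation. The sketch below uses a made-up class name (MojibakeDemo) and assumes the JRE ships the windows-1252 charset, which mainstream JDKs do. It decodes the UTF-8 bytes of ü as cp1252 and then applies the getBytes("cp1252") round trip mentioned in the comment:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not taken from the original question.
class MojibakeDemo {

    // Assumption: this JRE provides windows-1252 (cp1252).
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Simulates the bug: UTF-8 bytes decoded with the wrong (cp1252) charset.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, CP1252);
    }

    // The round-trip workaround: re-encode as cp1252, then decode as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = misdecode("\u00fc");                  // ü
        System.out.println(garbled.equals("\u00c3\u00bc"));    // true: ü became Ã¼
        System.out.println(repair(garbled).equals("\u00fc"));  // true: round trip restores ü
    }
}
```

The round trip works for ü, but it is best treated as a diagnostic rather than a fix: it can lose data for characters whose UTF-8 bytes have no cp1252 mapping (byte 0x81, for instance), so decoding with the correct charset in the first place is the safer route. If MS1252$Decoder still shows up in the debugger, the reader being inspected was most likely created without an explicit charset.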