Reading file with bad encoding. CP1252 vs UTF-8
Solution 1
Use the InputStreamReader(InputStream in, String charsetName)
constructor and specify the charset explicitly instead of relying on the JVM default.
Reader reader = new InputStreamReader(new ByteArrayInputStream(byteArr), "UTF-8");
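As a minimal sketch of the approach above, the following reads a byte array through an InputStreamReader with an explicit UTF-8 charset. The class name ExplicitCharsetDemo and the helper readUtf8 are hypothetical names chosen for illustration; StandardCharsets.UTF_8 is used instead of the "UTF-8" string so that no UnsupportedEncodingException has to be handled.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {

    // Hypothetical helper: decodes the whole byte array as UTF-8,
    // regardless of the JVM default charset.
    static String readUtf8(byte[] byteArr) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(byteArr), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "f\u00FCr" is "für"; in UTF-8 the umlaut occupies two bytes (0xC3 0xBC).
        byte[] byteArr = "f\u00FCr".getBytes(StandardCharsets.UTF_8);
        String text = readUtf8(byteArr);
        System.out.println(text.length()); // prints 3: the two bytes became one char
    }
}
```

If the decoded text still contains wrong symbols with an explicit charset, the bytes were most likely already damaged before they reached the reader.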
Solution 2
I had exactly the same error and finally solved it by adding this to the JVM startup options:
-Dfile.encoding=UTF8
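To check whether the startup option took effect, one possibility is to print the charset the JVM actually uses as its default. This is only a sketch; DefaultCharsetCheck and describe are hypothetical names. Note that the result depends on the JDK version and platform (since JDK 18 the default is UTF-8 unless overridden).

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {

    // Hypothetical helper: reports the default charset that readers and
    // writers fall back to when no charset is given explicitly.
    static String describe() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        // Run once with and once without -Dfile.encoding=UTF8 to compare.
        System.out.println(describe());
    }
}
```

Setting the charset in code, as in Solution 1, avoids depending on how the JVM was started.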
Evgeny Mironenko
Updated on June 04, 2022
Comments
Evgeny Mironenko almost 2 years
I have a byte array, which I pass to an InputStreamReader and then do some manipulations with it.
Reader reader = new InputStreamReader(new ByteArrayInputStream(byteArr));
The JVM default encoding is cp1252, but the file that I convert to a byte array is encoded in UTF-8 and contains German umlauts. When I pass the byte array to InputStreamReader, Java decodes the umlauts to the wrong symbols. For example, ü is shown as Ã¼. I tried passing "UTF-8" and Charset.forName("UTF-8").newDecoder() to the InputStreamReader constructor, and I tried re-encoding the strings read from the reader via new String(oldStr.getBytes("cp1252"), "UTF-8"), but neither helped. In the debugger, the reader variable has a StreamDecoder field whose "decoder" is MS1252$Decoder. Maybe that is the key to my problem, but I don't understand how to fix it.
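The symptom described in the question can be reproduced in isolation. The sketch below takes the UTF-8 bytes of ü and decodes them as windows-1252, then reverses the step with the getBytes round trip. MojibakeDemo, misdecode and repair are hypothetical names, and the code assumes the JVM provides the windows-1252 charset, which common JDKs do but the Java specification does not guarantee.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    // Assumption: the windows-1252 (cp1252) charset is available in this JVM.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Simulates the bug: UTF-8 bytes decoded with the wrong charset.
    static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // Reverses the wrong decoding by re-encoding as cp1252 and decoding as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u00FC" is ü; its UTF-8 bytes 0xC3 0xBC read as cp1252 give "\u00C3\u00BC" (Ã¼).
        String garbled = misdecode("\u00FC");
        System.out.println(garbled.equals("\u00C3\u00BC"));   // prints true
        System.out.println(repair(garbled).equals("\u00FC")); // prints true
    }
}
```

The round trip works for this character, but it can lose data for byte values that cp1252 does not define, so decoding with the correct charset in the first place is the more reliable fix.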