Setting the default Java character encoding

741,977

Solution 1

Unfortunately, the file.encoding property has to be specified as the JVM starts up; by the time your main method is entered, the character encoding used by String.getBytes() and the default constructors of InputStreamReader and OutputStreamWriter has been permanently cached.

As Edward Grech points out, in a special case like this, the environment variable JAVA_TOOL_OPTIONS can be used to specify this property, but it's normally done like this:

java -Dfile.encoding=UTF-8 … com.x.Main

Charset.defaultCharset() will reflect changes to the file.encoding property, but most of the code in the core Java libraries that needs to determine the default character encoding does not use this mechanism.

When you are encoding or decoding, you can query the file.encoding property or Charset.defaultCharset() to find the current default encoding, and use the appropriate method or constructor overload to specify it.
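A minimal sketch of that last point, assuming Java 7 or later for try-with-resources (the class and method names here are made up for illustration). It prints both views of the default and then passes the chosen charset to the explicit overloads instead of relying on whatever was cached at startup:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper class, not part of the original answer.
class DefaultCharsetProbe {

    // Encodes text through the OutputStreamWriter overload that takes an explicit Charset.
    static byte[] encode(String text, Charset charset) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Two ways to query the default; they can disagree if the property was changed after startup.
        System.out.println("file.encoding property:   " + System.getProperty("file.encoding"));
        System.out.println("Charset.defaultCharset(): " + Charset.defaultCharset());

        // Pass the charset explicitly to both the writer and String.getBytes.
        Charset chosen = Charset.defaultCharset();
        byte[] viaWriter = encode("h\u00e9llo", chosen);
        byte[] viaString = "h\u00e9llo".getBytes(chosen);
        System.out.println("same bytes from both overloads: " + Arrays.equals(viaWriter, viaString));
    }
}
```

Swapping `chosen` for a fixed charset such as `Charset.forName("UTF-8")` makes the output independent of the JVM's settings.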

Solution 2

From the JVM™ Tool Interface documentation…

Since the command-line cannot always be accessed or modified, for example in embedded VMs or simply VMs launched deep within scripts, a JAVA_TOOL_OPTIONS variable is provided so that agents may be launched in these cases.

By setting the (Windows) environment variable JAVA_TOOL_OPTIONS to -Dfile.encoding=UTF8, the (Java) System property will be set automatically every time a JVM is started. You will know that the parameter has been picked up because the following message will be posted to System.err:

Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
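If the JVM in question is one you start from Java code yourself, the same variable can be placed in the child's environment. This is only a sketch under assumptions: the class and method names are invented, `-version` stands in for the real main class, and `inheritIO()` needs Java 7 or later.

```java
import java.io.File;

// Hypothetical launcher, not part of the original answer.
class ToolOptionsLauncher {

    // Builds (but does not start) a child JVM command whose environment carries JAVA_TOOL_OPTIONS.
    static ProcessBuilder childJvm(String... jvmArgs) {
        // Reuse the java binary of the currently running JVM.
        String javaBin = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        String[] command = new String[jvmArgs.length + 1];
        command[0] = javaBin;
        System.arraycopy(jvmArgs, 0, command, 1, jvmArgs.length);

        ProcessBuilder builder = new ProcessBuilder(command);
        builder.environment().put("JAVA_TOOL_OPTIONS", "-Dfile.encoding=UTF-8");
        builder.inheritIO(); // let the child's stdout and stderr show up in this console
        return builder;
    }

    public static void main(String[] args) throws Exception {
        // "-version" is a placeholder for the arguments of the real application.
        int exit = childJvm("-version").start().waitFor();
        System.out.println("child JVM exited with " + exit);
    }
}
```

The child's stderr should carry the "Picked up JAVA_TOOL_OPTIONS" line described above, which is a quick way to confirm the variable reached it.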

Solution 3

I have a hacky way that definitely works!!

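// Requires: import java.lang.reflect.Field; import java.nio.charset.Charset;
System.setProperty("file.encoding", "UTF-8");
// Clear Charset's private cached default so it is recomputed from file.encoding.
Field charset = Charset.class.getDeclaredField("defaultCharset");
charset.setAccessible(true);
charset.set(null, null);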

This tricks the JVM into thinking the charset has not been set, so it sets it again, to UTF-8, at runtime!

Solution 4

I think a better approach than setting the platform's default character set, especially as you seem to have restrictions on affecting the application deployment, let alone the platform, is to call the much safer String.getBytes("charsetName"). That way your application is not dependent on things beyond its control.

I personally feel that String.getBytes() should be deprecated, as it has caused serious problems in a number of cases I have seen, where the developer did not account for the default charset possibly changing.
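As a rough illustration (class and method names are made up), here is the explicit form next to what can go wrong when the charset differs. It assumes Java 7 or later for `StandardCharsets`; on an older JVM such as 1.5 the string-name overload `getBytes("UTF-8")` does the same job but throws the checked `UnsupportedEncodingException`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of the original answer.
class ExplicitGetBytes {

    // Always encodes as UTF-8, whatever the platform default happens to be.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Always decodes as UTF-8, mirroring toUtf8.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "na\u00efve"; // contains one non-ASCII character

        byte[] utf8 = toUtf8(text);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 length:      " + utf8.length);   // 6
        System.out.println("ISO-8859-1 length: " + latin1.length); // 5
        System.out.println("round trip ok:     " + text.equals(fromUtf8(utf8))); // true
    }
}
```

The differing lengths are exactly the kind of silent change a no-argument `getBytes()` picks up when the default charset differs between machines.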

Solution 5

I can't answer your original question but I would like to offer you some advice -- don't depend on the JVM's default encoding. It's always best to explicitly specify the desired encoding (e.g. "UTF-8") in your code. That way, you know it will work even across different systems and JVM configurations.
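For file I/O, a sketch of what that could look like: both the reader and the writer are given a charset, so nothing depends on file.encoding. The class name, method name, and temp files are assumptions for illustration, and the code uses Java 7 features (try-with-resources, `java.nio.file.Files`, `StandardCharsets`).

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical example class, not part of the original answer.
class ExplicitTranscode {

    // Copies text from source to target, decoding and encoding with explicitly named charsets.
    static void transcode(File source, Charset from, File target, Charset to) throws IOException {
        try (Reader in = new InputStreamReader(new FileInputStream(source), from);
             Writer out = new OutputStreamWriter(new FileOutputStream(target), to)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Temp files stand in for real input and output paths.
        File source = File.createTempFile("response", ".txt");
        File target = File.createTempFile("response-2", ".txt");
        source.deleteOnExit();
        target.deleteOnExit();

        Files.write(source.toPath(), "caf\u00e9".getBytes(StandardCharsets.UTF_8));
        transcode(source, StandardCharsets.UTF_8, target, StandardCharsets.ISO_8859_1);

        System.out.println("source bytes: " + source.length()); // 5
        System.out.println("target bytes: " + target.length()); // 4
    }
}
```

Reading through a `Reader` also avoids decoding a partially filled byte buffer, which a single `read(byte[])` call followed by `new String(...)` can end up doing.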

Author by

Admin

Updated on December 08, 2021

Comments

  • Admin
    Admin over 2 years

    How do I properly set the default character encoding used by the JVM (1.5.x) programmatically?

    I have read that -Dfile.encoding=whatever used to be the way to go for older JVMs. I don't have that luxury for reasons I won't get into.

    I have tried:

    System.setProperty("file.encoding", "UTF-8");
    

    And the property gets set, but it doesn't seem to cause the final getBytes call below to use UTF8:

    System.setProperty("file.encoding", "UTF-8");
    
    byte inbytes[] = new byte[1024];
    
    FileInputStream fis = new FileInputStream("response.txt");
    fis.read(inbytes);
    FileOutputStream fos = new FileOutputStream("response-2.txt");
    String in = new String(inbytes, "UTF8");
    fos.write(in.getBytes());
    
  • Alan Moore
    Alan Moore over 15 years
    DataInputStream and DataOutputStream are special-purpose classes that should never be used with plain text files. The modified UTF-8 they employ is not compatible with real UTF-8. Besides, if the OP could use your solution, he could also use the right tool for this job: an OutputStreamWriter.
  • Michael Borgwardt
    Michael Borgwardt about 15 years
    Except, of course, if you're writing a desktop app and processing some user-specified text that does not have any encoding metadata - then the platform default encoding is your best guess as to what the user might be using.
  • Stijn de Witt
    Stijn de Witt about 13 years
    For completeness I would like to add that with a bit of trickery you can get to the actually used default encoding (as is cached), thanks to Gary Cronin: byte [] byteArray = {'a'}; InputStream inputStream = new ByteArrayInputStream(byteArray); InputStreamReader reader = new InputStreamReader(inputStream); String defaultEncoding = reader.getEncoding(); lists.xcf.berkeley.edu/lists/advanced-java/1999-October/…
  • Raedwald
    Raedwald about 12 years
    @MichaelBorgwardt "then the platform default encoding is your best guess" you seem to be advising that wanting to change the default is not such a good idea. Do you mean, use an explicit encoding wherever possible, using the supplied default when nothing else is possible?
  • Michael Borgwardt
    Michael Borgwardt about 12 years
    @Raedwald: yes, that's what I meant. The platform default encoding is (at least on an end user machine) what users in the locale the system is set to are typically using. That is information you should use if you have no better (i.e. document-specific) information.
  • thatidiotguy
    thatidiotguy over 11 years
    Do you know that "Picked up..." statement would be printed in Tomcat logs?
  • Christophe Roussy
    Christophe Roussy over 11 years
    The client.encoding.override property seems to be WebSphere specific.
  • Smaug
    Smaug about 11 years
    Hi Edward Grech, thank you for your solution. It resolved my problem in another forum post. stackoverflow.com/questions/14814230/…
  • SparK
    SparK about 11 years
    NoSuchFieldException for me
  • Yonatan
    Yonatan over 10 years
    For the hack to work, you need to assume the security manager is off. If you don't have a way to set a JVM flag, you might (probably) have a security manager enabled system as well.
  • Aleksandr Dubinsky
    Aleksandr Dubinsky over 10 years
    @MichaelBorgwardt Nonsense. Use a library to auto-detect the input encoding, and save as Unicode with BOM. That is the only way to deal with and fight encoding hell.
  • Caspar
    Caspar over 9 years
    JDK-4163515 has some more info on setting the file.encoding sysprop after JVM startup.
  • WesternGun
    WesternGun over 8 years
    I think you two are not on the same page. Michael talks about decoding, while you, Raedwald, talk about processing after decoding.
  • DLight
    DLight about 8 years
    @Tiny Java understands both. stackoverflow.com/questions/6031877/…
  • Krish Nakum R
    Krish Nakum R over 7 years
    Though I haven't understood what it is, it works fine for me! Thanks. Hope it doesn't create any new issues in my app. Cheers!
  • Fish Biscuit
    Fish Biscuit about 7 years
    This worked for me, but the underlying issue was that the ssh connections used to spin up our jars had their LC_* variables set wrong (in the profile).
  • cabaji99
    cabaji99 over 6 years
    I was scratching my head because that command was not working on Windows, while on Linux and Mac it worked perfectly... then I put " around the value like this: java -D"file.encoding=UTF-8" -jar
  • dotwin
    dotwin over 6 years
    JDK9 does not approve of this hack anymore. WARNING: An illegal reflective access operation has occurred • WARNING: Illegal reflective access by [..] • WARNING: Please consider reporting this to the maintainers of [..] • WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations • WARNING: All illegal access operations will be denied in a future release
  • Michail Michailidis
    Michail Michailidis about 6 years
    check my answer in case of Java Spring Boot: stackoverflow.com/a/48952844/986160
  • sleske
    sleske about 6 years
    @Enerccio: That's not a good answer, that's a dirty hack, and a problem waiting to happen. That should only be used as an emergency measure.
  • Enerccio
    Enerccio about 6 years
    @sleske problem is that java should have a way to override this but alas they don't, so this is a good answer because it is the ONLY answer
  • sleske
    sleske about 6 years
    @Enerccio: It's arguable whether Java "should" have a way to set this - one could also argue that developers "should" explicitly specify encoding whenever it's relevant. At any rate, this solution has the potential to cause serious trouble in the longer run, hence the "for emergency use only" caveat. Actually, even emergency use is questionable, because there is a supported way of doing it, setting JAVA_TOOL_OPTIONS as explained in another answer.
  • Enerccio
    Enerccio about 6 years
    @sleske all other solutions can't change during the runtime... and if you have library that uses default encoding and you might not have sources and you must use that library, this is only working solution...
  • sleske
    sleske about 6 years
    @Enerccio: If this solution works, using JAVA_TOOL_OPTIONS should work too, and is actually a supported solution.
  • BullyWiiPlaza
    BullyWiiPlaza almost 6 years
    For me, just setting the system property helped fix my encoding problem where the IDE used UTF-8 and the JAR file the default system encoding which bugged my resource bundle strings.
  • Cheung
    Cheung almost 4 years
    I confirm this works on JRE 1.8 and Chinese Windows 10!