Handle French Characters in Java
Solution 1
This is an encoding problem, and the Ã clearly identifies it as UTF-8 text interpreted as ISO-Latin-1 (or one of its cousins).
Ensure that your JSP page declares UTF-8 encoding at the top.
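As a rough illustration of why the Ã appears, the sketch below encodes a string as UTF-8 and then decodes those bytes as ISO-8859-1. The class and method names are made up for this example, and the string literals use Unicode escapes so the result does not depend on how the source file itself is saved.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: decode the UTF-8 bytes of a string as ISO-8859-1,
    // which is the mistake described in this answer.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "Farmac\u00e9utica"; // \u00e9 is e with acute accent
        String garbled = misdecode(original);

        // The single accented character becomes two characters:
        // U+00C3 (A with tilde) followed by U+00A9 (copyright sign).
        System.out.println(original.length() + " -> " + garbled.length()); // 12 -> 13
        System.out.println(garbled.equals("Farmac\u00c3\u00a9utica"));     // true
    }
}
```

The two-byte UTF-8 sequence C3 A9 is read as two separate one-byte characters, which is why the garbled text always starts with Ã.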
Solution 2
You get "ABC FarmacÃ©utica Corporation" because the string you receive from the client was decoded as ISO-8859-1; you need to convert it back to UTF-8 before you URL-decode it, like this:
bbb = URLDecoder.decode(new String(bbb.getBytes("ISO-8859-1"), "UTF-8"), "UTF-8");
NOTE: converting between encodings can lose data. For example, Thai characters (TIS-620) cannot be represented in ISO-8859-1, and a string that was decoded with the wrong charset cannot always be recovered. For this reason, avoid converting from one encoding to another unless it is strictly necessary (i.e. the data comes from an external, third-party, or proprietary source). This is only a way to convert from one encoding to another when you know the source encoding.
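The following sketch shows both sides of that note, under the assumption that the garbling really was "UTF-8 bytes decoded as ISO-8859-1". The class and helper names are invented for the example. The repair step works here because ISO-8859-1 maps every byte value to a character, so the original bytes can be recovered; pushing Thai text through ISO-8859-1, however, replaces each character with a question mark.

```java
import java.nio.charset.StandardCharsets;

public class RecodeDemo {
    // Hypothetical helper: undo "UTF-8 bytes decoded as ISO-8859-1"
    // by getting the original bytes back and decoding them as UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: round-trip a string through ISO-8859-1.
    static String throughLatin1(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Recoverable case: A-tilde + copyright sign turns back into e-acute.
        String repaired = repair("Farmac\u00c3\u00a9utica");
        System.out.println(repaired.equals("Farmac\u00e9utica")); // true

        // Lossy case: three Thai letters have no ISO-8859-1 representation,
        // so getBytes substitutes '?' for each of them.
        String thai = "\u0e44\u0e17\u0e22";
        System.out.println(throughLatin1(thai)); // ???
    }
}
```

Once the question marks have been substituted, no later conversion can bring the Thai letters back, which is the reason for fixing the encoding where the bytes enter the program.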
Solution 3
I suspect the problem is with character encoding on the page. Make sure the page you submit from and the page you display to use the same character set, and make sure that you set it explicitly. For instance, if your server runs on Linux the default encoding will typically be UTF-8, but if you view the page on Windows it will be assumed (if no encoding is specified) to be ISO-8859-1. Also, when you receive the submitted text on the server side, the server will assume the default character set when building the string, whereas your user might have used a different encoding if you didn't specify one.
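To make the server-side half of this concrete, here is a small sketch that decodes the same URL-encoded form value with two different charsets. It uses only java.net.URLDecoder; the class and helper names are made up, and the submitted value is assumed to have been encoded by the browser as UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormCharsetDemo {
    // Hypothetical helper: decode a form value using the named charset.
    static String decodeAs(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // What a browser sends for the checkbox value when the page is UTF-8.
        String submitted = "ABC+Farmac%C3%A9utica+Corporation";

        // Matching charset: the accented character is restored.
        String good = decodeAs(submitted, "UTF-8");
        System.out.println(good.equals("ABC Farmac\u00e9utica Corporation")); // true

        // Mismatched charset: the two bytes become two separate characters.
        String bad = decodeAs(submitted, "ISO-8859-1");
        System.out.println(bad.equals("ABC Farmac\u00c3\u00a9utica Corporation")); // true
    }
}
```

The bytes on the wire are identical in both calls; only the charset assumed by the receiver differs, which is why setting it explicitly on both pages matters.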
Solution 4
As I understand it, the text is hardcoded in controller code like this:
ModelAndView mav = new ModelAndView("hello");
mav.addObject("message", "ABC Farmacéutica Corporation");
return mav;
I expect this would work:
ModelAndView mav = new ModelAndView("hello");
mav.addObject("message", "ABC Farmac\u00e9utica Corporation");
return mav;
If so, the problem is due to a mismatch between the character encoding your Java editor is using and the encoding your compiler uses to read the source code.
For example, if your editor saves the Java file as UTF-8 and you compile on a system where UTF-8 is not the default encoding, then you would need to tell your compiler to use that encoding:
javac -cp foo.jar -encoding UTF-8 Bar.java
Your build scripts and IDE settings need to be consistent when handling character data.
If your text editor saved your file as UTF-8 then, in a hex editor, é would be the byte sequence C3 A9; in many other encodings, it would have the value E9. ISO-8859-1 and windows-1252 would encode é as E9. You can read about character encoding in Java source files here.
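If no hex editor is at hand, the byte values can be checked from Java itself. This is a minimal sketch with an invented class and helper name; it writes é as the escape \u00e9 so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EAcuteBytes {
    // Hypothetical helper: encode a string and format the bytes as hex pairs.
    static String hex(String s, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9";
        System.out.println(hex(eAcute, StandardCharsets.UTF_8));          // C3 A9
        System.out.println(hex(eAcute, StandardCharsets.ISO_8859_1));     // E9
        System.out.println(hex(eAcute, Charset.forName("windows-1252"))); // E9
    }
}
```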
Comments
-
Max almost 2 years
I have a page where I search for a term, and the results display perfectly, whatever characters they contain.
Now I have a few checkboxes in a JSP, and I check them and submit. One of these checkboxes has a name like ABC Farmacéutica Corporation. When I click the submit button, I call a function that sets all the parameters on a form and submits that form. (I tested with an alert showing the special character before submit, and it displays correctly.)
On the Java end, I use the Spring framework. When I print the term in the controller, it is displayed like ABC FarmacÃ©utica Corporation.
Please help. Thanks in advance.
EDIT :
Please try this sample Example
import java.net.*;

class sample {
    public static void main(String[] args) {
        try {
            String aaa = "ABC Farmacéutica Corporation";
            String bbb = "ABC Farmacéutica Corporation";
            aaa = URLEncoder.encode(aaa, "UTF-8");
            bbb = URLDecoder.decode(bbb, "UTF-8");
            System.out.println("aaa " + aaa);
            System.out.println("bbb " + bbb);
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}
I am getting this output:
aaa PiSA+Farmac%C3%A9utica+Mexicana+Corporativo
bbb PiSA Farmacéutica Mexicana Corporativo
Try to print the string aaa as it is.
-
Max almost 13 years
I have to pass that term to web services. Initially, when I get the complete data, all the terms are displayed correctly. The only problem is that when I send it to the web service, I am not able to send the same term to it.
-
Max almost 13 years
Yes, I am using UTF-8 for the JSP. Still the problem persists.
-
matbrgz almost 13 years
Then follow this particular text snippet from where it is generated to where it ends up in the output stream. It might also be because you have written a property file in UTF-8 and then read it under Windows.
-
Liv almost 13 years
Are you setting the correct encoding when dealing with your web services?
-
Max almost 13 years
I am not encoding on the Java end, because I tried using URLEncoder.encode(term, "UTF-8"); then if I print it in the logger, it displays as ABC+Farmac%C3%A9utica+Corporation. This is not recognized by the web service.
-
Liv almost 13 years
It's not about URL-encoding the data. If you are using a web service (SOAP, I guess?), is the encoding of the data sent (posted) and received set correctly when you pass the data?
-
Max almost 13 years
I have no idea about that, because I put all the fields in one object and pass that object to a web service link.
-
Liv almost 13 years
What do you use for the web services call? Axis?
-
Max almost 13 years
I use spring annotation @RequestWrapper and set localName, targetNamespace and className.
-
matbrgz almost 13 years
Where does the "ABC Farmacéutica Corporation" string come from? Where is it physically defined?
-
Max almost 13 years
It's part of my table data only. Among the multiple terms, I got this one with special character data.
-
matbrgz almost 13 years
In which physical file are the characters "ABC Farmacéutica Corporation" found? The JSP page? A property file? Java code?
-
Max almost 13 years
I can say that I see it first in the JSP. Well, I am confused about what you are expecting. From Page1 I do a search, then in Page2 I display these terms in a table with checkboxes. Now I click on the checkboxes and submit to Page3. The problem occurs while submitting, because one of the checkbox terms has this special character.
-
matbrgz almost 13 years
Somewhere, the actual characters that make up the string "ABC Farmacéutica Corporation" are typed by you or somebody else into a file. If you needed to change it into "Carperation", where would you edit?
-
matbrgz almost 13 years
So this is defined as a String inside a SomeClass.java file?
-
matbrgz almost 13 years
In that case replace "é" with "\u00E9" in your source and try again.
-
Max almost 13 years
So, which encoding is \u00e9 part of? Then I will try sending the term from the JSP by converting characters like é to \u00e9.
-
Max almost 13 years
Hi, this makes sense to me. But I see you are replacing "é" with "\u00E9". Which encoding is that part of? I want to handle all such characters dynamically.
-
matbrgz almost 13 years
No, in the physical file where you defined "ABC etc" as a string constant, you change that physical constant to have "\u00E9" instead of "é".
-
McDowell almost 13 years
@Max - \u00e9 is a (UTF-16) Unicode escape sequence. I have an app here that will display the escapes for any graphemes you enter.
-
Max almost 13 years
Getting this error: org.apache.cxf.binding.soap.SoapFault: Error performing Ms FAST Search.EX class: com.fastsearch.esp.search.SearchEngineException. EX Cause: null. EX Message : parsefql: Query Error: line 1:92: unexpected char: 'u'.
-
matbrgz almost 13 years
You have done this incorrectly. The character sequence should be expanded when read from the Java source file. I would suggest that you create a minimal but fully functional example showing just this behaviour, and open a new question.
-
Max almost 13 years
Hey, thanks Andersen. I just found the error through this discussion. Until now, my web service team has not been accepting the encoded terms.
-
Paŭlo Ebermann almost 13 years
No, one doesn't want to recode existing strings (since there are cases where you simply get a ? instead). Better make sure the string does not arrive in the wrong encoding.
-
Yanick Rochon almost 13 years
@Paŭlo, ok.... why the downvote? I already mentioned in the question's comments that all files should be encoded in UTF-8. However, seeing that no one could provide a suitable solution for the OP, I'm suggesting this, which is valid Java to convert a string into a different encoding. The string displayed in his controller is clearly an ISO-8859-1 encoded string output in a UTF-8 environment. I'm not arguing for the use of any encoding (I never use ISO-8859-1), I'm simply suggesting a solution that might work.
-
Paŭlo Ebermann almost 13 years
(It's the other way around: a UTF-8-encoded string decoded as ISO-8859-1.) The conversion should start at a lower point, where the data enters the program (in byte[] form). If you have a wrongly decoded String, it is most often too late; encoding and decoding the string again does help in many cases, but not in all, since these encodings do not have the same range of valid bytes. (If you edit your post to say something like this as a disclaimer, I will remove my downvote. Right now I simply can't, until your post is edited again.)
-
Yanick Rochon almost 13 years
Yes, a UTF-8 string displayed as an ISO-8859-1 encoded string. In any case, disclaimer added.
-
matbrgz almost 13 years
I do not understand your explanation. What was the problem?