encoding problem in servlet


Solution 1

Ensure that the page containing the form is itself encoded as UTF-8 and that the browser is instructed to read it as UTF-8. Assuming it's a JSP, put this at the very top of the page to achieve that:

<%@ page pageEncoding="UTF-8" %>

Then, to process the GET query string as UTF-8, ensure that the servlet container in question is configured to do so. It's unclear which one you're using, so here's a Tomcat example: set the URIEncoding attribute of the <Connector> element in /conf/server.xml to UTF-8.

<Connector URIEncoding="UTF-8">
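To see why the decoding charset matters, here is a minimal sketch using only the JDK (Java 10 or later). It is not Tomcat's actual code; the class name QueryDecodingDemo is made up for illustration, and the ISO-8859-1 fallback is an assumption based on the default of older Tomcat versions.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Tomcat or the Servlet API.
final class QueryDecodingDemo {

    /** Decodes one percent-encoded query value using the given charset. */
    static String decode(String rawValue, Charset charset) {
        return URLDecoder.decode(rawValue, charset);
    }

    /** What the container yields when it is configured for UTF-8. */
    static String decodeAsUtf8(String rawValue) {
        return decode(rawValue, StandardCharsets.UTF_8);
    }

    /** What the container yields when it falls back to ISO-8859-1 (assumed default). */
    static String decodeAsLatin1(String rawValue) {
        return decode(rawValue, StandardCharsets.ISO_8859_1);
    }
}
```

Calling decodeAsUtf8("%E4%B8%AD%E6%96%87") gives the two intended characters, while decodeAsLatin1 on the same input gives six unrelated characters, one per byte, which is the kind of garbage the servlet sees.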

If you'd like to use POST, you need to ensure that the HttpServletRequest is instructed to parse the POST request body using UTF-8.

request.setCharacterEncoding("UTF-8");

Call this before you access the first parameter. A Filter is the best place for this.
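The ordering matters because the request body is parsed once, on first parameter access, and setting the encoding afterwards has no effect. The sketch below imitates that behaviour with a made-up class, LazyFormRequest; it is not the real HttpServletRequest, and the ISO-8859-1 default is an assumption matching the Servlet specification's traditional fallback.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a request object; for illustration only.
final class LazyFormRequest {
    private final String rawBody;                         // e.g. "q=%E4%B8%AD%E6%96%87&type=test"
    private Charset encoding = StandardCharsets.ISO_8859_1; // assumed container default
    private Map<String, String> parsed;                   // filled on first parameter access

    LazyFormRequest(String rawBody) {
        this.rawBody = rawBody;
    }

    /** Takes effect only if the body has not been parsed yet. */
    void setCharacterEncoding(String charsetName) {
        if (parsed == null) {
            encoding = Charset.forName(charsetName);
        }
    }

    /** Parses the whole body on first use, then serves values from the cache. */
    String getParameter(String name) {
        if (parsed == null) {
            parsed = new HashMap<>();
            for (String pair : rawBody.split("&")) {
                int eq = pair.indexOf('=');
                if (eq < 0) {
                    continue; // this sketch skips pairs without '='
                }
                String key = URLDecoder.decode(pair.substring(0, eq), encoding);
                String value = URLDecoder.decode(pair.substring(eq + 1), encoding);
                parsed.put(key, value);
            }
        }
        return parsed.get(name);
    }
}
```

If setCharacterEncoding("UTF-8") is called first, getParameter("q") returns the intended text; if any parameter is read first, the later call is ignored and q stays garbled. Doing it in a Filter guarantees it runs before any servlet code touches the parameters.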


Solution 2

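Using non-ASCII characters as GET parameters (i.e. in URLs) is generally problematic. RFC 3986 recommends encoding them as UTF-8 and then percent-encoding the bytes, but AFAIK that's not an official standard. And what you are using in the case where it works isn't UTF-8: %D6%D0%CE%C4 looks like the GBK/GB2312 bytes for 中文, whereas the UTF-8 form would be %E4%B8%AD%E6%96%87, as in the Google URL.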
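The difference between the two percent-encoded forms can be reproduced with the JDK's URLEncoder (Java 10 or later). This is a rough sketch: the class name QueryEncodingDemo is invented for illustration, and the GBK charset is assumed to be available, which holds for typical full JDK installs but not necessarily for trimmed-down runtimes.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class QueryEncodingDemo {

    /** Percent-encodes a query value as UTF-8, the form RFC 3986 recommends. */
    static String encodeUtf8(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /**
     * Percent-encodes a query value with a named charset, e.g. "GBK".
     * Throws an unchecked exception if the runtime lacks that charset.
     */
    static String encodeWith(String value, String charsetName) {
        return URLEncoder.encode(value, Charset.forName(charsetName));
    }
}
```

For the two characters U+4E2D U+6587 (中文), encodeUtf8 yields %E4%B8%AD%E6%96%87, while encodeWith(value, "GBK") yields %D6%D0%CE%C4. A server decoding with one charset will not recover text that the browser encoded with the other.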

It would probably be safest to switch to POST requests.

Author by

hguser

Updated on June 04, 2022

Comments

  • hguser
    hguser almost 2 years

    I have a servlet that receives some parameters from the client and then does some work. The parameters from the client are Chinese, so I often get invalid characters in the servlet. For example, if I enter

    http://localhost:8080/Servlet?q=中文&type=test
    

    Then in the servlet, the 'type' parameter is correct (test), but the 'q' parameter is not decoded correctly: it becomes invalid characters that cannot be parsed.

    However, if I enter it in the address bar again, the URL changes to:

    http://localhost:8080/Servlet?q=%D6%D0%CE%C4&type=test
    

    Now my servlet gets the correct value for 'q'.

    What is the problem?

    UPDATE

    BTW, it works well when I send the form with POST. When I send the request via Ajax, for example:

    url = "http://..q='中文'";
    xmlhttp.open("POST",url,true); 
    

    Then the server side also gets invalid characters.

    It seems that the server side only gets the right result when the Chinese characters are percent-encoded as %xx.

    That is to say, http://.../q=中文 does not work, while http://.../q=%D6%D0%CE%C4 works.

    But why does "http://www.google.com.hk/search?hl=zh-CN&newwindow=1&safe=strict&q=%E4%B8%AD%E6%96%87&btnG=Google+%E6%90%9C%E7%B4%A2&aq=f&aqi=&aql=&oq=&gs_rfai=" work?