Why does POST not honor charset, but an AJAX request does? tomcat 6


Solution 1

form post (outputs chars in iso)

<form id="leadform" enctype="application/x-www-form-urlencoded; charset=utf-8" method="post" accept-charset="utf-8" action="{//app/path}">

You don't need to specify the charset there. The browser will use the charset specified in the Content-Type header of the HTTP response that delivered the page.

Just

<form id="leadform" method="post" action="{//app/path}">

is enough.


xml declaration:

<?xml version="1.0" encoding="utf-8"?>

Irrelevant. It's only relevant for XML parsers. Web browsers don't parse text/html as XML. This is only relevant on the server side (if you're using an XML-based view technology like Facelets or JSPX; on plain JSP it is superfluous).


Doctype:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Irrelevant. It's only relevant for HTML parsers. Besides, it doesn't specify any charset. Instead, the one in the HTTP response header will be used. If you aren't using an XML-based view technology like Facelets or JSPX, this can just as well be <!DOCTYPE html>.


meta tag:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

Irrelevant. It's only relevant when the HTML page is being viewed from local disk or is to be parsed locally. Otherwise, the one in the HTTP response header will be used.


jvm parameters:

-Dfile.encoding=UTF-8

Irrelevant. It only tells the Sun/Oracle(!) JVM which encoding to use when reading source files; it has no effect on how Tomcat decodes request parameters.


I have also tried using request.setCharacterEncoding("UTF-8"); but it seems as if tomcat simply ignores it. I am not using the RequestDumper valve.

This will only work when the request body has not been parsed yet (i.e. you haven't called getParameter() or a similar method beforehand). You need to call it as early as possible, otherwise it will be ignored. A Filter is a perfect place for this.
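To see why the timing matters, here is a toy model of a request that parses its body lazily and caches the result. It is only a sketch under that assumption: `LazyRequestModel` and its fields are hypothetical names, not Tomcat's actual classes, and real containers do considerably more.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

// Toy model of a lazily parsed request; NOT Tomcat's real implementation.
public class LazyRequestModel {
    private final String rawBody;            // e.g. "name=caf%C3%A9"
    private String encoding = "ISO-8859-1";  // default when nothing is declared
    private Map<String, String> params;      // stays null until the first getParameter()

    public LazyRequestModel(String rawBody) {
        this.rawBody = rawBody;
    }

    // Takes effect only while the body is still unparsed.
    public void setCharacterEncoding(String enc) {
        if (params == null) {
            encoding = enc;
        }
    }

    // The first call parses the whole body with the encoding known at that moment.
    public String getParameter(String name) {
        if (params == null) {
            params = parse();
        }
        return params.get(name);
    }

    private Map<String, String> parse() {
        Map<String, String> result = new HashMap<String, String>();
        try {
            for (String pair : rawBody.split("&")) {
                String[] kv = pair.split("=", 2);
                String value = kv.length > 1 ? kv[1] : "";
                result.put(URLDecoder.decode(kv[0], encoding),
                           URLDecoder.decode(value, encoding));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + encoding, e);
        }
        return result;
    }

    public static void main(String[] args) {
        LazyRequestModel early = new LazyRequestModel("name=caf%C3%A9");
        early.setCharacterEncoding("UTF-8");   // set before any parameter is read
        System.out.println(early.getParameter("name").length()); // 4

        LazyRequestModel late = new LazyRequestModel("name=caf%C3%A9");
        late.getParameter("name");             // parses the body as ISO-8859-1
        late.setCharacterEncoding("UTF-8");    // too late, has no effect
        System.out.println(late.getParameter("name").length()); // 5
    }
}
```

In the second case the two UTF-8 bytes of "é" were already decoded as two separate ISO-8859-1 characters, which is the garbling described in the question.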


From what I've read, POST data encoding is mostly dependent on the page encoding where the form is. As far as I can tell, my page is correctly encoded in utf-8.

It depends on the charset in the Content-Type header of the HTTP response that delivered the page containing the form.

All you need to do is the following three things:

  1. Add the following to the top of your JSP:

    <%@page pageEncoding="UTF-8" %>
    

    This makes the JSP write its response as UTF-8 and adds charset=UTF-8 to the Content-Type response header.

  2. Create a Filter which does the following in its doFilter() method:

    if (request.getCharacterEncoding() == null) {
        request.setCharacterEncoding("UTF-8");
    }
    chain.doFilter(request, response);
    

    This makes Tomcat decode the POST request body as UTF-8.

  3. Change the <Connector> entry in Tomcat/conf/server.xml as follows:

    <Connector (...) URIEncoding="UTF-8" />
    

    This makes Tomcat decode GET query strings as UTF-8.
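The null check in step 2 relies on getCharacterEncoding() returning null when the request's Content-Type header carries no charset parameter. The following container-free sketch illustrates that rule using the two header values from the question. `ContentTypeCharset`, `charsetOf` and `effectiveEncoding` are hypothetical helpers written for this illustration, and the parsing is deliberately simplified; it is not how Tomcat itself reads the header.

```java
import java.util.Locale;

public class ContentTypeCharset {

    // Returns the charset parameter of a Content-Type value, or null if none is declared.
    // Simplified on purpose: it does not cover every quoting rule of the header syntax.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {   // parts[0] is the media type itself
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1); // strip optional quotes
                }
                return value.length() == 0 ? null : value;
            }
        }
        return null;
    }

    // Same rule as the filter: keep a declared charset, otherwise use the fallback.
    static String effectiveEncoding(String contentType, String fallback) {
        String declared = charsetOf(contentType);
        return declared != null ? declared : fallback;
    }

    public static void main(String[] args) {
        String ajax = "application/x-www-form-urlencoded; charset=utf-8";
        String form = "application/x-www-form-urlencoded";
        System.out.println(charsetOf(ajax));                  // utf-8
        System.out.println(charsetOf(form));                  // null
        System.out.println(effectiveEncoding(form, "UTF-8")); // UTF-8
    }
}
```

This is why the AJAX request was decoded correctly without any filter while the plain form post was not: only the former declared its charset.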


Solution 2

Try this:

How do I change how POST parameters are interpreted? 

POST requests should specify the encoding of the parameters and values they send. Since many clients fail to set an explicit encoding, the default is used (ISO-8859-1). In many cases this is not the preferred interpretation so one can employ a javax.servlet.Filter to set request encodings. Writing such a filter is trivial. Furthermore Tomcat already comes with such an example filter.
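The effect of that ISO-8859-1 default can be reproduced with the JDK alone. The sketch below percent-encodes a value the way a UTF-8 page would submit it, then decodes it with both charsets. `PostDecodingDemo` and its helper methods are illustrative names, not part of Tomcat, and the sample value is an assumption chosen for the demonstration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class PostDecodingDemo {

    // Percent-encodes a form value using the given charset.
    static String encode(String text, String charset) {
        try {
            return URLEncoder.encode(text, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    // Decodes a percent-encoded form value using the given charset.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";            // "café", written with an escape
        String body = encode(original, "UTF-8");  // what a UTF-8 page submits
        System.out.println(body);                 // caf%C3%A9

        // Decoding with the charset the browser used restores the text.
        System.out.println(decode(body, "UTF-8").equals(original)); // true

        // Decoding with ISO-8859-1 turns the two bytes C3 A9 into two characters.
        System.out.println(decode(body, "ISO-8859-1").equals("caf\u00c3\u00a9")); // true
    }
}
```

The bytes on the wire are identical in both cases; only the charset chosen on the server side differs, which is what the filter corrects.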

Please take a look at:

5.x

webapps/servlets-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java

webapps/jsp-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java

6.x

webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java

For more information, see http://wiki.apache.org/tomcat/FAQ/CharacterEncoding

Author by

Chris

Updated on June 07, 2022

Comments

  • Chris
    Chris almost 2 years

    I have a tomcat based application that needs to submit a form capable of handling utf-8 characters. When submitted via ajax, the data is returned correctly from getParameter() in utf-8. When submitting via form post, the data is returned from getParameter() in iso-8859-1.

    Using Fiddler, I have determined that the only difference between the requests is that charset=utf-8 is appended to the end of the Content-Type header in the ajax call (as expected, since I send the content type explicitly).

    ContentType from ajax: "application/x-www-form-urlencoded; charset=utf-8"

    ContentType from form: "application/x-www-form-urlencoded"

    I have the following settings:

    ajax post (outputs chars correctly):

    $.ajax( {
      type : "POST",
      url : "blah",
      async : false,
      contentType: "application/x-www-form-urlencoded; charset=utf-8",
      data  : data,
      success : function(data) { 
      }
     });
    

    form post (outputs chars in iso)

     <form id="leadform" enctype="application/x-www-form-urlencoded; charset=utf-8" method="post" accept-charset="utf-8" action="{//app/path}">
    

    xml declaration:

    <?xml version="1.0" encoding="utf-8"?>
    

    Doctype:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    

    meta tag:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    

    jvm parameters:

    -Dfile.encoding=UTF-8
    

    I have also tried using request.setCharacterEncoding("UTF-8"); but it seems as if tomcat simply ignores it. I am not using the RequestDumper valve.

    From what I've read, POST data encoding is mostly dependent on the page encoding where the form is. As far as I can tell, my page is correctly encoded in utf-8.

    The sample JSP from this page works correctly. It simply uses setCharacterEncoding("UTF-8"); and echoes the data you post. http://wiki.apache.org/tomcat/FAQ/CharacterEncoding

    So to summarize, the post request does not send the charset as utf-8, despite the page being in utf-8 and despite the form attributes, the xml declaration and everything else specifying utf-8. I have spent the better part of three days on this and am running out of ideas. Can anyone help me?