accept-charset="UTF-8" parameter doesn't do anything when used in form


The question, as asked, is self-contradictory: the heading says that the accept-charset parameter does not do anything, whereas the question body says that when the accept-charset attribute (this is the correct term) is used, “the headers have different accept charset option in the request header”. I suppose a negation is missing from the latter statement.

Browsers send Accept-Charset parameters in HTTP request headers according to their own principles and settings. For example, my Chrome sends Accept-Charset:windows-1252,utf-8;q=0.7,*;q=0.3. Such a header is typically ignored by server-side software, but it could be used (and it was designed to be used) to determine which encoding is to be used in the server response, in case the server-side software (a form handler, in this case) is capable of using different encodings in the response.
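A form handler that can emit several encodings would honor this header by ranking the listed charsets by their q values. Below is a minimal Python sketch of that negotiation, not a full RFC-compliant parser. The function names and the `supported` list are hypothetical. The `*` wildcard is treated as "any supported charset not listed explicitly".

```python
from __future__ import annotations


def parse_accept_charset(header: str) -> list[tuple[str, float]]:
    """Split an Accept-Charset value into (charset, q) pairs, best first."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, params = part.partition(";")
        q = 1.0  # a charset without an explicit q value has q=1
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        entries.append((name.strip().lower(), q))
    # sorted() stays stable with reverse=True, so equal q keeps header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def choose_response_charset(header: str, supported: list[str]) -> str | None:
    """Return the most preferred charset the handler can produce, if any."""
    supported = [s.lower() for s in supported]
    ranked = parse_accept_charset(header)
    listed = {name for name, _ in ranked}
    for name, q in ranked:
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if name == "*":
            # The wildcard covers every charset not named explicitly
            for candidate in supported:
                if candidate not in listed:
                    return candidate
        elif name in supported:
            return name
    return None
```

With the Chrome header quoted above and a handler that can only produce UTF-8, `choose_response_charset("windows-1252,utf-8;q=0.7,*;q=0.3", ["utf-8"])` returns "utf-8".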

The accept-charset attribute in a form element is not expected to affect HTTP request headers, and it does not. It is meant to specify the character encoding to be used for the form data in the request, and this is what it actually does. The HTML 4.01 spec is obscure about this, but the W3C HTML5 draft puts it much better, though for some odd reason it uses the plural: “gives the character encodings that are to be used for the submission”. I suppose the reason is that you could specify alternate encodings, to prepare for situations where a browser is unable to use your preferred encoding. And what actually happens in Chrome, for example, is that if you use accept-charset="foobar utf-8", then UTF-8 is used.
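The Chrome behavior described here can be modeled as "take the first label the browser recognizes". This is a minimal Python sketch that uses Python's codec registry as a stand-in for the browser's own table of encoding labels, and the two do not match exactly. The function name is hypothetical. What should happen when no label is recognized is left to a caller-supplied `fallback` rather than asserted as browser behavior.

```python
import codecs


def pick_submission_encoding(accept_charset: str, fallback: str) -> str:
    """Return the first encoding label in accept_charset that is recognized."""
    # Labels are separated by whitespace, in order of preference
    for label in accept_charset.split():
        try:
            # codecs.lookup() normalizes the label, e.g. "UTF-8" -> "utf-8"
            return codecs.lookup(label).name
        except LookupError:
            continue  # unrecognized label such as "foobar": try the next one
    # No recognized label at all: use whatever the caller passes in
    return fallback


print(pick_submission_encoding("foobar utf-8", "iso-8859-1"))  # utf-8
```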

In practice, the attribute is used to make the encoding of data submission different from the encoding of the page containing the form. Suppose your page is ISO-8859-1 encoded and someone types Greek or Hebrew letters into your form. Browsers will have to do some error recovery, since those characters cannot be represented in ISO-8859-1. (In practice they turn the characters into numeric character references, which is logically all wrong but pragmatically perhaps the best they can do.) Using <form accept-charset=utf-8> helps here: no matter what the page encoding is, the form data will be sent UTF-8 encoded, and UTF-8 can represent any character.
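The difference can be imitated with Python's `urllib.parse.urlencode`, which percent-encodes form fields in a chosen encoding. This is a sketch under one assumption: `errors="xmlcharrefreplace"` stands in for the browser's error recovery. It yields the same kind of numeric character reference (`&#945;`) described above. The field name `name` is illustrative.

```python
from urllib.parse import urlencode

form = {"name": "α"}  # Greek small letter alpha, U+03B1

# Form encoded as ISO-8859-1: alpha is not representable, so it is replaced
# by the numeric character reference "&#945;" before percent-encoding.
latin1_body = urlencode(form, encoding="iso-8859-1", errors="xmlcharrefreplace")

# accept-charset="utf-8": alpha becomes its two UTF-8 bytes, 0xCE 0xB1.
utf8_body = urlencode(form, encoding="utf-8")

print(latin1_body)  # name=%26%23945%3B
print(utf8_body)    # name=%CE%B1
```

A form handler receiving the first body cannot tell whether the user typed the letter α or literally typed the six characters `&#945;`. That ambiguity is why the numeric-reference fallback is logically wrong.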

If you wish to tell the form handler which encoding it should use in its response, then you can add a hidden (or non-hidden) field into the form for that.
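On the handler side, such a field is just another form value. This is a minimal Python sketch that assumes a urlencoded UTF-8 submission. The field name `response_charset` and the function name are hypothetical. Falling back to UTF-8 for an unusable label is a choice made for this sketch, not a rule.

```python
from __future__ import annotations

import codecs
from urllib.parse import parse_qs

# Hypothetical field in the form (the name is made up for this sketch):
#   <input type="hidden" name="response_charset" value="iso-8859-1">


def encode_response(form_body: str, response_text: str) -> tuple[str, bytes]:
    """Encode response_text in the charset requested by the form, if usable."""
    fields = parse_qs(form_body, encoding="utf-8")  # assumes UTF-8 form data
    requested = fields.get("response_charset", ["utf-8"])[0]
    try:
        # Unrepresentable characters become numeric character references
        encoded = response_text.encode(requested, errors="xmlcharrefreplace")
        charset = codecs.lookup(requested).name
    except LookupError:
        # Unknown label, or a codec that is not a text encoding
        charset = "utf-8"
        encoded = response_text.encode(charset)
    return charset, encoded
```

The handler would then declare the returned charset name in the response's Content-Type header, for example `text/html; charset=utf-8`.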

Author: insomiac

Updated on July 30, 2022

Comments

  • insomiac
    insomiac almost 2 years

    I am using the accept-charset="utf-8" attribute in a form and found that when I do a form post with non-ASCII text, the headers have different accept charset option in the request header. Is there anything I am missing? My form looks like this:

    <form method="post" action="controller" accept-charset="UTF-8">
      <!-- text input (the field name is illustrative) -->
      <input type="text" name="message">
      <button type="submit">Submit</button>
    </form>
    

    Thanks in advance

  • insomiac
    insomiac over 11 years
    Thanks for the answer, it's helpful. Do you know how I can set the accept-charset default to UTF-8?
  • Jukka K. Korpela
    Jukka K. Korpela over 11 years
    The default for accept-charset is UNKNOWN as per HTML 4.01, but HTML5 drafts reflect the reality better: the default is the document’s character encoding. If you mean setting a default to be used in your authoring software, then it all depends on that software.
  • Y.L.
    Y.L. almost 10 years
    What's the version of your Chrome? Accept-Charset is obsolete; you should not depend on it anymore: code.google.com/p/chromium/issues/detail?id=112804
  • Jukka K. Korpela
    Jukka K. Korpela almost 10 years
    @CyberRusher, this is still relevant in Chrome 35. The accept-charset HTML attribute is distinct from the Accept-Charset HTTP header (which is what your linked document discusses).
  • Y.L.
    Y.L. almost 10 years
    @JukkaK.Korpela, doesn't the accept-charset HTML attribute control the Accept-Charset HTTP header? My Chrome version is "35.0.1916.153 m", but there is no Accept-Charset in the request header.
  • Jukka K. Korpela
    Jukka K. Korpela almost 10 years
    @CyberRusher, no. As I mention in my answer: “The accept-charset attribute in a form element is not expected to affect HTTP request headers, and it does not.”
  • Y.L.
    Y.L. almost 10 years
    @JukkaK.Korpela, how can I set Chrome to send Accept-Charset in HTTP requests? Could you give me some tips? Thank you!
  • Jukka K. Korpela
    Jukka K. Korpela almost 10 years
    @CyberRusher, that would be a browser configuration issue and off-topic (for this question and for SO), and I'm afraid it can’t be done.
  • ppostma1
    ppostma1 about 9 years
    FYI: HTTP headers and GET request content are defined as "MUST BE ISO-8859-1", preferably ASCII. Anything in a different charset can only be sent in a body (as in POST). So always expect the request headers to be ISO-8859-1. However, setting the form charset lets the POST body contain UTF-8 regardless of the headers. Browsers differ in how they handle the characters in the POST body, and few of them send a request header saying '~post content is utf8~'. If you set it on the form, you have to expect it when processing the request.
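On the receiving side, "expecting it" means decoding the body with the same encoding the form used. This is a minimal Python sketch with `urllib.parse.parse_qs`, assuming a urlencoded body with an illustrative field name. Decoding UTF-8 bytes as ISO-8859-1 instead does not raise an error. It silently produces mojibake.

```python
from urllib.parse import parse_qs

# urlencoded POST body for name="α" from a form with accept-charset="UTF-8":
# alpha is sent as the two UTF-8 bytes 0xCE 0xB1
body = "name=%CE%B1"

matching = parse_qs(body, encoding="utf-8")
mismatched = parse_qs(body, encoding="iso-8859-1")

print(matching)    # {'name': ['α']}
print(mismatched)  # {'name': ['Î±']}
```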