How do I transcode a Javascript string to ISO-8859-1?


Solution 1

It is my understanding that Javascript uses UTF-8 for its strings

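No, no. JavaScript strings are sequences of UTF-16 code units. What matters here is the charset of the page and of the request.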
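To make the distinction concrete, here is a small sketch (the helper name `latin1Bytes` is made up for illustration). Bytes only appear once a string is encoded for transport, and the encoding chosen at that point decides what the server receives.

```javascript
// A string holds UTF-16 code units, not UTF-8 or ISO-8859-1 bytes.
const text = "á";
const codeUnit = text.charCodeAt(0); // 0xE1 (U+00E1)

// Encoding the string as UTF-8 yields two bytes for this character.
const utf8Bytes = Array.from(new TextEncoder().encode(text)); // [0xC3, 0xA1]

// Hypothetical helper: encode as ISO-8859-1, which only covers U+0000..U+00FF.
function latin1Bytes(str) {
  return Array.from(str, function (ch) {
    const cp = ch.codePointAt(0);
    if (cp > 0xFF) {
      throw new RangeError("Not representable in ISO-8859-1: " + ch);
    }
    return cp;
  });
}

// Reading the UTF-8 bytes as if they were ISO-8859-1 gives the familiar garbling.
const misread = String.fromCharCode.apply(null, utf8Bytes); // "Ã¡"
```

So the same one-character string becomes the single byte 0xE1 in ISO-8859-1 but two bytes in UTF-8, and a server that assumes the wrong one will show garbled text.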

Each page declares its charset encoding in a meta tag inside the head element

<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>

or

<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"/>

Besides that, each page must actually be saved in the charset encoding it declares. Otherwise, it will not work as expected.

And it is a good idea to define its target charset encoding on server side.

Java
<%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>

PHP
header("Content-Type: text/html; charset=UTF-8");

C#
I do not know how to...

And it could be a good idea to declare the charset of each script file if it contains non-ASCII characters (á, é, í, ó, ú and so on...).

<script type="text/javascript" charset="UTF-8" src="/PATH/TO/FILE.js"></script>

...

So it is my theory that if I transcode the string to ISO-8859-1 before sending it, it should solve my problem

No, no.

The target server could handle strings in an encoding other than ISO-8859-1, or the other way around. For instance, Tomcat decodes request parameters as ISO-8859-1 by default, no matter how you set up your page. So, on the server side, you may have to set up your request according to how you set up your page.

Java
request.setCharacterEncoding("UTF-8")

PHP
// I do not know how to...

If you really want to change the charset encoding of the submitted form, try the following:

// Internet Explorer
formElement.encoding = "application/x-www-form-urlencoded; charset=ISO-8859-1";
// other browsers
formElement.enctype  = "application/x-www-form-urlencoded; charset=ISO-8859-1";

Or you could provide a function that returns each character as its numeric representation in the Unicode Character Set. It will work regardless of the charset encoding of the file. For instance, á in the Unicode Character Set is \u00E1.

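alert("á without its Unicode Character Set numerical representation");
// Returns the character written as a Unicode escape; only "á" is handled here.
function convertToUnicodeCharacterSet(value) {
    if (value === "á") {
        return "\u00E1";
    }
    return value; // anything else is passed through unchanged
}
alert("á Numerical representation in Unicode Character Set is: " + convertToUnicodeCharacterSet("á"));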
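The one-character function above generalizes. As a rough sketch (the name `toUnicodeEscapes` is invented here and is not part of any library), the following rewrites every non-ASCII code unit as a `\uXXXX` escape. The result is pure-ASCII text that can be pasted into a script as a string literal, whatever charset the file is served in.

```javascript
// Hypothetical helper: replace each non-ASCII UTF-16 code unit with \uXXXX.
function toUnicodeEscapes(value) {
  let out = "";
  for (let i = 0; i < value.length; i++) {
    const unit = value.charCodeAt(i);
    if (unit < 0x80) {
      out += value.charAt(i); // plain ASCII is kept as-is
    } else {
      out += "\\u" + unit.toString(16).toUpperCase().padStart(4, "0");
    }
  }
  return out;
}

console.log(toUnicodeEscapes("María")); // Mar\u00EDa
```

Working on code units rather than code points means characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair), which is what a JavaScript string literal expects.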


You can use this link as guideline (See JavaScript escapes)

Added to the original answer: how I implemented the jQuery functionality

var dataArray = $(formElement).serializeArray();
var pairs = [];
for (var i = 0; i < dataArray.length; i++) {
    // encodeURIComponent always percent-encodes the UTF-8 bytes
    pairs.push(encodeURIComponent(dataArray[i].name) + "=" + encodeURIComponent(dataArray[i].value));
}
var queryString = pairs.join("&");
$.ajax({
    url: "url.htm",
    data: queryString,
    contentType: "application/x-www-form-urlencoded; charset=UTF-8",
    success: function(response) {
        // process response
    }
});

It works fine without any headache.

Regards,

Solution 2

I had a very similar problem. I needed to pass a URL parameter using jQuery to make an Ajax call, and most of the time the parameter values included accents.

Both pages had to be set to charset=ISO-8859-1, and JavaScript's encoding functions (encodeURI, encodeURIComponent, etc.) only use UTF-8.
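To illustrate the UTF-8 behaviour, and what an ISO-8859-1 alternative might look like, here is a sketch. `encodeLatin1Component` is a hypothetical helper written for this example, not a built-in, and it assumes the server decodes percent-escapes as ISO-8859-1.

```javascript
// Built-in: always percent-encodes the UTF-8 bytes of the string.
const asUtf8 = encodeURIComponent("Perú"); // "Per%C3%BA"

// Hypothetical helper: percent-encode using one ISO-8859-1 byte per character.
function encodeLatin1Component(str) {
  let out = "";
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (cp > 0xFF) {
      // ISO-8859-1 has no byte for this character.
      throw new RangeError("Not representable in ISO-8859-1: " + ch);
    }
    if (/[A-Za-z0-9\-_.~]/.test(ch)) {
      out += ch; // unreserved characters pass through unchanged
    } else {
      out += "%" + cp.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

const asLatin1 = encodeLatin1Component("Perú"); // "Per%FA"
```

The legacy `escape()` function gives the same single-byte escapes for characters up to U+00FF, but it is deprecated, leaves `+` unencoded, and emits `%uXXXX` for anything higher, so an explicit helper that fails loudly may be the safer choice.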

What I did was to create a link in the original page, including all parameters without any encoding, let's say:

var myLink = document.getElementById("myHiddenLink");
myLink.setAttribute("href", "México, Perú, María and any other words with accents and spaces");

and then assign the href value to a variable, like this:

var theLink = myLink.getAttribute("href");

So finally "theLink" variable value was ISO-8859-1 encoded, and everything worked just fine.

Marcos Marin
Author by

Marcos Marin

Born and raised in México, currently attending University in the UNAM. Participated in the Google Summer of Code in 2007 with the Mono project and did an internship at Microsoft in 2008 and 2009.

Updated on July 09, 2022

Comments

  • Marcos Marin
    Marcos Marin almost 2 years

    I'm writing a Chrome extension that works with a website that uses ISO-8859-1. Just to give some context, what my extension does is making posting in the site's forums quicker by adding a more convenient post form. The value of the textarea where the message is written is then sent through an Ajax call (using jQuery).

    If the message contains characters like á, these characters appear as Ã¡ in the posted message. Forcing the browser to display UTF-8 instead of ISO-8859-1 makes the á appear correctly.

    It is my understanding that Javascript uses UTF-8 for its strings, so it is my theory that if I transcode the string to ISO-8859-1 before sending it, it should solve my problem. However there seems to be no direct way to do this transcoding in Javascript, and I can't touch the server side code. Any advice?

    I've tried setting the created form to use iso-8859-1 like this:

    var form = document.createElement("form");
    form.enctype = "application/x-www-form-urlencoded; charset=ISO-8859-1";
    

    And also:

    var form = document.createElement("form");
    form.encoding = "ISO-8859-1";
    

    But that doesn't seem to work.

    EDIT:

    The problem actually lay in how jQuery was URL-encoding the message (or something along the way). I fixed this by telling jQuery not to process the data and doing it myself, as shown in the following snippet:

    function cfaqs_post_message(msg) {
      var url = cfaqs_build_post_url();
      msg = escape(msg).replace(/\+/g, "%2B");
      $.ajax({
        type: "POST",
        url: url,
        processData: false,
        data: "message=" + msg + "&post=Preview Message",
        success: function(html) {
          // ...
        },
        dataType: "html",
        contentType: "application/x-www-form-urlencoded"
      });
    }
    
  • Marcos Marin
    Marcos Marin about 14 years
    Thanks for the informative answer, I'm marking it as correct even though this was not exactly the solution. My post didn't really give enough information to show the real issue. (I only found out about that after banging my head against the wall for a few more hours)
  • Arthur Ronald
    Arthur Ronald about 14 years
    @Marcos Marin Added content to original answer
  • Eduardo Fabricio
    Eduardo Fabricio over 7 years
    For C# : <%@ Page RequestEncoding="utf-8" ResponseEncoding="utf-8" %>