JavaScript charset problem

Solution 1

While my initial assumption was the same as T.J. Crowder's, a quick chat established that the OP uses some hosting service and cannot easily change the Content-Type headers.

The files were sent as text/plain or text/html without any charset parameter, hence the browser decoded them as UTF-8 (which is the default).

So saving the files in UTF-8 (instead of ANSI/Windows-1252) did the trick.
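Re-saving the file from an editor with UTF-8 selected is all that's needed for one file. If there are many, the same conversion can be scripted. This is a minimal sketch that assumes the source really is Windows-1252 (the answer's reading of "ANSI"); `windows1252ToUtf8` and the file names are hypothetical, and it relies on `TextDecoder` knowing the `windows-1252` label (browsers do, and so do Node.js builds with full ICU, which is the default).

```javascript
// Minimal sketch: re-encode Windows-1252 ("ANSI") bytes as UTF-8.
// Assumption: the input really is Windows-1252.
function windows1252ToUtf8(bytes) {
  // Decode the legacy bytes into a JavaScript (Unicode) string...
  const text = new TextDecoder('windows-1252').decode(bytes);
  // ...and encode that string again; TextEncoder always produces UTF-8.
  return new TextEncoder().encode(text);
}

// Example usage in Node.js (file names are hypothetical):
// const fs = require('fs');
// const utf8 = windows1252ToUtf8(fs.readFileSync('romanian.txt'));
// fs.writeFileSync('romanian-utf8.txt', utf8);
```

Note that byte 0x80 becomes € here, which a plain ISO-8859-1 decoder would not produce; that is one of the few places where the two charsets differ.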

Solution 2

You need to ensure that the HTTP response returning the file data has the correct charset identified on it. The proper place to do that is server-side. (When you set the content type in the request header, you're setting the content type of the request, not the response.) So, for instance, the response header from the server would be along the lines of:

Content-Type: text/plain; charset=windows-1252

...if by "ANSI" you mean the Windows-1252 charset. That should tell the browser what it needs to do to decode the response text correctly before handing it to the JavaScript layer.
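To make that concrete, here's a minimal sketch of a request handler that serves the file's bytes with an explicit charset. It assumes a Node.js server purely for illustration (the OP's host could be running anything); `makeTextHandler`, the file name and the port are hypothetical. The bytes go out unchanged; the header only tells the browser how to decode them.

```javascript
// Sketch: serve raw file bytes with an explicit charset in Content-Type.
// makeTextHandler is a hypothetical helper, not part of any library.
function makeTextHandler(bytes, charset) {
  return function (req, res) {
    res.writeHead(200, {
      // This is the header the browser uses to decode the body.
      'Content-Type': 'text/plain; charset=' + charset,
      'Content-Length': bytes.length,
    });
    res.end(bytes); // bytes are sent as-is, no re-encoding
  };
}

// Example usage (file name and port are hypothetical):
// const http = require('http');
// const fs = require('fs');
// const handler = makeTextHandler(fs.readFileSync('romanian.txt'), 'windows-1252');
// http.createServer(handler).listen(8080);
```

On shared hosting the equivalent is usually a server configuration setting rather than code, if the host allows it at all.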

One problem, though: as far as I can tell, Windows-1252 doesn't have the full Romanian alphabet. So if you're seeing characters like Ș, ș, Ţ, ţ, etc., that suggests the source text is not in Windows-1252. Now, perhaps it's acceptable in Romanian to drop the diacritics on those letters (I wouldn't know), so if your source text just uses S and T instead of Ș and Ţ, etc., it could still be in Windows-1252. Or it may be ISO-8859-1 or ISO-8859-2 (both of which lack some of the diacritics) or possibly ISO-8859-16 (which has full Romanian support).
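One way to check which single-byte charsets can hold which Romanian letters is to decode every byte value 0x00 to 0xFF and search the result. This is a sketch: `canRepresent` is a hypothetical helper, it assumes `TextDecoder` supports the labels used (browsers; Node.js with full ICU), and ISO-8859-16 is left out because not every `TextDecoder` implementation supports it (Node.js, at least, does not). Keep in mind that comma-below ș/ț and the older cedilla forms ş/ţ are different code points.

```javascript
// Sketch: can a single-byte charset represent a given character at all?
// Decodes all 256 byte values and looks for the character in the result.
function canRepresent(charset, ch) {
  const allBytes = new Uint8Array(256).map((_, i) => i); // 0, 1, ..., 255
  return new TextDecoder(charset).decode(allBytes).includes(ch);
}

// Comma-below ș/ț and cedilla ş/ţ are checked separately.
const romanian = ['ă', 'â', 'î', 'ș', 'ş', 'ț', 'ţ'];
for (const cs of ['windows-1252', 'iso-8859-2']) {
  const report = romanian.map((c) => c + ':' + (canRepresent(cs, c) ? 'yes' : 'no'));
  console.log(cs, report.join(' '));
}
```

With the standard mappings, Windows-1252 covers only â and î, while ISO-8859-2 adds ă and the cedilla forms ş/ţ but not the comma-below ș/ț.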

So the first thing to do is determine what character set the source text is actually in.
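A cheap first test is whether the file's bytes are even valid UTF-8. This is a heuristic sketch (`looksLikeUtf8` is a hypothetical name): a strict decoder rejects malformed byte sequences, so a Windows-1252 or ISO-8859-x file containing accented letters will usually fail. A pure-ASCII file passes regardless of how it was saved, so a pass doesn't prove the file is UTF-8.

```javascript
// Sketch: does this byte sequence decode as strict UTF-8?
// With { fatal: true }, TextDecoder throws a TypeError on malformed input
// instead of substituting U+FFFD replacement characters.
function looksLikeUtf8(bytes) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false; // malformed sequence: probably a legacy single-byte charset
  }
}

// Example (hypothetical file name, Node.js):
// const fs = require('fs');
// console.log(looksLikeUtf8(fs.readFileSync('romanian.txt')));
```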

Author: Cata

Updated on June 04, 2022

Comments

  • Cata
    Cata almost 2 years

    I want to read a file from my server with JavaScript and display its content in an HTML page. The file is in the ANSI charset, and it contains Romanian characters. I want to display those characters as they are :D, not as black symbols.

    So I think my problem is the charset. I have a GET request that fetches the content of the file, like this:

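    function IO(U, V) { // LA MOD String Version. A tiny ajax library. by, DanDavis
        // Use XMLHttpRequest where available, otherwise fall back to ActiveX (old IE)
        var X = !window.XMLHttpRequest ? new ActiveXObject('Microsoft.XMLHTTP') : new XMLHttpRequest();
        X.open(V ? 'PUT' : 'GET', U, false); // PUT if data V is given, else GET; synchronous
        X.setRequestHeader('Content-Type', 'Charset=UTF-8'); // header sent with the request
        X.send(V ? V : '');
        return X.responseText;
    }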
    

    As far as I know, the Romanian characters are included in the UTF-8 charset, so I set the charset in the request header to UTF-8. The page itself is in UTF-8 format, and I have the meta tag that tells the browser that the page has UTF-8 content:

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    

    If I request the file directly from the server, the browser shows me the Romanian characters, but if I display its content through this script, I see only symbols instead of characters. What am I doing wrong?

    Thank you!

    PS: I want this to work in Firefox at least, not necessarily in all browsers.

    • yoavmatchulsky
      yoavmatchulsky almost 13 years
      Content-Type needs to be: text/html; charset=UTF-8
    • Tomalak
      Tomalak almost 13 years
      What Content-Type header do you use with that file?
    • Cata
      Cata almost 13 years
      Well, in the text editor it's ANSI :D, if that is the question. It's not an HTML page; it's just text :D
    • Cata
      Cata almost 13 years
      I tried "text/html; charset=UTF-8" but the result is the same :D
    • Tomalak
      Tomalak almost 13 years
      @Cata: It is irrelevant what your text editor does. The only relevant thing is what Content-Type header the web server returns for that particular file.
    • Cata
      Cata almost 13 years
      @Tomalak: it seems the charset is ISO-8859-1 :D, I got that from Firefox :)
    • Tomalak
      Tomalak almost 13 years
      @Cata: Can you confirm that this is the charset declaration that you also see if you load the file on its own?
  • Tomalak
    Tomalak almost 13 years
    I just read that HTML5 standardized that "ISO-8859-1" charset declarations must be interpreted as "Windows-1252". Finally.
  • Cata
    Cata almost 13 years
    Now the page returns the content charset ISO-8859-1, the same as the page content, but the problem is still the same.
  • T.J. Crowder
    T.J. Crowder almost 13 years
    @Cata: The response with the file data has that content type? And you're certain that's the charset of the source text (not, say, ISO-8859-2 or -16)? If the answer is yes to both questions, I'm afraid I don't know what it is.
  • Cata
    Cata almost 13 years
    It seems the server was sending the file with the content as plain text without a charset. I changed the file's encoding to UTF-8 and now everything is fine, thanks to Tomalak :D
  • ncubica
    ncubica almost 11 years
    I was building a prototype with just HTML pages, creating the pages on the fly by right-clicking a folder (Windows) and choosing new file... and all of them were saved as ANSI (without my knowing). Before this answer I was going crazy trying to find out why the f"#$%ck the UTF-8 meta tag was not working... Now I can code again in peace :o)
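For cases like the OP's, where the hosting service's headers can't be changed, XMLHttpRequest also has an `overrideMimeType()` method that tells the browser which MIME type and charset to use when decoding the response, whatever the server sent. Below is a sketch of the question's IO() helper reduced to GET and rewritten that way. `getText`, its parameters and the injectable `createXhr` factory (there only so the function can be exercised outside a browser) are hypothetical, and the charset passed in must match how the file was actually saved.

```javascript
// Sketch: GET a text file and decode it with an explicit charset.
// getText and createXhr are hypothetical; in a browser createXhr can be omitted.
function getText(url, charset, onDone, createXhr) {
  const xhr = createXhr ? createXhr() : new XMLHttpRequest();
  // Asynchronous: synchronous XHR on the main thread is deprecated.
  xhr.open('GET', url, true);
  // Controls how the *response* body is decoded; must be called before send().
  // (setRequestHeader('Content-Type', ...) only describes the request.)
  xhr.overrideMimeType('text/plain; charset=' + charset);
  xhr.onload = function () {
    if (xhr.status >= 200 && xhr.status < 300) {
      onDone(null, xhr.responseText);
    } else {
      onDone(new Error('HTTP ' + xhr.status + ' for ' + url));
    }
  };
  xhr.onerror = function () {
    onDone(new Error('Network error for ' + url));
  };
  xhr.send();
}

// Example usage in a browser (file name and element id are hypothetical):
// getText('romanian.txt', 'windows-1252', function (err, text) {
//   if (err) throw err;
//   document.getElementById('output').textContent = text;
// });
```

Re-saving the file as UTF-8, as the OP ended up doing, avoids having to know the legacy charset in client code at all.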