Reading in a UTF-8 file (JavaScript XMLHttpRequest) gives bad European characters

36,772

Solution 1

EDIT: It seems that this answer, although accepted, is suboptimal. Anyone coming here with a similar problem should check out Ricardo's answer.

I think you have to print the characters a different way; for example, see the code at the end of this discussion:

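<script>
  // Return the hexadecimal numeric character reference for a code point,
  // e.g. 233 -> "&#x00e9;".
  function getUnicode(num) {
    num = num.toString(16);
    // Left-pad to four hex digits.
    while (num.length < 4) {
      num = '0' + num;
    }
    // The digits are hexadecimal, so the reference needs the "x" prefix.
    return "&#x" + num + ";";
  }

  // Write every 16-bit code point (0 to 65535).
  for (var i = 0; i < 65536; i++) {
    document.write(getUnicode(i));
  }
</script>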
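
As a rough sketch of the same idea applied to a single string rather than the whole code-point range, the helper below replaces each non-ASCII character with a decimal numeric character reference. The name `toNumericEntities` is made up for illustration. It assumes the string was already decoded correctly: it only changes how the text is written into the page and does not repair text that was read with the wrong charset.

```javascript
// Hypothetical helper: replace every non-ASCII character with a decimal
// numeric character reference such as "&#233;". ASCII characters are left
// untouched, so "&" and "<" are NOT escaped here.
function toNumericEntities(text) {
  let out = "";
  for (const ch of text) {          // for...of iterates by code point
    const codePoint = ch.codePointAt(0);
    out += codePoint > 127 ? "&#" + codePoint + ";" : ch;
  }
  return out;
}

console.log(toNumericEntities("Miércoles")); // Mi&#233;rcoles
```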

Solution 2

Your file is probably not in UTF-8. In that case, try this from JavaScript:

var request = new XMLHttpRequest();
request.open("GET", path, false);
// Decode the response as ISO-8859-1, whatever charset the server declares.
request.overrideMimeType('text/xml; charset=iso-8859-1');
request.send(null);
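
Which charset to pass depends on how the file's bytes were actually written. The sketch below is only an illustration of that point (the sample strings and the `decodeLatin1` helper are assumptions, not part of the original answer). It shows what each kind of mismatch looks like, which may help you tell which encoding the file really has.

```javascript
// "é" is stored as the two bytes 0xC3 0xA9 in UTF-8.
const utf8Bytes = new TextEncoder().encode("Miércoles");

// Strict ISO-8859-1 decoding: each byte becomes the code point of the same value.
// (Illustrative helper, not a browser API.)
function decodeLatin1(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// UTF-8 bytes misread as ISO-8859-1: each accented letter turns into two characters.
const misread = decodeLatin1(utf8Bytes);                    // "MiÃ©rcoles"

// UTF-8 bytes read as UTF-8: the text comes back intact.
const correct = new TextDecoder("utf-8").decode(utf8Bytes); // "Miércoles"

// The opposite mismatch: an ISO-8859-1 file stores "é" as the single byte 0xE9,
// which is not valid UTF-8, so it decodes to the replacement character U+FFFD.
const latin1Bytes = Uint8Array.from([0x4d, 0x69, 0xe9]);    // "Mié" in ISO-8859-1
const replaced = new TextDecoder("utf-8").decode(latin1Bytes); // "Mi\uFFFD"

console.log(misread, correct, replaced);
```

If you see two-character sequences such as "Ã©", the bytes are most likely UTF-8 being read as a single-byte encoding. If you see "\uFFFD" replacement characters, the file is most likely a single-byte encoding being read as UTF-8.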

Solution 3

I had the same issue and fixed it this way.

If you serve the JS file containing the Spanish days as UTF-8 but the file is NOT saved as UTF-8, it WON'T work.

Save the file in your IDE as UTF-8 (e.g. Eclipse's default for JS files may be Cp1252) and also serve it with the UTF-8 character encoding.

If your app is written in Java, add a filter with this code:

response.setCharacterEncoding("UTF-8");

have a good one

Author by

mark smith

Updated on November 19, 2020

Comments

  • mark smith
    mark smith over 3 years

    Can anyone help? I have a small procedure that reads in a UTF-8 file with JavaScript using XMLHttpRequest. The file has European characters, as in miércoles, sábado, etc. Notice the accents.

    But when the file is read in, the characters are all messed up. I have checked the file and it is perfect, so the problem must be in the procedure that reads it.

    Here's an example. My file contains the line below. The file itself is perfect; it happens to be JavaScript, but that doesn't matter: any UTF-8 encoded file with special characters gives me the same issue.

    this.weekDays = new Array("Lunes", "Martes", "Miércoles", "Jueves", "Viernes", "Sábado", "Domingo");

    But when it is returned and read by the procedure below, it looks like this (notice the funny characters in Sábado and Miércoles):

    this.weekDays = new Array("Lunes", "Martes", "Miércoles", "Jueves", "Viernes", "Sábado", "Domingo");

    Here is my procedure. It's very small:

    var contentType = "application/x-www-form-urlencoded; charset=utf-8";
    
    var request = new XMLHttpRequest(); 
    request.open("GET", path, false);
    request.setRequestHeader('Content-type', contentType)
    
    if (request.overrideMimeType) request.overrideMimeType(contentType);
    
    try { request.send(null); }
    catch (e) { return null; }
    if (request.status == 500 || request.status == 404 || request.status == 2 || (request.status == 0 && request.responseText == '')) return null;
    
    //PROBLEM HERE is with European characters that are read in
    
    print(request.responseText);
    
    
    return request.responseText;
    
    • cal meacham
      cal meacham almost 15 years
      Are you sure the file is in UTF-8? Did you explicitly set your text editor to save it with that encoding? Setting the request to UTF-8 is irrelevant. Is the response really in UTF-8, and is the corresponding header set in the response?
    • Nikos M.
      Nikos M. about 9 years
      This is old, but for anyone stumbling on this: use the .overrideMimeType('text/plain; charset=utf8'); method of the XMLHttpRequest object (from MDN, Using XMLHttpRequest).
  • Dinei
    Dinei almost 6 years
    I'm sure my file is UTF-8 encoded, but the server didn't return this charset header, so this solved the problem.
  • Ricardo
    Ricardo almost 6 years
    @Dinei if this snippet is working for you, that means your output is not in UTF-8. Some server providers may modify your output. I suggest trying it in Postman and checking the headers.
  • Dinei
    Dinei almost 6 years
    I have a .json file which is UTF-8 encoded, but for some reason it is being served with Content-type: text/plain; charset=ISO-8859-1 header. Using overrideMimeType with charset=UTF-8 solved my problem.
  • Jack G
    Jack G almost 6 years
    Sorry, but this question is about javascript, not java.
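
The approach described in the comments above (forcing the charset when the server sends a wrong or missing one) could be sketched roughly as follows. The helper name `loadTextAs` and the injectable `XhrCtor` parameter are assumptions made for illustration. The synchronous request mirrors the question's code, and only status 200 is treated as success, which is a simplification.

```javascript
// Hypothetical helper: fetch `path` and decode the body with `charset`,
// ignoring whatever charset the server declares.
// `XhrCtor` is injectable only so the sketch can be tried outside a browser.
function loadTextAs(path, charset, XhrCtor = XMLHttpRequest) {
  const request = new XhrCtor();
  request.open("GET", path, false); // synchronous, as in the question

  // overrideMimeType must be called before send().
  if (request.overrideMimeType) {
    request.overrideMimeType("text/plain; charset=" + charset);
  }

  try {
    request.send(null);
  } catch (e) {
    return null;
  }

  // Simplification: anything other than 200 is treated as a failure.
  return request.status === 200 ? request.responseText : null;
}
```

In a browser this would be called as `loadTextAs(path, "utf-8")`. It only helps when the file really is saved in the charset you pass; if the bytes on disk are in a different encoding, the file has to be re-saved or a matching charset passed instead.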