decodeURIComponent vs unescape, what is wrong with unescape?


Solution 1

What I want to know is what is wrong with escape/unescape?

They're not “wrong” as such, they're just their own special string format which looks a bit like URI-parameter-encoding but actually isn't. In particular:

  • ‘+’ means plus, not space
  • there is a special “%uNNNN” format for encoding UTF-16 code units, instead of encoding UTF-8 bytes

So if you use escape() to create URI parameter values you will get the wrong results for strings containing a plus, or any non-ASCII characters.
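
A small sketch of the two differences listed above. It should run in a browser console or in Node.js, where escape is still available as a legacy global; the sample strings are made up for illustration.

```javascript
// escape(): "+" is left untouched, and non-ASCII characters are encoded
// per UTF-16 code unit as %XX or %uXXXX.
const escaped = [escape("a+b c"), escape("tür"), escape("€")];
console.log(escaped); // [ 'a+b%20c', 't%FCr', '%u20AC' ]

// encodeURIComponent(): "+" is percent-encoded, and non-ASCII characters
// are encoded as their UTF-8 bytes.
const encoded = [
  encodeURIComponent("a+b c"),
  encodeURIComponent("tür"),
  encodeURIComponent("€"),
];
console.log(encoded); // [ 'a%2Bb%20c', 't%C3%BCr', '%E2%82%AC' ]
```

A server that decodes query strings as UTF-8 form data would read the escape() output 'a+b%20c' as "a b c", and would typically reject or mangle 't%FCr' and '%u20AC'.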

escape() could be used as an internal JavaScript-only encoding scheme, for example to escape cookie values. However now that all browsers support encodeURIComponent (which wasn't originally the case), there's no reason to use escape in preference to that.

There is only one modern use for escape/unescape that I know of, and that's as a quick way to implement a UTF-8 encoder/decoder, by leveraging the UTF-8 processing in URIComponent handling:

utf8bytes= unescape(encodeURIComponent(unicodecharacters));
unicodecharacters= decodeURIComponent(escape(utf8bytes));
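
A sketch of that round trip, wrapped in two helpers. The names toUtf8ByteString and fromUtf8ByteString are my own, not a standard API. The "bytes" here are a JavaScript string in which each character's code is one UTF-8 byte value:

```javascript
// Encode: percent-encode as UTF-8, then let unescape() turn each %XX
// into a single character whose code is that byte value.
function toUtf8ByteString(text) {
  return unescape(encodeURIComponent(text));
}

// Decode: escape() turns each byte-valued character back into %XX,
// and decodeURIComponent() interprets the %XX sequence as UTF-8.
function fromUtf8ByteString(byteString) {
  return decodeURIComponent(escape(byteString));
}

const byteString = toUtf8ByteString("tür");
const byteValues = Array.from(byteString, (ch) => ch.charCodeAt(0));
console.log(byteValues); // [ 116, 195, 188, 114 ]
console.log(fromUtf8ByteString(byteString) === "tür"); // true
```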

Solution 2

escape was originally defined only for characters in the range 0 to 255 inclusive (ISO-8859-1, which is effectively the Unicode code points representable with a single byte). (*)

encodeURIComponent works for all well-formed strings JavaScript can represent. That covers the whole Unicode range, not just the Basic Multilingual Plane: code points 0 to 1,114,111 (0x10FFFF), which cover almost any human writing system in current use. The one exception is a string containing an unpaired surrogate, for which it throws a URIError.
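
As a rough illustration of that range, the sketch below encodes a code point outside the Basic Multilingual Plane and then shows the one kind of string encodeURIComponent rejects, an unpaired surrogate. The variable names are just for the example.

```javascript
// U+1F600 is stored as a surrogate pair in a JavaScript string, but is
// encoded as the four UTF-8 bytes of the single code point.
const astral = encodeURIComponent("\u{1F600}");
console.log(astral); // "%F0%9F%98%80"

// A lone surrogate is not a valid code point, so there is no UTF-8 form.
let loneSurrogateError = null;
try {
  encodeURIComponent("\uD800");
} catch (err) {
  loneSurrogateError = err;
}
console.log(loneSurrogateError instanceof URIError); // true
```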

Both functions produce URL-safe strings that only use code points 0 to 127 inclusive (US-ASCII). encodeURIComponent accomplishes this by first encoding the string as UTF-8 and then applying the %XX hex encoding familiar from escape to any byte that would not be URL-safe.

This is incidentally why you can make a two-function-call UTF-8 encoder/decoder in JavaScript without writing any loops: combining these primitives cancels out everything but the UTF-8 processing, and the unescape and decodeURIComponent versions do the same in reverse.
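
For comparison, here is a sketch that checks the two-call trick against the TextEncoder/TextDecoder APIs. It assumes a runtime that provides them (current browsers and Node.js do), and the helper name utf8ViaEscape is hypothetical.

```javascript
// UTF-8 bytes via the unescape(encodeURIComponent(...)) trick.
function utf8ViaEscape(text) {
  const byteString = unescape(encodeURIComponent(text));
  // Every character in byteString has a code in 0..255, one per byte.
  return Uint8Array.from(byteString, (ch) => ch.charCodeAt(0));
}

const sample = "tür €";
const viaEscape = utf8ViaEscape(sample);
const viaEncoder = new TextEncoder().encode(sample);

const sameBytes =
  viaEscape.length === viaEncoder.length &&
  viaEscape.every((byte, i) => byte === viaEncoder[i]);

console.log(sameBytes); // true
console.log(new TextDecoder().decode(viaEscape) === sample); // true
```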

(*) Footnote: browsers such as Google Chrome produce %uXXXX for the above-255 range of characters that escape wasn't originally defined for (this is also what Annex B of the ECMAScript specification describes), but web server support for decoding that format is not as well implemented as decoding the IETF-standardized UTF-8-based encoding.

Solution 3

The best answer is the approach used by this online tool: http://meyerweb.com/eric/tools/dencoder/

function decode() {
    var obj = document.getElementById('dencoder');
    var encoded = obj.value;
    // Turn form-encoded "+" into spaces first, then percent-decode.
    obj.value = decodeURIComponent(encoded.replace(/\+/g, " "));
}
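
The same idea without the DOM, as a minimal sketch. The function name decodeFormValue and the sample value are made up. The order matters: "+" has to be replaced before percent-decoding, otherwise a literal plus that was sent as %2B is also turned into a space.

```javascript
// Decode a form-encoded value: "+" means space, %XX is UTF-8.
function decodeFormValue(encoded) {
  return decodeURIComponent(encoded.replace(/\+/g, " "));
}

const raw = "first+last%2Btag%40example.com";

// Correct order: the %2B survives as a literal "+".
const decoded = decodeFormValue(raw);
console.log(decoded); // "first last+tag@example.com"

// Wrong order, for contrast: the literal "+" is lost.
const wrongOrder = decodeURIComponent(raw).replace(/\+/g, " ");
console.log(wrongOrder); // "first last tag@example.com"
```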

Solution 4

Another "modern" use I've run into is parsing a URI-encoded string that may include invalid UTF-8 byte sequences. decodeURIComponent throws a URIError on such input, so you may need to catch the exception and fall back to using unescape.

An example would be 'tür' encoded as 't%FCr' (the ISO-8859-1 byte for ü rather than its UTF-8 bytes), which I've seen Firefox produce when characters are pasted into the address bar after the ?.
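
A minimal sketch of that fallback, assuming the non-UTF-8 input is ISO-8859-1 as in the 't%FCr' example. The function name decodeLenient is hypothetical.

```javascript
// Try strict UTF-8 decoding first; fall back to unescape() only when
// the input is not valid UTF-8 percent-encoding.
function decodeLenient(value) {
  try {
    return decodeURIComponent(value);
  } catch (err) {
    if (err instanceof URIError) {
      return unescape(value); // treats each %XX as one Latin-1 character
    }
    throw err;
  }
}

console.log(decodeLenient("t%C3%BCr")); // "tür" (valid UTF-8)
console.log(decodeLenient("t%FCr"));    // "tür" (ISO-8859-1 fallback)
```

This is only a heuristic: a string that mixes both encodings will take the fallback path and come out with its UTF-8 parts garbled.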

Author by

andynormancx

C#, Javascript, DHTML, MSSQL developer.

Updated on July 09, 2022

Comments

  • andynormancx
    andynormancx almost 2 years

    In answering another question I became aware that my Javascript/DOM knowledge had become a bit out of date in that I am still using escape/unescape to encode the contents of URL components whereas it appears I should now be using encodeURIComponent/decodeURIComponent instead.

    What I want to know is what is wrong with escape/unescape? There are some vague suggestions that there is some sort of problem around Unicode characters, but I can't find any definite explanation.

    My web experience is fairly biased; almost all of it has been writing big Intranet apps tied to Internet Explorer. That has involved a lot of use of escape/unescape, and the apps involved have fully supported Unicode for many years now.

    So what are the Unicode problems that escape/unescape are supposed to have? Does anyone have any test cases to demonstrate the problems?

    • Peter Bailey
      Peter Bailey about 15 years
      I think this article covers it pretty well
    • andynormancx
      andynormancx about 15 years
      Excellent, just what I wanted. I see the issue is that Mozilla doesn't cope with Unicode in escape, which explains why I haven't run into any problems with it using an IE only app.
    • andynormancx
      andynormancx about 15 years
      I am both blessed and cursed by my history of working with IE only Intranet apps. Blessed because I never have to cope with IE/FF differences and cursed for much the same reason.
    • Amit Patil
      Amit Patil about 15 years
      Mozilla and IE both do the same (curious) thing with Unicode, even if the docs don't mention it.
    • Jonathan Day
      Jonathan Day about 12 years
      Chrome also struggles with Unicode when using (un)escape...
    • Mick MacCallum
      Mick MacCallum almost 10 years
      I know the rules were different when you posted this, but link-only answers are frowned upon nowadays and are usually deleted. However, since you stand to lose a lot from the removal of this answer, I'd like to offer you the chance to save it by editing it to include more information on the subject (perhaps a summary of what's on the other side of the link). Thank you.
  • Alexis Wilke
    Alexis Wilke about 10 years
    It looks like that bug was fixed in Firefox. However, it is not unlikely that some people wrongly encode characters using ISO-8859-1 instead of UTF-8.
  • Curtis Yallop
    Curtis Yallop about 8 years
    A great reference: unixpapa.com/js/querystring.html - on deprecated escape/unescape, dumb encodeURI/decodeURI and decodeURIComponent/encodeURIComponent - quirks and how to use it. decodeURIComponent does not convert "+" to space.
  • Matthew Oakley
    Matthew Oakley over 7 years
    escape will escape a single quote, whereas encodeURI doesn't. Which makes it useless for my project.
  • acabra85
    acabra85 about 7 years
    bobince- I am currently using exactly that approach to get the utf8bytes= unescape(encodeURIComponent(unicodecharacters)); how can I achieve the same result after browsers stop supporting unescape method? Thanks.
  • Amit Patil
    Amit Patil about 7 years
    @acabra85: ultimately something like the TextEncoder/TextDecoder APIs from w3.org/TR/encoding. Support isn't there right now though and I wouldn't worry about escape/unescape going away for a long time.
  • onassar
    onassar about 3 years
    This was what I needed, but it was super important for me to get the order right: replace the + symbols first, and then decode with decodeURIComponent. In my case, this was important because I was dealing with email addresses. Otherwise, the + symbols in email addresses were getting replaced with spaces, which is incorrect. Email addresses don't allow spaces, but plus symbols are allowed. Hope this helps someone else :)