decodeURIComponent vs unescape: what is wrong with unescape?
Solution 1
What I want to know is what is wrong with escape/unescape?
They're not “wrong” as such; they're just their own special string format, which looks a bit like URI parameter encoding but actually isn't. In particular:
- ‘+’ is left unencoded and means plus, not space
- there is a special “%uNNNN” format for encoding Unicode UTF-16 code units, instead of encoding UTF-8 bytes
So if you use escape() to create URI parameter values, you will get the wrong results for strings containing a plus or any non-ASCII characters.
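A quick way to see both problems is to encode the same strings with each function. This is a sketch that assumes an engine still providing the legacy global escape() (current browsers and Node.js both do):

```javascript
// Encode the same inputs both ways: a literal plus, a Latin-1 character,
// and a character outside ISO-8859-1.
const samples = ["a+b", "é", "€"];

const comparison = samples.map((input) => ({
  input,
  escaped: escape(input),             // legacy escape() format
  encoded: encodeURIComponent(input), // UTF-8 bytes, each written as %XX
}));

console.table(comparison);
// a+b -> escaped "a+b"    (a form decoder reads '+' as a space)
//        encoded "a%2Bb"
// é   -> escaped "%E9"    (a Latin-1 byte, not UTF-8)
//        encoded "%C3%A9"
// €   -> escaped "%u20AC" (non-standard %u format)
//        encoded "%E2%82%AC"
```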
escape() could be used as an internal, JavaScript-only encoding scheme, for example to escape cookie values. However, now that all browsers support encodeURIComponent (which wasn't originally the case), there's no reason to prefer escape over it.
There is only one modern use for escape/unescape that I know of: a quick way to implement a UTF-8 encoder/decoder by leveraging the UTF-8 processing in encodeURIComponent/decodeURIComponent:
utf8bytes = unescape(encodeURIComponent(unicodecharacters));
unicodecharacters = decodeURIComponent(escape(utf8bytes));
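A minimal sketch of those two lines wrapped as functions (the names utf8Encode and utf8Decode are hypothetical). The intermediate "bytes" value is an ordinary JavaScript string holding one character per UTF-8 byte, with char codes 0 to 255:

```javascript
// encodeURIComponent writes each UTF-8 byte as "%XX"; unescape then turns
// each "%XX" back into a single character whose code is that byte value.
function utf8Encode(unicodeString) {
  return unescape(encodeURIComponent(unicodeString));
}

// Reverse direction: escape writes each character (0-255) as "%XX" (or leaves
// it alone if it is in escape's safe set), and decodeURIComponent reads the
// result as UTF-8. Invalid UTF-8 input makes it throw a URIError.
function utf8Decode(byteString) {
  return decodeURIComponent(escape(byteString));
}

const bytes = utf8Encode("€");
console.log([...bytes].map((c) => c.charCodeAt(0).toString(16))); // [ 'e2', '82', 'ac' ]
console.log(utf8Decode(bytes)); // €
```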
Solution 2
escape was originally defined only for characters in the range 0 to 255 inclusive (ISO-8859-1, which is effectively the Unicode code points representable with a single byte), writing each unsafe one as %XX. (*)
encodeURIComponent works for all well-formed strings JavaScript can represent, which covers the whole Unicode range (code points 0 to 1,114,111, or 0x10FFFF), including almost every human writing system in current use. (A string containing a lone surrogate makes it throw a URIError.)
Both functions produce URL-safe strings that use only code points 0 to 127 inclusive (US-ASCII). encodeURIComponent accomplishes this by first encoding the string as UTF-8 and then applying the %XX hex encoding familiar from escape to every byte that would not be URL-safe.
This is, incidentally, why you can build a UTF-8 encoder/decoder in JavaScript out of two function calls, without any loops or garbage generation. Combining these primitives cancels out the %XX layer and leaves only the UTF-8 processing side effect: unescape(encodeURIComponent(s)) encodes, and decodeURIComponent(escape(s)) does the same in reverse.
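To make the two stages of encodeURIComponent concrete, here is a hypothetical re-implementation for illustration only (percentEncodeUtf8 is not a real API). It assumes a well-formed input string, because TextEncoder replaces lone surrogates with U+FFFD where encodeURIComponent throws:

```javascript
// Characters encodeURIComponent leaves unescaped: A-Z a-z 0-9 - _ . ! ~ * ' ( )
const UNRESERVED = /^[A-Za-z0-9\-_.!~*'()]$/;

function percentEncodeUtf8(str) {
  const bytes = new TextEncoder().encode(str); // stage 1: UTF-8 bytes
  let out = "";
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    // stage 2: keep unreserved ASCII, write every other byte as uppercase %XX
    out += UNRESERVED.test(ch)
      ? ch
      : "%" + b.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

for (const s of ["a b+c", "tür", "€uro", "😀"]) {
  console.log(s, percentEncodeUtf8(s) === encodeURIComponent(s)); // true for each
}
```

Every byte at or above 0x80 becomes a Latin-1 character that fails the regex, so all non-ASCII content ends up as %XX, exactly as the paragraph above describes.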
(*) Footnote: Modern browsers such as Google Chrome produce %uXXXX for characters above 255, the range escape wasn't originally defined for, but web servers are much less likely to decode that format than the IETF-standardized, UTF-8-based encoding.
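The incompatibility is visible even inside JavaScript: the standard UTF-8-based decoder rejects the %uXXXX form outright, and only unescape() understands it. A small sketch:

```javascript
// escape() writes code units above 0xFF in the non-standard "%uXXXX" form.
const legacy = escape("€"); // "%u20AC"

// decodeURIComponent expects two hex digits after '%', so it throws.
let standardResult;
try {
  standardResult = decodeURIComponent(legacy);
} catch (e) {
  standardResult = e instanceof URIError ? "URIError" : "other error";
}
console.log(legacy, standardResult); // %u20AC URIError

// Only unescape() reads its own format back.
console.log(unescape(legacy)); // €
```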
Solution 3
The best answer is the approach used by the decoder at http://meyerweb.com/eric/tools/dencoder/, which works well:
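function decode() {
    var obj = document.getElementById('dencoder');
    var encoded = obj.value;
    // Form encoding uses '+' for spaces, which decodeURIComponent leaves alone,
    // so turn every '+' into ' ' before percent-decoding.
    obj.value = decodeURIComponent(encoded.replace(/\+/g, " "));
}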
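The same decoding step can be written without the DOM (decodeFormValue is a hypothetical helper name). For whole query strings, the standard URLSearchParams class applies the same form-decoding rules, '+' as space plus UTF-8 percent-decoding, on its own:

```javascript
// DOM-free version of the decode() logic: '+' -> space, then percent-decode.
function decodeFormValue(encoded) {
  return decodeURIComponent(encoded.replace(/\+/g, " "));
}

// URLSearchParams performs form decoding natively.
const params = new URLSearchParams("q=caf%C3%A9+au+lait&tag=C%2B%2B");

console.log(decodeFormValue("caf%C3%A9+au+lait")); // café au lait
console.log(params.get("q"));   // café au lait
console.log(params.get("tag")); // C++
```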
Solution 4
Another "modern" use I've run into is parsing a URI-encoded string that may include invalid UTF-8 byte sequences. In such cases decodeURIComponent throws a URIError; you may need to catch it and fall back to unescape.
An example would be 'tür' encoded as 't%FCr' (an ISO-8859-1 byte rather than UTF-8), which I've seen Firefox produce when characters are pasted into the address bar after the ?.
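A minimal sketch of that fallback (safeDecode is a hypothetical name). It tries strict UTF-8 decoding first and only uses unescape, which reads each %XX as an ISO-8859-1 character, when decodeURIComponent rejects the input:

```javascript
function safeDecode(encoded) {
  try {
    return decodeURIComponent(encoded);
  } catch (e) {
    // URIError signals malformed UTF-8 (or a malformed % sequence).
    if (e instanceof URIError) {
      return unescape(encoded);
    }
    throw e;
  }
}

console.log(safeDecode("t%C3%BCr")); // tür (valid UTF-8)
console.log(safeDecode("t%FCr"));    // tür (Latin-1 byte, via unescape)
```

This is a heuristic: if one string mixes valid UTF-8 sequences with stray Latin-1 bytes, the whole string goes through unescape and the UTF-8 parts come out as mojibake.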
Comments
-
andynormancx almost 2 years
In answering another question I became aware that my JavaScript/DOM knowledge had become a bit out of date, in that I am still using escape/unescape to encode the contents of URL components, whereas it appears I should now be using encodeURIComponent/decodeURIComponent instead.
What I want to know is what is wrong with escape/unescape? There are some vague suggestions that there is some sort of problem around Unicode characters, but I can't find any definite explanation.
My web experience is fairly biased: almost all of it has been writing big intranet apps tied to Internet Explorer. That has involved a lot of use of escape/unescape, and the apps involved have fully supported Unicode for many years now.
So what are the Unicode problems that escape/unescape are supposed to have? Does anyone have any test cases to demonstrate the problems?
-
Peter Bailey about 15 years
I think this article covers it pretty well.
-
andynormancx about 15 years
Excellent, just what I wanted. I see the issue is that Mozilla doesn't cope with Unicode in escape, which explains why I haven't run into any problems with it in an IE-only app.
-
andynormancx about 15 years
I am both blessed and cursed by my history of working with IE-only intranet apps: blessed because I never have to cope with IE/FF differences, and cursed for much the same reason.
-
Amit Patil about 15 years
Mozilla and IE both do the same (curious) thing with Unicode, even if the docs don't mention it.
-
Jonathan Day about 12 years
Chrome also struggles with Unicode when using (un)escape...
-
Mick MacCallum almost 10 years
I know the rules were different when you posted this, but link-only answers are frowned upon nowadays and are usually deleted. However, since you stand to lose a lot from the removal of this answer, I'd like to offer you the chance to save it by editing it to include more information on the subject (perhaps a summary of what's on the other side of the link). Thank you.
-
Alexis Wilke about 10 years
It looks like that bug was fixed in Firefox. However, it is not unlikely that some people wrongly encode characters using ISO-8859-1 instead of UTF-8.
-
Curtis Yallop about 8 years
A great reference: unixpapa.com/js/querystring.html, covering deprecated escape/unescape, dumb encodeURI/decodeURI, and decodeURIComponent/encodeURIComponent: their quirks and how to use them. decodeURIComponent does not convert "+" to space.
-
Matthew Oakley over 7 years
escape will escape a single quote, whereas encodeURI doesn't, which makes it useless for my project.
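The difference comes from the sets of characters each function leaves alone: escape keeps A-Z a-z 0-9 and @ * _ + - . /, while encodeURIComponent keeps A-Z a-z 0-9 and - _ . ! ~ * ' ( ), and encodeURI additionally keeps reserved URI characters such as / ? : @ & = + $ , #. A quick check with the single quote:

```javascript
const input = "it's";
const results = {
  escape: escape(input),                         // "it%27s"
  encodeURI: encodeURI(input),                   // "it's"
  encodeURIComponent: encodeURIComponent(input), // "it's"
};
console.log(results);
```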
-
acabra85 about 7 years
@bobince: I am currently using exactly that approach, utf8bytes = unescape(encodeURIComponent(unicodecharacters));. How can I achieve the same result after browsers stop supporting the unescape method? Thanks.
-
Amit Patil about 7 years
@acabra85: ultimately something like the TextEncoder/TextDecoder APIs from w3.org/TR/encoding. Support isn't there right now, though, and I wouldn't worry about escape/unescape going away for a long time.
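TextEncoder and TextDecoder are now available in current browsers and Node.js. A sketch of drop-in replacements that keep the same "byte string" format as the unescape trick (one character per byte, codes 0 to 255); the helper names are hypothetical. One difference: TextEncoder replaces a lone surrogate with U+FFFD, whereas encodeURIComponent throws.

```javascript
// Replacement for unescape(encodeURIComponent(str)).
function utf8EncodeToByteString(str) {
  const bytes = new TextEncoder().encode(str);
  let out = "";
  for (const b of bytes) out += String.fromCharCode(b);
  return out;
}

// Replacement for decodeURIComponent(escape(byteString)).
// Assumes every char code is 0-255; fatal: true makes invalid UTF-8 throw
// (a TypeError here, rather than the URIError decodeURIComponent raises).
function utf8DecodeFromByteString(byteString) {
  const bytes = Uint8Array.from(byteString, (c) => c.charCodeAt(0));
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

const s = "tür €";
const viaTrick = unescape(encodeURIComponent(s));
console.log(utf8EncodeToByteString(s) === viaTrick); // true
console.log(utf8DecodeFromByteString(viaTrick));      // tür €
```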
-
onassar about 3 years
This was what I needed, but it was super important for me to reverse the order: replace the + symbols first, and then decode with decodeURIComponent. In my case this mattered because I was dealing with email addresses. Otherwise, the + symbols in email addresses were getting replaced with spaces, which is incorrect: email addresses don't allow spaces, but plus symbols are allowed. Hope this helps someone else :)
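A small sketch of why the order matters, using a made-up form-encoded email address in which the literal '+' was sent as %2B:

```javascript
// "user+tag@example.com" as it arrives in form data: '+' -> %2B, '@' -> %40.
const encoded = "user%2Btag%40example.com";

// Wrong order: after decoding, the real '+' is indistinguishable from a space marker.
const wrong = decodeURIComponent(encoded).replace(/\+/g, " ");

// Right order: only '+' characters still present in the encoded form mean spaces.
const right = decodeURIComponent(encoded.replace(/\+/g, " "));

console.log(wrong); // user tag@example.com
console.log(right); // user+tag@example.com
```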