How to detect if a string is encoded with escape() or encodeURIComponent()


Solution 1

Encourage your clients to use encodeURIComponent(). See this page for an explanation: Comparing escape(), encodeURI(), and encodeURIComponent(). If you really want to figure out exactly how something was encoded, you can look for the characters that escape() leaves unencoded but encodeURIComponent() encodes, or vice versa.
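A rough sketch of that idea in JavaScript. The function name guessEncoder is hypothetical, and every check is a heuristic based on the two functions' character sets: escape() leaves @, + and / unencoded and writes code points above U+00FF as %uXXXX, while encodeURIComponent() leaves !, ~, ', ( and ) unencoded and writes non-ASCII characters as UTF-8 byte sequences. Strings with none of these signals are reported as "unknown".

```javascript
// Heuristic sketch (hypothetical helper): guess which function produced a
// percent-encoded string. Returns "escape", "encodeURIComponent", or
// "unknown" when nothing in the string tells the two apart.
function guessEncoder(str) {
  // escape() writes code points above U+00FF as %uXXXX; encodeURIComponent() never does.
  if (/%u[0-9A-Fa-f]{4}/.test(str)) return "escape";

  // escape() leaves @ + / as-is, whereas encodeURIComponent() encodes them.
  if (/[@+\/]/.test(str)) return "escape";
  // encodeURIComponent() leaves ! ~ ' ( ) as-is, whereas escape() encodes them.
  if (/[!~'()]/.test(str)) return "encodeURIComponent";

  // The encoded forms of those same characters point the other way.
  if (/%(40|2B|2F)/i.test(str)) return "encodeURIComponent";
  if (/%(21|7E|27|28|29)/i.test(str)) return "escape";

  // Bytes >= 0x80: encodeURIComponent() emits valid UTF-8 sequences, while
  // escape() emits single Latin-1 bytes that are usually invalid UTF-8.
  if (/%[89A-Fa-f][0-9A-Fa-f]/.test(str)) {
    try {
      decodeURIComponent(str);
      return "encodeURIComponent";
    } catch (e) {
      return "escape";
    }
  }
  return "unknown";
}
```

This is only a guess: "hello%20world" is identical under both functions, and escape() output for some Latin-1 text happens to form valid UTF-8 (escape("Ã¤") gives %C3%A4), which this sketch would misreport as encodeURIComponent().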

Solution 2

This won't help on the server side, but on the client side I have used JavaScript exceptions to detect whether the URL encoding produced ISO Latin-1 or UTF-8 encoding.

decodeURIComponent() throws an exception (a URIError) on invalid UTF-8 sequences.

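let result;
try {
  // Succeeds for valid UTF-8 percent-encoding, e.g. encodeURIComponent() output.
  result = decodeURIComponent(string);
} catch (e) {
  // URIError: invalid UTF-8, so fall back to escape()-style decoding.
  result = unescape(string);
}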

For example, the ISO Latin-1 encoded umlaut 'ä' (%E4) throws an exception in Firefox, but the UTF-8 encoded 'ä' (%C3%A4) does not.
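The same try/catch can also report which decoder succeeded, which is closer to what the question asks. A minimal sketch, assuming every input was produced by either escape() or encodeURIComponent(); decodeAndClassify is a hypothetical name, and the "via" label is only a guess for pure-ASCII input, which both decoders accept.

```javascript
// Sketch (hypothetical helper): decode the string and report which decoder
// accepted it. Assumes the input came from escape() or encodeURIComponent().
function decodeAndClassify(str) {
  try {
    // Valid UTF-8 percent-encoding: consistent with encodeURIComponent().
    return { value: decodeURIComponent(str), via: "encodeURIComponent" };
  } catch (e) {
    if (!(e instanceof URIError)) throw e;
    // Invalid UTF-8 (e.g. %E4) or %uXXXX: treat it as escape() output.
    return { value: unescape(str), via: "escape" };
  }
}
```

For instance, escape("ä") is "%E4" and encodeURIComponent("ä") is "%C3%A4"; the first is classified as escape() and the second as encodeURIComponent(), and both decode back to "ä".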


Solution 3

I realize this is an old question, but I am not aware of a better solution, so I do it like this (thanks to a comment by RobertPitt on the question):

function isEncoded(str) {
    // Treat the string as encoded if decoding it changes it.
    return typeof str === "string" && decodeURIComponent(str) !== str;
}

I have not yet encountered a case where this fails, which doesn't mean such a case doesn't exist. Maybe someone could shed some light on this.
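One case where isEncoded() misbehaves: decodeURIComponent() throws a URIError on malformed input, such as a literal percent sign ("100%") or escape()-style Latin-1 bytes like "%E4", so the function throws instead of returning a boolean. A sketch of a guarded variant (isEncodedSafe is a hypothetical name) that falls back to the deprecated unescape(), which leaves malformed sequences unchanged instead of throwing:

```javascript
// Sketch (hypothetical helper): like isEncoded(), but never throws.
function isEncodedSafe(str) {
  if (typeof str !== "string") return false;
  try {
    // Valid UTF-8 percent-encoding: encoded if decoding changes the string.
    return decodeURIComponent(str) !== str;
  } catch (e) {
    // Malformed for decodeURIComponent(), e.g. "100%" or Latin-1 "%E4".
    // unescape() decodes what it can and leaves bad sequences untouched.
    return unescape(str) !== str;
  }
}
```

It still shares the limitation raised in the comments: a partially encoded string such as "hello%20world woops" is reported as encoded.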

Solution 4

Thanks to @mika for a great answer. One possible improvement, since the unescape() function is deprecated:

declare function unescape(s: string): string;

function decodeURItoString(str: string): string {
  let resp = str;

  try {
    resp = decodeURI(str);
  } catch (e) {
    console.log('ERROR: Cannot decodeURI string!');

    // unescape() is deprecated and may be missing in some environments;
    // typeof avoids a ReferenceError if it is not defined at all.
    if (typeof unescape === 'function') {
      resp = unescape(str);
    }
  }

  return resp;
}
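Note that this variant calls decodeURI() rather than decodeURIComponent(). decodeURI() deliberately leaves escape sequences for reserved characters such as %2F (/), %3F (?) and %40 (@) undecoded, so for encodeURIComponent() output the two can return different strings. A small comparison (compareDecoders is a hypothetical helper):

```javascript
// Sketch (hypothetical helper): decode the same input with both functions.
function compareDecoders(encoded) {
  return {
    component: decodeURIComponent(encoded),
    uri: decodeURI(encoded),
  };
}

// encodeURIComponent("a/b?c@d") produces "a%2Fb%3Fc%40d".
const result = compareDecoders(encodeURIComponent("a/b?c@d"));
console.log(result.component); // a/b?c@d
console.log(result.uri);       // a%2Fb%3Fc%40d
```

decodeURI() is meant as the counterpart of encodeURI() for whole URIs; for values produced by encodeURIComponent(), decodeURIComponent() is the matching decoder.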

Author: Rodrigo

Updated on August 18, 2020

Comments

  • Rodrigo
    Rodrigo over 3 years

    I have a web service that receives data from various clients. Some of them send the data encoded using escape(), while others use encodeURIComponent() instead. Is there a way to detect which encoding was used to escape the data?

  • Rodrigo
    Rodrigo over 14 years
    I agree, but unfortunately I can't force the clients to adopt an encoding standard.
  • RobertPitt
    RobertPitt about 12 years
    also, maybe something like: function isEncoded(str){return decodeURIComponent(str) !== str;}
  • mergen
    mergen almost 8 years
    It'll fail when something is only partially encoded, like http://google.de/hello%20world woops. I still have to find an elegant way to handle this.
  • krisku
    krisku over 7 years
    This solution has absolutely nothing to do with determining which of escape() or encodeURIComponent() something was encoded with.
  • krisku
    krisku over 7 years
    They differ wildly in how non-ASCII characters are encoded: encodeURIComponent() produces percent-encoded UTF-8 sequences, while escape() percent-encodes the octets (as in ISO-8859-1 bytes).
  • Rehan
    Rehan about 6 years
    @RobertPitt thanks for your idea, it worked for me. :)