How to find out if a string has already been URL encoded?

Solution 1

Decode the string and compare the result to the original. If they differ, the original was encoded. If they are identical, the original was not encoded. This still says nothing about whether the decoded version is itself encoded, so it is a good task for recursion.

I hope one can't write a quine in urlencode, or this algorithm would get stuck.

Exception: when a string contains a "+" character, the URL decoder replaces it with a space even though the string is not URL encoded, so the comparison wrongly reports it as encoded.
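The decode-and-compare idea, including the repeated decoding, might be sketched as follows. This is only an illustration under assumptions: the helper names `decode_once`, `looks_encoded?` and `fully_decode` are made up here, and Ruby's `URI.decode_www_form_component` stands in for "the decoder" (it turns "+" into a space and raises `ArgumentError` on a malformed percent escape).

```ruby
require 'uri'

# Hypothetical helper: decode once with Ruby's form-style decoder.
# Returns nil when the input contains a malformed percent escape.
def decode_once(str)
  URI.decode_www_form_component(str)
rescue ArgumentError
  nil
end

# Solution 1: the string "looks encoded" if decoding changes it.
def looks_encoded?(str)
  decoded = decode_once(str)
  !decoded.nil? && decoded != str
end

# Decode repeatedly until the string stops changing.
# max_rounds is an assumed safety cap, not part of the original answer.
def fully_decode(str, max_rounds: 10)
  max_rounds.times do
    decoded = decode_once(str)
    return str if decoded.nil? || decoded == str
    str = decoded
  end
  str
end
```

With this sketch, `looks_encoded?("TEST%3D%3D")` is true and `fully_decode("TEST%253D%253D")` unwinds both layers to `"TEST=="`. The "+" exception shows up as well: `looks_encoded?("a+b")` is true even though `"a+b"` may be a raw value.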

Solution 2

Use a regexp to check whether your string contains illegal characters (i.e. characters that cannot appear in a URL-encoded string, such as whitespace).
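A minimal sketch of such a check, assuming the allowed set is the RFC 3986 unreserved characters, well-formed `%XX` escapes, and "+" (for form-style encoding). The constant `ENCODED_ONLY` and the method `could_be_encoded?` are hypothetical names, and the character set would need adjusting to whatever encoder actually produced the data.

```ruby
# Characters a percent-encoded string may contain under the assumptions
# above: unreserved characters, "+", and well-formed %XX escapes.
ENCODED_ONLY = /\A(?:[A-Za-z0-9\-._~+]|%\h\h)*\z/

# Hypothetical helper: true when nothing in the string rules out its
# being URL-encoded. A true result is not proof that it was encoded.
def could_be_encoded?(str)
  ENCODED_ONLY.match?(str)
end
```

A false result proves the string is not encoded (for example `"hello world"` or `"interest20%growth"`), while a true result only means encoding cannot be ruled out: `"plaintext"` passes too.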

Solution 3

Try decoding the URL. If the resulting string is shorter than the original, the original URL was already encoded. Otherwise you can safely encode it: either it is not encoded, or encoding leaves it unchanged, so encoding again will not produce a wrong URL. Below is sample Ruby code:

    require 'uri'

    # Returns an encoded URL for any given URL, after checking whether it
    # already looks encoded. URI.escape and URI.unescape were removed in
    # Ruby 3.0, so this uses the same methods on URI::DEFAULT_PARSER.
    def escape(url)
      parser = URI::DEFAULT_PARSER
      unescaped_url = parser.unescape(url)
      if unescaped_url.length < url.length
        url                 # decoding shortened it: treat as already encoded
      else
        parser.escape(url)  # decoding changed nothing: encode it
      end
    end

Solution 4

Joel on Software had a solution for this some time back: http://www.joelonsoftware.com/articles/Wrong.html
Or you may add some prefix to the strings to mark the ones that are already encoded.

Solution 5

You can't know for sure unless your strings conform to a certain pattern or you keep track of your strings. As you noted yourself, a string that is already encoded can be encoded again, so you can't be 100% sure just by looking at the string itself.
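One way to "keep track of your strings" is to carry the encoded/raw distinction in a type instead of guessing from the content. The sketch below is an assumption about how that could look, not something taken from the answers: `EncodedParam`, `from_raw` and `from_encoded` are hypothetical names, and Ruby's `URI.encode_www_form_component` / `URI.decode_www_form_component` are used as the codec.

```ruby
require 'uri'

# Hypothetical wrapper type: a value that is known to be URL-encoded.
# The only ways to build one state explicitly which form the input is in.
class EncodedParam
  attr_reader :encoded

  # Wrap a raw (unencoded) value, encoding it exactly once.
  def self.from_raw(raw)
    new(URI.encode_www_form_component(raw))
  end

  # Wrap a value that the caller knows is already encoded.
  def self.from_encoded(encoded)
    new(encoded)
  end

  # Recover the raw value by decoding exactly once.
  def raw
    URI.decode_www_form_component(encoded)
  end

  def initialize(encoded)
    @encoded = encoded
  end
  private_class_method :new
end
```

Because the caller has to pick `from_raw` or `from_encoded` at the boundary where the string enters the program, nothing downstream needs to inspect the string to decide whether to encode it again. A raw value that happens to contain `%3D` is still encoded correctly, to `%253D`.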

Author by

Trick

Updated on April 16, 2020

Comments

  • Trick
    Trick about 4 years

    How could I check if a string has already been encoded?

    For example, if I encode TEST==, I get TEST%3D%3D. If I encode that string again, I get TEST%253D%253D, so I would have to know beforehand whether it is already encoded...

    I have encoded parameters saved, and I need to search for them. I don't know whether the input parameters will be encoded or not, so I have to know whether to encode or decode them before the search.

  • Trick
    Trick about 14 years
    You gave me the idea, how to do this. Now my SQL looks like SELECT * FROM something WHERE param= " + param + " OR param = "+encode(param)
  • Trick
    Trick about 14 years
    I did not do this, but this is the solution.
  • SF.
    SF. about 14 years
    So how will you differentiate between hello%20world and interest20%growth ? The first is a valid urlencoded string, the other is a string that has to be escaped and does not produce a valid unescape.
  • sverkerw
    sverkerw about 14 years
    How do you know that you don't need SELECT * FROM something WHERE param= " + param + " OR param = "+encode(param) + " OR param = "+encode(encode(param))? That way lies infinite regress.
  • SF.
    SF. about 14 years
    Well, true, except in cases where "good enough" is enough; if 0.01% of users really want the program not to work, it won't work for them. Sometimes the extra, extreme clauses are just not worth the effort and the overhead.
  • benrifkah
    benrifkah about 12 years
    This fails if your string contains windows variable names like %DESCRIPTION% which decodes to ÞSCRIPTION% or %ABOUT% which becomes «OUT%.
  • SF.
    SF. about 12 years
    @benrifkah: true but then there is no way to tell them apart if the input is completely arbitrary.
  • benrifkah
    benrifkah about 12 years
    @SF. Indeed. I posted the caveat to expose the issue so that people could act according to their needs.
  • benrifkah
    benrifkah about 12 years
    Checking for illegal characters does not cover the percent symbol, because it is not illegal; it just gets escaped. If you find a percent symbol followed by "25", you may have a URI-encoded string. This only works if you know that your input is either not encoded or encoded exactly once, and that the input does not naturally include sequences that URI encoding generates.
  • stan
    stan over 10 years
    @SF. : This will fail if the initial unencoded string contains a + character in the middle. The decoded string will contain a space character instead and it will not be equal. A better way would be to compare the lengths. If the original string is larger than the decoded string, then the original was encoded.
  • stan
    stan over 10 years
    @SF. : But my attempt above also doesn't say anything about whether the newly decoded version isn't encoded as well.
  • ceiroa
    ceiroa about 10 years
    It doesn't work if the raw string contains a plus sign. You decode it, compare to the original, and the strings are different. The + has been replaced with space. You end up not encoding it, even though you should.
  • Chris Geirman
    Chris Geirman over 9 years
    Unfortunately, this was NOT the solution. I'm passing a URL as the url encrypted string, so I did an REFind(':', str) and it returns 6 (https:) whether the string is encrypted or not.
  • Paul Kienitz
    Paul Kienitz over 8 years
    If a string contains invalid chars, you can prove it is not encoded, but if it contains only valid chars and percent signs, that does not prove that it is encoded. That is not knowable. So this may be as good a check as one can realistically do.
  • Kristian Ivanov
    Kristian Ivanov almost 8 years
    You, sir, saved me at a moment when my brain had stopped and I couldn't figure out the following before reading your comment: if( contactNumber != Uri.encode(contactNumber)){ contactNumber = Uri.encode(contactNumber); }
  • Prabhath Suminda
    Prabhath Suminda over 7 years
    This is wrong. When a string contains a "+" character, the URL decoder replaces it with a space even though the string is not URL encoded. See docs.oracle.com/javase/6/docs/api/java/net/URLDecoder.html
  • Florian K
    Florian K almost 6 years
    This won't work if the URL is encoded in such a way that a ' ' (space) is replaced by a '+', because the length then stays the same.
  • amit_saxena
    amit_saxena almost 6 years
    It's probably better to encode spaces in your URLs only as %20. The pros are described here: stackoverflow.com/a/2678602/762747 If that's not a possibility, then maybe you can check for + signs after the ?, and if you find any, the URL is already encoded and you can return it as is. It's just an extra check on top of the above code, depending on your use case.
  • Darrin
    Darrin about 4 years
    This doesn't logically work with the java.net.URLDecoder.decode(String, String) implementation. Reason: if the unencoded string contains "%xy" where x is a letter such as "s", trying to decode it throws IllegalArgumentException("URLDecoder: Illegal hex characters in escape (%) pattern - negative value").
  • Darrin
    Darrin about 4 years
    Actually, the logic of attempting to "decode" something that you have not passed through some filter logic is always just a bad design. This bad design encourages more bad design, such as choosing to try and catch all "Throwable" exceptions and ignore them. Doing that adds time to process and then ignore exceptions, or worse, hide an exception that could have been useful in diagnosing a real problem. Just bad, very bad. ;-)
  • Darrin
    Darrin about 4 years
    Nope. Test it with an unencoded string containing "%s" in it. Code designed like this will fail to execute because of the IllegalArgumentException caused by an invalid "%xy", where xy are supposed to be hex digits. Same problem as the accepted answer, and one that tempts additional poor design choices, such as ignoring unknown exception types.
  • Aleksei Bulgak
    Aleksei Bulgak about 4 years
    Does not work if url contains any query parameters with encoded urls inside. Like log-in links with redirect info inside or query params
  • Tooraj Jam
    Tooraj Jam over 3 years
    This fails on URLs containing non-BASIC_LATIN characters too.
  • cppxaxa
    cppxaxa over 3 years
    Thanks for the answer, confirming, it works as expected in Java with spring