How to find out if a string has already been URL-encoded?
Solution 1
Decode the string and compare the result to the original. If they differ, the original is encoded; if they don't, it isn't. This still says nothing about whether the newly decoded version is itself still encoded, which makes it a good task for recursion.
I hope one can't write a quine in urlencode, or this algorithm would get stuck.
Exception: when a string contains a "+" character, the URL decoder replaces it with a space even though the string is not URL-encoded.
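A minimal Java sketch of this approach, assuming the strings are form-encoded with java.net.URLEncoder in UTF-8 and Java 10+ (for the Charset overloads). DecodeCompare and its methods are hypothetical names. The recursion is written as a loop, and a malformed "%" sequence (an unencoded string such as "100%s" makes URLDecoder throw IllegalArgumentException) is treated as "not encoded". The sketch reproduces the "+" exception above rather than fixing it.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class DecodeCompare {
    // Decode-and-compare: if decoding changes the string, assume it was encoded.
    static boolean looksEncoded(String s) {
        try {
            return !URLDecoder.decode(s, StandardCharsets.UTF_8).equals(s);
        } catch (IllegalArgumentException e) {
            // Malformed escapes such as "%s" or a trailing "%" cannot come
            // out of an encoder, so treat the input as not encoded.
            return false;
        }
    }

    // The recursive part, written as a loop: keep decoding while it changes something.
    // Each changing pass either shortens the string or only turns '+' into ' ',
    // so the loop terminates.
    static String decodeFully(String s) {
        String current = s;
        while (looksEncoded(current)) {
            current = URLDecoder.decode(current, StandardCharsets.UTF_8);
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(decodeFully("TEST%253D%253D")); // TEST==
        System.out.println(looksEncoded("a+b"));            // true: the "+" exception
    }
}
```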
Solution 2
Use a regular expression to check whether your string contains characters that cannot appear in a URL-encoded string, such as whitespace.
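A rough Java sketch of the regular-expression check. It assumes "URL-encoded" means the output of java.net.URLEncoder, which only emits letters, digits, ".", "-", "*", "_", "+" and %XX escapes; other encoders allow a different character set. UrlCharCheck is a hypothetical name. A false result proves the string is not encoded; a true result only means it could be.

```java
import java.util.regex.Pattern;

class UrlCharCheck {
    // Assumption: the only characters URLEncoder output can contain.
    private static final Pattern ALLOWED_CHARS = Pattern.compile("[A-Za-z0-9.*_+%-]*");
    // A '%' that is not followed by two hex digits cannot come from an encoder.
    private static final Pattern BAD_PERCENT = Pattern.compile("%(?![0-9A-Fa-f]{2})");

    // false means "definitely not encoded"; true only means "could be encoded".
    static boolean couldBeEncoded(String s) {
        return ALLOWED_CHARS.matcher(s).matches() && !BAD_PERCENT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(couldBeEncoded("TEST=="));     // false: '=' would have been escaped
        System.out.println(couldBeEncoded("TEST%3D%3D")); // true
        System.out.println(couldBeEncoded("abc"));        // true, though it may just be raw text
    }
}
```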
Solution 3
Try decoding the URL. If the resulting string is shorter than the original, the original URL was already encoded; otherwise you can safely encode it (either it is not encoded, or encoding does not change it, so encoding it again will not produce a wrong URL). Below is sample code inspired by Ruby:
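require 'uri'

# URI.escape and URI.unescape were removed in Ruby 3.0; an RFC 2396 parser
# provides the same escape/unescape behaviour.
PARSER = URI::RFC2396_Parser.new

# Returns encoded URL for any given URL after determining whether it is already encoded or not
def escape(url)
  unescaped_url = PARSER.unescape(url)
  # Decoding shortened the string, so it already contained %XX escapes
  return url if unescaped_url.length < url.length

  PARSER.escape(url)
end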
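The same length test in Java, as a rough sketch assuming java.net.URLEncoder/URLDecoder with UTF-8 and Java 10+; LengthCheck and encodeIfNeeded are hypothetical names. Because URLEncoder writes a space as "+", an input like "a+b" keeps its length when decoded and is encoded again, which is the failure Florian K describes in the comments. Malformed "%" sequences are assumed to mean "not encoded".

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class LengthCheck {
    // Encode only if decoding does not shorten the string.
    static String encodeIfNeeded(String s) {
        String decoded;
        try {
            decoded = URLDecoder.decode(s, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            // Malformed '%' sequence: the input cannot be encoder output.
            return URLEncoder.encode(s, StandardCharsets.UTF_8);
        }
        if (decoded.length() < s.length()) {
            return s; // contained %XX escapes, so assume it is already encoded
        }
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encodeIfNeeded("TEST=="));     // TEST%3D%3D
        System.out.println(encodeIfNeeded("TEST%3D%3D")); // TEST%3D%3D (unchanged)
    }
}
```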
Solution 4
Joel on Software described a solution for this some time ago: http://www.joelonsoftware.com/articles/Wrong.html
Alternatively, you could add a prefix to the strings.
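This solution amounts to keeping track of the encoding state instead of inferring it from the characters. Joel's article does that with variable-name prefixes; the sketch below is a type-based variant of the same idea, not his exact technique. EncodedParam and its methods are hypothetical names, and it assumes java.net.URLEncoder with UTF-8 (Java 10+).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper: a value of this type is URL-encoded exactly once,
// so code never has to guess by inspecting the characters.
final class EncodedParam {
    private final String value;

    private EncodedParam(String value) {
        this.value = value;
    }

    // The only way to turn raw text into an EncodedParam is to encode it.
    static EncodedParam fromRaw(String raw) {
        return new EncodedParam(URLEncoder.encode(raw, StandardCharsets.UTF_8));
    }

    // For values read back from storage that are known to be encoded already.
    static EncodedParam trusted(String alreadyEncoded) {
        return new EncodedParam(alreadyEncoded);
    }

    String value() {
        return value;
    }
}
```

A search method that accepts only EncodedParam then cannot be handed an unencoded string by accident.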
Solution 5
You can't know for sure unless your strings conform to a certain pattern or you keep track of them. As you noted yourself, an already-encoded string can be encoded again, so you can't be 100% sure just by looking at the string itself.
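To make this concrete, a small Java sketch using the question's TEST%3D%3D (Ambiguity is a hypothetical name; it assumes java.net.URLEncoder/URLDecoder with UTF-8, Java 10+). Reading the string as encoded and reading it as raw both round-trip exactly, so nothing in the string itself picks one reading.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class Ambiguity {
    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "TEST%3D%3D";
        // Reading 1: s is encoded; the raw value is "TEST==", which encodes back to s.
        String raw = decode(s);
        System.out.println(raw + " -> " + encode(raw)); // TEST== -> TEST%3D%3D
        // Reading 2: s is raw; its encoding "TEST%253D%253D" decodes back to s.
        String enc = encode(s);
        System.out.println(enc + " -> " + decode(enc)); // TEST%253D%253D -> TEST%3D%3D
    }
}
```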
Comments
-
Trick about 4 years
How can I check whether a string has already been encoded?
For example, if I encode TEST==, I get TEST%3D%3D. If I encode that string again, I get TEST%253D%253D. I would have to know before doing that whether it is already encoded. I have encoded parameters saved, and I need to search for them. I don't know whether the input parameters will be encoded or not, so I have to know whether to encode or decode them before searching.
-
Trick about 14 years
You gave me the idea of how to do this. Now my SQL looks like
SELECT * FROM something WHERE param= " + param + " OR param = "+encode(param)
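The same query, sketched with JDBC placeholders instead of string concatenation so the parameter is never spliced into the SQL text. The table and column names come from the comment; ParamSearch, candidates and search are hypothetical, "encode" is assumed to mean java.net.URLEncoder with UTF-8 (Java 10+), and only the param column is selected rather than *. It still checks exactly two spellings, so it shares the limitation raised in the replies.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class ParamSearch {
    // The two spellings a stored parameter might have: as given, and encoded once.
    static List<String> candidates(String param) {
        return List.of(param, URLEncoder.encode(param, StandardCharsets.UTF_8));
    }

    // Placeholders keep user input out of the SQL text.
    static List<String> search(Connection conn, String param) throws SQLException {
        List<String> values = candidates(param);
        String sql = "SELECT param FROM something WHERE param = ? OR param = ?";
        List<String> found = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, values.get(0));
            ps.setString(2, values.get(1));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    found.add(rs.getString("param"));
                }
            }
        }
        return found;
    }
}
```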
-
Trick about 14 years
I did not do this, but this is the solution.
-
SF. about 14 years
So how will you differentiate between hello%20world and interest20%growth? The first is a valid URL-encoded string; the second is a string that has to be escaped and does not produce a valid unescape.
-
sverkerw about 14 years
How do you know that you don't need the following?
SELECT * FROM something WHERE param= " + param + " OR param = "+encode(param) + " OR param = "+encode(encode(param))
That way lies infinite regress.
-
SF. about 14 years
Well, true, except in cases where "good enough" is enough; if 0.01% of users really want the program not to work, it won't work for them. Sometimes the extra, extreme clauses just aren't worth the effort and the overhead.
-
benrifkah about 12 years
This fails if your string contains Windows variable names like %DESCRIPTION%, which decodes to ÞSCRIPTION%, or %ABOUT%, which becomes «OUT%.
-
SF. about 12 years
@benrifkah: True, but then there is no way to tell them apart if the input is completely arbitrary.
-
benrifkah about 12 years
@SF. Indeed. I posted the caveat to expose the issue so that people could act according to their needs.
-
benrifkah about 12 years
Checking for illegal characters does not cover the percent sign, because it is not illegal; it just gets escaped. When you check for the percent sign, you may have a URI-encoded string if it is followed by "25". This only works if you know that your input is either not encoded or encoded exactly once, and that the input does not naturally include sequences that URI encoding generates.
-
stan over 10 years
@SF.: This will fail if the initial unencoded string contains a + character. The decoded string will contain a space instead, so the two will not be equal. A better way would be to compare the lengths: if the original string is longer than the decoded string, the original was encoded.
-
stan over 10 years
@SF.: But my approach above also says nothing about whether the newly decoded version is itself encoded.
-
ceiroa about 10 years
It doesn't work if the raw string contains a plus sign. You decode it, compare it to the original, and the strings are different because the + has been replaced with a space. You end up not encoding it, even though you should.
-
Chris Geirman over 9 years
Unfortunately, this was NOT the solution. I'm passing a URL as the URL-encoded string, so I did REFind(':', str), and it returns 6 (https:) whether the string is encoded or not.
-
Paul Kienitz over 8 years
If a string contains invalid characters, you can prove it is not encoded, but if it contains only valid characters and percent signs, that does not prove that it is encoded. That is not knowable, so this may be as good a check as one can realistically do.
-
Kristian Ivanov almost 8 years
You, sir, saved me at a moment when my brain had stopped; I couldn't figure out the following before reading your comment: if (!contactNumber.equals(Uri.encode(contactNumber))) { contactNumber = Uri.encode(contactNumber); }
-
Prabhath Suminda over 7 years
This is wrong. When a string contains a "+" character, the URL decoder replaces it with a space even though the string is not URL-encoded. See docs.oracle.com/javase/6/docs/api/java/net/URLDecoder.html
-
Florian K almost 6 years
This won't work if the URL is encoded so that a ' ' (space) is replaced by a '+', because the length then stays the same.
-
amit_saxena almost 6 years
It's probably better to encode spaces in your URLs only as %20; the pros are described here: stackoverflow.com/a/2678602/762747. If that's not possible, you could check for + signs after the ?, and if you find any, treat the URL as already encoded and return it as is. It's just an extra check on top of the code above, depending on your use case.
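A rough Java sketch of that extra check combined with the length test from Solution 3. PlusCheck and its methods are hypothetical names, it assumes java.net.URLDecoder with UTF-8 (Java 10+), and treating any "+" after "?" as an encoded space is a heuristic: a raw query value can legitimately contain "+" as well.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class PlusCheck {
    // Extra check: a '+' after the '?' is taken to be an encoded space.
    static boolean hasPlusInQuery(String url) {
        int q = url.indexOf('?');
        return q >= 0 && url.indexOf('+', q + 1) >= 0;
    }

    // The length test combined with the '+' check.
    static boolean treatAsEncoded(String url) {
        if (hasPlusInQuery(url)) {
            return true;
        }
        try {
            return URLDecoder.decode(url, StandardCharsets.UTF_8).length() < url.length();
        } catch (IllegalArgumentException e) {
            return false; // malformed '%' sequence: not encoder output
        }
    }
}
```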
-
Darrin about 4 years
This doesn't logically work when using the java.net.URLDecoder.decode(String, String) implementation. Reason: if the string contains "%xy" where x is not a hex digit, such as "s", trying to decode such an unencoded string throws IllegalArgumentException("URLDecoder: Illegal hex characters in escape (%) pattern - ...").
-
Darrin about 4 years
Actually, attempting to "decode" something that has not passed through some filter logic is always just bad design. This bad design encourages more bad design, such as choosing to catch all "Throwable" exceptions and ignore them. Doing that adds the cost of raising and then ignoring exceptions, or worse, hides an exception that could have been useful in diagnosing a real problem. Just bad, very bad. ;-)
-
Darrin about 4 years
Nope. Test it with an unencoded string containing "%s". The exception will make code designed like this fail with an IllegalArgumentException caused by an invalid "%xy", where xy are supposed to be hex digits. Same problem as the accepted answer, and one that tempts additional design flaws, such as ignoring unknown exception types.
-
Aleksei Bulgak about 4 years
Does not work if the URL contains query parameters with encoded URLs inside, like login links with redirect info in their query parameters.
-
Tooraj Jam over 3 years
This also fails on URLs containing non-BASIC_LATIN characters.
-
cppxaxa over 3 years
Thanks for the answer; confirming that it works as expected in Java with Spring.