Java URL encoding of query string parameters
Solution 1
URLEncoder is the way to go. Just remember to encode only the individual query string parameter names and/or values, not the entire URL, and certainly not the query string parameter separator character & nor the parameter name-value separator character =.
String q = "random word £500 bank $";
String url = "https://example.com?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
If you're not yet on Java 10 or newer, use StandardCharsets.UTF_8.toString() as the charset argument; if you're not yet on Java 7 or newer, use "UTF-8".
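As a minimal sketch of the "encode each name and value, never the separators" rule, assuming Java 10 or newer: the class name QueryStringSketch, the helper buildQuery and the second parameter lang are made up for illustration and are not part of any library. The pound sign is written as \u00A3 so the result does not depend on the source file encoding.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class QueryStringSketch {

    // Hypothetical helper: encodes every name and value on its own,
    // then joins them with the literal separators '=' and '&'.
    static String buildQuery(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            String name = URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8);
            String value = URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8);
            query.add(name + "=" + value);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "random word \u00A3500 bank $");
        params.put("lang", "en&fr"); // the '&' inside the value must be encoded
        // prints https://example.com?q=random+word+%C2%A3500+bank+%24&lang=en%26fr
        System.out.println("https://example.com?" + buildQuery(params));
    }
}
```

The '&' inside the value comes out as %26, while the '&' between the two parameters stays literal, which is exactly why the whole query string must not be passed through the encoder in one go.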
Note that spaces in query parameters are represented by +, not %20, which is legitimately valid. The %20 is usually used to represent spaces in the URI itself (the part before the URI-query string separator character ?), not in the query string (the part after ?).
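If a server insists on %20 in the query string, one common workaround is to post-process the URLEncoder output. This is only a sketch, assuming Java 10 or newer; the class PlusToPercent20Sketch and the helper encodeWithPercent20 are hypothetical names. The replacement is safe because a literal + in the input has already been turned into %2B, so every + left in the output stands for a space.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PlusToPercent20Sketch {

    // Hypothetical helper: form-encodes the value, then swaps the
    // '+' that stands for a space with the equivalent "%20".
    static String encodeWithPercent20(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        // Default form encoding: space becomes '+', literal '+' becomes %2B.
        System.out.println(URLEncoder.encode("a b+c", StandardCharsets.UTF_8)); // prints a+b%2Bc
        // After the replacement only the space representation changes.
        System.out.println(encodeWithPercent20("a b+c")); // prints a%20b%2Bc
    }
}
```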
Also note that there are three encode() methods: one without a charset argument, one with a String as second argument (which throws a checked exception), and one with a Charset as second argument (added in Java 10). The one without a charset argument is deprecated. Never use it and always specify the charset argument. The javadoc even explicitly recommends using the UTF-8 encoding, as mandated by RFC3986 and W3C.
All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
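To make the quoted rule concrete, here is a small sketch (assuming Java 10 or newer) that applies the byte-to-%xy step by hand and compares it with URLEncoder. The class CharsetBytesSketch and the helper percentEncodeAllBytes are hypothetical, and unlike URLEncoder the helper escapes every byte, including safe characters. It also shows why the charset matters: the pound sign is two bytes in UTF-8 but a single byte in ISO-8859-1.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetBytesSketch {

    // Hypothetical helper: converts the text to bytes in the given charset
    // and writes each byte as '%' followed by two uppercase hex digits.
    static String percentEncodeAllBytes(String text, Charset charset) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            out.append(String.format("%%%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // the pound sign
        System.out.println(percentEncodeAllBytes(pound, StandardCharsets.UTF_8));      // prints %C2%A3
        System.out.println(percentEncodeAllBytes(pound, StandardCharsets.ISO_8859_1)); // prints %A3
        System.out.println(URLEncoder.encode(pound, StandardCharsets.UTF_8));          // prints %C2%A3
    }
}
```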
Solution 2
I would not use URLEncoder. Besides being incorrectly named (URLEncoder has nothing to do with URLs) and inefficient (it uses a StringBuffer instead of a StringBuilder and does a couple of other things that are slow), it's also way too easy to screw up.
Instead I would use URIBuilder, Spring's org.springframework.web.util.UriUtils.encodeQuery, or Apache Commons HttpClient.
The reason is that you have to escape the query parameter name (i.e. q in BalusC's answer) differently than the parameter value.
The only downside to the above (that I found out painfully) is that URLs are not a true subset of URIs.
Sample code:
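import org.apache.http.client.utils.URIBuilder;

// The URIBuilder(String) constructor throws URISyntaxException, so declare or handle it.
URIBuilder ub = new URIBuilder("http://example.com/query");
// No backslash before the dollar sign: "\$" is not a valid escape in a Java string literal.
ub.addParameter("q", "random word £500 bank $");
String url = ub.toString();
// Result: http://example.com/query?q=random+word+%C2%A3500+bank+%24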
Since I'm just linking to other answers I marked this as a community wiki. Feel free to edit.
Solution 3
You first need to create a URI like this:
String urlStr = "http://www.example.com/CEREC® Materials & Accessories/IPS Empress® CAD.pdf";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
Then convert that URI to an ASCII string:
urlStr = uri.toASCIIString();
Now your URL string is completely encoded: first we did simple URL encoding, and then we converted it to an ASCII string to make sure no characters outside US-ASCII remain in the string. This is how browsers do it.
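One caveat raised in the comments is worth seeing in code: the multi-argument URI constructors only quote characters that are illegal in a query, so +, = and & inside a parameter value are left untouched. The sketch below illustrates this with a made-up class UriQueryCaveatSketch and helper viaUriConstructor; the host example.com and the sample value are assumptions for the demo.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriQueryCaveatSketch {

    // Hypothetical helper: builds a URI from parts, letting the
    // constructor quote whatever is illegal in the raw query.
    static String viaUriConstructor(String rawQuery) {
        try {
            return new URI("http", "example.com", "/query", rawQuery, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The intended parameter value is "1+1=2 & more", but only the
        // spaces get escaped; '+', '=' and '&' keep their special meaning.
        // prints http://example.com/query?q=1+1=2%20&%20more
        System.out.println(viaUriConstructor("q=1+1=2 & more"));
    }
}
```

A server that decodes this as form data would see a parameter q with the value "1 1=2 " followed by a second, unintended parameter, so this approach suits paths better than user-supplied parameter values.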
Solution 4
Guava 15 has now added a set of straightforward URL escapers.
Solution 5
URL url= new URL("http://example.com/query?q=random word £500 bank $");
URI uri = new URI(url.getProtocol(), url.getUserInfo(), IDN.toASCII(url.getHost()), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
String correctEncodedURL=uri.toASCIIString();
System.out.println(correctEncodedURL);
Prints
http://example.com/query?q=random%20word%20%C2%A3500%20bank%20$
What is happening here?
1. Split the URL into structural parts. Use java.net.URL for it.
2. Encode each structural part properly!
3. Use IDN.toASCII(putDomainNameHere) to Punycode-encode the host name!
4. Use java.net.URI.toASCIIString() to percent-encode the NFC-normalized Unicode (NFKC would be better!). For more info see: How to encode properly this URL
In some cases it is advisable to check whether the URL is already encoded. Also replace '+'-encoded spaces with '%20'-encoded spaces.
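Steps 3 and 4 can be sketched in isolation as follows. The class IdnHostSketch, the helper toAsciiUrl and the host name bücher.example are assumptions for illustration; the ü is written as \u00FC so the result does not depend on the source file encoding.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

public class IdnHostSketch {

    // Hypothetical helper: Punycode-encodes the host, then lets the URI
    // constructor and toASCIIString() percent-encode the path.
    static String toAsciiUrl(String scheme, String host, String path) {
        try {
            return new URI(scheme, IDN.toASCII(host), path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // prints xn--bcher-kva.example
        System.out.println(IDN.toASCII("b\u00FCcher.example"));
        // prints http://xn--bcher-kva.example/first%20book.pdf
        System.out.println(toAsciiUrl("http", "b\u00FCcher.example", "/first book.pdf"));
    }
}
```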
Here are some examples that will also work properly
{
"in" : "http://نامهای.com/",
"out" : "http://xn--mgba3gch31f.com/"
},{
"in" : "http://www.example.com/‥/foo",
"out" : "http://www.example.com/%E2%80%A5/foo"
},{
"in" : "http://search.barnesandnoble.com/booksearch/first book.pdf",
"out" : "http://search.barnesandnoble.com/booksearch/first%20book.pdf"
}, {
"in" : "http://example.com/query?q=random word £500 bank $",
"out" : "http://example.com/query?q=random%20word%20%C2%A3500%20bank%20$"
}
The solution passes around 100 of the test cases provided by Web Platform Tests.
user1277546
Updated on July 28, 2022
Comments
-
user1277546 almost 2 years
Say I have a URL
http://example.com/query?q=
and I have a query entered by the user such as:
random word £500 bank $
I want the result to be a properly encoded URL:
http://example.com/query?q=random%20word%20%A3500%20bank%20%24
What's the best way to achieve this? I tried URLEncoder and creating URI/URL objects but none of them come out quite right.
-
2rs2ts over 9 years
These suffer from the same goofy escaping rules as URLEncoder.
-
Luis Sep over 9 years
Why does it have nothing to do with URLs?
-
BalusC over 9 years
@Luis: URLEncoder is, as its javadoc says, intended to encode query string parameters conforming to application/x-www-form-urlencoded as described in the HTML spec: w3.org/TR/html4/interact/…. Some users indeed confuse/abuse it for encoding whole URIs, like the current answerer apparently did.
-
Adam Gent over 9 years
@LuisSep in short, URLEncoder is for encoding for form submission. It is not for escaping. It's not the exact same escaping that you would use to create URLs to be put in your web page, but it happens to be similar enough that people abuse it. The only time you should be using URLEncoder is if you're writing an HTTP client (and even then there are far superior options for encoding).
-
Adam Gent over 9 years
@BalusC "Some users indeed confuse/abuse it for encoding whole URIs, like the current answerer apparently did." You assumed wrong. I never said I screwed up with it. I have just seen others who have done it, whose bugs I have to fix. The part that I screwed up is that the Java URL class will accept unescaped brackets but the URI class will not. There are a lot of ways to screw up constructing URLs and not everyone is brilliant like you. I would say that most users who are looking on SO for URL encoding probably are users who "indeed confuse/abuse" URI escaping.
-
BalusC over 9 years
The question wasn't about that, yet your answer implies that.
-
Adam Gent over 9 years
Yeah it is. He is concatenating strings to make a URL. URLEncoder is for encoding to the MIME type, not for making URLs. I have no reputation interest (hence I marked this as a wiki post and only want to steer people correctly).
-
user11153 about 9 years
Thanks! It's stupid that your solution works, but built-in URL.toURI() doesn't.
-
Emmanuel Touzery about 9 years
Not sure they have the problem. They differentiate for instance "+" or "%20" to escape " " (form param or path param), which URLEncoder doesn't.
-
ZioByte about 9 years
Unfortunately this doesn't seem to work with "file:///" (e.g.: "file:///some/directory/a file containing spaces.html"); it bombs with MalformedURLException in "new URL()"; any idea how to fix this?
-
M Abdul Sami about 9 years
You need to do something like this: String urlStr = "some/directory/a file containing spaces.html"; URL url = new URL(urlStr); URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef()); urlStr = uri.toASCIIString(); urlStr = urlStr.replace("http://", "file:///"); I have not tested it, but I think it will work.... :)
-
Rudy_TM about 9 years
I have used this since I started in Android, but today I noticed that the + and the = in the query strings don't get encoded, any solution?
-
M Abdul Sami about 9 years
Where do you want to have + and = signs in the URL? Can you give an example of such a URL?
-
tibi over 8 years
Is it also possible to decode it back to the original?
-
M Abdul Sami over 8 years
@tibi you can simply use the uri.toString() method to convert it to a string instead of an ASCII string.
-
Paul Taylor over 8 years
This worked for me. I just replaced the call to URLEncoder with a call to UrlEscapers.urlFragmentEscaper() and it worked; not clear if I should be using UrlEscapers.urlPathSegmentEscaper() instead.
-
Paul Taylor over 8 years
Actually it didn't work for me because, unlike URLEncoder, it doesn't encode '+' and leaves it alone. The server decodes '+' as a space, whereas if I use URLEncoder, '+' characters are converted to %2B and correctly decoded back to +.
-
rmuller almost 8 years
This is not using the standard Java API. So please specify the library used.
-
mgaert almost 7 years
Link update: UrlEscapers
-
sharadendu sinha almost 7 years
There can be 2 types of parameters in a URL: query string parameters (following the ?) and path parameters (typically part of the URL itself). So what about path parameters? URLEncoder produces + for a space even for path parameters. In fact it just does not handle anything other than the query string. Also, this behavior is not in sync with Node.js servers. So for me this class is a waste and cannot be used other than for very specific / special scenarios.
-
BalusC almost 7 years
@sharadendusinha: as documented and answered, URLEncoder is for URL-encoded query parameters conforming to application/x-www-form-urlencoded rules. Path parameters don't fit in this category. You need a URI encoder instead.
-
Adam Gent almost 7 years
As I predicted would happen ... users getting confused because obviously the problem is people need to encode more than just the parameter value. It's a very rare case that you only need to encode a parameter value. It's why I provided my "confused" wiki answer to help folks like @sharadendusinha.
-
Julian Honma over 6 years
The API I was working with didn't accept the + replacement for spaces, but accepted the %20, so this solution worked better than BalusC's, thanks!
-
Armand over 6 years
Not sure why, but I just stumbled on this. URLEncoder is correct (see W3C HTML 4.01 Specification). The query parameters passed in the URL are to be encoded using the "default content type" of application/x-www-form-urlencoded. Regardless of what the MIME type was intended for, the rules of encoding for the MIME type are exactly what is expected for the query parameters. In the OP's question "....?q="+value, the value must be encoded to this MIME type. The use of URLEncoder may not be the most efficient, but it is used in this case for exactly its intended purpose.
-
Adam Gent over 6 years
The URL specification is separate from the HTML spec. It has also changed, and just like HTML there is a newer spec.
-
user207421 over 6 years
This is a correct way to encode the path component of the URL. It is not a correct way to encode query parameter names or values, which is what the question is about.
-
user207421 over 6 years
Not correct. You have to encode the parameter names and values separately. Encoding the entire query string will also encode the = and & separators, which is not correct.
-
Raj Kumar Samala about 6 years
How to fix this using JSP tags?
-
Wijay Sharma about 6 years
Why should I not encode the entire URL (instead of just the query parameters)?
-
BalusC about 6 years
@WijaySharma: Because URL-specific characters would get encoded as well. You should only do that when you want to pass the entire URL as a query parameter of another URL.
-
Salem Artin over 5 years
+1 for providing the link to talisman.org/~erlkonig/misc/…
-
M Abdul Sami over 5 years
@user207421 it encodes both path components and query params. Have a look: URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
-
wetjosh almost 5 years
" +, not %20" is what I needed to hear. Thank you so much.
-
cppxaxa over 3 years
For the Spring users, confirming this solution works well!
-
Sim almost 3 years
This solution does not work! The query parameters are not encoded, only spaces are.
-
Sim almost 3 years
This solution does not work (anymore?), as documented special chars such as - or . are NOT encoded. Thus this is a pretty useless function.
-
BalusC almost 3 years
... for the specific task you had in mind, which is clearly not URL-encoding of query string parameters.
-
basin almost 3 years
@Sim on the contrary, it double encodes the query containing escaped ampersands and delimiter ampersands