HttpClient and non-ASCII URL characters (á,é,í,ó,ú)
Solution 1
Looking at the documentation of HttpMethodBase, it appears that all String parameters have to be pre-encoded. The simplest solution is to construct your URL in stages, with setPath() and the variant of setQueryString() that takes an array of name-value parameters.
Solution 2
I would recommend using URLEncoder to encode your query string values (not the whole query string).
URLEncoder.encode("Categoría:Mejoras de las Botas", "UTF-8");
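As a rough illustration of encoding each value on its own, the sketch below builds a query string from name-value pairs with java.net.URLEncoder. The class name QueryEncodingSketch and the helpers encode and buildQuery are hypothetical names introduced only for this example; they are not part of HttpClient.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
public class QueryEncodingSketch {

    // Percent-encodes a single name or value as UTF-8 (spaces become '+').
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Encodes each name and value separately, then joins them with '=' and '&'
    // so the separators themselves stay unencoded.
    static String buildQuery(Map<String, String> params) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (query.length() > 0) {
                query.append('&');
            }
            query.append(encode(entry.getKey()))
                 .append('=')
                 .append(encode(entry.getValue()));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "query");
        params.put("list", "categorymembers");
        // The escape \u00ed is the character í; it avoids source-encoding issues.
        params.put("cmtitle", "Categor\u00eda:Mejoras de las Botas");

        // Per-parameter encoding keeps '&' and '=' intact.
        System.out.println("http://es.metroid.wikia.com/api.php?" + buildQuery(params));

        // Encoding a whole query string at once also escapes the separators.
        System.out.println(encode("action=query&list=categorymembers"));
    }
}
```

Note that URLEncoder targets form encoding, so spaces come out as + rather than %20; both forms are normally accepted in a query string.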
Comments
-
ianmartorell almost 2 years
'Long time reader, first time poster' here.
I'm in the process of making a bot for a Spanish wiki I administer. I wanted to make it from scratch, since one of my reasons for making it is to practice Java. However, I ran into some trouble when trying to make GET requests with HttpClient to URIs that contain non-ASCII characters such as á, é, í, ó or ú.
String url = "http://es.metroid.wikia.com/api.php?action=query&list=categorymembers&cmtitle=Categoría:Mejoras de las Botas";
method = new GetMethod(url);
client.executeMethod(method);
When I do the above, GetMethod complains about the URI:
Exception in thread "main" java.lang.IllegalArgumentException: Invalid uri 'http://es.pruebaloca.wikia.com/api.php?action=query&list=categorymembers&cmtitle=Categoría:Mejoras%20de%20las%20Botas&cmlimit=500&format=xml': Invalid query
    at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:222)
    at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:89)
    at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:69)
    at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:120)
    at net.metroidover.categorybot.http.Action.getCategoryMembers(Action.java:38)
    at net.metroidover.categorybot.bot.BotComponent.<init>(BotComponent.java:58)
    at net.metroidover.categorybot.bot.BotComponent.main(BotComponent.java:80)
Note that in the URI shown in the stack trace, spaces are encoded into %20 and the ís are left as is. That exact same URI works perfectly in a browser, but I can't get GetMethod to accept it.
I've also tried doing the following:
URI uri = new URI(url, false);
method = new GetMethod(uri.getEscapedURI());
client.executeMethod(method);
This way, URI escaped the ís, but double escaped the spaces (%2520)...
http://es.metroid.wikia.com/api.php?action=query&list=categorymembers&cmtitle=Categor%C3%ADa:Mejoras%2520de%2520las%2520Botas&cmlimit=500&format=xml
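The %2520 comes from escaping text that was already escaped: the % of %20 is itself turned into %25. The sketch below reproduces that effect with java.net.URLEncoder as a stand-in; it does not use the commons-httpclient URI class, and the class name DoubleEscapeSketch is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class, for illustration only.
public class DoubleEscapeSketch {

    // Percent-encodes the given text as UTF-8.
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The spaces in this value have already been escaped once as %20.
        String alreadyEscaped = "Mejoras%20de%20las%20Botas";

        // Escaping it again encodes each '%' as "%25".
        System.out.println(encode(alreadyEscaped));
        // prints "Mejoras%2520de%2520las%2520Botas"
    }
}
```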
Now, if I don't use any spaces in the query, there's no double escaping and I get the desired output. So if there wasn't any possibility of non-ASCII characters, I wouldn't need to use the URI class and wouldn't get the double escaping. In an attempt to avoid the first escaping of the spaces, I tried this:
URI uri = new URI(url, true);
method = new GetMethod(uri.getEscapedURI());
client.executeMethod(method);
But the URI class didn't like it:
org.apache.commons.httpclient.URIException: Invalid query
    at org.apache.commons.httpclient.URI.parseUriReference(URI.java:2049)
    at org.apache.commons.httpclient.URI.<init>(URI.java:167)
    at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:66)
    at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:121)
    at net.metroidover.categorybot.http.Action.getCategoryMembers(Action.java:38)
    at net.metroidover.categorybot.bot.BotComponent.<init>(BotComponent.java:58)
    at net.metroidover.categorybot.bot.BotComponent.main(BotComponent.java:80)
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at net.metroidover.categorybot.http.Action.getCategoryMembers(Action.java:39)
    at net.metroidover.categorybot.bot.BotComponent.<init>(BotComponent.java:58)
    at net.metroidover.categorybot.bot.BotComponent.main(BotComponent.java:80)
Any input on how to avoid this double escaping would be greatly appreciated. I've lurked all around with absolutely no luck.
Thanks!
Edit: The solution that works best for me is parsifal's one, but, as an addition, I'd like to say that setting the path with method.setPath(url) made HttpMethod reject a cookie I needed to save:
Aug 26, 2011 4:07:08 PM org.apache.commons.httpclient.HttpMethodBase processCookieHeaders
WARNING: Cookie rejected: "wikicities_session=900beded4191ff880e09944c7c0aaf5a". Illegal path attribute "/". Path of origin: "http://es.metroid.wikia.com/api.php"
However, if I send the URI to the constructor and forget about the setPath(url), the cookie gets saved without problem.
String url = "http://es.metroid.wikia.com/api.php";
NameValuePair[] query = {
    new NameValuePair("action", "query"),
    new NameValuePair("list", "categorymembers"),
    new NameValuePair("cmtitle", "Categoría:Mejoras de las Botas"),
    new NameValuePair("cmlimit", "500"),
    new NameValuePair("format", "xml")
};
HttpMethod method = null;
...
method = new GetMethod(url); // Or PostMethod(url)
method.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY); // It had been like this the whole time
method.setQueryString(query);
client.executeMethod(method);
-
JB Nizet over 12 years
I think every parameter must be encoded separately, else the & and the = will be encoded.
-
ianmartorell over 12 years
Yay! That works perfectly. I was actually already sending the parameters as an ArrayList<NameValuePair>, so I didn't have to change the code much. Thanks :)
-
ianmartorell over 12 years
That works pretty well, but you'd have to encode all query parameters separately. I find parsifal's answer more useful since all NameValuePairs get encoded at once with method.setQueryString(pairs);, where pairs is a NameValuePair[].
-
ianmartorell over 12 years
Yep, like @JB Nizet says, you have to encode it separately or else you'll get http://es.metroid.wikia.com/api.php?action%3Dquery%26list%3Dcategorymembers%26cmtitle%3DCategor%C3%ADa%3AMejoras+de+las+Botas.