Java - Convert String to valid URI object

Solution 1

You might try org.apache.commons.httpclient.util.URIUtil.encodeQuery from the Apache commons-httpclient project.

Like this (see URIUtil):

URIUtil.encodeQuery("http://www.google.com?q=a b")

will become:

http://www.google.com?q=a%20b

You can of course do it yourself, but URI parsing can get pretty messy...

Solution 2

Android has always had the Uri class as part of the SDK: http://developer.android.com/reference/android/net/Uri.html

You can simply do something like:

String requestURL = String.format("http://www.example.com/?a=%s&b=%s", Uri.encode("foo bar"), Uri.encode("100% fubar'd"));

Solution 3

I'm going to add one suggestion here aimed at Android users. This approach avoids pulling in any external libraries. Also, the search-and-replace solutions suggested in some of the other answers are perilous and should be avoided.

Give this a try:

String urlStr = "http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
url = uri.toURL();

You can see that in this particular URL, I need to have those spaces encoded so that I can use it for a request.

This takes advantage of a couple of features available to you in the Android classes. First, the URL class can break a URL into its proper components, so there is no need for you to do any string search/replace work. Second, the URI class properly escapes each component when you construct a URI from components rather than from a single string.

The beauty of this approach is that you can take any valid url string and have it work without needing any special knowledge of it yourself.
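
A minimal, self-contained sketch of the approach above, wrapped in a helper. The class and method names (UrlEscaper, escape) are made up for illustration, and the checked exceptions are rethrown as IllegalArgumentException only to keep the call site short.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper; the names are illustrative, not from the answer above.
class UrlEscaper {

    /** Splits a raw URL into components and lets java.net.URI escape each one. */
    static String escape(String rawUrl) {
        try {
            // URL does not validate characters, so spaces are accepted here.
            URL url = new URL(rawUrl);
            // The multi-argument URI constructor quotes illegal characters
            // in each component it is given.
            URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
                    url.getPort(), url.getPath(), url.getQuery(), url.getRef());
            return uri.toString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot escape URL: " + rawUrl, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(escape("http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4"));
        // prints http://abc.dev.domain.com/0007AC/ads/800x480%2015sec%20h.264.mp4
        System.out.println(escape("http://www.google.com?q=a b"));
        // prints http://www.google.com?q=a%20b
    }
}
```

One caveat: these URI constructors always quote the percent character, so a URL that is already partly encoded (for example one containing %20) would be encoded a second time. Characters that are legal in a component, such as & and = in the query, are left alone. On Java 20 and later the URL(String) constructor is deprecated and compiles with a warning.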

Solution 4

Even though this is an old post with an accepted answer, I am posting an alternative because it works well for this problem and nobody seems to have mentioned it.

With the java.net.URI library:

URI uri = URI.create(URLString);

And if you want a URL-formatted string corresponding to it:

String validURLString = uri.toASCIIString();

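Unlike many other methods (e.g. java.net.URLEncoder), this one percent-encodes only non-ASCII characters (like ç, é...) and leaves every ASCII character as it is. Note that URI.create throws an IllegalArgumentException if the string contains a character that is illegal in a URI, such as a space or a double quote.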

In the above example, if URLString is the following String:

"http://www.domain.com/façon+word"

the resulting validURLString will be:

"http://www.domain.com/fa%C3%A7on+word"

which is a well-formatted URL.
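
The following sketch shows both behaviours side by side: non-ASCII characters are encoded, while a string containing a space is rejected outright. The class and method names (AsciiUriDemo, toAscii, isRejected) are illustrative only.

```java
import java.net.URI;

// Illustrative class name; not part of the answer above.
class AsciiUriDemo {

    /** Parses an already-legal URI string and percent-encodes its non-ASCII characters. */
    static String toAscii(String uriString) {
        return URI.create(uriString).toASCIIString();
    }

    /** Returns true if URI.create rejects the string outright. */
    static boolean isRejected(String uriString) {
        try {
            URI.create(uriString);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // \u00e7 is the c-cedilla from the example, written as an escape
        // so the source file stays pure ASCII.
        System.out.println(toAscii("http://www.domain.com/fa\u00e7on+word"));
        // prints http://www.domain.com/fa%C3%A7on+word
        System.out.println(isRejected("http://www.google.com?q=a b"));
        // prints true
    }
}
```

So this method fits strings that are valid URIs apart from their non-ASCII characters. It does not help with the space in the question's example.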

Solution 5

If you don't like libraries, how about this?

Note that you should not use this function on the whole URL. Apply it to the individual components (e.g. just the "a b" part) as you build up the URL; otherwise there is no way to tell which characters are meant as delimiters and which are meant literally.

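import java.nio.charset.StandardCharsets;

/** Converts a string into something you can safely insert into a URL. */
public static String encodeURIcomponent(String s)
{
    StringBuilder o = new StringBuilder();
    // Escape the UTF-8 bytes rather than the UTF-16 chars, so that a
    // non-ASCII character such as 'Š' becomes %C5%A0.
    for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
        int v = b & 0xFF; // treat the byte as unsigned, 0..255
        if (isUnsafe(v)) {
            o.append('%');
            o.append(toHex(v / 16));
            o.append(toHex(v % 16));
        }
        else o.append((char) v);
    }
    return o.toString();
}

/** Maps a value in the range 0..15 to its uppercase hex digit. */
private static char toHex(int ch)
{
    return (char)(ch < 10 ? '0' + ch : 'A' + ch - 10);
}

/** Everything except letters, digits and - _ . ~ is escaped (this includes tab and other control characters). */
private static boolean isUnsafe(int b)
{
    boolean unreserved = (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
            || (b >= '0' && b <= '9') || "-_.~".indexOf(b) >= 0;
    return !unreserved;
}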
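
To illustrate the "encode the components, not the whole URL" advice, here is a rough sketch that builds a query string one parameter at a time. It uses java.net.URLEncoder instead of the hand-written encoder above, and the names QueryBuilder, encodeComponent and withQuery are invented for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative names; this uses java.net.URLEncoder, not the encoder above.
class QueryBuilder {

    /** Percent-encodes one component (a single parameter name or value). */
    static String encodeComponent(String s) {
        try {
            // URLEncoder produces form encoding, where a space becomes '+'.
            // A literal '+' in the input is already %2B at this point, so
            // every remaining '+' stands for a space and can become %20.
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Appends encoded name=value pairs to a base URL that has no query yet. */
    static String withQuery(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            sb.append(separator)
              .append(encodeComponent(p.getKey()))
              .append('=')
              .append(encodeComponent(p.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("a", "foo bar");
        params.put("b", "100% fubar'd");
        System.out.println(withQuery("http://www.example.com/", params));
        // prints http://www.example.com/?a=foo%20bar&b=100%25%20fubar%27d
    }
}
```

Passing a whole URL to encodeComponent would also escape the ":", "/" and "?" delimiters, which is exactly the problem described in the question.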
Author by

Vaerenberg

Updated on July 05, 2022

Comments

  • Vaerenberg
    Vaerenberg almost 2 years

    I am trying to get a java.net.URI object from a String. The string has some characters which will need to be replaced by their percentage escape sequences. But when I use URLEncoder to encode the String with UTF-8 encoding, even the / are replaced with their escape sequences.

    How can I get a valid encoded URL from a String object?

    http://www.google.com?q=a b gives http%3A%2F%2Fwww.google.com... whereas I want the output to be http://www.google.com?q=a%20b

    Can someone please tell me how to achieve this?

    I am trying to do this in an Android app. So I have access to a limited number of libraries.

  • Vaerenberg
    Vaerenberg about 15 years
    Thanks Hans. I am trying to do this in an Android app. So I have access to a limited number of libraries. Do you have any other suggestions? Thanks again
  • Hans Doggen
    Hans Doggen about 15 years
    Perhaps you could have a look at the source of the URIUtil class (it is open source after all). I would assume that it is possible to extract the necessary code from that class.
  • cutts
    cutts about 13 years
    This would give google.com?q=a+b not google.com?q=a%20b as desired.
  • MrCranky
    MrCranky about 13 years
    Ah, yes, found that myself a few weeks afterwards. Will modify answer to reflect what we actually end up using
  • mindas
    mindas almost 13 years
    This does not work (at least in some cases). E.g. character 'Š' is encoded as '%M1', but should be encoded as '%C5%A0'.
  • Gray
    Gray almost 13 years
    This also doesn't work for characters such as tab. I would suggest that this be changed to be unsafe if it doesn't match [A-Za-z0-9_-.~]. See en.wikipedia.org/wiki/Percent-encoding
  • kentcdodds
    kentcdodds about 12 years
    Doesn't handle converting a question mark (I tried it with the URL: http://www.google.com/Do you like Spam? and it took care of the spaces, but not the question mark at the end)
  • Abdo
    Abdo over 11 years
    Thanks a lot! It's ridiculous how long it takes sometimes to find a simple Java function!
  • Bogdan Zurac
    Bogdan Zurac about 11 years
    Unfortunately, the encode() method is crap when trying to encode forward slashes ("/"). I just used a plain old String.replace() to get the job done. That was very lame... searchQuery.replace("/", "%2f");
  • Aidanc
    Aidanc over 10 years
    This method is now deprecated; callers should specify an encoding. See: docs.oracle.com/javase/1.4.2/docs/api/java/net/URLEncoder.html
  • MrCranky
    MrCranky over 10 years
    True, I missed that. Answer amended.
  • Michael Plautz
    Michael Plautz over 9 years
    This is the answer that I was looking for and requires no dependency on outside libraries.
  • dgiugg
    dgiugg over 9 years
    The referenced project (Apache commons-httpclient) "is now end of life". It has been replaced in part by HttpComponents-httpclient, but I could not find an equivalent method in the new API.
  • Sarp Kaya
    Sarp Kaya about 9 years
    I agree with dgiugg. The answer is deprecated.
  • Sarp Kaya
    Sarp Kaya about 9 years
    No this is wrong answer. URLDecoder.decode("to convert","UTF-8") returns "to convert" and URLDecoder.decode("to%20convert","UTF-8") returns "to convert". So this does the opposite of what the question is asking.
  • Daniel
    Daniel almost 9 years
    It seems this method does not exist in newer versions of Apache commons-httpclient.
  • Ramin
    Ramin over 8 years
    Your answer was the one I was looking for. I couldn't extract the parameter for various reasons, and this is the only method that truly worked.
  • Junior Mayhé
    Junior Mayhé over 8 years
    And everybody should also have a look at documentation when dealing with exceptions developer.android.com/reference/java/net/…
  • Sebas
    Sebas over 8 years
    @kentcdodds it's because the question mark is legal in this case. I'm sure that if you add another one after, it would be converted
  • behelit
    behelit over 6 years
    This doesn't seem to convert quotes? i.e. ' "
  • dgiugg
    dgiugg over 6 years
    @behelit True that, just checked. However, ' is a safe character. But " raises an exception! Same thing with java.net.URL.