Ruby: How to escape url with square brackets [ and ]?

Solution 1

URI.encode doesn't escape brackets because it doesn't consider them unsafe -- they have no special meaning in the path part of a URI, so it leaves them as they are.

If you want to escape chars other than just the "unsafe" ones, pass a second arg to the encode method. That arg should be a regex matching, or a string containing, every char you want encoded (including chars the function would otherwise already match!).
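As a rough sketch of that second argument: URI.encode and URI.escape were removed in Ruby 3.0, so this example assumes URI::RFC2396_Parser#escape as a stand-in, which accepts the same optional pattern. The example URL is made up for illustration.

```ruby
require "uri"

# Assumption: URI::RFC2396_Parser#escape stands in for the old URI.encode.
parser = URI::RFC2396_Parser.new

url = "http://example.com/a b/[x]" # hypothetical URL

# Default pattern: the space is escaped, the brackets are left alone.
default_escaped = parser.escape(url)
# => "http://example.com/a%20b/[x]"

# Custom pattern: escape every character NOT listed in the class.
# Brackets are not listed, so they get escaped along with the space.
unsafe = /[^\-_.!~*'()a-zA-Z\d;\/?:@&%=+$,]/
escaped = parser.escape(url, unsafe)
# => "http://example.com/a%20b/%5Bx%5D"

puts default_escaped
puts escaped
```

Note that the custom pattern has to cover the space as well; a pattern matching only the brackets would leave the space unescaped.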

Solution 2

You can escape [ with %5B and ] with %5D.

With the URL in a variable url, that is:

url.gsub("[", "%5B").gsub("]", "%5D")

I don't like this solution, but it works.

Solution 3

If using a third-party gem is an option, try addressable.

require "addressable/uri"

url = Addressable::URI.parse("http://[::1]/path[]").normalize!.to_s
#=> "http://[::1]/path%5B%5D"

Note that normalize! does more than escape invalid characters: it also lowercases the hostname, unescapes characters that were escaped unnecessarily, and so on:

uri = Addressable::URI.parse("http://Example.ORG/path[]?query[]=%2F").normalize!
url = uri.to_s #=> "http://example.org/path%5B%5D?query%5B%5D=/"

So, if you just want to normalize the path part, do as follows:

uri = Addressable::URI.parse("http://Example.ORG/path[]?query[]=%2F")
uri.path = uri.normalized_path
url = uri.to_s #=> "http://Example.ORG/path%5B%5D?query[]=%2F"

Solution 4

With IPv6 literal addresses, URLs like this are possible:

http://[1080:0:0:0:8:800:200C:417A]/index.html

Because of this, we should escape [ and ] only after the host part of the URL:

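if url =~ /[\[\]]/
  # Split into the scheme ("http:"), the host, and everything after the host.
  protocol, host, path = url.split(%r{/+}, 3)
  # Escape brackets in the path only; to_s guards against a URL with no path.
  # On Ruby < 3.0 you could use URI.escape(path, /[^\-_.!~*'()a-zA-Z\d;\/?:@&%=+$,]/) instead.
  path = path.to_s.gsub('[', '%5B').gsub(']', '%5D')
  url = "#{protocol}//#{host}/#{path}"
end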
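
To check the idea against both kinds of URL, here is the same logic wrapped in a method. The name escape_path_brackets and the example.com URL are made up for illustration; this is a sketch that assumes the URL has a scheme followed by "//", not a general-purpose URL parser.

```ruby
# Hypothetical helper: escape square brackets only after the host part.
def escape_path_brackets(url)
  return url unless url =~ /[\[\]]/

  # "http:", host (possibly an IPv6 literal), and the rest of the URL.
  protocol, host, path = url.split(%r{/+}, 3)
  path = path.to_s.gsub("[", "%5B").gsub("]", "%5D")
  "#{protocol}//#{host}/#{path}"
end

ipv6 = escape_path_brackets("http://[1080:0:0:0:8:800:200C:417A]/index.html")
# => "http://[1080:0:0:0:8:800:200C:417A]/index.html" (host brackets untouched)

slug = escape_path_brackets("http://example.com/nothing-will-[nsfw]")
# => "http://example.com/nothing-will-%5Bnsfw%5D"

puts ipv6
puts slug
```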
Author by

foobar

Updated on June 05, 2022

Comments

  • foobar
    foobar almost 2 years

    This url:

    http://gawker.com/5953728/if-alison-brie-and-gillian-jacobs-pin-up-special-doesnt-get-community-back-on-the-air-nothing-will-[nsfw]
    

    should be:

    http://gawker.com/5953728/if-alison-brie-and-gillian-jacobs-pin-up-special-doesnt-get-community-back-on-the-air-nothing-will-%5Bnsfw%5D
    

    But when I pass the first one into URI.encode, it doesn't escape the square brackets. I also tried CGI.escape, but that escapes every '/' as well.

    What should I use to escape URLs properly? Why doesn't URI.encode escape square brackets?