Do I really need to remove special characters in a URL?

Solution 1

Modern browsers automatically encode the special characters in a URL before requesting it, so you are already using encoded characters; you just don't know it.

I used http://www.url-encode-decode.com/ to encode the file portion of your URL (using UTF-8):

http://www.mydomain.com/downloads/Some+Band+-+En+fran%C3%A7ais+avec+des+caract%C3%A8res+sp%C3%A9ciaux+%282013%29+%5B7%27%27+EP%5D.zip

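That is approximately what browsers send when you link without the encoding, with one difference: in the path portion of a URL a space is encoded as %20, not as + (the + form applies only to the query string). For compatibility with older browsers, you should URL-encode all your links.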
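As a rough illustration of what that encoding step looks like, here is a minimal Python sketch using the standard library's `urllib.parse`. The helper name `encode_url_path` is made up for this example, and it assumes the input URL has not been encoded already (an existing `%` would be encoded again as `%25`).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_path(url: str) -> str:
    """Percent-encode only the path component of a URL (hypothetical helper)."""
    parts = urlsplit(url)
    # quote() leaves "/" alone by default, encodes a space as %20,
    # and encodes non-ASCII characters as their UTF-8 bytes.
    encoded_path = quote(parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )


raw = (
    "http://www.mydomain.com/downloads/"
    "Some Band - En français avec des caractères spéciaux (2013) [7'' EP].zip"
)
encoded = encode_url_path(raw)
print(encoded)
```

Only the path is passed through `quote()`, so the scheme and host name are left exactly as they were typed.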

Solution 2

Yes. For uniform compatibility across different browsers and internet-accessible applications, you need to encode all of the following in a URL:

  • Spaces
  • ASCII Control characters
  • Non-ASCII characters
  • Reserved characters
  • Unsafe characters

For more information as to what these are, see this: What characters need to be encoded and why?

Since it seems that you know what the URLs are, you can use an online URL encoder like the one in the link above, or the one in the following link, which also provides information about URL encoding: Url Encode/Decode online

Then test the URLs in as many browsers as possible to confirm they work before sharing them. You can download several different browsers (e.g., Chrome, Firefox, and Opera) and install them on the same computer for testing.

As you become more familiar with which characters need to be encoded, you can replace or remove them in the names of your files prior to uploading.
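If you would rather rename the files on your own computer before uploading them over FTP, a small script can do the replacing for you. The following is only a sketch: the function name `url_safe_filename` and the choice of `_` as the replacement character are assumptions made for this example, and the conversion is lossy (characters with no ASCII decomposition, such as œ, are simply dropped).

```python
import os
import re
import unicodedata


def url_safe_filename(name: str) -> str:
    """Return a file name limited to ASCII letters, digits, '.', '_' and '-'."""
    stem, ext = os.path.splitext(name)
    # Split accented letters into base letter + combining mark (é -> e + U+0301),
    # then drop everything that is not plain ASCII, which removes the marks.
    decomposed = unicodedata.normalize("NFKD", stem)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace each run of remaining problem characters (spaces, brackets,
    # quotes, ...) with a single underscore, then trim underscores at the ends.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", ascii_only).strip("_")
    return cleaned + ext


original = "Some Band - En français avec des caractères spéciaux (2013) [7'' EP].zip"
print(url_safe_filename(original))
```

A name cleaned this way needs no percent-encoding at all, so the link works the same whether or not the forum or browser encodes it.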

Solution 3

...share the downloads in a music forum

This is really just adding to the existing answers... the URL needs to be encoded at some point, either implicitly by the browser (or forum software) or explicitly by you.

You specifically mention you are sharing these links in a forum. Many forums automatically encode links in forum posts, so you might not have to explicitly encode this yourself - but this will depend on the forum.

Stack Exchange (Markdown) encodes links to a certain extent, but will fail on the unencoded spaces (as will a lot of forum software) if you simply type the unencoded URL into the post and let the forum auto-detect the URL. However, if the forum has a specific prompt for embedding links, it might cope with this, as it does when you use the toolbar option on Stack Exchange:

NOTE TO EDITORS: Please don't "correct" the (broken) links below, or surround in <pre> tags
- the links are meant to be broken or viewed as-is; it is serving as an example!

Link typed manually

(As you can see, it is broken at the first space)

[link typed manually unencoded](http://www.example.com/downloads/Some Band - En français avec des caractères spéciaux (2013) [7'' EP].zip)

Link entered using the hyperlink option on the toolbar

link is correctly encoded by the forum software

The above link is encoded as:

<a href="http://www.example.com/downloads/Some%20Band%20-%20En%20fran%C3%A7ais%20avec%20des%20caract%C3%A8res%20sp%C3%A9ciaux%20%282013%29%20%5B7%27%27%20EP%5D.zip" rel="nofollow">link is correctly encoded by the <em>forum</em> software</a>
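To check that an encoded link like this still points at the intended file, you can decode it again. This is a minimal Python sketch using `urllib.parse.unquote`; the variable names are just for illustration.

```python
from urllib.parse import unquote

# File portion of the href generated by the forum software
encoded_name = (
    "Some%20Band%20-%20En%20fran%C3%A7ais%20avec%20des%20caract%C3%A8res"
    "%20sp%C3%A9ciaux%20%282013%29%20%5B7%27%27%20EP%5D.zip"
)

# unquote() turns each %XX sequence back into bytes and decodes them as UTF-8
decoded_name = unquote(encoded_name)
print(decoded_name)
```

The decoded text should match the file name on the server character for character.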

djointster
Author by

djointster

Updated on September 18, 2022

Comments

  • djointster
    djointster over 1 year

    I have an FTP account shared with friends where we upload underground music albums, and then we use the links to share the downloads in a music forum. The problem is that the album names are in French, so there are a lot of special characters in the names.

    So the URL looks like http://www.mydomain.com/downloads/Some Band - En français avec des caractères spéciaux (2013) [7'' EP].zip

    For me it works perfectly and I can download the file using this URL, but I have read everywhere that special characters are bad in URLs.

    Is there any reason why I must remove the special characters or encode the URL? Is everyone able to access a URL with special characters or will some older browsers not be able to download the files?

    I really don't care about SEO or anything else. I just want the download links to work for everyone.

    Since the files are uploaded through FTP, I can't use PHP to remove the special characters with a regex, so I really don't know what to do.

    • unor
      unor almost 11 years
      Asked the same question there too: stackoverflow.com/q/17119689/1591669
    • dan
      dan almost 11 years
      It's considered a duplicate question. I'd suggest deleting it there because you already have answers here, and this site is more appropriate for non-programming questions like this.
    • cl-r
      cl-r almost 11 years
      Java has URLEncoder and URLDecoder classes to do this.
  • djointster
    djointster almost 11 years
    Why does Wikipedia use accents like éèà and parentheses in its URLs if they must be encoded?
  • djointster
    djointster almost 11 years
    Also, is there a place where I can find which browsers do not support non-encoded URLs?
  • dan
    dan almost 11 years
    Accents and parentheses are included under HTML ASCII Characters: nationalfinder.com/html/char-asc.htm. I'm not sure whether a list like that exists. However, depending on your environment, you might want to consider that devices or apps with limited browsers (e.g., a stripped-down WebKit or HTML viewer) might also not support characters that need to be encoded.
  • MrWhite
    MrWhite over 8 years
    That tool does not seem to encode the URL properly? (The output from that tool is consistent with just passing the source string through PHP's urlencode() function - which is not correct.) Spaces in the path part of the URL should be percent-encoded as %20, not a + (plus), as shown above. The + should only be used to encode spaces in the query string part. A + in the path part of the URL is a literal +, so is likely to result in a 404.
  • MrWhite
    MrWhite over 8 years
    The second tool linked to above (url-encode-decode.com) is not intended to be used to encode an entire URL; it simply URL-encodes the submitted text (it does not parse the URL in any way). That tool is only suitable for encoding submitted form data (the query string).
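The difference between the two encodings described in the last two comments can be seen with Python's `urllib.parse`, which offers both. This is a small sketch with a shortened, made-up file name: `quote()` produces the path-style encoding, while `quote_plus()` produces the form/query-string style that the online tool appears to output.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

name = "Some Band - En français.zip"  # shortened example name

path_style = quote(name)       # spaces become %20 (suitable for the URL path)
form_style = quote_plus(name)  # spaces become + (form data / query string only)

print(path_style)
print(form_style)

# When decoded as a path, a + stays a literal plus sign,
# so the form-style name no longer matches the real file name.
print(unquote(form_style))
# Only form-style decoding turns + back into a space.
print(unquote_plus(form_style))
```

This is why a link built from the `+` version can return a 404: the server looks for a file whose name literally contains plus signs.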