ASP.Net URL Encoding

Solution 1

Consider adding a table, keyed to your category/department table, that stores a unique URL for each category. A dedicated routine can then generate the URLs. It can be a SQL scalar function or a CLR function, but its main job is to normalize the name for the web: every character that is not safe in a URL path gets replaced with something else. With the routine below, "Beverage & Bar" becomes "beverage-and-bar" and "Pastry / Decorating" becomes "pastry-decorating". An example:

using System.Text.RegularExpressions;

public static class URL
{
    // Measurement marks after a digit: 6' is feet, 6'' or 6" is inches.
    // The lookahead stops the feet pattern from swallowing the character after the mark.
    static readonly Regex feet = new Regex(@"([0-9]\s?)'(?!')", RegexOptions.Compiled);
    static readonly Regex inch1 = new Regex(@"([0-9]\s?)''", RegexOptions.Compiled);
    static readonly Regex inch2 = new Regex(@"([0-9]\s?)""", RegexOptions.Compiled);
    // #12, $12 and 12% become words.
    static readonly Regex num = new Regex(@"#([0-9]+)", RegexOptions.Compiled);
    static readonly Regex dollar = new Regex(@"[$]([0-9]+)", RegexOptions.Compiled);
    static readonly Regex percent = new Regex(@"([0-9]+)%", RegexOptions.Compiled);
    // Separators become dashes, everything else outside [-A-Za-z0-9] is dropped,
    // and runs of dashes collapse to one.
    static readonly Regex sep = new Regex(@"[\s_/\\+:.]", RegexOptions.Compiled);
    static readonly Regex empty = new Regex(@"[^-A-Za-z0-9]", RegexOptions.Compiled);
    static readonly Regex extra = new Regex(@"[-]+", RegexOptions.Compiled);

    public static string PrepareURL(string str)
    {
        // Invariant casing avoids culture-specific surprises (for example the Turkish i).
        str = str.Trim().ToLowerInvariant();
        str = str.Replace("&", "and");

        str = feet.Replace(str, "$1-ft-");
        str = inch1.Replace(str, "$1-in-");
        str = inch2.Replace(str, "$1-in-");
        str = num.Replace(str, "num-$1");

        str = dollar.Replace(str, "$1-dollar-");
        str = percent.Replace(str, "$1-percent-");

        str = sep.Replace(str, "-");

        str = empty.Replace(str, string.Empty);
        str = extra.Replace(str, "-");

        // Remove dashes left over at either end.
        str = str.Trim('-');
        return str;
    }
}

You could expose this routine as a SQL function, or run URL generation as a separate process. To implement the mapping, map the entire URL directly to a category ID. This approach is better in the long run for several reasons. First, you are not regenerating URLs on every request: you generate them once and they stay static, so a later change to the routine cannot leave GoogleBot unable to find old URLs. Second, a collision points to a potential duplicate category name, because two colliding names would differ only by special characters. Finally, you can always read the URLs straight from the database without running the mapping function.
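
The mapping and the collision check described above can be sketched in a few lines. This is a minimal in-memory sketch, not the actual schema: the class name CategoryUrlMap and its methods are hypothetical, and a dictionary stands in for the URL table. It assumes each slug has already been produced by a routine such as PrepareURL.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a table that maps each stored URL to a category ID.
public class CategoryUrlMap
{
    // Case-insensitive keys, so "Beverage-And-Bar" and "beverage-and-bar" count as the same URL.
    private readonly Dictionary<string, int> idsBySlug =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    // Returns false on a collision instead of overwriting, so a possible
    // duplicate category name can be reviewed by a person.
    public bool TryRegister(string slug, int categoryId)
    {
        if (idsBySlug.ContainsKey(slug))
        {
            return false;
        }
        idsBySlug.Add(slug, categoryId);
        return true;
    }

    // Resolves an incoming URL slug to its category ID, if one was registered.
    public bool TryResolve(string slug, out int categoryId)
    {
        return idsBySlug.TryGetValue(slug, out categoryId);
    }
}
```

In a real application the same idea would more likely be a unique index on the URL column, so the database itself reports the collision.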

Solution 2

I implement a URL rewrite in the Global.asax file, in the authenticate request handler, because I have some security in place. There I take the raw URL and do the database lookup, then rewrite the path to the .aspx page, with all the parameters passed through the query string. No encoding is necessary.
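
The lookup-then-rewrite step could look roughly like this. It is only a sketch under assumptions: PathRewriter, the page name Category.aspx and the id query parameter are hypothetical, and a dictionary stands in for the database lookup. In Global.asax the resulting target would be handed to Context.RewritePath.

```csharp
using System;
using System.Collections.Generic;

public static class PathRewriter
{
    // Maps a raw path such as "/Mystore/Refrigeration" to an .aspx target with a query string.
    // idsByPath is a hypothetical lookup keyed by the lower-cased path without outer slashes.
    public static bool TryMap(string rawPath, IDictionary<string, int> idsByPath, out string target)
    {
        target = null;
        string key = rawPath.Trim('/').ToLowerInvariant();

        int id;
        if (!idsByPath.TryGetValue(key, out id))
        {
            return false; // unknown path: leave the request untouched
        }

        // Only the numeric ID goes into the query string, so nothing needs encoding.
        target = "~/Category.aspx?id=" + id;
        return true;
    }
}
```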

However, if you are using the URL to actually change data, I can see that you will have huge problems, because you are effectively using an HTTP GET to change the database. That is usually considered a bad idea, and it is not something I do.

I only use a POST request for any database manipulation. This keeps the URL clean, as all the data is in the page form.

The only issue I had was setting the correct URL on Page.Form.Action, which in most cases is the raw URL.

If it's the category names that are causing the issue, then perhaps you should restrict the names to alphanumeric characters only and swap spaces for "-". IIS will throw a wobbly with periods ".", as it looks for file names.

P.S. IIS does not understand the tilde "~"; it is resolved by ASP.NET on the server (for example in server controls), not by IIS or the browser. So if you use it in a plain anchor tag it will not work as expected, and you should use the application root instead of the tilde.

Edit:

OK, it looks like IIS has trouble with certain characters such as ".", "/" and "&". Even if you URL-encode them, IIS will still apply its own meaning to them. So consider removing them:

Beverage & bar becomes BeverageBar

Pastry / decorating becomes PastryDecorating.

This keeps your URLs clean, but it does mean an extra column in the database so you can check the URL against this shortened category name.
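
One way to produce the shortened names shown above is to split on everything that is not a letter or digit and capitalize each remaining word. This is a sketch only; the class and method names are hypothetical, and it assumes category names contain ASCII letters and digits.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CategoryKey
{
    // Builds the shortened category name: drop every non-alphanumeric character
    // and capitalize the first letter of each remaining word.
    public static string Shorten(string name)
    {
        var words = Regex.Split(name, "[^A-Za-z0-9]+").Where(w => w.Length > 0);
        return string.Concat(words.Select(w => char.ToUpperInvariant(w[0]) + w.Substring(1)));
    }
}
```

The result would be stored in the extra column once, when the category is created, rather than recomputed on every request.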

Solution 3

I'm having the exact same problem. Thanks for writing it up so nicely. It actually helped me to understand the problem better.

I had some other considerations, however. One of my goals is to support any character in the URL, which is based on the title of an article. I also want the encoding to be unique and to support a two-way encode/decode process.

So I did some manual encoding to solve the problem. This won't completely eliminate percent-encoding, but it greatly reduces it and keeps users from generating an inaccessible URL. My process starts with the Server.UrlEncode function, but that alone doesn't eliminate the problems. Because IIS decodes the URL before passing it to the application, certain characters break it with a dangerous request exception. These characters include +, &, /, !, *, ., ( and ). So for those characters, plus others I would like to make more readable, I do a second round of encoding that yields a more usable URL.

Encoding is also hard because so few characters are allowed in a URL. So before encoding I make all letters upper case and then write the replacement tokens in lower case. This keeps the result from being fully decodable, but I can easily do a match in the database or in code by upper-casing the value I wish to match.

Well, here is my code. Feedback would be appreciated. Oh, and this is in VB, but things should transfer over to C# easily enough.

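' Function name is illustrative; the original post shows only the body.
' Intended for a Page or code-behind class, where Server is available.
Public Function EncodeTitleForUrl(ByVal strStringToEncode As String) As String
    ' Upper-case first, as described above, so the lower-case tokens below stay distinguishable.
    Dim strReturn As String = Trim(strStringToEncode).ToUpper()
    strReturn = Server.UrlEncode(strReturn)

    ' UrlEncode turns spaces into "+": rename literal dashes first, then use "-" for spaces.
    strReturn = strReturn.Replace("-", "dash").Replace("+", "-")

    ' Swap the remaining troublesome characters and escapes for readable tokens.
    strReturn = strReturn.Replace("%26", "and").
                        Replace("%2f", "or").
                        Replace("!", "excl").
                        Replace("*", "star").
                        Replace("%27", "apos").
                        Replace("(", "lprn").
                        Replace(")", "rprn").
                        Replace("%3b", "semi").
                        Replace("%3a", "coln").
                        Replace("%40", "at").
                        Replace("%3d", "eq").
                        Replace("%2b", "plus").
                        Replace("%24", "dols").
                        Replace("%25", "pct").
                        Replace("%2c", "coma").
                        Replace("%3f", "query").
                        Replace("%23", "hash").
                        Replace("%5b", "lbrk").
                        Replace("%5d", "rbrk").
                        Replace(".", "dot").
                        Replace("%3e", "gt").
                        Replace("%3c", "lt")

    Return strReturn
End Function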

Author: Kelly Robins

Updated on June 30, 2022

Comments

  • Kelly Robins, almost 2 years ago

    I am implementing URL rewriting in ASP.NET and my URLs are causing me a world of problems.

    The URL is generated from a database of departments & categories. I want employees to be able to add items to the database with whatever special characters are appropriate without it breaking the site.

    I am encoding the data before I construct the URLs.

    There are several problems...

    1. IIS decodes the URL before it reaches .NET, making it impossible to properly parse anything with a "/" in it.
    2. ASP.NET gets confused by the URL, making "~" useless within certain pages.
    3. I migrated from the built-in test server to my local IIS server (XP machine), and any URL containing an encoded & (%26) gives me a "Bad Request" error.
    4. UrlEncode leaves some breaking characters untouched, such as '.'.

    I did have two other related posts on this subject, at the time I only saw the small problems not the big problem upstream. I've found some registry tricks to solve the "Bad Request" issue but I'm going to be deploying to a shared hosting environment making that useless. I also know that this is a fix for some security issue so I don't want to necessarily bypass it without knowing what can of worms I'm opening.

    Rather than trying to force .NET to pass me the raw URL, or override IIS settings, I'd like to make truly safe URLs in the first place.

    I'll note I've tried AntiXss.UrlEncode, HttpUtility.UrlEncode and Uri.EscapeDataString. I've even tried stupid things like double URL encoding. Is there a utility that does what I need, or do I really need to roll my own? I'm even considering doing something hacky like replacing the % with an unusual string of characters. The end result should be at least readable, which was the point of using URL rewriting in the first place.

    Sorry for the long post- I just wanted to make sure that I've included all the necessary details. I can't seem to find any relevant information on this, and it seems like it would be a common problem - so maybe I'm missing something big. Thanks for your help, and patience with the long explanation!


    Edit for clarity:

    When I say the URLs are being built from a database, what I mean is that the directory structure is constructed from the departments and categories in my database.

    Some Example URLS -

    Mystore/Refrigeration/Bar+Fridge.aspx
    Mystore/Cooking+Equipment.aspx
    Mystore/Kitchen/Cutting+Boards.aspx

    The problems come in when I use a department like "Beverage & Bar" or "Pastry/Decorating" to construct my URL. Despite being encoded first these cause the aforementioned issues.

    My handlers are already implemented and working fine except for the special character encoding issues.

  • Kelly Robins, over 14 years ago
    Sorry, I should have been clearer: I am not doing any database manipulation with my URLs. My store is broken down into departments and categories. Rather than being hard-coded, the directory structure is built from the database. The various menus have links of the form Mystore/Department or Mystore/Department/Category that, while encoded and technically correct, are being broken by IIS before the request even makes it back to my httpHandler.
  • Kelly Robins, over 14 years ago
    That could be the best solution. I may have just been massively over-complicating things. My only concern is that I'm going to need to be able to look up items from the URL, which could be complicated by a non-reversible method of encoding. My only other idea was to use Uri.EscapeDataString(b).Replace("%", "_"), which I'm fairly sure would condemn me to programmer hell. Thank you very much for your fast responses and help on this. I'm taking another look at my code to see if this will work.
  • Kelly Robins, over 14 years ago
    That is absolutely perfect. Thank you very much, you saved me more time than I care to admit.
  • Kelly Robins, over 14 years ago
    Thank you very much for your help. This is one of those times where I am profoundly frustrated that I can't accept multiple answers. You pointed me in the right direction and got me back on track with this... Thank you!!
  • Nate, over 13 years ago
    Already found a problem. URLScan rejects the single smart quote.
  • Nate, over 13 years ago
    Found many quotes that make URLScan mad. This will help fix it: Replace("%e2%80%99", "rsquo"). Replace("%e2%80%98", "lsquo"). Replace("%e2%80%9d", "rdquo"). Replace("%e2%80%9c", "ldquo"). Replace("%e2%80%9b", "lsrquo"). Replace("%e2%80%9f", "ldrquo").
  • Kelly Robins, over 12 years ago
    Thanks for the info, though the issue was more that UrlEncode/UrlDecode didn't work, as either ASP.NET or IIS was still rejecting the encoded URLs. I think I ended up using a substitution scheme instead, but this was a while ago so I'm a bit fuzzy.
  • Frédéric, about 9 years ago
    Have a look at web.config parameters like requestFiltering allowDoubleEscaping="true" (stackoverflow.com/a/1453287/1178314) and httpRuntime requestValidationMode="2.0" relaxedUrlToFileSystemMapping="true" requestPathInvalidCharacters="". In my use case, it allows me to support many more characters in urls.
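
The web.config settings named in the last comment could be placed roughly as follows. This is a sketch only: the attribute names and values are the ones quoted in that comment, and their placement assumes the standard ASP.NET 4 and IIS 7 configuration sections. Relaxing these checks widens what the server accepts, and a shared host may lock some of these sections.

```xml
<configuration>
  <system.web>
    <!-- ASP.NET request validation and path checks -->
    <httpRuntime requestValidationMode="2.0"
                 relaxedUrlToFileSystemMapping="true"
                 requestPathInvalidCharacters="" />
  </system.web>
  <system.webServer>
    <security>
      <!-- IIS request filtering: allow double-escaped sequences in URLs -->
      <requestFiltering allowDoubleEscaping="true" />
    </security>
  </system.webServer>
</configuration>
```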