Best way to handle security and avoid XSS with user entered URLs

Solution 1

If you think URLs can't contain code, think again!

https://owasp.org/www-community/xss-filter-evasion-cheatsheet

Read that, and weep.

Here's how we do it on Stack Overflow:

/// <summary>
/// returns "safe" URL, stripping anything outside normal charsets for URL
/// </summary>
public static string SanitizeUrl(string url)
{
    return Regex.Replace(url, @"[^-A-Za-z0-9+&@#/%?=~_|!:,.;\(\)]", "");
}
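As the comments below point out, stripping characters alone still lets `javascript:while(true)alert(1)` straight through, since every character it contains is in the allowed set. A hedged sketch of pairing the same character filter with unescaping, control-character removal, and a scheme whitelist (Python standing in for the C# above; the allowed-scheme list is an assumption, adjust to your needs):

```python
import re
from urllib.parse import unquote

# Assumption: only these schemes are acceptable in your application.
ALLOWED_SCHEMES = ("http", "https", "mailto")

def sanitize_url(url: str) -> str:
    """Character filter plus scheme whitelist. A sketch, not a vetted library."""
    # Step 1: decode %-escapes so an encoded payload can't hide the scheme.
    decoded = unquote(url)
    # Step 2: drop control characters and whitespace -- IE, for example,
    # historically accepted tab characters inside the scheme ("java\tscript:").
    decoded = re.sub(r"[\x00-\x20]", "", decoded)
    # Step 3: strip characters outside the usual URL charset (same idea as above).
    cleaned = re.sub(r"[^-A-Za-z0-9+&@#/%?=~_|!:,.;()]", "", decoded)
    # Step 4: if there is a scheme, it must be on the whitelist; else reject.
    scheme, sep, _ = cleaned.partition(":")
    if sep and scheme.lower() not in ALLOWED_SCHEMES:
        return ""
    return cleaned

# The character filter alone would pass this straight through:
print(sanitize_url("javascript:while(true)alert(1)"))  # → "" (rejected by step 4)
print(sanitize_url("https://stackoverflow.com"))       # → "https://stackoverflow.com"
```

Note that this naive `partition(":")` check also rejects scheme-less input containing a colon (e.g. `host:8080/x`); a real implementation would need to treat ports properly.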

Solution 2

The process of rendering a link "safe" should go through three or four steps:

  • Unescape/re-encode the string you've been given (RSnake has documented a number of tricks at http://ha.ckers.org/xss.html that use escaping and UTF encodings).
  • Clean the link up: regexes are a good start. Make sure to truncate the string or throw it away if it contains a " (or whatever you use to close the attributes in your output). If you're using the links only as references to other information, you can also force the protocol at the end of this step: if the portion before the first colon is not 'http' or 'https', append 'http://' to the start. This lets you create usable links from incomplete input (as a user would type into a browser) and gives you a last shot at tripping up whatever mischief someone has tried to sneak in.
  • Check that the result is a well formed URL (protocol://host.domain[:port][/path][/[file]][?queryField=queryValue][#anchor]).
  • Possibly check the result against a site blacklist or try to fetch it through some sort of malware checker.
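The steps above can be sketched roughly as follows (Python; this version rejects unknown schemes outright rather than prefixing them, a stricter variant of the "force the protocol" step, and it is a simplification, not a vetted implementation):

```python
import re
from typing import Optional
from urllib.parse import unquote, urlparse

def make_link_safe(raw: str) -> Optional[str]:
    """Three/four-step link cleaning as described above. Returns None on rejection."""
    # 1. Unescape -- repeat until stable so double-encoded payloads unravel.
    url = raw
    for _ in range(5):
        decoded = unquote(url)
        if decoded == url:
            break
        url = decoded
    # 2. Clean up: throw the link away if it contains a quote that could close
    #    the href attribute, then handle the protocol.
    if '"' in url or "'" in url:
        return None
    if ":" in url:
        if url.split(":", 1)[0].lower() not in ("http", "https"):
            return None          # stricter than prefixing: refuse odd schemes
    else:
        url = "http://" + url    # incomplete input, as typed into a browser
    # 3. Check the result is a well-formed URL (protocol://host.domain[:port]...).
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not re.match(
            r"^[A-Za-z0-9.-]+(:\d+)?$", parts.netloc):
        return None
    # 4. (Optional) check against a blacklist / malware-checking service -- omitted.
    return url

print(make_link_safe("stackoverflow.com"))      # → http://stackoverflow.com
print(make_link_safe("javascript:alert('x')"))  # → None
```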

If security is a priority I would hope that the users would forgive a bit of paranoia in this process, even if it does end up throwing away some safe links.

Solution 3

Use a library, such as the OWASP ESAPI API. For example:

$url = "http://stackoverflow.com"; // e.g., $_GET["user-homepage"];
$esapi = new ESAPI( "/etc/php5/esapi/ESAPI.xml" ); // Modified copy of ESAPI.xml
$sanitizer = ESAPI::getSanitizer();
$sanitized_url = $sanitizer->getSanitizedURL( "user-homepage", $url );

Another option is a built-in function; PHP's filter_var is an example:

$url = "http://stackoverflow.com"; // e.g., $_GET["user-homepage"];
$sanitized_url = filter_var($url, FILTER_SANITIZE_URL);

Note that filter_var with FILTER_SANITIZE_URL only strips characters that aren't allowed in URLs; it still lets javascript: URLs through and does not restrict the scheme to http or https. Using the OWASP ESAPI Sanitizer is probably the best option.

Still another example is the URL-escaping code from WordPress.

Additionally, since there is no way of knowing where the URL links (i.e., it might be a valid URL, but the contents of the URL could be mischievous), Google has a Safe Browsing API you can call.

Rolling your own regex for sanitization is problematic for several reasons:

  • Unless you are Jon Skeet, the code will have errors.
  • Existing APIs have many hours of review and testing behind them.
  • Existing URL-validation APIs consider internationalization.
  • Existing APIs will be kept up-to-date with emerging standards.

Other issues to consider:

  • What schemes do you permit (are file:/// and telnet:// acceptable)?
  • What restrictions do you want to place on the content of the URL (are malware URLs acceptable)?

Solution 4

Just HTMLEncode the links when you output them. Make sure you don't allow javascript: links. (It's best to have a whitelist of protocols that are accepted, e.g., http, https, and mailto.)
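A minimal sketch of this encode-on-output-plus-whitelist approach (Python's html.escape standing in for HTMLEncode; the whitelist contents are illustrative):

```python
from html import escape
from urllib.parse import urlparse

# Assumption: the protocols your application accepts.
ALLOWED = {"http", "https", "mailto"}

def render_link(url: str, text: str) -> str:
    """HTML-encode on output and refuse non-whitelisted protocols."""
    scheme = urlparse(url).scheme.lower()
    if scheme and scheme not in ALLOWED:
        # Don't emit a link at all for javascript:, data:, vbscript:, ...
        return escape(text)
    href = escape(url, quote=True)  # encode &, <, >, and quotes
    return f'<a href="{href}">{escape(text)}</a>'

print(render_link("javascript:alert('hacked!')", "stackoverflow.com"))
# → stackoverflow.com   (plain text, no link emitted)
print(render_link("http://stackoverflow.com", "stackoverflow.com"))
# → <a href="http://stackoverflow.com">stackoverflow.com</a>
```

As the other answers note, encoding alone is not sufficient (the href value is still a live URL), which is why the scheme whitelist is the load-bearing part here.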

Solution 5

You don't specify your application's language, so I'll presume ASP.NET; for that you can use the Microsoft Anti-Cross Site Scripting Library.

It is very easy to use; all you need is an include, and that's it :)

While you're on the topic, why not give Design Guidelines for Secure Web Applications a read?

If you're using another language: if a library like this exists for ASP.NET, equivalents are bound to be available for other languages as well (PHP, Python, RoR, etc.).

Author: Keith Henry, Chief Software Architect, building offline-first and responsive applications in the recruitment industry. I'm also on LinkedIn. Email me on Google's email; my address is ForenameSurname.

Updated on July 05, 2022

Comments

  • Keith
    Keith almost 2 years

    We have a high security application and we want to allow users to enter URLs that other users will see.

    This introduces a high risk of XSS hacks - a user could potentially enter javascript that another user ends up executing. Since we hold sensitive data it's essential that this never happens.

    What are the best practices in dealing with this? Is any security whitelist or escape pattern alone good enough?

Any advice on dealing with redirections (a "this link goes outside our site" message on a warning page before following the link, for instance)?

    Is there an argument for not supporting user entered links at all?


    Clarification:

    Basically our users want to input:

    stackoverflow.com

    And have it output to another user:

    <a href="http://stackoverflow.com">stackoverflow.com</a>
    

    What I really worry about is them using this in a XSS hack. I.e. they input:

javascript:alert('hacked!');

    So other users get this link:

    <a href="javascript:alert('hacked!');">stackoverflow.com</a>
    

    My example is just to explain the risk - I'm well aware that javascript and URLs are different things, but by letting them input the latter they may be able to execute the former.

    You'd be amazed how many sites you can break with this trick - HTML is even worse. If they know to deal with links do they also know to sanitise <iframe>, <img> and clever CSS references?

    I'm working in a high security environment - a single XSS hack could result in very high losses for us. I'm happy that I could produce a Regex (or use one of the excellent suggestions so far) that could exclude everything that I could think of, but would that be enough?

  • Joel Coehoorn
    Joel Coehoorn over 15 years
    No, they're not, if the URL is displayed back on the page.
  • warren
    warren over 15 years
    ?? a Uniform Resource Locator is not Javascript, displaying the URL back on the page has nothing to do with Javascript
  • Peter Burns
    Peter Burns over 15 years
    That's what I used to think, too. Trust me on this: you are wrong. And if you think you're right, you are in big trouble.
  • Keith
    Keith over 15 years
    Maybe I didn't explain it well enough: User enters "stackoverflow.com" and if we turn that into "<a href="stackoverflow.com">stackoverflow.com</a>" there's the risk introduced. If you just let anything through they can do: "<a href="alert('hacked!');">stackoverflow.com</a>"
  • warren
    warren over 15 years
    ok - that I can see being a risk, and in that case, the javascript could be viewed as a url; but, strictly speaking, that's still not a real url (google.com/…)
  • Keith
    Keith over 15 years
    We're specifically on C# 3.5 and ASP.Net - I'll check that library out.
  • Keith
    Keith over 15 years
    That's an idea we thought of, definitely secure, but our users are relatively low-tech. They would really like links that they can click.
  • warren
    warren over 15 years
    understandable, I prefer them generally, but copy/paste does make me take a couple seconds to decide if I REALLY want to do it
  • Nick Stinemates
    Nick Stinemates over 15 years
    Why are we allowing tags? I assume he was referring to turning any instance of somesite.com into <a href="somesite.com">http://somesite.com</a>
  • Keith
    Keith over 15 years
    I've seen that link before - it's part of what I worry about with this. We have to be very careful as a single XSS hack could cost us a great deal. Your Regex based solution seems to have been working well on SO, certainly. Would you consider it safe for, say, banking applications?
  • balexandre
    balexandre over 15 years
    Not so well, I might say, Keith: stackoverflow.com/questions/209327/… does not accept special chars in the URL, which with URL rewriting are safe to pass, like gynækologen.dk/Undersøgelser_og_behandlinger.aspx
  • Cédric Guillemette
    Cédric Guillemette over 15 years
    This is not enough. Unless I'm missing something, this string would pass through the filter: javascript:alert&#x28;&#x27;hacked&#x27;&#x29;
  • Cédric Guillemette
    Cédric Guillemette over 15 years
    Even this would get through: javascript:while(true)alert('Hacked!'); I've tested a couple of places here on SO and it looks like SanitizeUrl is only part of the solution.
  • Kornel
    Kornel over 15 years
    This set of characters still allows a lot of code. Lack of '"' can be worked around with /xxx/.source.
  • Kornel
    Kornel over 15 years
    A whitelist is necessary, because IE allows tab characters in the protocol, i.e. java&#x09;script: works in IE and bypasses blacklists.
  • Earlz
    Earlz about 13 years
    Doesn't this prevent users providing the (customary) http:// prefix to all web addresses?
  • Keith
    Keith about 11 years
    Cheers, but the problem here is that OWASP isn't Jon Skeet either. I don't want to roll my own, my real question is about the extent to which any of these can be relied on. I'll check out the OWASP one, but definitely don't trust any security built in to PHP!
  • Dave Jarvis
    Dave Jarvis about 11 years
    If you can, try the Google Safe Browsing API. It might not be appropriate for your situation, but if the source code is available it could serve as an excellent starting point.
  • Keith
    Keith almost 11 years
    I don't see why that would help, there isn't a problem with code executing on the server. The problem is that the code looks like a link to the server, but executes malicious XSS when the user clicks on it. My question is whether (given the huge variety of possible attack permutations) there can ever be a check strict enough to be certain that XSS content cannot get through.
  • Shashi
    Shashi over 10 years
    From what I have gathered, there is always a way to overcome the XSS filtering.
  • Keith
    Keith over 10 years
    Nothing is 100% safe, but our customers want high security and user entered links and I want to know the best way to do that.
  • antinome
    antinome over 10 years
    Five years later, I see no responses to the comments that give examples of how this answer is insecure. Yet it is the highest-voted answer on the highest-voted question (that I could find) on this topic! Given how awesome stackoverflow usually is, I'm surprised that I'm still not sure how to securely implement this relatively common scenario.
  • antinome
    antinome over 10 years
    This is the only answer with actual code that hasn't been pointed out to be insecure. IMHO, the best answer.
  • Kenji
    Kenji about 9 years
    "This is the only answer with actual code that hasn't been pointed out to be insecure. IMHO, the best answer." Nope, it is not. filter_var($url, FILTER_SANITIZE_URL); allows e.g. javascript:alert();
  • ForguesR
    ForguesR about 9 years
    Those examples aren't URLs. A URL has a protocol and a resource name.
  • migueldiab
    migueldiab over 8 years
    Also, link seems to be no longer live. Mirrors (www).jb51.net/tools/xss.htm (beware of that link that might have some weird JS in it) and wayback machine web.archive.org/web/20110216074326/http://ha.ckers.org/xss.h‌​tml
  • Kyle Pittman
    Kyle Pittman about 5 years
    Link appears to be dead, and at one time it seems that it redirected to owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet
  • Bogdan
    Bogdan about 5 years
    I looked into using this library to sanitize urls but couldn't find any action to do that for .net. This is where I took the code and documentation from code.google.com/archive/p/owasp-esapi-dotnet/downloads, the project itself looks stale
  • Bogdan
    Bogdan about 5 years
    @DaveJarvis I think the link you mentioned is not related to this question; maybe you meant this: docs.microsoft.com/en-us/aspnet/core/security/… . The link related to XSS covers good coding practices which avoid using untrusted URLs. Unfortunately, for older big projects this is more expensive to verify and change.
  • Ramesh Pareek
    Ramesh Pareek over 4 years
    How to do that in JS?
  • Abhishek Kamal
    Abhishek Kamal about 3 years
    In validation, only check for what you need (it will be small and simple). Do not check for what you don't need (it would be long code, and it's hard to cover all possible outcomes).
  • dfrankow
    dfrankow about 3 years
    which functions in w3lib? maybe safe_url_string? w3lib.readthedocs.io/en/latest/…
  • Neil
    Neil over 2 years
    I tried to use safe_url_string on a malicious URL for an XXE attack and it didn't sanitize it.