Will HTML Encoding prevent all kinds of XSS attacks?


Solution 1

No.

Putting aside the subject of allowing some tags (not really the point of the question), HtmlEncode simply does NOT cover all XSS attacks.

For instance, consider server-generated client-side JavaScript: if the server dynamically writes HTML-encoded values directly into client-side JavaScript, HtmlEncode will not stop injected script from executing.
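Here is a minimal sketch of that failure in Python, using the standard library's html.escape as a stand-in for HtmlEncode (an assumption, since the answer is about ASP-style code). The render_script helper and the payload are hypothetical and only for illustration.

```python
import html

def render_script(user_value: str) -> str:
    # Hypothetical template: the HTML-encoded value lands inside a <script> block.
    return "<script>var id = " + html.escape(user_value) + ";</script>"

payload = "alert(document.cookie)"
page = render_script(payload)

# The payload contains none of the characters that HTML encoding rewrites,
# so it reaches the script block unchanged and would be run as code.
print(page)
```

The encoder did its job correctly; it was simply the wrong encoder for a JavaScript context.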

Next, consider the following pseudocode:

<input value=<%= HtmlEncode(somevar) %> id=textbox>

Now, in case it's not immediately obvious, if somevar (sent by the user, of course) is set, for example, to

a onclick=alert(document.cookie)

the resulting output is

<input value=a onclick=alert(document.cookie) id=textbox>

which would clearly work. Obviously, this can be (almost) any other script... and HtmlEncode would not help much.
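The same thing can be reproduced in Python, again assuming html.escape as a stand-in for HtmlEncode. The render_unquoted helper is hypothetical and deliberately mirrors the vulnerable pseudocode.

```python
import html

def render_unquoted(user_value: str) -> str:
    # Mirrors the pseudocode: the attribute value is NOT wrapped in quotes.
    return "<input value=" + html.escape(user_value) + " id=textbox>"

payload = "a onclick=alert(document.cookie)"
rendered = render_unquoted(payload)

# No <, >, & or quote characters in the payload, so encoding changes nothing
# and the space ends the value, turning "onclick=..." into a new attribute.
print(rendered)
```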

There are a few additional vectors to be considered... including the third flavor of XSS, called DOM-based XSS (wherein the malicious script is generated dynamically on the client, e.g. based on # values).

Also don't forget about UTF-7 type attacks - where the attack looks like

+ADw-script+AD4-alert(document.cookie)+ADw-/script+AD4-

Nothing much to encode there...
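To see why, here is a small Python sketch. It assumes html.escape as the encoder and uses Python's built-in utf-7 codec to imitate what a consumer would see if it decided to interpret the page as UTF-7.

```python
import html

payload = "+ADw-script+AD4-alert(document.cookie)+ADw-/script+AD4-"

# HTML encoding finds nothing to change: the payload is plain ASCII
# with no <, >, & or quote characters in it.
encoded = html.escape(payload)

# What the same bytes turn into when they are interpreted as UTF-7.
decoded = encoded.encode("ascii").decode("utf-7")

print(encoded == payload)
print(decoded)
```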

The solution, of course (in addition to proper and restrictive white-list input validation), is to perform context-sensitive encoding: HtmlEncoding is great IF your output context IS HTML, but otherwise you may need JavaScriptEncoding, or VBScriptEncoding, or AttributeValueEncoding, or... etc.
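As a rough sketch of what "one encoder per context" can look like, here are four small Python helpers built on the standard library. The function names are hypothetical, and this is not the API of any particular anti-XSS library. The attribute helper is only adequate for double-quoted attributes that are not URL or event-handler attributes.

```python
import html
import json
from urllib.parse import quote

def encode_html_text(value: str) -> str:
    # For text placed between tags.
    return html.escape(value, quote=False)

def encode_attribute(value: str) -> str:
    # For attribute values; the caller must still wrap the result in double quotes.
    return html.escape(value, quote=True)

def encode_js_string(value: str) -> str:
    # Produces a quoted JavaScript string literal. Rewriting <, > and & as
    # \uXXXX escapes keeps "</script>" from closing an inline script block.
    literal = json.dumps(value)
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )

def encode_url_component(value: str) -> str:
    # For a single query-string value or path segment.
    return quote(value, safe="")

print('<input value="' + encode_attribute("a onclick=alert(1)") + '" id=textbox>')
print("<script>var name = " + encode_js_string("</script>") + ";</script>")
```

The point is that the call site picks the encoder according to where the value lands, not according to where the value came from.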

If you're using MS ASP.NET, you can use their Anti-XSS Library, which provides all of the necessary context-encoding methods.

Note that encoding should not be restricted to user input; it should also be applied to stored values from the database, text files, etc.

Oh, and don't forget to explicitly set the charset, both in the HTTP header AND the META tag, otherwise you'll still have UTF-7 vulnerabilities...
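For illustration, a minimal WSGI application in Python that declares the charset in both places. The app function and the page content are hypothetical; the same idea applies to whatever framework or server you actually use.

```python
def app(environ, start_response):
    # Hypothetical minimal WSGI application.
    body = (
        "<!DOCTYPE html><html><head>"
        '<meta charset="utf-8">'  # charset declared in the document itself
        "<title>Example</title></head><body>Hello</body></html>"
    ).encode("utf-8")
    headers = [
        # charset declared in the HTTP header as well
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

It can be served locally with wsgiref.simple_server.make_server from the standard library if you want to inspect the response headers.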

For more information, and a pretty definitive list (constantly updated), check out RSnake's Cheat Sheet: http://ha.ckers.org/xss.html

Solution 2

If you systematically encode all user input before displaying it, you are still not 100% safe.
(See @AviD's post for more details.)

In addition, problems arise when you need to let some tags go unencoded, for example so that users can post images or bold text, or for any feature that requires the user's input to be processed as (or converted to) unencoded markup.

You will have to set up a decision-making system to decide which tags are allowed and which are not, and it is always possible that someone will figure out a way to get a disallowed tag through.

It helps if you follow Joel's advice of Making Wrong Code Look Wrong or if your language helps you by warning/not compiling when you are outputting unprocessed user data (static-typing).
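One way to get that kind of help from the type system, sketched in Python. All names here (SafeHtml, encode, write_paragraph) are hypothetical, and html.escape is assumed as the encoder. The idea is that output functions accept only a dedicated type, so passing a raw string stands out.

```python
import html

class SafeHtml:
    """Marker type for markup that has already been encoded."""

    def __init__(self, markup: str) -> None:
        self.markup = markup

def encode(untrusted: str) -> SafeHtml:
    # The only intended way to turn untrusted text into SafeHtml.
    return SafeHtml(html.escape(untrusted))

def write_paragraph(content: SafeHtml) -> str:
    # Refuses raw strings at runtime; a static type checker can flag
    # the same mistake before the code runs.
    if not isinstance(content, SafeHtml):
        raise TypeError("expected SafeHtml, got " + type(content).__name__)
    return "<p>" + content.markup + "</p>"

print(write_paragraph(encode("<b>hi</b>")))
```

Calling write_paragraph("<b>hi</b>") with a plain string raises TypeError instead of silently emitting unencoded markup.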

Solution 3

If you encode everything, it will (depending on your platform and the implementation of htmlencode). But any useful web application is so complex that it's easy to forget to check every part of it. Or maybe a 3rd party component isn't safe. Or maybe some code path that you thought did the encoding didn't, so you left it out somewhere else.

So you might want to check things on the input side too. And you might want to check stuff you read from the database.

Solution 4

As mentioned by everyone else, you're safe as long as you encode all user input before displaying it. This includes all request parameters and data retrieved from the database that can be changed by user input.

As mentioned by Pat, you'll sometimes want to display some tags, just not all tags. One common way to do this is to use a markup language like Textile, Markdown, or BBCode. However, be aware that even markup languages can be vulnerable to XSS.

# Markup example
[foo](javascript:alert\('bar'\);)
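A link like the one above can be caught by checking the URL scheme against an allowlist before the link is rendered. This is only a sketch in Python: is_allowed_link and ALLOWED_SCHEMES are hypothetical names, and it deliberately rejects everything except absolute http(s) URLs, relative links included. It does not replace a real sanitizer.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}

def is_allowed_link(url: str) -> bool:
    # Accept only absolute http(s) URLs; everything else, including
    # javascript: and relative URLs, is rejected in this sketch.
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)

print(is_allowed_link(r"javascript:alert\('bar'\);"))
print(is_allowed_link("https://example.com/page"))
```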

If you do decide to let "safe" tags through I would recommend finding some existing library to parse & sanitize your code before output. There are a lot of XSS vectors out there that you would have to detect before your sanitizer is fairly safe.

Solution 5

No, just encoding common HTML tokens DOES NOT completely protect your site from XSS attacks. See, for example, this XSS vulnerability found in google.com:

http://www.securiteam.com/securitynews/6Z00L0AEUE.html

The important thing about this type of vulnerability is that the attacker is able to encode his XSS payload using UTF-7, and if you haven't specified a different character encoding on your page, a user's browser could interpret the UTF-7 payload and execute the attack script.

Author by

Niyaz


Updated on July 27, 2021

Comments

  • Niyaz
    Niyaz almost 3 years

    I am not concerned about other kinds of attacks. Just want to know whether HTML Encode can prevent all kinds of XSS attacks.

    Is there some way to do an XSS attack even if HTML Encode is used?

  • verix
    verix over 15 years
    Or if Javascript is being used somehow to alter the user input for GUI purposes. I came across an XSS vulnerability that, at first, encoded <> to &lt; and &gt;... but when passed to this function, they were replaced again! So... there goes your XSS prevention, I guess. :)
  • Erik
    Erik over 15 years
    It is of course wrong in the first place to write <input value=<%= HtmlEncode(somevar) %> id=textbox> and not <input value="<%= HtmlEncode(somevar) %>" id=textbox> if you do not know whether the text contains e.g. a blank.
  • AviD
    AviD over 15 years
    That's exactly the point - HTMLEncode does not protect you against mistakes. Of course, the programmer expected somevar to contain 23 - it's just that nasty attacker that decided to shove a blank in...
  • AviD
    AviD over 15 years
    While this includes a good point regarding bypassing some tags, the answer to the question is wrong. See my answer...
  • Pat
    Pat over 15 years
    Added a comment to the OP so he accepts your answer instead. And added a link in my post to your answer, just in case.
  • Espo
    Espo over 15 years
    It would not help to enclose it; imagine that SOMEVAR includes this text | " onclick="alert();" " | it will then render like a valid tag.
  • Adam Tuliper
    Adam Tuliper about 13 years
    Espo - I'm late to the game on this - but it surely helps to enclose and encode - as in your example, HTML-encoding it (a quote) will yield: &quot; and thus it will be onclick=&quot;alert()
  • AviD
    AviD about 13 years
    @Adam, the proper solution here is to attribute-encode it (in addition to enclosing it), instead of html-encode. The context is different, so the encoding rules are different too - html encoding won't help you here.
  • brianary
    brianary over 10 years
    @AviD Given an HtmlEncode() function that does encode quote characters (as the ASP code in the example would), can you provide an example of an XSS value for <input value="<%= HtmlEncode(value) %>" id=textbox> ?