Preventing XSS in Node.js / server-side JavaScript


Solution 1

One of the answers to "Sanitize/Rewrite HTML on the Client Side" suggests borrowing the whitelist-based JavaScript HTML sanitizer from Google Caja. As far as I can tell from a quick scroll-through, it implements an HTML SAX parser without relying on the browser's DOM.

Update: Also keep in mind that the Caja sanitizer has apparently been given a full, professional security review, whereas regexes are known for being very easy to mistype in security-compromising ways.

Update 2017-09-24: There is also now DOMPurify. I haven't used it yet, but it looks like it meets or exceeds every point I look for:

  • Relies on functionality provided by the runtime environment wherever possible. (Important both for performance and to maximize security by relying on well-tested, mature implementations as much as possible.)

    • Relies on either a browser's DOM or jsdom for Node.js.
  • Default configuration designed to strip as little as possible while still guaranteeing removal of JavaScript.

    • Supports HTML, MathML, and SVG
    • Falls back to Microsoft's proprietary, un-configurable toStaticHTML under IE8 and IE9.
  • Highly configurable, making it suitable for enforcing limitations on an input which can contain arbitrary HTML, such as a WYSIWYG or Markdown comment field. (In fact, it's the top of the pile here)

    • Supports the usual tag/attribute whitelisting/blacklisting and URL regex whitelisting
    • Has special options to sanitize further for certain common types of HTML template metacharacters.
  • They're serious about compatibility and reliability

    • Automated tests running on 16 different browsers as well as three different major versions of Node.js.
    • To ensure developers and CI hosts are all on the same page, lock files are published.
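
The URL whitelisting idea mentioned in the list can be illustrated without any library. The following is only a sketch of the concept, not DOMPurify's implementation: it swaps the regex approach for Node's built-in WHATWG `URL` parser, and the names `SAFE_PROTOCOLS` and `isSafeHref`, the chosen schemes, and the placeholder base URL are all assumptions made for illustration.

```javascript
// Hypothetical allowlist of URL schemes considered acceptable for links.
const SAFE_PROTOCOLS = new Set(['http:', 'https:', 'mailto:']);

// Returns true only if href parses to a URL whose scheme is on the allowlist.
// Relative URLs are resolved against a placeholder base (an assumption of this sketch).
function isSafeHref(href, base = 'https://example.invalid/') {
  try {
    const url = new URL(href, base);
    return SAFE_PROTOCOLS.has(url.protocol);
  } catch (err) {
    return false; // input that cannot be parsed is rejected
  }
}

console.log(isSafeHref('https://example.com/page')); // true
console.log(isSafeHref('/relative/path'));           // true
console.log(isSafeHref('javascript:alert(1)'));      // false
console.log(isSafeHref(' JaVaScRiPt:alert(1)'));     // false
console.log(isSafeHref('data:text/html,<b>hi</b>')); // false
```

Letting the parser normalize the scheme first means that case tricks and leading whitespace are handled before the comparison, which is where hand-written patterns tend to slip.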

Solution 2

I've created a module that bundles the Caja HTML Sanitizer

npm install sanitizer

http://github.com/theSmaw/Caja-HTML-Sanitizer

https://www.npmjs.com/package/sanitizer

Any feedback appreciated.

Solution 3

All usual techniques apply to node.js output as well, which means:

  • Blacklists will not work.
  • You're not supposed to filter input in order to protect HTML output. It either will not work or will work only by needlessly mangling the data.
  • You're supposed to HTML-escape text in HTML output.
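
To make the first and third points concrete, here is a minimal sketch contrasting a naive blacklist with escaping at output time. The function names `naiveBlacklist` and `escapeForHtml` and the sample payload are hypothetical, picked only for illustration; real bypasses are far more varied than this one.

```javascript
// Hypothetical blacklist: strips literal <script> and </script> tags in a single pass.
function naiveBlacklist(input) {
  return input.replace(/<\/?script>/gi, '');
}

// Escaping at output time: HTML metacharacters become entities.
function escapeForHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

// Nested tags: removing the inner tags reassembles the outer ones.
const payload = '<scr<script>ipt>alert(1)</scr</script>ipt>';

const filtered = naiveBlacklist(payload);
const escaped = escapeForHtml(payload);

console.log(filtered); // <script>alert(1)</script>
console.log(escaped);  // contains no raw "<" at all
```

The blacklist hands back a working script tag, while the escaped string keeps the user's text intact and harmless when placed in HTML text content.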

I'm not sure if Node.js comes with a built-in for this, but something like this should do the job:

function htmlEscape(text) {
  // Escape & first so the entities produced below are not escaped again.
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;') // it's not necessary to escape >
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}
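
One way to apply such a function consistently is to escape every interpolated value by default. The sketch below uses a tagged template for that; `escapeHtml` and `html` are hypothetical names, the escaping function is repeated so the block runs on its own, and the approach only covers HTML text and quoted attribute values, not script, style, or URL contexts.

```javascript
// Same escaping idea as in the answer, repeated so this sketch is self-contained.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

// Hypothetical tagged template: literal parts pass through unchanged,
// every interpolated value is escaped.
function html(strings, ...values) {
  return strings.reduce((out, literal, i) => {
    const value = i < values.length ? escapeHtml(values[i]) : '';
    return out + literal + value;
  }, '');
}

const comment = `<img src=x onerror="alert('xss')">`;
const page = html`<p class="comment">${comment}</p>`;

console.log(page);
// <p class="comment">&lt;img src=x onerror=&quot;alert(&#039;xss&#039;)&quot;&gt;</p>
```

Because the escaping happens inside the tag function, forgetting to call it on a particular value is no longer possible for templates written this way.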

Solution 4

I recently discovered node-validator by chriso.

Example

get('/', function (req, res) {

  //Sanitize user input
  req.sanitize('textarea').xss(); // No longer supported
  req.sanitize('foo').toBoolean();

});

XSS Function Deprecation

The XSS function is no longer available in this library.

https://github.com/chriso/validator.js#deprecations

Solution 5

You can also look at ESAPI, which has a JavaScript version. It's pretty sturdy.

Author: Techwraith

Console Cowboy. Creator of Atomify. CTO at @Getable. Previously at @Yammer and @Storify.

Updated on July 05, 2022

Comments

  • Techwraith
    Techwraith almost 2 years

    Any idea how one would go about preventing XSS attacks on a Node.js app? Are there any libs out there that handle removing JavaScript in hrefs, onclick attributes, etc. from POSTed data?

    I don't want to have to write a regex for all that :)

    Any suggestions?

  • Techwraith
    Techwraith over 13 years
    Thanks, I've got it basically figured out with regex (yuck) - but I'd love to look into creating a connect middleware to sanitize all params.
  • balupton
    balupton over 10 years
    Using require('sanitizer').sanitize strips out all a[href] attributes, rather than just naughty ones. For our use case, we need links to still be accepted (just not naughty links, and other xss naughties etc), any suggestions?
  • Brmm
    Brmm over 10 years
    They removed xss support a month ago.
  • Daniel Flippance
    Daniel Flippance about 10 years
    "You're not supposed to filter input" ... "You're supposed to HTML-escape...output": Do you have any reference for this proposed best practice?
  • Kornel
    Kornel about 10 years
    @DanielFlippance these two points are a logical consequence of "you're supposed to HTML-escape HTML output" and that is the HTML spec.
  • jmnwong
    jmnwong over 9 years
    As pointed out in nealpoole.com/blog/2013/07/… --- you cannot simply use the escape filter to prevent XSS. More details are explained in the OWASP XSS Prevention Cheat Sheet. You should still use the Google Caja Sanitizer.
  • Joseph Lust
    Joseph Lust almost 5 years
    Unfortunately I found this library removed valid CSS markup, like !important.
  • LachoTomov
    LachoTomov over 3 years
    Not filtering user input is a risky "best practice". You're opening the door for developer mistakes and in a large project developer mistakes will happen, so you'll get hacked over and over again. Keep this in mind if you decide to go this way.
  • Kornel
    Kornel over 3 years
    @LachoTomov For catching developer mistakes I suggest using escaping-by-default template engines. Input mangling has two significant downsides: data loss and false sense of security. For example, people can have apostrophes in their names. You can't filter out everything that could possibly be bad in any context, but if obvious things are filtered out developers may be less vigilant about escaping, and smoke tests may pass when they shouldn't.
  • LachoTomov
    LachoTomov over 3 years
    @Kornel sure, there are ways to defend against this. But you don't always have control over what other developers use, for example if you're building a public API. If it returns unsafe data, 50+% of the sites that use it will be hacked. And yeah, you can blame it on the other developers, but that's not the idea - sites are still hacked :) So the right approach depends on the particular use case.