Escape HTML using jQuery


Solution 1

That's a pretty standard way of doing it; my version used a <div> though:

return $('<div/>').text(t).html();

This isn't technically 100% safe, as Mike Samuel notes, but it is probably safe enough in practice.

The current Prototype.js does this:

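function escapeHTML() {
    // Prototype attaches this to String.prototype, so `this` is the string being escaped.
    // '&' is replaced first so the '&' in the entities added afterwards is not re-escaped.
    return this.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;');
}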

But it used to use the "put text in a div and extract the HTML" trick.

There's also _.escape in Underscore, which does it like this:

// List of HTML entities for escaping.
var htmlEscapes = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#x27;',
  '/': '&#x2F;'
};

// Regex containing the keys listed immediately above.
var htmlEscaper = /[&<>"'\/]/g;

// Escape a string for HTML interpolation.
_.escape = function(string) {
  return ('' + string).replace(htmlEscaper, function(match) {
    return htmlEscapes[match];
  });
};

That's essentially the same approach as Prototype's, done in a single regex pass and also escaping quotes and slashes. Most of the JavaScript I write lately has Underscore available, so I tend to use _.escape these days.
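Without Underscore loaded, the same lookup-table technique can be written as a standalone function. This is a minimal sketch: the names `escapeHtml`, `HTML_ESCAPES` and `HTML_ESCAPER` are hypothetical, and the entity choices simply mirror the Underscore table above. Because quotes are escaped too, the result can go inside a quoted attribute value as well as element content.

```javascript
// Hypothetical standalone helper using the same lookup-table idea as _.escape.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#x27;",
  "/": "&#x2F;",
};

// Character class built from the keys of the table above.
const HTML_ESCAPER = /[&<>"'\/]/g;

function escapeHtml(value) {
  // Coerce non-string input to a string, similar to Underscore's ('' + string),
  // then make one regex pass, looking each matched character up in the table.
  return String(value).replace(HTML_ESCAPER, (ch) => HTML_ESCAPES[ch]);
}

// Usage: the same escaped text is safe in a double-quoted attribute and in content.
const title = 'Tom & "Jerry" <3';
const markup =
  '<span title="' + escapeHtml(title) + '">' + escapeHtml(title) + "</span>";
console.log(markup);
```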

Solution 2

There is no guarantee that html() will be completely escaped, so the result might not be safe after concatenation.

html() is based on innerHTML, and a browser could, without violating lots of expectations, implement innerHTML so that $("<i></i>").text("1 <").html() is "1 <", and that $("<i></i>").text("b>").html() is "b>".

Then if you concatenate those two individually safe results, you get "1 <b>", which opens a <b> tag instead of being the HTML encoding ("1 &lt;b&gt;") of the concatenated plain text.

So this method cannot be proven safe from first principles, and there's no widely followed spec of innerHTML (though HTML5 does address it).

The best way to check whether it does what you want is to test corner cases like this.
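One way to test such corner cases is to check that escaping two pieces separately and concatenating gives the same result as escaping the concatenation. The sketch below is an assumption-laden illustration: `checkConcatenation`, `strictEscape`, `lenientEscape` and the sample pairs are all hypothetical, and `lenientEscape` only simulates the kind of lenient innerHTML serializer described above (it is not any real browser's behaviour). It runs outside a browser, so it uses string-based escapers rather than html().

```javascript
// Returns the pairs [a, b] for which escape(a) + escape(b) !== escape(a + b).
function checkConcatenation(escape, pairs) {
  return pairs.filter(([a, b]) => escape(a) + escape(b) !== escape(a + b));
}

// Strict escaper: every '&', '<' and '>' is replaced, character by character.
function strictEscape(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical lenient serializer: escapes '<' only when the next character
// could start a tag, and never escapes '>'.
function lenientEscape(text) {
  return text.replace(/&/g, "&amp;").replace(/<(?=[A-Za-z\/!?])/g, "&lt;");
}

const cornerCases = [
  ["1 <", "b>"],
  ["a &", "amp;"],
  ["x <", "/script>"],
];

console.log(checkConcatenation(strictEscape, cornerCases).length); // 0
// The lenient version fails on the two pairs that split a '<' from its tag name.
console.log(checkConcatenation(lenientEscape, cornerCases));
```

In a browser with jQuery loaded, passing `t => $('<i></i>').text(t).html()` as the `escape` argument would run the same check against real html() output.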

Solution 3

That should work. That's basically how the Prototype.js library does it, or at least how it used to do it. I generally do it with three calls to .replace(), but that's mostly just habit.
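A minimal sketch of the three-replace approach as a standalone function (the names `escapeThree` and `escapeWrongOrder` are hypothetical). The one subtlety is ordering: `&` has to be replaced first, otherwise the `&` inside the `&lt;` and `&gt;` entities produced by the earlier replaces gets escaped a second time.

```javascript
// Escapes the three characters that matter in element content.
// '&' goes first so that the '&' in "&lt;" and "&gt;" is not re-escaped.
function escapeThree(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Same replaces in the wrong order, shown only to illustrate the double-escaping bug.
function escapeWrongOrder(text) {
  return String(text)
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/&/g, "&amp;");
}

console.log(escapeThree("a < b && c")); // a &lt; b &amp;&amp; c
console.log(escapeWrongOrder("a < b")); // a &amp;lt; b
```

As Mike Samuel points out in the comments, this leaves quotes alone, so the output is fine for element content but not for attribute values.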

Author: Michael Mior

I received my Masters from the University of Toronto researching database scalability in the cloud. I spent a couple years working as a Web developer at Bunch working on frontend HTML/CSS/JS, backend PHP/Python/MySQL, and iPhone app (Objective-C). I recently completed my PhD in the data systems group at the University of Waterloo. I have now joined the faculty of the Rochester Institute of Technology as an Assistant Professor.

Updated on September 21, 2020

Comments

  • Michael Mior (over 3 years ago)

    I came up with a hack to escape HTML using jQuery and I'm wondering if anyone sees a problem with it.

    $('<i></i>').text(TEXT_TO_ESCAPE).html();
    

    The <i> tag is just a dummy element, since jQuery needs an element to set the text of.

    Is there perhaps an easier way to do this? Note that I need the text stored in a variable, not for display (otherwise I could just call elem.text(TEXT_TO_ESCAPE);).

    Thanks!

  • Michael Mior (about 13 years ago)
    Actually, I want $("<i></i>").text("1 <").html() to be "1 &lt;" and $("<i></i>").text("b>").html() to be "b&gt;" (which works).
  • Mike Samuel (about 13 years ago)
    @Michael, if you've tested it on major browsers and it works, great. As of 15 June 2009, a then-current version of Safari actually unescaped &gt;, so <input name="Hello&gt;World"> was returned via innerHTML as <input name="Hello>World">. That may have been fixed in modern browsers, though. My point is that testing is the way to gain confidence.
  • Mike Samuel (about 13 years ago)
    A lot of libraries do this. Just be aware that the result here is safe to embed in a PCDATA context and an RCDATA context, but not an attribute context since quotes are not escaped. If you might be susceptible to UTF-7 attacks and the like you should also escape '+': en.wikipedia.org/wiki/UTF-7#Security
  • mu is too short (about 13 years ago)
    @Mike: I don't think the .text(t).html() or Prototype's replace approaches are really that great; both approaches have problems. The lack of an encodeHTML() function in the standard JavaScript library is a gaping hole and a rather surprising oversight.
  • Marcel Korpel (about 13 years ago)
    @muis: I don't think so: the core JavaScript language is not specifically aimed at web browsers.
  • mu is too short (about 13 years ago)
    @Marcel: But we do have encodeURIComponent and JavaScript's roots are in web browsers. And, the fact that everyone ends up writing their own indicates that there is a gap in the standard libraries.
  • Michael Mior (about 13 years ago)
    @muis Thanks for the pointer to Prototype. It turns out that my proposed method doesn't work as I expect in some browsers (read: IE)
  • mu is too short (about 13 years ago)
    @Michael: Which IE version screws it up? I haven't noticed any problems but that doesn't mean there aren't any.
  • Michael Mior (about 13 years ago)
    IE8. It actually mostly works. Perhaps this is expected behaviour, but newlines were stripped from the string which was returned.
  • mu is too short (about 13 years ago)
    @Michael: You can't really depend on whitespace being preserved unless you're using <pre> blocks but, as usual, whatever works.
  • Michael Mior (about 13 years ago)
    Fortunately when working with a string, it's safe to just do the replacing. I was thrown at first because FF and Chrome seem to preserve whitespace just fine.
  • StephenD (about 11 years ago)
    For those interested in the relative performance of the different approaches, see the results at jsperf.com/encode-html-entities. In essence, the regular expression approach appears to be the general winner, closely followed by the multiple replace() approach. However, the regular expression approach probably scales better than replace() if you're escaping more than the basic three: &, <, >.
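The extensibility side of that last comment can be sketched with a lookup table from which the regex is derived, so adding a character is a one-line change. This says nothing about speed. In this minimal sketch `makeEscaper` and `escapeHtmlPlus` are hypothetical names; the extra '+' entry follows Mike Samuel's UTF-7 suggestion, and "&#x2B;" is just the numeric character reference for '+'. Plain string replacement also leaves newlines untouched, unlike the innerHTML round trip Michael Mior saw in IE8.

```javascript
// Builds a single-pass escaping function from a character -> entity table.
function makeEscaper(table) {
  // Backslash-escape characters that are special inside a regex character class.
  const chars = Object.keys(table)
    .map((ch) => ch.replace(/[\\\]^-]/g, "\\$&"))
    .join("");
  const pattern = new RegExp("[" + chars + "]", "g");
  // One pass over the input; each match is looked up in the table.
  return (text) => String(text).replace(pattern, (ch) => table[ch]);
}

const escapeHtmlPlus = makeEscaper({
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#x27;",
  "+": "&#x2B;", // extra entry for the UTF-7 concern; one line to add
});

// The '+' signs are escaped and the newline survives unchanged.
console.log(escapeHtmlPlus("+ADw-script+AD4-\nline 2"));
```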