Escaping HTML entities in JavaScript string literals within the <script> block

35,804

Solution 1

The following characters could interfere with an HTML or Javascript parser and should be escaped in string literals: <, >, ", ', \, and &.

In a script block using the escape character, as you found out, works. The concatenation method (</scr' + 'ipt>') can be hard to read.

var s = 'Hello <\/script>';

For inline Javascript in HTML, you can use entities:

<div onClick="alert('Hello &quot;>')">click me</div>

Demo: http://jsfiddle.net/ThinkingStiff/67RZH/

The method that works in both <script> blocks and inline Javascript is \uxxxx, where xxxx is the hexadecimal character code.

  • < - \u003c
  • > - \u003e
  • " - \u0022
  • ' - \u0027
  • \ - \u005c
  • & - \u0026

Demo: http://jsfiddle.net/ThinkingStiff/Vz8n7/

HTML:

<div onClick="alert('Hello \u0022>')">click me</div>

<script>
    var s = 'Hello \u003c/script\u003e';
alert( s );
</script>   

Solution 2

I'd say the best practice would be avoiding inline JS in the first place.

Put the JS code in a separate file and include it with the src attribute

<script src="path/to/file.js"></script>

and use it to set event handlers from the inside isntead of putting those in the HTML.

//jquery example
$('div.something').on('click', function(){
    alert('Hello>');
})

Solution 3

Here's how I do it:

function encode(r){
return r.replace(/[\x26\x0A\<>'"]/g,function(r){return"&#"+r.charCodeAt(0)+";"})
}

var myString='Encode HTML entities!\n"Safe" escape <script></'+'script> & other tags!';

test.value=encode(myString);

testing.innerHTML=encode(myString);

/*************
* \x26 is &ampersand (it has to be first),
* \x0A is newline,
*************/
<textarea id=test rows="9" cols="55"></textarea>

<div id="testing">www.WHAK.com</div>

Solution 4

(edit - somehow didn't notice you mentioned slash-escape in your question already...)

OK so you know how to escape a slash.

In inline event handlers, you can't use the bounding character inside a literal, so use the other one:

<div onClick='alert("Hello \"")'>test</div>

But this is all in aid of making your life difficult. Just don't use inline event handlers! Or if you absolutely must, then have them call a function defined elsewhere.

Generally speaking, there are few reasons for your server-side code to be writing javascript. Don't generate scripts from the server - pass data to pre-written scripts instead.

(original)

You can escape anything in a JS string literal with a backslash (that is not otherwise a special escape character):

var s = 'Hello <\/script>';

This also has the positive effect of causing it to not be interpreted as html. So you could do a blanket replace of "/" with "\/" to no ill effect.

Generally, though, I am concerned that you would have user-submitted data embedded as a string literal in javascript. Are you generating javascript code on the server? Why not just pass data as JSON or an HTML "data" attribute or something instead?

Share:
35,804
mojuba
Author by

mojuba

Passed the Turing Test.

Updated on July 27, 2020

Comments

  • mojuba
    mojuba almost 4 years

    On the one hand if I have

    <script>
    var s = 'Hello </script>';
    console.log(s);
    </script>
    

    the browser will terminate the <script> block early and basically I get the page screwed up.

    On the other hand, the value of the string may come from a user (say, via a previously submitted form, and now the string ends up being inserted into a <script> block as a literal), so you can expect anything in that string, including maliciously formed tags. Now, if I escape the string literal with htmlentities() when generating the page, the value of s will contain the escaped entities literally, i.e. s will output

    Hello &lt;/script&gt;
    

    which is not desired behavior in this case.

    One way of properly escaping JS strings within a <script> block is escaping the slash if it follows the left angle bracket, or just always escaping the slash, i.e.

    var s = 'Hello <\/script>';
    

    This seems to be working fine.

    Then comes the question of JS code within HTML event handlers, which can be easily broken too, e.g.

    <div onClick="alert('Hello ">')"></div>
    

    looks valid at first but breaks in most (or all?) browsers. This, obviously requires the full HTML entity encoding.

    My question is: what is the best/standard practice for properly covering all the situations above - i.e. JS within a script block, JS within event handlers - if your JS code can partly be generated on the server side and can potentially contain malicious data?