Escaping HTML entities in JavaScript string literals within the <script> block

javascript html escaping

35,804

Solution 1

The following characters could interfere with an HTML or Javascript parser and should be escaped in string literals: <, >, ", ', \, and &.

In a script block, using the escape character works, as you found out. The concatenation method ('</scr' + 'ipt>') can be hard to read.

var s = 'Hello <\/script>';

For inline Javascript in HTML, you can use entities:

<div onClick="alert('Hello &quot;>')">click me</div>
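When the value is untrusted, the inline case has two layers: the value must first be a valid JS string literal, and then the whole JS snippet must be entity-encoded for the attribute, because the browser decodes entities before the JS parser sees the code. The sketch below assumes a double-quoted onClick attribute; `escapeHtmlAttr` and `inlineAlertDiv` are hypothetical helper names, and using `JSON.stringify` for the JS layer is one choice among several, not the answer's own method.

```javascript
// Entities for characters that are special inside an HTML attribute value.
const HTML_ATTR_ENTITIES = {
  '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'
};

// HTML layer (hypothetical helper): encode a JS snippet for a quoted attribute.
function escapeHtmlAttr(text) {
  return String(text).replace(/[&<>"']/g, function (c) {
    return HTML_ATTR_ENTITIES[c];
  });
}

// Hypothetical helper: builds the div from the question for any message.
function inlineAlertDiv(message) {
  // JS layer: JSON.stringify yields a valid double-quoted JS string literal.
  const js = 'alert(' + JSON.stringify(message) + ')';
  // The browser decodes &quot; etc. first, then parses the result as JS.
  return '<div onClick="' + escapeHtmlAttr(js) + '">click me</div>';
}
```

For the message `Hello ">` this produces `<div onClick="alert(&quot;Hello \&quot;&gt;&quot;)">click me</div>`, which the browser decodes back to `alert("Hello \">")`.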

Demo: http://jsfiddle.net/ThinkingStiff/67RZH/

The method that works in both <script> blocks and inline Javascript is \uxxxx, where xxxx is the hexadecimal character code.

< - \u003c
> - \u003e
" - \u0022
' - \u0027
\ - \u005c
& - \u0026

Demo: http://jsfiddle.net/ThinkingStiff/Vz8n7/

HTML:

<div onClick="alert('Hello \u0022>')">click me</div>

<script>
    var s = 'Hello \u003c/script\u003e';
    alert( s );
</script>
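The \uxxxx table above can be applied programmatically when the page is generated. A minimal sketch, assuming the value is placed between quotes of a JS string literal inside a <script> block; the helper name `escapeForScriptLiteral` is hypothetical, and escaping line terminators (\n, \r, U+2028, U+2029) is an addition beyond the answer's list, since a raw newline cannot appear in a string literal.

```javascript
// Hypothetical helper: replace each risky character with its \uxxxx escape.
// Covers the six characters from the table plus line terminators (assumption).
function escapeForScriptLiteral(value) {
  return String(value).replace(/[<>"'\\&\n\r\u2028\u2029]/g, function (c) {
    // e.g. '<' has code 0x3c and becomes \u003c
    return '\\u' + c.charCodeAt(0).toString(16).padStart(4, '0');
  });
}

// Because both quote characters are escaped, the result is safe between
// either single or double quotes in the generated script.
const line = "var s = '" + escapeForScriptLiteral('Hello </script>') + "';";
```

Here `line` holds `var s = 'Hello \u003c/script\u003e';`, matching the hand-written example above.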

Solution 2

I'd say the best practice would be avoiding inline JS in the first place.

Put the JS code in a separate file and include it with the src attribute:

<script src="path/to/file.js"></script>

and attach the event handlers from inside that file instead of putting them in the HTML.

// jQuery example
$('div.something').on('click', function(){
    alert('Hello>');
});
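A dependency-free variant of the jQuery example above might look like the sketch below. It assumes the server writes the message into a hypothetical `data-message` attribute (using only ordinary HTML attribute escaping), so no JavaScript is generated on the server; `bindMessageHandlers` is likewise a hypothetical name, and `show` is injected so the function can be exercised outside a browser.

```javascript
// Assumed markup, produced by the server with plain HTML attribute escaping:
//   <div class="something" data-message="Hello &quot;&gt;">click me</div>
function bindMessageHandlers(root, show) {
  const divs = root.querySelectorAll('div.something');
  divs.forEach(function (div) {
    div.addEventListener('click', function () {
      // dataset.message is already entity-decoded by the HTML parser
      show(div.dataset.message);
    });
  });
  return divs.length; // number of elements that received a handler
}

// In the browser, from the external file:
// document.addEventListener('DOMContentLoaded', function () {
//   bindMessageHandlers(document, function (msg) { alert(msg); });
// });
```

The only escaping left is the attribute encoding of the data itself; the JS code never changes with the data.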

Solution 3

Here's how I do it:

function encode(r) {
  return r.replace(/[\x26\x0A<>'"]/g, function (c) {
    return "&#" + c.charCodeAt(0) + ";";
  });
}

var myString='Encode HTML entities!\n"Safe" escape <script></'+'script> & other tags!';

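document.getElementById('test').value = encode(myString);

document.getElementById('testing').innerHTML = encode(myString);

/*************
* \x26 is the ampersand (the regex replaces in a single pass,
*   so the order inside the character class doesn't matter),
* \x0A is newline,
*************/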

<textarea id=test rows="9" cols="55"></textarea>

<div id="testing">www.WHAK.com</div>
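A common rule of thumb is that & must be escaped first. That rule matters only when escaping is done as a chain of separate replace calls; with a single regex pass like `encode` above, each character is matched once, so the order inside the character class is irrelevant. The sketch below contrasts the two; all three function names are hypothetical.

```javascript
// Chained replacements: & must come first, otherwise the & inside
// entities produced by earlier steps is encoded a second time.
function encodeChained(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Wrong order, for comparison: '<' becomes '&lt;' and then '&amp;lt;'.
function encodeChainedWrongOrder(s) {
  return s
    .replace(/</g, '&lt;')
    .replace(/&/g, '&amp;');
}

// Single pass, same idea as encode above with the class reordered:
// every matched character is replaced exactly once.
function encodeSinglePass(s) {
  return s.replace(/[<>'"\x0A\x26]/g, function (c) {
    return '&#' + c.charCodeAt(0) + ';';
  });
}
```

For example, `encodeChainedWrongOrder('<')` returns `&amp;lt;`, while `encodeSinglePass('<&')` returns `&#60;&#38;`.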

Solution 4

(edit - somehow didn't notice you mentioned slash-escape in your question already...)

OK so you know how to escape a slash.

In inline event handlers, you can't use the quote character that delimits the attribute inside a JS literal, so use the other one:

<div onClick='alert("Hello \"")'>test</div>

But this is all in aid of making your life difficult. Just don't use inline event handlers! Or if you absolutely must, then have them call a function defined elsewhere.

Generally speaking, there are few reasons for your server-side code to be writing javascript. Don't generate scripts from the server - pass data to pre-written scripts instead.

(original)

You can escape any character in a JS string literal with a backslash, as long as the backslash and that character don't form a special escape sequence (such as \n or \u):

var s = 'Hello <\/script>';

This also stops the HTML parser from treating the "</script>" inside the string as the end of the script block. So you could do a blanket replace of "/" with "\/" to no ill effect.
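A minimal sketch of the blanket replacement, assuming the value is emitted as a single-quoted literal inside a <script> block; `toJsLiteral` is a hypothetical helper. It covers only the script-block case: inside an HTML attribute, quotes and & would still need entities.

```javascript
// Hypothetical helper: build a single-quoted JS string literal, then apply
// the blanket "/" -> "\/" replacement.
function toJsLiteral(value) {
  const body = String(value)
    .replace(/\\/g, '\\\\')        // backslash first, so added escapes aren't doubled
    .replace(/'/g, "\\'")          // the delimiting quote
    .replace(/\r/g, '\\r')         // raw line terminators are not allowed in a literal
    .replace(/\n/g, '\\n')
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029')
    .replace(/\//g, '\\/');        // "</script>" becomes "<\/script>"
  return "'" + body + "'";
}
```

`toJsLiteral('Hello </script>')` gives `'Hello <\/script>'`, the same literal as in the example above.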

Generally, though, I am concerned that you would have user-submitted data embedded as a string literal in javascript. Are you generating javascript code on the server? Why not just pass data as JSON or an HTML "data" attribute or something instead?
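Passing data as JSON can look like the sketch below: the server serializes the data into a non-executing `<script type="application/json">` block, and a pre-written script parses it. The element id `page-data` and the helper names are hypothetical. Replacing `<` with `\u003c` is safe because `<` can only occur inside JSON strings, where `JSON.parse` decodes the escape back to `<`.

```javascript
// Hypothetical helper: JSON that cannot close the surrounding <script> block.
function jsonForScriptTag(data) {
  return JSON.stringify(data).replace(/</g, '\\u003c');
}

// Hypothetical helper: the data block the server would write into the page.
function pageDataBlock(data) {
  return '<script type="application/json" id="page-data">' +
    jsonForScriptTag(data) + '</script>';
}

// Client side, in a pre-written script (browser only):
// const data = JSON.parse(document.getElementById('page-data').textContent);
```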



Author by

mojuba


Updated on July 27, 2020

Comments

mojuba almost 4 years
On the one hand if I have
```
<script>
var s = 'Hello </script>';
console.log(s);
</script>
```
the browser will terminate the <script> block early and basically I get the page screwed up.

On the other hand, the value of the string may come from a user (say, via a previously submitted form, and now the string ends up being inserted into a <script> block as a literal), so you can expect anything in that string, including maliciously formed tags. Now, if I escape the string literal with htmlentities() when generating the page, the value of s will contain the escaped entities literally, i.e. s will output
```
Hello &lt;/script&gt;
```
which is not desired behavior in this case.

One way of properly escaping JS strings within a <script> block is escaping the slash if it follows the left angle bracket, or just always escaping the slash, i.e.
```
var s = 'Hello <\/script>';
```
This seems to be working fine.

Then comes the question of JS code within HTML event handlers, which can be easily broken too, e.g.
```
<div onClick="alert('Hello ">')"></div>
```
looks valid at first but breaks in most (or all?) browsers. This obviously requires full HTML entity encoding.

My question is: what is the best/standard practice for properly covering all the situations above - i.e. JS within a script block, JS within event handlers - if your JS code can partly be generated on the server side and can potentially contain malicious data?