How to unescape html in javascript?

45,310

Solution 1

Change your test string to <b><<&&&</b> to get a better sense of what the risk is (or, better still, to <img src='http://www.spam.com/ASSETS/0EE75B480E5B450F807117E06219CDA6/spamReg.png' onload='alert(document.cookie);'>, which is what cookie-stealing spam looks like).

See the example at http://jsbin.com/uveme/139/ (based on your example, using Prototype for the unescaping). Try clicking the four buttons to see the different effects; only the last one is a security risk. (You can view and edit the source at http://jsbin.com/uveme/139/edit.) The example doesn't actually steal your cookies.

  1. If your text is coming from a known-safe source and is not based on any user input, then you are safe.
  2. If you are using createTextNode to create a text node and appendChild to insert that unaltered node object directly into your document, you are safe.
  3. Otherwise, you need to take appropriate measures to ensure that unsafe content can't make it to your viewer's browser.
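
For the question's situation (no innerHTML, and the output ends up in a text node), one option consistent with point 2 is to decode the character references yourself as a plain string operation and hand the result to createTextNode. Below is a minimal sketch, assuming the web service only sends the few named entities listed plus numeric references; decodeBasicEntities, NAMED_ENTITIES and someElement are hypothetical names, and a complete decoder would need the full (much larger) HTML named-reference table.

```javascript
// Hypothetical helper: turns HTML character references into plain text
// WITHOUT parsing markup. Assumption: only these named entities (plus
// numeric references) occur; anything unrecognized is left as-is.
const NAMED_ENTITIES = {
  lt: "<",
  gt: ">",
  amp: "&",
  quot: '"',
  apos: "'",
  nbsp: "\u00A0",
};

function decodeBasicEntities(str) {
  // A single left-to-right pass, so "&amp;lt;" becomes the literal text
  // "&lt;" instead of being decoded twice into "<".
  return str.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, ref) => {
    if (ref[0] === "#") {
      const isHex = ref[1] === "x" || ref[1] === "X";
      const code = isHex ? parseInt(ref.slice(2), 16) : parseInt(ref.slice(1), 10);
      // Leave out-of-range numbers untouched instead of throwing a RangeError.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Own-property check so names like "constructor" are not found on Object.prototype.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, ref)
      ? NAMED_ENTITIES[ref]
      : match;
  });
}

// Usage with the value from the question:
const decoded = decodeBasicEntities("&lt;&lt;&lt;&amp;&amp;&amp;");
console.log(decoded); // <<<&&&
// In the browser the decoded string then goes into a text node, e.g.:
// someElement.appendChild(document.createTextNode(decoded));
```

Because nothing here parses markup, an input like &lt;img onload=...&gt; decodes to the literal text <img onload=...>, which a text node displays rather than runs. The safety comes from where the string ends up: pass the same output to innerHTML or into an attribute and you are back in case 3.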

Note: As pointed out by Ben Vinegar, using createTextNode is not a magic bullet. Using it to escape a string, then reading the escaped text back out with textContent or innerHTML and doing other things with it, does not protect you in those later uses. In particular, the escapeHtml method in Peter Brown's answer below is insecure if used to populate attributes.

Solution 2

A very good read is http://benv.ca/2012/10/4/you-are-probably-misusing-DOM-text-methods/, which explains why the conventional wisdom of escaping with createTextNode is not secure in every context (in particular, inside attribute values).

A representative example of the risk, taken from the article above:

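function escapeHtml(str) {
    // Let the browser serialize a text node: this escapes &, < and >,
    // but NOT quotes, so the result is unsafe inside an attribute value.
    var div = document.createElement('div');
    div.appendChild(document.createTextNode(str));
    return div.innerHTML;
}

var userWebsite = '" onmouseover="alert(\'derp\')" "';
var profileLink = '<a href="' + escapeHtml(userWebsite) + '">Bob</a>';
var div = document.getElementById('target');
div.innerHTML = profileLink;
// The unescaped quotes close href early and inject an event handler:
// <a href="" onmouseover="alert('derp')" "">Bob</a>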
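
One way to avoid this particular attribute problem is to escape the quote characters as well as &, < and >. A minimal sketch, assuming the value is always placed inside a double- or single-quoted attribute; escapeHtmlAttr is a hypothetical name, not something from the article:

```javascript
// Hypothetical replacement for escapeHtml() above: also escapes quotes,
// so the value cannot break out of a quoted attribute.
function escapeHtmlAttr(str) {
  return String(str)
    .replace(/&/g, "&amp;") // must run first so the entities added below are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const userWebsite = '" onmouseover="alert(\'derp\')" "';
const profileLink = '<a href="' + escapeHtmlAttr(userWebsite) + '">Bob</a>';
console.log(profileLink);
// <a href="&quot; onmouseover=&quot;alert(&#39;derp&#39;)&quot; &quot;">Bob</a>
```

The whole input now stays inside href as a (useless) relative URL. Escaping only keeps the value inside its quotes; it does not stop a user from entering a javascript: URL as their website, so a URL like this still needs its scheme checked separately.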

Solution 3

Some guesswork, for what it's worth.

innerHTML literally has the browser interpret the HTML.

So &lt; becomes the less-than symbol, because that's what would happen if you put &lt; in an HTML document.

The largest security risk with strings containing & is an eval statement: evaluating any JSON that way could make the application insecure. I'm no security expert, but if strings remain strings, then you should be OK.

This is another way innerHTML is secure: the unescaped string is on its way to becoming HTML, so there's no risk of it running the JavaScript.
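
On the eval point: assuming the web service returns JSON (the question does not say), parsing it with JSON.parse rather than eval keeps strings as strings, because JSON.parse accepts only JSON data and throws a SyntaxError on anything else. A small sketch; the payload and parseOrNull are hypothetical:

```javascript
// Hypothetical response body from the web service.
const payload = '{"text": "&lt;&lt;&lt;&amp;&amp;&amp;"}';

// JSON.parse only builds data (objects, arrays, strings, numbers, ...);
// it never runs code, so the entity string stays an inert string.
const data = JSON.parse(payload);
console.log(typeof data.text, data.text); // string &lt;&lt;&lt;&amp;&amp;&amp;

// Hypothetical wrapper: a body that is not valid JSON is rejected, not executed.
function parseOrNull(body) {
  try {
    return JSON.parse(body);
  } catch (err) {
    if (err instanceof SyntaxError) return null;
    throw err; // anything other than a parse error is unexpected, so rethrow
  }
}

console.log(parseOrNull("alert(document.cookie)")); // null
```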

Solution 4

Try the escape and unescape functions available in JavaScript.

More details: http://www.w3schools.com/jsref/jsref_unescape.asp
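
Note that these functions deal with URL-style percent-encoding (%XX and %uXXXX sequences), not HTML character references, so on their own they leave a string like the one in the question unchanged. A quick check, runnable in Node or a browser console:

```javascript
// unescape() reverses escape(): it decodes percent-encoded sequences,
// not HTML character references such as &lt; or &amp;.
const fromService = "&lt;&lt;&lt;&amp;&amp;&amp;";
console.log(unescape(fromService));          // &lt;&lt;&lt;&amp;&amp;&amp; (unchanged)
console.log(escape("<<<&&&"));               // %3C%3C%3C%26%26%26
console.log(unescape("%3C%3C%3C%26%26%26")); // <<<&&&
```

Both functions are deprecated (kept for web compatibility in Annex B of the ECMAScript specification); encodeURIComponent/decodeURIComponent are the usual choice for URL components, and neither pair decodes &lt; or &amp;.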

Solution 5

As long as your code is creating text nodes, the browser should NOT render anything harmful. In fact, if you inspect the generated text node's source using Firebug or the IE Dev Toolbar, you'll see that the browser is re-escaping the special characters.

Give it

"<script>"

and it re-escapes it to:

"&lt;script&gt;"

There are several types of nodes: Elements, Documents, Text, Attributes, etc.

The danger is when the browser interprets a string as containing script. The innerHTML property is susceptible to this problem, since it instructs the browser to create Element nodes, any of which could carry inline JavaScript such as an onmouseover or onerror handler (a script element inserted through innerHTML does not run, but those handlers do). Creating text nodes circumvents this problem.
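
To make the "re-escaping" concrete: when an ordinary text node (not one inside <script> or <style>) is serialized back to markup, the HTML serialization algorithm, as I understand it, replaces &, the no-break space, < and >, and nothing else. The sketch below imitates that in plain JavaScript; serializeTextNodeData is a hypothetical name and this is not the browser's actual code:

```javascript
// Hypothetical imitation of how a browser serializes an ordinary text
// node's data (e.g. for innerHTML or a dev-tools source view).
function serializeTextNodeData(data) {
  return data
    .replace(/&/g, "&amp;") // first, so the entities added below stay intact
    .replace(/\u00A0/g, "&nbsp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

console.log(serializeTextNodeData("<script>")); // &lt;script&gt;
// Quotes pass through untouched, which is why this output is not safe
// to paste into an attribute value:
console.log(serializeTextNodeData('" onmouseover="alert(1)')); // " onmouseover="alert(1)
```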

Author: Joseph

I'm just a hacker trying to be a better hacker

Updated on July 09, 2022

Comments

  • Joseph

    I'm working with a web service that will give me values like:

    var text = "&lt;&lt;&lt;&amp;&amp;&amp;";
    

    And I need to print this so it looks like "<<<&&&" with JavaScript.

    But here's the catch: I can't use innerHTML (I'm actually sending these values to a Prototype library that creates text nodes, so it doesn't unescape my raw HTML string). If editing the library is not an option, how would you unescape this HTML?

    I need to understand the real deal here: what's the risk of unescaping this type of string? How does innerHTML do it? And what other options exist?

    EDIT: The problem is not about using JavaScript's normal escape/unescape, or even the jQuery/Prototype implementations of them, but about the security issues that could come from using any of these, aka "They told me it was pretty insecure to use them."

    For those trying to understand what the heck I'm talking about with innerHTML unescaping this weird string, check out this simple example:

    <html>
    <head>
    <title>createTextNode example</title>
    
    <script type="text/javascript">
    
    var text = "&lt;&lt;&lt;&amp;&amp;&amp;";
    function addTextNode(){
        var newtext = document.createTextNode(text);
        var para = document.getElementById("p1");
        para.appendChild(newtext);
    }
    function innerHTMLTest(){
        var para = document.getElementById("p1");
        para.innerHTML = text;
    }
    </script>
    </head>
    
    <body>
    <div style="border: 1px solid red">
    <p id="p1">First line of paragraph.<br /></p>
    </div><br />
    
    <button onclick="addTextNode();">add another textNode.</button>
    <button onclick="innerHTMLTest();">test innerHTML.</button>
    
    </body>
    </html>