Why use \x3C instead of < when generating HTML from JavaScript?

22,193

Solution 1

When the browser sees </script>, it considers this to be the end of the script block (since the HTML parser has no idea about JavaScript, it can't distinguish between something that just appears in a string, and something that's actually meant to end the script element). So </script> appearing literally in JavaScript that's inside an HTML page will (in the best case) cause errors, and (in the worst case) be a huge security hole.

That's why you somehow have to prevent this sequence of characters from appearing literally in the source. Other common workarounds for this issue are "<"+"/script>" and "<\/script>" (they all come down to the same thing).
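To make the "same thing" claim concrete, here is a small sketch meant to be run in Node.js or from an external .js file (not pasted inline into an HTML page). The variable names are made up for illustration; the code only checks that the three spellings evaluate to the identical string.

```javascript
// Three ways to spell the same string without the literal end-tag
// sequence ever appearing in the source text.
const concatenated = '<' + '/script>';   // split into two literals
const backslashed = '<\/script>';        // "\/" is just "/" to JavaScript
const hexEscaped = '\x3C/script>';       // "\x3C" is the escape for "<"

// Reference value built from the character code of "<" (60 decimal, 3C hex).
const reference = String.fromCharCode(60) + '/script>';

const allSame = [concatenated, backslashed, hexEscaped]
  .every((s) => s === reference);

console.log(allSame); // true
```

The JavaScript engine sees one and the same value in all three cases; the only difference is what the HTML parser finds when it scans the raw source.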

While some consider this to be a "bug", it actually has to happen this way, since, as per the specification, the HTML part of the user agent is completely separate from the scripting engine. You can put all kinds of things into <script> tags, not just JavaScript. The W3C mentions VBScript and TCL as examples. Another example is the jQuery template plugin, which uses those tags as well.

But even within JavaScript, where you could suggest that such content in strings could be recognized and thus not be treated as ending tags, the next ambiguity comes up when you consider comments:

<script type="text/javascript">foo(42); // call the function </script>

– what should the browser do in this case?
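The ambiguity can be illustrated with a toy model of the end-tag search. This is only a sketch under loud assumptions: `scriptBodySeenByParser` is a hypothetical helper, not a real HTML parser, and it ignores the extra rules a real tokenizer applies (for example, what may follow the tag name, and the special handling around `<!--`). It is meant to be run in Node.js or from an external .js file.

```javascript
// Assumption: the script body simply ends at the first case-insensitive
// end-tag sequence, with no knowledge of JavaScript strings or comments.
function scriptBodySeenByParser(html) {
  const open = html.match(/<script\b[^>]*>/i);
  if (!open) return null;
  const rest = html.slice(open.index + open[0].length);
  const end = rest.search(/<\/script/i);
  return end === -1 ? rest : rest.slice(0, end);
}

// Built by concatenation so this file contains no literal end tag itself.
const END_TAG = '<' + '/script>';

// End tag inside a JavaScript comment: the comment text survives,
// and the end tag closes the element.
const commentCase =
  '<script type="text/javascript">foo(42); // call the function ' + END_TAG;
console.log(scriptBodySeenByParser(commentCase));
// "foo(42); // call the function "

// End tag inside a JavaScript string: the script is cut off mid-literal.
const stringCase = "<script>document.write('" + END_TAG + "')" + END_TAG;
console.log(scriptBodySeenByParser(stringCase));
// "document.write('"
```

In both inputs the scan stops at the first end-tag sequence, which is the behaviour the answer describes: the HTML layer cannot tell a comment or string apart from a real end tag.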

And finally, what about browsers that don't even know JavaScript? They would just ignore the part between <script> and </script>, but if you gave different semantics to the character sequence </script> based on the browser's knowledge of JavaScript, you'd suddenly have two different results in the HTML parsing stage.

Lastly, regarding your question about substituting all angle brackets: I'd say at least in 99% of the cases, that's for obfuscation, i.e. to hide (from anti-virus software, censoring proxies (like in your example (nested parens are awesome)), etc.) the fact that your JavaScript is doing some HTML-y stuff. I can't think of good technical reasons to hide anything but </script>, at least not for reasonably modern browsers (and by that, I mean pretty much anything newer than Mosaic).

Solution 2

HTML parsers treat the version written with a literal < as the closing tag, so they interpret the code as

<script>
  window.jQuery || document.write('<script src="js/libs/jquery-1.6.1.min.js">
</script>

\x3C is the hexadecimal escape for <. The two are interchangeable within a JavaScript string in the script.
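As a rough illustration of why the escape helps, the sketch below contrasts the raw source characters (what the HTML parser scans) with the evaluated string (what document.write() receives). The names `sourceText`, `runtimeValue` and `endTagStart` are illustrative assumptions; run it in Node.js or from an external .js file.

```javascript
// Built by concatenation so this file never contains the sequence itself.
const endTagStart = '<' + '/script';

// The raw characters of the string literal as they appear in the page source.
// String.raw keeps the backslash escape unprocessed.
const sourceText = String.raw`<script src="js/libs/jquery-1.6.1.min.js">\x3C/script>`;

// The value the JavaScript engine produces once the escape is evaluated.
const runtimeValue = '<script src="js/libs/jquery-1.6.1.min.js">\x3C/script>';

console.log(sourceText.includes(endTagStart));   // false
console.log(runtimeValue.includes(endTagStart)); // true
```

The source text holds nothing the HTML parser could mistake for an end tag, while the string handed to document.write() still contains the real closing tag.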

Author by

Mark Whitaker

I'm a freelance software developer in the UK, working mostly on mobile, .NET and web projects. You can find me at mainwave.co.uk.

Updated on April 11, 2020

Comments

  • Mark Whitaker
    Mark Whitaker about 4 years

    I see the following HTML code used a lot to load jQuery from a content delivery network, but fall back to a local copy if the CDN is unavailable (e.g. in the Modernizr docs):

    <script src="//ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.js"></script>
    <script>window.jQuery || document.write('<script src="js/libs/jquery-1.6.1.min.js">\x3C/script>')</script>
    

    My question is, why is the last < character in the document.write() statement replaced with the escape sequence \x3C? < is a safe character in JavaScript and is even used earlier in the same string, so why escape it there? Is it just to prevent bad browser implementations from thinking the </script> inside the string is the real script end tag? If so, are there really any browsers out there that would fail on this?

    As a follow-on question, I've also seen a variant using unescape() (as given in this answer) in the wild a couple of times too. Is there a reason why that version always seems to substitute all the < and > characters?

  • Mark Whitaker
    Mark Whitaker over 12 years
    I just amended the question to that effect, probably as you were typing your answer. Are there really any modern browsers that would fail on this?
  • J. K.
    J. K. over 12 years
    Most, I think. I know for sure that Chrome interprets it as a closing tag. (Checked yesterday)
  • Marcin
    Marcin over 12 years
    Would this happen with a CDATA section wrapping the script?
  • Quentin
    Quentin over 12 years
    @Mark — All browsers would "fail" on that, because </script> is supposed to be treated as an end tag.
  • Quentin
    Quentin over 12 years
    @Marcin — Not if the document was served as application/xhtml+xml and parsed as XML, and not if the document was parsed using a real SGML parser … but it would with a tag soup or HTML 5 parser.
  • Marcin
    Marcin over 12 years
    @Quentin, awesome, thanks for clarifying. What's up with HTML5 changing the behaviour?
  • balpha
    balpha over 12 years
    @Mark: Expanded my answer a bit.
  • moey
    moey over 11 years
    Sorry, can't help myself from adding a thank-you comment: Big Thanks for the clarity of the answer!