Caveats when encoding a C# string to a JavaScript string

Solution 1

In .NET 4 and later you can use System.Web.HttpUtility.JavaScriptStringEncode:

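string js = System.Web.HttpUtility.JavaScriptStringEncode(@"aa\bb ""cc"" dd\tee", true);
// js now contains: "aa\\bb \"cc\" dd\\tee"
// (the outer double quotes are added because the second argument, addDoubleQuotes, is true)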
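The same call fits naturally into a helper like the one described in the question. This is a minimal sketch: JsHelper and DeclareVar are hypothetical names, and only the value is escaped, so the variable name is assumed to be a trusted, valid JavaScript identifier.

```csharp
using System.Web; // HttpUtility; on .NET Framework this requires a reference to System.Web.dll

// Hypothetical helper, in the spirit of the MVC3 helper class from the question.
public static class JsHelper
{
    // Emits: var <name> = "<escaped value>";
    // Only the value is escaped; the name must already be a trusted JavaScript identifier.
    public static string DeclareVar(string name, string value)
    {
        // addDoubleQuotes: true makes JavaScriptStringEncode wrap the result in double quotes
        return "var " + name + " = " + HttpUtility.JavaScriptStringEncode(value, true) + ";";
    }
}
```

For example, DeclareVar("msg", "He said \"hi\"") returns the JavaScript text var msg = "He said \"hi\"";.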

Solution 2

It's my understanding that you do have to be careful, as JavaScript is not UTF-16; rather, it's UCS-2, which I believe is a subset of UTF-16. What this means for you is that any character whose code point is above 0xFFFF (that is, one that does not fit in 2 bytes and is therefore stored as a surrogate pair) could give you problems in JavaScript.

In summary, under the covers the engine may use UTF-16, but it only exposes UCS-2-like methods: for example, length counts 16-bit code units, not characters.
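To make the UCS-2 point concrete: a C# string and a JavaScript string are both sequences of 16-bit code units, and a character above U+FFFF occupies two of them (a surrogate pair) in both languages. The sketch below is a demonstration only, with a hypothetical EscapeCodeUnits method that writes every non-printable-ASCII code unit as \uXXXX; it does not escape quotes or backslashes, so it is not a complete encoder.

```csharp
using System.Globalization;
using System.Text;

public static class CodeUnitDemo
{
    // Demo only: escapes every code unit outside printable ASCII as \uXXXX.
    // Quotes and backslashes are NOT escaped, so this is not a full JavaScript encoder.
    public static string EscapeCodeUnits(string s)
    {
        StringBuilder sb = new StringBuilder(s.Length);
        foreach (char c in s) // a C# char is one UTF-16 code unit, not one code point
        {
            if (c >= 0x20 && c < 0x7F)
                sb.Append(c);
            else
                sb.Append("\\u").Append(((int)c).ToString("X4", CultureInfo.InvariantCulture));
        }
        return sb.ToString();
    }
}
```

For example, char.ConvertFromUtf32(0x1F600) (U+1F600, outside the Basic Multilingual Plane) has Length 2 in C#, and EscapeCodeUnits turns it into \uD83D\uDE00. JavaScript also reports that literal as length 2 and displays it as the single character U+1F600, so escaping code unit by code unit keeps the pair intact without any separate UTF-16 to UCS-2 conversion.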

Great article on the issue: http://mathiasbynens.be/notes/javascript-encoding

Author: Machado

I've been a software developer since I was 12, when I got my first AT-286 computer. Nowadays I develop software for financial markets, but I have worked with several technologies (from PalmOS to Interactive Digital TV, through common web technologies). My main interests are software architecture and programming.

Updated on June 04, 2022

Comments

  • Machado
    Machado almost 2 years

    I'm trying to write a custom JavaScript MVC3 helper class for my project, and one of its methods is supposed to escape C# strings into JavaScript strings.

    I know C# strings are UTF-16 encoded, and JavaScript strings also seem to be UTF-16, so no problem there.

    I know some characters, such as backslash, single quote and double quote, must be backslash-escaped in JavaScript, so:

    \ becomes \\
    ' becomes \'
    " becomes \"
    

    Are there any other caveats I should be aware of before writing my conversion method?

    EDIT: Great answers so far. I'm copying some references from the answers into the question to help others in the future.

    Alex K. suggested using System.Web.HttpUtility.JavaScriptStringEncode, which I marked as the right answer for me because I'm using .NET 4. However, this function is not available in earlier .NET versions, so I'm adding some other resources here:

    CR  becomes \r   // a JavaScript string literal cannot contain a raw line break
    LF  becomes \n   // a JavaScript string literal cannot contain a raw line break
    TAB becomes \t
    
    Control characters must be hex-escaped (for example \x07 or \u0007)
    

    JP Richardson gave an interesting link explaining that JavaScript uses UCS-2, which is a subset of UTF-16, but how to encode this correctly is an entirely new question.

    LukeH, in the comments below, reminded me of the CR, LF and TAB characters, which in turn reminded me of the control characters (BEL, NUL, ACK, etc.).

  • Machado
    Machado about 12 years
    Nice! I'm using MVC3 with .NET 4, so this is very useful!
  • Machado
    Machado about 12 years
    So, how could we safely transform the C# UTF-16 string into UCS-2 in order to encode it the right way?
  • Matt R
    Matt R about 10 years
    What's the solution for users of .NET versions earlier than 4?
  • Casey
    Casey almost 9 years
    Why would you choose to do it this way?
  • Gqqnbig
    Gqqnbig about 7 years
    My string is not a URL, so why would I use UrlEncode? It seems silly, but I believe it will work.
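For .NET versions earlier than 4, where JavaScriptStringEncode does not exist (the case raised in the comments above), the escape rules collected in the question can be combined into a small hand-rolled encoder. This is a minimal sketch under those assumptions, not a drop-in replacement for the framework method: JsStringEncoder and Encode are hypothetical names, and the escape set is limited to the backslash, quotes, CR, LF, TAB and control characters discussed on this page, plus U+2028 and U+2029, which JavaScript treats as line terminators (raw occurrences were not allowed in string literals before ES2019). It does not protect against markup such as </script> when the result is placed inside an HTML script block.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper (not part of any library) for .NET versions without
// HttpUtility.JavaScriptStringEncode. Avoids newer C# syntax on purpose.
public static class JsStringEncoder
{
    // Escapes value for use inside a JavaScript string literal,
    // optionally wrapping the result in double quotes.
    public static string Encode(string value, bool addDoubleQuotes)
    {
        if (value == null) value = string.Empty;

        StringBuilder sb = new StringBuilder(value.Length + 8);
        if (addDoubleQuotes) sb.Append('"');

        foreach (char c in value) // one UTF-16 code unit at a time, as JavaScript sees it
        {
            switch (c)
            {
                case '\\': sb.Append("\\\\"); break;
                case '\'': sb.Append("\\'"); break;
                case '"': sb.Append("\\\""); break;
                case '\r': sb.Append("\\r"); break; // a raw CR would break the literal
                case '\n': sb.Append("\\n"); break; // a raw LF would break the literal
                case '\t': sb.Append("\\t"); break;
                case '\u2028': // LINE SEPARATOR and
                case '\u2029': // PARAGRAPH SEPARATOR are JavaScript line terminators
                    AppendUnicodeEscape(sb, c);
                    break;
                default:
                    if (c < 0x20 || c == 0x7F) // remaining control characters (and DEL)
                        AppendUnicodeEscape(sb, c);
                    else
                        sb.Append(c); // includes surrogate halves, kept in order
                    break;
            }
        }

        if (addDoubleQuotes) sb.Append('"');
        return sb.ToString();
    }

    // Writes c as a \uXXXX escape with four uppercase hex digits.
    private static void AppendUnicodeEscape(StringBuilder sb, char c)
    {
        sb.Append("\\u").Append(((int)c).ToString("X4", CultureInfo.InvariantCulture));
    }
}
```

With addDoubleQuotes set to true, the Solution 1 input @"aa\bb ""cc"" dd\tee" encodes to the same "aa\\bb \"cc\" dd\\tee" shown there. Characters outside the BMP pass through as their two surrogate code units, which JavaScript reassembles.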