Rules for C++ string literals escape character


Solution 1

Control characters:

(Hex codes assume an ASCII-compatible character encoding.)

  • \a = \x07 = alert (bell)
  • \b = \x08 = backspace
  • \t = \x09 = horizontal tab
  • \n = \x0A = newline (or line feed)
  • \v = \x0B = vertical tab
  • \f = \x0C = form feed
  • \r = \x0D = carriage return
  • \e = \x1B = escape (non-standard GCC extension)

Punctuation characters:

  • \" = quotation mark (backslash not required for '"')
  • \' = apostrophe (backslash not required for "'")
  • \? = question mark (used to avoid trigraphs)
  • \\ = backslash

Numeric character references:

  • \ + up to 3 octal digits
  • \x + any number of hex digits
  • \u + 4 hex digits (Unicode BMP, new in C++11)
  • \U + 8 hex digits (Unicode astral planes, new in C++11)

\0 = \00 = \000 = octal escape for null character

If you do want an actual digit character after a \0, then yes, I recommend string concatenation. Note that the whitespace between the parts of the literal is optional, so you can write "\0""0".

Solution 2

\0 will be interpreted as the start of a longer octal escape sequence if it is followed by other octal digits (0-7), so \00 will be interpreted as a single character. (\0 on its own is itself an octal escape sequence, in C and C++ alike.)

The way you're doing it:

std::string ("0\0" "0", 3)  // String concatenation 

works because this overload of the constructor takes a pointer plus an explicit length, so it copies exactly 3 characters, embedded null included. If you pass "0\0" "0" on its own as a const char*, it is treated as a C string and only the characters before the first null are copied.

Here is a list of escape sequences.

Solution 3

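\a is the bell/alert character, which on some systems triggers a sound. \nnn represents the character whose code is the octal number nnn (one to three octal digits). \0 is the shortest case of this and gives the null character, but any octal digits that follow it are absorbed into the same escape: "\012" is the single character with octal value 12 (a newline in ASCII), not a null followed by "12".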

To answer your original question, you could escape your '0' characters as well, as:

std::string ("\060\000\060", 3);

(since an ASCII '0' is 60 in octal)

The MSDN documentation has a pretty detailed article on this, as does cppreference.

Solution 4

I left something like this as a comment, but I feel it probably needs more visibility as none of the answers mention this method:

The method I now prefer for initializing a std::string with non-printing characters in general (and embedded null characters in particular) is to use the C++11 feature of initializer lists.

std::string const str({'\0', '6', '\a', 'H', '\t'});

I am not required to perform error-prone manual counting of the number of characters that I am using, so if I later want to insert a '\013' somewhere in the middle, I can, and all of my code will still work. It also completely sidesteps the problem of accidentally forming the wrong escape sequence.

The only downside is all of those extra ' and , characters.

Author: David Stone

Member of C++ Standardization Committee, where I chair the Modules Study Group (SG2) and vice chair the Evolution Working Group (EWG).

Updated on July 09, 2022

Comments

  • David Stone
    David Stone almost 2 years

    What are the rules for the escape character \ in string literals? Is there a list of all the characters that are escaped?

    In particular, when I use \ in a string literal in gedit, and follow it by any three numbers, it colors them differently.

    I was trying to create a std::string constructed from a literal with the character 0 followed by the null character (\0), followed by the character 0. However, the syntax highlighting alerted me that maybe this would create something like the character 0 followed by the null character (\00, aka \0), which is to say, only two characters.

    For the solution to just this one problem, is this the best way to do it:

    std::string ("0\0" "0", 3)  // String concatenation 
    

    And is there some reference for what the escape character does in string literals in general? What is '\a', for instance?

    • MPelletier
      MPelletier about 12 years
      Related, on how to escape an escape sequence. The best solution is to use concatenation as you had.
    • MPelletier
      MPelletier about 12 years
      If you need a single ` just use \`.
    • David Stone
      David Stone about 12 years
      It looks like I can also use the initializer list syntax: std::string { '0', 0, '0' };
    • David Stone
      David Stone over 11 years
      Not only can I use the initializer list syntax, I now highly recommend it over any other method of constructing a string that requires you to specify a size or uses escaped characters. Consider the subtle undefined behavior outlined in stackoverflow.com/questions/164168/…
    • MPelletier
      MPelletier over 11 years
      I realize now my comment at 1:32 is completely obfuscated... I have no idea what I meant...
  • mgiuffrida
    mgiuffrida about 12 years
    That example uses the constructor string (const char * s), which treats s like a C string. OP's example uses string (const char * s, size_t n), which treats it like an array of characters.
  • Rhubbarb
    Rhubbarb over 11 years
    In the case of \x, the hex digits will be read 'greedily' until the first non-hex digit (that is, not limited to 2 as you might expect, and as some syntax highlighters do assume). You can use the @dan04 trick of splitting strings to mark the end of the hex: "\x0020" "FeedDadBeer" rather than "\x0020FeedDadBeer".
  • eggyal
    eggyal over 9 years
    So then what is represented by \x followed by an odd number of hexits? One assumes that for an even number, each hexit represents a nibble of memory from highest-to-lowest order—thus \x5f is 01011111 rather than 11110101; but then does that mean \x5 is 01010000 rather than 00000101? And then what about \x5f5? Is that 01011111 01010000 or 01011111 00000101?
  • Stijn Sanders
    Stijn Sanders about 8 years
    I don't know if this would warrant a question of its own, but I've received string data from some source with "\e" in it. I don't see it listed in any reference; could it be equivalent to \x1B?
  • dan04
    dan04 about 8 years
    @StijnSanders: It's not in the C or C++ standard, but some compilers use \e to indicate the escape character \x1B. I have added it to my list.
  • Rick
    Rick about 4 years
    Could you give a reference about \u and \U usage? They work and I am interested in them, but C++ Primer (5th edition) doesn't say anything about them. I can only find one or two Q&As on SO that discuss them.