Rules for C++ string literals escape character
Solution 1
Control characters:
(Hex codes assume an ASCII-compatible character encoding.)
- \a = \x07 = alert (bell)
- \b = \x08 = backspace
- \t = \x09 = horizontal tab
- \n = \x0A = newline (or line feed)
- \v = \x0B = vertical tab
- \f = \x0C = form feed
- \r = \x0D = carriage return
- \e = \x1B = escape (non-standard GCC extension)
Punctuation characters:
- \" = quotation mark (backslash not required for '"')
- \' = apostrophe (backslash not required for "'")
- \? = question mark (used to avoid trigraphs)
- \\ = backslash
Numeric character references:
- \ + up to 3 octal digits
- \x + any number of hex digits
- \u + 4 hex digits (Unicode BMP, new in C++11)
- \U + 8 hex digits (Unicode astral planes, new in C++11)
\0 = \00 = \000 = octal escape for null character
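To make the digit-counting rules above concrete, here is a small sketch that checks literal lengths at compile time. The helper name literal_length is made up for this example and is not a standard facility; the char checks assume an ASCII-compatible encoding and a C++11 compiler.

```cpp
#include <cstddef>

// Hypothetical helper: number of characters in a string literal, excluding
// the terminating null character that the compiler appends.
template <typename CharT, std::size_t N>
constexpr std::size_t literal_length(const CharT (&)[N]) {
    return N - 1;
}

// An octal escape consumes at most three octal digits.
static_assert(literal_length("\0") == 1, "one null character");
static_assert(literal_length("\00") == 1, "still one null character");
static_assert(literal_length("\000") == 1, "still one null character");
static_assert(literal_length("\0000") == 2, "null character, then the digit '0'");

// A hex escape consumes every hex digit that follows it.
static_assert(literal_length("\x41") == 1, "one character with value 0x41");
static_assert(literal_length("\x041") == 1, "the leading zero belongs to the same escape");
static_assert(literal_length("\x41G") == 2, "'G' is not a hex digit, so it ends the escape");

// \u takes exactly 4 hex digits and \U exactly 8.
static_assert(literal_length(u"\u00E9") == 1, "one UTF-16 code unit");
static_assert(literal_length(U"\U0001F600") == 1, "one UTF-32 code unit");
static_assert(literal_length(u"\U0001F600") == 2, "a UTF-16 surrogate pair");
```

If the file compiles, every claim in the static_assert messages holds for that compiler.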
If you do want an actual digit character after a \0, then yes, I recommend string concatenation. Note that the whitespace between the parts of the literal is optional, so you can write "\0""0".
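A rough sketch of why splitting the literal helps: escape sequences are processed in each piece before adjacent literals are joined, so the digit in the second piece can no longer be absorbed into the \0 escape. The function names below are invented for the example, and using sizeof on a named array to avoid counting by hand is my own choice, not something the answer prescribes.

```cpp
#include <string>

// Split literal: the array holds '0', '\0', '0' plus the implicit terminator.
std::string zero_null_zero() {
    static const char text[] = "0\0" "0";
    // sizeof text is 4 here; subtracting 1 drops the implicit terminator.
    return std::string(text, sizeof text - 1);
}

// Unsplit literal: \00 is a single octal escape, so only '0', '\0' remain.
std::string zero_null() {
    static const char text[] = "0\00";
    return std::string(text, sizeof text - 1);
}
```

Under these assumptions zero_null_zero() yields three characters and zero_null() yields two.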
Solution 2
\0 is interpreted as an octal escape sequence, so any octal digits that follow it (up to three in total) become part of the same escape: \00 is a single character. (A bare \0 is technically an octal escape sequence as well, in C and in C++.)
The way you're doing it:
std::string ("0\0" "0", 3) // String concatenation
works because this version of the constructor takes a pointer to a character array plus an explicit length, so all three characters are copied; if you try to just pass "0\0" "0" as a const char*, it will treat it as a C string and only copy everything up until the null character.
Here is a list of escape sequences.
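The difference between the two constructors can be sketched like this. The function names are illustrative only; the constructors themselves are the standard std::string ones.

```cpp
#include <string>

// Counted constructor: copies exactly 3 characters, embedded null included.
std::string with_length() {
    return std::string("0\0" "0", 3);
}

// C-string constructor: stops copying at the first null character.
std::string without_length() {
    return std::string("0\0" "0");
}
```

Here with_length() has size 3, while without_length() is just "0".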
Solution 3
\a is the bell/alert character, which on some systems triggers a sound. \nnn represents an arbitrary ASCII character by its octal code. \0 is the same kind of escape with the value zero, so it represents the null character; any octal digits that directly follow it are read as part of the escape.
To answer your original question, you could escape your '0' characters as well, as:
std::string ("\060\000\060", 3);
(since an ASCII '0' is 60 in octal)
The MSDN documentation has a pretty detailed article on this, as does cppreference.
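As a quick check of the fully escaped form, the sketch below assumes an ASCII-compatible execution character set (the static_assert fails to compile otherwise). The function name is invented for the example.

```cpp
#include <string>

// Octal 60 is decimal 48, the ASCII code of the digit '0'.
static_assert('\060' == '0', "this sketch assumes an ASCII-compatible encoding");

// Each escape already uses all three octal digits, so no following digit
// can be absorbed into it.
std::string octal_escaped() {
    return std::string("\060\000\060", 3);
}
```

The result should compare equal to the string built from the split literal, std::string("0\0" "0", 3).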
Solution 4
I left something like this as a comment, but I feel it probably needs more visibility as none of the answers mention this method:
The method I now prefer for initializing a std::string with non-printing characters in general (and embedded null characters in particular) is to use the C++11 feature of initializer lists.
std::string const str({'\0', '6', '\a', 'H', '\t'});
I am not required to perform error-prone manual counting of the number of characters that I am using, so that if later on I want to insert a '\013' in the middle somewhere, I can and all of my code will still work. It also completely sidesteps any issues of using the wrong escape sequence by accident.
The only downside is all of those extra ' and , characters.
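A small sketch of how the length follows the list automatically, assuming a C++11 compiler. The function names are made up for illustration.

```cpp
#include <string>

// The string from the original question: '0', null character, '0'.
std::string from_initializer_list() {
    return std::string{'0', '\0', '0'};
}

// Inserting another character needs no change to any length argument.
std::string with_inserted_character() {
    return std::string{'0', '\0', '\013', '0'};
}

// The example from this answer, wrapped in a function.
std::string answer_example() {
    return std::string({'\0', '6', '\a', 'H', '\t'});
}
```

The sizes come out as 3, 4 and 5 respectively, with nothing counted by hand.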
David Stone
Member of C++ Standardization Committee, where I chair the Modules Study Group (SG2) and vice chair the Evolution Working Group (EWG).
Updated on July 09, 2022
Comments
-
David Stone almost 2 years
What are the rules for the escape character \ in string literals? Is there a list of all the characters that are escaped?
In particular, when I use \ in a string literal in gedit, and follow it by any three numbers, it colors them differently.
I was trying to create a std::string constructed from a literal with the character 0 followed by the null character (\0), followed by the character 0. However, the syntax highlighting alerted me that maybe this would create something like the character 0 followed by the null character (\00, aka \0), which is to say, only two characters.
For the solution to just this one problem, is this the best way to do it:
std::string ("0\0" "0", 3) // String concatenation
And is there some reference for what the escape character does in string literals in general? What is '\a', for instance?
-
MPelletier about 12 years
Related, on how to escape an escape sequence. The best solution is to use concatenation as you had.
-
MPelletier about 12 years
If you need a single ` just use \`.
-
David Stone about 12 years
It looks like I can also use the initializer list syntax:
std::string { '0', 0, '0' };
-
David Stone over 11 years
Not only can I use the initializer list syntax, I now highly recommend it over any other method of constructing a string that requires you to specify a size or uses escaped characters. Consider the subtle undefined behavior outlined in stackoverflow.com/questions/164168/…
-
MPelletier over 11 years
I realize now my comment at 1:32 is completely obfuscated... I have no idea what I meant...
-
mgiuffrida about 12 years
That example uses the constructor string (const char * s), which treats s like a C string. OP's example uses string (const char * s, size_t n), which treats it like an array of characters.
-
Rhubbarb over 11 years
In the case of \x, the hex digits will be read 'greedily' until the first non-hex digit (that is, not limited to 2 as you might expect, and as some syntax highlighters do assume). You can use the @dan04 trick of splitting strings to mark the end of the hex: "\x0020" "FeedDadBeer" rather than "\x0020FeedDadBeer".
-
eggyal over 9 years
So then what is represented by \x followed by an odd number of hexits? One assumes that for an even number, each hexit represents a nibble of memory from highest-to-lowest order, thus \x5f is 01011111 rather than 11110101; but then does that mean \x5 is 01010000 rather than 00000101? And then what about \x5f5? Is that 01011111 01010000 or 01011111 00000101?
-
Stijn Sanders about 8 years
I don't know if this would validate a question of its own, but I've received string-data from some source with "\e" in it. I don't see it listed on any reference, could it be equivalent to \x1B?
-
dan04 about 8 years
@StijnSanders: It's not in the C or C++ standard, but some compilers use \e to indicate the escape character \x1B. I have added it to my list.
-
Rick about 4 years
Could you give a reference about \u and \U usage? It works and I am interested in it while C++ Primer 5th doesn't say anything about them. I can only find one or two Q&As talking about them on SO.