Multicharacter literal in C and C++


Solution 1

I don't know how extensively this is used, but "implementation-defined" is a big red-flag to me. As far as I know, this could mean that the implementation could choose to ignore your character designations and just assign normal incrementing values if it wanted. It may do something "nicer", but you can't rely on that behavior across compilers (or even compiler versions). At least "goto" has predictable (if undesirable) behavior...

That's my 2c, anyway.

Edit: on "implementation-defined":

From Bjarne Stroustrup's C++ Glossary:

implementation defined - an aspect of C++'s semantics that is defined for each implementation rather than specified in the standard for every implementation. An example is the size of an int (which must be at least 16 bits but can be longer). Avoid implementation defined behavior whenever possible. See also: undefined. TC++PL C.2.

also...

undefined - an aspect of C++'s semantics for which no reasonable behavior is required. An example is dereferencing a pointer with the value zero. Avoid undefined behavior. See also: implementation defined. TC++PL C.2.

I believe this means the comment is correct: it should at least compile, although anything beyond that is not specified. Note the advice in the definition, also.
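If you decide to rely on one compiler's choice anyway, you can make that dependency explicit with a C11 compile-time check. A different implementation then fails at build time instead of silently producing other values. This is a minimal sketch. It assumes the GCC/Clang convention, where the first character lands in the most significant byte of a 32-bit int, and `MC_PROBE` is a hypothetical name.

```c
#include <limits.h>

/* Hypothetical probe constant; its value is implementation-defined. */
enum { MC_PROBE = 'abcd' };

/* Assumption: 8-bit chars, a 32-bit int, and the implementation packs the
   first character into the most significant byte (what GCC and Clang do).
   On anything else, compilation stops at one of these checks. */
_Static_assert(CHAR_BIT == 8 && sizeof(int) == 4,
               "expected 8-bit chars and a 32-bit int");
_Static_assert(MC_PROBE == (('a' << 24) | ('b' << 16) | ('c' << 8) | 'd'),
               "multicharacter literals are not packed as assumed");
```

This does not make the code portable. It only turns a silent change in behavior into a build error.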

Solution 2

It makes it easier to pick out values in a memory dump.

Example:

enum state { waiting, running, stopped };

vs.

enum state { waiting = 'wait', running = 'run.', stopped = 'stop' };

a memory dump after the following statement:

s = stopped;

might look like:

00 00 00 02 . . . .

in the first case, vs:

73 74 6F 70 s t o p

using multicharacter literals. (Of course, whether it reads 'stop' or 'pots' depends on byte ordering.)
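The following sketch prints an enum value's bytes the way a hex dump would, so you can reproduce the comparison. It assumes a compiler that accepts these literals (GCC and Clang do, with a -Wmultichar warning). It also assumes `enum state` occupies the same 4 bytes as an int. `dump_state` is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* The enum from the answer; the enumerator values are implementation-defined. */
enum state { waiting = 'wait', running = 'run.', stopped = 'stop' };

/* Copy the object representation of s into out (which must hold at least
   sizeof(enum state) bytes), then print it like a memory-dump line:
   hex bytes first, then the same bytes as printable ASCII or '.'. */
void dump_state(enum state s, unsigned char *out)
{
    memcpy(out, &s, sizeof s);
    for (size_t i = 0; i < sizeof s; i++)
        printf("%02X ", out[i]);
    for (size_t i = 0; i < sizeof s; i++)
        putchar(out[i] >= 32 && out[i] < 127 ? out[i] : '.');
    putchar('\n');
}
```

Built with GCC on a little-endian x86 machine, `dump_state(stopped, buf)` should print `70 6F 74 73 pots`. That is the reversed spelling mentioned in the parenthetical above.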

Solution 3

Four-character literals I've seen and used. They map to 4 bytes, i.e. one 32-bit word, which is very useful for debugging, as mentioned above. They can also be used as case labels in a switch on an int, which is nice.

Four-character literals are widely supported in practice (at least by GCC and VC++), although the actual values compiled may vary from one implementation to another.

But more than 4 chars? I wouldn't use them.

UPDATE: From the C4 page: "For our simple actions, we'll just provide an enumeration of some values, which is done in C4 by specifying four-character constants". So they are using four-character literals, as in my case.
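The switch/case point can be sketched with hypothetical action codes in the spirit of the C4 description. The names and codes below are made up for illustration and are not taken from C4. Each code is kept to exactly four characters so it fits in a 32-bit int.

```c
/* Hypothetical four-character action codes (not C4's actual values). */
enum action {
    ActionLeft     = 'left',
    ActionRight    = 'rght',
    ActionForward  = 'fwd ',
    ActionBackward = 'back'
};

/* The constants are plain ints, so they work directly as case labels. */
const char *action_name(int code)
{
    switch (code) {
    case ActionLeft:     return "left";
    case ActionRight:    return "right";
    case ActionForward:  return "forward";
    case ActionBackward: return "backward";
    default:             return "unknown";
    }
}
```

The switch has a useful side effect. If two codes ever evaluated to the same value on some implementation, the duplicate case labels would be rejected at compile time.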

Solution 4

In the C++ working draft N4527, section 2.13.3 [lex.ccon], paragraph 2:

... An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal, or an ordinary character literal containing a single c-char not representable in the execution character set, is conditionally-supported, has type int, and has an implementation-defined value.

Previous answers to your question pertained mostly to real machines that do support multicharacter literals. Specifically, on platforms where int is 4 bytes, four-byte multicharacter literals are fine and can be used for convenience, as in Ferruccio's memory-dump example. But there is no guarantee that this will work at all, or work the same way, on other platforms, so multicharacter literals should be avoided in portable programs.
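You can keep the readability without the implementation-defined value by building the integer yourself with shifts. This is a minimal sketch, and `FOURCC`, `TAG_STOP` and `fourcc_to_bytes` are hypothetical names. The result no longer depends on how the compiler packs literals, but it still assumes a fixed execution character set (e.g. ASCII) and the optional `uint32_t` type.

```c
#include <stdint.h>

/* Pack four characters with an explicitly chosen layout: first character
   in the most significant byte. Each argument goes through unsigned char
   first so a negative plain char cannot smear sign bits into other bytes. */
#define FOURCC(a, b, c, d)                  \
    (((uint32_t)(unsigned char)(a) << 24) | \
     ((uint32_t)(unsigned char)(b) << 16) | \
     ((uint32_t)(unsigned char)(c) << 8)  | \
      (uint32_t)(unsigned char)(d))

/* Still an integer constant expression, so it can initialize an enum. */
enum { TAG_STOP = FOURCC('s', 't', 'o', 'p') };

/* Serialize most significant byte first, so the stored bytes are the same
   on big- and little-endian hosts (unlike memcpy of the int itself). */
void fourcc_to_bytes(uint32_t tag, unsigned char out[4])
{
    out[0] = (unsigned char)(tag >> 24);
    out[1] = (unsigned char)(tag >> 16);
    out[2] = (unsigned char)(tag >> 8);
    out[3] = (unsigned char)tag;
}
```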

Solution 5

Multicharacter literals allow one to specify int values via the equivalent representation in characters. Useful for enums, FourCC codes and tags, and non-type template parameters. With a multicharacter literal, a FourCC code can be typed directly into the source, which is handy.

The implementation in gcc is described at https://gcc.gnu.org/onlinedocs/cpp/Implementation-defined-behavior.html . Note that the value is truncated to the size of the type int, so 'efgh' == 'abcdefgh' if your ints are 4 chars wide, although gcc will issue a warning on the literal that overflows.

Unfortunately, gcc warns about every multi-character literal (the -Wmultichar warning, which -Wno-multichar turns off), because their behavior is implementation-defined. As the truncation example above shows, whether two multi-character literals compare equal can change if you switch implementations.
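The truncation rule can be emulated to see which literals collide. The sketch below assumes a target with 8-bit chars and a 32-bit int, and `gcc_multichar` is a hypothetical function. It follows the algorithm in GCC's documentation. Each character is taken in turn, the running value is shifted left by 8 bits, and the new character is OR-ed in.

```c
#include <stdint.h>

/* Emulate GCC's documented evaluation of a multi-character constant for
   8-bit chars and a 32-bit int. Working in uint32_t makes the excess
   leading characters fall off the top, which is the truncation described
   above. GCC then gives the bit pattern type int (signed); returning
   uint32_t sidesteps that conversion. */
uint32_t gcc_multichar(const char *chars)
{
    uint32_t value = 0;
    for (; *chars != '\0'; chars++)
        value = (value << 8) | (unsigned char)*chars;
    return value;
}
```

By this rule, the question's `'forward'` and `'backward'` both evaluate to `'ward'`. `ActionForward` and `ActionBackward` in that enum would therefore be silently equal.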


Author: topright gamedev

Updated on January 14, 2021

Comments

  • topright gamedev
    topright gamedev over 3 years

    I didn't know that C and C++ allow multicharacter literals: not 'c' (of type int in C and char in C++), but 'tralivali' (of type int!)

    enum
    {
        ActionLeft = 'left',
        ActionRight = 'right',
        ActionForward = 'forward',
        ActionBackward = 'backward'
    };
    

    Standard says:

    C99 6.4.4.4p10: "The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined."

    I found they are widely used in the C4 engine. But I suppose they are not safe when it comes to platform-independent serialization. They can also be confusing because they look like strings. So what is the scope of usage of multicharacter literals; are they useful for anything? Are they in C++ just for compatibility with C code? Are they considered a bad feature, like the goto statement, or not?

    • Steve M
      Steve M over 13 years
      goto isn't a bad feature, at least in C. It's far more useful than multicharacter literals.
    • Martin York
      Martin York over 13 years
      Apple used to use them to identify the developer and application name. Basically they were a visual way of representing your developer ID: int id='MYCP'; Apple would tell you your developer ID as a character literal rather than just a boring old int.
    • Stuart Berg
      Stuart Berg over 12 years
      Multicharacter literals are used (abused?) in the boost::mpl::string sequence, if you're into that sort of thing.
    • johnnycrash
      johnnycrash about 8 years
      We use multichar literals to populate strings fast. To populate a string with "1234" we use *(int*)sz = '4321'. memcpy(sz, "1234", 4) is sometimes optimized to the same assembly that *(int*)sz = '4321' produces, but the optimizer doesn't always do this, so we force it using multichar literals.
    • L. F.
      L. F. almost 5 years
      In C++, multicharacter literals are conditionally-supported, so your code may fail to compile. If they are supported, they have an implementation-defined value, so an implementation could choose to assign all multicharacter literals the value 0, breaking your code.
  • Armen Tsirunyan
    Armen Tsirunyan over 13 years
    As far as I understand, it is not allowed to fail to compile.
  • Ferruccio
    Ferruccio over 13 years
    You're fine as long as you don't rely on byte ordering or try to serialize the values.
  • pmg
    pmg over 13 years
    Well, I've heard that on Cray machines, "sizeof (char) == sizeof (int)" is true. I have absolutely no idea what a C compiler might do to a multicharacter literal on one of those ...
  • jv42
    jv42 over 13 years
    Nope, I haven't encountered one of these beasts. The code I've used was for 32-bit x86 Windows PCs.
  • topright gamedev
    topright gamedev over 13 years
    I totally agree about the red flag. My interest is mostly theoretical.
  • Stuart Berg
    Stuart Berg over 12 years
    The reference to "undefined behavior" here is irrelevant. "Implementation defined" and "undefined" are two different terms with two different meanings. I don't think that multicharacter literals fall under the nasal demons category. I think @Ferruccio is correct: you can use the feature as long as you don't care how the feature is implemented under the hood.
  • ignis
    ignis almost 12 years
    @Ferruccio, superbatfish. "The implementation could choose to ignore your character designations and just assign normal incrementing values if it wanted." (cit. Nick) You're only fine if your compiler's documentation mandates a specific behavior.
  • Ferruccio
    Ferruccio almost 12 years
    According to the Cray C & C++ Reference Manual (docs.cray.com/books/S-2179-52/html-S-2179-52/…), multicharacter literals work the same way (8 bits/char even though the char type itself is bigger).
  • legends2k
    legends2k over 10 years
    @Ferruccio: It's good to know such tricks. Ordinary programming knowledge is common, and anyone can look up the standard when some grammar is disputed, whereas tips like these are learned only through actual work experience.
  • bobobobo
    bobobobo over 10 years
    Note that despite this "implementation-definedness", equality should still hold (int mb = 'test'; if (mb == 'test') ...), as long as the code is running on the same machine.
  • legends2k
    legends2k over 10 years
    @Ferruccio: in the first case s would have the value 00 00 00 02, since enumerator values start at 0 unless overridden.
  • Damian Yerrick
    Damian Yerrick over 8 years
    @pmg Does Cray support POSIX?
  • pmg
    pmg over 8 years
    @tepples: I don't know, but according to an article on Wikipedia I think it does.
  • Damian Yerrick
    Damian Yerrick over 8 years
    @pmg Because char is always 8-bit in POSIX.
  • bit2shift
    bit2shift over 6 years
    Why not make it portable by arranging the characters the same way they're stored in strings? I guess whoever originally wrote that section of the standard was short-sighted and did not see the obvious solution.
  • Syroot
    Syroot almost 4 years
    Welcome to "implementation defined" behavior :)
  • Jan Hošek
    Jan Hošek over 3 years
    This actually may be more desirable since when afterward read byte-by-byte, the characters are recovered in original order.
  • mcabreb
    mcabreb over 3 years
    Little "endian".
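The trick in johnnycrash's comment relies on two assumptions: GCC-style literal packing and a little-endian target. Under those assumptions, the reversed literal '4321' is exactly undone by the little-endian byte order when it is stored. The sketch below uses the hypothetical helpers `fill_1234` and `fill_1234_portable`. It copies through memcpy instead of writing through an `int *`, which sidesteps the alignment and aliasing concerns of that cast.

```c
#include <string.h>

/* '4321' packs '4' into the most significant byte (GCC/Clang behavior).
   A little-endian store writes the least significant byte ('1') first,
   so dst receives "1234". On a big-endian target it would be "4321".
   dst must have room for sizeof(int) bytes (4 assumed here). */
void fill_1234(char *dst)
{
    int packed = '4321';
    memcpy(dst, &packed, sizeof packed);
}

/* The portable spelling the comment compares against. */
void fill_1234_portable(char *dst)
{
    memcpy(dst, "1234", 4);
}
```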