What are the applications of the ## preprocessor operator and gotchas to consider?
Solution 1
CrashRpt: Using ## to convert macro multi-byte strings to Unicode
An interesting usage in CrashRpt (crash reporting library) is the following:
#define WIDEN2(x) L ## x
#define WIDEN(x) WIDEN2(x)
//Note you need a WIDEN2 so that __DATE__ will evaluate first.
Here they want a wide-character (wchar_t) string, which is two bytes per character on Windows, instead of a one-byte-per-char string. This may look pointless, but they do it for a good reason.
std::wstring BuildDate = std::wstring(WIDEN(__DATE__)) + L" " + WIDEN(__TIME__);
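They use it with the predefined macros __DATE__ and __TIME__, which expand to narrow string literals holding the compilation date and time. Putting L directly next to __DATE__ (that is, L__DATE__) would give you a compile error, because the preprocessor would see the single, undefined identifier L__DATE__.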
Windows: Using ## for generic Unicode or multi-byte strings
Windows uses something like the following:
#ifdef _UNICODE
#define _T(x) L ## x
#else
#define _T(x) x
#endif
And _T is used everywhere in the code to wrap string literals.
Various libraries, using it for clean accessor and modifier names:
I've also seen it used in code to define accessors and modifiers:
#define MYLIB_ACCESSOR(name) (Get##name)
#define MYLIB_MODIFIER(name) (Set##name)
Likewise you can use this same method for any other types of clever name creation.
Various libraries, using it to make several variable declarations at once:
#define CREATE_3_VARS(name) name##1, name##2, name##3
int CREATE_3_VARS(myInts);
myInts1 = 13;
myInts2 = 19;
myInts3 = 77;
Solution 2
One thing to be aware of when you're using the token-paste ('##') or stringizing ('#') preprocessing operators is that you have to use an extra level of indirection for them to work properly in all cases.
If you don't do this and the items passed to the token-pasting operator are macros themselves, you'll get results that are probably not what you want:
#include <stdio.h>
#define STRINGIFY2( x) #x
#define STRINGIFY(x) STRINGIFY2(x)
#define PASTE2( a, b) a##b
#define PASTE( a, b) PASTE2( a, b)
#define BAD_PASTE(x,y) x##y
#define BAD_STRINGIFY(x) #x
#define SOME_MACRO function_name
int main()
{
    printf( "buggy results:\n");
    printf( "%s\n", STRINGIFY( BAD_PASTE( SOME_MACRO, __LINE__)));
    printf( "%s\n", BAD_STRINGIFY( BAD_PASTE( SOME_MACRO, __LINE__)));
    printf( "%s\n", BAD_STRINGIFY( PASTE( SOME_MACRO, __LINE__)));
    printf( "\n" "desired result:\n");
    printf( "%s\n", STRINGIFY( PASTE( SOME_MACRO, __LINE__)));
}
The output:
buggy results:
SOME_MACRO__LINE__
BAD_PASTE( SOME_MACRO, __LINE__)
PASTE( SOME_MACRO, __LINE__)
desired result:
function_name21
Solution 3
Here's a gotcha that I ran into when upgrading to a new version of a compiler:
Unnecessary use of the token-pasting operator (##) is non-portable and may generate undesired whitespace, warnings, or errors.
When the result of the token-pasting operator is not a valid preprocessing token, the behavior is undefined according to the C and C++ standards, so the operator is at best unnecessary and possibly harmful.
For example, one might try to build string literals at compile time using the token-pasting operator:
#define STRINGIFY(x) #x
#define PLUS(a, b) STRINGIFY(a##+##b)
#define NS(a, b) STRINGIFY(a##::##b)
printf("%s %s\n", PLUS(1,2), NS(std,vector));
On some compilers, this will output the expected result:
1+2 std::vector
On other compilers, this will include undesired whitespace:
1 + 2 std :: vector
Fairly modern versions of GCC (>=3.3 or so) will fail to compile this code:
foo.cpp:16:1: pasting "1" and "+" does not give a valid preprocessing token
foo.cpp:16:1: pasting "+" and "2" does not give a valid preprocessing token
foo.cpp:16:1: pasting "std" and "::" does not give a valid preprocessing token
foo.cpp:16:1: pasting "::" and "vector" does not give a valid preprocessing token
The solution is to omit the token-pasting operator when concatenating preprocessor tokens to C/C++ operators:
#define STRINGIFY(x) #x
#define PLUS(a, b) STRINGIFY(a+b)
#define NS(a, b) STRINGIFY(a::b)
printf("%s %s\n", PLUS(1,2), NS(std,vector));
The GCC CPP documentation chapter on concatenation has more useful information on the token-pasting operator.
Solution 4
This is useful in all kinds of situations in order not to repeat yourself needlessly. The following is an example from the Emacs source code. We would like to load a number of functions from a library. The function "foo" should be assigned to fn_foo, and so on. We define the following macro:
#define LOAD_IMGLIB_FN(lib,func) { \
fn_##func = (void *) GetProcAddress (lib, #func); \
if (!fn_##func) return 0; \
}
We can then use it:
LOAD_IMGLIB_FN (library, XpmFreeAttributes);
LOAD_IMGLIB_FN (library, XpmCreateImageFromBuffer);
LOAD_IMGLIB_FN (library, XpmReadFileToImage);
LOAD_IMGLIB_FN (library, XImageFree);
The benefit is not having to write both fn_XpmFreeAttributes and "XpmFreeAttributes" (and risk misspelling one of them).
Solution 5
A previous question on Stack Overflow asked for a smooth method of generating string representations for enumeration constants without a lot of error-prone retyping.
My answer to that question showed how applying a little preprocessor magic lets you define your enumeration like this (for example):
ENUM_BEGIN( Color )
ENUM(RED),
ENUM(GREEN),
ENUM(BLUE)
ENUM_END( Color )
The benefit is that the macro expansion not only defines the enumeration (in a .h file), it also defines a matching array of strings (in a .c file):
const char *ColorStringTable[] =
{
"RED",
"GREEN",
"BLUE"
};
The name of the string table comes from pasting the macro parameter (i.e. Color) to StringTable using the ## operator. Applications (tricks?) like this are where the # and ## operators are invaluable.
John Rudy
Updated on February 10, 2020
Comments
-
John Rudy over 4 years
As mentioned in many of my previous questions, I'm working through K&R, and am currently into the preprocessor. One of the more interesting things, something I never knew before from any of my prior attempts to learn C, is the ## preprocessor operator. According to K&R:
The preprocessor operator ## provides a way to concatenate actual arguments during macro expansion. If a parameter in the replacement text is adjacent to a ##, the parameter is replaced by the actual argument, the ## and surrounding white space are removed, and the result is re-scanned. For example, the macro paste concatenates its two arguments:
#define paste(front, back) front ## back
so paste(name, 1) creates the token name1.
How and why would someone use this in the real world? What are practical examples of its use, and are there gotchas to consider?
-
bk1e over 15 years
What were you trying to do with that? It would work just as well without the "##", since there is no need to token-paste "," to "msg". Were you trying to stringify msg? Also, __FILE__ and __LINE__ must be in uppercase, not lowercase.
-
Michael Burr over 15 years
Thanks - I wasn't aware of this (but then I don't use these preprocessing operators too much...).
-
ya23 over 15 years
You're right indeed. I need to find the original script to see how ## was used. Shame on me, no cookie today!
-
Mark Ransom about 15 years
It's called the "token pasting" operator for a reason - the intent is to end up with a single token when you're done. Nice writeup.
-
Adam Davis over 12 years
For an explanation of this preprocessor behavior, see stackoverflow.com/questions/8231966/…
-
HELP PLZ almost 10 years
@MichaelBurr I was reading your answer and I have a doubt. How come this __LINE__ is printing the line number?
-
Michael Burr almost 10 years
@AbhimanyuAryan: I'm not sure if this is what you're asking, but __LINE__ is a special macro name that is replaced by the preprocessor with the current line number of the source file.
-
alecov almost 10 years
When the result of the token-pasting operator is not a valid preprocessor token, the behavior is undefined.
-
PJTraill almost 9 years
I take it you mean by 'non-standard' that the compiler did not do string pasting but did do token pasting, or would it have worked even without ##?
-
user666412 over 8 years
Since you can concatenate string literals at compile time, you could reduce the BuildDate expression to
std::wstring BuildDate = WIDEN(__DATE__) L" " WIDEN(__TIME__);
and implicitly build the whole string at once.
-
Kerrek SB over 7 years
Language changes like hexadecimal floats, or (in C++) digit separators and user-defined literals, continually change what constitutes a "valid preprocessing token", so please never abuse it like that! If you have to separate (language proper) tokens, please spell them as two separate tokens, and don't rely on accidental interactions between the preprocessor grammar and the language proper.
-
Antonio about 7 years
It would be cool if language specifications could be cited/linked, as in here