C preprocessor, recursive macros

Solution 1

The answer depends on how you interpret the language standard. For example, under mcpp, a preprocessor implementation that strictly conforms to the text of the standard, the second invocation, N(0), also yields CAT(x, y) (extra newlines have been removed from the output):

C:\dev>mcpp -W0 stubby.cpp
#line 1 "C:/dev/stubby.cpp"
        CAT(x, y) ;
        CAT(x, y) ;
C:\dev>

There is a known inconsistency in the C++ language specification (the same inconsistency is present in the C specification, though I don't know where the defect list is for C). The specification states that the final CAT(x, y) should not be macro-replaced. The intent may have been that it should be macro-replaced.

To quote the linked defect report:

Back in the 1980's it was understood by several WG14 people that there were tiny differences between the "non-replacement" verbiage and the attempts to produce pseudo-code.

The committee's decision was that no realistic programs "in the wild" would venture into this area, and trying to reduce the uncertainties is not worth the risk of changing conformance status of implementations or programs.


So, why do we get different behavior for M(0) than for N(0) with most common preprocessor implementations? In the replacement of M, the second invocation of CAT consists entirely of tokens resulting from the first invocation of CAT:

M(0) 
CAT(M_, 0)
CAT_I(M_, 0)
M_0
CAT(x, y)

If M_0 were instead defined to be replaced by CAT(M_, 0), replacement would recurse infinitely, because CAT(M_, 0) yields M_0 again. The preprocessor specification explicitly prohibits this "strictly recursive" replacement by stopping macro replacement, so CAT(x, y) is not macro-replaced.

However, in the replacement of N, the second invocation of CAT consists only partially of tokens resulting from the first invocation of CAT:

N(0)
CAT(N_, 0)       ()
CAT_I(N_, 0)     ()
N_0              ()
CAT(x, y)
CAT_I(x, y)
xy

Here the second invocation of CAT is formed partially from tokens resulting from the first invocation of CAT and partially from other tokens, namely the () from the replacement list of N. The replacement is not strictly recursive and thus when the second invocation of CAT is replaced, it cannot yield infinite recursion.

Solution 2

Just follow the sequence:

1.)

M(0); //  expands to CAT(x, y) TRUE 
CAT(M_, 0)
CAT_I(M_, 0)
M_0
CAT(x, y)

2.)

N(0); //  expands to xy TRUE
CAT(N_, 0)()
CAT_I(N_, 0)()
N_0()
CAT(x, y)
CAT_I(x, y)
xy

You only need to recursively replace the macros.

Notes on ## preprocessor operator: Two arguments can be 'glued' together using ## preprocessor operator; this allows two tokens to be concatenated in the preprocessed code.

Unlike standard macro expansion, traditional macro expansion has no provision to prevent recursion. If an object-like macro appears unquoted in its replacement text, it will be replaced again during the rescan pass, and so on ad infinitum. GCC detects when it is expanding recursive macros, emits an error message, and continues after the offending macro invocation. (gcc online doc)

Author: imre

Updated on March 10, 2020

Comments

  • imre
    imre about 4 years

    Why do M(0) and N(0) have different results?

    #define CAT_I(a, b) a ## b
    #define CAT(a, b) CAT_I(a, b)
    
    #define M_0 CAT(x, y)
    #define M_1 whatever_else
    #define M(a) CAT(M_, a)
    M(0);       //  expands to CAT(x, y)
    
    #define N_0() CAT(x, y)
    #define N_1() whatever_else
    #define N(a) CAT(N_, a)()
    N(0);       //  expands to xy
    
    • imre
      imre about 13 years
      I don't really want to achieve anything, just noticed this while working on something, and I'm curious about the reasons. It annoys me when I don't understand something :) .
  • imre
    imre about 13 years
    I know. And correspondingly, N_0 is defined as a function-style (0-argument) macro. And for some reason, that seems to make a difference in recursive evaluations, but I don't know exactly why; that's my question.
  • imre
    imre about 13 years
    Interesting... The preprocessors in VC++ and the online Comeau compiler both expand N(0) to "xy".
  • imre
    imre about 13 years
    Umm... I still don't get it. The two sequences both reach the same CAT(x, y) -- so why stop there in one case but not the other?
  • imre
    imre about 13 years
    Also, is it somehow possible to work around this recursion limitation and make the last CAT evaluate? (Besides defining another alternative CAT?)
  • zwol
    zwol about 13 years
    I have a dim memory to the effect that because the () supplied to N_0 came from outside any macro expansion, that counts as a new macro expansion, so the "blue paint" comes off CAT() and it can be expanded once more. So this might be a bug in mcpp. FWIW gcc agrees with Comeau and VC++.
  • James McNellis
    James McNellis about 13 years
    The specification of the recursive replacement rules is absurdly convoluted; that's what happens when you try to write a complete, English-language specification for the behavior of a program after the program has been written and modified over a period of two decades :-). There is a long, detailed discussion of the issue in the documentation for the MCPP conformance suite, which is included in the source distributions of mcpp.
  • imre
    imre about 13 years
    Good to know about gcc -- right now I'm more interested in actually working code than standard compliance (although the only place where my code actually depends on this is some horrible dllexport/import stuff). So thanks for all the info, James and Zack.
  • James McNellis
    James McNellis about 13 years
    @imre: I'm interested to know why something so complex is needed for a dllimport/dllexport declspec. The idiom is to use a single macro (e.g. MYPROJECT_EXPORT) that is conditionally set to one of the two depending on whether "My Project" is being built.
  • Cacho Santa
    Cacho Santa about 13 years
    I think the recursion here depends on the interpretation of the standard, as James McNellis said. Nice question, imre.
  • James McNellis
    James McNellis about 13 years
    @imre: In the case of M(0), the second CAT(...) invocation results entirely from the first CAT(...) invocation, thus it is a strictly recursive call. In the case of N(0), the second CAT(...) invocation results only partially from the first CAT(...) invocation and partially from other tokens that appear after that (the () in the replacement list of N). Thus, it is not entirely recursive.
  • imre
    imre about 13 years
    @James McNellis: The explanation would be too long for a comment here (involves custom RTTI macros, nested classes, and class templates, all in the context of dll-exporting), are you interested enough to receive an email? :)
  • James McNellis
    James McNellis about 13 years
    @imre: Nope. If you know what you're doing, that's good enough for me :-) Best of luck, though.
  • Jim Balter
    Jim Balter about 13 years
    You have quoted the wrong part of the defect report. The relevant quote is "The original intent of the J11 committee in this text was that the result should be 42, as demonstrated by the original pseudo-code description of the replacement algorithm provided by Dave Prosser, its author. The English description, however, omits some of the subtleties of the pseudo-code and thus arguably gives an incorrect answer for this case." and the operative word there is "arguably". Since that is only "arguable" but the argument goes against the intent, the argument is wrong, and so is mcpp.
  • Jim Balter
    Jim Balter about 13 years
    BTW, it's important to note that mcpp scores perfectly on the CPP validation suite ... written by the author of mcpp. So all that score shows is that mcpp does what its author thinks it should; it does not show that it is actually faithful to the C standard.
  • James McNellis
    James McNellis about 13 years
    @Jim: I would recommend reading the mcpp test suite documentation, which contains an eight page discussion on the subject and explains the contradictions in the specifications and the manner in which the specifications have changed. In C99 the behavior is explicitly unspecified. A conforming implementation may replace the second invocation of CAT or it may not.
