Is C++ context-free or context-sensitive?

66,701

Solution 1

Below is my (current) favorite demonstration of why parsing C++ is (probably) Turing-complete, since it shows a program which is syntactically correct if and only if a given integer is prime.

So I assert that C++ is neither context-free nor context-sensitive.

If you allow arbitrary symbol sequences on both sides of any production, you produce an Type-0 grammar ("unrestricted") in the Chomsky hierarchy, which is more powerful than a context-sensitive grammar; unrestricted grammars are Turing-complete. A context-sensitive (Type-1) grammar allows multiple symbols of context on the left hand side of a production, but the same context must appear on the right hand side of the production (hence the name "context-sensitive"). [1] Context-sensitive grammars are equivalent to linear-bounded Turing machines.

In the example program, the prime computation could be performed by a linear-bounded Turing machine, so it does not quite prove Turing equivalence, but the important part is that the parser needs to perform the computation in order to perform syntactic analysis. It could have been any computation expressible as a template instantiation and there is every reason to believe that C++ template instantiation is Turing-complete. See, for example, Todd L. Veldhuizen's 2003 paper.

Regardless, C++ can be parsed by a computer, so it could certainly be parsed by a Turing machine. Consequently, an unrestricted grammar could recognize it. Actually writing such a grammar would be impractical, which is why the standard doesn't try to do so. (See below.)

The issue with "ambiguity" of certain expressions is mostly a red herring. To start with, ambiguity is a feature of a particular grammar, not a language. Even if a language can be proven to have no unambiguous grammars, if it can be recognized by a context-free grammar, it's context-free. Similarly, if it cannot be recognized by a context-free grammar but it can be recognized by a context-sensitive grammar, it's context-sensitive. Ambiguity is not relevant.

But in any event, like line 21 (i.e. auto b = foo<IsPrime<234799>>::typen<1>();) in the program below, the expressions are not ambiguous at all; they are simply parsed differently depending on context. In the simplest expression of the issue, the syntactic category of certain identifiers is dependent on how they have been declared (types and functions, for example), which means that the formal language would have to recognize the fact that two arbitrary-length strings in the same program are identical (declaration and use). This can be modelled by the "copy" grammar, which is the grammar which recognizes two consecutive exact copies of the same word. It's easy to prove with the pumping lemma that this language is not context-free. A context-sensitive grammar for this language is possible, and a Type-0 grammar is provided in the answer to this question: https://math.stackexchange.com/questions/163830/context-sensitive-grammar-for-the-copy-language .

If one were to attempt to write a context-sensitive (or unrestricted) grammar to parse C++, it would quite possibly fill the universe with scribblings. Writing a Turing machine to parse C++ would be an equally impossible undertaking. Even writing a C++ program is difficult, and as far as I know none have been proven correct. This is why the standard does not attempt to provide a complete formal grammar, and why it chooses to write some of the parsing rules in technical English.

What looks like a formal grammar in the C++ standard is not the complete formal definition of the syntax of the C++ language. It's not even the complete formal definition of the language after preprocessing, which might be easier to formalize. (That wouldn't be the language, though: the C++ language as defined by the standard includes the preprocessor, and the operation of the preprocessor is described algorithmically since it would be extremely hard to describe in any grammatical formalism. It is in that section of the standard where lexical decomposition is described, including the rules where it must be applied more than once.)

The various grammars (two overlapping grammars for lexical analysis, one which takes place before preprocessing and the other, if necessary, afterwards, plus the "syntactic" grammar) are collected in Appendix A, with this important note (emphasis added):

This summary of C++ syntax is intended to be an aid to comprehension. It is not an exact statement of the language. In particular, the grammar described here accepts a superset of valid C++ constructs. Disambiguation rules (6.8, 7.1, 10.2) must be applied to distinguish expressions from declarations. Further, access control, ambiguity, and type rules must be used to weed out syntactically valid but meaningless constructs.

Finally, here's the promised program. Line 21 is syntactically correct if and only if the N in IsPrime<N> is prime. Otherwise, typen is an integer, not a template, so typen<1>() is parsed as (typen<1)>() which is syntactically incorrect because () is not a syntactically valid expression.

template<bool V> struct answer { answer(int) {} bool operator()(){return V;}};

template<bool no, bool yes, int f, int p> struct IsPrimeHelper
  : IsPrimeHelper<p % f == 0, f * f >= p, f + 2, p> {};
template<bool yes, int f, int p> struct IsPrimeHelper<true, yes, f, p> { using type = answer<false>; };
template<int f, int p> struct IsPrimeHelper<false, true, f, p> { using type = answer<true>; };

template<int I> using IsPrime = typename IsPrimeHelper<!(I&1), false, 3, I>::type;
template<int I>
struct X { static const int i = I; int a[i]; }; 

template<typename A> struct foo;
template<>struct foo<answer<true>>{
  template<int I> using typen = X<I>;
};
template<> struct foo<answer<false>>{
  static const int typen = 0;
};

int main() {
  auto b = foo<IsPrime<234799>>::typen<1>(); // Syntax error if not prime
  return 0;
}

[1] To put it more technically, every production in a context-sensitive grammar must be of the form:

αAβ &rightarrow; αγβ

where A is a non-terminal and α, β are possibly empty sequences of grammar symbols, and γ is a non-empty sequence. (Grammar symbols may be either terminals or non-terminals).

This can be read as A &rightarrow; γ only in the context [α, β]. In a context-free (Type 2) grammar, α and β must be empty.

It turns out that you can also restrict grammars with the "monotonic" restriction, where every production must be of the form:

α &rightarrow; β where |α| ≥ |β| > 0  (|α| means "the length of α")

It's possible to prove that the set of languages recognized by monotonic grammars is exactly the same as the set of languages recognized by context-sensitive grammars, and it's often the case that it's easier to base proofs on monotonic grammars. Consequently, it's pretty common to see "context-sensitive" used as though it meant "monotonic".

Solution 2

First, you rightly observed there are no context sensitive rules in the grammar at the end of the C++ standard, so that grammar is context-free.

However, that grammar doesn't precisely describe the C++ language, because it produces non-C++ programs such as

int m() { m++; }

or

typedef static int int;

The C++ language defined as "the set of well-formed C++ programs" is not context-free (it's possible to show that merely demanding variables to be declared makes it so). Given you can theoretically write Turing-complete programs in templates and make a program ill-formed based on their result, it's not even context-sensitive.

Now, (ignorant) people (usually not language theorists, but parser designers) typically use "not context-free" in some of the following meanings

  • ambiguous
  • can't be parsed with Bison
  • not LL(k), LR(k), LALR(k) or whatever parser-defined language class they chose

The grammar at the back of the standard doesn't satisfy these categories (i.e. it is ambiguous, not LL(k)...) so C++ grammar is "not context-free" for them. And in a sense, they're right it's damn well hard to produce a working C++ parser.

Note that the properties here used are only weakly connected to context-free languages - ambiguity doesn't have anything to do with context-sensitivity (in fact, context-sensitive rules typically help disambiguate productions), the other two are merely subsets of context-free languages. And parsing context-free languages is not a linear process (although parsing deterministic ones is).

Solution 3

Yes. The following expression has a different order of operations depending on type resolved context:

Edit: When the actual order of operation varies, it makes it incredibly difficult to use a "regular" compiler that parses to an undecorated AST before decorating it (propagating type information). Other context sensitive things mentioned are "rather easy" compared to this (not that template evaluation is at all easy).

#if FIRST_MEANING
   template<bool B>
   class foo
   { };
#else
   static const int foo = 0;
   static const int bar = 15;
#endif

Followed by:

static int foobar( foo < 2 ? 1 < 1 : 0 > & bar );

Solution 4

To answer your question, you need to distinguish two different questions.

  1. The mere syntax of almost every programming language is context-free. Typically, it is given as an extended Backus-Naur form or context-free gramar.

  2. However, even if a program conforms with the context-free gramar defined by the programming language, it is not necessarily a valid program. There are many non-context-free poperties that a program has to satisfy in order to be a valid program. E.g., the most simple such property is the scope of variables.

To conclude, whether or not C++ is context-free depends on the question you ask.

Solution 5

Yeah C++ is context sensitive, very context sensitive. You cannot build the syntax tree by simply parsing through the file using a context free parser because in some cases you need to know the symbol from previous knowledge to decide (ie. build a symbol table while parsing).

First example:

A*B;

Is this a multiplication expression?

OR

Is this a declaration of B variable to be a pointer of type A?

If A is a variable, then it's an expression, if A is type, it's a pointer declaration.

Second example:

A B(bar);

Is this a function prototype taking an argument of bar type?

OR

Is this declare variable B of type A and calls A's constructor with bar constant as an initializer?

You need to know again whether bar is a variable or a type from symbol table.

Third example:

class Foo
{
public:
    void fn(){x*y;}
    int x, y;
};

This is the case when building symbol table while parsing does not help because the declaration of x and y comes after the function definition. So you need to scan through the class definition first, and look at the method definitions in a second pass, to tell x*y is an expression, and not a pointer declaration or whatever.

Share:
66,701
fredoverflow
Author by

fredoverflow

Updated on July 18, 2022

Comments

  • fredoverflow
    fredoverflow almost 2 years

    I often hear claims that C++ is a context-sensitive language. Take the following example:

    a b(c);
    

    Is this a variable definition or a function declaration? That depends on the meaning of the symbol c. If c is a variable, then a b(c); defines a variable named b of type a. It is directly initialized with c. But if c is a type, then a b(c); declares a function named b that takes a c and returns an a.

    If you look up the definition of context-free languages, it will basically tell you that all grammar rules must have left-hand sides that consist of exactly one non-terminal symbol. Context-sensitive grammars, on the other hand, allow arbitrary strings of terminal and non-terminal symbols on the left-hand side.

    Browsing through Appendix A of "The C++ Programming Language", I couldn't find a single grammar rule that had anything else besides a single non-terminal symbol on its left-hand side. That would imply that C++ is context-free. (Of course, every context-free language is also context-sensitive in the sense that the context-free languages form a subset of the context-sensitive languages, but that is not the point.)

    So, is C++ context-free or context-sensitive?

  • Ira Baxter
    Ira Baxter almost 15 years
    Most C++ parsers do not use GLR parsing technology. GCC doesn't. Some do. See semanticdesigns.com/Products/FrontEnds/CppFrontEnd.html for one that does.
  • Blaisorblade
    Blaisorblade over 12 years
    Why can that problem not be solved like for C, by remembering which type definitions are in scope?
  • Sam Harwell
    Sam Harwell over 12 years
    @Blaisorblade: One way to make a compiler "clean" is to separate tasks into independent steps in a chain, such as creating a parse tree from the input followed by a step that does the type analysis. C++ forces you to either 1) merge these steps into one or 2) parse the document according to both/all possible interpretations, and allowing the type resolution stages to narrow it down to the correct interpretation.
  • Blaisorblade
    Blaisorblade over 12 years
    @280Z28: agreed, but that's the case for C, too; I think a good answer to this question should show why C++ is worse than C. The PhD thesis linked here does that: stackoverflow.com/a/243447/53974
  • AProgrammer
    AProgrammer almost 12 years
    A B(); is a function declaration even in a function definition. Look for most vexing parse...
  • Kerrek SB
    Kerrek SB over 11 years
    Also a<b<c>>d, right? (Your example is actually a classic from C, where it is the only obstruction to being context-free.)
  • Puppy
    Puppy over 11 years
    That's more of a lexing issue, I think. But is certainly in the same category, yes.
  • Puppy
    Puppy over 11 years
    The questioner doesn't ask how it's more context-sensitive than C, only to show that it is context-sensitive.
  • Kerrek SB
    Kerrek SB over 11 years
    So.. is C++ more context-sensitive than C?
  • Puppy
    Puppy over 11 years
    Pretty sure there are more ambiguities, yes.
  • Puppy
    Puppy over 11 years
    So not only is it context-sensitive, but it can be made to depend on any context you can express in templates, which are Turing-complete.
  • rici
    rici over 11 years
    @DeadMG: "Computationally, a context-sensitive language is equivalent with a linear bounded nondeterministic Turing machine, also called a linear bounded automaton" (en.wikipedia.org/wiki/Context-sensitive_language) Let me see if I can find an online primary source.
  • user541686
    user541686 over 11 years
    @DeadMG I don't think you're answering the question (I don't think I was either). How does having terminals on the left hand side of a production solve this problem?
  • user541686
    user541686 over 11 years
    How does having multiple symbols on the left hand side of a production solve this problem? I don't think this answer is answering the question.
  • user541686
    user541686 over 11 years
    How does having multiple symbols on the left hand side of a production solve this problem? I don't think this answer is answering the question.
  • rici
    rici over 11 years
    @mehrdad: the ability to put multiple symbols on the left hand side of a production makes the grammar turing-complete. Since parsing the language is turing computable, a context-sensitive grammar could parse it. Actually writing such a CSG would be impractical.
  • Omri Barel
    Omri Barel over 11 years
    My answer is in response to the first sentence: "I often hear claims that C++ is a context-sensitive language." If those claims use the expression "context-sensitive" informally, then there's no problem. I don't think that C++ is formally context-sensitive.
  • user541686
    user541686 over 11 years
    @rici you dodged the question :P the current problem with C++ is that it's CFG is ambiguous. My question was, how does a CSG solve that problem example please?)... You responded by telling me it's Turing complete, which completely avoided giving a solution to the problem!
  • Admin
    Admin over 11 years
    @Mehrdad Are you asking how a CSG would solve the problem of the grammar being ambiguous?
  • rici
    rici over 11 years
    @mehrdad, the OP says "context-sensitive language", not context-sensitive grammar. Ambiguity is a feature of a grammar, not a language. The language is indeed context-sensitive, but not because a particular grammar for it is ambiguous.
  • Lightness Races in Orbit
    Lightness Races in Orbit over 11 years
    ambiguity doesn't have anything to do with context-sensitivity This was my intuition too, so I'm glad to see someone (a) agree, and (b) explain it where I could not. I believe it disqualifies all the arguments that are based on a b(c);, and partially satisfy the original question whose premise was "oft-heard" claims of context-sensitivity being due to ambiguity... especially when for the grammar there's actually no ambiguity even in the MVP.
  • rici
    rici over 11 years
    Note that my example is not ambiguous. It's an unambiguous expression of a valid program. If you change the value in line 21, it may become ill-formed. But in neither case is it ambiguous.
  • user541686
    user541686 over 11 years
    @rici: Yeah, but how would any CSG be able to parse C++ correctly, is the question? I can see the same ambiguity problems with C++ CSGs as I see in C++ CFGs, so I'm failing to see what makes C++ describable by a context-sensitive grammar but not by a context-free grammar.
  • user541686
    user541686 over 11 years
    @Non-StopTimeTravel: Yeah, that's another reason why I deleted my answer, after I thought about it some more. I think an ambiguous grammar is completely fine as long as the strings it produces are those (and only those) inside the language. Which is why I'm failing to see how a CSG would succeed at any problems that a CFG has failed in...
  • rici
    rici over 11 years
    @Mehrdad: I tried to answer that. I fear that it is now an essay on its way to being an introductory text on parsing theory. Prove me wrong by reading it all. (What CSGs are you referring to?)
  • user541686
    user541686 over 11 years
    I think C++ is formally context-sensitive, but the problem I'm having is that I don't understand how a context-sensitive grammar would have any more success at parsing C++ than a CFG would.
  • user541686
    user541686 over 11 years
    @rici: I read your answer, and it seems to be consistently dodging the question. The question the OP posed is: What makes C++ context-sensitive? The answer I think he (and others) expect to see is: <some particular problem> makes it context-sensitive, and here is how a similar problem, in a simple toy grammar, would be solved by <some sample CSG>, but is impossible to solve with a pure CFG. I have yet to see any answer that actually answers the question of what makes C++ CSL (i.e. describable by a CSG but not a CFG).
  • Konrad Rudolph
    Konrad Rudolph over 11 years
    This would be a good answer were it not for the following sentence: “ Given you can theoretically write Turing-complete programs in templates and make a program ill-formed based on their result, it's not even context-sensitive.” – This is wrong, Context sensitive is equivalent to Turing complete.
  • Lightness Races in Orbit
    Lightness Races in Orbit over 11 years
    @KonradRudolph: Can you provide evidence of that? (though I'm not disputing it)
  • Admin
    Admin over 11 years
    I have one doubt: As you show, the result of template evaluation can make the difference between a well-formed and an ill-formed program. Template evaluation is turing-complete. So wouldn't correctly determining whether a string is in the language (C++) require turing-completeness? As you say, a context-sensitive language is "just" a "linear bounded automaton", which is not turing-complete AFAIK. Or is your argument making use of the limits the C++ standard puts on some things including template evaluation depth?
  • Konrad Rudolph
    Konrad Rudolph over 11 years
    @Non-StopTimeTravel Is the Wikipedia article not enough for you? It’s textbook knowledge.
  • Admin
    Admin over 11 years
    @KonradRudolph I'd like a reference too. en.wikipedia.org/wiki/Context-sensitive_language says that context sensitive = Linear bounded automaton, which (from my limited understanding) is not turing-complete because of the limited tape length.
  • Konrad Rudolph
    Konrad Rudolph over 11 years
    @delnan So do C++ templates. The standard defines a maximum depth that a conforming compiler has to provide (at minimum).
  • Lightness Races in Orbit
    Lightness Races in Orbit over 11 years
    @KonradRudolph: An assertion without any reference is not enough for me on SO, no. If you're citing a passage on Wikipedia, please provide a link and a quote.
  • Admin
    Admin over 11 years
    @KonradRudolph Okay, I know of this limitation but never see anyone base their argument on it, so I generally don't give people the benefit of doubt ;-)
  • rici
    rici over 11 years
    @delnan, I think I was wrong. I changed the answer to talk about unrestricted grammars.
  • rici
    rici over 11 years
    @mehrdad: the copy language is precisely that sort of toy language. Do you expect a formal proof that it's not CFG? Such a proof is in every introductory text that I've seen; it's the classic first use of the pumping lemma.
  • Konrad Rudolph
    Konrad Rudolph over 11 years
    @Non-StopTimeTravel No, sorry. I do not provide sources for “general reference” knowledge that can be trivially found on Google or Wikipedia without having to dig for them.
  • user541686
    user541686 over 11 years
    @rici: Ahhh yes, now I finally see it... the answer holds the light! +1 for you, this is an awesome answer then.
  • Puppy
    Puppy over 11 years
    The question doesn't ask. It only asks if C++ is context sensitive. It is. That's all there is to it. I believe that you'd have to make the additional terminals match the typedef, then you can match it to a function decl, or you make the additional terminals match a variable definition, then you can match it to variable decl. But you'd have to ask someone who cares about context-sensitive grammars except not producing one or hacking around the context-sensitive part, not me.
  • rici
    rici over 11 years
    @KonradRudolph: What the standard says is "There is an implementation-defined quantity that specifies the limit on the total depth of recursive instantiations, which could involve more than one template. The result of an infinite recursion in instantiation is undefined." (14.7.1p15) I interpret that to mean that an implementation is not required to understand every valid c++ program, not that programs with too large a recursion depth are invalid. The only ones which are marked as invalid are those with an infinite recursion depth.
  • Konrad Rudolph
    Konrad Rudolph over 11 years
    @rici Ah. Yes, I see. I think previous versions of the standard explicitly specified an upper level (which was pretty low, and the cause of much ire, if I remember correctly).
  • Lightness Races in Orbit
    Lightness Races in Orbit over 11 years
    @Konrad: That's not very scientific of you. I'm really good at Googling and didn't find anything "trivially". Since you have been unwilling or unable to provide a reference when asked, I have no choice but to declare your assertion as inadmissable!
  • Konrad Rudolph
    Konrad Rudolph over 11 years
    @Non-StopTimeTravel Nonsense. In fact, it’s common in scientific discussions to take “general reference” knowledge for granted. And I have even pointed you to Wikipedia. Do you really want me to believe that you have read the short article about “context-sensitive grammar” and not found the relevant paragraph?
  • Lightness Races in Orbit
    Lightness Races in Orbit over 11 years
    @KonradRudolph: I dispute that it's "general reference". The fact that I read that rather complex article and do not comprehend it sufficiently to piece out this little fact should be enough to demonstrate that. It's not as if you said something like "computers commonly use electricity", or "bits can be true or false".
  • Dan
    Dan over 11 years
    As far as I know, prime can be decided by a polynomial bounded Turing machine. Hence, the language of all prime numbers should be context-sensitive.
  • Dan
    Dan over 11 years
    You write "I think the prime computation can be performed by a linear-bounded Turing machine". I just wrote that it can be done.
  • Maciej Piechotka
    Maciej Piechotka over 11 years
    Sorry - could you make the example clearer? I can see why the program fails to compile but my feeling is that it would fail at the semantic analiser rather then parsing - i.e. it have valid syntax but is non a valid program nonetheless (similar problem to variable scope etc.).
  • markus
    markus over 11 years
    This discussion is growing too long and hard to follow. However it contains good information which should be integrated into the the question or an answer. Please do that and if needed, continue the discussion in the chat!
  • rici
    rici over 11 years
    @markus-tharkun, I've been doing that as the discussion progresses, so hopefully there is no useful unintegrated information. If you think I've missed something, let me know or edit it in.
  • markus
    markus over 11 years
    @rici no, I don't think you missed anything. I just posted my standard comment for such occasions ;)
  • LaC
    LaC over 11 years
    It's interesting to note that you often have to place the "mere syntax" level lower than you'd expect, in order to get a CFG for your programming language. Take C, for instance. You might think that the grammar rule for a simple variable declaration in C would be VARDECL : TYPENAME IDENTIFIER, but you cannot have that, because you cannot distinguish type names from other identifiers at a CF level. Another example: at a CF level, you cannot decide whether to parse a*b as a variable declaration (b of type pointer to a) or as a multiplication.
  • mmx
    mmx over 11 years
    I'd like to add to your comment about the standard not trying to define an "unrestricted grammar" for C++. It's not just impracticality, but also the fact that it would still not be enough for compiling. That grammar would only recognize a C++ program. Recognizing whether a program is valid or invalid C++ is of little use. You'd also need the parse tree "built" in a well-defined way in order to preserve all the semantics. As noted previously, as long as you can distinguish between a valid and invalid parse, you don't need to care about "ambiguity" at all from a theoretical perspective.
  • rici
    rici over 11 years
    @MehrdadAfshari, the problem is that general rewrite systems are not a very good way of writing parsers, just like Turing's state machines are not a very good way of writing algorithms. Nonetheless, I can't quite say that a rewrite recognizer would be useless. Augmenting the existing CFG with a some other formal system which could gate particular productions would actually produce a usable parser: the existing CFG provides the grammatical structure. Unfortunately, implementing template instantiation with a rewrite system would be somewhere between a good challenge and BF.
  • mmx
    mmx over 11 years
    @rici I tried not to use the word "useless". My specific point is that a grammar that could accept/reject C++ is not enough to describe the behavior of the output program, just like an ambiguous context-free grammar might not be enough for the same thing (which seems to have been a source of confusion to some people throughout this page confusing it with context-sensitivity of the language defined by the grammar). I agree that it is an interesting challenge. I'd be curious how difficult it can be (even for simpler languages and not C++).
  • Kaz
    Kaz over 11 years
    This answer contains numerous flaws. The word "context" is waved around too informally, which is wrong because the question is about "context-free", which is a formal term. The error in the template is not in fact a syntax error but a semantic error. A language spec can declare "by fiat" that a program which violates some semantic constraint is not well formed and should be diagnosed as if it has a syntax error. But that is just a case of pointing to a tail and calling it a leg. Fact is that the erroneous construct has a parse. That exact piece of text will parse under some conditions.
  • Dan
    Dan over 11 years
    @LaC: Yes, thanks for pointing this out! By the way, I'm sure that there is a more commonly used technical term for mere syntax. Does anyone the correct term?
  • reinierpost
    reinierpost over 11 years
    @Dan: what you're talking about is an approximation of the language given by some context-free grammar. Of course such an approximation is coontext-free by definition. This is the sense in which "syntax" is often use when discussing programming languages.
  • reinierpost
    reinierpost over 11 years
    To follow up to Kaz's remarks: the answer should clearly make a distinction between the task of parsing C++ with the objective of interpreting it, and merely recognizing whether a particular piece of code is syntactically valid C++. It should also state what exactly constitutes a definition of when a C++ program is 'syntactically valid'.
  • Samuel Edwin Ward
    Samuel Edwin Ward over 11 years
    If this fact really is so widely known I'd think it would be much easier to find a reference to it than to argue at length about whether or not one should be provided. Not to mention constructive.
  • Joseph Garvin
    Joseph Garvin over 11 years
    I don't see how template evaluation is the same as parsing, which is what the OP was asking about. I just read this answer as, "A subset of C++ is evaluated at compile time", which happens to happen after parsing but before the 'real' program really runs, so maybe that's why they're being confused here.
  • rici
    rici over 11 years
    @kaz, I made the definition of "context" in "context-sensitive" more precise. (I knew what I meant :), but I agree it wasn't sufficiently obvious.) There is no error in the template; the error (in those cases where there is one) is attempting to invoke a template which doesn't exist. Because of the way C++ parses <, >, and >> (sometimes as operators and sometimes as brackets), it is necessary to know whether a symbol which precedes a < is a template before it is possible to know what lexeme follows it. Once > is known to be an operator, value > () is a syntax error.
  • rici
    rici over 11 years
    @JosephGarvin, see the immediately preceding comment. You cannot finish the parse without instantiating the template, so the instantiation cannot happen "after the parse".
  • rici
    rici over 11 years
    @SamuelEdwinWard -std=c++11 (or -std=c++0x if you have an older g++)
  • Anton Golov
    Anton Golov over 11 years
    @rici: Perhaps the example would be clearer if both branches compiled, but produced clearly different parse trees.
  • rici
    rici over 11 years
    @AntonGolov: My original version of that example did just that (you can achieve it by putting 0 inside of (), for a simple one), but I think it's more interesting this way, because it demonstrates that you need the template instantiation even to recognize if a string is a syntactically correct C++ program. If both branches compile, then I'd have to work harder to contest the argument that the difference is "semantic". Curiously, although I'm often challenged to define "syntactic", no-one has ever offered a definition of "semantic" other than "stuff I don't think is syntactic" :)
  • Joseph Garvin
    Joseph Garvin over 11 years
    @rici: If that were true why do you need to use the template keyword when calling a method in a templated base class? I thought that was added to specifically prevent what you're describing. Maybe how we're defining parsing is the issue -- I would consider that as soon as the compiler knows that that typen<> must be a template that the file is 'parsed'. That you get an error when not inputting a prime number I wouldn't describe as a parsing error (even though it occurs before code generation) just like I wouldn't consider a static_assert failure a parsing error.
  • rici
    rici over 11 years
    @JosephGarvin: you need the template keyword only when the identifier which names the template is a dependent name. That doesn't apply in the case of my program. If the identifier is not a dependent name, then the template keyword is, ahem, a syntax error. (This surprised me, but it's true. Try it.) By the way, personally I don't think the file is parsed until the last character is parsed, but your definition works: The compiler doesn't know that typen is a template until after it instantiates foo<IsPrime<234799>> (since it wouldn't be with foo<IsPrime<234797>>).
  • rici
    rici over 11 years
    @JosephGarvin: small correction. Apparently, you can use template even if the compiler can figure it out for itself. (As long as the name is qualified.) So, indeed, I could have written template typen, which would have caused the syntax error to be noticed a couple of tokens earlier, since it would then be an immediate error after the non-prime foo<IsPrime<234797>> template was instantiated, instead of waiting until the > is parsed as an comparison operator and the () is found to not be parseable as an expression.
  • Joseph Garvin
    Joseph Garvin over 11 years
    @rici: I see, but I still find the terminology somewhat misleading. Calling it a syntax error suggests that passing a composite number to the IsPrime template is a syntax mistake, when only a language lawyer would identify it as such. To be really pedantic it seems like a higher order type error, where passing a composite number doesn't higher-type-check because typen is not a type of type that takes template arguments.
  • rici
    rici over 11 years
    @josephGarvin: passing a composite number to the IsPrime template is just fine. It's not a type error at all; it's a simple syntax error. You simply can't use () as an expression, that's all. (Of course, the real problem is that in this particular case, it's not easy to predict that < is an operator; in particular, human beings get it wrong because we do actually (and sometimes incorrectly) use whitespace-sensitive parsing. But the compiler knows the syntax better.) Anyway, I wasn't talking about language legality; I was talking about formal grammar theory.
  • pnkfelix
    pnkfelix over 11 years
    As far as I can tell, @Konrad was mistaken when he said "Context sensitive is equivalent to Turing complete." (at least, he was if he was denoting "Recursively enumerable" by "Turing complete"), and has since been unable to recognize this mistake. Here is a reference for the proper set inclusion relationship involved here: en.wikipedia.org/wiki/Chomsky_hierarchy
  • Joseph Garvin
    Joseph Garvin over 11 years
    I believe C++11 mandates the latter interpretation, and you have to add parens to opt-in to the former.
  • Joseph Garvin
    Joseph Garvin over 11 years
    @rici: Ah, now I get it. When you pass a composite the error technically comes from the () not being an expression even though the compiler only goes for that interpretation based on typen not being a template, rather than erroring due to typen not being a template.
  • rici
    rici over 11 years
    @JosephGarvin, no. C++ mandates that < must be a bracket if it could be (eg., it follows an identifier which names a template). C++11 added the requirement that > and the first character of >> be interpreted as close brackets if that use is plausible. This affects the parsing of a<b>c> where a is a template but has no effect on a<b<c>.
  • rici
    rici over 11 years
    @aaron: how is that simpler than a(); (which is either expr.call or expr.type.conv)?
  • Joseph Garvin
    Joseph Garvin over 11 years
    @rici: Oops, I didn't realize it was asymmetric.
  • etherice
    etherice about 11 years
    The IsPrime<...> example (complete with program) was extreme overkill for such a simple point... A more straightforward (as in less convoluted) example would have been something like: nullptr_t n = std::conditional<true, nullptr_t, int>::type{}; which is "syntactically correct" (as the OP phrased it) if and only if the first template parameter is true.
  • rici
    rici about 11 years
    @etherice: I agree with your usage of "syntactically correct" (which was my phrase, not OP's) but many commentators here don't; I think they would describe your example as being syntactically correct but semantically incorrect (since it is a type error). IsPrime came out of an investigation of how to syntax-colour C++; it was part of my realization that it's really not practical to definitively decide whether < is an operator or a bracket. (Fortunately, there are good heuristics, and syntax-colouring is allowed to be wrong in pathological cases.)
  • etherice
    etherice about 11 years
    @rici: I actually side with the other commenters that the error is a semantic one, not a syntactical one, but I admit it's hard to classify a template-meta-programming error. The point of my comment was to show that your IsPrime example was a bit "esoteric" for the relatively simple point you were trying to make. Ultimately, whether or not your program compiled depended on 1) the result of template-meta-programming, and 2) whether or not the resultant rvalue type is implicitly-convertible to the lvalue type. That seems like a semantic issue, not a syntactic one.
  • rici
    rici about 11 years
    @etherice: The only lvalue in my little program is auto so there is no type conversion at all; I think you're misunderstanding the nature of the program. The syntactic question is whether typen<1>() is (a) a template instantiation or (b) a simple expression. In other words, whether the < and > are template brackets or comparison operators. In the former case, () is an empty parameter list; in the latter case, it's a syntax error. I don't see how you can call this syntax error anything other than a syntax error, and that was the point of the example.
  • etherice
    etherice about 11 years
    @rici: There is type deduction, which is still a semantic not syntactic issue. I took another look at the program, and it can be simplified to auto b = std::enable_if< true/false , SomeType >(); ... whether or not it compiles depends on the C++ principle of SFINAE, which is definitely a semantic matter. I see what you're saying about parsing the RHS expression as (foo<false>::typen < 1) > () as a "most vexing parse" scenario that results in a syntax error at () as the 2nd argument to operator>. It's interesting enough to make its own SO question for the C++ experts to disect.
  • rici
    rici about 11 years
    @etherice: I think you need to look at it some more :). It's not the "most vexing parse" since it is clearly an assignment statement. Also note that if you put a value inside the parentheses, say typen<1>(42)', then it would not be a syntax error regardless of whether IsPrime` is instantiated with a prime or not, but the two cases produce radically different parses. What you're maybe missing is that typen might be a template alias, or it might be a static const int; you need to know which one it is in order to parse what follows typen. By parse, I mean parse. Hence syntax.
  • etherice
    etherice about 11 years
    @rici: I took another (final) look, this time actually using a compiler, and the program simplifies to these 3 statements: struct foo { typedef int type; }; struct bar { const int type = 0; }; and either auto b = foo::type(); or auto b = bar::type(); ... The foo case is fine as it creates a default int. The bar case becomes auto b = 0(); which is, in fact, a syntax error in this simplified form. However, in its template-form, it "seems" more like a semantic issue since it depends on the results of a template-meta-program (albeit a trivial one), which must be evaluated.
  • etherice
    etherice about 11 years
    @rici: The <false> case more specifically comes down to auto b = 0<1>(); which goes back to my previous point that it could be parsed as auto b = ((0 < 1) > ()); in which case the syntax error occurs at the 2nd argument to operator>, making the entire expression invalid. The reason typen<1>(42) works is that it's the same as X<1>(42) which implicitly constructs an X from int value 42. If the constructor were declared explicit, that would not be possible (once again, semantics).
  • rici
    rici about 11 years
    @etherice: The reason typen<1> works is that typen is a template. That's independent of the explicitness of the constructor. If typen were a non-template type, then typen<1> would be a different syntax error, because a typename cannot be an operand to the operator <. By the way, there is no optionality about the parse. It must be parsed either as a templated constructor or as an expression, depending on the "kind" of the member symbol typen; it is not a resolved ambiguity, unlike the "vexing parse". But enough.
  • etherice
    etherice about 11 years
    @rici: Yes, I know typen is a template<int> struct X in the <true> case, and therefore typen<1> is X<1>, and X<1>(42) implicitly constructs a X<1>. Where you're wrong is in the <false> case, where typen is a const int, which can obviously be the operand to an operator< invocation, making it possible to parse it as the 1st argument to a operator< for int arguments. I agree with the "enough" part, as we seem to have a disconnect here. For example, I said typen<1>(42) and you read it as typen<1>, etc.
  • rici
    rici about 11 years
  • corazza
    corazza about 10 years
    Are you describing ambiguity, or context sensitivity?
  • ShellFish
    ShellFish about 9 years
    "Writing a Turing machine to parse C++ would be an equally impossible undertaking." If any machine can ever exist to parse C++, which I assume does exist, a Turing machine to do so exists too. Just saying. Claiming a parser for C++ is Turing complete and saying it's impossible to write a Turing machine for it is a contradiction.
  • rici
    rici about 9 years
    @shellfish: That Turing machine exists in the same way that the decimal representation of the fifth Ackermann number (oeis.org/A189896) exists. I do not believe it is at all contradictory to assert to no human being will ever write that number out. It is impossible, within the physical constraints of the universe. That's the impossibility I was referring to. Probably, the Turing machine is not quite as big as a(5), but I don't feel like I go too far out on a limb asserting that the task of writing out it's state transition table is not a very plausible project.
  • ShellFish
    ShellFish about 9 years
    @rici Oh okay, I interpreted that differently. Let's call it semantics. Amazing post by the way, definitely the best discussion I've ever read on SO!
  • Nikos M.
    Nikos M. almost 9 years
    or people use the term "context-free language" for the opposite, a CFG can describe a superset of the (valid) strings comprising the language (including non-valid ones as per your example). sidenote: i would not label people working with implementations "ignorant", just because they use a specific implementation of CFGs instead of an abstract formality
  • Barrett Adair
    Barrett Adair over 8 years
    @rici Here is a slightly cleaner C++14 version of your (excellent) example: melpon.org/wandbox/permlink/8tpay0SxHvWXfDny
  • Barrett Adair
    Barrett Adair over 8 years
    I'd also like to point out that C++14 generalized constexpr can be used to create a similar example with much less mystique (no need for recursive inheritance).
  • Ira Baxter
    Ira Baxter over 8 years
    The (context-free) productions define the most vexing parse well enough so it can be parsed by a context free parsing engine. That delays the problem of deciding which of multiple interpretations are valid until after the parse is complete, but that just make the engineering of the parser and name resolver easier, because they are modular rather than tangled as in conventional C++ parsers. See AST for most vexing parse: stackoverflow.com/questions/17388771/…
  • Emil Jeřábek
    Emil Jeřábek about 8 years
    I don’t understand the point of the example with primes. Primality is decidable in NSPACE(O(n)), and therefore by a context-sensitive grammar.
  • Ira Baxter
    Ira Baxter about 8 years
    "You cannot build the syntax tree by simply parsing through the file " FALSE. See my answer.
  • Puppy
    Puppy about 8 years
    Determining if it's a variable declaration or a multiplication is not a type checking feature. Also I had to scrub your answer of that self-promotion stuff... again.
  • Ira Baxter
    Ira Baxter about 8 years
    @Puppy: you can say what you like, but that's how the tool works. Deleting the tool name is probably just going to make people ask what the tool name is.
  • Puppy
    Puppy about 8 years
    Whether or not that's how the tool works is irrelevant, since the question does not ask for the working of the tool. Also, I think that we can safely wait for that to actually happen.
  • user2023370
    user2023370 almost 6 years
    Can we say that the C++ language is defined by the grammar in the specification, with restrictions? Can this be modelled with a minimal grammar, also with restrictions, to illustrate the situation more easily?