What are the rules about using an underscore in a C++ identifier?

299,082

Solution 1

The rules (which did not change in C++11):

  • Reserved in any scope, including for use as implementation macros:
    • identifiers beginning with an underscore followed immediately by an uppercase letter
    • identifiers containing adjacent underscores (or "double underscore")
  • Reserved in the global namespace:
    • identifiers beginning with an underscore
  • Also, everything in the std namespace is reserved. (You are allowed to add template specializations, though.)
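
As a rough illustration of the three underscore rules above, here is a minimal sketch of a classifier. The names `Reservation`, `classify`, and `check_classify` are made up for this example and are not part of any standard library; the sketch covers only the underscore rules, not the `std` namespace rule.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical names for illustration; nothing here comes from a standard library.
enum class Reservation { AnyScope, GlobalNamespace, NotReserved };

// Classifies an identifier according to the underscore rules listed above.
inline Reservation classify(const std::string& name) {
    // Adjacent underscores anywhere in the name (a run of three contains two).
    if (name.find("__") != std::string::npos) {
        return Reservation::AnyScope;
    }
    // Leading underscore followed immediately by an uppercase letter.
    if (name.size() >= 2 && name[0] == '_' &&
        std::isupper(static_cast<unsigned char>(name[1]))) {
        return Reservation::AnyScope;
    }
    // Any other leading underscore: reserved only in the global namespace.
    if (!name.empty() && name[0] == '_') {
        return Reservation::GlobalNamespace;
    }
    return Reservation::NotReserved;
}

// A few spot checks showing how the helper might be used.
inline void check_classify() {
    assert(classify("_Foo") == Reservation::AnyScope);
    assert(classify("my__var") == Reservation::AnyScope);
    assert(classify("_foo") == Reservation::GlobalNamespace);
    assert(classify("foo_") == Reservation::NotReserved);
}
```

Note that the double-underscore test runs first, so a name such as `___` falls into the "any scope" group.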

From the 2003 C++ Standard:

17.4.3.1.2 Global names [lib.global.names]

Certain sets of names and function signatures are always reserved to the implementation:

  • Each name that contains a double underscore (__) or begins with an underscore followed by an uppercase letter (2.11) is reserved to the implementation for any use.
  • Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace. (165)

165) Such names are also reserved in namespace ::std (17.4.3.1).

Because C++ is based on the C standard (1.1/2, C++03) and C99 is a normative reference (1.2/1, C++03) these also apply, from the 1999 C Standard:

7.1.3 Reserved identifiers

Each header declares or defines all identifiers listed in its associated subclause, and optionally declares or defines identifiers listed in its associated future library directions subclause and identifiers which are always reserved either for any use or for use as file scope identifiers.

  • All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use.
  • All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces.
  • Each macro name in any of the following subclauses (including the future library directions) is reserved for use as specified if any of its associated headers is included; unless explicitly stated otherwise (see 7.1.4).
  • All identifiers with external linkage in any of the following subclauses (including the future library directions) are always reserved for use as identifiers with external linkage. (154)
  • Each identifier with file scope listed in any of the following subclauses (including the future library directions) is reserved for use as a macro name and as an identifier with file scope in the same name space if any of its associated headers is included.

No other identifiers are reserved. If the program declares or defines an identifier in a context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved identifier as a macro name, the behavior is undefined.

If the program removes (with #undef) any macro definition of an identifier in the first group listed above, the behavior is undefined.

154) The list of reserved identifiers with external linkage includes errno, math_errhandling, setjmp, and va_end.

Other restrictions might apply. For example, the POSIX standard reserves a lot of identifiers that are likely to show up in normal code:

  • Names beginning with a capital E followed by a digit or uppercase letter:
    • may be used for additional error code names.
  • Names that begin with either is or to followed by a lowercase letter
    • may be used for additional character testing and conversion functions.
  • Names that begin with LC_ followed by an uppercase letter
    • may be used for additional macros specifying locale attributes.
  • Names of all existing mathematics functions suffixed with f or l are reserved
    • for corresponding functions that operate on float and long double arguments, respectively.
  • Names that begin with SIG followed by an uppercase letter are reserved
    • for additional signal names.
  • Names that begin with SIG_ followed by an uppercase letter are reserved
    • for additional signal actions.
  • Names beginning with str, mem, or wcs followed by a lowercase letter are reserved
    • for additional string and array functions.
  • Names beginning with PRI or SCN followed by any lowercase letter or X are reserved
    • for additional format specifier macros
  • Names that end with _t are reserved
    • for additional type names.

While using these names for your own purposes right now might not cause a problem, they do raise the possibility of conflict with future versions of that standard.
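
To get a feel for how many ordinary names these patterns catch, here is a small sketch that tests a name against the prefix and suffix patterns from the list above. Everything in it (`naming_sketch`, `prefix_then`, `matches_listed_pattern`) is a hypothetical helper written for this answer. It deliberately skips the math-function rule, which would need a list of the existing function names, and it says nothing about the scope or header conditions under which POSIX actually reserves a name.

```cpp
#include <cassert>
#include <cctype>
#include <string>

namespace naming_sketch {  // hypothetical namespace, for illustration only

inline bool upper(char c) { return std::isupper(static_cast<unsigned char>(c)) != 0; }
inline bool lower(char c) { return std::islower(static_cast<unsigned char>(c)) != 0; }
inline bool upper_or_digit(char c) {
    return upper(c) || std::isdigit(static_cast<unsigned char>(c)) != 0;
}
inline bool lower_or_x(char c) { return lower(c) || c == 'X'; }

// True if `name` begins with `prefix` and the next character satisfies `next`.
inline bool prefix_then(const std::string& name, const std::string& prefix,
                        bool (*next)(char)) {
    return name.size() > prefix.size() &&
           name.compare(0, prefix.size(), prefix) == 0 &&
           next(name[prefix.size()]);
}

// Checks the prefix and suffix patterns from the list above
// (the math-function suffix rule is not covered).
inline bool matches_listed_pattern(const std::string& name) {
    const bool ends_with_t =
        name.size() > 2 && name.compare(name.size() - 2, 2, "_t") == 0;
    return prefix_then(name, "E", upper_or_digit) ||
           prefix_then(name, "is", lower) || prefix_then(name, "to", lower) ||
           prefix_then(name, "LC_", upper) ||
           prefix_then(name, "SIG", upper) || prefix_then(name, "SIG_", upper) ||
           prefix_then(name, "str", lower) || prefix_then(name, "mem", lower) ||
           prefix_then(name, "wcs", lower) ||
           prefix_then(name, "PRI", lower_or_x) ||
           prefix_then(name, "SCN", lower_or_x) ||
           ends_with_t;
}

// Spot checks: everyday names such as "tolerance" match a listed pattern.
inline void check_patterns() {
    assert(matches_listed_pattern("tolerance"));
    assert(matches_listed_pattern("point_t"));
    assert(!matches_listed_pattern("count"));
}

}  // namespace naming_sketch
```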


Personally, I just don't start identifiers with underscores. A new addition to my rule: don't use double underscores anywhere, which is easy because I rarely use underscores.

After researching this article, I no longer end my identifiers with _t, as this suffix is reserved by the POSIX standard.

The rule about any identifier ending with _t surprised me a lot. I think it comes from the POSIX standard (not sure yet); I am looking for clarification and the official chapter and verse. This comes from the GNU libtool manual's list of reserved names.

CesarB provided the following link to the POSIX 2004 reserved symbols and notes 'that many other reserved prefixes and suffixes ... can be found there'. The POSIX 2008 reserved symbols are defined here. The restrictions are somewhat more nuanced than those above.

Solution 2

The rules to avoid collision of names are both in the C++ standard (see Stroustrup book) and mentioned by C++ gurus (Sutter, etc.).

Personal rule

Because I did not want to deal with cases, and wanted a simple rule, I have designed a personal one that is both simple and correct:

When naming a symbol, you will avoid collision with compiler/OS/standard libraries if you:

  • never start a symbol with an underscore
  • never name a symbol with two consecutive underscores inside.

Of course, putting your code in a unique namespace helps to avoid collisions, too (but won't protect against evil macros).

Some examples

(I use macros because they are the most code-polluting of C/C++ symbols, but it could be anything from a variable name to a class name.)

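#define _WRONG          // reserved: leading underscore followed by an uppercase letter
#define __WRONG_AGAIN   // reserved: contains two consecutive underscores
#define RIGHT_          // fine: a trailing underscore is not reserved
#define WRONG__WRONG    // reserved: contains two consecutive underscores
#define RIGHT_RIGHT     // fine: single underscore in the middle
#define RIGHT_x_RIGHT   // fine: only single underscores in the middle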

Extracts from C++0x draft

From the n3242.pdf file (I expect the final standard text to be similar):

17.6.3.3.2 Global names [global.names]

Certain sets of names and function signatures are always reserved to the implementation:

— Each name that contains a double underscore __ or begins with an underscore followed by an uppercase letter (2.12) is reserved to the implementation for any use.

— Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.

But also:

17.6.3.3.5 User-defined literal suffixes [usrlit.suffix]

Literal suffix identifiers that do not start with an underscore are reserved for future standardization.

This last clause is confusing, unless you consider that a name starting with one underscore and followed by a lowercase letter would be OK if not defined in the global namespace...
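
To make the literal-suffix clause concrete, here is a minimal sketch of a user-defined literal whose suffix starts with an underscore followed by a lowercase letter, declared inside a namespace. The namespace `units`, the type `Meters`, and the suffix `_km` are all invented for this example.

```cpp
#include <cassert>

namespace units {  // hypothetical namespace and type

struct Meters {
    long double value;
};

// User-provided suffixes begin with an underscore; suffixes without one
// are reserved for future standardization.
constexpr Meters operator""_km(long double kilometres) {
    return Meters{kilometres * 1000.0L};
}

}  // namespace units

// Usage: bring the literal operator into scope, then write 2.5_km.
inline void check_literal() {
    using namespace units;
    assert((2.5_km).value == 2500.0L);
}
```

A suffix such as `_Km` would begin with an underscore followed by an uppercase letter, which is the form the quoted clauses reserve for any use, so a lowercase letter after the underscore is the safer choice.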

Solution 3

From MSDN:

Use of two sequential underscore characters ( __ ) at the beginning of an identifier, or a single leading underscore followed by a capital letter, is reserved for C++ implementations in all scopes. You should avoid using one leading underscore followed by a lowercase letter for names with file scope because of possible conflicts with current or future reserved identifiers.

This means that you can use a single underscore as a member variable prefix, as long as it's followed by a lower-case letter.
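
A minimal sketch of what that looks like in practice; the class `Counter` and its members are hypothetical. The commented-out lines show the two forms that stay reserved even inside a class.

```cpp
#include <cassert>

class Counter {  // hypothetical class, for illustration only
public:
    explicit Counter(int start) : _count(start) {}
    void increment() { ++_count; }
    int count() const { return _count; }

private:
    int _count;      // class scope, underscore + lowercase letter: not reserved
    // int _Count;   // reserved in any scope: underscore + uppercase letter
    // int __count;  // reserved in any scope: contains a double underscore
};

// Usage: the prefixed member behaves like any other member.
inline void check_counter() {
    Counter c(1);
    c.increment();
    assert(c.count() == 2);
}
```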

This is apparently taken from section 17.4.3.1.2 of the C++ standard, but I can't find an original source for the full standard online.

See also this question.

Solution 4

As for the other part of the question, it's common to put the underscore at the end of the variable name to not clash with anything internal.

I do this even inside classes and namespaces because I then only have to remember one rule (compared to "at the end of the name in global scope, and the beginning of the name everywhere else").
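
For illustration, a short sketch of the trailing-underscore convention applied both at file scope and to a member. The names `instance_count_`, `Session`, and `user_` are invented for this example.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical names; the same trailing-underscore rule is used in every scope.
static int instance_count_ = 0;  // file scope: a trailing underscore is not reserved

class Session {
public:
    // The parameter can keep the plain name without clashing with the member.
    explicit Session(std::string user) : user_(std::move(user)) { ++instance_count_; }
    const std::string& user() const { return user_; }

private:
    std::string user_;  // member: same convention as at file scope
};

// Usage: constructing a Session stores the name and bumps the counter.
inline void check_session() {
    Session s("ann");
    assert(s.user() == "ann");
    assert(instance_count_ > 0);
}
```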

Solution 5

Yes, underscores may be used anywhere in an identifier. I believe the rules are: any of a-z, A-Z, or _ for the first character, and those plus 0-9 for the following characters.
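
The character rule as I have stated it can be sketched as a small check. The helper `has_identifier_syntax` is hypothetical, and it covers only the basic characters described above: it ignores keywords and any extended characters an implementation may accept, and it says nothing about whether a syntactically valid name is reserved.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: checks only the basic character rule described above.
inline bool has_identifier_syntax(const std::string& name) {
    if (name.empty()) {
        return false;
    }
    // First character: a letter or an underscore.
    const unsigned char first = static_cast<unsigned char>(name[0]);
    if (!(std::isalpha(first) || first == '_')) {
        return false;
    }
    // Remaining characters: letters, digits, or underscores.
    for (std::size_t i = 1; i < name.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(name[i]);
        if (!(std::isalnum(c) || c == '_')) {
            return false;
        }
    }
    return true;
}

// Spot checks for the rule.
inline void check_syntax() {
    assert(has_identifier_syntax("foo1"));
    assert(!has_identifier_syntax("1foo"));
}
```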

Underscore prefixes are common in C code -- a single underscore means "private", and double underscores are usually reserved for use by the compiler.

Author by

Roger Lipscombe

I was a Windows programmer (15 years of C++, C#, SQL Server), but I took a leap of faith and now I'm a backend software developer, using Erlang and Elixir, at Twilio.

Updated on April 20, 2020

Comments

  • Roger Lipscombe
    Roger Lipscombe about 4 years

    It's common in C++ to name member variables with some kind of prefix to denote the fact that they're member variables, rather than local variables or parameters. If you've come from an MFC background, you'll probably use m_foo. I've also seen myFoo occasionally.

    C# (or possibly just .NET) seems to recommend using just an underscore, as in _foo. Is this allowed by the C++ standard?

  • Martin York
    Martin York over 15 years
    They are common in libraries. They should not be common in user code.
  • Michael Burr
    Michael Burr over 15 years
    Just a note - with the exception of numbering, what Martin quoted from the draft standard is exactly what's in the C++03 standard (17.4.3.1.2).
  • fizzer
    fizzer over 15 years
    Your summary doesn't say the same thing as the quote from the Standard
  • Adam Mitz
    Adam Mitz over 15 years
    global names are different from "any identifier"
  • Martin York
    Martin York over 15 years
    @Adam Mitz: Global names also covers MACROS. Which will splatter your identifiers into a mush. This is what I was trying to convey.
  • John Millikin
    John Millikin over 15 years
    People do write libraries in C, you know.
  • CesarB
    CesarB over 15 years
    Here is the official chapter and verse, please add to your already excellent answer: opengroup.org/onlinepubs/009695399/functions/xsh_chap02_02.html (and notice that many other reserved prefixes and suffixes you didn't mention can be found there).
  • josesuero
    josesuero about 15 years
    The C++ standard doesn't "import" the C one, does it? They import certain headers, but not the language as a whole, or naming rules, as far as I know. But yeah, the _t one surprised me as well. But since it's C, it can only apply to the global ns. Should be safe to use _t inside classes as I read it
  • Martin York
    Martin York about 15 years
    @jalf: The C++ standard is defined in terms of the C standard. Basically it says that C++ is C with these differences and additions.
  • Johannes Schaub - litb
    Johannes Schaub - litb about 15 years
    Martin, in the answer you say "This at least means they are not macros.." which i read as "global names are not macros", which i also think they are not. macros are not members of ::, and are thus not global. but in the comment you say "global names also covers MACROS".
  • Johannes Schaub - litb
    Johannes Schaub - litb about 15 years
    what is your final opinion on that? i've seen you added that thing into the answer after you did your comment. so do you have the same opinion as me with that macros are not global names?
  • Steve Jessop
    Steve Jessop almost 15 years
    Where does the C++ standard distinguish between things reserved "for the compiler" and things reserved "for the OS and libraries", please? I've seen where it reserves names to the implementation, but not where it specifies any distinction between a "compiler", "OS" and "libraries" as components of the implementation.
  • Johannes Schaub - litb
    Johannes Schaub - litb over 14 years
    The C++ Standard doesn't "import" the C Standard. It references the C Standard. The C++ library introduction says "The library also makes available the facilities of the Standard C Library". It does that by including headers of the C Standard library with appropriate changes, but not by "importing" it. The C++ Standard has an own set of rules that describes the reserved names. If a name reserved in C should be reserved in C++, that is the place to say this. But the C++ Standard doesn't say so. So i don't believe that things reserved in C are reserved in C++ - but i could well be wrong.
  • Johannes Schaub - litb
    Johannes Schaub - litb over 14 years
    This is what I found about the "_t" issue: n1256 (C99 TC3) says: "Typedef names beginning with int or uint and ending with _t" are reserved. I think that still allows using names like "foo_t" - but i think these are then reserved by POSIX.
  • Martin York
    Martin York over 14 years
    From the C++ standard 1.1. <quote>C++ is a general purpose programming language based on the C programming language as described in ISO/IEC 9899:1990 Programming languages — C (1.2). In addition to the facilities provided by C, C++ provides additional data types</quote>. My reading of this is that anything reserved in C is also reserved in C++ unless explicitly stated otherwise.
  • Martin York
    Martin York over 14 years
    As noted in the main article the '_t' suffix is reserved only by the POSIX standard not the C standard.
  • Sjoerd
    Sjoerd over 13 years
    So 'tolerance' is reserved by POSIX as it starts with 'to' + a lowercase letter? I bet a lot of code breaks this rule!
  • Martin York
    Martin York over 13 years
    @Sjoerd: Probably. Though I am sure that you will be fine as long as lerance does not become a real verb that can be applied to characters. Also note it is only reserved in global scope (C) or the standard namespace (C++), so you can have function variables with this name without breaking the rule.
  • paercebal
    paercebal almost 13 years
    I found a similar text in n3092.pdf (the draft of C++0x standard) at section: "17.6.3.3.2 Global names"
  • paercebal
    paercebal over 12 years
    @Meysam : __WRONG_AGAIN__ contains two consecutive underscores (two at the beginning, and two at the end), so this is wrong according to the standard.
  • Martin York
    Martin York over 11 years
    @ReubenMorais: No. Read the Posix documentation.
  • Maxim Egorushkin
    Maxim Egorushkin about 11 years
    GNU getopt_long() is an offender of all rules: it defines macros no_argument, required_argument and optional_argument.
  • Martin York
    Martin York about 11 years
    @MaximYegorushkin: No rules broken. These identifiers are reserved for the implementation. getopt_long() is part of the GNU implementation of compilers and standard libraries.
  • Jonathan Wakely
    Jonathan Wakely over 10 years
    @LokiAstari, "The C++ standard is defined in terms of the C standard. Basically it says the C++ is C with these differences and additions." Nonsense! C++ only references the C standard in [basic.fundamental] and the library. If what you say is true, where does C++ say that _Bool and _Imaginary don't exist in C++? The C++ language is defined explicitly, not in terms of "edits" to C, otherwise the standard could be much shorter!
  • Martin York
    Martin York over 10 years
    @JonathanWakely: I was referring to the second paragraph in the standard: <quote>C++ is a general purpose programming language based on the C programming language as described in ISO/IEC 9899:1999 Programming languages — C (hereinafter referred to as the C standard). In addition to the facilities provided by C, C++ provides additional data types, classes, templates, exceptions, namespaces, operator overloading, function name overloading, references, free store management operators, and additional library facilities.</quote>
  • Martin York
    Martin York over 10 years
    If you interpreted my statement above to mean something different, I apologize for being inexact.
  • Jonathan Wakely
    Jonathan Wakely over 10 years
    @LokiAstari, that's a very general statement describing the scope of the language, it doesn't mean everything in C is imported into C++. The C++ language (not library) is precisely defined by its own standard, not by reference to another, except for one reference in [basic.fundamental].
  • Martin York
    Martin York over 10 years
    @JonathanWakely: <quote>In addition to the facilities provided by C, C++ provides additional ....</quote>. But you also have to take the comment in the context of the discussion as a whole. We are talking about "Reserved Names" or more particularly "underscores". Thus what I was trying to convey is that reserved names in C are also reserved in C++. litb disagrees with that interpretation and I know he reads the standard very carefully. But this is a conversation resolved over a year ago.
  • paercebal
    paercebal over 10 years
    @BЈовић : WRONG__WRONG contains two consecutive underscores (two in the middle), so this is wrong according to the standard
  • a.lasram
    a.lasram over 10 years
    In C++ I only see [lex.name] and for global names [global.names]. Can you explain how the fact that C++ is based on the C standard and C99 is a normative reference make C99 rules apply to C++. thanks
  • Martin York
    Martin York over 10 years
    see [intro.refs] from the standard it describes what that means. See here to get a copy
  • ruakh
    ruakh about 10 years
    @LokiAstari: I think your statement is backward. One of the facilities of C is that you can use the identifiers it doesn't reserve; so if we're going to consider it literally relevant that C++ includes "the facilities provided by C", then the identifiers reserved by C++ would actually have to be (at most) a subset of those reserved by C, not a superset. (But in fact, you and I both know that C++ does reserve some identifiers that C does not, so apparently the "facilities provided by C" statement is not literally relevant.)
  • Martin York
    Martin York about 10 years
    @ruakh: I provide above the quote from the C standard 7.1.3 Reserved identifiers. Please re-read.
  • ruakh
    ruakh about 10 years
    @LokiAstari: The problem is -- what would the question be? "Does C++ leave everything undefined that C leaves undefined?" is too tendentious (I assume you can't be going that far), whereas "Are all identifiers reserved in C, also reserved in C++?" would be closed as a dupe of this one. Should I just quote the section of the C++ spec that you quote, and ask what its normative consequences are?
  • hyde
    hyde over 9 years
    Interestingly, this seems to be the only answer which has direct, concise answer to the question.
  • Remember Monica
    Remember Monica over 9 years
    It might be useful to know that most of the POSIX reserved symbols are only reserved when including the corresponding include file, i.e. "int stringptr" is "legal" until you include <string.h>.
  • Andy
    Andy almost 9 years
    @LokiAstari I understand that these kind of standards are necessary for C++. But for instance in Java, there are only a few reserved field names (e.g. serialVersionUID), and certainly no standards like variables ending in _t are reserved, because the language was designed such that everything is in a namespace. Are you saying that any language that can be compiled to machine code on multiple platforms would need to have these reserved variable name standards?
  • sbi
    sbi almost 9 years
    @hyde: Actually, it isn't, since it's skipping the rule not to have any identifiers with a leading underscore in the global namespace. See Roger's answer. I'd be very wary of citations of MS VC docs as an authority on the C++ standard.
  • sbi
    sbi almost 9 years
    "Yes, underscores may be used anywhere in an identifier." This is wrong for global identifiers. See Roger's answer.
  • hyde
    hyde almost 9 years
    @sbi I was referring to "you can use a single underscore as a member variable prefix, as long as it's followed by a lower-case letter" in this answer, which answers the question on the question text directly and concisely, without being drowned in a wall of text.
  • sbi
    sbi almost 9 years
    First, I still consider the lack of any hint that the same rule does not apply to the global namespace a failure. What's worse, though, is that adjacent underscores are forbidden not only at the beginning of, but anywhere in, an identifier. So this answer isn't merely omitting a fact, but actually makes at least one actively wrong claim. As I said, referring to the MSVC docs is something I wouldn't do unless the question is solely about VC.
  • supercat
    supercat almost 8 years
    I wonder if there would be any problem specifying that a particular prefix was reserved for macros defined by future language versions, with a proviso that implementations must either process them in accordance with a C standard or leave them undefined. That would make it possible for code using certain new features to work on old compilers by defining macros to emulate them. For example, if __CPP_EITHER(x,y) took two expressions or statements and allowed a compiler to choose between them in arbitrary fashion (hopefully depending upon which could be compiled more efficiently), then...
  • supercat
    supercat almost 8 years
    ...code using that directive could work on existing implementations by simply #ifndef __CPP_EITHER/#define __CPP_EITHER(x,y) x/#endif, but an implementation that understood the directive could use it to improve code generation in cases where it could tell y would be more efficient than x (in cases where it couldn't tell, it could simply use x).
  • hobbs
    hobbs almost 8 years
    @Sjoerd roughly, yes. It says that any implementation can define a new ctype function tofoo for any identifier foo beginning with a letter, including lerance. If that happens and it causes a clash with your own global, well, you were warned. The practical impact to you is small, but it gives POSIX and implementers breathing room to add stuff without endless quibbling.
  • FrankHB
    FrankHB almost 8 years
    The rules may be better updated to reflect the fact that reserved name rules are moved from library (Clause 17) to core language (Clause 2) in current C++ standard working draft.
  • supercat
    supercat over 7 years
    @sbi: The internal-double-underscore rule was designed to reserve such identifiers for type-mangled names, but I would think names with double underscores could have been accommodated by saying that any occurrences of __ generated by type-based mangling would be __x, and then saying that any occurrences of __ in the specified name would be replaced with __y before such mangling.
  • Ruslan
    Ruslan over 7 years
    putting your code in an unique namespace helps to avoid collision, too: but this is still not enough, since the identifier may collide with a keyword regardless of scope (e.g. __attribute__ for GCC).
  • PSkocik
    PSkocik over 7 years
    What about single underscore as a complete member variable name?
  • Swift - Friday Pie
    Swift - Friday Pie over 7 years
    @sbi There is irony in that VC complies to ISO C++ which reserves names with single underscore as well, renaming some of posix functions at same time, e.g. _dup() instead of dup()
  • Jason S
    Jason S over 6 years
    Why is there any problem of having two consecutive underscores in the middle according to the standard? User-defined literal suffixes apply to literal values like 1234567L or 4.0f; IIRC this refers to http://en.cppreference.com/w/cpp/language/user_literal
  • paercebal
    paercebal over 6 years
    Why is there any problem of having two consecutive underscores in the middle according to the standard? Because the standard says those are reserved. This is not advice on good or bad style. It's a decision from the standard. Why did they decide this? I guess the first compilers already used such conventions informally before standardization.
  • Henri Menke
    Henri Menke about 6 years
    I was unable to find the [global.names] clause or something similar in the current draft of the standard (eel.is/c++draft). It seems to have been removed.
  • Admin
    Admin about 5 years
    It should be noted that compilers will not check whether any of these reservation rules are violated, so code that uses reserved names may work today but break (potentially in a subtle way) the next time some innocuous-seeming upgrade or patch is applied.
  • CoffeeTableEspresso
    CoffeeTableEspresso almost 5 years
    @paercebal it was originally so that compilers would always have an easy way to mangle names. In modern times it's not as useful, but retained for backwards compatibility.
  • Max Barraclough
    Max Barraclough over 4 years
    If these rules are broken, does it cause undefined behaviour?
  • Martin York
    Martin York over 4 years
    @MaxBarraclough Yes. Which could mean nothing happens. See Section 5.10 Identifiers. Paragraph 3 In addition, some identifiers are reserved for use by C++ implementations and **shall not be used otherwise**; no diagnostic is required.
  • Martin York
    Martin York over 4 years
    @MaxBarraclough The important term here is Shall Not. If you look at C++ Section 3 Terms and definitions For the purposes of this document, the terms and definitions given in ISO/IEC 2382-1:1993, the terms, definitions, and symbols given in ISO 80000-2:2009, and the following apply. You can search for these terms here: iso.org/obp/ui => is required to be not .
  • Martin York
    Martin York over 4 years
    @MaxBarraclough Thus if you break this condition your code is non conforming. If we then read Section 4 General principles paragraph 2.3 If a program contains a violation of a rule for which no diagnostic is required, this document places no requirement on implementations with respect to that program.
  • Martin York
    Martin York over 4 years
    @MaxBarraclough And finally. Looking at Section 3.27 undefined behavior behavior for which this document imposes no requirements.
  • supercat
    supercat over 4 years
    @MartinYork: What are the requirements for a conforming C program? In every version of the Standard I've seen, violation of a constraint would mean a program isn't strictly conforming, but implementations are allowed to document extensions that waive constraints, and a program that runs on such an implementation would be conforming even though it violates a constraint.
  • supercat
    supercat over 4 years
    @CoffeeTableEspresso: I'm still puzzled as to why any implementation would require that no source-code names contain double underscores. Even if an existing implementation exported double-underscore names itself and forbade them in source code, such an implementation could add support for such names without breaking linker compatibility with any existing object files by e.g. specifying that any run of N underscores in a source-code name would be replaced by 2N+1 underscores in the linker name.
  • Martin York
    Martin York over 4 years
    @supercat I rarely use C so I don't know.
  • supercat
    supercat over 4 years
    @MartinYork: Does the C++ Standard define a concept of conformance for programs, or merely implementations? I seem to recall the prologue states that any reference to things programs may or may not do is purely meant to be interpreted with regards to the requirements for implementations.
  • supercat
    supercat over 4 years
    @MartinYork: The distinction is important because implementations are allowed to extend the language so as to expand the range of programs they can process usefully, and such expansion can include programs that violate constraints. Violating a constraint doesn't make a program non-conforming (since there is no such concept), but instead means that implementations need not process the program meaningfully if they don't wish to do so.
  • Martin York
    Martin York over 4 years
    @supercat Why are you asking in the comments (this is not the correct place for this discussion). Seems like you should ask this as a question. People with knowledge will then try and answer.
  • BenW
    BenW almost 4 years
    @sbi According to the C and C++ standards, yes, semantically, global identifiers with leading underscores are reserved. They are syntactically valid identifiers though, and the compiler won't stop you from naming a function _Foo, though by doing so you're relying on nonstandard implementation details and thus risk having your code broken by future versions of the language/standard library implementation/OS.
  • sbi
    sbi almost 4 years
    @BenW: TTBOMK, the C++ standard simply says that global identifiers starting with an underscore are not allowed, without making any distinction between syntax and semantics. (Also any identifiers starting with an underscore followed by a capital letter, and any identifiers with two consecutive underscores.)
  • JohnFilleau
    JohnFilleau over 3 years
    I don't believe this is spelled out in the standard, but does an identifier with a "triple underscore" (___) always count as having a double underscore? I... believe it should? But empirical evidence on my end shows that some people may find a triple underscore to be acceptable.
  • Martin York
    Martin York over 3 years
    But the actual wording is: Each identifier that contains a double underscore __ If you have a triple it contains a double!