Why can't variable names start with numbers?
Solution 1
Because then a string of digits would be a valid identifier as well as a valid number.
int 17 = 497;
int 42 = 6 * 9;
String 1111 = "Totally text";
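A minimal Python sketch of that ambiguity, assuming a hypothetical language whose identifier rule is relaxed to "any run of letters, digits, or underscores". The names IDENT, NUMBER, and classify are made up for illustration and do not come from any real compiler.

```python
import re

# Hypothetical relaxed rule: an identifier is any run of letters, digits, or underscores.
IDENT = re.compile(r"[A-Za-z0-9_]+")
# Ordinary decimal integer literal.
NUMBER = re.compile(r"[0-9]+")

def classify(token):
    """Return every token kind whose pattern matches the whole token."""
    kinds = []
    if IDENT.fullmatch(token):
        kinds.append("identifier")
    if NUMBER.fullmatch(token):
        kinds.append("number")
    return kinds

for tok in ["17", "apple", "x1"]:
    # "17" matches both patterns, so the lexer cannot pick a single kind for it.
    print(tok, classify(tok))
```

Under this relaxed rule, a token such as 17 is both an identifier and a number, which is exactly the conflict in the declarations above.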
Solution 2
Well, think about this:
int 2d = 42;
double a = 2d;
What is a? 2.0, or 42?
Hint: if you don't get it, a d after a number makes it a double literal (in Java, for example).
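To make the clash concrete, here is a rough Python sketch. It assumes a hypothetical identifier rule ("letters, digits, or underscores, with at least one non-digit") and a deliberately simplified set of Java-style literals (decimal digits with an optional d, f, or L suffix, plus hex). IDENT, LITERAL, and is_ambiguous are illustrative names, not part of any real lexer.

```python
import re

# Hypothetical rule: digits may lead, but at least one letter or underscore is required.
IDENT = re.compile(r"[0-9]*[A-Za-z_][A-Za-z0-9_]*")
# Simplified Java-style literals: decimal digits with an optional suffix, or hex.
LITERAL = re.compile(r"[0-9]+[dDfFlL]?|0[xX][0-9A-Fa-f]+")

def is_ambiguous(token):
    """True when the token is both a legal identifier and a legal literal."""
    return bool(IDENT.fullmatch(token)) and bool(LITERAL.fullmatch(token))

for tok in ["2d", "123456L", "0x123", "2BeOrNot2Be"]:
    print(tok, is_ambiguous(tok))
```

Even with the "at least one non-digit" rule, tokens like 2d, 123456L, and 0x123 remain ambiguous, while 2BeOrNot2Be does not collide with any literal form in this simplified model.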
Solution 3
It's a convention now, but it started out as a technical requirement.
In the old days, parsers for languages such as FORTRAN or BASIC did not require the use of spaces. So, basically, the following are identical:
10 V1=100
20 PRINT V1
and
10V1=100
20PRINTV1
Now suppose that numeric prefixes were allowed in variable names. How would you interpret this?
101V=100
as
10 1V = 100
or as
101 V = 100
or as
1 01V = 100
So, this was made illegal.
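The three readings above can be enumerated mechanically. This is only a sketch in Python, assuming a toy spaceless BASIC-style assignment of the form line number, variable, "=", value; possible_splits is a hypothetical helper, not anything from a real BASIC interpreter.

```python
def possible_splits(statement):
    """List every (line_number, variable) reading of a spaceless assignment,
    assuming variable names are allowed to begin with digits."""
    target, _, _value = statement.partition("=")
    # Count the leading digits: each could belong to the line number or the name.
    digits = len(target) - len(target.lstrip("0123456789"))
    # The line number takes at least one digit; the remainder is the variable.
    return [(target[:i], target[i:]) for i in range(1, digits + 1)]

print(possible_splits("101V=100"))
# [('1', '01V'), ('10', '1V'), ('101', 'V')]
```

If names must start with a letter, only the last reading survives, because every digit has to belong to the line number.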
Solution 4
Because backtracking is avoided in lexical analysis while compiling. Given a variable like:
Apple;
the compiler knows it's an identifier as soon as it meets the letter 'A'.
However, given a variable like:
123apple;
the compiler can't decide whether it's a number or an identifier until it hits 'a', so it needs backtracking.
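A small Python sketch of the difference, under the assumption of a toy scanner that only needs to tell numbers from identifiers. Both functions are hypothetical; a real lexer would typically be a generated state machine, and for it this extra lookahead is cheap.

```python
def kind_by_first_char(token):
    """Conventional rule: the first character alone decides the token kind."""
    first = token[0]
    if first.isdigit():
        return "number"
    if first.isalpha() or first == "_":
        return "identifier"
    return "other"

def kind_by_scan(token):
    """Hypothetical relaxed rule: digits may lead an identifier, so the scanner
    keeps reading until it sees a non-digit or runs out of characters.
    Returns the kind and how many characters had to be examined."""
    for examined, ch in enumerate(token, start=1):
        if not ch.isdigit():
            kind = "identifier" if (ch.isalpha() or ch == "_") else "other"
            return kind, examined
    return "number", len(token)

print(kind_by_first_char("Apple"))   # identifier, decided on 'A'
print(kind_by_scan("123apple"))      # ('identifier', 4), decided only at 'a'
```

Note that under the conventional rule a scanner would start reading 123apple as a number; what happens at the 'a' depends on the language.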
Solution 5
Compilers/parsers/lexical analyzers were a long, long time ago for me, but I think I remember there being difficulty in unambiguously determining whether a numeric character in the compilation unit represented a literal or an identifier.
Languages where space is insignificant (like ALGOL and the original FORTRAN if I remember correctly) could not accept numbers to begin identifiers for that reason.
This goes way back - before special notations to denote storage or numeric base.
Jeremiah
I get stuff done. I am a software engineer for Microsoft, working on XAML tooling in Visual Studio. These are my words, not my employer's.
Updated on November 24, 2020
Comments
-
Jeremiah over 3 years
I was working with a new C++ developer a while back when he asked the question: "Why can't variable names start with numbers?"
I couldn't come up with an answer except that some numbers can have text in them (123456L, 123456U) and that wouldn't be possible if the compilers were thinking everything with some amount of alpha characters was a variable name.
Was that the right answer? Are there any more reasons?
string 2BeOrNot2Be = "that is the question"; // Why won't this compile?
-
Tim over 15 years
And why can't they have spaces in them?
-
Tim Frey over 15 years
Re-tagged this with "c++" because this is a language limitation. It's quite possible that some languages will allow this (though I can't think of any offhand).
-
Ken Gentle over 15 years
This issue predates C++ by at least 20 years, if not back to the first macro assemblers.
-
Tim Frey over 15 years
The OP mentioned C++ specifically, but I like the new set of tags better anyway.
-
Ingo over 10 years
Well, in FORTH, you can do it. AFAIK, there is a word called 0 that pushes 0 onto the stack. Another one is 0= that checks whether 0 is on the stack.
-
david.pfx about 10 years
Why is this question so popular and the answers so wrong? Many languages do allow variables to start with numbers. C++ doesn't, but it's just a convenient limitation that avoids certain ambiguities. Sometimes SO amazes me in all the wrong ways.
-
Boon almost 9 years
If this question were asked today on SO, it would be termed opinion-based and closed. Thanks for asking this.
-
jrh over 6 years
@david.pfx Personally I expect that pretty much every single language limitation has a "why" question being asked somewhere. IMO that's a good thing: it means programmers are thinking about what they're doing and want to learn.
-
jrh over 6 years
@Boon Well... it's still open. IMO the POB close reason would be incorrect, because somebody, at some point in time, needed to implement this restriction, and there was a reason for it (even if it was just "I hate numbers" or "I wanted to leave early on Friday"), so that one person's answer would be the absolute truth. Hypothetically, if that person showed up to this question, or somebody happened to read their book / paper / blog / magazine article, the true answer would be found.
-
jrh over 6 years
Also, related post on SE.SE
-
david.pfx over 6 years
@jrh: No, the question is OK and it could have a good answer (which I could even write, but won't). The amazing thing is how many answers there are and how wrong most of them are (including the accepted answer).
-
phuclv over 5 years
@OutlawProgrammer one example is batch: this is a %valid variable name%. %2 Be Or Not 2 Be % is also valid. All the whitespaces are significant.
-
phuclv over 5 years
@ChristianFritz why did you remove the c++ tag? This isn't language agnostic, since many languages do allow variables to start with a number, like $1 in shell scripts.
-
phuclv over 5 years
@Tim not in C++, but many other languages do allow that. See: "Why can't variable names have spaces in them?", "Is there any language that allows spaces in its variable names", "Why should identifiers not begin with a number?"
-
O-9 over 3 years
It is technically possible in every language, but makes lexical analysis more complex. See en.wikipedia.org/wiki/Lexical_analysis
-
-
Pyrolistical over 15 years
Well, what if they said variables cannot be only numbers. Then what?
-
Ken Gentle over 15 years
This is actually a [relatively] late coming notation ("d" for "double"), C89 standard IIRC. Leading numerics in identifiers aren't possible if this construct is in the language, but that is not the reason numerics can't start an identifier.
-
Yaser Har over 15 years
It'd take me longer to come up with a regular expression for the lexer to pick up identifiers using that rule, if it's even possible, so I can see why no language has ever been implemented that way, in addition to the reasons given in other answers.
-
Ferruccio over 15 years
You can make the rules as complex as you want, but you might regret it when you try to implement the compiler. ;-)
-
Tim over 15 years
Note - I am not advocating it - just saying that that reason is way down on the list and most likely it is all just due to convention.
-
CB Bailey over 15 years
d isn't a valid floating literal suffix in C++. Floating literals are doubles by default; you can use f or l if you need a float or a long double literal.
-
Pyrolistical over 15 years
It is for Java, and while the original question was for C++, it also applies to many other languages, like Java. But I agree. This isn't the original reason why identifiers can't start with numbers.
-
paxdiablo over 15 years
I particularly like the ability to change numbers - "int 1 = 2; int a = 1 + 1;" would set a to 4. :-)
-
Huntrods over 15 years
If people are going to be silly, then "L" looks like "1" - as in l234 (that's L234) - looks like a number but is legal. If you want to write obtuse code like "17 = 497" then using "L" makes it possible. But why? -R
-
kemiller2002 over 15 years
But that is not the point I'm making. It's an analogy as to why there can't be numbers at the start of variable names, and the simplest answer is: because the rules of the language don't allow it.
-
Steve Jessop over 15 years
Sure, but I don't think the questioner is an imbecile. He's probably worked out that far already by himself. The question IMO is "why don't the rules of the language allow it?". He wants to bridge the gap between knowing the rules and understanding them.
-
kemiller2002 over 15 years
Yeah, upon reflecting on this, I realized where you were going. You have a point. I guess I was applying Occam's razor a little too freely and assumed there is no real answer to why except that variables don't start with numbers, because they're not numbers.
-
Steve Jessop over 15 years
I'm not saying you're wrong, mind. Occasionally the decisions of the C++ standards bodies do surpass mortal understanding, and you end up with "because they had to decide something and they decided this". But there is at least a question there to be asked :-)
-
David Thornley about 15 years
It isn't that difficult. It makes the lexical phase more difficult, that's all. Of course, back when I took compilers, I was told that lexical scanning could take over a quarter of the total compilation time.
-
Jason Baker almost 15 years
The thing is that Forth doesn't really have a very sophisticated parser. Really, all it cares about is if an identifier is between two sets of whitespace.
-
Jason Baker almost 15 years
This answer is actually on the right track. The real problem lies in performance. Backtracking can make well-behaved regular expressions painfully slow.
-
Jason Baker almost 15 years
This restriction holds in languages where that kind of syntax isn't allowed though.
-
Jason Baker almost 15 years
Actually, there are several languages that allow you to have characters marking identifiers. They're called "sigils" and you have them in Perl and PHP.
-
Deva almost 15 years
Except you still aren't allowed to begin a variable name in PHP with a number - the language rules forbid it. :-) But you can in Qompose for exactly the same reason.
-
eaolson over 14 years
If it had to be numbers+alpha, then you could still do String 0x123 = "Hello World". Unless you state that variable names are "numbers+alpha that don't parse to a valid numeric designation", and that's just silly.
-
Admin about 13 years
The lexing process is rarely the bottleneck. Sure, it makes the regex for identifier tokens more complex, but they can still be super-fast DFAs. The runtime of those is peanuts compared to most other tasks compilers have to accomplish.
-
mr-euro about 13 years
LOL - "The problem is that it would lead to weird things like 0xdeadpork being allowed, but not 0xdeadbeef. Ultimately, I think we should be fair to all meats :P."
-
Brian over 12 years
Some languages do support assigning on top of numbers. Those languages will allow code like assigning 3 to be 4.
-
comingstorm almost 12 years
Never mind the compiler: the people using the language need to be able to readily (at a glance) distinguish variable names from numbers. If the first character didn't tell you -- instead, if you needed to search through the rest of the word to tell if there was a non-numeric alpha somewhere in there -- the code would be harder to read.
-
supercat over 11 years
@eaolson: I've worked with an assembler which applied that rule to hex numbers which started with A-F and ended with h. Tripped me up the first time I tried to define a label to point to the music data for Bach's Two Part Invention #13 (logical name? Bach).
-
supercat over 11 years
Minor nit: line numbers had to be in columns 1-6, and executable code following column 8. On the other hand, DO 10 I=1,50 could be ambiguously parsed as DO1 0I=1,50 [incidentally, if one uses a period instead of a comma, the statement becomes an assignment to a floating-point variable named DO10I].
-
supercat over 11 years
How would one design a regular expression to allow a variable named ifq or doublez but not if or double? The fundamental problem with allowing identifiers to start with digits would be that there are existing forms of hex literals and floating-point numbers which consist entirely of alphanumeric characters (languages which use something like $1234 or h'1234 instead of 0x1234, and require numbers like 1E23 to include a period, could avoid that issue). Note that attempts at regex-parsing C can already get tripped up by things like 0x12E+5.
-
supercat almost 11 years
Even if one required that identifiers contain at least one non-digit character, one would also either have to require that numeric formats that contain letters must also contain a non-alphanumeric character [e.g. require 0x1234 to be written as $1234 and 1E6 to be written as 1.E6 or 1.0E6] or else have an odd combination of legal and illegal identifier names.
-
david.pfx about 10 years
This is wrong. The question was about variables starting with numbers, not consisting entirely of numbers.
-
nehem over 9 years
Recalling my compiler design class, this answer is exactly right! Kudos
-
Charles Clayton about 9 years
Interesting explanation! That makes sense for older languages, but it still makes me wonder why we've kept this design choice in languages like Python, JavaScript, or R.
-
munificent over 7 years
"Unless you state that variable names are 'numbers+alpha that don't parse to a valid numeric designation', and that's just silly." But languages do exactly that for keywords: a variable name is a sequence of letters that doesn't parse to a valid reserved word.
-
Evan Cox over 6 years
@munificent True, but the list of reserved words is finite, whereas the list of valid numeric designators is infinite, or nearly so.
-
david.pfx over 6 years
This is the accepted answer and it's dead wrong. I write compilers, and it's mind-numbingly easy to allow an identifier to be a string of characters containing at least one letter, regardless of what it starts with.
-
Brian Chandler over 5 years
I definitely remember this with BASIC and feel this is probably the most valid practical reason for the practice. Technically though, I vaguely remember that it may actually go back to early assembly language. I'm unsure what assembler though, and I very well could be wrong.