Why can't variable names start with numbers?

Solution 1

Because then a string of digits would be a valid identifier as well as a valid number.

int 17 = 497;
int 42 = 6 * 9;
String 1111 = "Totally text";

Solution 2

Well, think about this:

int 2d = 42;
double a = 2d;

What is a: 2.0 or 42?

Hint, if you don't get it: in Java, a d after a number marks that number as a double literal.
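To make the clash concrete, here is a rough sketch in Python rather than Java. Both patterns are assumptions made up for illustration: JAVA_DOUBLE only approximates a small subset of Java's double literals, and LOOSE_IDENT is a hypothetical identifier rule that tolerates a leading digit.

```python
import re

# Assumption: a simplified subset of Java double literals
# (digits, an optional fraction, and a mandatory d/D suffix).
JAVA_DOUBLE = re.compile(r"\d+(\.\d*)?[dD]")

# Hypothetical identifier rule that permits a leading digit,
# as long as the name contains at least one letter or underscore.
LOOSE_IDENT = re.compile(r"[A-Za-z0-9_]*[A-Za-z_][A-Za-z0-9_]*")


def classify(token):
    """Return every token kind whose pattern matches the whole token."""
    kinds = []
    if JAVA_DOUBLE.fullmatch(token):
        kinds.append("double literal")
    if LOOSE_IDENT.fullmatch(token):
        kinds.append("identifier")
    return kinds


print(classify("2d"))  # matches both patterns
print(classify("a"))   # matches only the identifier pattern
```

Under these assumed rules, 2d is the only one of the two tokens that gets two classifications, which is exactly the case the snippet above asks about.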

Solution 3

It's a convention now, but it started out as a technical requirement.

In the old days, parsers of languages such as FORTRAN or BASIC did not require the use of spaces. So, basically, the following are identical:

10 V1=100
20 PRINT V1

and

10V1=100
20PRINTV1

Now suppose that numeral prefixes were allowed. How would you interpret this?

101V=100

as

10 1V = 100

or as

101 V = 100

or as

1 01V = 100

So, this was made illegal.
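The three readings can be enumerated mechanically. This is only a sketch in Python, not how any real BASIC interpreter worked: the function possible_readings and both patterns are hypothetical names, and DIGIT_LED_VAR encodes the imagined rule that a variable may begin with digits.

```python
import re

# Hypothetical rule: a variable may start with digits but must contain a letter.
DIGIT_LED_VAR = re.compile(r"[0-9]*[A-Z][A-Z0-9]*")

# The conventional rule: a variable must start with a letter.
LETTER_FIRST_VAR = re.compile(r"[A-Z][A-Z0-9]*")


def possible_readings(line, var_pattern):
    """List every (line_number, variable, value) split of 'NNNvar=value'."""
    target, value = line.replace(" ", "").split("=", 1)
    readings = []
    # Try each cut point: digits before it are the line number,
    # everything after it must be a valid variable name.
    for cut in range(1, len(target)):
        number, variable = target[:cut], target[cut:]
        if number.isdigit() and var_pattern.fullmatch(variable):
            readings.append((int(number), variable, value))
    return readings


for reading in possible_readings("101V=100", DIGIT_LED_VAR):
    print(reading)
print(possible_readings("101V=100", LETTER_FIRST_VAR))
```

With the digit-led rule the loop finds all three splits listed above. With the letter-first rule only line 101 with variable V survives, so the statement has a single reading.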

Solution 4

Because lexical analysis avoids backtracking while compiling. With a variable like:

Apple;

the compiler knows it's an identifier right away, as soon as it meets the letter 'A'.

However, with a variable like:

123apple;

the compiler can't decide whether it's a number or an identifier until it hits 'a', so it needs to backtrack.
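A minimal sketch of that idea, in Python and not modelled on any particular compiler: the hypothetical scan function below commits to a token kind after looking at the first character only, so it never has to go back and reclassify what it has already read.

```python
def scan(source):
    """Split source into (kind, text) tokens, choosing each kind from its first character."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
            continue
        start = i
        if ch.isdigit():
            # A leading digit commits us to a number; consume digits only.
            while i < len(source) and source[i].isdigit():
                i += 1
            kind = "number"
        elif ch.isalpha() or ch == "_":
            # A leading letter commits us to an identifier; digits may follow.
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            kind = "identifier"
        else:
            # Anything else is treated as a one-character symbol.
            i += 1
            kind = "symbol"
        tokens.append((kind, source[start:i]))
    return tokens


print(scan("Apple;"))
print(scan("123apple;"))
```

In this sketch 123apple simply comes out as a number followed by an identifier. How a real compiler diagnoses that input is outside the scope of the example.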

Solution 5

Compilers, parsers, and lexical analyzers were a long, long time ago for me, but I think I remember there being difficulty in unambiguously determining whether a numeric character in the compilation unit represented a literal or an identifier.

Languages where space is insignificant (like ALGOL and the original FORTRAN if I remember correctly) could not accept numbers to begin identifiers for that reason.

This goes way back - before special notations to denote storage or numeric base.

Author: Jeremiah

I get stuff done. I am a software engineer for Microsoft, working on XAML tooling in Visual Studio. These are my words, not my employer's.

Updated on November 24, 2020

Comments

  • Jeremiah
    Jeremiah over 3 years

    I was working with a new C++ developer a while back when he asked the question: "Why can't variable names start with numbers?"

    I couldn't come up with an answer except that some numbers can have text in them (123456L, 123456U) and that wouldn't be possible if the compilers were thinking everything with some amount of alpha characters was a variable name.

    Was that the right answer? Are there any more reasons?

    string 2BeOrNot2Be = "that is the question"; // Why won't this compile?
    
    • Tim
      Tim over 15 years
      And why can't they have spaces in them?
    • Tim Frey
      Tim Frey over 15 years
      Re-tagged this with "c++" because this is a language limitation. It's quite possible that some languages will allow this (though I can't think of any offhand).
    • Ken Gentle
      Ken Gentle over 15 years
      This issue predates C++ by at least 20 years, if not back to the first macro assemblers.
    • Tim Frey
      Tim Frey over 15 years
      The OP mentioned C++ specifically, but I like the new set of tags better anyway.
    • Ingo
      Ingo over 10 years
      Well, in FORTH, you can do it. AFAIK, there is a word called 0 that pushes 0 onto the stack. Another one is 0=, which checks whether 0 is on the stack.
    • david.pfx
      david.pfx about 10 years
      Why is this question so popular and the answers so wrong? Many languages do allow variables to start with numbers. C++ doesn't but it's just a convenient limitation that avoids certain ambiguities. Sometimes SO amazes me in all the wrong ways.
    • Boon
      Boon almost 9 years
      If this question were asked today on SO, it would be termed opinion-based and closed. Thanks for asking this.
    • jrh
      jrh over 6 years
      @david.pfx Personally I expect that pretty much every single language limitation has a "why" question being asked somewhere, IMO that's a good thing, it means programmers are thinking about what they're doing and want to learn.
    • jrh
      jrh over 6 years
      @Boon Well... it's still open. IMO the POB close reason would be incorrect, because somebody, at some point in time, needed to implement this restriction, and there was a reason for it (even if it was just "I hate numbers" or "I wanted to leave early on Friday"), so that one person's answer would be the absolute truth. Hypothetically, if that person showed up to this question, or somebody happened to read their book / paper / blog / magazine article, the true answer would be found.
    • jrh
      jrh over 6 years
      Also, related post on SE.SE
    • david.pfx
      david.pfx over 6 years
      @jrh: No, the question is OK and it could have a good answer (which I could even write, but won't). The amazing thing is how many answers there are and how wrong most of them are (including the accepted answer).
    • phuclv
      phuclv over 5 years
      @OutlawProgrammer one example is batch: this is a %valid variable name%. %2 Be Or Not 2 Be % is also valid. All the whitespace is significant.
    • phuclv
      phuclv over 5 years
      @ChristianFritz why did you remove the c++ tag? This isn't language agnostic, since many languages do allow variables to start with a number, like $1 in shell scripts.
    • O-9
      O-9 over 3 years
      It is technically possible in every language, but makes lexical analysis more complex. See en.wikipedia.org/wiki/Lexical_analysis
  • Pyrolistical
    Pyrolistical over 15 years
    Well, what if they said variables cannot be only numbers. Then what?
  • Ken Gentle
    Ken Gentle over 15 years
    This is actually a [relatively] late coming notation ("d" for "double"), C89 standard IIRC. Leading numerics in identifiers aren't possible if this construct is in the language, but that is not the reason numerics can't start an identifier.
  • Yaser Har
    Yaser Har over 15 years
    It'd take me longer to come up with a regular expression for the lexer to pick up identifiers using that rule, if it's even possible, so I can see why no language has ever been implemented that way, in addition to the reasons given in other answers.
  • Ferruccio
    Ferruccio over 15 years
    you can make the rules as complex as you want, but you might regret it when you try to implement the compiler. ;-)
  • Tim
    Tim over 15 years
    note - I am not advocating it - just saying that that reason is way down on the list and most likely it is all just due to convention.
  • CB Bailey
    CB Bailey over 15 years
    d isn't a valid floating literal suffix in C++. Floating literals are doubles by default, you can use f or l if you need a float or a long double literal.
  • Pyrolistical
    Pyrolistical over 15 years
    It is for Java, and while the original question was for C++, it also applies to many other languages, like Java. But I agree. This isn't the original reason why identifiers can't start with numbers.
  • paxdiablo
    paxdiablo over 15 years
    I particularly like the ability to change numbers - "int 1 = 2; int a = 1 + 1;" would set a to 4. :-)
  • Huntrods
    Huntrods over 15 years
    If people are going to be silly, then "L" looks like "1" - as in l234 (that's L234) - looks like a number but is legal. If you want to write obtuse code like "17 = 497" then using "L" makes it possible. But why? -R
  • kemiller2002
    kemiller2002 over 15 years
    But that is not the point I'm making. It's an analogy as to why there can't be numbers at the start of variable names, and the simplest answer is: because the rules of the language don't allow it.
  • Steve Jessop
    Steve Jessop over 15 years
    Sure, but I don't think the questioner is an imbecile. He's probably worked out that far already by himself. The question IMO is "why don't the rules of the language allow it?". He wants to bridge the gap between knowing the rules and understanding them.
  • kemiller2002
    kemiller2002 over 15 years
    Yeah, upon reflecting on this, I realized where you were going. You have a point. I guess I was applying Occam's razor a little too freely and assumed there is no real answer to why, except that variables don't start with numbers because they're not numbers.
  • Steve Jessop
    Steve Jessop over 15 years
    I'm not saying you're wrong, mind, occasionally the decisions of the C++ standards bodies do surpass mortal understanding, and you end up with "because they had to decide something and they decided this". But there is at least a question there to be asked :-)
  • David Thornley
    David Thornley about 15 years
    It isn't that difficult. It makes the lexical phase more difficult, that's all. Of course, back when I took compilers, I was told that lexical scanning could take over a quarter of the total compilation time.
  • Jason Baker
    Jason Baker almost 15 years
    The thing is that Forth doesn't really have a very sophisticated parser. Really, all it cares about is if an identifier is between two sets of whitespace.
  • Jason Baker
    Jason Baker almost 15 years
    This answer is actually on the right track. The real problem lies in performance. Backtracking can make well-behaved regular expressions painfully slow.
  • Jason Baker
    Jason Baker almost 15 years
    This restriction holds in languages where that kind of syntax isn't allowed though.
  • Jason Baker
    Jason Baker almost 15 years
    Actually, there are several languages that allow you to have characters marking identifiers. They're called "sigils" and you have them in Perl and PHP.
  • Deva
    Deva almost 15 years
    Except you still aren't allowed to begin a variable name in PHP with a number - the language rules forbid it. :-) But you can in Qompose for exactly the same reason.
  • eaolson
    eaolson over 14 years
    If it had to be numbers+alpha, then you could still do String 0x123 = "Hello World". Unless you state that variable names are "numbers+alpha that don't parse to a valid numeric designation", and that's just silly.
  • Admin
    Admin about 13 years
    The lexing process is rarely the bottleneck. Sure, it makes the regex for identifier tokens more complex, but they can still be super-fast DFAs. The runtime of those is peanuts compared to most other tasks compilers have to accomplish.
  • mr-euro
    mr-euro about 13 years
    LOL - "The problem is that it would lead to weird things like 0xdeadpork being allowed, but not 0xdeadbeef. Ultimately, I think we should be fair to all meats :P."
  • Brian
    Brian over 12 years
    Some languages do support assigning on top of numbers. Those languages will allow code like assigning 3 to be 4.
  • comingstorm
    comingstorm almost 12 years
    Never mind the compiler: the people using the language need to be able to readily (at a glance) distinguish variable names from numbers. If the first character didn't tell you -- instead, if you needed to search through the rest of the word to tell if there was a non-numeric alpha somewhere in there -- the code would be harder to read.
  • supercat
    supercat over 11 years
    @eaolson: I've worked with an assembler which applied that rule to hex numbers which started with A-F and ended with h. Tripped me up the first time I tried to define a label to point to the music data for Bach's Two Part Invention #13 (logical name? Bach).
  • supercat
    supercat over 11 years
    Minor nit: line numbers had to be in columns 1-6, and executable code following column 8. On the other hand, DO 10 I=1,50 could be ambiguously parsed as DO1 0I=1,50 [incidentally, if one uses a period instead of a comma, the statement becomes an assignment to a floating-point variable named DO10I].
  • supercat
    supercat over 11 years
    How would one design a regular expression to allow a variable named ifq or doublez but not if or double? The fundamental problem with allowing identifiers to start with digits would be that there are existing forms of hex literals and floating-point numbers which consist entirely of alphanumeric characters (languages that use something like $1234 or h'1234 instead of 0x1234, and require numbers like 1E23 to include a period, could avoid that issue). Note that attempts at regex-parsing C can already get tripped up by things like 0x12E+5.
  • supercat
    supercat almost 11 years
    Even if one required that identifiers contain at least one non-digit character, one would also either have to require that numeric formats that contain letters must also contain a non-alphanumeric character [e.g. require 0x1234 to be written as $1234 and 1E6 to be written as 1.E6 or 1.0E6] or else have an odd combination of legal and illegal identifier names.
  • david.pfx
    david.pfx about 10 years
    This is wrong. The question was about variables starting with numbers, not consisting entirely of numbers.
  • nehem
    nehem over 9 years
    Remembering my compiler design class, this answer is exactly right! Kudos.
  • Charles Clayton
    Charles Clayton about 9 years
    Interesting explanation! That makes sense for older languages, but it still makes me wonder why we've continued the design choice in languages like Python or JavaScript or R.
  • munificent
    munificent over 7 years
    "Unless you state that variable names are 'numbers+alpha that don't parse to a valid numeric designation', and that's just silly." But languages do exactly that for keywords: A variable name is a sequence of letters that don't parse to a valid reserved word.
  • Evan Cox
    Evan Cox over 6 years
    @munificent True, but the list of reserved words is finite, whereas the list of valid numeric designators is infinite, or nearly so.
  • david.pfx
    david.pfx over 6 years
    This is the accepted answer and it's dead wrong. I write compilers, and it's mind-numbingly easy to allow an identifier to be a string of characters containing at least one letter, regardless of what it starts with.
  • Brian Chandler
    Brian Chandler over 5 years
    I definitely remember this with BASIC and feel this is probably the most valid practical reason for the practice. Technically though, I vaguely remember that it may actually go back to early assembly language. I'm unsure which assembler though, and I very well could be wrong.