What are "connecting characters" in Java identifiers?

Solution 1

Here is a list of connecting characters. These are characters used to connect words.

http://www.fileformat.info/info/unicode/category/Pc/list.htm

U+005F _ LOW LINE
U+203F ‿ UNDERTIE
U+2040 ⁀ CHARACTER TIE
U+2054 ⁔ INVERTED UNDERTIE
U+FE33 ︳ PRESENTATION FORM FOR VERTICAL LOW LINE
U+FE34 ︴ PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
U+FE4D ﹍ DASHED LOW LINE
U+FE4E ﹎ CENTRELINE LOW LINE
U+FE4F ﹏ WAVY LOW LINE
U+FF3F ＿ FULLWIDTH LOW LINE

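This compiles on Java 7. (On Java 8 a lone _ produces a warning, and from Java 9 onward it is a keyword and no longer a valid identifier.)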

int _, ‿, ⁀, ⁔, ︳, ︴, ﹍, ﹎, ﹏, ＿;

An example: in this case tp is both the name of a column and the name of the value for a given row.

Column<Double> ︴tp︴ = table.getColumn("tp", double.class);

double tp = row.getDouble(︴tp︴);

The following

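for (int i = Character.MIN_CODE_POINT; i <= Character.MAX_CODE_POINT; i++) {
    // Keep identifier-start code points that are not alphabetic.
    if (Character.isJavaIdentifierStart(i) && !Character.isAlphabetic(i)) {
        // toChars handles code points above U+FFFF, which a (char) cast would truncate.
        System.out.print(new String(Character.toChars(i)) + " ");
    }
}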

prints

$ _ ¢ £ ¤ ¥ ؋ ৲ ৳ ৻ ૱ ௹ ฿ ៛ ‿ ⁀ ⁔ ₠ ₡ ₢ ₣ ₤ ₥ ₦ ₧ ₨ ₩ ₪ ₫ € ₭ ₮ ₯ ₰ ₱ ₲ ₳ ₴ ₵ ₶ ₷ ₸ ₹ ꠸ ﷼ ︳ ︴ ﹍ ﹎ ﹏ ﹩ ＄ ＿ ￠ ￡ ￥ ￦

Solution 2

Iterate through all 65k char values and ask Character.isJavaIdentifierStart(c) for each one. One answer is the "undertie", decimal 8255 (U+203F).

Solution 3

The definitive specification of a legal Java identifier can be found in the Java Language Specification.
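As a rough sketch of what that part of the specification describes: an identifier is a sequence of "Java letters" and "Java letter-or-digits", the first of which must be a Java letter, and those two classes are defined through Character.isJavaIdentifierStart and Character.isJavaIdentifierPart. The class and method names below are made up for illustration, and the check deliberately ignores the rule that keywords and the literals true, false and null are not identifiers.

```java
// Illustrative sketch only: IdentifierShape and looksLikeIdentifier are
// hypothetical names, not part of the JDK.
class IdentifierShape {

    // True if s is non-empty, its first code point is a "Java letter", and
    // every following code point is a "Java letter-or-digit".
    // Keywords and the literals true/false/null are NOT rejected here.
    static boolean looksLikeIdentifier(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Step by code point so characters above U+FFFF are handled correctly.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // \u203F is the undertie, \u00A3 is the pound sign.
        String[] samples = { "_tmp", "\u203Fx", "$cash", "\u00A3", "9lives", "a-b" };
        for (String s : samples) {
            System.out.println(s + " -> " + looksLikeIdentifier(s));
        }
    }
}
```

For real tooling, javax.lang.model.SourceVersion.isName is probably a better fit, since it also accounts for keywords and literals.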

Solution 4

Here is a list of connector characters in Unicode. You will not find them on your keyboard.

U+005F LOW LINE _
U+203F UNDERTIE ‿
U+2040 CHARACTER TIE ⁀
U+2054 INVERTED UNDERTIE ⁔
U+FE33 PRESENTATION FORM FOR VERTICAL LOW LINE ︳
U+FE34 PRESENTATION FORM FOR VERTICAL WAVY LOW LINE ︴
U+FE4D DASHED LOW LINE ﹍
U+FE4E CENTRELINE LOW LINE ﹎
U+FE4F WAVY LOW LINE ﹏
U+FF3F FULLWIDTH LOW LINE ＿

Solution 5

A connecting character is used to connect two characters.

In Java, a connecting character is one for which Character.getType(int codePoint) or Character.getType(char ch) returns a value equal to Character.CONNECTOR_PUNCTUATION.

Note that in Java, character information is based on the Unicode standard, which identifies connecting characters by assigning them the general category Pc, an alias for Connector_Punctuation.

The following code snippet,

for (int i = Character.MIN_CODE_POINT; i <= Character.MAX_CODE_POINT; i++) {
    if (Character.getType(i) == Character.CONNECTOR_PUNCTUATION
            && Character.isJavaIdentifierStart(i)) {
        System.out.println("character: " + String.valueOf(Character.toChars(i))
                + ", codepoint: " + i + ", hexcode: " + Integer.toHexString(i));
    }
}

prints the connecting characters that can be used to start an identifier on jdk1.6.0_45:

character: _, codepoint: 95, hexcode: 5f
character: ‿, codepoint: 8255, hexcode: 203f
character: ⁀, codepoint: 8256, hexcode: 2040
character: ⁔, codepoint: 8276, hexcode: 2054
character: ・, codepoint: 12539, hexcode: 30fb
character: ︳, codepoint: 65075, hexcode: fe33
character: ︴, codepoint: 65076, hexcode: fe34
character: ﹍, codepoint: 65101, hexcode: fe4d
character: ﹎, codepoint: 65102, hexcode: fe4e
character: ﹏, codepoint: 65103, hexcode: fe4f
character: ＿, codepoint: 65343, hexcode: ff3f
character: ･, codepoint: 65381, hexcode: ff65

The following compiles on jdk1.6.0_45,

int _, ‿, ⁀, ⁔, ・, ︳, ︴, ﹍, ﹎, ﹏, ＿, ･ = 0;

Apparently, the above declaration fails to compile on jdk1.7.0_80 & jdk1.8.0_51 for the following two connecting characters (backward compatibility...oops!!!),

character: ・, codepoint: 12539, hexcode: 30fb
character: ･, codepoint: 65381, hexcode: ff65
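To see how a particular JDK classifies these two characters, a small probe like the one below can be run on each version. A plausible explanation for the difference, not confirmed by anything in this thread, is that the Unicode data behind the Character class was updated between those releases and the two characters no longer carry the connector category. CategoryProbe and describe are hypothetical names, and the output for U+30FB and U+FF65 depends on the JDK, so none is shown.

```java
// Illustrative sketch: CategoryProbe and describe are hypothetical names.
class CategoryProbe {

    // Describes how the running JDK classifies one code point.
    static String describe(int cp) {
        boolean connector = Character.getType(cp) == Character.CONNECTOR_PUNCTUATION;
        return String.format("U+%04X connector=%b start=%b part=%b",
                cp, connector,
                Character.isJavaIdentifierStart(cp),
                Character.isJavaIdentifierPart(cp));
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        // U+005F as a baseline, then the two characters discussed above.
        int[] codePoints = { 0x005F, 0x30FB, 0xFF65 };
        for (int cp : codePoints) {
            System.out.println(describe(cp));
        }
    }
}
```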

Anyway, details aside, the exam focuses only on the Basic Latin character set.

Also, for legal identifiers in Java, the spec is provided here. Use the Character class APIs to get more details.
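As one way of using those Character APIs, the sketch below counts the identifier-start code points per general category, as reported by Character.getType. The Character documentation describes identifier starts as letters, letter numbers, currency symbols and connecting punctuation, so those are the categories one would expect to see. StartCategories and countByType are hypothetical names, the code assumes Java 8 or later (it uses Map.merge), and the counts vary with the Unicode version of the JDK.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: StartCategories and countByType are hypothetical names.
class StartCategories {

    // Maps each Character.getType value to the number of code points of that
    // type that may start a Java identifier.
    static Map<Integer, Integer> countByType() {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.isJavaIdentifierStart(cp)) {
                counts.merge(Character.getType(cp), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Keys are the int constants from Character, e.g. CONNECTOR_PUNCTUATION.
        for (Map.Entry<Integer, Integer> e : countByType().entrySet()) {
            System.out.println("type " + e.getKey() + ": " + e.getValue() + " code points");
        }
    }
}
```

The printed keys can be compared against constants such as Character.CURRENCY_SYMBOL and Character.CONNECTOR_PUNCTUATION.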

Author: LuckyLuke

Updated on April 30, 2022

Comments

  • LuckyLuke
    LuckyLuke about 2 years

    I am reading for SCJP and I have a question regarding this line:

    Identifiers must start with a letter, a currency character ($), or a connecting character such as the underscore ( _ ). Identifiers cannot start with a number!

    It states that a valid identifier name can start with a connecting character such as underscore. I thought underscores were the only valid option? What other connecting characters are there?

    • 8bitjunkie
      8bitjunkie over 10 years
      Regarding "a currency character": UK visitors to this question may be surprised and interested to know that, consistent with being able to start with "a" currency character, Java identifiers can, legally, begin with the pound symbol (£).
    • aioobe
      aioobe over 10 years
      Note that since Java 8, _ is a "deprecated" identifier. Specifically, the compiler emits the following warning: (use of '_' as an identifier might not be supported in releases after Java SE 8).
    • Boann
      Boann about 10 years
      @aioobe Yup. Brian Goetz says they are "reclaiming" _ for use in future language features. Identifiers that start with an underscore are still okay, but a single underscore is an error if used as a lambda parameter name, and a warning everywhere else.
    • Ciro Santilli OurBigBook.com
      Ciro Santilli OurBigBook.com about 9 years
      For the bytecode, anything by sequence that does not contain . ; [ / < > : goes: stackoverflow.com/questions/26791204/… docs.oracle.com/javase/specs/jvms/se7/html/… Everything else is a Java-only restriction.
    • user31389
      user31389 over 8 years
      @Boann The funny thing is they are disallowing its use in lambdas, but it will probably come back as an "ignore this argument" identifier, which will be used e.g. in lambdas. I just tried to use it like this: _, _ -> doSomething();.
  • Tomasz Nurkiewicz
    Tomasz Nurkiewicz almost 12 years
    I couldn't resist (in Scala): (1 to 65535).map(_.toChar).filter(Character.isJavaIdentifierStart).size - yields 48529 characters...
  • Markus Mikkolainen
    Markus Mikkolainen almost 12 years
    there seems to be a few characters near 65k and 12k and 8.5k etc.
  • Markus Mikkolainen
    Markus Mikkolainen almost 12 years
    doesnt yield if you say "!isLetter" and "!isDigit"
  • Markus Mikkolainen
    Markus Mikkolainen almost 12 years
    2546+2547 atleast "box drawing..."
  • Martijn Courteaux
    Martijn Courteaux almost 12 years
    Total count = 90648, but I'm going to Character.MAX_CODE_POINT, which is probably more than 2<<16.
  • Marko Topolnik
    Marko Topolnik almost 12 years
    I am looking forward to the day when I inherit some code that uses these identifiers!
  • Vishy
    Vishy almost 12 years
    BTW You can use any of the currency symbols as well. int ৲, ¤, ₪₪₪₪; :D
  • user
    user almost 12 years
    I'm not sure that actually fully answers the (implied) question of which characters may start a Java identifier. Following links we end up at Character.isJavaIdentifierStart() which states A character may start a Java identifier if and only if one of the following conditions is true: ... ch is a currency symbol (such as "$"); ch is a connecting punctuation character (such as "_").
  • Greg Hewgill
    Greg Hewgill almost 12 years
    It seems that the specification leaves the final list of acceptable characters up to the implementation, so it could potentially be different for everybody.
  • Vishy
    Vishy almost 12 years
    @GrahamBorland How about if( ⁀ ‿ ⁀ == ⁀ ⁔ ⁀) or if ($ == $) or if (¢ + ¢== ₡) or if (B + ︳!= ฿)
  • Random832
    Random832 almost 12 years
    @GregHewgill That'd be foolish, considering how tightly specified everything else is. I think that these are actual Unicode character classes, which are defined (where else?) in the Unicode standard. isJavaIdentifierStart() mentions getType(), and currency symbol and connector punctuation are both also types that can be returned by that function, so the lists might be given there. "General category" is in fact a specific term in the Unicode standard. So the valid values would be L [all], Nl, Sc, Pc.
  • Vishy
    Vishy almost 12 years
    @FredOverflow It is the Drachma currency sign. No country uses it, but if the worst happen in Europe it may come back. en.wikipedia.org/wiki/Greek_drachma
  • James Moore
    James Moore over 11 years
    @GregHewgill is correct. The specification is short and clear, and it's defined by Character.isJavaIdentifierStart() and Character.isJavaIdentifierPart(). The End. The key thing to remember is that Unicode is evolving; don't fall into the trap of thinking of character sets as finished (Latin is a terrible example; ignore it). Characters are created all the time. Ask your Japanese friends. Expect legal java identifiers to change over time - and that's intentional. The point is to let people write code in human languages. That leads to a hard requirement for allowing change.
  • James Moore
    James Moore over 11 years
    Scalaz uses stuff like KleisliArrow[M[]: Monad]: Arrow[({type λ[α, β]=Kleisli[M, α, β]})#λ] = new Arrow[({type λ[α, β]=Kleisli[M, α, β]})#λ] and ☆(f() η) all the time.
  • bdonlan
    bdonlan almost 11 years
    I don't know what keyboard layout you're using, but I can certainly type _ (U+005F) easily enough :)
  • Aleksandr Dubinsky
    Aleksandr Dubinsky almost 8 years
    Try checking isJavaIdentifierPart instead of isJavaIdentifierStart. It's much more fun!
  • Todd O'Bryan
    Todd O'Bryan about 6 years
    It includes \u007f, the DEL character. :-(
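The last two comments can be checked with a short probe. According to the Character documentation, isJavaIdentifierPart also accepts characters for which isIdentifierIgnorable returns true, and that is how control characters such as DEL (U+007F) end up allowed inside an identifier. IdentifierPartProbe and isIgnorablePart are hypothetical names, and the scan is limited to the first 256 code points only to keep the output short.

```java
// Illustrative sketch: IdentifierPartProbe and isIgnorablePart are hypothetical names.
class IdentifierPartProbe {

    // True for code points that are allowed inside an identifier, not allowed
    // at its start, and flagged as ignorable by the Character class.
    static boolean isIgnorablePart(int cp) {
        return Character.isJavaIdentifierPart(cp)
                && !Character.isJavaIdentifierStart(cp)
                && Character.isIdentifierIgnorable(cp);
    }

    public static void main(String[] args) {
        // Print the matching code points below U+0100 in U+XXXX form.
        for (int cp = 0; cp < 256; cp++) {
            if (isIgnorablePart(cp)) {
                System.out.printf("U+%04X%n", cp);
            }
        }
    }
}
```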