What are "connecting characters" in Java identifiers?
Solution 1
Here is a list of connecting characters. These are characters used to connect words.
http://www.fileformat.info/info/unicode/category/Pc/list.htm
U+005F _ LOW LINE
U+203F ‿ UNDERTIE
U+2040 ⁀ CHARACTER TIE
U+2054 ⁔ INVERTED UNDERTIE
U+FE33 ︳ PRESENTATION FORM FOR VERTICAL LOW LINE
U+FE34 ︴ PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
U+FE4D ﹍ DASHED LOW LINE
U+FE4E ﹎ CENTRELINE LOW LINE
U+FE4F ﹏ WAVY LOW LINE
U+FF3F ＿ FULLWIDTH LOW LINE
This compiles on Java 7.
int _, ‿, ⁀, ⁔, ︳, ︴, ﹍, ﹎, ﹏, ＿;
An example: in this case, tp is both the name of a column and the value for a given row.
Column<Double> ︴tp︴ = table.getColumn("tp", double.class);
double tp = row.getDouble(︴tp︴);
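The following
for (int i = Character.MIN_CODE_POINT; i <= Character.MAX_CODE_POINT; i++) {
    // Keep code points that may start an identifier but are not alphabetic.
    if (Character.isJavaIdentifierStart(i) && !Character.isAlphabetic(i))
        // toChars also handles code points above U+FFFF, unlike a (char) cast.
        System.out.print(new String(Character.toChars(i)) + " ");
}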
prints
$ _ ¢ £ ¤ ¥ ؋ ৲ ৳ ৻ ૱ ௹ ฿ ៛ ‿ ⁀ ⁔ ₠ ₡ ₢ ₣ ₤ ₥ ₦ ₧ ₨ ₩ ₪ ₫ € ₭ ₮ ₯ ₰ ₱ ₲ ₳ ₴ ₵ ₶ ₷ ₸ ₹ ꠸ ﷼ ︳ ︴ ﹍ ﹎ ﹏ ﹩ ＄ ＿ ￠ ￡ ￥ ￦
Solution 2
Iterate through the whole 65k chars and ask Character.isJavaIdentifierStart(c).
The answer is: "undertie" (decimal 8255).
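The loop described above could be sketched as follows. This is only an illustration: the class name ConnectorScan and the method connectorStarts are made up for this example, and the extra filter on Character.CONNECTOR_PUNCTUATION is an assumption added to separate connectors from letters and currency symbols. The exact list printed depends on the Unicode version bundled with the JDK.

```java
import java.util.ArrayList;
import java.util.List;

public class ConnectorScan {
    // Hypothetical helper: collects every char in the 16-bit range that may
    // start a Java identifier and is classified as connector punctuation.
    static List<Integer> connectorStarts() {
        List<Integer> found = new ArrayList<>();
        // An int counter avoids overflow when the loop reaches Character.MAX_VALUE.
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            if (Character.isJavaIdentifierStart((char) c)
                    && Character.getType((char) c) == Character.CONNECTOR_PUNCTUATION) {
                found.add(c);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Print the decimal code and the Unicode name of each match.
        for (int c : connectorStarts()) {
            System.out.println(c + " " + Character.getName(c));
        }
    }
}
```

Running main lists the underscore (95) and the undertie (8255) among the results.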
Solution 3
The definitive specification of a legal Java identifier can be found in the Java Language Specification.
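As a rough illustration of the character rules, here is a minimal sketch that checks a string with Character.isJavaIdentifierStart and Character.isJavaIdentifierPart. The class IdentifierCheck and the method hasIdentifierShape are hypothetical names for this example, and the check is deliberately incomplete: it does not reject keywords or the literals true, false and null, which cannot be used as identifiers.

```java
public class IdentifierCheck {
    // Hypothetical helper: tests only the character rules, code point by code point.
    // Keywords such as "int" pass this check even though they are not identifiers.
    static boolean hasIdentifierShape(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Step by charCount so supplementary code points are read as one unit.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"_tmp", "$x", "\u203Ftie", "9lives", "a-b"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + hasIdentifierShape(sample));
        }
    }
}
```

For a check that also excludes keywords, the standard library offers javax.lang.model.SourceVersion.isName.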
Solution 4
Here is a list of connector characters in Unicode. You will not find most of them on your keyboard.
U+005F LOW LINE _
U+203F UNDERTIE ‿
U+2040 CHARACTER TIE ⁀
U+2054 INVERTED UNDERTIE ⁔
U+FE33 PRESENTATION FORM FOR VERTICAL LOW LINE ︳
U+FE34 PRESENTATION FORM FOR VERTICAL WAVY LOW LINE ︴
U+FE4D DASHED LOW LINE ﹍
U+FE4E CENTRELINE LOW LINE ﹎
U+FE4F WAVY LOW LINE ﹏
U+FF3F FULLWIDTH LOW LINE ＿
Solution 5
A connecting character is used to connect two characters.
In Java, a connecting character is the one for which Character.getType(int codePoint)/Character.getType(char ch) returns a value equal to Character.CONNECTOR_PUNCTUATION.
Note that in Java, the character information is based on the Unicode standard, which identifies connecting characters by assigning them the general category Pc, an alias for Connector_Punctuation.
The following code snippet,
for (int i = Character.MIN_CODE_POINT; i <= Character.MAX_CODE_POINT; i++) {
if (Character.getType(i) == Character.CONNECTOR_PUNCTUATION
&& Character.isJavaIdentifierStart(i)) {
System.out.println("character: " + String.valueOf(Character.toChars(i))
+ ", codepoint: " + i + ", hexcode: " + Integer.toHexString(i));
}
}
prints the connecting characters that can be used to start an identifier on jdk1.6.0_45:
character: _, codepoint: 95, hexcode: 5f
character: ‿, codepoint: 8255, hexcode: 203f
character: ⁀, codepoint: 8256, hexcode: 2040
character: ⁔, codepoint: 8276, hexcode: 2054
character: ・, codepoint: 12539, hexcode: 30fb
character: ︳, codepoint: 65075, hexcode: fe33
character: ︴, codepoint: 65076, hexcode: fe34
character: ﹍, codepoint: 65101, hexcode: fe4d
character: ﹎, codepoint: 65102, hexcode: fe4e
character: ﹏, codepoint: 65103, hexcode: fe4f
character: ＿, codepoint: 65343, hexcode: ff3f
character: ･, codepoint: 65381, hexcode: ff65
The following compiles on jdk1.6.0_45,
int _, ‿, ⁀, ⁔, ・, ︳, ︴, ﹍, ﹎, ﹏, ＿, ･ = 0;
Apparently, the above declaration fails to compile on jdk1.7.0_80 & jdk1.8.0_51 for the following two connecting characters (backward compatibility...oops!!!),
character: ・, codepoint: 12539, hexcode: 30fb
character: ･, codepoint: 65381, hexcode: ff65
Anyway, details aside, the exam focuses only on the Basic Latin character set.
Also, for legal identifiers in Java, the spec is provided here. Use the Character class APIs to get more details.
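As one way of using the Character class APIs for this, the sketch below maps a code point to a short general-category label with Character.getType. The class CategoryDemo, the method category and the label strings are invented for this example; only the Character methods and constants come from the JDK.

```java
public class CategoryDemo {
    // Hypothetical helper: returns a short label for the categories that matter
    // when deciding whether a code point may start an identifier.
    static String category(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.CONNECTOR_PUNCTUATION:
                return "Pc";
            case Character.CURRENCY_SYMBOL:
                return "Sc";
            case Character.LETTER_NUMBER:
                return "Nl";
            default:
                // isLetter covers the letter categories (Lu, Ll, Lt, Lm, Lo).
                return Character.isLetter(codePoint) ? "L" : "other";
        }
    }

    public static void main(String[] args) {
        int[] samples = {'_', 0x203F, '$', 'a', 0x2160, '-'};
        for (int cp : samples) {
            System.out.println(Integer.toHexString(cp) + " -> " + category(cp));
        }
    }
}
```

Here the underscore and the undertie (U+203F) are both labeled Pc, while the hyphen falls into "other" and so cannot appear in an identifier.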
LuckyLuke
Updated on April 30, 2022
Comments
-
LuckyLuke about 2 years
I am reading for SCJP and I have a question regarding this line:
Identifiers must start with a letter, a currency character ($), or a connecting character such as the underscore ( _ ). Identifiers cannot start with a number!
It states that a valid identifier name can start with a connecting character such as underscore. I thought underscores were the only valid option? What other connecting characters are there?
-
8bitjunkie over 10 years
Regarding "a currency character": UK visitors to this question may be surprised and interested to know that, consistent with being able to start with "a" currency character, Java identifiers can, legally, begin with the pound symbol (£).
-
aioobe over 10 years
Note that since Java 8, _ is a "deprecated" identifier. Specifically, the compiler emits the following warning: (use of '_' as an identifier might not be supported in releases after Java SE 8).
-
Boann about 10 years
@aioobe Yup. Brian Goetz says they are "reclaiming" _ for use in future language features. Identifiers that start with an underscore are still okay, but a single underscore is an error if used as a lambda parameter name, and a warning everywhere else.
-
Ciro Santilli OurBigBook.com about 9 years
For the bytecode, any sequence that does not contain . ; [ / < > : goes: stackoverflow.com/questions/26791204/… docs.oracle.com/javase/specs/jvms/se7/html/… Everything else is a Java-only restriction.
-
user31389 over 8 years
@Boann The funny thing is they are disallowing its use in lambdas, but it will probably come back as an "ignore this argument" identifier, which will be used e.g. in lambdas. I just tried to use it like this: _, _ -> doSomething();
-
-
Tomasz Nurkiewicz almost 12 years
I couldn't resist (in Scala):
(1 to 65535).map(_.toChar).filter(Character.isJavaIdentifierStart).size
yields 48529 characters...
-
Markus Mikkolainen almost 12 years
There seem to be a few characters near 65k, 12k, 8.5k, etc.
-
Markus Mikkolainen almost 12 years
It doesn't yield if you say "!isLetter" and "!isDigit".
-
Markus Mikkolainen almost 12 years
2546+2547 at least, "box drawing..."
-
Martijn Courteaux almost 12 years
Total count = 90648, but I'm going to Character.MAX_CODE_POINT, which is probably more than 2<<16.
-
Marko Topolnik almost 12 years
I am looking forward to the day when I inherit some code that uses these identifiers!
-
Vishy almost 12 years
BTW You can use any of the currency symbols as well.
int ৲, ¤, ₪₪₪₪;
:D
-
user almost 12 years
I'm not sure that actually fully answers the (implied) question of which characters may start a Java identifier. Following links we end up at Character.isJavaIdentifierStart(), which states: A character may start a Java identifier if and only if one of the following conditions is true: ... ch is a currency symbol (such as "$"); ch is a connecting punctuation character (such as "_").
-
Greg Hewgill almost 12 years
It seems that the specification leaves the final list of acceptable characters up to the implementation, so it could potentially be different for everybody.
-
Vishy almost 12 years
@GrahamBorland How about
if( ⁀ ‿ ⁀ == ⁀ ⁔ ⁀)
or if ($ == $)
or if (¢ + ¢== ₡)
or if (B + ︳!= ฿)
-
Random832 almost 12 years
@GregHewgill That'd be foolish, considering how tightly specified everything else is. I think that these are actual Unicode character classes, which are defined (where else?) in the Unicode standard. isJavaIdentifierStart() mentions getType(), and currency symbol and connector punctuation are both also types that can be returned by that function, so the lists might be given there. "General category" is in fact a specific term in the Unicode standard. So the valid values would be L [all], Nl, Sc, Pc.
-
Vishy almost 12 years
@FredOverflow It is the Drachma currency sign. No country uses it, but if the worst happens in Europe it may come back. en.wikipedia.org/wiki/Greek_drachma
-
James Moore over 11 years
@GregHewgill is correct. The specification is short and clear, and it's defined by Character.isJavaIdentifierStart() and Character.isJavaIdentifierPart(). The End. The key thing to remember is that Unicode is evolving; don't fall into the trap of thinking of character sets as finished (Latin is a terrible example; ignore it). Characters are created all the time. Ask your Japanese friends. Expect legal Java identifiers to change over time, and that's intentional. The point is to let people write code in human languages. That leads to a hard requirement for allowing change.
-
James Moore over 11 years
Scalaz uses stuff like KleisliArrow[M[]: Monad]: Arrow[({type λ[α, β]=Kleisli[M, α, β]})#λ] = new Arrow[({type λ[α, β]=Kleisli[M, α, β]})#λ] and ☆(f() η) all the time.
-
bdonlan almost 11 years
I don't know what keyboard layout you're using, but I can certainly type _ (U+005F) easily enough :)
-
Aleksandr Dubinsky almost 8 years
Try checking isJavaIdentifierPart instead of isJavaIdentifierStart. It's much more fun!
-
Todd O'Bryan about 6 years
It includes \u007f, the DEL character. :-(