An identifier is an arbitrarily long sequence of digits, underscores, lowercase and uppercase Latin letters, and most Unicode characters (see below for details). A valid identifier must begin with a non-digit character (Latin letter, underscore, or Unicode non-digit character). Identifiers are case-sensitive (lowercase and uppercase letters are distinct), and every character is significant.
Note: C++ grammar formally requires Unicode characters to be escaped with \u or \U, but due to translation phase 1, that is exactly how raw Unicode characters from the source code are presented to the compiler. Also note that support for this feature may be limited in some compilers, e.g. GCC.
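The basic spelling rules can be illustrated with a minimal sketch. All names below (the namespace demo and its variables) are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

namespace demo {  // hypothetical namespace, used only for this sketch

// Identifiers are case-sensitive: these are three distinct variables.
int value = 1;
int Value = 2;
int VALUE = 3;

// Digits and underscores may appear after the first character.
int item_2 = 4;
// int 2nd_item = 5;  // ill-formed: an identifier cannot begin with a digit

// Every character is significant, so these long names differ
// even though only the last character is different.
int long_identifier_with_suffix_a = 10;
int long_identifier_with_suffix_b = 20;

int sum_all() {
    assert(&value != &Value);  // distinct objects, not aliases
    return value + Value + VALUE + item_2 +
           long_identifier_with_suffix_a + long_identifier_with_suffix_b;
}

}  // namespace demo
```

Calling demo::sum_all() adds the six distinct variables, which shows that no two of these spellings name the same object.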
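An identifier can be used to name objects, references, functions, enumerators, types, class members, namespaces, templates, template specializations, parameter packs, goto labels, and other entities, with the following exceptions: identifiers that are keywords cannot be used for other purposes; identifiers that contain a double underscore anywhere are reserved; identifiers that begin with an underscore followed by an uppercase letter are reserved; identifiers that begin with an underscore are reserved in the global namespace.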
"Reserved" here means that the standard library headers #define or declare such identifiers for their internal needs, that the compiler may predefine non-standard identifiers of that kind, and that the name mangling algorithm may assume that some of these identifiers are not in use. If the programmer uses such identifiers, the behavior is undefined.
In addition, it's undefined behavior to #define or #undef names identical to keywords. If at least one standard library header is included, it's undefined behavior to #define or #undef identifiers identical to names declared in any standard library header.
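As a rough sketch of what to avoid, the comments below show spellings that fall under the reserved rules of standard C++ (a double underscore anywhere, a leading underscore followed by an uppercase letter, a leading underscore at global scope, and macro names identical to keywords). The namespace app and its members are hypothetical; a trailing underscore is shown as one common non-reserved alternative.

```cpp
#include <cassert>

// Reserved spellings, shown as comments only because using them is undefined behavior:
//   int __counter;     // contains a double underscore
//   int my__counter;   // a double underscore anywhere is reserved
//   int _Counter;      // underscore followed by an uppercase letter
//   int _counter;      // leading underscore in the global namespace
//   #define new 0      // macro name identical to a keyword

namespace app {  // hypothetical namespace

int counter_ = 0;  // a trailing underscore is not reserved

int next_id() {
    int previous = counter_;
    ++counter_;
    assert(counter_ == previous + 1);
    return counter_;
}

}  // namespace app
```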
An identifier that names a variable, a function, or an enumerator can be used as an expression. The expression consisting of just the identifier returns the entity named by the identifier. The value category of the expression is lvalue if the identifier names a function, a variable, or a data member, and prvalue otherwise (e.g. an enumerator is a prvalue expression).
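The value categories described above can be checked at compile time. This sketch relies on the fact that decltype((e)) yields a reference type when e is an lvalue and a non-reference type when e is a prvalue; the names Color, counter, and twice are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

enum Color { red, green };            // hypothetical enumeration
int counter = 0;                      // hypothetical variable
int twice(int x) { return 2 * x; }    // hypothetical function

// An identifier naming a variable is an lvalue expression.
static_assert(std::is_lvalue_reference<decltype((counter))>::value,
              "a variable name is an lvalue");
// An identifier naming a function is an lvalue expression.
static_assert(std::is_lvalue_reference<decltype((twice))>::value,
              "a function name is an lvalue");
// An identifier naming an enumerator is a prvalue expression.
static_assert(!std::is_reference<decltype((red))>::value,
              "an enumerator is a prvalue");

int write_through_address() {
    int* p = &counter;   // allowed: counter is an lvalue
    *p = 5;
    assert(counter == 5);
    // const Color* q = &red;  // ill-formed: cannot take the address of a prvalue
    return counter;
}
```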
Within the body of a non-static member function, each identifier that names a non-static member is implicitly transformed to a class member access expression this->member.
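A small sketch of this implicit transformation follows. The class Counter and its members are hypothetical; the third member function shows a case where writing this-> explicitly is needed because a parameter hides the member name.

```cpp
#include <cassert>

struct Counter {  // hypothetical class
    int count = 0;

    // The bare identifier count is treated as this->count.
    void increment() { count += 1; }

    // Equivalent spelling with the member access written out.
    void increment_explicit() { this->count += 1; }

    // The parameter hides the member, so this-> is required to reach the member.
    void set(int count) { this->count = count; }
};

int run_increments() {
    Counter c;
    c.increment();
    c.increment_explicit();
    assert(c.count == 2);
    return c.count;
}

int run_set() {
    Counter c;
    c.set(7);
    return c.count;
}
```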
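Besides suitably declared identifiers, the following can be used in expressions in the same role: an overloaded operator name in function notation, such as operator+ or operator new; a user-defined conversion function name, such as operator bool; a user-defined literal operator name, such as operator "" _km; a template name followed by its argument list, such as MyTemplate<int>; the character ~ followed by a class name, such as ~MyClass; the character ~ followed by a decltype specifier, such as ~decltype(str). Together with identifiers they are known as unqualified id-expressions.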
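The following sketch uses several of these forms directly in expressions. The types Meters and Tracker and the function template twice are hypothetical. The literal operator is spelled operator""_km (without the space) because newer compilers may warn about the spaced form.

```cpp
#include <cassert>
#include <new>

struct Meters {  // hypothetical type
    double value;
    explicit operator bool() const { return value != 0.0; }
};

Meters operator+(Meters a, Meters b) { return Meters{a.value + b.value}; }

Meters operator""_km(long double km) {
    return Meters{static_cast<double>(km) * 1000.0};
}

template <typename T>
T twice(T x) { return x + x; }

struct Tracker {  // hypothetical type that records its own destruction
    int* flag;
    ~Tracker() { *flag = 1; }
};

double total_meters() {
    Meters a = 1.5_km;
    // Operator name and literal operator name in function notation.
    Meters b = operator+(a, operator""_km(2.0L));
    return b.value;
}

bool is_nonzero() {
    Meters m{3.0};
    return m.operator bool();  // conversion function name
}

int template_id() { return twice<int>(21); }  // template name with argument list

int destroy_in_place() {
    int destroyed = 0;
    alignas(Tracker) unsigned char storage[sizeof(Tracker)];
    Tracker* t = new (storage) Tracker{&destroyed};
    t->~Tracker();  // ~ followed by a class name
    assert(destroyed == 1);
    return destroyed;
}
```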
A qualified id-expression is an unqualified id-expression prepended by a scope resolution operator ::, and optionally, a sequence of enumeration (since C++11), class, or namespace names or decltype expressions (since C++11) separated by scope resolution operators. For example, the expression std::string::npos is an expression that names the static member npos in the class string in namespace std. The expression ::tolower names the function tolower in the global namespace. The expression ::std::cout names the global variable cout in namespace std, which is a top-level namespace. The expression boost::signals2::connection names the type connection declared in namespace signals2, which is declared in namespace boost.
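The qualified forms above can be sketched as follows. Since Boost may not be available, the nested namespaces outer::inner stand in for boost::signals2; they, the variable limit, and the helper functions are hypothetical. The header <ctype.h> is used so that tolower is declared in the global namespace.

```cpp
#include <cassert>
#include <ctype.h>
#include <iostream>
#include <string>

int limit = 3;  // hypothetical variable in the global namespace

namespace outer {
namespace inner {
struct connection { int id; };  // stand-in for a type in a nested namespace
int limit = 8;
}  // namespace inner

int which() {
    int limit = 1;  // local variable hides both of the others
    // ::limit names the global, inner::limit the nested one, limit the local.
    return ::limit * 100 + inner::limit * 10 + limit;
}
}  // namespace outer

bool not_found() {
    std::string s = "abc";
    return s.find('z') == std::string::npos;  // static member of std::string
}

int lower_a() { return ::tolower('A'); }  // function in the global namespace

int connection_id() {
    outer::inner::connection c{5};  // type named through two namespaces
    assert(c.id == 5);
    return c.id;
}

void greet() { ::std::cout << "hello\n"; }  // std looked up from the global namespace
```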
The keyword template may appear in qualified identifiers as necessary to disambiguate dependent template names.
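A minimal sketch of this disambiguation, assuming a hypothetical class Policy with a static member function template times: inside use_policy, the name T::times depends on the template parameter, so the keyword template tells the compiler that the following < begins a template argument list.

```cpp
#include <cassert>

struct Policy {  // hypothetical class
    template <int N>
    static int times(int x) { return x * N; }
};

template <typename T>
int use_policy(int x) {
    // Without the keyword, T::times<3>(x) would be parsed as a comparison
    // of T::times with 3, because T::times is a dependent name.
    return T::template times<3>(x);
}

int run_policy() {
    int result = use_policy<Policy>(4);
    assert(result == 12);
    return result;
}
```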
See qualified lookup for the details of the name lookup for qualified identifiers.
A name is the use of one of the following to refer to an entity or to a label: an identifier; an overloaded operator name in function notation (operator+, operator new); a user-defined conversion function name (operator bool); a user-defined literal operator name (operator "" _km); a template name followed by its argument list (MyTemplate<int>). Every name that denotes an entity is introduced into the program by a declaration. Every name that denotes a label is introduced into the program either by a goto statement or by a labeled statement. A name used in more than one translation unit may refer to the same or different entities, depending on linkage.
When the compiler encounters an unknown name in a program, it associates it with the declaration that introduced the name by means of name lookup, except for the dependent names in template declarations and definitions (for those names, the compiler determines whether they name a type, a template, or some other entity, which may require explicit disambiguation).
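For dependent names that denote types, the explicit disambiguation mentioned above is the keyword typename. The sketch below uses a hypothetical function template first_or_default; Container::value_type and Container::const_iterator depend on the template parameter, so the compiler cannot know they are types until instantiation.

```cpp
#include <cassert>
#include <vector>

template <typename Container>
typename Container::value_type first_or_default(const Container& c) {
    // typename states that the dependent name denotes a type.
    typename Container::const_iterator it = c.begin();
    if (it == c.end()) {
        return typename Container::value_type();  // value-initialized default
    }
    return *it;
}

int first_of_sample() {
    std::vector<int> v{7, 8, 9};
    int first = first_or_default(v);
    assert(first == 7);
    return first;
}

int first_of_empty() {
    std::vector<int> v;
    return first_or_default(v);
}
```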
The following Unicode character ranges are allowed in identifiers:
Code points | Description | Characters |
---|---|---|
U+00A8 | DIAERESIS | ¨ |
U+00AA | FEMININE ORDINAL INDICATOR | ª |
U+00AD | SOFT HYPHEN | |
U+00AF | MACRON | ¯ |
U+00B2 - U+00B5 | SUPERSCRIPT TWO - MICRO SIGN | ²³´µ |
U+00B7 - U+00BA | MIDDLE DOT - MASCULINE ORDINAL INDICATOR | ·¸¹º |
U+00BC - U+00BE | VULGAR FRACTION ONE QUARTER - VULGAR FRACTION THREE QUARTERS | ¼½¾ |
U+00C0 - U+00D6 | LATIN CAPITAL LETTER A WITH GRAVE - LATIN CAPITAL LETTER O WITH DIAERESIS | ÀÁÂ...ÔÕÖ |
U+00D8 - U+00F6 | LATIN CAPITAL LETTER O WITH STROKE - LATIN SMALL LETTER O WITH DIAERESIS | ØÙÚ...ôõö |
U+00F8 - U+167F | LATIN SMALL LETTER O WITH STROKE - CANADIAN SYLLABICS BLACKFOOT W | øùú...ᙽᙾᙿ |
U+1681 - U+180D | OGHAM LETTER BEITH - MONGOLIAN FREE VARIATION SELECTOR THREE | ᚁᚂᚃ...᠋᠌᠍ |
U+180F - U+1FFF | SYRIAC LETTER BETH - GREEK DASIA | ᠏ܒܓ...´῾ |
U+200B - U+200D | ZERO WIDTH SPACE - ZERO WIDTH JOINER | |
U+202A - U+202E | LEFT-TO-RIGHT EMBEDDING - RIGHT-TO-LEFT OVERRIDE | |
U+203F - U+2040 | UNDERTIE - CHARACTER TIE | ‿⁀ |
U+2054 | INVERTED UNDERTIE | ⁔ |
U+2060 - U+218F | WORD JOINER - TURNED DIGIT THREE | ...↉↊↋ |
U+2460 - U+24FF | CIRCLED DIGIT ONE - NEGATIVE CIRCLED DIGIT ZERO | ①②③...⓽⓾⓿ |
U+2776 - U+2793 | DINGBAT NEGATIVE CIRCLED DIGIT ONE - DINGBAT NEGATIVE CIRCLED SANS-SERIF NUMBER TEN | ❶❷❸...➑➒➓ |
U+2C00 - U+2DFF | GLAGOLITIC CAPITAL LETTER AZU - COMBINING CYRILLIC LETTER IOTIFIED BIG YUS | ⰀⰁⰂ... |
U+2E80 - U+2FFF | CJK RADICAL REPEAT - IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID | ⺀⺁⺂...⿹⿺⿻ |
U+3004 - U+3007 | JAPANESE INDUSTRIAL STANDARD SYMBOL - IDEOGRAPHIC NUMBER ZERO | 〄々〆〇 |
U+3021 - U+302F | HANGZHOU NUMERAL ONE - HANGUL DOUBLE DOT TONE MARK | 〡〢〣... |
U+3031 - U+D7FF | VERTICAL KANA REPEAT MARK - HANGUL JONGSEONG PHIEUPH-THIEUTH | ... |
U+F900 - U+FD3D | CJK COMPATIBILITY IDEOGRAPH-F900 - ARABIC LIGATURE ALEF WITH FATHATAN ISOLATED FORM | 豈更車...ﴻﴼﴽ |
U+FD40 - U+FDCF | ARABIC LIGATURE TEH WITH JEEM WITH MEEM INITIAL FORM - ARABIC LIGATURE NOON WITH JEEM WITH YEH FINAL FORM | |
U+FDF0 - U+FE44 | ARABIC LIGATURE SALLA USED AS KORANIC STOP SIGN ISOLATED FORM - PRESENTATION FORM FOR VERTICAL RIGHT WHITE CORNER BRACKET | ...﹂﹃﹄ |
U+FE47 - U+FFFD | PRESENTATION FORM FOR VERTICAL LEFT SQUARE BRACKET - REPLACEMENT CHARACTER | ﹇﹈﹉...� |
U+10000 - U+1FFFD | LINEAR B SYLLABLE B008 A - CHEESE WEDGE (U+1F9C0) | |
U+20000 - U+2FFFD | <CJK Ideograph Extension B, First> - CJK COMPATIBILITY IDEOGRAPH-2FA1D (U+2FA1D) | |
U+30000 - U+3FFFD | | |
U+40000 - U+4FFFD | | |
U+50000 - U+5FFFD | | |
U+60000 - U+6FFFD | | |
U+70000 - U+7FFFD | | |
U+80000 - U+8FFFD | | |
U+90000 - U+9FFFD | | |
U+A0000 - U+AFFFD | | |
U+B0000 - U+BFFFD | | |
U+C0000 - U+CFFFD | | |
U+D0000 - U+DFFFD | | |
U+E0000 - U+EFFFD | LANGUAGE TAG (U+E0001) - VARIATION SELECTOR-256 (U+E01EF) | |
The following Unicode character ranges are not allowed to begin an identifier:
Code points | Description | Characters |
---|---|---|
U+0300 - U+036F | COMBINING GRAVE ACCENT - COMBINING LATIN SMALL LETTER X | |
U+1DC0 - U+1DFF | COMBINING DOTTED GRAVE ACCENT - COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW | |
U+20D0 - U+20FF | COMBINING LEFT HARPOON ABOVE - COMBINING ASTERISK ABOVE | |
U+FE20 - U+FE2F | COMBINING LIGATURE LEFT HALF - COMBINING CYRILLIC TITLO RIGHT HALF | |
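As a hedged illustration of both tables, the sketch below spells Unicode characters with \u escapes, which is the form the grammar formally requires. U+00E9 lies in the allowed range U+00D8 - U+00F6, and U+0301 lies in U+0300 - U+036F, so it may appear inside an identifier but not at its start. The namespace and variable names are hypothetical, and compiler support for such identifiers may vary.

```cpp
#include <cassert>

namespace unicode_demo {  // hypothetical namespace

// U+00E9 (LATIN SMALL LETTER E WITH ACUTE) is in an allowed range.
int caf\u00E9 = 30;

// U+0301 (COMBINING ACUTE ACCENT) is allowed after the first character.
int x\u0301 = 5;

// int \u0301x = 5;  // ill-formed: a combining character cannot begin an identifier

int total() {
    assert(&caf\u00E9 != &x\u0301);  // two distinct variables
    return caf\u00E9 + x\u0301;
}

}  // namespace unicode_demo
```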
See also: C documentation for identifier
© cppreference.com
Licensed under the Creative Commons Attribution-ShareAlike Unported License v3.0.
http://en.cppreference.com/w/cpp/language/name