A character type.
The char type represents a single character. More specifically, since 'character' isn't a well-defined concept in Unicode, char is a 'Unicode scalar value', which is similar to, but not the same as, a 'Unicode code point'.
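The difference between a scalar value and a code point can be illustrated with a small check. This is a minimal sketch: the helper name `is_scalar_value` is made up for illustration, and it relies on `std::char::from_u32`, which returns `None` for surrogate code points (U+D800 to U+DFFF) and for values above U+10FFFF.

```rust
/// Hypothetical helper: true if `n` is a Unicode scalar value,
/// i.e. a value that can be stored in a `char`.
fn is_scalar_value(n: u32) -> bool {
    // from_u32 rejects surrogates and anything beyond U+10FFFF.
    std::char::from_u32(n).is_some()
}
```

For example, `is_scalar_value(0xD800)` is false even though U+D800 is a code point.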
This documentation describes a number of methods and trait implementations on the char type. For technical reasons, there is additional, separate documentation in the std::char module as well.
char is always four bytes in size. This is a different representation than a given character would have as part of a String. For example:
let v = vec!['h', 'e', 'l', 'l', 'o'];
// five elements times four bytes for each element
assert_eq!(20, v.len() * std::mem::size_of::<char>());
let s = String::from("hello");
// five elements times one byte per element
assert_eq!(5, s.len() * std::mem::size_of::<u8>());
As always, remember that a human intuition for 'character' may not map to Unicode's definitions. For example, emoji symbols such as '❤️' can be more than one Unicode code point; this ❤️ in particular is two:
let s = String::from("❤️");
// we get two chars out of a single ❤️
let mut iter = s.chars();
assert_eq!(Some('\u{2764}'), iter.next());
assert_eq!(Some('\u{fe0f}'), iter.next());
assert_eq!(None, iter.next());
This means it won't fit into a char. Trying to create a literal with let heart = '❤️'; gives an error:
error: character literal may only contain one codepoint: '❤
let heart = '❤️';
            ^~
Another implication of the 4-byte fixed size of a char is that per-char processing can end up using a lot more memory:
let s = String::from("love: ❤️");
let v: Vec<char> = s.chars().collect();
assert_eq!(12, s.len() * std::mem::size_of::<u8>());
assert_eq!(32, v.len() * std::mem::size_of::<char>());
impl char
fn is_digit(self, radix: u32) -> bool
Checks if a char is a digit in the given radix.
A 'radix' here is sometimes also called a 'base'. A radix of two indicates a binary number, a radix of ten, decimal, and a radix of sixteen, hexadecimal, to give some common values. Arbitrary radices are supported.
Compared to is_numeric(), this function only recognizes the characters 0-9, a-z and A-Z.
'Digit' is defined to be only the following characters:
0-9
a-z
A-Z
For a more comprehensive understanding of 'digit', see is_numeric().
Panics if given a radix larger than 36.
Basic usage:
assert!('1'.is_digit(10)); assert!('f'.is_digit(16)); assert!(!'f'.is_digit(10));
Passing a large radix, causing a panic:
use std::thread;
let result = thread::spawn(|| {
    // this panics
    '1'.is_digit(37);
}).join();
assert!(result.is_err());
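Since arbitrary radices up to 36 are accepted, is_digit can validate a whole string against a base. The following is a minimal sketch; `all_digits` is a hypothetical helper, not part of the standard library, and it treats the empty string as invalid by choice.

```rust
/// Hypothetical helper: true if `s` is non-empty and every character
/// is a digit in `radix`. Panics if `radix` is larger than 36,
/// because `is_digit` does.
fn all_digits(s: &str, radix: u32) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_digit(radix))
}
```

With radix 36 every ASCII letter and digit is accepted, so `all_digits("zZ9", 36)` holds, while `all_digits("102", 2)` does not.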
fn to_digit(self, radix: u32) -> Option<u32>
Converts a char to a digit in the given radix.
A 'radix' here is sometimes also called a 'base'. A radix of two indicates a binary number, a radix of ten, decimal, and a radix of sixteen, hexadecimal, to give some common values. Arbitrary radices are supported.
'Digit' is defined to be only the following characters:
0-9
a-z
A-Z
Returns None if the char does not refer to a digit in the given radix.
Panics if given a radix larger than 36.
Basic usage:
assert_eq!('1'.to_digit(10), Some(1)); assert_eq!('f'.to_digit(16), Some(15));
Passing a non-digit results in failure:
assert_eq!('f'.to_digit(10), None); assert_eq!('z'.to_digit(16), None);
Passing a large radix, causing a panic:
use std::thread; let result = thread::spawn(|| { '1'.to_digit(37); }).join(); assert!(result.is_err());
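As an illustration of how to_digit composes, the sketch below accumulates digits into a number. `parse_radix` is a hypothetical name; the overflow handling with `checked_mul` and `checked_add` is a design choice of this example, not something to_digit does itself.

```rust
/// Hypothetical helper: parses `s` as an unsigned number in `radix`.
/// Returns None for an empty string, a non-digit, or overflow of u32.
/// Panics if `radix` is larger than 36, because `to_digit` does.
fn parse_radix(s: &str, radix: u32) -> Option<u32> {
    if s.is_empty() {
        return None;
    }
    let mut value: u32 = 0;
    for c in s.chars() {
        // `?` stops at the first character that is not a digit.
        let digit = c.to_digit(radix)?;
        value = value.checked_mul(radix)?.checked_add(digit)?;
    }
    Some(value)
}
```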
fn escape_unicode(self) -> EscapeUnicode
Returns an iterator that yields the hexadecimal Unicode escape of a character as chars.
This will escape characters with the Rust syntax of the form \u{NNNNNN} where NNNNNN is a hexadecimal representation.
As an iterator:
for c in '❤'.escape_unicode() {
    print!("{}", c);
}
println!();
Using println! directly:
println!("{}", '❤'.escape_unicode());
Both are equivalent to:
println!("\\u{{2764}}");
Using to_string:
assert_eq!('❤'.escape_unicode().to_string(), "\\u{2764}");
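One possible use of escape_unicode is producing ASCII-only output. This is a sketch under the assumption that `char::is_ascii` is available as an inherent method (true on recent Rust); `escape_non_ascii` is a made-up helper name.

```rust
/// Hypothetical helper: leaves ASCII characters untouched and replaces
/// every other character with its `\u{...}` escape.
fn escape_non_ascii(s: &str) -> String {
    let mut out = String::new();
    for c in s.chars() {
        if c.is_ascii() {
            out.push(c);
        } else {
            // EscapeUnicode is an iterator of chars, so it can extend a String.
            out.extend(c.escape_unicode());
        }
    }
    out
}
```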
fn escape_debug(self) -> EscapeDebug
Returns an iterator that yields the literal escape code of a character as chars.
This will escape the characters similar to the Debug implementations of str or char.
As an iterator:
for c in '\n'.escape_debug() {
    print!("{}", c);
}
println!();
Using println! directly:
println!("{}", '\n'.escape_debug());
Both are equivalent to:
println!("\\n");
Using to_string:
assert_eq!('\n'.escape_debug().to_string(), "\\n");
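To see how escape_debug relates to escape_default, the sketch below renders a character both ways. `both_escapes` is a hypothetical helper; the expectation that a printable non-ASCII letter such as 'é' is left as-is by escape_debug reflects the Debug output on current Rust and is an assumption here.

```rust
/// Hypothetical helper: returns the (escape_debug, escape_default)
/// renderings of `c` so they can be compared side by side.
fn both_escapes(c: char) -> (String, String) {
    (c.escape_debug().to_string(), c.escape_default().to_string())
}
```

Control characters such as '\n' come out the same in both forms, while 'é' differs: escape_default falls back to a hexadecimal Unicode escape.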
fn escape_default(self) -> EscapeDefault
Returns an iterator that yields the literal escape code of a character as chars.
The default is chosen with a bias toward producing literals that are legal in a variety of languages, including C++11 and similar C-family languages. The exact rules are:
Tab is escaped as \t.
Carriage return is escaped as \r.
Line feed is escaped as \n.
Single quote is escaped as \'.
Double quote is escaped as \".
Backslash is escaped as \\.
Any character in the 'printable ASCII' range 0x20 .. 0x7e inclusive is not escaped.
All other characters are given hexadecimal Unicode escapes; see escape_unicode.
As an iterator:
for c in '"'.escape_default() {
    print!("{}", c);
}
println!();
Using println! directly:
println!("{}", '"'.escape_default());
Both are equivalent to:
println!("\\\"");
Using to_string:
assert_eq!('"'.escape_default().to_string(), "\\\"");
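The escaping rules can be exercised on a whole string by applying escape_default to each character in turn. This is a minimal sketch; `escape_str` is a hypothetical helper name.

```rust
/// Hypothetical helper: escapes a string by applying
/// `char::escape_default` to every character and concatenating the results.
fn escape_str(s: &str) -> String {
    s.chars().flat_map(|c| c.escape_default()).collect()
}
```

Printable ASCII such as the space and '~' passes through unchanged, whereas U+007F lies just outside the 0x20 .. 0x7e range and is written as a Unicode escape.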
fn len_utf8(self) -> usize
Returns the number of bytes this char would need if encoded in UTF-8.
That number of bytes is always between 1 and 4, inclusive.
Basic usage:
let len = 'A'.len_utf8();
assert_eq!(len, 1);
let len = 'ß'.len_utf8();
assert_eq!(len, 2);
let len = 'ℝ'.len_utf8();
assert_eq!(len, 3);
let len = '💣'.len_utf8();
assert_eq!(len, 4);
The &str type guarantees that its contents are UTF-8, and so we can compare the length it would take if each code point was represented as a char vs in the &str itself:
// as chars
let eastern = '東';
let capital = '京';
// both can be represented as three bytes
assert_eq!(3, eastern.len_utf8());
assert_eq!(3, capital.len_utf8());
// as a &str, these two are encoded in UTF-8
let tokyo = "東京";
let len = eastern.len_utf8() + capital.len_utf8();
// we can see that they take six bytes total...
assert_eq!(6, tokyo.len());
// ... just like the &str
assert_eq!(len, tokyo.len());
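The 1-to-4 byte result follows directly from the code point ranges of UTF-8. As a sketch, the hypothetical function `utf8_width` below recomputes the width from the numeric value; it is meant to agree with len_utf8 and is not how the standard library is necessarily implemented.

```rust
/// Hypothetical helper: width in bytes of the UTF-8 encoding of `c`,
/// derived from the range its code point falls into.
fn utf8_width(c: char) -> usize {
    match c as u32 {
        0x0000..=0x007F => 1,
        0x0080..=0x07FF => 2,
        0x0800..=0xFFFF => 3,
        _ => 4, // U+10000 up to U+10FFFF
    }
}
```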
fn len_utf16(self) -> usize
Returns the number of 16-bit code units this char would need if encoded in UTF-16.
See the documentation for len_utf8() for more explanation of this concept. This function is a mirror, but for UTF-16 instead of UTF-8.
Basic usage:
let n = 'ß'.len_utf16();
assert_eq!(n, 1);
let len = '💣'.len_utf16();
assert_eq!(len, 2);
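Summing len_utf16 over a string gives the size of the buffer needed to transcode it. A minimal sketch, with `utf16_len` as a made-up helper name:

```rust
/// Hypothetical helper: number of 16-bit code units needed to
/// encode `s` as UTF-16, computed one char at a time.
fn utf16_len(s: &str) -> usize {
    s.chars().map(|c| c.len_utf16()).sum()
}
```

The result should match `s.encode_utf16().count()`, which performs the actual encoding.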
fn encode_utf8(self, dst: &mut [u8]) -> &mut str
Encodes this character as UTF-8 into the provided byte buffer, and then returns the subslice of the buffer that contains the encoded character.
Panics if the buffer is not large enough. A buffer of length four is large enough to encode any char.
In both of these examples, 'ß' takes two bytes to encode.
let mut b = [0; 2];
let result = 'ß'.encode_utf8(&mut b);
assert_eq!(result, "ß");
assert_eq!(result.len(), 2);
A buffer that's too small:
use std::thread;
let result = thread::spawn(|| {
    let mut b = [0; 1];
    // this panics
    'ß'.encode_utf8(&mut b);
}).join();
assert!(result.is_err());
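Because four bytes always suffice, a fixed-size stack buffer avoids the panic entirely. The sketch below uses a hypothetical helper `utf8_bytes` and copies the encoded bytes into a Vec for convenience.

```rust
/// Hypothetical helper: returns the UTF-8 bytes of `c`, encoding into a
/// four-byte stack buffer that is large enough for any char.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4];
    // The returned &mut str covers only the bytes actually written.
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}
```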
fn encode_utf16(self, dst: &mut [u16]) -> &mut [u16]
Encodes this character as UTF-16 into the provided u16 buffer, and then returns the subslice of the buffer that contains the encoded character.
Panics if the buffer is not large enough. A buffer of length 2 is large enough to encode any char.
In both of these examples, '𝕊' takes two u16s to encode.
let mut b = [0; 2];
let result = '𝕊'.encode_utf16(&mut b);
assert_eq!(result.len(), 2);
A buffer that's too small:
use std::thread;
let result = thread::spawn(|| {
    let mut b = [0; 1];
    // this panics
    '𝕊'.encode_utf16(&mut b);
}).join();
assert!(result.is_err());
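The two-unit case is a surrogate pair. As an illustration, the hypothetical function `utf16_units` below computes the code units by hand using the standard UTF-16 arithmetic; it is a sketch for comparison with encode_utf16, not a description of the library's internals.

```rust
/// Hypothetical helper: computes the UTF-16 code units of `c` by hand.
/// Characters above U+FFFF are split into a high and a low surrogate.
fn utf16_units(c: char) -> Vec<u16> {
    let code = c as u32;
    if code < 0x10000 {
        vec![code as u16]
    } else {
        let v = code - 0x10000; // 20 significant bits remain
        let high = 0xD800 + (v >> 10) as u16; // top 10 bits
        let low = 0xDC00 + (v & 0x3FF) as u16; // bottom 10 bits
        vec![high, low]
    }
}
```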
fn is_alphabetic(self) -> bool
Returns true if this char is an alphabetic code point, and false if not.
Basic usage:
assert!('a'.is_alphabetic());
assert!('京'.is_alphabetic());
let c = '💝';
// love is many things, but it is not alphabetic
assert!(!c.is_alphabetic());
fn is_xid_start(self) -> bool
Returns true if this char satisfies the 'XID_Start' Unicode property, and false otherwise.
'XID_Start' is a Unicode Derived Property specified in UAX #31, mostly similar to 'ID_Start' but modified for closure under NFKx.
fn is_xid_continue(self) -> bool
Returns true if this char satisfies the 'XID_Continue' Unicode property, and false otherwise.
'XID_Continue' is a Unicode Derived Property specified in UAX #31, mostly similar to 'ID_Continue' but modified for closure under NFKx.
fn is_lowercase(self) -> bool
Returns true if this char is lowercase, and false otherwise.
'Lowercase' is defined according to the terms of the Unicode Derived Core Property Lowercase.
Basic usage:
assert!('a'.is_lowercase());
assert!('δ'.is_lowercase());
assert!(!'A'.is_lowercase());
assert!(!'Δ'.is_lowercase());
// The various Chinese scripts do not have case, and so:
assert!(!'中'.is_lowercase());
fn is_uppercase(self) -> bool
Returns true if this char is uppercase, and false otherwise.
'Uppercase' is defined according to the terms of the Unicode Derived Core Property Uppercase.
Basic usage:
assert!(!'a'.is_uppercase());
assert!(!'δ'.is_uppercase());
assert!('A'.is_uppercase());
assert!('Δ'.is_uppercase());
// The various Chinese scripts do not have case, and so:
assert!(!'中'.is_uppercase());
fn is_whitespace(self) -> bool
Returns true if this char is whitespace, and false otherwise.
'Whitespace' is defined according to the terms of the Unicode Derived Core Property White_Space.
Basic usage:
assert!(' '.is_whitespace());
// a non-breaking space
assert!('\u{A0}'.is_whitespace());
assert!(!'越'.is_whitespace());
fn is_alphanumeric(self) -> bool
Returns true if this char is alphanumeric, and false otherwise.
'Alphanumeric'-ness is defined in terms of the Unicode General Categories 'Nd', 'Nl', 'No' and the Derived Core Property 'Alphabetic'.
Basic usage:
assert!('٣'.is_alphanumeric());
assert!('7'.is_alphanumeric());
assert!('৬'.is_alphanumeric());
// '¾' and '①' are in General Category 'No', so they count as alphanumeric
assert!('¾'.is_alphanumeric());
assert!('①'.is_alphanumeric());
assert!('K'.is_alphanumeric());
assert!('و'.is_alphanumeric());
assert!('藏'.is_alphanumeric());
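The definition above makes 'alphanumeric' the union of the 'Alphabetic' property and the three numeric categories. The sketch below, with a hypothetical `classify` function, shows which of the two parts a character falls under; the order of the checks is an arbitrary choice of this example.

```rust
/// Hypothetical helper: names the property that makes `c` alphanumeric,
/// or "other" if it is neither alphabetic nor numeric.
fn classify(c: char) -> &'static str {
    if c.is_alphabetic() {
        "alphabetic"
    } else if c.is_numeric() {
        "numeric"
    } else {
        "other"
    }
}
```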
fn is_control(self) -> bool
Returns true if this char is a control code point, and false otherwise.
'Control code point' is defined in terms of the Unicode General Category Cc.
Basic usage:
// U+009C, STRING TERMINATOR
assert!('\u{009C}'.is_control());
assert!(!'q'.is_control());
fn is_numeric(self) -> bool
Returns true if this char is numeric, and false otherwise.
'Numeric'-ness is defined in terms of the Unicode General Categories 'Nd', 'Nl', 'No'.
Basic usage:
assert!('٣'.is_numeric());
assert!('7'.is_numeric());
assert!('৬'.is_numeric());
// '¾' and '①' are in General Category 'No', so they count as numeric
assert!('¾'.is_numeric());
assert!('①'.is_numeric());
assert!(!'K'.is_numeric());
assert!(!'و'.is_numeric());
assert!(!'藏'.is_numeric());
fn to_lowercase(self) -> ToLowercase
Returns an iterator that yields the lowercase equivalent of a char as one or more chars.
If a character does not have a lowercase equivalent, the same character will be returned back by the iterator.
This performs complex unconditional mappings with no tailoring: it maps one Unicode character to its lowercase equivalent according to the Unicode database and the additional complex mappings SpecialCasing.txt. Conditional mappings (based on context or language) are not considered here.
For a full reference, see here.
As an iterator:
for c in 'İ'.to_lowercase() {
    print!("{}", c);
}
println!();
Using println! directly:
println!("{}", 'İ'.to_lowercase());
Both are equivalent to:
println!("i\u{307}");
Using to_string:
assert_eq!('C'.to_lowercase().to_string(), "c");
// Sometimes the result is more than one character:
assert_eq!('İ'.to_lowercase().to_string(), "i\u{307}");
// Japanese scripts do not have case, and so:
assert_eq!('山'.to_lowercase().to_string(), "山");
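Because the result is an iterator of chars, it composes with flat_map to lowercase a whole string one character at a time. This is a minimal sketch with a hypothetical helper `lowercase_per_char`; since each character is mapped in isolation, no context-dependent mapping is applied.

```rust
/// Hypothetical helper: lowercases `s` one char at a time, keeping
/// expansions where one char maps to several.
fn lowercase_per_char(s: &str) -> String {
    s.chars().flat_map(|c| c.to_lowercase()).collect()
}
```

Note that the output can contain more chars than the input: "İ" is one char, while its lowercase form is two.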
fn to_uppercase(self) -> ToUppercase
Returns an iterator that yields the uppercase equivalent of a char as one or more chars.
If a character does not have an uppercase equivalent, the same character will be returned back by the iterator.
This performs complex unconditional mappings with no tailoring: it maps one Unicode character to its uppercase equivalent according to the Unicode database and the additional complex mappings SpecialCasing.txt. Conditional mappings (based on context or language) are not considered here.
For a full reference, see here.
As an iterator:
for c in 'ß'.to_uppercase() {
    print!("{}", c);
}
println!();
Using println! directly:
println!("{}", 'ß'.to_uppercase());
Both are equivalent to:
println!("SS");
Using to_string:
assert_eq!('c'.to_uppercase().to_string(), "C");
// Sometimes the result is more than one character:
assert_eq!('ß'.to_uppercase().to_string(), "SS");
// Japanese does not have case, and so:
assert_eq!('山'.to_uppercase().to_string(), "山");
In Turkish, the equivalent of 'i' in Latin has five forms instead of two:
'Dotless': I / ı, sometimes written ï
'Dotted': İ / i
Note that the lowercase dotted 'i' is the same as the Latin. Therefore:
let upper_i = 'i'.to_uppercase().to_string();
The value of upper_i here relies on the language of the text: if we're in en-US, it should be "I", but if we're in tr_TR, it should be "İ". to_uppercase() does not take this into account, and so:
let upper_i = 'i'.to_uppercase().to_string();
assert_eq!(upper_i, "I");
holds across languages.
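Code that needs the Turkish behaviour has to apply it explicitly. The sketch below is purely illustrative: `to_uppercase_tr` is a hypothetical helper that special-cases only the dotted 'i' and defers to to_uppercase for everything else; real locale-aware casing would normally come from a dedicated internationalization library.

```rust
/// Hypothetical helper: uppercases `c` using the Turkish rule for the
/// dotted lowercase i, and the untailored mapping otherwise.
fn to_uppercase_tr(c: char) -> String {
    match c {
        // Dotted lowercase i maps to dotted capital İ (U+0130) in Turkish.
        'i' => "İ".to_string(),
        // Everything else, including dotless ı (U+0131), uses the default mapping.
        _ => c.to_uppercase().to_string(),
    }
}
```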
impl PartialEq<char> for char
fn eq(&self, other: &char) -> bool
This method tests for self and other values to be equal, and is used by ==.
fn ne(&self, other: &char) -> bool
This method tests for !=.
impl TryFrom<u32> for char
type Err = CharTryFromError
The type returned in the event of a conversion error.
fn try_from(i: u32) -> Result<char, char::Err>
Performs the conversion.
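A fallible conversion from u32 might be used as follows. This sketch assumes a recent stable toolchain, where TryFrom is stable and the associated error type is named Error rather than Err as listed here; to stay independent of that name, the hypothetical helper `checked_char` discards the error with `ok()`.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: converts a raw code point to a char, or None
/// if the value is not a Unicode scalar value.
fn checked_char(n: u32) -> Option<char> {
    char::try_from(n).ok()
}
```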
impl Default for char
fn default() -> char
Returns the "default value" for a type.
impl Clone for char
fn clone(&self) -> char
Returns a deep copy of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Display for char
fn fmt(&self, f: &mut Formatter) -> Result<(), Error>
Formats the value using the given formatter.
impl Eq for char
impl<'a> Pattern<'a> for char
Searches for chars that are equal to a given char
type Searcher = CharSearcher<'a>
Associated searcher for this pattern
fn into_searcher(self, haystack: &'a str) -> CharSearcher<'a>
Constructs the associated searcher from self and the haystack to search in.
fn is_contained_in(self, haystack: &'a str) -> bool
Checks whether the pattern matches anywhere in the haystack
fn is_prefix_of(self, haystack: &'a str) -> bool
Checks whether the pattern matches at the front of the haystack
fn is_suffix_of(self, haystack: &'a str) -> bool where CharSearcher<'a>: ReverseSearcher<'a>
Checks whether the pattern matches at the back of the haystack
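In practice this implementation is what lets a char be passed directly to the searching methods on str, such as find, contains, starts_with and ends_with. A minimal sketch, with `split_pair` as a hypothetical helper:

```rust
/// Hypothetical helper: splits a `key=value` string at the first '=',
/// passing a char as the search pattern to `str::find`.
fn split_pair(s: &str) -> Option<(&str, &str)> {
    let idx = s.find('=')?;
    // Skip over the separator itself; '=' is one byte in UTF-8.
    Some((&s[..idx], &s[idx + '='.len_utf8()..]))
}
```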
impl From<u8> for char
Maps a byte in 0x00...0xFF to a char whose code point has the same value, in U+0000 to U+00FF.
Unicode is designed such that this effectively decodes bytes with the character encoding that IANA calls ISO-8859-1. This encoding is compatible with ASCII.
Note that this is different from ISO/IEC 8859-1 a.k.a. ISO 8859-1 (with one less hyphen), which leaves some "blanks", byte values that are not assigned to any character. ISO-8859-1 (the IANA one) assigns them to the C0 and C1 control codes.
Note that this is also different from Windows-1252 a.k.a. code page 1252, which is a superset of ISO/IEC 8859-1 that assigns some (not all!) blanks to punctuation and various Latin characters.
To confuse things further, on the Web ascii, iso-8859-1, and windows-1252 are all aliases for a superset of Windows-1252 that fills the remaining blanks with corresponding C0 and C1 control codes.
fn from(i: u8) -> char
Performs the conversion.
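Since every byte maps to the code point of the same value, this conversion can decode ISO-8859-1 (in the IANA sense described above) byte by byte. A minimal sketch, with `decode_latin1` as a hypothetical helper name:

```rust
/// Hypothetical helper: decodes bytes as ISO-8859-1 (IANA sense) by
/// mapping each byte to the char with the same code point value.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}
```

Bytes in 0x80 to 0x9F become C1 control characters, not the punctuation that Windows-1252 would assign to them.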
impl Debug for char
fn fmt(&self, f: &mut Formatter) -> Result<(), Error>
Formats the value using the given formatter.
impl Ord for char
fn cmp(&self, other: &char) -> Ordering
This method returns an Ordering between self and other.
impl Hash for char
fn hash<H>(&self, state: &mut H) where H: Hasher
Feeds this value into the state given, updating the hasher as necessary.
fn hash_slice<H>(data: &[Self], state: &mut H) where H: Hasher
Feeds a slice of this type into the state provided.
impl PartialOrd<char> for char
fn partial_cmp(&self, other: &char) -> Option<Ordering>
This method returns an ordering between self and other values if one exists.
fn lt(&self, other: &char) -> bool
This method tests less than (for self and other) and is used by the < operator.
fn le(&self, other: &char) -> bool
This method tests less than or equal to (for self and other) and is used by the <= operator.
fn ge(&self, other: &char) -> bool
This method tests greater than or equal to (for self and other) and is used by the >= operator.
fn gt(&self, other: &char) -> bool
This method tests greater than (for self and other) and is used by the > operator.
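The ordering on char compares code point values, which is not the same as alphabetical order in any particular language. The sketch below, using a hypothetical helper `sorted_chars`, makes this visible: uppercase ASCII letters sort before lowercase ones, and accented letters sort after both.

```rust
/// Hypothetical helper: returns the characters of `s` sorted by
/// code point value, which is the order `char` implements.
fn sorted_chars(s: &str) -> String {
    let mut v: Vec<char> = s.chars().collect();
    v.sort();
    v.into_iter().collect()
}
```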
impl AsciiExt for char
type Owned = char
Container type for copied ASCII characters.
fn is_ascii(&self) -> bool
Checks if the value is within the ASCII range.
fn to_ascii_uppercase(&self) -> char
Makes a copy of the string in ASCII upper case.
fn to_ascii_lowercase(&self) -> char
Makes a copy of the string in ASCII lower case.
fn eq_ignore_ascii_case(&self, other: &char) -> bool
Checks that two strings are an ASCII case-insensitive match.
fn make_ascii_uppercase(&mut self)
Converts this type to its ASCII upper case equivalent in-place.
fn make_ascii_lowercase(&mut self)
Converts this type to its ASCII lower case equivalent in-place.
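The ASCII methods only touch the letters a-z and A-Z and leave every other character alone. The sketch below assumes a recent toolchain, where these methods are available directly on char without importing AsciiExt; `shout_ascii` and `same_ascii_letter` are hypothetical helper names.

```rust
/// Hypothetical helper: uppercases only the ASCII letters of `s`.
fn shout_ascii(s: &str) -> String {
    s.chars().map(|c| c.to_ascii_uppercase()).collect()
}

/// Hypothetical helper: compares two chars, ignoring ASCII case only.
fn same_ascii_letter(a: char, b: char) -> bool {
    a.eq_ignore_ascii_case(&b)
}
```

Unlike to_uppercase, the result is always a single char, so 'ß' stays 'ß' instead of expanding to "SS".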
© 2010 The Rust Project Developers
Licensed under the Apache License, Version 2.0 or the MIT license, at your option.
https://doc.rust-lang.org/std/primitive.char.html