Primitive Type char

A character type.

The char type represents a single character. More specifically, since 'character' isn't a well-defined concept in Unicode, char is a 'Unicode scalar value', which is similar to, but not the same as, a 'Unicode code point'.

This documentation describes a number of methods and trait implementations on the char type. For technical reasons, there is additional, separate documentation in the std::char module as well.

Representation

char is always four bytes in size. This is a different representation than a given character would have as part of a String. For example:

let v = vec!['h', 'e', 'l', 'l', 'o'];

// five elements times four bytes for each element
assert_eq!(20, v.len() * std::mem::size_of::<char>());

let s = String::from("hello");

// five elements times one byte per element
assert_eq!(5, s.len() * std::mem::size_of::<u8>());

As always, remember that a human intuition for 'character' may not map to Unicode's definitions. For example, emoji symbols such as '❤️' can be more than one Unicode code point; this ❤️ in particular is two:

let s = String::from("❤️");

// we get two chars out of a single ❤️
let mut iter = s.chars();
assert_eq!(Some('\u{2764}'), iter.next());
assert_eq!(Some('\u{fe0f}'), iter.next());
assert_eq!(None, iter.next());

This means it won't fit into a char. Trying to create a literal with let heart = '❤️'; gives an error:

error: character literal may only contain one codepoint: '❤
let heart = '❤️';
            ^~

Another implication of the 4-byte fixed size of a char is that per-char processing can end up using a lot more memory:

let s = String::from("love: ❤️");
let v: Vec<char> = s.chars().collect();

assert_eq!(12, s.len() * std::mem::size_of::<u8>());
assert_eq!(32, v.len() * std::mem::size_of::<char>());
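
The size comparison above can be wrapped in a small helper. This is only a sketch: storage_sizes is a hypothetical name, not part of std, and it ignores any spare capacity a String or Vec may hold.

```rust
use std::mem::size_of;

/// Returns (bytes used as UTF-8 text, bytes used as a sequence of `char`s).
/// `storage_sizes` is a hypothetical helper, not a std function.
fn storage_sizes(s: &str) -> (usize, usize) {
    // A str stores each char in 1 to 4 bytes of UTF-8.
    let utf8_bytes = s.len();
    // A char is always four bytes, whatever its value.
    let char_bytes = s.chars().count() * size_of::<char>();
    (utf8_bytes, char_bytes)
}
```

For ASCII-heavy text the second number is about four times the first, which is the overhead the paragraph above describes.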

Methods

impl char [src]

Checks if a char is a digit in the given radix.

A 'radix' here is sometimes also called a 'base'. A radix of two indicates a binary number, a radix of ten, decimal, and a radix of sixteen, hexadecimal, to give some common values. Arbitrary radices are supported.

Compared to is_numeric(), this function only recognizes the characters 0-9, a-z and A-Z.

'Digit' is defined to be only the following characters:

  • 0-9
  • a-z
  • A-Z

For a more comprehensive understanding of 'digit', see is_numeric().

Panics

Panics if given a radix larger than 36.

Examples

Basic usage:

assert!('1'.is_digit(10));
assert!('f'.is_digit(16));
assert!(!'f'.is_digit(10));

Passing a large radix, causing a panic:

use std::thread;

let result = thread::spawn(|| {
    // this panics
    '1'.is_digit(37);
}).join();

assert!(result.is_err());
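
To make the contrast with is_numeric() concrete, the sketch below classifies a char both ways. The helper name digit_kinds is hypothetical and exists only for illustration.

```rust
/// Classifies `c` two ways: (is an ASCII decimal digit, is Unicode-numeric).
/// `digit_kinds` is a hypothetical helper used only for illustration.
fn digit_kinds(c: char) -> (bool, bool) {
    // is_digit only accepts 0-9, a-z and A-Z (limited further by the radix);
    // is_numeric follows the Unicode general categories.
    (c.is_digit(10), c.is_numeric())
}
```

An ARABIC-INDIC DIGIT THREE ('\u{663}'), for instance, is numeric but is not a digit in radix ten as far as is_digit is concerned.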

Converts a char to a digit in the given radix.

A 'radix' here is sometimes also called a 'base'. A radix of two indicates a binary number, a radix of ten, decimal, and a radix of sixteen, hexadecimal, to give some common values. Arbitrary radices are supported.

'Digit' is defined to be only the following characters:

  • 0-9
  • a-z
  • A-Z

Errors

Returns None if the char does not refer to a digit in the given radix.

Panics

Panics if given a radix larger than 36.

Examples

Basic usage:

assert_eq!('1'.to_digit(10), Some(1));
assert_eq!('f'.to_digit(16), Some(15));

Passing a non-digit results in failure:

assert_eq!('f'.to_digit(10), None);
assert_eq!('z'.to_digit(16), None);

Passing a large radix, causing a panic:

use std::thread;

let result = thread::spawn(|| {
    '1'.to_digit(37);
}).join();

assert!(result.is_err());
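
As a rough sketch of how to_digit can serve as the building block of a small parser, the hypothetical helper parse_radix below (not part of std) folds digits into a u32. It assumes a radix between 2 and 36, since to_digit panics outside the supported range.

```rust
/// Parses `s` as an unsigned number in `radix`.
/// Returns None for an empty string, a non-digit, or a value that
/// does not fit in a u32. `parse_radix` is a hypothetical helper.
fn parse_radix(s: &str, radix: u32) -> Option<u32> {
    if s.is_empty() {
        return None;
    }
    let mut value: u32 = 0;
    for c in s.chars() {
        // to_digit yields None for anything that is not a digit in `radix`.
        let digit = c.to_digit(radix)?;
        // Checked arithmetic turns overflow into None instead of a panic.
        value = value.checked_mul(radix)?.checked_add(digit)?;
    }
    Some(value)
}
```

In real code, u32::from_str_radix covers the same ground; the sketch only shows where to_digit fits in.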

Returns an iterator that yields the hexadecimal Unicode escape of a character as chars.

This will escape characters with the Rust syntax of the form \u{NNNNNN} where NNNNNN is a hexadecimal representation.

Examples

As an iterator:

for c in '❤'.escape_unicode() {
    print!("{}", c);
}
println!();

Using println! directly:

println!("{}", '❤'.escape_unicode());

Both are equivalent to:

println!("\\u{{2764}}");

Using to_string:

assert_eq!('❤'.escape_unicode().to_string(), "\\u{2764}");
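
One possible use of escape_unicode is to make a string ASCII-only. The sketch below assumes that leaving ASCII characters untouched is what the caller wants; escape_non_ascii is a hypothetical name.

```rust
/// Replaces every non-ASCII char in `s` with its \u{...} escape.
/// `escape_non_ascii` is a hypothetical helper, not a std function.
fn escape_non_ascii(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        if c.is_ascii() {
            out.push(c);
        } else {
            // escape_unicode is an iterator of chars, so it can extend a String.
            out.extend(c.escape_unicode());
        }
    }
    out
}
```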

🔬 This is a nightly-only experimental API. (char_escape_debug #35068)

Returns an iterator that yields the literal escape code of a character as chars.

This will escape the characters similar to the Debug implementations of str or char.

Examples

As an iterator:

for c in '\n'.escape_debug() {
    print!("{}", c);
}
println!();

Using println! directly:

println!("{}", '\n'.escape_debug());

Both are equivalent to:

println!("\\n");

Using to_string:

assert_eq!('\n'.escape_debug().to_string(), "\\n");

Returns an iterator that yields the literal escape code of a character as chars.

The default is chosen with a bias toward producing literals that are legal in a variety of languages, including C++11 and similar C-family languages. The exact rules are:

  • Tab is escaped as \t.
  • Carriage return is escaped as \r.
  • Line feed is escaped as \n.
  • Single quote is escaped as \'.
  • Double quote is escaped as \".
  • Backslash is escaped as \\.
  • Any character in the 'printable ASCII' range 0x20 .. 0x7e inclusive is not escaped.
  • All other characters are given hexadecimal Unicode escapes; see escape_unicode.

Examples

As an iterator:

for c in '"'.escape_default() {
    print!("{}", c);
}
println!();

Using println! directly:

println!("{}", '"'.escape_default());

Both are equivalent to:

println!("\\\"");

Using to_string:

assert_eq!('"'.escape_default().to_string(), "\\\"");
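
The rules listed above can be applied to a whole string by escaping each char in turn. This is a minimal sketch with a hypothetical helper name, escape_str.

```rust
/// Escapes every char of `s` with char::escape_default and joins the results.
/// `escape_str` is a hypothetical helper used for illustration.
fn escape_str(s: &str) -> String {
    // Each char expands to one or more chars; flat_map concatenates them.
    s.chars().flat_map(char::escape_default).collect()
}
```

Printable ASCII passes through unchanged, while tabs, quotes, backslashes and non-ASCII characters come out escaped.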

Returns the number of bytes this char would need if encoded in UTF-8.

That number of bytes is always between 1 and 4, inclusive.

Examples

Basic usage:

let len = 'A'.len_utf8();
assert_eq!(len, 1);

let len = 'ß'.len_utf8();
assert_eq!(len, 2);

let len = 'ℝ'.len_utf8();
assert_eq!(len, 3);

let len = '💣'.len_utf8();
assert_eq!(len, 4);

The &str type guarantees that its contents are UTF-8, so we can compare the number of bytes each code point needs when encoded on its own with the length of the &str that contains them:

// as chars
let eastern = '東';
let capitol = '京';

// both can be represented as three bytes
assert_eq!(3, eastern.len_utf8());
assert_eq!(3, capitol.len_utf8());

// as a &str, these two are encoded in UTF-8
let tokyo = "東京";

let len = eastern.len_utf8() + capitol.len_utf8();

// we can see that they take six bytes total...
assert_eq!(6, tokyo.len());

// ... just like the &str
assert_eq!(len, tokyo.len());
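
Because the UTF-8 lengths of the chars add up to the length of the string, len_utf8 can also be used to find where a given char starts. The helper below is a sketch under that assumption; byte_offset_of_char is a hypothetical name.

```rust
/// Returns the byte offset at which the `n`-th char (0-based) of `s` starts,
/// or None if `s` has `n` or fewer chars.
/// `byte_offset_of_char` is a hypothetical helper.
fn byte_offset_of_char(s: &str, n: usize) -> Option<usize> {
    let mut offset = 0;
    for (i, c) in s.chars().enumerate() {
        if i == n {
            return Some(offset);
        }
        // Each char advances the offset by its encoded length.
        offset += c.len_utf8();
    }
    None
}
```

The char_indices method on str yields the same offsets directly; the sketch only spells out the arithmetic.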

Returns the number of 16-bit code units this char would need if encoded in UTF-16.

See the documentation for len_utf8() for more explanation of this concept. This function is a mirror, but for UTF-16 instead of UTF-8.

Examples

Basic usage:

let n = 'ß'.len_utf16();
assert_eq!(n, 1);

let len = '💣'.len_utf16();
assert_eq!(len, 2);
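
Summing len_utf16 over a string gives the number of 16-bit code units it would occupy in UTF-16. A minimal sketch, with utf16_len as a hypothetical helper name:

```rust
/// Number of 16-bit code units `s` would need if encoded as UTF-16.
/// `utf16_len` is a hypothetical helper, not a std function.
fn utf16_len(s: &str) -> usize {
    // Chars outside the Basic Multilingual Plane count as two units each.
    s.chars().map(char::len_utf16).sum()
}
```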

Encodes this character as UTF-8 into the provided byte buffer, and then returns the subslice of the buffer that contains the encoded character.

Panics

Panics if the buffer is not large enough. A buffer of length four is large enough to encode any char.

Examples

In both of these examples, 'ß' takes two bytes to encode.

let mut b = [0; 2];

let result = 'ß'.encode_utf8(&mut b);

assert_eq!(result, "ß");

assert_eq!(result.len(), 2);

A buffer that's too small:

use std::thread;

let result = thread::spawn(|| {
    let mut b = [0; 1];

    // this panics
    'ß'.encode_utf8(&mut b);
}).join();

assert!(result.is_err());
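
Since a four-byte buffer is large enough for any char, one scratch buffer can be reused to encode many chars without risking the panic. The sketch below assumes a recent stable Rust where encode_utf8 returns a string slice; encode_all_utf8 is a hypothetical name.

```rust
/// Encodes a slice of chars as UTF-8 bytes, reusing one 4-byte scratch buffer.
/// `encode_all_utf8` is a hypothetical helper used for illustration.
fn encode_all_utf8(chars: &[char]) -> Vec<u8> {
    let mut out = Vec::new();
    // Four bytes is enough for any char, so encode_utf8 cannot panic here.
    let mut buf = [0u8; 4];
    for c in chars {
        let encoded = c.encode_utf8(&mut buf);
        out.extend_from_slice(encoded.as_bytes());
    }
    out
}
```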

Encodes this character as UTF-16 into the provided u16 buffer, and then returns the subslice of the buffer that contains the encoded character.

Panics

Panics if the buffer is not large enough. A buffer of length 2 is large enough to encode any char.

Examples

In both of these examples, '𝕊' takes two u16s to encode.

let mut b = [0; 2];

let result = '𝕊'.encode_utf16(&mut b);

assert_eq!(result.len(), 2);

A buffer that's too small:

use std::thread;

let result = thread::spawn(|| {
    let mut b = [0; 1];

    // this panics
    '𝕊'.encode_utf16(&mut b);
}).join();

assert!(result.is_err());
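
The same pattern works for UTF-16 with a two-element u16 buffer. Again a sketch only, with encode_all_utf16 as a hypothetical helper name:

```rust
/// Encodes a slice of chars as UTF-16 code units, reusing one scratch buffer.
/// `encode_all_utf16` is a hypothetical helper used for illustration.
fn encode_all_utf16(chars: &[char]) -> Vec<u16> {
    let mut out = Vec::new();
    // Two u16s are enough for any char (one surrogate pair at most).
    let mut buf = [0u16; 2];
    for c in chars {
        out.extend_from_slice(c.encode_utf16(&mut buf));
    }
    out
}
```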

Returns true if this char is an alphabetic code point, and false if not.

Examples

Basic usage:

assert!('a'.is_alphabetic());
assert!('京'.is_alphabetic());

let c = '💝';
// love is many things, but it is not alphabetic
assert!(!c.is_alphabetic());

🔬 This is a nightly-only experimental API. (unicode) mainly needed for compiler internals

Returns true if this char satisfies the 'XID_Start' Unicode property, and false otherwise.

'XID_Start' is a Unicode Derived Property specified in UAX #31, mostly similar to ID_Start but modified for closure under NFKx.

🔬 This is a nightly-only experimental API. (unicode) mainly needed for compiler internals

Returns true if this char satisfies the 'XID_Continue' Unicode property, and false otherwise.

'XID_Continue' is a Unicode Derived Property specified in UAX #31, mostly similar to 'ID_Continue' but modified for closure under NFKx.

Returns true if this char is lowercase, and false otherwise.

'Lowercase' is defined according to the terms of the Unicode Derived Core Property Lowercase.

Examples

Basic usage:

assert!('a'.is_lowercase());
assert!('δ'.is_lowercase());
assert!(!'A'.is_lowercase());
assert!(!'Δ'.is_lowercase());

// The various Chinese scripts do not have case, and so:
assert!(!'中'.is_lowercase());

Returns true if this char is uppercase, and false otherwise.

'Uppercase' is defined according to the terms of the Unicode Derived Core Property Uppercase.

Examples

Basic usage:

assert!(!'a'.is_uppercase());
assert!(!'δ'.is_uppercase());
assert!('A'.is_uppercase());
assert!('Δ'.is_uppercase());

// The various Chinese scripts do not have case, and so:
assert!(!'中'.is_uppercase());

Returns true if this char is whitespace, and false otherwise.

'Whitespace' is defined according to the terms of the Unicode Derived Core Property White_Space.

Examples

Basic usage:

assert!(' '.is_whitespace());

// a non-breaking space
assert!('\u{A0}'.is_whitespace());

assert!(!'越'.is_whitespace());
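
Because is_whitespace follows the Unicode White_Space property, it also recognizes characters such as the non-breaking space. The sketch below uses it to normalize spacing; collapse_whitespace is a hypothetical helper, and trimming both ends is a design choice of this example.

```rust
/// Collapses each run of Unicode whitespace into a single ASCII space and
/// drops leading and trailing whitespace.
/// `collapse_whitespace` is a hypothetical helper.
fn collapse_whitespace(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut pending_space = false;
    for c in s.chars() {
        if c.is_whitespace() {
            pending_space = true;
        } else {
            // Emit one separator, but never at the very start of the output.
            if pending_space && !out.is_empty() {
                out.push(' ');
            }
            pending_space = false;
            out.push(c);
        }
    }
    out
}
```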

Returns true if this char is alphanumeric, and false otherwise.

'Alphanumeric'-ness is defined in terms of the Unicode General Categories 'Nd', 'Nl', 'No' and the Derived Core Property 'Alphabetic'.

Examples

Basic usage:

assert!('٣'.is_alphanumeric());
assert!('7'.is_alphanumeric());
assert!('৬'.is_alphanumeric());
assert!('K'.is_alphanumeric());
assert!('و'.is_alphanumeric());
assert!('藏'.is_alphanumeric());
// '¾' and '①' are in the general category 'No', so they count as well:
assert!('¾'.is_alphanumeric());
assert!('①'.is_alphanumeric());

Returns true if this char is a control code point, and false otherwise.

'Control code point' is defined in terms of the Unicode General Category Cc.

Examples

Basic usage:

// U+009C, STRING TERMINATOR
assert!('\u{9c}'.is_control());
assert!(!'q'.is_control());
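
A common use of is_control is to sanitize text before display. A minimal sketch, assuming that dropping every control code point (including newlines and tabs, which are also in category Cc) is acceptable; strip_control is a hypothetical name.

```rust
/// Removes every control code point (general category Cc) from `s`.
/// Note that '\n' and '\t' are control characters too and are removed.
/// `strip_control` is a hypothetical helper.
fn strip_control(s: &str) -> String {
    s.chars().filter(|c| !c.is_control()).collect()
}
```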

Returns true if this char is numeric, and false otherwise.

'Numeric'-ness is defined in terms of the Unicode General Categories 'Nd', 'Nl', 'No'.

Examples

Basic usage:

assert!('٣'.is_numeric());
assert!('7'.is_numeric());
assert!('৬'.is_numeric());
assert!(!'K'.is_numeric());
assert!(!'و'.is_numeric());
assert!(!'藏'.is_numeric());
// '¾' and '①' are in the general category 'No', so they are numeric:
assert!('¾'.is_numeric());
assert!('①'.is_numeric());

Returns an iterator that yields the lowercase equivalent of a char as one or more chars.

If a character does not have a lowercase equivalent, the same character will be returned back by the iterator.

This performs complex unconditional mappings with no tailoring: it maps one Unicode character to its lowercase equivalent according to the Unicode database and the additional complex mappings in SpecialCasing.txt. Conditional mappings (based on context or language) are not considered here.

For a full reference, see the Unicode Standard's description of default case conversion.

Examples

As an iterator:

for c in 'İ'.to_lowercase() {
    print!("{}", c);
}
println!();

Using println! directly:

println!("{}", 'İ'.to_lowercase());

Both are equivalent to:

println!("i\u{307}");

Using to_string:

assert_eq!('C'.to_lowercase().to_string(), "c");

// Sometimes the result is more than one character:
assert_eq!('İ'.to_lowercase().to_string(), "i\u{307}");

// Japanese scripts do not have case, and so:
assert_eq!('山'.to_lowercase().to_string(), "山");

Returns an iterator that yields the uppercase equivalent of a char as one or more chars.

If a character does not have an uppercase equivalent, the same character will be returned back by the iterator.

This performs complex unconditional mappings with no tailoring: it maps one Unicode character to its uppercase equivalent according to the Unicode database and the additional complex mappings in SpecialCasing.txt. Conditional mappings (based on context or language) are not considered here.

For a full reference, see the Unicode Standard's description of default case conversion.

Examples

As an iterator:

for c in 'ß'.to_uppercase() {
    print!("{}", c);
}
println!();

Using println! directly:

println!("{}", 'ß'.to_uppercase());

Both are equivalent to:

println!("SS");

Using to_string:

assert_eq!('c'.to_uppercase().to_string(), "C");

// Sometimes the result is more than one character:
assert_eq!('ß'.to_uppercase().to_string(), "SS");

// Japanese does not have case, and so:
assert_eq!('山'.to_uppercase().to_string(), "山");
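
Since one char may expand to several, uppercasing a string char by char can change its length. The sketch below shows this with a hypothetical helper, uppercase_chars; in practice the to_uppercase method on str does the same job.

```rust
/// Uppercases `s` one char at a time using char::to_uppercase.
/// `uppercase_chars` is a hypothetical helper used for illustration.
fn uppercase_chars(s: &str) -> String {
    // Each char yields an iterator of one or more chars; flat_map joins them.
    s.chars().flat_map(char::to_uppercase).collect()
}
```

For "straße" the result has seven chars although the input has six, because 'ß' becomes "SS".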

Note on locale

In Turkish, the equivalent of 'i' in Latin has five forms instead of two:

  • 'Dotless': I / ı, sometimes written ï
  • 'Dotted': İ / i

Note that the lowercase dotted 'i' is the same as the Latin. Therefore:

let upper_i = 'i'.to_uppercase().to_string();

The value of upper_i here relies on the language of the text: if we're in en-US, it should be "I", but if we're in tr_TR, it should be "İ". to_uppercase() does not take this into account, and so:

let upper_i = 'i'.to_uppercase().to_string();

assert_eq!(upper_i, "I");

holds across languages.
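
If Turkish behavior is needed, the caller has to apply the tailoring. The sketch below is a hypothetical helper, to_uppercase_turkish, that special-cases 'i' and otherwise defers to to_uppercase(); it is an illustration only, and real locale-aware casing is better left to a dedicated Unicode library.

```rust
/// Uppercases one char with a Turkish tailoring for dotted 'i'.
/// `to_uppercase_turkish` is a hypothetical helper, not a std function.
fn to_uppercase_turkish(c: char) -> String {
    match c {
        // Dotted lowercase i maps to dotted capital İ (U+0130) in Turkish.
        'i' => "\u{130}".to_string(),
        // Everything else, including dotless ı (U+0131) which already
        // maps to 'I', uses the untailored mapping.
        _ => c.to_uppercase().to_string(),
    }
}
```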

Trait Implementations

impl PartialEq<char> for char [src]

This method tests for self and other values to be equal, and is used by ==.

This method tests for !=.

impl TryFrom<u32> for char [src]

🔬 This is a nightly-only experimental API. (try_from #33417)

The type returned in the event of a conversion error.

🔬 This is a nightly-only experimental API. (try_from #33417)

Performs the conversion.

impl Default for char [src]

Returns the "default value" for a type.

impl Clone for char [src]

Returns a deep copy of the value.

Performs copy-assignment from source.

impl Display for char [src]

Formats the value using the given formatter.

impl Eq for char [src]

impl<'a> Pattern<'a> for char [src]

Searches for chars that are equal to a given char

🔬 This is a nightly-only experimental API. (pattern #27721) API not fully fleshed out and ready to be stabilized

Associated searcher for this pattern

🔬 This is a nightly-only experimental API. (pattern #27721) API not fully fleshed out and ready to be stabilized

Constructs the associated searcher from self and the haystack to search in.

🔬 This is a nightly-only experimental API. (pattern #27721) API not fully fleshed out and ready to be stabilized

Checks whether the pattern matches anywhere in the haystack

🔬 This is a nightly-only experimental API. (pattern #27721) API not fully fleshed out and ready to be stabilized

Checks whether the pattern matches at the front of the haystack

🔬 This is a nightly-only experimental API. (pattern #27721) API not fully fleshed out and ready to be stabilized

Checks whether the pattern matches at the back of the haystack

impl From<u8> for char 1.13.0 [src]

Maps a byte in 0x00...0xFF to a char whose code point has the same value, in U+0000 to U+00FF.

Unicode is designed such that this effectively decodes bytes with the character encoding that IANA calls ISO-8859-1. This encoding is compatible with ASCII.

Note that this is different from ISO/IEC 8859-1 a.k.a. ISO 8859-1 (with one less hyphen), which leaves some "blanks", byte values that are not assigned to any character. ISO-8859-1 (the IANA one) assigns them to the C0 and C1 control codes.

Note that this is also different from Windows-1252 a.k.a. code page 1252, which is a superset of ISO/IEC 8859-1 that assigns some (not all!) blanks to punctuation and various Latin characters.

To confuse things further, on the Web ascii, iso-8859-1, and windows-1252 are all aliases for a superset of Windows-1252 that fills the remaining blanks with corresponding C0 and C1 control codes.

Performs the conversion.
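
Because each byte maps to the code point with the same value, this conversion can decode ISO-8859-1 (in the IANA sense described above) byte by byte. A minimal sketch, with decode_latin1 as a hypothetical helper name:

```rust
/// Decodes bytes as ISO-8859-1 (IANA) by mapping each byte to the char
/// with the same code point value. `decode_latin1` is a hypothetical helper.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}
```

Every byte value is valid input here, so unlike UTF-8 decoding this cannot fail.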

impl Debug for char [src]

Formats the value using the given formatter.

impl Ord for char [src]

This method returns an Ordering between self and other.

impl Hash for char [src]

Feeds this value into the state given, updating the hasher as necessary.

Feeds a slice of this type into the state provided.

impl PartialOrd<char> for char [src]

This method returns an ordering between self and other values if one exists.

This method tests less than (for self and other) and is used by the < operator.

This method tests less than or equal to (for self and other) and is used by the <= operator.

This method tests greater than or equal to (for self and other) and is used by the >= operator.

This method tests greater than (for self and other) and is used by the > operator.

impl AsciiExt for char [src]

Container type for copied ASCII characters.

Checks if the value is within the ASCII range.

Makes a copy of the string in ASCII upper case.

Makes a copy of the string in ASCII lower case.

Checks that two strings are an ASCII case-insensitive match.

Converts this type to its ASCII upper case equivalent in-place.

Converts this type to its ASCII lower case equivalent in-place.

© 2010 The Rust Project Developers
Licensed under the Apache License, Version 2.0 or the MIT license, at your option.
https://doc.rust-lang.org/std/primitive.char.html