Package com.globalmentor.java
Class Characters
java.lang.Object
com.globalmentor.java.Characters
An immutable set of characters that supports various searching and other functions. This essentially provides an efficient yet immutable array with object-oriented functionality.
This class is similar to String, except that it discards duplicate characters. Furthermore, this class allows no Unicode surrogates; the characters contained are interpreted as complete Unicode code points. This also makes comparison more efficient. As this class is more similar to an ordered set than to a list, it doesn't implement CharSequence, in order to prevent signature conflicts, and provides size() to count its contents instead of a "length" property.
This class also provides static utilities and constants for interacting with characters in general.
In most cases, names of constants are derived from Unicode names.
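As a rough illustration of the behavior described above (discarded duplicates, rejected surrogates, immutability), here is a minimal sketch. The class name CharSetSketch, its sorted backing array, and its binary-search lookup are assumptions made for illustration; they are not taken from the library's source.

```java
import java.util.Arrays;

/** Hypothetical stand-in for illustration only; not the library's Characters class. */
final class CharSetSketch {

    private final char[] chars; // sorted, free of duplicates, never exposed to callers

    private CharSetSketch(char[] sortedUnique) {
        this.chars = sortedUnique;
    }

    /** Builds a set from the given characters, ignoring duplicates and rejecting surrogates. */
    static CharSetSketch of(char... characters) {
        char[] sorted = characters.clone(); // a null argument fails here with NullPointerException
        Arrays.sort(sorted);
        int count = 0;
        for (int i = 0; i < sorted.length; i++) {
            if (Character.isSurrogate(sorted[i])) {
                throw new IllegalArgumentException("Surrogate characters are not allowed.");
            }
            if (count == 0 || sorted[i] != sorted[count - 1]) {
                sorted[count++] = sorted[i]; // keep one copy of each distinct value
            }
        }
        return new CharSetSketch(Arrays.copyOf(sorted, count));
    }

    /** Returns a new set with the extra characters; this instance is left unchanged. */
    CharSetSketch add(char... characters) {
        char[] combined = Arrays.copyOf(chars, chars.length + characters.length);
        System.arraycopy(characters, 0, combined, chars.length, characters.length);
        return of(combined);
    }

    /** Looks the character up with a binary search over the sorted array. */
    boolean contains(char character) {
        return Arrays.binarySearch(chars, character) >= 0;
    }

    int size() {
        return chars.length;
    }

    boolean isEmpty() {
        return chars.length == 0;
    }
}
```

Because every modifying operation in this sketch returns a new object, an instance can be shared freely between callers without defensive copying.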
- Author:
- Garret Wilson
-
Field Summary
Fields (modifier and type, field, description):
static final char APOSTROPHE_CHAR: An apostrophe character.
static final char BACKSPACE_CHAR: A backspace.
static final char BLANK_SYMBOL: The blank symbol (2422;BLANK SYMBOL;So;0;ON;;;;;N;BLANK;;;;).
static final char BOM_CHAR: The Byte Order Mark (BOM).
static final char BULLET_CHAR: Unicode bullet character.
static final char CARRIAGE_RETURN_CHAR: A carriage return.
static final char CARRIAGE_RETURN_SYMBOL: The symbol for carriage return (240D;SYMBOL FOR CARRIAGE RETURN;So;0;ON;;;;;N;GRAPHIC FOR CARRIAGE RETURN;;;;).
static final char CHARACTER_TABULATION_CHAR: A horizontal tab (0009;<control>;Cc;0;S;;;;;N;CHARACTER TABULATION;;;;).
static final char COLON_CHAR: A colon character.
static final char COMMA_CHAR: A comma character.
static final String CONTROL_CHARS: Unicode control characters (0x0000-0x001F, 0x007F-0x009F).
static final char COPYRIGHT_SIGN: The copyright symbol.
static final char DATA_LINK_ESCAPE_CHAR: Data Link Escape control character (0010;<control>;Cc;0;BN;;;;;N;DATA LINK ESCAPE;;;;).
static final Characters DEPENDENT_PUNCTUATION_CHARACTERS: Punctuation that expects a character to follow at some point.
static final char DOUBLE_HIGH_REVERSED_9_QUOTATION_MARK_CHAR: A double high-reversed-9 quotation mark.
static final char DOUBLE_LOW_9_QUOTATION_MARK_CHAR: A double low-9 quotation mark.
static final char DOUBLE_PRIME_QUOTATION_MARK_CHAR: A double prime quotation mark.
static final char EM_DASH_CHAR: Unicode em dash character.
static final char[] EMPTY_ARRAY: Deprecated.
static final char EN_DASH_CHAR: Unicode en dash character.
static final char END_OF_TRANSMISSION_SYMBOL: The symbol for end of transmission (2404;SYMBOL FOR END OF TRANSMISSION;So;0;ON;;;;;N;GRAPHIC FOR END OF TRANSMISSION;;;;).
static final Characters EOL_CHARACTERS: Characters considered to be end-of-line markers (e.g. CR and LF).
static final char EQUALS_SIGN_CHAR: An equals sign character (003D;EQUALS SIGN;Sm;0;ON;;;;;N;;;;;).
static final char FORM_FEED_CHAR: A form feed (FF).
static final char FORM_FEED_SYMBOL: The symbol for form feed (240C;SYMBOL FOR FORM FEED;So;0;ON;;;;;N;GRAPHIC FOR FORM FEED;;;;).
static final String FORMAT_CHARS: Unicode formatting characters; Unicode characters marked with "Cf", such as WORD_JOINER.
static final char FULLWIDTH_QUOTATION_MARK_CHAR: A full width quotation mark.
static final char GRAVE_ACCENT_CHAR: A grave accent character.
static final char GREATER_THAN_CHAR: A greater-than sign character (003E;GREATER-THAN SIGN;Sm;0;ON;;;;;Y;;;;;).
static final Characters GROUP_PUNCTUATION_CHARACTERS: Punctuation used to group characters.
static final char HORIZONTAL_ELLIPSIS_CHAR: Unicode horizontal ellipsis.
static final char HYPHEN_MINUS_CHAR: A hyphen or minus character.
static final char INFINITY_CHAR: Infinity symbol (221E;INFINITY;Sm;0;ON;;;;;N;;;;;).
static final char INFORMATION_SEPARATOR_FOUR_CHAR: The information separator four character.
static final char INFORMATION_SEPARATOR_ONE_CHAR: The information separator one character.
static final char INFORMATION_SEPARATOR_THREE_CHAR: The information separator three character.
static final char INFORMATION_SEPARATOR_TWO_CHAR: The information separator two character.
static final char LATIN_CAPITAL_LETTER_Y_WITH_DIAERESIS_CHAR: A Y umlaut.
static final char LATIN_CAPITAL_LIGATURE_OE_CHAR: An uppercase oe ligature.
static final char LATIN_SMALL_LIGATURE_OE_CHAR: A lowercase oe ligature.
static final char LEFT_DOUBLE_QUOTATION_MARK_CHAR: A left double quote.
static final Characters LEFT_GROUP_PUNCTUATION_CHARACTERS: Left punctuation used to group characters.
static final char LEFT_POINTING_ANGLE_BRACKET: A left-pointing angle bracket character (2329;LEFT-POINTING ANGLE BRACKET;Ps;0;ON;3008;;;;Y;BRA;;;;).
static final char LEFT_POINTING_DOUBLE_ANGLE_QUOTATION_MARK_CHAR: A left-pointing guillemet character.
static final String LEFT_QUOTE_CHARS: Characters that could be considered the start of a quotation.
static final char LEFT_SINGLE_QUOTATION_MARK_CHAR: A left single quote.
static final char LEFT_TO_RIGHT_MARK_CHAR: A left-to-right mark (200E;LEFT-TO-RIGHT MARK;Cf;0;L;;;;;N;;;;;).
static final char LESS_THAN_CHAR: A less-than sign character (003C;LESS-THAN SIGN;Sm;0;ON;;;;;Y;;;;;).
static final char LINE_FEED_CHAR: A line feed (LF).
static final char LINE_FEED_SYMBOL: The symbol for line feed (240A;SYMBOL FOR LINE FEED;So;0;ON;;;;;N;GRAPHIC FOR LINE FEED;;;;).
static final char LINE_SEPARATOR_CHAR: A line separator character (2028;LINE SEPARATOR;Zl;0;WS;;;;;N;;;;;).
static final Characters LINE_SEPARATOR_CHARACTERS: Characters in the Unicode Line_Separator (Zl) category as of Unicode 9.0.0.
static final char LINE_TABULATION_CHAR: A vertical tab (000B;<control>;Cc;0;S;;;;;N;LINE TABULATION;;;;).
static final String LIST_DELIMITER_CHARS: Characters that delimit a list separated by trim characters, commas, and/or semicolons.
static final char LOW_DOUBLE_PRIME_QUOTATION_MARK_CHAR: A low double prime quotation mark.
static final char MIDDLE_DOT_CHAR: A middle dot character.
static final Characters NEWLINE_CHARACTERS: Unicode newline characters.
static final char NEXT_LINE_CHAR: A next line (NEL) control character.
static final char NO_BREAK_SPACE_CHAR: Unicode no-break space (NBSP).
static final char[] NO_CHARS: A shared instance of an empty array of characters.
static final Characters NONE: The shared instance of no characters.
static final char NULL_CHAR: The character with Unicode code point zero.
static final char NULL_SYMBOL: The symbol for NULL (2400;SYMBOL FOR NULL;So;0;ON;;;;;N;GRAPHIC FOR NULL;;;;).
static final char OBJECT_REPLACEMENT_CHAR: A character for a placeholder in text for an otherwise unspecified object.
static final char PARAGRAPH_SEPARATOR_CHAR: A paragraph separator character (2029;PARAGRAPH SEPARATOR;Zp;0;B;;;;;N;;;;;).
static final Characters PARAGRAPH_SEPARATOR_CHARACTERS: Characters in the Unicode Paragraph_Separator (Zp) category as of Unicode 9.0.0.
static final String PARAGRAPH_SEPARATOR_CHARS: Unicode paragraph separator characters.
static final char PARAGRAPH_SIGN_CHAR: The paragraph sign.
static final char PERCENT_SIGN_CHAR: The percent sign.
static final Characters PHRASE_PUNCTUATION_CHARACTERS: Characters used to punctuate phrases and sentences.
static final char PILCROW_SIGN_CHAR: The pilcrow or paragraph sign.
static final char PLUS_SIGN_CHAR: A plus sign character.
static final Characters PUNCTUATION_CHARS: Characters used to punctuate phrases and sentences, as well as general punctuation such as quotes.
static final char QUESTION_MARK_CHAR: A question mark character (003F;QUESTION MARK;Po;0;ON;;;;;N;;;;;).
static final char QUOTATION_MARK_CHAR: A quotation mark character.
static final String QUOTE_CHARS: Characters that start or end quotations.
static final char REPLACEMENT_CHAR: Represents a character that is unknown or unrepresentable in Unicode.
static final char REVERSED_DOUBLE_PRIME_QUOTATION_MARK_CHAR: A reversed double prime quotation mark.
static final char RIGHT_DOUBLE_QUOTATION_MARK_CHAR: A right double quote.
static final Characters RIGHT_GROUP_PUNCTUATION_CHARACTERS: Right punctuation used to group characters.
static final char RIGHT_POINTING_ANGLE_BRACKET: A right-pointing angle bracket character (232A;RIGHT-POINTING ANGLE BRACKET;Pe;0;ON;3009;;;;Y;KET;;;;).
static final char RIGHT_POINTING_DOUBLE_ANGLE_QUOTATION_MARK_CHAR: A right-pointing guillemet character.
static final String RIGHT_QUOTE_CHARS: Characters that could be considered the end of a quotation.
static final char RIGHT_SINGLE_QUOTATION_MARK_CHAR: A right single quote.
static final char RIGHT_TO_LEFT_MARK_CHAR: A right-to-left mark (200F;RIGHT-TO-LEFT MARK;Cf;0;R;;;;;N;;;;;).
static final String SEGMENT_SEPARATOR_CHARS: Unicode segment separator characters.
static final char SEMICOLON_CHAR: A semicolon character.
static final Characters SEPARATOR_CHARACTERS: Characters in the Unicode Separator (Z) group as of Unicode 9.0.0.
static final char SINGLE_HIGH_REVERSED_9_QUOTATION_MARK_CHAR: A single high-reversed-9 quotation mark.
static final char SINGLE_LEFT_POINTING_ANGLE_QUOTATION_MARK_CHAR: A left-pointing single guillemet character.
static final char SINGLE_LOW_9_QUOTATION_MARK_CHAR: A single low-9 quotation mark.
static final char SINGLE_RIGHT_POINTING_ANGLE_QUOTATION_MARK_CHAR: A right-pointing single guillemet character.
static final char SOLIDUS_CHAR: A solidus or slash character (002F;SOLIDUS;Po;0;CS;;;;;N;SLASH;;;;).
static final char SPACE_CHAR: A space character.
static final Characters SPACE_SEPARATOR_CHARACTERS: Characters in the Unicode Space_Separator (Zs) category as of Unicode 9.0.0.
static final char SPACE_SYMBOL: The symbol for space (2420;SYMBOL FOR SPACE;So;0;ON;;;;;N;GRAPHIC FOR SPACE;;;;).
static final char START_OF_STRING_CHAR: A start of string control character.
static final char STRING_TERMINATOR_CHAR: A string terminator control character.
static final char TILDE_CHAR: A tilde character (007E;TILDE;Sm;0;ON;;;;;N;;;;;).
static final char TRADE_MARK_SIGN_CHAR: Trademark character.
static final Characters TRIM_CHARACTERS: Characters that do not contain visible "content", and may be trimmed from ends of a string.
static final char UNDEFINED_CHAR: An invalid, undefined Unicode character which is "guaranteed not to be a Unicode character at all."
static final char UNIT_SEPARATOR_CHAR: A unit separator character.
static final char VERTICAL_TAB_SYMBOL: The symbol for vertical tab (240B;SYMBOL FOR VERTICAL TABULATION;So;0;ON;;;;;N;GRAPHIC FOR VERTICAL TABULATION;;;;).
static final Characters WHITESPACE_CHARACTERS: Unicode whitespace characters.
static final Characters WORD_DELIMITER_CHARACTERS: Characters that separate words.
static final char WORD_JOINER_CHAR: A zero-width non-breaking space, the word joiner (WJ).
static final String WORD_WRAP_CHARS: Characters that allow words to wrap.
static final char ZERO_WIDTH_JOINER_CHAR: A zero-width joiner (200D;ZERO WIDTH JOINER;Cf;0;BN;;;;;N;;;;;).
static final char ZERO_WIDTH_NO_BREAK_SPACE_CHAR: A zero-width no-breaking space (ZWNBSP), the Byte Order Mark (BOM) (FEFF;ZERO WIDTH NO-BREAK SPACE;Cf;0;BN;;;;;N;BYTE ORDER MARK;;;;).
static final char ZERO_WIDTH_NON_JOINER_CHAR: A zero-width non-joiner (200C;ZERO WIDTH NON-JOINER;Cf;0;BN;;;;;N;;;;;).
static final char ZERO_WIDTH_SPACE_CHAR: A zero-width space (ZWSP) that may expand during justification.
-
Method Summary
Methods (modifier and type where shown, method, description):
add(char... characters): Creates a new object with these characters and the given characters.
add(Characters characters): Creates a new object with these characters and the given characters.
add(CharSequence charSequence): Creates a new object with these characters and the given characters.
addRange(char first, char last): Adds a range of characters.
static StringBuilder appendLabel(StringBuilder stringBuilder, int c): Appends a string representing the character as 'x', or if the character is a control character or a surrogate, either a special representation such as '\n' or the Unicode code point of this character.
static StringBuilder appendLabelArrayString(StringBuilder stringBuilder, char[] characters): Appends a string representing an array of characters, each character represented as 'x', or if the character is a control character, the Unicode code point of this character.
static StringBuilder appendUnicodeString(StringBuilder stringBuilder, int c): Appends a string representing the Unicode code point of this character.
boolean contains(char character): Determines whether the given character is contained in these characters.
static Characters from(CharSequence charSequence): Character sequence factory method.
static String getLabel(int c): Returns a string representing the character as 'x', or if the character is a control character, the Unicode code point of this character.
static boolean isASCII(char c): Determines whether a character is in the ASCII character range (0x00-0x7F).
static boolean isCharInRange(char c, char[][] ranges): Sees if the specified character is in one of the specified ranges.
boolean isEmpty()
static final boolean isLatinDigit(char c): Deprecated.
static boolean isPunctuation(char c): Specifies whether or not a given character is a punctuation mark.
static boolean isRomanNumeral(char c): Determines whether a character is a Roman numeral.
static boolean isWhitespace(char c): Specifies whether or not a given character is whitespace.
static boolean isWordDelimiter(char c): Specifies whether or not a given character is a word delimiter, such as whitespace or punctuation.
static boolean isWordWrap(char c): Specifies whether or not a given character allows a word wrap.
static Characters of(char... characters): Characters factory method.
static Characters of(char[] characters, int start, int end): Characters factory method.
static Characters of(Characters... multipleCharacters): Characters factory method from existing Characters instances.
static Characters ofRange(char first, char last): Creates a range of characters.
static final Character parseCharacter(String string): Parses a string and returns its character value.
remove(char... characters): Creates a new object with these characters, with the given characters removed.
remove(Characters characters): Creates a new object with these characters, with the given characters removed.
remove(CharSequence charSequence): Creates a new object with these characters, with the given characters removed.
int size()
split(CharSequence charSequence): Splits a character sequence on these characters.
static byte[] toByteArray(char[] characters): Converts an array of characters to an array of bytes, using the UTF-8 charset.
static byte[] toByteArray(char[] characters, Charset charset): Converts an array of characters to an array of bytes, using the given character encoding.
toLabelArrayString(): Returns a string representing an array of these characters, each character represented as 'x', or if the character is a control character, the Unicode code point of this character, e.g. "U+1234".
static String toLabelArrayString(char... characters): Returns a string representing an array of these characters, each character represented as 'x', or if the character is a control character, the Unicode code point of this character.
static String toLabelArrayString(CharSequence characters): Returns a string representing an array of these characters, each character represented as 'x', or if the character is a control character, the Unicode code point of this character.
toString()
toStringBuilder(): A string builder containing these characters.
toStringBuilder(int extraCapacity): A string builder containing these characters, with an initial capacity with room for the specified number of extra characters.
-
Field Details
-
NO_CHARS
public static final char[] NO_CHARS
A shared instance of an empty array of characters.
-
EMPTY_ARRAY
Deprecated. To be removed in favor of NO_CHARS.
A shared instance of an empty array of characters.
-
NONE
The shared instance of no characters.
-
NULL_CHAR
public static final char NULL_CHAR
The character with Unicode code point zero.
-
BACKSPACE_CHAR
public static final char BACKSPACE_CHAR
A backspace.
-
CHARACTER_TABULATION_CHAR
public static final char CHARACTER_TABULATION_CHAR
A horizontal tab (0009;<control>;Cc;0;S;;;;;N;CHARACTER TABULATION;;;;).
-
LINE_FEED_CHAR
public static final char LINE_FEED_CHAR
A line feed (LF).
-
LINE_TABULATION_CHAR
public static final char LINE_TABULATION_CHAR
A vertical tab (000B;<control>;Cc;0;S;;;;;N;LINE TABULATION;;;;).
-
FORM_FEED_CHAR
public static final char FORM_FEED_CHAR
A form feed (FF).
-
CARRIAGE_RETURN_CHAR
public static final char CARRIAGE_RETURN_CHAR
A carriage return.
-
DATA_LINK_ESCAPE_CHAR
public static final char DATA_LINK_ESCAPE_CHAR
Data Link Escape control character (0010;<control>;Cc;0;BN;;;;;N;DATA LINK ESCAPE;;;;).
-
INFORMATION_SEPARATOR_FOUR_CHAR
public static final char INFORMATION_SEPARATOR_FOUR_CHAR
The information separator four character.
-
INFORMATION_SEPARATOR_THREE_CHAR
public static final char INFORMATION_SEPARATOR_THREE_CHAR
The information separator three character.
-
INFORMATION_SEPARATOR_TWO_CHAR
public static final char INFORMATION_SEPARATOR_TWO_CHAR
The information separator two character.
-
INFORMATION_SEPARATOR_ONE_CHAR
public static final char INFORMATION_SEPARATOR_ONE_CHAR
The information separator one character.
-
UNIT_SEPARATOR_CHAR
public static final char UNIT_SEPARATOR_CHAR
A unit separator character.
-
SPACE_CHAR
public static final char SPACE_CHAR
A space character.
-
QUOTATION_MARK_CHAR
public static final char QUOTATION_MARK_CHAR
A quotation mark character.
-
PERCENT_SIGN_CHAR
public static final char PERCENT_SIGN_CHAR
The percent sign.
-
APOSTROPHE_CHAR
public static final char APOSTROPHE_CHAR
An apostrophe character.
-
PLUS_SIGN_CHAR
public static final char PLUS_SIGN_CHAR
A plus sign character.
-
COMMA_CHAR
public static final char COMMA_CHAR
A comma character.
-
HYPHEN_MINUS_CHAR
public static final char HYPHEN_MINUS_CHAR
A hyphen or minus character.
-
SOLIDUS_CHAR
public static final char SOLIDUS_CHAR
A solidus or slash character (002F;SOLIDUS;Po;0;CS;;;;;N;SLASH;;;;).
-
COLON_CHAR
public static final char COLON_CHAR
A colon character.
-
SEMICOLON_CHAR
public static final char SEMICOLON_CHAR
A semicolon character.
-
LESS_THAN_CHAR
public static final char LESS_THAN_CHAR
A less-than sign character (003C;LESS-THAN SIGN;Sm;0;ON;;;;;Y;;;;;).
-
EQUALS_SIGN_CHAR
public static final char EQUALS_SIGN_CHAR
An equals sign character (003D;EQUALS SIGN;Sm;0;ON;;;;;N;;;;;).
-
GREATER_THAN_CHAR
public static final char GREATER_THAN_CHAR
A greater-than sign character (003E;GREATER-THAN SIGN;Sm;0;ON;;;;;Y;;;;;).
-
QUESTION_MARK_CHAR
public static final char QUESTION_MARK_CHAR
A question mark character (003F;QUESTION MARK;Po;0;ON;;;;;N;;;;;).
-
GRAVE_ACCENT_CHAR
public static final char GRAVE_ACCENT_CHAR
A grave accent character.
-
TILDE_CHAR
public static final char TILDE_CHAR
A tilde character (007E;TILDE;Sm;0;ON;;;;;N;;;;;).
-
NEXT_LINE_CHAR
public static final char NEXT_LINE_CHAR
A next line (NEL) control character.
-
START_OF_STRING_CHAR
public static final char START_OF_STRING_CHAR
A start of string control character.
-
STRING_TERMINATOR_CHAR
public static final char STRING_TERMINATOR_CHAR
A string terminator control character.
-
NO_BREAK_SPACE_CHAR
public static final char NO_BREAK_SPACE_CHAR
Unicode no-break space (NBSP).
-
COPYRIGHT_SIGN
public static final char COPYRIGHT_SIGN
The copyright symbol.
-
LEFT_POINTING_DOUBLE_ANGLE_QUOTATION_MARK_CHAR
public static final char LEFT_POINTING_DOUBLE_ANGLE_QUOTATION_MARK_CHAR
A left-pointing guillemet character.
-
PILCROW_SIGN_CHAR
public static final char PILCROW_SIGN_CHAR
The pilcrow or paragraph sign.
-
PARAGRAPH_SIGN_CHAR
public static final char PARAGRAPH_SIGN_CHAR
The paragraph sign.
-
MIDDLE_DOT_CHAR
public static final char MIDDLE_DOT_CHAR
A middle dot character.
-
RIGHT_POINTING_DOUBLE_ANGLE_QUOTATION_MARK_CHAR
public static final char RIGHT_POINTING_DOUBLE_ANGLE_QUOTATION_MARK_CHAR
A right-pointing guillemet character.
-
LATIN_CAPITAL_LIGATURE_OE_CHAR
public static final char LATIN_CAPITAL_LIGATURE_OE_CHAR
An uppercase oe ligature.
-
LATIN_SMALL_LIGATURE_OE_CHAR
public static final char LATIN_SMALL_LIGATURE_OE_CHAR
A lowercase oe ligature.
-
LATIN_CAPITAL_LETTER_Y_WITH_DIAERESIS_CHAR
public static final char LATIN_CAPITAL_LETTER_Y_WITH_DIAERESIS_CHAR
A Y umlaut.
-
ZERO_WIDTH_SPACE_CHAR
public static final char ZERO_WIDTH_SPACE_CHAR
A zero-width space (ZWSP) that may expand during justification.
-
ZERO_WIDTH_NON_JOINER_CHAR
public static final char ZERO_WIDTH_NON_JOINER_CHAR
A zero-width non-joiner (200C;ZERO WIDTH NON-JOINER;Cf;0;BN;;;;;N;;;;;).
-
ZERO_WIDTH_JOINER_CHAR
public static final char ZERO_WIDTH_JOINER_CHAR
A zero-width joiner (200D;ZERO WIDTH JOINER;Cf;0;BN;;;;;N;;;;;).
-
LEFT_TO_RIGHT_MARK_CHAR
public static final char LEFT_TO_RIGHT_MARK_CHAR
A left-to-right mark (200E;LEFT-TO-RIGHT MARK;Cf;0;L;;;;;N;;;;;).
-
RIGHT_TO_LEFT_MARK_CHAR
public static final char RIGHT_TO_LEFT_MARK_CHAR
A right-to-left mark (200F;RIGHT-TO-LEFT MARK;Cf;0;R;;;;;N;;;;;).
-
WORD_JOINER_CHAR
public static final char WORD_JOINER_CHAR
A zero-width non-breaking space, the word joiner (WJ).
-
LEFT_SINGLE_QUOTATION_MARK_CHAR
public static final char LEFT_SINGLE_QUOTATION_MARK_CHAR
A left single quote.
-
RIGHT_SINGLE_QUOTATION_MARK_CHAR
public static final char RIGHT_SINGLE_QUOTATION_MARK_CHAR
A right single quote.
-
SINGLE_LOW_9_QUOTATION_MARK_CHAR
public static final char SINGLE_LOW_9_QUOTATION_MARK_CHAR
A single low-9 quotation mark.
-
SINGLE_HIGH_REVERSED_9_QUOTATION_MARK_CHAR
public static final char SINGLE_HIGH_REVERSED_9_QUOTATION_MARK_CHAR
A single high-reversed-9 quotation mark.
-
LEFT_DOUBLE_QUOTATION_MARK_CHAR
public static final char LEFT_DOUBLE_QUOTATION_MARK_CHAR
A left double quote.
-
RIGHT_DOUBLE_QUOTATION_MARK_CHAR
public static final char RIGHT_DOUBLE_QUOTATION_MARK_CHAR
A right double quote.
-
DOUBLE_LOW_9_QUOTATION_MARK_CHAR
public static final char DOUBLE_LOW_9_QUOTATION_MARK_CHAR
A double low-9 quotation mark.
-
DOUBLE_HIGH_REVERSED_9_QUOTATION_MARK_CHAR
public static final char DOUBLE_HIGH_REVERSED_9_QUOTATION_MARK_CHAR
A double high-reversed-9 quotation mark.
-
EN_DASH_CHAR
public static final char EN_DASH_CHAR
Unicode en dash character.
-
EM_DASH_CHAR
public static final char EM_DASH_CHAR
Unicode em dash character.
-
BULLET_CHAR
public static final char BULLET_CHAR
Unicode bullet character.
-
HORIZONTAL_ELLIPSIS_CHAR
public static final char HORIZONTAL_ELLIPSIS_CHAR
Unicode horizontal ellipsis.
-
LINE_SEPARATOR_CHAR
public static final char LINE_SEPARATOR_CHAR
A line separator character (2028;LINE SEPARATOR;Zl;0;WS;;;;;N;;;;;).
-
PARAGRAPH_SEPARATOR_CHAR
public static final char PARAGRAPH_SEPARATOR_CHAR
A paragraph separator character (2029;PARAGRAPH SEPARATOR;Zp;0;B;;;;;N;;;;;).
-
SINGLE_LEFT_POINTING_ANGLE_QUOTATION_MARK_CHAR
public static final char SINGLE_LEFT_POINTING_ANGLE_QUOTATION_MARK_CHAR
A left-pointing single guillemet character.
-
SINGLE_RIGHT_POINTING_ANGLE_QUOTATION_MARK_CHAR
public static final char SINGLE_RIGHT_POINTING_ANGLE_QUOTATION_MARK_CHAR
A right-pointing single guillemet character.
-
TRADE_MARK_SIGN_CHAR
public static final char TRADE_MARK_SIGN_CHAR
Trademark character.
-
INFINITY_CHAR
public static final char INFINITY_CHAR
Infinity symbol (221E;INFINITY;Sm;0;ON;;;;;N;;;;;).
-
LEFT_POINTING_ANGLE_BRACKET
public static final char LEFT_POINTING_ANGLE_BRACKET
A left-pointing angle bracket character (2329;LEFT-POINTING ANGLE BRACKET;Ps;0;ON;3008;;;;Y;BRA;;;;).
-
RIGHT_POINTING_ANGLE_BRACKET
public static final char RIGHT_POINTING_ANGLE_BRACKET
A right-pointing angle bracket character (232A;RIGHT-POINTING ANGLE BRACKET;Pe;0;ON;3009;;;;Y;KET;;;;).
-
NULL_SYMBOL
public static final char NULL_SYMBOL
The symbol for NULL (2400;SYMBOL FOR NULL;So;0;ON;;;;;N;GRAPHIC FOR NULL;;;;).
-
LINE_FEED_SYMBOL
public static final char LINE_FEED_SYMBOL
The symbol for line feed (240A;SYMBOL FOR LINE FEED;So;0;ON;;;;;N;GRAPHIC FOR LINE FEED;;;;).
-
VERTICAL_TAB_SYMBOL
public static final char VERTICAL_TAB_SYMBOL
The symbol for vertical tab (240B;SYMBOL FOR VERTICAL TABULATION;So;0;ON;;;;;N;GRAPHIC FOR VERTICAL TABULATION;;;;).
-
FORM_FEED_SYMBOL
public static final char FORM_FEED_SYMBOL
The symbol for form feed (240C;SYMBOL FOR FORM FEED;So;0;ON;;;;;N;GRAPHIC FOR FORM FEED;;;;).
-
CARRIAGE_RETURN_SYMBOL
public static final char CARRIAGE_RETURN_SYMBOL
The symbol for carriage return (240D;SYMBOL FOR CARRIAGE RETURN;So;0;ON;;;;;N;GRAPHIC FOR CARRIAGE RETURN;;;;).
-
END_OF_TRANSMISSION_SYMBOL
public static final char END_OF_TRANSMISSION_SYMBOL
The symbol for end of transmission (2404;SYMBOL FOR END OF TRANSMISSION;So;0;ON;;;;;N;GRAPHIC FOR END OF TRANSMISSION;;;;).
-
SPACE_SYMBOL
public static final char SPACE_SYMBOL
The symbol for space (2420;SYMBOL FOR SPACE;So;0;ON;;;;;N;GRAPHIC FOR SPACE;;;;).
-
BLANK_SYMBOL
public static final char BLANK_SYMBOL
The blank symbol (2422;BLANK SYMBOL;So;0;ON;;;;;N;BLANK;;;;).
-
REVERSED_DOUBLE_PRIME_QUOTATION_MARK_CHAR
public static final char REVERSED_DOUBLE_PRIME_QUOTATION_MARK_CHAR
A reversed double prime quotation mark.
-
DOUBLE_PRIME_QUOTATION_MARK_CHAR
public static final char DOUBLE_PRIME_QUOTATION_MARK_CHAR
A double prime quotation mark.
-
LOW_DOUBLE_PRIME_QUOTATION_MARK_CHAR
public static final char LOW_DOUBLE_PRIME_QUOTATION_MARK_CHAR
A low double prime quotation mark.
-
FULLWIDTH_QUOTATION_MARK_CHAR
public static final char FULLWIDTH_QUOTATION_MARK_CHAR
A full width quotation mark.
-
ZERO_WIDTH_NO_BREAK_SPACE_CHAR
public static final char ZERO_WIDTH_NO_BREAK_SPACE_CHAR
A zero-width no-breaking space (ZWNBSP), the Byte Order Mark (BOM) (FEFF;ZERO WIDTH NO-BREAK SPACE;Cf;0;BN;;;;;N;BYTE ORDER MARK;;;;). For non-breaking purposes, deprecated in favor of WORD_JOINER_CHAR.
-
BOM_CHAR
public static final char BOM_CHAR
The Byte Order Mark (BOM).
-
OBJECT_REPLACEMENT_CHAR
public static final char OBJECT_REPLACEMENT_CHAR
A character for a placeholder in text for an otherwise unspecified object.
-
REPLACEMENT_CHAR
public static final char REPLACEMENT_CHAR
Represents a character that is unknown or unrepresentable in Unicode.
-
UNDEFINED_CHAR
public static final char UNDEFINED_CHAR
An invalid, undefined Unicode character which is "guaranteed not to be a Unicode character at all."
-
CONTROL_CHARS
Unicode control characters (0x0000-0x001F, 0x007F-0x009F).
-
PARAGRAPH_SEPARATOR_CHARS
Unicode paragraph separator characters.
-
SEGMENT_SEPARATOR_CHARS
Unicode segment separator characters.
-
NEWLINE_CHARACTERS
Unicode newline characters.
-
WHITESPACE_CHARACTERS
Unicode whitespace characters.
-
FORMAT_CHARS
Unicode formatting characters; Unicode characters marked with "Cf", such as WORD_JOINER.
-
TRIM_CHARACTERS
Characters that do not contain visible "content", and may be trimmed from ends of a string. These include whitespace, control characters, and formatting characters.
-
LIST_DELIMITER_CHARS
Characters that delimit a list separated by trim characters, commas, and/or semicolons.
-
LEFT_QUOTE_CHARS
Characters that could be considered the start of a quotation.
-
RIGHT_QUOTE_CHARS
Characters that could be considered the end of a quotation.
-
QUOTE_CHARS
Characters that start or end quotations.
-
PHRASE_PUNCTUATION_CHARACTERS
Characters used to punctuate phrases and sentences.
-
DEPENDENT_PUNCTUATION_CHARACTERS
Punctuation that expects a character to follow at some point.
-
LEFT_GROUP_PUNCTUATION_CHARACTERS
Left punctuation used to group characters.
-
RIGHT_GROUP_PUNCTUATION_CHARACTERS
Right punctuation used to group characters.
-
GROUP_PUNCTUATION_CHARACTERS
Punctuation used to group characters.
-
PUNCTUATION_CHARS
Characters used to punctuate phrases and sentences, as well as general punctuation such as quotes.
-
WORD_DELIMITER_CHARACTERS
Characters that separate words.
-
SPACE_SEPARATOR_CHARACTERS
Characters in the Unicode Space_Separator (Zs) category as of Unicode 9.0.0.
-
LINE_SEPARATOR_CHARACTERS
Characters in the Unicode Line_Separator (Zl) category as of Unicode 9.0.0.
-
PARAGRAPH_SEPARATOR_CHARACTERS
Characters in the Unicode Paragraph_Separator (Zp) category as of Unicode 9.0.0.
-
EOL_CHARACTERS
Characters considered to be end-of-line markers (e.g. CR and LF).
-
SEPARATOR_CHARACTERS
Characters in the Unicode Separator (Z) group as of Unicode 9.0.0.
-
WORD_WRAP_CHARS
Characters that allow words to wrap.
-
-
Method Details
-
of
Characters factory method. Duplicates are ignored.
- Parameters:
characters - The characters to store.
- Returns:
An instance of Characters with the given characters stored.
- Throws:
NullPointerException - if the given characters is null.
IllegalArgumentException - if the given characters contain Unicode surrogate characters.
-
of
Characters factory method. Duplicates are ignored.
- Parameters:
characters - The characters to store.
start - The start index, inclusive.
end - The end index, exclusive.
- Returns:
An instance of Characters with the given characters stored.
- Throws:
NullPointerException - if the given characters is null.
IllegalArgumentException - if the given characters contain Unicode surrogate characters.
-
ofRange
Creates a range of characters.
- Parameters:
first - The first of the range, inclusive.
last - The last of the range, inclusive.
- Returns:
Characters representing the indicated range.
- Throws:
IllegalArgumentException - if the last character comes before the first character.
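The inclusive range contract above can be sketched as follows. The class CharRangeSketch and its range method are hypothetical names, and the sketch returns a plain char array rather than a Characters instance.

```java
/** Hypothetical helper for illustration; not part of the documented API. */
final class CharRangeSketch {

    /** Returns every character from first to last, both ends inclusive. */
    static char[] range(char first, char last) {
        if (last < first) {
            throw new IllegalArgumentException("The last character must not come before the first.");
        }
        char[] result = new char[last - first + 1]; // +1 because both ends are included
        for (int i = 0; i < result.length; i++) {
            result[i] = (char) (first + i);
        }
        return result;
    }
}
```

Calling `CharRangeSketch.range('a', 'e')` yields the five characters a through e, and a range whose ends are equal yields a single character.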
-
of
Characters factory method from existing Characters instances. Duplicates are ignored.
- Parameters:
multipleCharacters - The Characters instances containing characters to store.
- Returns:
An instance of Characters with the given characters stored.
- Throws:
NullPointerException - if the given characters is null.
IllegalArgumentException - if the given characters contain Unicode surrogate characters.
-
from
Character sequence factory method. Duplicates are ignored.
- Parameters:
charSequence - The character sequence containing characters to store.
- Returns:
An instance of Characters with the characters contained in the given char sequence.
- Throws:
NullPointerException - if the given character sequence is null.
IllegalArgumentException - if the given character sequence contains Unicode surrogate characters.
-
isEmpty
public boolean isEmpty()
- Returns:
true if this object contains no characters.
-
size
public int size()
- Returns:
The number of characters.
-
add
Creates a new object with these characters and the given characters. Duplicates are ignored.
- Parameters:
characters - The characters to add.
- Returns:
A new object containing these characters and the given characters.
- Throws:
NullPointerException - if the given characters is null.
IllegalArgumentException - if the given characters contain Unicode surrogate characters.
-
add
Creates a new object with these characters and the given characters. Duplicates are ignored.
- Parameters:
characters - The characters to add.
- Returns:
A new object containing these characters and the given characters.
- Throws:
NullPointerException - if the given characters is null.
IllegalArgumentException - if the given characters contain Unicode surrogate characters.
-
add
Creates a new object with these characters and the given characters. Duplicates are ignored.
- Parameters:
charSequence - The characters to add.
- Returns:
A new object containing these characters and the given characters.
- Throws:
NullPointerException - if the given character sequence is null.
IllegalArgumentException - if the given character sequence contains Unicode surrogate characters.
-
addRange
Adds a range of characters.
- Parameters:
first - The first of the range, inclusive.
last - The last of the range, inclusive.
- Returns:
A new object containing these characters and the given range of characters.
- Throws:
IllegalArgumentException - if the last character comes before the first character.
-
remove
Creates a new object with these characters, with the given characters removed.
- Parameters:
characters - The characters to remove.
- Returns:
A new object containing these characters without the given characters.
- Throws:
NullPointerException - if the given characters is null.
-
remove
Creates a new object with these characters, with the given characters removed.
- Parameters:
characters - The characters to remove.
- Returns:
A new object containing these characters without the given characters.
- Throws:
NullPointerException - if the given characters is null.
-
remove
Creates a new object with these characters, with the given characters removed.
- Parameters:
charSequence - The characters to remove.
- Returns:
A new object containing these characters without the given characters.
- Throws:
NullPointerException - if the given character sequence is null.
-
split
Splits a character sequence on these characters. Runs of matching characters are removed and the interspersed tokens are returned.
- API Note:
This method produces the same result without regard to whether one or more character sequences begin and/or end with the delimiter.
- Implementation Specification:
The current implementation does not support surrogate characters.
- Implementation Note:
This method is likely more efficient than a regular expression-based approach, especially in situations in which splitting is likely to occur at a small frequency, because the setup cost is low and individual character testing is efficient.
- Parameters:
charSequence - The character sequence to split.
- Returns:
A list of subsequences; the list may not be mutable.
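A minimal sketch of the splitting behavior described above: runs of delimiters collapse, and leading or trailing delimiters produce no empty tokens. The class SplitSketch is hypothetical and takes its delimiters as a String, which is an assumption made to keep the example self-contained; it does not reproduce the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; delimiters are passed as a String here. */
final class SplitSketch {

    /** Splits the input on any of the delimiter characters, skipping runs of delimiters. */
    static List<CharSequence> split(CharSequence input, String delimiters) {
        List<CharSequence> tokens = new ArrayList<>();
        int tokenStart = -1; // -1 while scanning delimiters, otherwise the start of the current token
        for (int i = 0; i < input.length(); i++) {
            boolean isDelimiter = delimiters.indexOf(input.charAt(i)) >= 0;
            if (isDelimiter) {
                if (tokenStart >= 0) { // a delimiter closes the token in progress
                    tokens.add(input.subSequence(tokenStart, i));
                    tokenStart = -1;
                }
            } else if (tokenStart < 0) {
                tokenStart = i; // first non-delimiter after a run of delimiters
            }
        }
        if (tokenStart >= 0) { // the input ended in the middle of a token
            tokens.add(input.subSequence(tokenStart, input.length()));
        }
        return tokens;
    }
}
```

With this sketch, `SplitSketch.split(",,a,b;;c,", ",;")` produces the three tokens a, b and c, the same result as for the input `"a,b;c"`.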
-
toString
-
toLabelArrayString
Returns a string representing an array of these characters, each character represented as 'x', or if the character is a control character, the Unicode code point of this character, e.g. "U+1234". Example: "['a', 0x0020]"- Implementation Specification:
- This method does not treat surrogate characters specially.
- Returns:
- A string containing an array representation of these characters.
-
toStringBuilder
A string builder containing these characters. This implementation provides an initial capacity for 16 more characters.- Returns:
- A string builder containing these characters.
- See Also:
-
toStringBuilder
A string builder containing these characters, with an initial capacity that leaves room for the specified number of extra characters.- Parameters:
extraCapacity
- The extra initial capacity.- Returns:
- A string builder containing these characters.
- Throws:
IllegalArgumentException
- if the given capacity is negative.- See Also:
-
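As a rough sketch of the capacity behavior described by the two toStringBuilder methods, the following uses only java.lang.StringBuilder. The class StringBuilderCapacitySketch is hypothetical, and passing the characters as a char[] parameter is an assumption; the documented methods are instance methods that take their characters from the object itself.

```java
/** Hypothetical stand-in for illustration; this is not the library's implementation. */
final class StringBuilderCapacitySketch {

	/** Builds a string builder sized for the characters plus the requested extra room. */
	static StringBuilder toStringBuilder(char[] characters, int extraCapacity) {
		if (extraCapacity < 0) {
			throw new IllegalArgumentException("Extra capacity cannot be negative: " + extraCapacity);
		}
		// Allocate once up front so that later appends within the extra room do not resize.
		return new StringBuilder(characters.length + extraCapacity).append(characters);
	}

	/** Convenience variant leaving room for 16 more characters, as the description states. */
	static StringBuilder toStringBuilder(char[] characters) {
		return toStringBuilder(characters, 16);
	}
}
```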
contains
public boolean contains(char character) Determines whether the given character is contained in these characters.- Parameters:
character
- The character to check.- Returns:
true
if the character exists in these characters.
-
isCharInRange
public static boolean isCharInRange(char c, char[][] ranges) Sees if the specified character is in one of the specified ranges.- Parameters:
c
- The character to check.ranges
- An array of character pair arrays, in order, the first of each pair specifying the bottom inclusive character of a range, the second specifying the top inclusive character of the range.- Returns:
true
if the character is in one of the ranges, else false.
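The range layout expected by isCharInRange can be illustrated with a small self-contained sketch. The class CharRangeSketch is hypothetical and reimplements the check as a simple linear scan; whether the library's implementation takes advantage of the ranges being in order is not stated in this documentation.

```java
/** Hypothetical stand-in for illustration; this is not the library's implementation. */
final class CharRangeSketch {

	/** Each element of ranges is a {bottom, top} pair, with both ends inclusive. */
	static boolean isCharInRange(char c, char[][] ranges) {
		for (char[] range : ranges) {
			if (c >= range[0] && c <= range[1]) {
				return true;
			}
		}
		return false;
	}

	public static void main(String[] args) {
		char[][] hexDigits = {{'0', '9'}, {'A', 'F'}, {'a', 'f'}};
		System.out.println(isCharInRange('c', hexDigits)); // prints true
		System.out.println(isCharInRange('g', hexDigits)); // prints false
	}
}
```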
-
isASCII
public static boolean isASCII(char c) Determines whether a character is in the ASCII character range (0x00-0x7F).- Parameters:
c
- The character to examine.- Returns:
true
if the character is an ASCII character.
-
isLatinDigit
Deprecated. Determines whether a character is one of the digits '0'-'9'.- Parameters:
c
- The character to examine.- Returns:
true
if the character is an ISO_LATIN_1 digit.
-
isPunctuation
public static boolean isPunctuation(char c) Specifies whether or not a given character is a punctuation mark.- Parameters:
c
- Character to analyze.- Returns:
true
if the character is punctuation.
-
isRomanNumeral
public static boolean isRomanNumeral(char c) Determines whether a character is a Roman numeral.- Parameters:
c
- The character to examine.- Returns:
true
if the character is a Roman numeral.
-
isWhitespace
public static boolean isWhitespace(char c) Specifies whether or not a given character is whitespace.- Parameters:
c
- Character to analyze.- Returns:
true
if the character is whitespace.
-
isWordDelimiter
public static boolean isWordDelimiter(char c) Specifies whether or not a given character is a word delimiter, such as whitespace or punctuation.- Parameters:
c
- Character to analyze.- Returns:
true
if the character is a word delimiter.
-
isWordWrap
public static boolean isWordWrap(char c) Specifies whether or not a given character allows a word wrap.- Parameters:
c
- Character to analyze.- Returns:
true
if the character allows word wrapping.
-
getLabel
Returns a string representing the character as 'x', or if the character is a control character, the Unicode code point of this character, e.g. "U+1234".- Implementation Specification:
- This method supports Unicode supplementary code points.
- Parameters:
c
- The code point whose string representation to return.- Returns:
- The string label representing the character.
- See Also:
-
toLabelArrayString
Returns a string representing an array of these characters, each character represented as 'x', or if the character is a control character, the Unicode code point of this character, e.g. "U+1234". Example: "['a', 0x0020]"- Implementation Specification:
- This method does not treat surrogate characters specially.
- Parameters:
characters
- The characters to return as a string of an array.- Returns:
- A string containing an array representation of these characters.
-
toLabelArrayString
Returns a string representing an array of these characters, each character represented as 'x', or if the character is a control character, the Unicode code point of this character, e.g. "U+1234". Example: "['a', 0x0020]"- Implementation Specification:
- This method does not treat surrogate characters specially.
- Parameters:
characters
- The characters to return as a string of an array.- Returns:
- A string containing an array representation of these characters.
-
toByteArray
public static byte[] toByteArray(char[] characters) Converts an array of characters to an array of bytes, using the UTF-8 charset.- Parameters:
characters
- The characters to convert to bytes.- Returns:
- An array of bytes representing the given characters in the UTF-8 charset.
-
toByteArray
Converts an array of characters to an array of bytes, using the given character encoding.- Parameters:
characters
- The characters to convert to bytes.charset
- The charset to use when converting characters to bytes.- Returns:
- An array of bytes representing the given characters in the specified encoding.
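One way to perform the conversion described by the two toByteArray methods is with java.nio charsets, as sketched below. The class CharsToBytesSketch is hypothetical, and the library may well encode differently internally; only the JDK calls shown here are certain to exist.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for illustration; this is not the library's implementation. */
final class CharsToBytesSketch {

	/** Encodes the characters with the given charset. */
	static byte[] toByteArray(char[] characters, Charset charset) {
		ByteBuffer encoded = charset.encode(CharBuffer.wrap(characters));
		byte[] bytes = new byte[encoded.remaining()]; // copy only the bytes actually written
		encoded.get(bytes);
		return bytes;
	}

	/** Encodes the characters as UTF-8. */
	static byte[] toByteArray(char[] characters) {
		return toByteArray(characters, StandardCharsets.UTF_8);
	}
}
```

Copying out of the buffer by remaining() matters because the backing array of the encoded buffer can be larger than the encoded content.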
-
appendLabelArrayString
Appends a string representing an array of characters, each character represented as 'x', or if the character is a control character, the Unicode code point of this character, e.g. "U+1234". Example: "['a', 0x0020]"- Implementation Specification:
- This method does not treat surrogate characters specially.
- Parameters:
stringBuilder
- The string builder to which the string will be appended.characters
- The characters whose string representations to append.- Returns:
- The string builder.
- Throws:
NullPointerException
- if the given string builder is null.
-
appendLabel
Appends a string representing the character as 'x', or if the character is a control character or a surrogate, either a special representation such as '\n' or the Unicode code point of this character, e.g. "U+1234".- Implementation Specification:
- This method supports Unicode supplementary code points.
- Parameters:
stringBuilder
- The string builder to which the string will be appended.c
- The code point whose string representation to append.- Returns:
- The string builder.
- Throws:
NullPointerException
- if the given string builder is null.- See Also:
-
appendUnicodeString
Appends a string representing the Unicode code point of this character, e.g. "U+1234". The length of the added string depends on the Unicode code point; most code points will result in four hex characters.- Parameters:
stringBuilder
- The string builder to which the string will be appended.c
- The code point whose Unicode string to append.- Returns:
- The string builder.
- Throws:
NullPointerException
- if the given string builder is null.
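The "U+1234" notation described for appendUnicodeString can be approximated as follows. This sketch assumes the code point is passed as an int, that hex digits are uppercase, and that values are zero-padded to at least four digits; none of these details is confirmed by this documentation, and the class UnicodeLabelSketch is hypothetical.

```java
import java.util.Objects;

/** Hypothetical stand-in for illustration; this is not the library's implementation. */
final class UnicodeLabelSketch {

	/** Appends "U+" followed by the code point in hex, padded to at least four digits. */
	static StringBuilder appendUnicodeString(StringBuilder stringBuilder, int codePoint) {
		Objects.requireNonNull(stringBuilder, "stringBuilder");
		return stringBuilder.append("U+").append(String.format("%04X", codePoint));
	}

	public static void main(String[] args) {
		System.out.println(appendUnicodeString(new StringBuilder(), 'A')); // prints U+0041
		System.out.println(appendUnicodeString(new StringBuilder(), 0x1F600)); // prints U+1F600
	}
}
```

Supplementary code points naturally produce five or six hex digits, which is why the length of the appended string depends on the code point.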
-
parseCharacter
Parses a string and returns its character value.- Parameters:
string
- A string expected to contain a single character.- Returns:
- The single character contained by the string.
- Throws:
NullPointerException
- if the given string is null.
IllegalArgumentException
- if the string is not composed of a single character.
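The contract of parseCharacter (a null check, then a single-character check) might look like the sketch below. The class ParseCharacterSketch is hypothetical, the char return type is an assumption, and "a single character" is interpreted here as a string of length one; the library's actual signature and interpretation are not shown in this documentation.

```java
import java.util.Objects;

/** Hypothetical stand-in for illustration; this is not the library's implementation. */
final class ParseCharacterSketch {

	/** Returns the only character of the string, rejecting null and any other length. */
	static char parseCharacter(String string) {
		Objects.requireNonNull(string, "string");
		if (string.length() != 1) {
			throw new IllegalArgumentException("Expected exactly one character but found " + string.length());
		}
		return string.charAt(0);
	}
}
```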
-
NO_CHARS
.