Package com.globalmentor.io
Class UTF8
- java.lang.Object
  - com.globalmentor.io.UTF8
public class UTF8 extends java.lang.Object
Constants and methods for working with the UTF-8 encoding.

Author:
- Garret Wilson

See Also:
- RFC 3629: UTF-8, a transformation format of ISO 10646
- Wikipedia: UTF-8
- Sun: What Is UTF-8 And Why Is It Important?
- UTF-8 Test
Field Summary
- static int MAX_ENCODED_BYTE_COUNT_LENGTH: The maximum number of octets used to encode a character in UTF-8.
- static int MAX_ENCODED_BYTE_COUNT1: The largest code point value that can be encoded in one byte.
- static int MAX_ENCODED_BYTE_COUNT2: The largest code point value that can be encoded in two bytes.
- static int MAX_ENCODED_BYTE_COUNT3: The largest code point value that can be encoded in three bytes.
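The one-, two-, and three-byte limits follow from how many payload bits each UTF-8 form carries. The sketch below assumes the constants take the RFC 3629 values (0x7F, 0x7FF, 0xFFFF, and a maximum sequence length of 4); the library's actual values are on its Constant Field Values page, which is not reproduced here. The class `Utf8Limits` and its helper `maxForBits` are hypothetical.

```java
/**
 * Hypothetical mirror of the UTF8 size constants. The values are assumed from
 * the RFC 3629 encoding table, not copied from the library.
 */
final class Utf8Limits {
  // 0xxxxxxx: 7 payload bits
  static final int MAX_ONE_BYTE = 0x7F;
  // 110xxxxx 10xxxxxx: 5 + 6 = 11 payload bits
  static final int MAX_TWO_BYTES = 0x7FF;
  // 1110xxxx 10xxxxxx 10xxxxxx: 4 + 6 + 6 = 16 payload bits
  static final int MAX_THREE_BYTES = 0xFFFF;
  // 11110xxx plus three continuation octets: 3 + 6 + 6 + 6 = 21 payload bits,
  // enough for U+10FFFF, the RFC 3629 upper limit
  static final int MAX_BYTE_LENGTH = 4;

  private Utf8Limits() {}

  /** Returns the largest value that fits in the given number of payload bits. */
  static int maxForBits(int bits) {
    return (1 << bits) - 1;
  }
}
```

Each limit is simply 2^bits - 1 for the payload width of its form, which is why the boundaries fall at 0x7F, 0x7FF, and 0xFFFF.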
Constructor Summary
- UTF8()
Method Summary
- static int getEncodedByteCountForCodePoint(int c): Determines how many bytes are needed to encode a single character in UTF-8.
- static int getEncodedByteCountFromInitialByte(byte initialByte): Determines how many bytes are used to encode a sequence based on its first encoded byte.
- static int getEncodedByteCountFromInitialOctet(int initialOctet): Determines how many bytes are used to encode a sequence based on its first encoded octet.
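As an illustration of how sequence-length lookups are typically used, a reader can step through a UTF-8 buffer one sequence at a time by examining only each initial octet. The sketch below does this with its own helper rather than the library: the class `Utf8CodePointCounter` is hypothetical, and it derives the length from the count of leading 1 bits via `Integer.numberOfLeadingZeros`. It assumes well-formed input and does not check continuation octets or truncated sequences.

```java
/**
 * Hypothetical sketch: counts the code points in a well-formed UTF-8 buffer by
 * reading each initial octet and skipping the whole sequence it announces.
 */
final class Utf8CodePointCounter {
  private Utf8CodePointCounter() {}

  /** Returns the sequence length implied by an unsigned initial octet (0..255). */
  static int sequenceLength(int octet) {
    // Move the octet into the top 8 bits and invert it; the number of leading
    // zeros of the result equals the number of leading 1 bits of the octet.
    int leadingOnes = Integer.numberOfLeadingZeros(~(octet << 24));
    if (leadingOnes == 0) {
      return 1; // 0xxxxxxx
    }
    if (leadingOnes >= 2 && leadingOnes <= 4) {
      return leadingOnes; // 110xxxxx, 1110xxxx, 11110xxx
    }
    // 10xxxxxx is a continuation octet; 11111xxx is not used by UTF-8
    throw new IllegalArgumentException("Not a valid initial UTF-8 octet: " + octet);
  }

  /** Counts code points by jumping from one initial octet to the next. */
  static int countCodePoints(byte[] utf8) {
    int count = 0;
    int i = 0;
    while (i < utf8.length) {
      i += sequenceLength(utf8[i] & 0xFF); // mask, because Java bytes are signed
      count++;
    }
    return count;
  }
}
```

For "A", "é", "€", and an emoji encoded together (1 + 2 + 3 + 4 = 10 bytes), the loop visits only four initial octets and reports four code points.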
Field Detail
MAX_ENCODED_BYTE_COUNT1
public static final int MAX_ENCODED_BYTE_COUNT1
The largest code point value that can be encoded in one byte.
See Also:
- Constant Field Values
MAX_ENCODED_BYTE_COUNT2
public static final int MAX_ENCODED_BYTE_COUNT2
The largest code point value that can be encoded in two bytes.
See Also:
- Constant Field Values
MAX_ENCODED_BYTE_COUNT3
public static final int MAX_ENCODED_BYTE_COUNT3
The largest code point value that can be encoded in three bytes.
See Also:
- Constant Field Values
MAX_ENCODED_BYTE_COUNT_LENGTH
public static final int MAX_ENCODED_BYTE_COUNT_LENGTH
The maximum number of octets used to encode a character in UTF-8.
See Also:
- Constant Field Values
Method Detail
getEncodedByteCountForCodePoint
public static int getEncodedByteCountForCodePoint(int c)
Determines how many bytes are needed to encode a single character in UTF-8.
Parameters:
- c: The code point of the character to encode.
Returns:
- The minimum number of bytes needed to encode the character.
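A minimal sketch of the range comparison such a method could perform, assuming the RFC 3629 boundaries. The class `Utf8CodePointLength` and both methods are hypothetical, and rejecting values outside 0 to 0x10FFFF is an assumption, not documented behavior of the library. The `jdkByteCount` helper cross-checks the result against the JDK's own UTF-8 encoder.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a code-point-to-byte-count computation (RFC 3629 ranges). */
final class Utf8CodePointLength {
  private Utf8CodePointLength() {}

  static int encodedByteCount(int codePoint) {
    // Assumption: out-of-range values are rejected rather than clamped
    if (codePoint < 0 || codePoint > Character.MAX_CODE_POINT) {
      throw new IllegalArgumentException("Not a Unicode code point: " + codePoint);
    }
    if (codePoint <= 0x7F) {
      return 1; // U+0000..U+007F
    }
    if (codePoint <= 0x7FF) {
      return 2; // U+0080..U+07FF
    }
    if (codePoint <= 0xFFFF) {
      return 3; // U+0800..U+FFFF (surrogates U+D800..U+DFFF are also counted as 3 here)
    }
    return 4; // U+10000..U+10FFFF
  }

  /** Encodes the code point with the JDK and returns the resulting byte count. */
  static int jdkByteCount(int codePoint) {
    return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
  }
}
```

The cross-check only agrees for non-surrogate code points: the JDK replaces a lone surrogate with a one-byte `?`, whereas this sketch reports the three bytes its bit pattern would occupy.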
getEncodedByteCountFromInitialByte
public static int getEncodedByteCountFromInitialByte(byte initialByte)
Determines how many bytes are used to encode a sequence based on its first encoded byte.
Parameters:
- initialByte: The value of the first byte in a UTF-8 sequence (which in Java may be a negative number, as bytes are signed).
Returns:
- The number of octets to expect in the sequence beginning with the given byte.
Throws:
- java.lang.IllegalArgumentException: If the given value is not a valid initial octet of UTF-8.
See Also:
- getEncodedByteCountFromInitialOctet(int)
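The note about negative values matters because Java widens a `byte` to `int` with sign extension: the first byte of "é" (0xC3) arrives as -61, so a range test written for octets fails unless the byte is first masked with `& 0xFF`. A byte-based method needs some such conversion before it can reuse an octet-based check. The class `SignedByteDemo` and its methods are hypothetical and only illustrate the conversion.

```java
/** Hypothetical illustration of why a signed Java byte must be masked before octet checks. */
final class SignedByteDemo {
  private SignedByteDemo() {}

  /** Converts a byte (-128..127) to its unsigned octet value (0..255). */
  static int toOctet(byte b) {
    return b & 0xFF;
  }

  /** Returns true if the octet has the form 110xxxxx, i.e. it starts a two-byte sequence. */
  static boolean startsTwoByteSequence(int octet) {
    return octet >= 0xC0 && octet <= 0xDF;
  }
}
```

Passing the raw byte to `startsTwoByteSequence` compares -61 against 0xC0 and returns false; passing `toOctet(b)` compares 195 and returns true.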
getEncodedByteCountFromInitialOctet
public static int getEncodedByteCountFromInitialOctet(int initialOctet)
Determines how many bytes are used to encode a sequence based on its first encoded octet.
Parameters:
- initialOctet: The value of the first octet in a UTF-8 sequence.
Returns:
- The number of octets to expect in the sequence beginning with the given octet.
Throws:
- java.lang.IllegalArgumentException: If the given value is not a valid initial octet of UTF-8.
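An initial octet announces its sequence length through its high-order bits (0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx), while continuation octets always have the form 10xxxxxx. The sketch below classifies an octet with bit masks; the class `Utf8InitialOctet` is hypothetical, and which values the library actually rejects is not stated here. A stricter RFC 3629 check would also reject 0xC0, 0xC1, and 0xF5 to 0xF7, which this sketch accepts by bit pattern alone.

```java
/** Hypothetical sketch: sequence length from the high-order bits of an initial octet. */
final class Utf8InitialOctet {
  private Utf8InitialOctet() {}

  static int sequenceLength(int initialOctet) {
    // Assumption: callers pass an unsigned octet, so anything outside 0..255 is an error
    if (initialOctet < 0 || initialOctet > 0xFF) {
      throw new IllegalArgumentException("Not an octet: " + initialOctet);
    }
    if ((initialOctet & 0x80) == 0x00) {
      return 1; // 0xxxxxxx
    }
    if ((initialOctet & 0xE0) == 0xC0) {
      return 2; // 110xxxxx
    }
    if ((initialOctet & 0xF0) == 0xE0) {
      return 3; // 1110xxxx
    }
    if ((initialOctet & 0xF8) == 0xF0) {
      return 4; // 11110xxx
    }
    // 10xxxxxx is a continuation octet; 11111xxx never starts a UTF-8 sequence
    throw new IllegalArgumentException("Not a valid initial UTF-8 octet: " + initialOctet);
  }
}
```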