Class UTF8

java.lang.Object
com.globalmentor.io.UTF8

public class UTF8 extends Object
Constants and methods for working with the UTF-8 encoding.
Author:
Garret Wilson
  • Field Details

    • MAX_ENCODED_BYTE_COUNT1

      public static final int MAX_ENCODED_BYTE_COUNT1
      The largest code point value that can be encoded in one byte.
    • MAX_ENCODED_BYTE_COUNT2

      public static final int MAX_ENCODED_BYTE_COUNT2
      The largest code point value that can be encoded in two bytes.
    • MAX_ENCODED_BYTE_COUNT3

      public static final int MAX_ENCODED_BYTE_COUNT3
      The largest code point value that can be encoded in three bytes.
    • MAX_ENCODED_BYTE_COUNT_LENGTH

      public static final int MAX_ENCODED_BYTE_COUNT_LENGTH
      The maximum number of octets used to encode a character in UTF-8.
  • Constructor Details

    • UTF8

      public UTF8()
  • Method Details

    • getEncodedByteCountForCodePoint

      public static int getEncodedByteCountForCodePoint(int c)
      Determines how many bytes are needed to encode a single code point in UTF-8.
      Parameters:
      c - The code point to encode.
      Returns:
      The minimum number of bytes needed to encode the given code point.
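      The following is a minimal sketch of how such a byte count can be derived from the standard UTF-8 range boundaries (U+007F, U+07FF, U+FFFF). It is not the library's source: the class name Utf8CodePointSketch, the method name encodedByteCount, and the decision to reject values outside U+0000..U+10FFFF are assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration; not part of com.globalmentor.io. */
final class Utf8CodePointSketch {

    // Standard UTF-8 range boundaries (RFC 3629).
    static final int MAX_ONE_BYTE = 0x7F;
    static final int MAX_TWO_BYTES = 0x7FF;
    static final int MAX_THREE_BYTES = 0xFFFF;
    static final int MAX_CODE_POINT = 0x10FFFF;

    /** Returns the number of bytes UTF-8 needs for the given code point. */
    static int encodedByteCount(int codePoint) {
        // Assumption: out-of-range input is rejected. The documented method
        // does not state what it does in this case.
        if (codePoint < 0 || codePoint > MAX_CODE_POINT) {
            throw new IllegalArgumentException("Not a Unicode code point: " + codePoint);
        }
        if (codePoint <= MAX_ONE_BYTE) {
            return 1;
        }
        if (codePoint <= MAX_TWO_BYTES) {
            return 2;
        }
        if (codePoint <= MAX_THREE_BYTES) {
            return 3;
        }
        return 4;
    }

    public static void main(String[] args) {
        // Compare the sketch against the JDK's own UTF-8 encoder.
        int[] samples = {0x41, 0xE9, 0x20AC, 0x1F600};
        for (int cp : samples) {
            String s = new String(Character.toChars(cp));
            int jdkCount = s.getBytes(StandardCharsets.UTF_8).length;
            System.out.printf("U+%04X: sketch=%d, JDK=%d%n", cp, encodedByteCount(cp), jdkCount);
        }
    }
}
```

      Surrogate code points (U+D800..U+DFFF) fall in the three-byte range of this sketch, although they are not valid on their own in well-formed UTF-8; callers that need strict validation would have to check for them separately.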
    • getEncodedByteCountFromInitialByte

      public static int getEncodedByteCountFromInitialByte(byte initialByte)
      Determines how many bytes are used to encode a sequence, based on its first encoded byte.
      Parameters:
      initialByte - The value of the first byte (which in Java may be a negative number, as bytes are signed) in a UTF-8 sequence.
      Returns:
      The number of octets to expect in the sequence beginning with the given byte.
      Throws:
      IllegalArgumentException - if the given value is not a valid initial octet of UTF-8.
    • getEncodedByteCountFromInitialOctet

      public static int getEncodedByteCountFromInitialOctet(int initialOctet)
      Determines how many bytes are used to encode a sequence, based on its first encoded octet.
      Parameters:
      initialOctet - The value of the first octet in a UTF-8 sequence.
      Returns:
      The number of octets to expect in the sequence beginning with the given octet.
      Throws:
      IllegalArgumentException - if the given value is not a valid initial octet of UTF-8.
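
      As an illustration of the two lookups above, the sketch below reads the sequence length from the leading bit pattern of the first octet and shows how a signed Java byte can be widened to an unsigned octet with a 0xFF mask. It is not the library's source: the class name Utf8InitialOctetSketch and both method names are hypothetical, and the sketch assumes the four-byte limit of modern UTF-8 (RFC 3629).

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration; not part of com.globalmentor.io. */
final class Utf8InitialOctetSketch {

    /** Returns the sequence length implied by an unsigned initial octet (0..255). */
    static int sequenceLengthFromInitialOctet(int octet) {
        if (octet < 0 || octet > 0xFF) {
            throw new IllegalArgumentException("Not an octet: " + octet);
        }
        if ((octet & 0x80) == 0x00) {
            return 1; // 0xxxxxxx
        }
        if ((octet & 0xE0) == 0xC0) {
            return 2; // 110xxxxx
        }
        if ((octet & 0xF0) == 0xE0) {
            return 3; // 1110xxxx
        }
        if ((octet & 0xF8) == 0xF0) {
            return 4; // 11110xxx
        }
        // 10xxxxxx is a continuation octet; 11111xxx never appears in UTF-8.
        throw new IllegalArgumentException(
                "Not a valid initial octet: 0x" + Integer.toHexString(octet));
    }

    /** Same lookup for a signed Java byte, widened without sign extension. */
    static int sequenceLengthFromInitialByte(byte initialByte) {
        return sequenceLengthFromInitialOctet(initialByte & 0xFF);
    }

    public static void main(String[] args) {
        // The euro sign U+20AC encodes as E2 82 AC; the first byte is -30 in Java.
        byte[] encoded = "\u20AC".getBytes(StandardCharsets.UTF_8);
        System.out.println("first byte = " + encoded[0]
                + ", sequence length = " + sequenceLengthFromInitialByte(encoded[0]));
    }
}
```

      This pattern check is deliberately loose: it accepts 0xC0, 0xC1, and 0xF5 through 0xF7, which match a lead-octet bit pattern but cannot begin a well-formed UTF-8 sequence. A stricter validator would reject those values as well.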