Class Utf8Safe

java.lang.Object
  io.objectbox.flatbuffers.Utf8
    io.objectbox.flatbuffers.Utf8Safe

public final class Utf8Safe extends Utf8
A set of low-level, high-performance utility methods related to the UTF-8 character encoding. This class has no dependencies outside of the core JDK libraries.

There are several variants of UTF-8. This class implements the restricted definition of UTF-8 introduced in Unicode 3.1, which requires rejecting "overlong" byte sequences as well as 3-byte sequences that encode surrogate code points (U+D800 to U+DFFF). Note that the UTF-8 decoder included in Oracle's JDK has been modified to also reject overlong byte sequences, but (as of 2011) still accepts 3-byte surrogate code point sequences.
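As a rough illustration of what the restricted definition rejects, the following sketch checks a byte range against the well-formed sequences of the Unicode Standard (Table 3-7). It is a hypothetical JDK-only validator (the name StrictUtf8Validator is made up here), not the code Utf8Safe uses internally. The narrowed second-byte ranges after E0, ED, F0 and F4 are what exclude overlong forms, surrogate code points, and values above U+10FFFF.

```java
/**
 * Hypothetical strict UTF-8 validator (not the ObjectBox implementation).
 * Accepts exactly the well-formed byte sequences of Unicode Table 3-7.
 */
final class StrictUtf8Validator {
    private StrictUtf8Validator() {}

    /** Returns true if bytes[from, to) is well-formed UTF-8. */
    static boolean isWellFormed(byte[] bytes, int from, int to) {
        int i = from;
        while (i < to) {
            int b0 = bytes[i] & 0xFF;
            if (b0 < 0x80) {                 // 1 byte: U+0000..U+007F
                i++;
                continue;
            }
            int need;                        // continuation bytes after b0
            int lo = 0x80, hi = 0xBF;        // allowed range for the second byte
            if (b0 >= 0xC2 && b0 <= 0xDF) {
                need = 1;                    // C0 and C1 would only start overlong forms
            } else if (b0 == 0xE0) {
                need = 2; lo = 0xA0;         // E0 80..9F would be overlong
            } else if (b0 == 0xED) {
                need = 2; hi = 0x9F;         // ED A0..BF would encode surrogates
            } else if (b0 >= 0xE1 && b0 <= 0xEF) {
                need = 2;
            } else if (b0 == 0xF0) {
                need = 3; lo = 0x90;         // F0 80..8F would be overlong
            } else if (b0 >= 0xF1 && b0 <= 0xF3) {
                need = 3;
            } else if (b0 == 0xF4) {
                need = 3; hi = 0x8F;         // F4 90..BF would exceed U+10FFFF
            } else {
                return false;                // 80..C1 and F5..FF never start a sequence
            }
            if (to - i <= need) {
                return false;                // truncated sequence
            }
            int b1 = bytes[i + 1] & 0xFF;
            if (b1 < lo || b1 > hi) {
                return false;
            }
            for (int k = 2; k <= need; k++) {
                int b = bytes[i + k] & 0xFF;
                if (b < 0x80 || b > 0xBF) {
                    return false;            // every later byte must be 10xxxxxx
                }
            }
            i += need + 1;
        }
        return true;
    }

    static boolean isWellFormed(byte[] bytes) {
        return isWellFormed(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        byte[] overlongSlash = {(byte) 0xC0, (byte) 0xAF};
        byte[] surrogate = {(byte) 0xED, (byte) 0xA0, (byte) 0x80};
        System.out.println(isWellFormed(overlongSlash)); // false
        System.out.println(isWellFormed(surrogate));     // false
    }
}
```

A table-driven check like this accepts the same inputs as the round-trip criterion described below, but it never allocates a String.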

The byte sequences considered valid by this class are exactly those that survive a lossless round trip through the UTF-8 charset, that is, decoding them to a String and encoding that String back to bytes yields the original bytes:

 
 Arrays.equals(bytes, new String(bytes, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8))
 

See the Unicode Standard, Table 3-6 (UTF-8 Bit Distribution) and Table 3-7 (Well-Formed UTF-8 Byte Sequences).
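Table 3-6 describes how the bits of a code point are distributed over one to four bytes. The sketch below applies that distribution to a single Unicode scalar value. CodePointEncoder is a hypothetical helper for illustration only; rejecting surrogates and values above U+10FFFF mirrors the restricted definition described above, not any documented method of Utf8Safe.

```java
/** Hypothetical helper: encodes one scalar value per Unicode Table 3-6. */
final class CodePointEncoder {
    private CodePointEncoder() {}

    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("not a Unicode scalar value: " + cp);
        }
        if (cp < 0x80) {                        // 0xxxxxxx
            return new byte[] {(byte) cp};
        } else if (cp < 0x800) {                // 110yyyyy 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >>> 6)),
                (byte) (0x80 | (cp & 0x3F))};
        } else if (cp < 0x10000) {              // 1110zzzz 10yyyyyy 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >>> 12)),
                (byte) (0x80 | ((cp >>> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        } else {                                // 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
            return new byte[] {
                (byte) (0xF0 | (cp >>> 18)),
                (byte) (0x80 | ((cp >>> 12) & 0x3F)),
                (byte) (0x80 | ((cp >>> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
    }

    public static void main(String[] args) {
        for (byte b : encode(0x20AC)) {        // EURO SIGN
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();                   // prints E2 82 AC
    }
}
```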

  • Constructor Details

    • Utf8Safe

      public Utf8Safe()
  • Method Details

    • decodeUtf8Array

      public static String decodeUtf8Array(byte[] bytes, int index, int size)
    • decodeUtf8Buffer

      public static String decodeUtf8Buffer(ByteBuffer buffer, int offset, int length)
    • encodedLength

      public int encodedLength(CharSequence in)
      Description copied from class: Utf8
      Returns the number of bytes in the UTF-8-encoded form of the given sequence. For a String, this method is equivalent to string.getBytes(UTF_8).length, but is more efficient in both time and space.
      Specified by:
      encodedLength in class Utf8
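The following sketch shows one way to count UTF-8 bytes for a CharSequence in a single pass without allocating a byte array, which is where savings over getBytes(UTF_8).length can come from. Utf8Length is a hypothetical name, and throwing on an unpaired surrogate is an assumption of this sketch; String.getBytes would instead substitute '?' for it.

```java
/** Hypothetical sketch: UTF-8 byte count of a CharSequence, no allocation. */
final class Utf8Length {
    private Utf8Length() {}

    static int encodedLength(CharSequence s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;                     // ASCII
            } else if (c < 0x800) {
                bytes += 2;
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4;                     // a surrogate pair is one supplementary code point
                i++;                            // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                // Assumption of this sketch: reject unpaired surrogates outright.
                throw new IllegalArgumentException("unpaired surrogate at index " + i);
            } else {
                bytes += 3;                     // rest of the Basic Multilingual Plane
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("h\u00e9llo")); // 6
    }
}
```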
    • decodeUtf8

      public String decodeUtf8(ByteBuffer buffer, int offset, int length) throws IllegalArgumentException
      Decodes the given UTF-8 portion of the ByteBuffer into a String.
      Specified by:
      decodeUtf8 in class Utf8
      Throws:
      IllegalArgumentException - if the input is not valid UTF-8.
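A JDK-only approximation of this contract decodes a slice of a ByteBuffer and turns malformed input into an IllegalArgumentException. StrictDecode is a hypothetical helper built on java.nio.charset.CharsetDecoder, not Utf8Safe's own decoder, so its well-formedness rules are those of the running JDK's UTF-8 charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: strict UTF-8 decoding of buffer[offset, offset + length). */
final class StrictDecode {
    private StrictDecode() {}

    static String decodeUtf8(ByteBuffer buffer, int offset, int length) {
        // duplicate() shares the bytes but has its own position and limit,
        // so the caller's buffer state is left unchanged.
        ByteBuffer slice = buffer.duplicate();
        slice.clear();
        slice.limit(offset + length);
        slice.position(offset);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(slice).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Invalid UTF-8", e);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap("xxh\u00e9llo".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeUtf8(buf, 2, 6)); // héllo
    }
}
```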
    • encodeUtf8

      public void encodeUtf8(CharSequence in, ByteBuffer out)
      Encodes the given characters to the target ByteBuffer using UTF-8 encoding.

      Selects an optimal algorithm based on the type of ByteBuffer (i.e. heap or direct) and the capabilities of the platform.

      Specified by:
      encodeUtf8 in class Utf8
      Parameters:
      in - the source string to be encoded
      out - the target buffer to receive the encoded string.
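For comparison, the JDK's CharsetEncoder can write UTF-8 into either a heap or a direct ByteBuffer through one code path. The sketch below is a hypothetical stand-in (BufferEncode is not part of ObjectBox) that writes starting at out's current position and advances it. Throwing BufferOverflowException when out is too small and IllegalArgumentException for unpaired surrogates are choices made for this sketch, not documented behavior of Utf8Safe.encodeUtf8.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: encode a CharSequence as UTF-8 into any ByteBuffer. */
final class BufferEncode {
    private BufferEncode() {}

    static void encodeUtf8(CharSequence in, ByteBuffer out) {
        // A fresh encoder reports malformed input (e.g. unpaired surrogates) by default.
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CoderResult result = encoder.encode(CharBuffer.wrap(in), out, true);
        if (result.isUnderflow()) {
            result = encoder.flush(out);        // completes the encode/flush protocol
        }
        if (result.isOverflow()) {
            // Bytes written before the overflow remain in out.
            throw new BufferOverflowException();
        }
        if (result.isError()) {
            throw new IllegalArgumentException("Cannot encode as UTF-8: " + result);
        }
    }

    public static void main(String[] args) {
        ByteBuffer out = ByteBuffer.allocateDirect(16);
        encodeUtf8("h\u20ac", out);
        System.out.println(out.position());     // 4
    }
}
```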