Class Utf8Safe
There are several variants of UTF-8. The one implemented by this class is the restricted definition of UTF-8 introduced in Unicode 3.1. That definition requires rejecting "overlong" byte sequences, and also rejecting 3-byte sequences that encode surrogate code points. Note that the UTF-8 decoder included in Oracle's JDK has been modified to also reject "overlong" byte sequences, but (as of 2011) still accepts 3-byte surrogate code point sequences.
The byte sequences considered valid by this class are exactly those that can be round-trip converted to Strings and back to bytes using the UTF-8 charset without loss:
Arrays.equals(bytes, new String(bytes, Internal.UTF_8).getBytes(Internal.UTF_8))
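The round-trip check above can be run as a self-contained sketch. It assumes `Internal.UTF_8` names a UTF-8 `Charset` and substitutes the JDK's `StandardCharsets.UTF_8` for it. `new String(byte[], Charset)` replaces malformed input with U+FFFD, so an invalid sequence such as the overlong encoding `C0 80` does not survive the round trip. The class name `RoundTripCheck` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RoundTripCheck {
    // True if the bytes decode to a String and re-encode to the identical bytes.
    static boolean roundTrips(byte[] bytes) {
        // new String(...) substitutes U+FFFD for malformed input, so invalid
        // sequences come back as different bytes after re-encoding.
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return Arrays.equals(bytes, decoded.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] valid = {(byte) 0xC3, (byte) 0xA9};    // U+00E9 (e with acute accent)
        byte[] overlong = {(byte) 0xC0, (byte) 0x80}; // overlong encoding of U+0000
        System.out.println(roundTrips(valid));    // prints true
        System.out.println(roundTrips(overlong)); // prints false
    }
}
```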
See the Unicode Standard, Table 3-6 (UTF-8 Bit Distribution) and Table 3-7 (Well-Formed UTF-8 Byte Sequences).
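The ranges in Table 3-7 translate directly into a byte-by-byte validator. What follows is a minimal sketch based on that table, not this class's actual implementation. `WellFormedUtf8` and `isWellFormed` are hypothetical names. The narrowed second-byte ranges after lead bytes `E0`, `ED`, `F0` and `F4` reject, in turn, overlong forms, surrogate code points, overlong forms again, and values above U+10FFFF.

```java
final class WellFormedUtf8 {
    // True if the whole array is well-formed UTF-8 per Unicode Table 3-7.
    static boolean isWellFormed(byte[] bytes) {
        int i = 0;
        int n = bytes.length;
        while (i < n) {
            int b0 = bytes[i] & 0xFF;
            if (b0 <= 0x7F) { // single-byte ASCII
                i++;
                continue;
            }
            int len;
            int lo = 0x80, hi = 0xBF; // allowed range for the second byte
            if (b0 >= 0xC2 && b0 <= 0xDF) {
                len = 2;
            } else if (b0 == 0xE0) {
                len = 3; lo = 0xA0;        // A0..BF: rejects overlong 3-byte forms
            } else if (b0 == 0xED) {
                len = 3; hi = 0x9F;        // 80..9F: rejects surrogates D800..DFFF
            } else if (b0 >= 0xE1 && b0 <= 0xEF) {
                len = 3;
            } else if (b0 == 0xF0) {
                len = 4; lo = 0x90;        // 90..BF: rejects overlong 4-byte forms
            } else if (b0 >= 0xF1 && b0 <= 0xF3) {
                len = 4;
            } else if (b0 == 0xF4) {
                len = 4; hi = 0x8F;        // 80..8F: rejects code points > U+10FFFF
            } else {
                return false;              // 80..C1 and F5..FF never start a sequence
            }
            if (i + len > n) {
                return false;              // truncated sequence
            }
            int b1 = bytes[i + 1] & 0xFF;
            if (b1 < lo || b1 > hi) {
                return false;
            }
            for (int k = 2; k < len; k++) { // remaining bytes must be 80..BF
                int bk = bytes[i + k] & 0xFF;
                if (bk < 0x80 || bk > 0xBF) {
                    return false;
                }
            }
            i += len;
        }
        return true;
    }
}
```

Checking the narrowed ranges only on the second byte is sufficient here, because Table 3-7 restricts the later bytes to the plain continuation range `80..BF` in every row.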
Constructor Summary
Constructors
Utf8Safe()
Method Summary
Modifier and Type | Method | Description
String | decodeUtf8(ByteBuffer buffer, int offset, int length) | Decodes the given UTF-8 portion of the ByteBuffer into a String.
static String | decodeUtf8Array(byte[] bytes, int index, int size) |
static String | decodeUtf8Buffer(ByteBuffer buffer, int offset, int length) |
int | encodedLength | Returns the number of bytes in the UTF-8-encoded form of sequence.
void | encodeUtf8(CharSequence in, ByteBuffer out) | Encodes the given characters to the target ByteBuffer using UTF-8 encoding.
Methods inherited from class io.objectbox.flatbuffers.Utf8: encodeUtf8CodePoint, getDefault, setDefault
Constructor Details
Utf8Safe
public Utf8Safe()
Method Details
decodeUtf8Array
public static String decodeUtf8Array(byte[] bytes, int index, int size)
decodeUtf8Buffer
public static String decodeUtf8Buffer(ByteBuffer buffer, int offset, int length)
encodedLength
Description copied from class: Utf8
Returns the number of bytes in the UTF-8-encoded form of sequence. For a string, this method is equivalent to string.getBytes(UTF_8).length, but is more efficient in both time and space.
Specified by:
encodedLength in class Utf8
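To illustrate, here is a hypothetical `Utf8Length.encodedLength` that computes the count one UTF-16 unit at a time without allocating a byte array. It assumes a `CharSequence` input and that unpaired surrogates are rejected with `IllegalArgumentException`. How `Utf8Safe` itself treats them is not stated on this page. `String.getBytes` replaces an unpaired surrogate with `'?'`, so the equivalence with `getBytes(UTF_8).length` only holds for well-formed input.

```java
final class Utf8Length {
    // Hypothetical helper: number of bytes needed to encode `sequence` as UTF-8.
    static int encodedLength(CharSequence sequence) {
        long total = 0; // long accumulator guards against int overflow
        int n = sequence.length();
        for (int i = 0; i < n; i++) {
            char c = sequence.charAt(i);
            if (c < 0x80) {
                total += 1;                  // U+0000..U+007F
            } else if (c < 0x800) {
                total += 2;                  // U+0080..U+07FF
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < n && Character.isLowSurrogate(sequence.charAt(i + 1))) {
                total += 4;                  // supplementary code point (surrogate pair)
                i++;                         // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                throw new IllegalArgumentException("Unpaired surrogate at index " + i);
            } else {
                total += 3;                  // remaining BMP code points
            }
        }
        return Math.toIntExact(total);       // ArithmeticException if > Integer.MAX_VALUE
    }
}
```

For `"a\u00e9\u20ac\uD83D\uDE00"` this counts 1 + 2 + 3 + 4 bytes, matching `getBytes(StandardCharsets.UTF_8).length`.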
decodeUtf8
Decodes the given UTF-8 portion of the ByteBuffer into a String.
Specified by:
decodeUtf8 in class Utf8
Throws:
IllegalArgumentException - if the input is not valid UTF-8.
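Here is a sketch of the documented contract: decode a slice of a `ByteBuffer` and throw `IllegalArgumentException` on invalid input. It is built on the JDK's `CharsetDecoder` with malformed input set to `REPORT`. `StrictDecode` is a hypothetical name, and this is not how `Utf8Safe` is implemented. The sketch also inherits the JDK decoder's own definition of well-formed UTF-8, which, per the class description, has not always matched Unicode 3.1 for surrogate sequences.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictDecode {
    // Decodes buffer[offset, offset + length) as UTF-8, rejecting invalid input.
    static String decodeUtf8(ByteBuffer buffer, int offset, int length) {
        // Work on a duplicate so the caller's position and limit stay unchanged.
        ByteBuffer view = buffer.duplicate();
        view.limit(offset + length);
        view.position(offset);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(view).toString();
        } catch (CharacterCodingException e) {
            // Map the checked exception onto the documented unchecked one.
            throw new IllegalArgumentException("Invalid UTF-8", e);
        }
    }
}
```

Because the decoder reports errors instead of substituting U+FFFD, a malformed slice surfaces as an exception rather than as silently altered text.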
encodeUtf8
Encodes the given characters to the target ByteBuffer using UTF-8 encoding. Selects an optimal algorithm based on the type of ByteBuffer (heap or direct) and the capabilities of the platform.
Specified by:
encodeUtf8 in class Utf8
Parameters:
in - the source string to be encoded
out - the target buffer to receive the encoded string.
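The bit distribution of Table 3-6 can be sketched as a simple encoder that writes to the buffer with relative `put` calls. This hypothetical `Utf8Encode.encodeUtf8` ignores the heap/direct specialization described above. It throws `IllegalArgumentException` on unpaired surrogates, which is an assumption here. It lets `ByteBuffer` throw `BufferOverflowException` when the buffer is too small. In either failure case, bytes already written stay in the buffer.

```java
import java.nio.ByteBuffer;

final class Utf8Encode {
    // Writes the UTF-8 form of `in` at out's position and advances the position.
    static void encodeUtf8(CharSequence in, ByteBuffer out) {
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            int cp;
            if (Character.isHighSurrogate(c) && i + 1 < in.length()
                    && Character.isLowSurrogate(in.charAt(i + 1))) {
                cp = Character.toCodePoint(c, in.charAt(i + 1)); // combine the pair
                i += 2;
            } else if (Character.isSurrogate(c)) {
                throw new IllegalArgumentException("Unpaired surrogate at index " + i);
            } else {
                cp = c;
                i += 1;
            }
            if (cp < 0x80) {                       // 0xxxxxxx
                out.put((byte) cp);
            } else if (cp < 0x800) {               // 110xxxxx 10xxxxxx
                out.put((byte) (0xC0 | (cp >>> 6)));
                out.put((byte) (0x80 | (cp & 0x3F)));
            } else if (cp < 0x10000) {             // 1110xxxx 10xxxxxx 10xxxxxx
                out.put((byte) (0xE0 | (cp >>> 12)));
                out.put((byte) (0x80 | ((cp >>> 6) & 0x3F)));
                out.put((byte) (0x80 | (cp & 0x3F)));
            } else {                               // 11110xxx + three continuation bytes
                out.put((byte) (0xF0 | (cp >>> 18)));
                out.put((byte) (0x80 | ((cp >>> 12) & 0x3F)));
                out.put((byte) (0x80 | ((cp >>> 6) & 0x3F)));
                out.put((byte) (0x80 | (cp & 0x3F)));
            }
        }
    }
}
```

Sizing `out` from the encoded length before calling the encoder avoids the overflow case.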