Package com.globalmentor.io
Enum ByteOrderMark
- java.lang.Object
-
- java.lang.Enum<ByteOrderMark>
-
- com.globalmentor.io.ByteOrderMark
-
- All Implemented Interfaces:
- java.io.Serializable, java.lang.Comparable<ByteOrderMark>
public enum ByteOrderMark extends java.lang.Enum<ByteOrderMark>
The Byte Order Mark (BOM) designations for different character encodings. This implementation only supports UTF-8, UTF-16, and UTF-32 BOM variants.
- Author:
- Garret Wilson
- See Also:
- Unicode Byte Order Mark (BOM) FAQ
-
-
Enum Constant Summary
Enum Constant: Description
UTF_16BE: UTF-16, big-endian BOM
UTF_16LE: UTF-16, little-endian BOM
UTF_32BE: UTF-32, big-endian (1234 order) BOM
UTF_32BE_MIXED: UTF-32, big-endian, with word swapped byte order (2143 order) BOM
UTF_32LE: UTF-32, little-endian (4321 order) BOM
UTF_32LE_MIXED: UTF-32, little-endian, with word swapped byte order (3412 order) BOM
UTF_8: UTF-8 BOM
-
Field Summary
Modifier and Type, Field: Description
static int MAX_BYTE_COUNT: The maximum number of bytes used by any of the byte order marks in this implementation.
-
Method Summary
Modifier and Type, Method: Description
ByteOrderMark checkUsualIO(): Checks to ensure this byte order mark is a usual one (i.e. one for which there exists a valid charset).
static java.util.Optional<ByteOrderMark> detect(byte[] bytes): Returns the byte order mark with which the given bytes start.
static ByteOrderMark forCharset(java.nio.charset.Charset charset): Determines the byte order mark (BOM) needed to represent the given charset.
java.nio.ByteOrder getByteOrder()
byte[] getBytes()
int getLeastSignificantByteIndex(): The index of the least significant byte within a group of encoded bytes for this byte order.
int getLength()
int getMinimumBytesPerCharacter()
static java.util.Optional<ByteOrderMark> impute(byte[] bytes, java.lang.CharSequence expectedChars, ObjectHolder<ByteOrderMark> actualBOM): Determines an imputed Byte Order Mark by detecting a BOM in the actual bytes or, if a true BOM is not present, by comparing the bytes to expected characters.
boolean isMixed()
java.nio.charset.Charset toCharset(): Returns a charset corresponding to this byte order mark.
static ByteOrderMark valueOf(java.lang.String name): Returns the enum constant of this type with the specified name.
static ByteOrderMark[] values(): Returns an array containing the constants of this enum type, in the order they are declared.
-
-
-
Enum Constant Detail
-
UTF_8
public static final ByteOrderMark UTF_8
UTF-8 BOM
-
UTF_16BE
public static final ByteOrderMark UTF_16BE
UTF-16, big-endian BOM
-
UTF_16LE
public static final ByteOrderMark UTF_16LE
UTF-16, little-endian BOM
-
UTF_32BE
public static final ByteOrderMark UTF_32BE
UTF-32, big-endian (1234 order) BOM
-
UTF_32LE
public static final ByteOrderMark UTF_32LE
UTF-32, little-endian (4321 order) BOM
-
UTF_32BE_MIXED
public static final ByteOrderMark UTF_32BE_MIXED
UTF-32, big-endian, with word swapped byte order (2143 order) BOM
-
UTF_32LE_MIXED
public static final ByteOrderMark UTF_32LE_MIXED
UTF-32, little-endian, with word swapped byte order (3412 order) BOM
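For orientation, the byte sequences behind these constants can be sketched as below. Each signature is U+FEFF serialized in the given byte order, with the UTF-32 orders labeled as in XML 1.0 Appendix F. The class and member names (`BomSignatures`, `SIGNATURES`, `toHex`) are hypothetical and not part of this library; `getBytes()` remains the authoritative source for the actual values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Standalone illustration only; not part of com.globalmentor.io. */
final class BomSignatures {

    /** Constant name mapped to the BOM bytes (U+FEFF in that byte order). */
    static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();

    static {
        SIGNATURES.put("UTF_8", bytes(0xEF, 0xBB, 0xBF));
        SIGNATURES.put("UTF_16BE", bytes(0xFE, 0xFF));
        SIGNATURES.put("UTF_16LE", bytes(0xFF, 0xFE));
        SIGNATURES.put("UTF_32BE", bytes(0x00, 0x00, 0xFE, 0xFF)); // 1234 order
        SIGNATURES.put("UTF_32LE", bytes(0xFF, 0xFE, 0x00, 0x00)); // 4321 order
        SIGNATURES.put("UTF_32BE_MIXED", bytes(0x00, 0x00, 0xFF, 0xFE)); // 2143 order
        SIGNATURES.put("UTF_32LE_MIXED", bytes(0xFE, 0xFF, 0x00, 0x00)); // 3412 order
    }

    /** Converts unsigned int values (0..255) to a byte array. */
    static byte[] bytes(int... values) {
        byte[] result = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = (byte) values[i];
        }
        return result;
    }

    /** Formats bytes as space-separated, two-digit uppercase hex. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }
}
```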
-
-
Field Detail
-
MAX_BYTE_COUNT
public static final int MAX_BYTE_COUNT
The maximum number of bytes used by any of the byte order marks in this implementation.
- See Also:
- Constant Field Values
-
-
Method Detail
-
values
public static ByteOrderMark[] values()
Returns an array containing the constants of this enum type, in the order they are declared. This method may be used to iterate over the constants as follows:
for (ByteOrderMark c : ByteOrderMark.values())
    System.out.println(c);
- Returns:
- an array containing the constants of this enum type, in the order they are declared
-
valueOf
public static ByteOrderMark valueOf(java.lang.String name)
Returns the enum constant of this type with the specified name. The string must match exactly an identifier used to declare an enum constant in this type. (Extraneous whitespace characters are not permitted.)
- Parameters:
- name - the name of the enum constant to be returned.
- Returns:
- the enum constant with the specified name
- Throws:
- java.lang.IllegalArgumentException - if this enum type has no constant with the specified name
- java.lang.NullPointerException - if the argument is null
-
getBytes
public byte[] getBytes()
- Returns:
- The bytes of this byte order mark.
-
getLength
public int getLength()
- Returns:
- The number of bytes in this byte order mark.
-
isMixed
public boolean isMixed()
- Returns:
- true if the byte order mark is one of the "middle-endian" or "mixed-endian" orders for which no charset exists.
- See Also:
- UTF_32BE_MIXED, UTF_32LE_MIXED
-
checkUsualIO
public ByteOrderMark checkUsualIO() throws java.io.IOException
Checks to ensure this byte order mark is a usual one (i.e. one for which there exists a valid charset).
- Returns:
- This byte order mark.
- Throws:
- java.io.IOException - if this is a mixed byte order mark.
- See Also:
- isMixed()
-
getMinimumBytesPerCharacter
public int getMinimumBytesPerCharacter()
- Returns:
- The minimum number of bytes used for each character in the charset represented by this byte order mark.
-
getByteOrder
public java.nio.ByteOrder getByteOrder()
- Returns:
- The byte order of this byte order mark, or null if there is one byte per character or the byte order is mixed.
-
getLeastSignificantByteIndex
public int getLeastSignificantByteIndex()
The index of the least significant byte within a group of encoded bytes for this byte order. For example, this method would return 0 and 1 for UTF-16LE and UTF-16BE, respectively. The index will be less than getMinimumBytesPerCharacter().
- Returns:
- The index of the least significant byte within an encoded group for this identified byte order.
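The relationship between byte order and this index can be sketched for the non-mixed cases as follows. This is a minimal illustration under the assumption that a little-endian group keeps its low byte first and a big-endian group keeps it last; the class and method names are hypothetical, and the mixed orders are deliberately left out because this page does not state what the library returns for them.

```java
import java.nio.ByteOrder;

/** Standalone illustration only; not part of com.globalmentor.io. */
final class LeastSignificantByteSketch {

    /**
     * @param bytesPerCharacter size of one encoded group (1, 2, or 4)
     * @param order byte order of the group; ignored for single-byte groups
     * @return index of the least significant byte within the group
     */
    static int leastSignificantByteIndex(int bytesPerCharacter, ByteOrder order) {
        if (bytesPerCharacter == 1 || order == ByteOrder.LITTLE_ENDIAN) {
            return 0; // single byte, or little-endian: low byte comes first
        }
        return bytesPerCharacter - 1; // big-endian: low byte comes last
    }
}
```

In every case the result stays below the group size, in line with the constraint stated above.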
-
detect
public static java.util.Optional<ByteOrderMark> detect(@Nonnull byte[] bytes)
Returns the byte order mark with which the given bytes start. If no valid byte order mark is present, Optional.empty() is returned.
- Parameters:
- bytes - The array of bytes potentially starting with a byte order mark.
- Returns:
- The byte order mark detected.
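A prefix-matching detection of this kind might look like the sketch below. It is not the library's implementation: the class name is hypothetical, constants are represented by their names as strings, and the signatures are the standard U+FEFF byte sequences. The four-byte signatures are checked first because FF FE 00 00 (UTF-32LE) begins with FF FE (UTF-16LE); how the library itself resolves that overlap is not stated on this page.

```java
import java.util.Optional;

/** Standalone illustration only; not part of com.globalmentor.io. */
final class BomDetectSketch {

    // Longest signatures first so a UTF-32 BOM is not mistaken for a UTF-16 one.
    private static final String[] NAMES = {
        "UTF_32BE", "UTF_32LE", "UTF_32BE_MIXED", "UTF_32LE_MIXED",
        "UTF_8", "UTF_16BE", "UTF_16LE"};

    private static final int[][] SIGNATURES = {
        {0x00, 0x00, 0xFE, 0xFF}, {0xFF, 0xFE, 0x00, 0x00},
        {0x00, 0x00, 0xFF, 0xFE}, {0xFE, 0xFF, 0x00, 0x00},
        {0xEF, 0xBB, 0xBF}, {0xFE, 0xFF}, {0xFF, 0xFE}};

    /** Returns the name of the BOM the bytes start with, or empty if none matches. */
    static Optional<String> detect(byte[] bytes) {
        for (int i = 0; i < SIGNATURES.length; i++) {
            if (startsWith(bytes, SIGNATURES[i])) {
                return Optional.of(NAMES[i]);
            }
        }
        return Optional.empty();
    }

    /** Compares the leading bytes, treating each byte as unsigned. */
    private static boolean startsWith(byte[] bytes, int[] signature) {
        if (bytes.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((bytes[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }
}
```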
-
impute
public static java.util.Optional<ByteOrderMark> impute(byte[] bytes, java.lang.CharSequence expectedChars, ObjectHolder<ByteOrderMark> actualBOM)
Determines an imputed Byte Order Mark by detecting a BOM in the actual bytes or, if a true BOM is not present, by comparing the bytes to expected characters. Regardless of the number of expected characters given, only those characters necessary for detecting the byte order will be used.
- Parameters:
- bytes - The array of bytes representing the possible Byte Order Mark and possible expected characters.
- expectedChars - The characters expected, regardless of the encoding method used. At least four characters should be included.
- actualBOM - Receives the actual byte order mark present, if any.
- Returns:
- The actual Byte Order Mark encountered; or, if no Byte Order Mark was present, the Byte Order Mark representing the character encoding assumed by comparing bytes to the expected characters; or Optional.empty() if neither approach could determine a character encoding.
- See Also:
- XML 1.0 (Fifth Edition): F.1 Detection Without External Encoding Information
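The fallback step, comparing leading bytes to expected characters when no BOM is present, can be sketched along the lines of XML 1.0 Appendix F.1. The sketch below is an assumption-laden illustration rather than the library's algorithm: it covers only the comparison step (not BOM detection or the actualBOM holder), handles only non-NUL ASCII expected characters, inspects a fixed four-byte window, and uses hypothetical names throughout.

```java
import java.util.Optional;

/** Standalone illustration only; not part of com.globalmentor.io. */
final class BomImputeSketch {

    private static final int PROBE_BYTES = 4; // window inspected at the start of the input

    // Candidate layouts: constant name, bytes per character, index of the low byte.
    private static final String[] NAMES = {
        "UTF_32BE", "UTF_32LE", "UTF_32BE_MIXED", "UTF_32LE_MIXED",
        "UTF_16BE", "UTF_16LE", "UTF_8"};
    private static final int[] WIDTHS = {4, 4, 4, 4, 2, 2, 1};
    private static final int[] LSB_INDEXES = {3, 0, 2, 1, 1, 0, 0};

    /** Returns the layout under which the bytes spell the expected characters, if any. */
    static Optional<String> impute(byte[] bytes, CharSequence expectedChars) {
        for (int i = 0; i < NAMES.length; i++) {
            if (matches(bytes, expectedChars, WIDTHS[i], LSB_INDEXES[i])) {
                return Optional.of(NAMES[i]);
            }
        }
        return Optional.empty();
    }

    private static boolean matches(byte[] bytes, CharSequence expected, int width, int lsbIndex) {
        int charCount = PROBE_BYTES / width; // characters that fit in the probe window
        if (bytes.length < PROBE_BYTES || expected.length() < charCount) {
            return false;
        }
        for (int c = 0; c < charCount; c++) {
            char ch = expected.charAt(c);
            if (ch == 0 || ch > 0x7F) {
                return false; // this sketch handles non-NUL ASCII only
            }
            for (int b = 0; b < width; b++) {
                int want = (b == lsbIndex) ? ch : 0; // ASCII value in the low byte, zeros elsewhere
                if ((bytes[c * width + b] & 0xFF) != want) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

With expected characters such as "<?xml", a four-byte window is enough to tell the layouts apart: a UTF-32 layout leaves one non-zero byte, a UTF-16 layout two, and UTF-8 four.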
-
forCharset
public static ByteOrderMark forCharset(java.nio.charset.Charset charset)
Determines the byte order mark (BOM) needed to represent the given charset. For charsets that do not specify endianness, big-endian is assumed as per the Charset documentation.
- Parameters:
- charset - The charset for which a byte order mark should be returned.
- Returns:
- The byte order mark for the given character set, or null if there is no byte order mark to represent the given character set.
- Throws:
- java.lang.NullPointerException - if the given charset is null.
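The mapping described here, including the big-endian default for charsets without a stated endianness, can be sketched as a lookup on the canonical charset name. This is an illustration under assumptions, not the library's code: the class is hypothetical, results are given as constant names rather than enum values, and the set of recognized charset names is a guess at what such a mapping would cover.

```java
import java.nio.charset.Charset;
import java.util.Objects;

/** Standalone illustration only; not part of com.globalmentor.io. */
final class BomForCharsetSketch {

    /** Returns the name of the matching BOM constant, or null if the charset has no BOM. */
    static String forCharset(Charset charset) {
        Objects.requireNonNull(charset, "charset"); // mirrors the documented NullPointerException
        switch (charset.name()) {
            case "UTF-8":
                return "UTF_8";
            case "UTF-16": // endianness unspecified: big-endian assumed
            case "UTF-16BE":
                return "UTF_16BE";
            case "UTF-16LE":
                return "UTF_16LE";
            case "UTF-32": // endianness unspecified: big-endian assumed
            case "UTF-32BE":
                return "UTF_32BE";
            case "UTF-32LE":
                return "UTF_32LE";
            default:
                return null; // e.g. ISO-8859-1 has no byte order mark
        }
    }
}
```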
-
toCharset
public java.nio.charset.Charset toCharset()
Returns a charset corresponding to this byte order mark. The byte order marks UTF_32BE_MIXED and UTF_32LE_MIXED are defined in XML but have no corresponding charset, so calling this method for them (i.e. for byte order marks for which isMixed() returns true) will result in an UnsupportedOperationException being thrown.
- Returns:
- a charset corresponding to this byte order mark.
- Throws:
- java.lang.UnsupportedOperationException - if this byte order mark has no corresponding charset.
- See Also:
- isMixed()
-
-