Enum ByteOrderMark

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Comparable<ByteOrderMark>

    public enum ByteOrderMark
    extends java.lang.Enum<ByteOrderMark>
    The Byte Order Mark (BOM) designations for different character encodings.

    This implementation only supports UTF-8, UTF-16, and UTF-32 BOM variants.

    Author:
    Garret Wilson
    See Also:
    Unicode Byte Order Mark (BOM) FAQ
    • Enum Constant Summary

      Enum Constants 
      Enum Constant Description
      UTF_16BE
      UTF-16, big-endian BOM
      UTF_16LE
      UTF-16, little-endian BOM
      UTF_32BE
      UTF-32, big-endian (1234 order) BOM
      UTF_32BE_MIXED
      UTF-32, big-endian, with word swapped byte order (2143 order) BOM
      UTF_32LE
      UTF-32, little-endian (4321 order) BOM
      UTF_32LE_MIXED
      UTF-32, little-endian, with word swapped byte order (3412 order) BOM
      UTF_8
      UTF-8 BOM
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int MAX_BYTE_COUNT
      The maximum number of bytes used by any of the byte order marks in this implementation.
    • Method Summary

      Modifier and Type Method Description
      ByteOrderMark checkUsualIO()
      Checks to ensure this byte order mark is a usual one (i.e. one for which there exists a valid charset).
      static java.util.Optional<ByteOrderMark> detect​(byte[] bytes)
      Returns the byte order mark with which the given bytes start.
      static ByteOrderMark forCharset​(java.nio.charset.Charset charset)
      Determines the byte order mark (BOM) needed to represent the given charset.
      java.nio.ByteOrder getByteOrder()  
      byte[] getBytes()  
      int getLeastSignificantByteIndex()
      The index of the least significant byte within a group of encoded bytes for this byte order.
      int getLength()  
      int getMinimumBytesPerCharacter()  
      static java.util.Optional<ByteOrderMark> impute​(byte[] bytes, java.lang.CharSequence expectedChars, ObjectHolder<ByteOrderMark> actualBOM)
      Determines an imputed Byte Order Mark by detecting a BOM in the actual bytes or, if a true BOM is not present, by comparing the bytes to expected characters.
      boolean isMixed()  
      java.nio.charset.Charset toCharset()
      Returns a charset corresponding to this byte order mark.
      static ByteOrderMark valueOf​(java.lang.String name)
      Returns the enum constant of this type with the specified name.
      static ByteOrderMark[] values()
      Returns an array containing the constants of this enum type, in the order they are declared.
      • Methods inherited from class java.lang.Enum

        clone, compareTo, equals, finalize, getDeclaringClass, hashCode, name, ordinal, toString, valueOf
      • Methods inherited from class java.lang.Object

        getClass, notify, notifyAll, wait, wait, wait
    • Enum Constant Detail

      • UTF_16BE

        public static final ByteOrderMark UTF_16BE
        UTF-16, big-endian BOM
      • UTF_16LE

        public static final ByteOrderMark UTF_16LE
        UTF-16, little-endian BOM
      • UTF_32BE

        public static final ByteOrderMark UTF_32BE
        UTF-32, big-endian (1234 order) BOM
      • UTF_32LE

        public static final ByteOrderMark UTF_32LE
        UTF-32, little-endian (4321 order) BOM
      • UTF_32BE_MIXED

        public static final ByteOrderMark UTF_32BE_MIXED
        UTF-32, big-endian, with word swapped byte order (2143 order) BOM
      • UTF_32LE_MIXED

        public static final ByteOrderMark UTF_32LE_MIXED
        UTF-32, little-endian, with word swapped byte order (3412 order) BOM
    • Field Detail

      • MAX_BYTE_COUNT

        public static final int MAX_BYTE_COUNT
        The maximum number of bytes used by any of the byte order marks in this implementation.
        See Also:
        Constant Field Values
    • Method Detail

      • values

        public static ByteOrderMark[] values()
        Returns an array containing the constants of this enum type, in the order they are declared. This method may be used to iterate over the constants as follows:
        for (ByteOrderMark c : ByteOrderMark.values())
            System.out.println(c);
        
        Returns:
        an array containing the constants of this enum type, in the order they are declared
      • valueOf

        public static ByteOrderMark valueOf​(java.lang.String name)
        Returns the enum constant of this type with the specified name. The string must match exactly an identifier used to declare an enum constant in this type. (Extraneous whitespace characters are not permitted.)
        Parameters:
        name - the name of the enum constant to be returned.
        Returns:
        the enum constant with the specified name
        Throws:
        java.lang.IllegalArgumentException - if this enum type has no constant with the specified name
        java.lang.NullPointerException - if the argument is null
      • getBytes

        public byte[] getBytes()
        Returns:
        The bytes of this byte order mark.
      • getLength

        public int getLength()
        Returns:
        The number of bytes in this byte order mark.
      • isMixed

        public boolean isMixed()
        Returns:
        true if the byte order mark is one of the "middle-endian" or "mixed-endian" orders for which no charset exists.
        See Also:
        UTF_32BE_MIXED, UTF_32LE_MIXED
      • checkUsualIO

        public ByteOrderMark checkUsualIO()
                                   throws java.io.IOException
        Checks to ensure this byte order mark is a usual one (i.e. one for which there exists a valid charset).
        Returns:
        This byte order mark.
        Throws:
        java.io.IOException - if this is a mixed byte order mark.
        See Also:
        isMixed()
      • getMinimumBytesPerCharacter

        public int getMinimumBytesPerCharacter()
        Returns:
        The minimum number of bytes used for each character in the charset represented by this byte order mark.
      • getByteOrder

        public java.nio.ByteOrder getByteOrder()
        Returns:
        The byte order of this byte order mark, or null if there is one byte per character or the byte order is mixed.
      • getLeastSignificantByteIndex

        public int getLeastSignificantByteIndex()
        The index of the least significant byte within a group of encoded bytes for this byte order. For example, this method would return 0 for UTF-16LE and 1 for UTF-16BE. The index will be less than getMinimumBytesPerCharacter().
        Returns:
        The index of the least significant byte within an encoded group for this identified byte order.
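The UTF-16 example above can be checked with a small, self-contained sketch that does not use this class at all. It encodes the character 'A' (U+0041) with the JDK's own charsets and reports where the low byte 0x41 lands. The class name `LeastSignificantByteSketch` and the method `indexOfLowByte` are hypothetical names chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LeastSignificantByteSketch {

    /** Returns the index of the low byte (0x41) in the encoded form of 'A' for the given charset. */
    static int indexOfLowByte(Charset charset) {
        // U+0041 has low byte 0x41; every other byte of its UTF-16 encoding is 0x00.
        byte[] encoded = "A".getBytes(charset);
        for (int i = 0; i < encoded.length; i++) {
            if (encoded[i] == 0x41) {
                return i;
            }
        }
        throw new IllegalStateException("Low byte not found for " + charset);
    }

    public static void main(String[] args) {
        System.out.println(indexOfLowByte(StandardCharsets.UTF_16LE)); // prints 0
        System.out.println(indexOfLowByte(StandardCharsets.UTF_16BE)); // prints 1
    }
}
```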
      • detect

        public static java.util.Optional<ByteOrderMark> detect​(@Nonnull
                                                               byte[] bytes)
        Returns the byte order mark with which the given bytes start. If no valid byte order mark is present, an empty Optional is returned.
        Parameters:
        bytes - The array of bytes potentially starting with a byte order mark.
        Returns:
        The byte order mark detected.
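As an illustration of prefix-based detection, here is a minimal, self-contained sketch. It is not this class's implementation: `BomDetectSketch` and its nested `Bom` enum are hypothetical stand-ins, the byte signatures are the standard Unicode BOM values, and the mixed UTF-32 orders are omitted for brevity. The four-byte signatures are tested first because the UTF-32LE signature (FF FE 00 00) begins with the UTF-16LE signature (FF FE).

```java
import java.util.Optional;

class BomDetectSketch {

    /** Hypothetical stand-in enum; only the usual (non-mixed) signatures are listed. */
    enum Bom {
        // Four-byte signatures come first so that UTF-32LE is not mistaken for UTF-16LE.
        UTF_32BE(0x00, 0x00, 0xFE, 0xFF),
        UTF_32LE(0xFF, 0xFE, 0x00, 0x00),
        UTF_8(0xEF, 0xBB, 0xBF),
        UTF_16BE(0xFE, 0xFF),
        UTF_16LE(0xFF, 0xFE);

        private final byte[] signature;

        Bom(int... unsignedBytes) {
            signature = new byte[unsignedBytes.length];
            for (int i = 0; i < unsignedBytes.length; i++) {
                signature[i] = (byte) unsignedBytes[i];
            }
        }

        /** Returns true if the given bytes start with this signature. */
        boolean isPrefixOf(byte[] bytes) {
            if (bytes.length < signature.length) {
                return false;
            }
            for (int i = 0; i < signature.length; i++) {
                if (bytes[i] != signature[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Returns the first signature (in declaration order) that the bytes start with, if any. */
    static Optional<Bom> detect(byte[] bytes) {
        for (Bom bom : Bom.values()) {
            if (bom.isPrefixOf(bytes)) {
                return Optional.of(bom);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] utf8Text = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(detect(utf8Text));               // prints Optional[UTF_8]
        System.out.println(detect(new byte[] {'h', 'i'}));  // prints Optional.empty
    }
}
```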
      • impute

        public static java.util.Optional<ByteOrderMark> impute​(byte[] bytes,
                                                               java.lang.CharSequence expectedChars,
                                                               ObjectHolder<ByteOrderMark> actualBOM)
        Determines an imputed Byte Order Mark by detecting a BOM in the actual bytes or, if a true BOM is not present, by comparing the bytes to expected characters. Regardless of the number of expected characters given, only those characters necessary for detecting the byte order will be used.
        Parameters:
        bytes - The array of bytes representing the possible Byte Order Mark and possible expected characters.
        expectedChars - The characters expected, regardless of the encoding method used. At least four characters should be included.
        actualBOM - Receives the actual byte order mark present, if any.
        Returns:
        The actual Byte Order Mark encountered; or, if no Byte Order Mark was present, the Byte Order Mark representing the character encoding assumed by comparing bytes to the expected characters; or Optional.empty() if neither approach could determine a character encoding.
        See Also:
        XML 1.0 (Fifth Edition): F.1 Detection Without External Encoding Information
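The fallback step, comparing bytes to expected characters when no BOM is present, might look roughly like the following sketch. It is an assumption-laden illustration rather than this method's actual logic: `ImputeSketch` and `imputeCharset` are hypothetical names, the sketch returns a `Charset` instead of a `ByteOrderMark`, it skips the true-BOM check entirely, and it assumes the runtime provides the UTF-32BE and UTF-32LE charsets. Candidates are tried widest code unit first so that a one-byte match cannot shadow a two-byte or four-byte match.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class ImputeSketch {

    // Candidate encodings, ordered from widest to narrowest code unit.
    private static final List<Charset> CANDIDATES = Arrays.asList(
            Charset.forName("UTF-32BE"),
            Charset.forName("UTF-32LE"),
            StandardCharsets.UTF_16BE,
            StandardCharsets.UTF_16LE,
            StandardCharsets.UTF_8);

    /** Returns the first candidate charset whose encoding of the expected characters prefixes the bytes. */
    static Optional<Charset> imputeCharset(byte[] bytes, CharSequence expectedChars) {
        if (expectedChars.length() == 0) {
            return Optional.empty(); // nothing to compare against
        }
        for (Charset candidate : CANDIDATES) {
            byte[] expected = expectedChars.toString().getBytes(candidate);
            if (bytes.length >= expected.length
                    && Arrays.equals(Arrays.copyOf(bytes, expected.length), expected)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] document = "<?xml version=\"1.0\"?>".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(imputeCharset(document, "<?xm")); // prints Optional[UTF-16LE]
    }
}
```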
      • forCharset

        public static ByteOrderMark forCharset​(java.nio.charset.Charset charset)
        Determines the byte order mark (BOM) needed to represent the given charset. For charsets that do not specify endianness, big-endian is assumed as per the Charset documentation.
        Parameters:
        charset - The charset for which a byte order mark should be returned.
        Returns:
        The byte order mark for the given character set, or null if there is no byte order mark to represent the given character set.
        Throws:
        java.lang.NullPointerException - if the given charset is null.
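The charset-to-BOM mapping described above can be sketched independently of this class as follows. The names `ForCharsetSketch` and `bomBytesFor` are hypothetical, the sketch returns raw signature bytes rather than an enum constant, and matching on the canonical charset name is an assumption about how such a lookup could be done, not a statement of how this method works. Charsets that do not name an endianness ("UTF-16", "UTF-32") are treated as big-endian.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Objects;

class ForCharsetSketch {

    /** Returns the BOM signature bytes for the charset, or null if the charset has no BOM. */
    static byte[] bomBytesFor(Charset charset) {
        Objects.requireNonNull(charset, "charset"); // a null charset is rejected
        switch (charset.name()) { // canonical charset name
            case "UTF-8":
                return new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
            case "UTF-16": // no endianness specified: big-endian assumed
            case "UTF-16BE":
                return new byte[] {(byte) 0xFE, (byte) 0xFF};
            case "UTF-16LE":
                return new byte[] {(byte) 0xFF, (byte) 0xFE};
            case "UTF-32": // no endianness specified: big-endian assumed
            case "UTF-32BE":
                return new byte[] {0x00, 0x00, (byte) 0xFE, (byte) 0xFF};
            case "UTF-32LE":
                return new byte[] {(byte) 0xFF, (byte) 0xFE, 0x00, 0x00};
            default:
                return null; // no byte order mark represents this charset
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(bomBytesFor(StandardCharsets.UTF_16)));     // prints [-2, -1]
        System.out.println(Arrays.toString(bomBytesFor(StandardCharsets.ISO_8859_1))); // prints null
    }
}
```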
      • toCharset

        public java.nio.charset.Charset toCharset()
        Returns a charset corresponding to this byte order mark.

        The byte order marks UTF_32BE_MIXED and UTF_32LE_MIXED are defined in XML but have no corresponding charset, so calling this method for them (i.e. for byte order marks for which isMixed() returns true) will result in an UnsupportedOperationException being thrown.

        Returns:
        a charset corresponding to this byte order mark.
        Throws:
        java.lang.UnsupportedOperationException - if this byte order mark has no corresponding charset.
        See Also:
        isMixed()