Class StringUtil


  • @Internal
    public class StringUtil
    extends Object
    Collection of string handling utilities
    • Field Detail

      • UTF16LE

        public static final Charset UTF16LE
      • UTF8

        public static final Charset UTF8
      • WIN_1252

        public static final Charset WIN_1252
      • BIG5

        public static final Charset BIG5
    • Method Detail

      • getFromUnicodeLE

        public static String getFromUnicodeLE​(byte[] string,
                                              int offset,
                                              int len)
                                       throws ArrayIndexOutOfBoundsException,
                                              IllegalArgumentException
        Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it.

        For example, the bytes { 0x16, 0x00 } decode to the single character 0x16.

        Parameters:
        string - the byte array to be converted
        offset - the initial offset into the byte array. It is assumed that string[ offset ] and string[ offset + 1 ] contain the first 16-bit Unicode character
        len - the length of the final string, in 16-bit characters
        Returns:
        the converted string, never null.
        Throws:
        ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
        IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
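
The offset/len contract above can be illustrated with a JDK-only sketch. This is not POI's implementation: the class name Utf16LeDecodeSketch and the method decode are hypothetical, and the bounds checks simply follow the exceptions documented above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only sketch; it mirrors the documented contract, not POI's source.
class Utf16LeDecodeSketch {
    static String decode(byte[] data, int offset, int len) {
        if (offset < 0 || offset >= data.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset " + offset);
        }
        // len counts 16-bit characters, so 2 * len bytes must be available.
        if (len < 0 || (data.length - offset) / 2 < len) {
            throw new IllegalArgumentException("Illegal length " + len);
        }
        // Each character occupies two bytes, least significant byte first.
        return new String(data, offset, len * 2, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] data = { 0x48, 0x00, 0x69, 0x00 }; // 'H', 'i'
        System.out.println(decode(data, 0, 2));
    }
}
```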
      • getFromUnicodeLE

        public static String getFromUnicodeLE​(byte[] string)
        Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it.

        For example, the bytes { 0x16, 0x00 } decode to the single character 0x16.

        Parameters:
        string - the byte array to be converted
        Returns:
        the converted string, never null
      • getToUnicodeLE

        public static byte[] getToUnicodeLE​(String string)
        Converts a String to a byte array of 16-bit Unicode characters in little-endian format.
        Parameters:
        string - the string
        Returns:
        the byte array of 16-bit unicode characters
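
As a rough illustration of the byte layout this conversion produces, the JDK's own UTF-16LE charset can be used. The class Utf16LeEncodeSketch below is a hypothetical stand-in and does not call POI.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only sketch of the little-endian layout; not POI's source.
class Utf16LeEncodeSketch {
    static byte[] encode(String s) {
        // UTF_16LE writes two bytes per char, low byte first, with no byte-order mark.
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        for (byte b : encode("Hi")) {
            System.out.printf("0x%02X ", b);
        }
        System.out.println();
    }
}
```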
      • getFromCompressedUnicode

        public static String getFromCompressedUnicode​(byte[] string,
                                                      int offset,
                                                      int len)
        Reads 8-bit data (in the ISO-8859-1 codepage) into a (Unicode) Java String and returns it. (In Excel terms, reads compressed 8-bit Unicode as a string.)
        Parameters:
        string - byte array to read
        offset - offset into the byte array at which to start reading
        len - number of bytes to read from the byte array
        Returns:
        the String generated by reading the byte array
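
A minimal sketch of what "compressed" means here, assuming each byte maps directly to one ISO-8859-1 character. The class CompressedUnicodeSketch is hypothetical and uses only the JDK.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only sketch: one byte per character, decoded as ISO-8859-1.
class CompressedUnicodeSketch {
    static String decode(byte[] data, int offset, int len) {
        return new String(data, offset, len, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] data = { 0x63, 0x61, 0x66, (byte) 0xE9 }; // 'c', 'a', 'f', U+00E9
        System.out.println(decode(data, 0, data.length).length()); // one char per byte
    }
}
```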
      • readUnicodeString

        public static String readUnicodeString​(LittleEndianInput in)
        InputStream in is expected to contain:
        1. ushort nChars
        2. byte is16BitFlag
        3. byte[]/char[] characterData
        For this encoding, the is16BitFlag is always present even if nChars==0.

        This structure is also known as an XLUnicodeString.
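
The three-field layout above can be parsed along the following lines. This sketch substitutes a java.nio.ByteBuffer for POI's LittleEndianInput and assumes that only the lowest bit of is16BitFlag selects 16-bit data; the class XlUnicodeStringReadSketch is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of the nChars / is16BitFlag / characterData layout.
class XlUnicodeStringReadSketch {
    static String read(ByteBuffer in) {
        in.order(ByteOrder.LITTLE_ENDIAN);
        int nChars = in.getShort() & 0xFFFF;        // 1. ushort nChars
        boolean is16Bit = (in.get() & 0x01) != 0;   // 2. byte is16BitFlag (assumption: low bit)
        StringBuilder sb = new StringBuilder(nChars);
        for (int i = 0; i < nChars; i++) {          // 3. characterData
            char c = is16Bit ? in.getChar() : (char) (in.get() & 0xFF);
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] record = { 0x02, 0x00, 0x00, 0x48, 0x69 }; // nChars=2, 8-bit, 'H', 'i'
        System.out.println(read(ByteBuffer.wrap(record)));
    }
}
```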

      • readUnicodeString

        public static String readUnicodeString​(LittleEndianInput in,
                                               int nChars)
        InputStream in is expected to contain:
        1. byte is16BitFlag
        2. byte[]/char[] characterData
        For this encoding, the is16BitFlag is always present even if nChars==0.
        This method should be used when the nChars field is not stored as a ushort immediately before the is16BitFlag. Otherwise, readUnicodeString(LittleEndianInput) can be used.
      • writeUnicodeString

        public static void writeUnicodeString​(LittleEndianOutput out,
                                              String value)
        OutputStream out will get:
        1. ushort nChars
        2. byte is16BitFlag
        3. byte[]/char[] characterData
        For this encoding, the is16BitFlag is always present even if nChars==0.
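
The write side of the same layout might look like the sketch below. It is an assumption of this sketch, not something stated above, that the 16-bit form is chosen whenever any character exceeds 0xFF. The class XlUnicodeStringWriteSketch is hypothetical and writes to a plain byte array instead of a LittleEndianOutput.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch that emits nChars, is16BitFlag, then characterData.
class XlUnicodeStringWriteSketch {
    static byte[] write(String value) {
        boolean is16Bit = false;
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 0xFF) {           // assumption: such chars force 16-bit data
                is16Bit = true;
                break;
            }
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int nChars = value.length();
        out.write(nChars & 0xFF);                   // 1. ushort nChars, low byte first
        out.write((nChars >>> 8) & 0xFF);
        out.write(is16Bit ? 1 : 0);                 // 2. flag, written even when nChars == 0
        for (int i = 0; i < nChars; i++) {          // 3. characterData
            char c = value.charAt(i);
            out.write(c & 0xFF);
            if (is16Bit) {
                out.write((c >>> 8) & 0xFF);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(write("Hi").length);
    }
}
```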
      • writeUnicodeStringFlagAndData

        public static void writeUnicodeStringFlagAndData​(LittleEndianOutput out,
                                                         String value)
        OutputStream out will get:
        1. byte is16BitFlag
        2. byte[]/char[] characterData
        For this encoding, the is16BitFlag is always present even if nChars==0.
        This method should be used when the nChars field is not stored as a ushort immediately before the is16BitFlag. Otherwise, writeUnicodeString(LittleEndianOutput, String) can be used.
      • putCompressedUnicode

        public static void putCompressedUnicode​(String input,
                                                byte[] output,
                                                int offset)
        Takes a Unicode (Java) string and writes it as 8-bit data (in the ISO-8859-1 codepage) into the supplied byte array. (In Excel terms, writes compressed 8-bit Unicode.)
        Parameters:
        input - the String containing the data to be written
        output - the byte array to which the data is to be written
        offset - the offset into the byte array at which the written data starts
      • putUnicodeLE

        public static void putUnicodeLE​(String input,
                                        byte[] output,
                                        int offset)
        Takes a Unicode string and writes it as little-endian (most significant byte last) bytes into the supplied byte array. (In Excel terms, writes uncompressed Unicode.)
        Parameters:
        input - the String containing the Unicode data to be written
        output - the byte array to hold the uncompressed Unicode; it needs room for two bytes per character of the String
        offset - the offset to start writing into the byte array
      • getPreferredEncoding

        public static String getPreferredEncoding()
        Returns:
        the encoding we want to use, currently hardcoded to ISO-8859-1
      • hasMultibyte

        public static boolean hasMultibyte​(String value)
        Checks whether the parameter contains a multibyte character.
        Parameters:
        value - string to check
        Returns:
        true if the string has at least one multibyte character
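
One plausible reading of "multibyte" is any character that does not fit in a single byte. The sketch below takes that reading as an assumption. The class HasMultibyteSketch and its null handling are hypothetical, not confirmed behavior.

```java
// Hypothetical sketch: treats any char above 0xFF as multibyte (an assumption).
class HasMultibyteSketch {
    static boolean hasMultibyte(String value) {
        if (value == null) {
            return false; // assumption: null has no multibyte characters
        }
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 0xFF) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasMultibyte("abc"));
        System.out.println(hasMultibyte("\u20ac"));
    }
}
```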
      • isUnicodeString

        public static boolean isUnicodeString​(String value)
        Checks to see if a given String needs to be represented as Unicode
        Parameters:
        value - The string to look at.
        Returns:
        true if string needs Unicode to be represented.
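
One way to test whether a string "needs Unicode" is to round-trip it through ISO-8859-1 and see whether anything is lost. This is a sketch of that idea under the assumption that ISO-8859-1 is the 8-bit alternative; the class NeedsUnicodeSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a string needs Unicode if ISO-8859-1 cannot represent it losslessly.
class NeedsUnicodeSketch {
    static boolean needsUnicode(String value) {
        // Unmappable characters are replaced during encoding, so the round trip differs.
        byte[] latin1 = value.getBytes(StandardCharsets.ISO_8859_1);
        return !value.equals(new String(latin1, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        System.out.println(needsUnicode("caf\u00e9"));
        System.out.println(needsUnicode("\u20ac"));
    }
}
```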
      • startsWithIgnoreCase

        public static boolean startsWithIgnoreCase​(String haystack,
                                                   String prefix)
        Tests if the string starts with the specified prefix, ignoring case consideration.
      • endsWithIgnoreCase

        public static boolean endsWithIgnoreCase​(String haystack,
                                                 String suffix)
        Tests if the string ends with the specified suffix, ignoring case consideration.
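
Case-insensitive prefix and suffix tests can be built on String.regionMatches from the JDK. The sketch below shows that approach. The class IgnoreCaseAffixSketch is hypothetical, and whether POI implements these methods the same way is not stated here.

```java
// Hypothetical sketch using String.regionMatches with ignoreCase = true.
class IgnoreCaseAffixSketch {
    static boolean startsWithIgnoreCase(String haystack, String prefix) {
        return haystack.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    static boolean endsWithIgnoreCase(String haystack, String suffix) {
        int start = haystack.length() - suffix.length();
        // A suffix longer than the haystack can never match.
        return start >= 0 && haystack.regionMatches(true, start, suffix, 0, suffix.length());
    }

    public static void main(String[] args) {
        System.out.println(startsWithIgnoreCase("Workbook.XLS", "work"));
        System.out.println(endsWithIgnoreCase("Workbook.XLS", ".xls"));
    }
}
```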
      • isUpperCase

        @Internal
        public static boolean isUpperCase​(char c)
      • mapMsCodepoint

        public static void mapMsCodepoint​(int msCodepoint,
                                          int unicodeCodepoint)
      • countMatches

        public static int countMatches​(CharSequence haystack,
                                       char needle)
        Counts the number of occurrences of needle in haystack. Has the same signature as org.apache.commons.lang3.StringUtils#countMatches.
        Parameters:
        haystack - the CharSequence to check, may be null
        needle - the character to count the quantity of
        Returns:
        the number of occurrences, 0 if the CharSequence is null
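
The documented behavior, including the null case, amounts to a simple linear scan. A minimal sketch follows, with the hypothetical class name CountMatchesSketch.

```java
// Hypothetical sketch of the documented contract: count occurrences, 0 for null.
class CountMatchesSketch {
    static int countMatches(CharSequence haystack, char needle) {
        if (haystack == null) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i < haystack.length(); i++) {
            if (haystack.charAt(i) == needle) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("a,b,,c", ','));
    }
}
```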
      • getFromUnicodeLE0Terminated

        public static String getFromUnicodeLE0Terminated​(byte[] string,
                                                         int offset,
                                                         int len)
                                                  throws ArrayIndexOutOfBoundsException,
                                                         IllegalArgumentException
        Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it. Scans the byte array for two consecutive 0 bytes and returns the string before them.

        #61881: some programs apparently also write the 0-termination at the beginning of the string. Check if the next two bytes contain a valid ASCII char and correct the _recdata with a '?' char.

        Parameters:
        string - the byte array to be converted
        offset - the initial offset into the byte array. It is assumed that string[ offset ] and string[ offset + 1 ] contain the first 16-bit Unicode character
        len - the maximum length of the final string
        Returns:
        the converted string, never null.
        Throws:
        ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
        IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
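
The basic terminator scan can be sketched as follows, assuming the two 0 bytes are looked for on 16-bit character boundaries. This sketch deliberately omits the #61881 workaround for a leading terminator; the class Utf16LeZeroTerminatedSketch is hypothetical and uses only the JDK.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: stop at the first 0x0000 character, or after maxLen characters.
class Utf16LeZeroTerminatedSketch {
    static String decode(byte[] data, int offset, int maxLen) {
        if (offset < 0 || offset >= data.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset " + offset);
        }
        if (maxLen < 0 || (data.length - offset) / 2 < maxLen) {
            throw new IllegalArgumentException("Illegal length " + maxLen);
        }
        int chars = 0;
        // Advance one 16-bit character at a time until both of its bytes are 0.
        while (chars < maxLen
                && (data[offset + 2 * chars] != 0 || data[offset + 2 * chars + 1] != 0)) {
            chars++;
        }
        return new String(data, offset, chars * 2, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] data = { 0x48, 0x00, 0x69, 0x00, 0x00, 0x00, 0x21, 0x00 };
        System.out.println(decode(data, 0, 4)); // the bytes after the terminator are ignored
    }
}
```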