Class StringUtils


  • public class StringUtils
    extends Object
    As of OpenJDK / Oracle JDK 8, the JVM is better optimized for String conversions that take the charset name as a String than for those that take a Charset object. toUtf8(String) and fromUtf8(byte[]) exploit this.
    • Field Detail

      • EMPTY_BYTES

        public static final byte[] EMPTY_BYTES
      • UTF8_STRING

        public static final String UTF8_STRING
    • Constructor Detail

      • StringUtils

        public StringUtils()
    • Method Detail

      • estimatedBinaryLengthAsUTF8

        public static int estimatedBinaryLengthAsUTF8​(String value)
      • toUtf8WithNullToEmpty

        public static byte[] toUtf8WithNullToEmpty​(String string)
      • fromUtf8

        public static String fromUtf8​(byte[] bytes)
      • fromUtf8

        public static String fromUtf8​(byte[] bytes,
                                      int offset,
                                      int length)
      • fromUtf8

        public static String fromUtf8​(ByteBuffer buffer,
                                      int numBytes)
        Decodes a UTF-8 String from numBytes bytes starting at the current position of a buffer. Advances the position of the buffer by numBytes.
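        The position-advancing behavior described above can be sketched with the JDK alone. This is a minimal illustration, not the library's implementation; the class name FromUtf8BufferSketch is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for fromUtf8(ByteBuffer, int); not the library's code.
final class FromUtf8BufferSketch {
    static String fromUtf8(ByteBuffer buffer, int numBytes) {
        byte[] bytes = new byte[numBytes];
        // Relative bulk get: copies numBytes bytes starting at the current
        // position and advances the position by numBytes.
        buffer.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

        Because the read is relative, a caller can decode consecutive fields from the same buffer by calling the method repeatedly.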
      • fromUtf8

        public static String fromUtf8​(it.unimi.dsi.fastutil.bytes.ByteArrayList buffer)
      • toUtf8

        public static byte[] toUtf8​(String string)
        Converts a string to a UTF-8 byte array.
        Throws:
        NullPointerException - if "string" is null
      • toUtf8ByteBuffer

        @Nullable
        public static ByteBuffer toUtf8ByteBuffer​(@Nullable
                                                  String string)
        Converts a string to UTF-8 bytes, returning them as a newly-allocated on-heap ByteBuffer. If "string" is null, returns null.
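        A rough sketch of the null handling documented for toUtf8(String) and toUtf8ByteBuffer(String), using the charset-name overload of String.getBytes. The class name ToUtf8Sketch is hypothetical, and the value "UTF-8" for UTF8_STRING is an assumption; this page does not state it.

```java
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;

// Hypothetical stand-in; not the library's code.
final class ToUtf8Sketch {
    // Assumed value: the page lists UTF8_STRING but does not show its contents.
    static final String UTF8_STRING = "UTF-8";

    static byte[] toUtf8(String string) {
        try {
            // Dereferencing a null string throws NullPointerException here.
            return string.getBytes(UTF8_STRING);
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    static ByteBuffer toUtf8ByteBuffer(String string) {
        // Null in, null out; otherwise wrap a fresh on-heap array.
        return string == null ? null : ByteBuffer.wrap(toUtf8(string));
    }
}
```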
      • toUtf8WithLimit

        public static int toUtf8WithLimit​(String string,
                                          ByteBuffer byteBuffer)
        Encodes "string" into the buffer "byteBuffer", using no more than the number of bytes remaining in the buffer. Will only encode whole characters. The byteBuffer's position and limit may be changed during operation, but will be reset before this method call ends.
        Returns:
        the number of bytes written, which may be shorter than the full encoded string length if there is not enough room in the output buffer.
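        One way to get this behavior with the JDK is a CharsetEncoder, which stops before a character that does not fit in the output. The following is a sketch under that assumption, not necessarily how the library does it; the class name Utf8WithLimitSketch is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for toUtf8WithLimit; not the library's code.
final class Utf8WithLimitSketch {
    static int toUtf8WithLimit(String string, ByteBuffer byteBuffer) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        int originalPosition = byteBuffer.position();
        int originalLimit = byteBuffer.limit();
        try {
            // On overflow the encoder stops before the first character that
            // does not fit, so no partial character is written.
            encoder.encode(CharBuffer.wrap(string), byteBuffer, true);
            return byteBuffer.position() - originalPosition;
        } finally {
            // Restore the buffer state, as the method contract describes.
            byteBuffer.position(originalPosition);
            byteBuffer.limit(originalLimit);
        }
    }
}
```

        With a 4-byte buffer and a string whose encoding needs 5 bytes (one 1-byte character followed by two 2-byte characters), the sketch writes 3 bytes and leaves the last character out.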
      • format

        public static String format​(String message,
                                    Object... formatArgs)
        Equivalent of String.format(Locale.ENGLISH, message, formatArgs).
      • nonStrictFormat

        public static String nonStrictFormat​(String message,
                                             Object... formatArgs)
        Formats the string like format(String, Object...), but instead of failing on an illegal format, returns the format string concatenated with the format arguments. Intended for formatting where failure is unimportant, such as logging and exception messages; it is typically not called directly.
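        The relationship between format and nonStrictFormat can be sketched as follows. The class name FormatSketch is hypothetical, and the "; " separator used in the fallback is an assumption; this page only says the format string and arguments are concatenated.

```java
import java.util.IllegalFormatException;
import java.util.Locale;

// Hypothetical stand-in; not the library's code.
final class FormatSketch {
    static String format(String message, Object... formatArgs) {
        // A fixed locale keeps output independent of the JVM's default locale.
        return String.format(Locale.ENGLISH, message, formatArgs);
    }

    static String nonStrictFormat(String message, Object... formatArgs) {
        try {
            return format(message, formatArgs);
        } catch (IllegalFormatException e) {
            // Fallback: the raw format string followed by each argument.
            // The "; " separator is an assumption made for this sketch.
            StringBuilder sb = new StringBuilder(message);
            if (formatArgs != null) {
                for (Object arg : formatArgs) {
                    sb.append("; ").append(arg);
                }
            }
            return sb.toString();
        }
    }
}
```

        For example, passing a String argument for a %d specifier makes format throw, while nonStrictFormat returns the unformatted pieces so that a log line or exception message is still produced.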
      • encodeForFormat

        @Nullable
        public static String encodeForFormat​(@Nullable
                                             String s)
        Encodes a string "s" for insertion into a format string. Returns null if the input is null.
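        This page does not say how the encoding is done. In java.util.Formatter syntax, '%' is the character that introduces a format specifier and "%%" produces a literal percent sign, so a plausible sketch doubles every '%'. The class name EncodeForFormatSketch is hypothetical, and the escaping strategy is an assumption.

```java
// Hypothetical stand-in for encodeForFormat; not the library's code.
final class EncodeForFormatSketch {
    static String encodeForFormat(String s) {
        // Assumption: escaping '%' as "%%" is sufficient, because '%' is the
        // only character that starts a format specifier.
        return s == null ? null : s.replace("%", "%%");
    }
}
```

        The encoded text can then be spliced into a larger format string without its percent signs being read as specifiers.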
      • toLowerCase

        public static String toLowerCase​(String s)
      • toUpperCase

        public static String toUpperCase​(String s)
      • urlEncode

        @Nullable
        public static String urlEncode​(@Nullable
                                       String s)
        Encodes a String in application/x-www-form-urlencoded format, with one exception: "+" in the encoded form is replaced with "%20". application/x-www-form-urlencoded encodes spaces as "+", but we use this to encode non-form data as well.
        Parameters:
        s - String to be encoded
        Returns:
        application/x-www-form-urlencoded format encoded String, but with "+" replaced with "%20".
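        The described behavior can be approximated with java.net.URLEncoder followed by a replacement of "+". This is a sketch, not the library's implementation; the class name UrlEncodeSketch is hypothetical, and the use of UTF-8 is an assumption.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical stand-in for urlEncode; not the library's code.
final class UrlEncodeSketch {
    static String urlEncode(String s) {
        if (s == null) {
            return null;
        }
        try {
            // URLEncoder emits "+" only for spaces; a literal '+' in the input
            // is already encoded as "%2B", so this replacement is safe.
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }
}
```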
      • maybeRemoveLeadingSlash

        public static String maybeRemoveLeadingSlash​(String s)
      • maybeRemoveTrailingSlash

        public static String maybeRemoveTrailingSlash​(String s)
      • removeChar

        public static String removeChar​(String s,
                                        char c)
        Removes all occurrences of the given char from the given string. This method is an optimized equivalent of s.replace(String.valueOf(c), "").
      • replaceChar

        public static String replaceChar​(String s,
                                         char c,
                                         String replacement)
        Replaces all occurrences of the given char in the given string with the given replacement string. This method is an optimized equivalent of s.replace(String.valueOf(c), replacement).
      • replace

        public static String replace​(String s,
                                     String target,
                                     String replacement)
        Replaces all occurrences of the given target substring in the given string with the given replacement string. This method is an optimized version of s.replace(target, replacement).
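        The kind of single-pass replacement these methods describe can be sketched with indexOf and a StringBuilder. The code below illustrates the technique for the char-based variants only; it is not the library's implementation, and the class name ReplaceSketch is hypothetical.

```java
// Hypothetical stand-in for replaceChar and removeChar; not the library's code.
final class ReplaceSketch {
    static String replaceChar(String s, char c, String replacement) {
        int pos = s.indexOf(c);
        if (pos < 0) {
            return s; // nothing to replace, so no copy is made
        }
        StringBuilder sb = new StringBuilder(s.length() + replacement.length());
        int prev = 0;
        do {
            // Copy the text before the match, then the replacement.
            sb.append(s, prev, pos).append(replacement);
            prev = pos + 1;
            pos = s.indexOf(c, prev);
        } while (pos >= 0);
        sb.append(s, prev, s.length()); // copy the tail after the last match
        return sb.toString();
    }

    static String removeChar(String s, char c) {
        return replaceChar(s, c, "");
    }
}
```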
      • nullToEmptyNonDruidDataString

        public static String nullToEmptyNonDruidDataString​(@Nullable
                                                           String string)
        Returns the given string if it is non-null; the empty string otherwise. This method should only be used at places where null to empty conversion is irrelevant to null handling of the data.
        Parameters:
        string - the string to test and possibly return
        Returns:
        string itself if it is non-null; "" if it is null
      • emptyToNullNonDruidDataString

        @Nullable
        public static String emptyToNullNonDruidDataString​(@Nullable
                                                           String string)
        Returns the given string if it is nonempty; null otherwise. This method should only be used at places where empty to null conversion is irrelevant to null handling of the data.
        Parameters:
        string - the string to test and possibly return
        Returns:
        string itself if it is nonempty; null if it is empty or null
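        The two conversions above amount to the following. The class name NullEmptySketch and the shortened method names are hypothetical.

```java
// Hypothetical stand-in for the two conversion helpers; not the library's code.
final class NullEmptySketch {
    static String nullToEmpty(String string) {
        return string == null ? "" : string;
    }

    static String emptyToNull(String string) {
        return (string == null || string.isEmpty()) ? null : string;
    }
}
```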
      • utf8Base64

        public static String utf8Base64​(String input)
        Converts the input string to Base64 and returns the resulting byte array as a UTF-8 string.
        Parameters:
        input - The string to convert to base64
        Returns:
        the base64 of the input in string form
      • encodeBase64

        public static byte[] encodeBase64​(byte[] input)
        Convert an input byte array into a newly-allocated byte array using the Base64 encoding scheme
        Parameters:
        input - The byte array to convert to base64
        Returns:
        the base64 of the input in byte array form
      • encodeBase64String

        public static String encodeBase64String​(byte[] input)
        Convert an input byte array into a string using the Base64 encoding scheme
        Parameters:
        input - The byte array to convert to base64
        Returns:
        the base64 of the input in string form
      • decodeBase64

        public static byte[] decodeBase64​(byte[] input)
        Decode an input byte array using the Base64 encoding scheme and return a newly-allocated byte array
        Parameters:
        input - The byte array to decode from base64
        Returns:
        a newly-allocated byte array
      • decodeBase64String

        public static byte[] decodeBase64String​(String input)
        Decode an input string using the Base64 encoding scheme and return a newly-allocated byte array
        Parameters:
        input - The string to decode from base64
        Returns:
        a newly-allocated byte array
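        The Base64 helpers above behave like thin wrappers around java.util.Base64. The sketch below assumes the basic RFC 4648 alphabet with padding, which this page does not confirm; the class name Base64Sketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical stand-in; the basic (non-URL-safe) Base64 alphabet is assumed.
final class Base64Sketch {
    static String utf8Base64(String input) {
        return encodeBase64String(input.getBytes(StandardCharsets.UTF_8));
    }

    static String encodeBase64String(byte[] input) {
        return Base64.getEncoder().encodeToString(input);
    }

    static byte[] decodeBase64String(String input) {
        // Returns a newly allocated array holding the decoded bytes.
        return Base64.getDecoder().decode(input);
    }
}
```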
      • repeat

        public static String repeat​(String s,
                                    int count)
        Returns a string whose value is the concatenation of the string s repeated count times.

        If count or length is zero then the empty string is returned.

        This method may be used to create space padding for formatting text or zero padding for formatting numbers.

        Parameters:
        count - number of times to repeat
        Returns:
        A string composed of this string repeated count times or the empty string if count or length is zero.
        Throws:
        IllegalArgumentException - if the count is negative.
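        The documented contract of repeat, including the zero and negative cases, can be sketched as below. This is an illustration rather than the library's implementation; the class name RepeatSketch is hypothetical.

```java
// Hypothetical stand-in for repeat; not the library's code.
final class RepeatSketch {
    static String repeat(String s, int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count is negative: " + count);
        }
        if (count == 0 || s.isEmpty()) {
            return "";
        }
        // multiplyExact throws ArithmeticException instead of silently
        // overflowing when the result would not fit in an int.
        StringBuilder sb = new StringBuilder(Math.multiplyExact(s.length(), count));
        for (int i = 0; i < count; i++) {
            sb.append(s);
        }
        return sb.toString();
    }
}
```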
      • lpad

        @Nonnull
        public static String lpad​(@Nonnull
                                  String base,
                                  int len,
                                  @Nonnull
                                  String pad)
        Returns the string left-padded with the string pad to a length of len characters. If base is longer than len, the return value is shortened to len characters.

        This function is migrated from Flink's Scala function with minor refactoring: https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/functions/ScalarFunctions.scala
        - Modified to handle an empty pad string.
        - Padding to a negative length returns an empty string.
        Parameters:
        base - The base string to be padded
        len - The length of padded string
        pad - The pad string
        Returns:
        the string left-padded with pad to a length of len. The method is annotated @Nonnull, and padding to a negative len returns an empty string.
      • rpad

        @Nonnull
        public static String rpad​(@Nonnull
                                  String base,
                                  int len,
                                  @Nonnull
                                  String pad)
        Returns the string right-padded with the string pad to a length of len characters. If base is longer than len, the return value is shortened to len characters.

        This function is migrated from Flink's Scala function with minor refactoring: https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/functions/ScalarFunctions.scala
        - Modified to handle an empty pad string.
        - Modified to only copy the pad string if needed (this implementation mimics lpad).
        - Padding to a negative length returns an empty string.
        Parameters:
        base - The base string to be padded
        len - The length of padded string
        pad - The pad string
        Returns:
        the string right-padded with pad to a length of len. The method is annotated @Nonnull, and padding to a negative len returns an empty string.
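        The padding and truncation rules for lpad and rpad can be sketched as follows. This is not the library's implementation; the class name PadSketch is hypothetical, and returning base unchanged when pad is empty is an assumption, since this page only says an empty pad string is handled.

```java
// Hypothetical stand-in for lpad and rpad; not the library's code.
final class PadSketch {
    static String lpad(String base, int len, String pad) {
        if (len <= 0) {
            return ""; // negative (or zero) target length yields ""
        }
        if (base.length() >= len) {
            return base.substring(0, len); // shorten to len characters
        }
        if (pad.isEmpty()) {
            return base; // assumption: nothing to pad with, keep base as-is
        }
        int padLen = len - base.length();
        StringBuilder sb = new StringBuilder(len);
        while (sb.length() < padLen) {
            // Append whole copies of pad, then a partial copy if needed.
            sb.append(pad, 0, Math.min(pad.length(), padLen - sb.length()));
        }
        return sb.append(base).toString();
    }

    static String rpad(String base, int len, String pad) {
        if (len <= 0) {
            return "";
        }
        if (base.length() >= len) {
            return base.substring(0, len);
        }
        if (pad.isEmpty()) {
            return base; // same assumption as in lpad
        }
        StringBuilder sb = new StringBuilder(len).append(base);
        while (sb.length() < len) {
            sb.append(pad, 0, Math.min(pad.length(), len - sb.length()));
        }
        return sb.toString();
    }
}
```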
      • chop

        @Nullable
        public static String chop​(@Nullable
                                  String s,
                                  int maxBytes)
        Returns the string truncated to at most maxBytes bytes. If the input string already fits within maxBytes, it is returned unchanged.
        Parameters:
        s - The input string to possibly be truncated
        maxBytes - The max bytes that string input will be truncated to
        Returns:
        the string after truncation to maxBytes
      • fastLooseChop

        @Nullable
        public static String fastLooseChop​(@Nullable
                                           String s,
                                           int maxBytes)
        Shorten "s" to "maxBytes" chars. Fast and loose because these are *chars* not *bytes*. Use chop(String, int) for slower, but accurate chopping.
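        The difference between byte-accurate and char-based chopping shows up as soon as the string contains multi-byte characters. The sketch below assumes UTF-8 as the byte encoding for chop, which this page does not state; the class name ChopSketch is hypothetical and the code is not the library's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for chop and fastLooseChop; UTF-8 is assumed.
final class ChopSketch {
    static String chop(String s, int maxBytes) {
        if (s == null) {
            return null;
        }
        ByteBuffer out = ByteBuffer.allocate(maxBytes);
        // The encoder stops before a character that does not fit, so the
        // result never ends in a partial character.
        StandardCharsets.UTF_8.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE)
            .encode(CharBuffer.wrap(s), out, true);
        return new String(out.array(), 0, out.position(), StandardCharsets.UTF_8);
    }

    static String fastLooseChop(String s, int maxBytes) {
        // Counts chars, not bytes, so the result may exceed maxBytes in UTF-8.
        if (s == null || s.length() <= maxBytes) {
            return s;
        }
        return s.substring(0, maxBytes);
    }
}
```

        For a string made of one 1-byte character followed by 2-byte characters, fastLooseChop with a limit of 2 keeps two chars (3 bytes in UTF-8), whereas chop with the same limit keeps only the first character.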