Class StringUtils
- java.lang.Object
  - org.apache.druid.java.util.common.StringUtils
public class StringUtils extends Object
As of OpenJDK / Oracle JDK 8, the JVM is optimized around passing the charset by name (a String) instead of as a Charset object; this is exploited in toUtf8(String) and fromUtf8(byte[]).
-
-
Field Summary
Fields
Modifier and Type | Field | Description
static byte[] | EMPTY_BYTES |
static Charset | UTF8_CHARSET | Deprecated.
static String | UTF8_STRING |
-
Constructor Summary
Constructors
Constructor | Description
StringUtils() |
-
Method Summary
All Methods | Static Methods | Concrete Methods
Modifier and Type | Method | Description
static String | chop(String s, int maxBytes) | Returns the string truncated to maxBytes.
static int | compareUnicode(String a, String b) | Compares two Java Strings in Unicode code-point order.
static int | compareUtf8(byte[] a, byte[] b) | Compares two UTF-8 byte strings in Unicode code-point order.
static int | compareUtf8UsingJavaStringOrdering(byte[] a, byte[] b) | Compares two UTF-8 byte strings in UTF-16 code-unit order.
static int | compareUtf8UsingJavaStringOrdering(byte byte1, byte byte2) | Compares two bytes from UTF-8 strings in such a way that the entire byte arrays are compared in UTF-16 code-unit order.
static int | compareUtf8UsingJavaStringOrdering(ByteBuffer buf1, int position1, int length1, ByteBuffer buf2, int position2, int length2) | Compares two UTF-8 byte strings in UTF-16 code-unit order.
static byte[] | decodeBase64(byte[] input) | Decode an input byte array using the Base64 encoding scheme and return a newly-allocated byte array.
static byte[] | decodeBase64String(String input) | Decode an input string using the Base64 encoding scheme and return a newly-allocated byte array.
static String | emptyToNullNonDruidDataString(String string) | Returns the given string if it is nonempty; null otherwise.
static byte[] | encodeBase64(byte[] input) | Convert an input byte array into a newly-allocated byte array using the Base64 encoding scheme.
static String | encodeBase64String(byte[] input) | Convert an input byte array into a string using the Base64 encoding scheme.
static String | encodeForFormat(String s) | Encodes a string "s" for insertion into a format string.
static int | estimatedBinaryLengthAsUTF8(String value) |
static String | fastLooseChop(String s, int maxBytes) | Shortens "s" to "maxBytes" chars.
static String | format(String message, Object... formatArgs) | Equivalent of String.format(Locale.ENGLISH, message, formatArgs).
static String | fromUtf8(byte[] bytes) |
static String | fromUtf8(byte[] bytes, int offset, int length) |
static String | fromUtf8(it.unimi.dsi.fastutil.bytes.ByteArrayList buffer) |
static String | fromUtf8(ByteBuffer buffer) | Decodes a UTF-8 string from the remaining bytes of a non-null buffer.
static String | fromUtf8(ByteBuffer buffer, int numBytes) | Decodes a UTF-8 String from numBytes bytes starting at the current position of a buffer.
static String | fromUtf8Nullable(ByteBuffer buffer) | Decodes a UTF-8 string from the remaining bytes of a buffer, or returns null if the buffer is null.
static String | getResource(Object ref, String resource) |
static String | lpad(String base, int len, String pad) | Returns the string left-padded with the string pad to a length of len characters.
static String | maybeRemoveLeadingSlash(String s) |
static String | maybeRemoveTrailingSlash(String s) |
static String | nonStrictFormat(String message, Object... formatArgs) | Formats the string as format(String, Object...), but instead of failing on illegal format, returns the concatenated format string and format arguments.
static String | nullToEmptyNonDruidDataString(String string) | Returns the given string if it is non-null; the empty string otherwise.
static String | removeChar(String s, char c) | Removes all occurrences of the given char from the given string.
static String | repeat(String s, int count) | Returns a string whose value is the concatenation of the string s repeated count times.
static String | replace(String s, String target, String replacement) | Replaces all occurrences of the given target substring in the given string with the given replacement string.
static String | replaceChar(String s, char c, String replacement) | Replaces all occurrences of the given char in the given string with the given replacement string.
static String | rpad(String base, int len, String pad) | Returns the string right-padded with the string pad to a length of len characters.
static String | toLowerCase(String s) |
static String | toUpperCase(String s) |
static byte[] | toUtf8(String string) | Converts a string to a UTF-8 byte array.
static ByteBuffer | toUtf8ByteBuffer(String string) | Converts a string to UTF-8 bytes, returning them as a newly-allocated on-heap ByteBuffer.
static byte[] | toUtf8Nullable(String string) |
static int | toUtf8WithLimit(String string, ByteBuffer byteBuffer) | Encodes "string" into the buffer "byteBuffer", using no more than the number of bytes remaining in the buffer.
static byte[] | toUtf8WithNullToEmpty(String string) |
static String | urlDecode(String s) |
static String | urlEncode(String s) | Encodes a String in application/x-www-form-urlencoded format, with one exception: "+" in the encoded form is replaced with "%20".
static String | utf8Base64(String input) | Convert an input to base 64 and return the UTF-8 string of that byte array.
-
-
-
Field Detail
-
EMPTY_BYTES
public static final byte[] EMPTY_BYTES
-
UTF8_CHARSET
@Deprecated public static final Charset UTF8_CHARSET
Deprecated.
-
UTF8_STRING
public static final String UTF8_STRING
-
-
Method Detail
-
estimatedBinaryLengthAsUTF8
public static int estimatedBinaryLengthAsUTF8(String value)
-
toUtf8WithNullToEmpty
public static byte[] toUtf8WithNullToEmpty(String string)
-
compareUnicode
public static int compareUnicode(String a, String b)
Compares two Java Strings in Unicode code-point order. Order is consistent with compareUtf8(byte[], byte[]), but is not consistent with String.compareTo(String).
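To illustrate why code-point order and String.compareTo(String) disagree: Java stores supplementary characters as UTF-16 surrogate pairs (code units 0xD800 to 0xDFFF), which sort below BMP characters such as U+FFFF in code-unit order but above them in code-point order. The sketch below is a minimal, hypothetical comparator written for this page; the class and method names are illustrative and not taken from the Druid source.

```java
final class CodePointOrderSketch {
  // Hypothetical helper: compares strings code point by code point, so a
  // surrogate pair is compared by the full supplementary code point it encodes.
  static int compareCodePoints(String a, String b) {
    int i = 0;
    while (i < a.length() && i < b.length()) {
      int cpA = a.codePointAt(i);
      int cpB = b.codePointAt(i);
      if (cpA != cpB) {
        return Integer.compare(cpA, cpB);
      }
      // Identical code points occupy the same number of chars, so one index suffices.
      i += Character.charCount(cpA);
    }
    // When one string is a prefix of the other, the shorter one sorts first.
    return Integer.compare(a.length(), b.length());
  }

  public static void main(String[] args) {
    String bmp = "\uFFFF";          // U+FFFF: a single UTF-16 code unit
    String emoji = "\uD83D\uDE00";  // U+1F600: a surrogate pair
    System.out.println(bmp.compareTo(emoji) > 0);          // true (code-unit order)
    System.out.println(compareCodePoints(bmp, emoji) < 0); // true (code-point order)
  }
}
```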
-
compareUtf8
public static int compareUtf8(byte[] a, byte[] b)
Compares two UTF-8 byte strings in Unicode code-point order. Equivalent to a comparison of the two byte arrays as if they were unsigned bytes. Order is consistent with compareUnicode(String, String), but is not consistent with String.compareTo(String). For an ordering consistent with String.compareTo(String), use compareUtf8UsingJavaStringOrdering(byte[], byte[]) instead.
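A minimal sketch of the unsigned comparison described above, assuming valid UTF-8 input. Java's byte is signed, so a naive comparison would put every byte from 0x80 upward (all non-ASCII lead and continuation bytes) before ASCII; masking with 0xFF avoids that. The class name is hypothetical; on Java 9 and later, Arrays.compareUnsigned(byte[], byte[]) performs the same comparison.

```java
import java.nio.charset.StandardCharsets;

final class UnsignedUtf8CompareSketch {
  // Lexicographic comparison that treats each byte as unsigned (0..255).
  // For valid UTF-8 this matches Unicode code-point order.
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    // Equal prefix: the shorter array sorts first.
    return Integer.compare(a.length, b.length);
  }

  public static void main(String[] args) {
    byte[] bmp = "\uFFFF".getBytes(StandardCharsets.UTF_8);          // EF BF BF
    byte[] emoji = "\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);  // F0 9F 98 80
    System.out.println(compareUnsigned(bmp, emoji) < 0); // true, same as code-point order
  }
}
```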
-
compareUtf8UsingJavaStringOrdering
public static int compareUtf8UsingJavaStringOrdering(byte[] a, byte[] b)
Compares two UTF-8 byte strings in UTF-16 code-unit order. Order is consistent with String.compareTo(String), but is not consistent with compareUnicode(String, String) or compareUtf8(byte[], byte[]).
-
compareUtf8UsingJavaStringOrdering
public static int compareUtf8UsingJavaStringOrdering(ByteBuffer buf1, int position1, int length1, ByteBuffer buf2, int position2, int length2)
Compares two UTF-8 byte strings in UTF-16 code-unit order. Order is consistent with String.compareTo(String), but is not consistent with compareUnicode(String, String) or compareUtf8(byte[], byte[]).
-
compareUtf8UsingJavaStringOrdering
public static int compareUtf8UsingJavaStringOrdering(byte byte1, byte byte2)
Compares two bytes from UTF-8 strings in such a way that the entire byte arrays are compared in UTF-16 code-unit order. Compatible with compareUtf8UsingJavaStringOrdering(byte[], byte[]) and compareUtf8UsingJavaStringOrdering(ByteBuffer, int, int, ByteBuffer, int, int).
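One way to get UTF-16 code-unit order directly from UTF-8 bytes, sketched under the assumption of valid UTF-8 input; this is an illustration and not necessarily the rule Druid uses, and the names are hypothetical. The two orders only disagree when a supplementary character (lead byte 0xF0 to 0xF4) meets a character in U+E000 to U+FFFF (lead byte 0xEE or 0xEF), so a per-byte comparator only needs to swap those two groups. For valid UTF-8, the first differing bytes are either both lead bytes or both continuation bytes of same-length characters, and continuation bytes (0x80 to 0xBF) never trigger the swap.

```java
import java.nio.charset.StandardCharsets;

final class Utf16OrderUtf8CompareSketch {
  // Hypothetical per-byte comparator. Lead bytes 0xEE..0xEF start U+E000..U+FFFF,
  // while 0xF0..0xF4 start supplementary characters, which UTF-16 stores as
  // surrogates (0xD800..0xDFFF) that sort below U+E000. Swap those two groups;
  // every other pair of bytes compares as unsigned values.
  static int compareByte(byte byte1, byte byte2) {
    int u1 = byte1 & 0xFF;
    int u2 = byte2 & 0xFF;
    if (u1 >= 0xEE && u2 >= 0xEE && (u1 >= 0xF0) != (u2 >= 0xF0)) {
      return u1 >= 0xF0 ? -1 : 1;
    }
    return Integer.compare(u1, u2);
  }

  // Lexicographic array comparison built on compareByte; a shorter prefix sorts first.
  static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = compareByte(a[i], b[i]);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  public static void main(String[] args) {
    byte[] bmp = "\uFFFF".getBytes(StandardCharsets.UTF_8);
    byte[] emoji = "\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
    System.out.println(compare(bmp, emoji) > 0); // true, agrees with String.compareTo
  }
}
```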
-
fromUtf8
public static String fromUtf8(byte[] bytes)
-
fromUtf8
public static String fromUtf8(byte[] bytes, int offset, int length)
-
fromUtf8
public static String fromUtf8(ByteBuffer buffer, int numBytes)
Decodes a UTF-8 String from numBytes bytes starting at the current position of a buffer. Advances the position of the buffer by numBytes.
-
fromUtf8
public static String fromUtf8(ByteBuffer buffer)
Decodes a UTF-8 string from the remaining bytes of a non-null buffer. Advances the position of the buffer by Buffer.remaining(). Use fromUtf8Nullable(ByteBuffer) if the buffer might be null.
-
fromUtf8
public static String fromUtf8(it.unimi.dsi.fastutil.bytes.ByteArrayList buffer)
-
fromUtf8Nullable
@Nullable public static String fromUtf8Nullable(@Nullable ByteBuffer buffer)
Decodes a UTF-8 string from the remaining bytes of a buffer. Advances the position of the buffer by Buffer.remaining(). If the buffer is null, this method returns null. If the buffer will never be null, use fromUtf8(ByteBuffer) instead.
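The three ByteBuffer decoding contracts above (decode numBytes and advance, decode everything remaining, and the null-tolerant variant) can be sketched with a relative bulk get, which advances the position as a side effect. This is a hypothetical illustration with made-up class names; a real implementation might decode directly from a heap buffer's backing array instead of copying.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ByteBufferUtf8Sketch {
  // Decodes numBytes bytes starting at the current position and advances the position.
  static String fromUtf8(ByteBuffer buffer, int numBytes) {
    byte[] bytes = new byte[numBytes];
    buffer.get(bytes); // relative bulk get: position += numBytes
    return new String(bytes, StandardCharsets.UTF_8);
  }

  // Decodes all remaining bytes; the buffer must be non-null.
  static String fromUtf8(ByteBuffer buffer) {
    return fromUtf8(buffer, buffer.remaining());
  }

  // Null-tolerant variant: a null buffer yields a null string.
  static String fromUtf8Nullable(ByteBuffer buffer) {
    return buffer == null ? null : fromUtf8(buffer);
  }
}
```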
-
toUtf8
public static byte[] toUtf8(String string)
Converts a string to a UTF-8 byte array.
- Throws:
NullPointerException - if "string" is null
-
toUtf8ByteBuffer
@Nullable public static ByteBuffer toUtf8ByteBuffer(@Nullable String string)
Converts a string to UTF-8 bytes, returning them as a newly-allocated on-heap ByteBuffer. If "string" is null, returns null.
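A minimal sketch of the documented toUtf8ByteBuffer contract, assuming ByteBuffer.wrap over freshly encoded bytes is an acceptable way to obtain a newly-allocated on-heap buffer. The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ToUtf8ByteBufferSketch {
  // Wraps a fresh UTF-8 byte array in a heap buffer (position 0, limit = length);
  // null input yields null.
  static ByteBuffer toUtf8ByteBuffer(String string) {
    return string == null ? null : ByteBuffer.wrap(string.getBytes(StandardCharsets.UTF_8));
  }
}
```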
-
toUtf8WithLimit
public static int toUtf8WithLimit(String string, ByteBuffer byteBuffer)
Encodes "string" into the buffer "byteBuffer", using no more than the number of bytes remaining in the buffer. Only whole characters are encoded. The byteBuffer's position and limit may be changed during the operation, but are reset before this method returns.
- Returns:
the number of bytes written, which may be less than the full encoded string length if there is not enough room in the output buffer.
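The whole-characters-only guarantee can be sketched with the JDK's CharsetEncoder, whose UTF-8 encoder stops with an overflow result before a character that does not fit instead of writing a partial byte sequence. This is a hypothetical sketch with an illustrative class name, not Druid's code. It restores the buffer's position to mirror the documented reset, and it stops early on an unpaired surrogate because a fresh encoder reports malformed input by default.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

final class Utf8WithLimitSketch {
  // Encodes as many whole characters of "string" as fit in byteBuffer's remaining space
  // and returns the number of bytes written.
  static int toUtf8WithLimit(String string, ByteBuffer byteBuffer) {
    CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    int start = byteBuffer.position();
    // Returns OVERFLOW (ignored here) when the next character does not fit.
    encoder.encode(CharBuffer.wrap(string), byteBuffer, true);
    int written = byteBuffer.position() - start;
    byteBuffer.position(start); // restore the caller's position
    return written;
  }
}
```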
-
format
public static String format(String message, Object... formatArgs)
Equivalent of String.format(Locale.ENGLISH, message, formatArgs).
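Pinning Locale.ENGLISH keeps output independent of the JVM's default locale, which matters for locale-sensitive conversions such as %f and the "," grouping flag. A small JDK-only illustration; the wrapper class name is hypothetical.

```java
import java.util.Locale;

final class FormatSketch {
  // Same contract as the documented format(): always formats with Locale.ENGLISH.
  static String format(String message, Object... formatArgs) {
    return String.format(Locale.ENGLISH, message, formatArgs);
  }

  public static void main(String[] args) {
    System.out.println(format("%.2f", 1.5));                       // 1.50, whatever the default locale
    System.out.println(String.format(Locale.GERMAN, "%.2f", 1.5)); // 1,50
  }
}
```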
-
nonStrictFormat
public static String nonStrictFormat(String message, Object... formatArgs)
Formats the string as format(String, Object...), but instead of failing on illegal format, returns the concatenated format string and format arguments. It should be used for unimportant formatting such as logging and exception messages, and is typically not called directly.
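A hypothetical sketch of this fallback behavior: try the strict format, and on IllegalFormatException (which covers bad conversions and missing arguments) return the message followed by its arguments. The "; " separator and the class name are assumptions of this sketch; the text above only says the format string and arguments are concatenated.

```java
import java.util.IllegalFormatException;
import java.util.Locale;

final class NonStrictFormatSketch {
  // Formats strictly when possible; otherwise appends each argument to the raw message.
  static String nonStrictFormat(String message, Object... formatArgs) {
    try {
      return String.format(Locale.ENGLISH, message, formatArgs);
    } catch (IllegalFormatException e) {
      StringBuilder sb = new StringBuilder(message);
      for (Object arg : formatArgs) {
        sb.append("; ").append(arg); // separator is an assumption of this sketch
      }
      return sb.toString();
    }
  }
}
```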
-
encodeForFormat
@Nullable public static String encodeForFormat(@Nullable String s)
Encodes a string "s" for insertion into a format string. Returns null if the input is null.
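Assuming "encoding for insertion into a format string" means escaping the % character (the only character that introduces a specifier in a java.util.Formatter format string), a minimal sketch looks like this; the class name is hypothetical.

```java
final class EncodeForFormatSketch {
  // Doubles every '%' so String.format emits the text literally; null stays null.
  static String encodeForFormat(String s) {
    return s == null ? null : s.replace("%", "%%");
  }
}
```

For example, a user-supplied "100%" can then be embedded in a larger format string without triggering an UnknownFormatConversionException.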
-
urlEncode
@Nullable public static String urlEncode(@Nullable String s)
Encodes a String in application/x-www-form-urlencoded format, with one exception: "+" in the encoded form is replaced with "%20". application/x-www-form-urlencoded encodes spaces as "+", but we use this to encode non-form data as well.
- Parameters:
s - String to be encoded
- Returns:
application/x-www-form-urlencoded format encoded String, but with "+" replaced with "%20".
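A minimal sketch of the documented behavior using java.net.URLEncoder; whether Druid uses this encoder is an assumption, and the class name is hypothetical. Replacing "+" after encoding is safe because a literal "+" in the input is already encoded as "%2B", so any remaining "+" stands for a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UrlEncodeSketch {
  // Form-encodes s as UTF-8, then rewrites the form-style "+" (always a space) as "%20".
  static String urlEncode(String s) {
    if (s == null) {
      return null;
    }
    try {
      return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e); // UTF-8 support is required of every JVM
    }
  }
}
```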
-
removeChar
public static String removeChar(String s, char c)
Removes all occurrences of the given char from the given string. This method is an optimized version of s.replace(String.valueOf(c), "").
-
replaceChar
public static String replaceChar(String s, char c, String replacement)
Replaces all occurrences of the given char in the given string with the given replacement string. This method is an optimized version of s.replace(String.valueOf(c), replacement).
-
replace
public static String replace(String s, String target, String replacement)
Replaces all occurrences of the given target substring in the given string with the given replacement string. This method is an optimized version of s.replace(target, replacement).
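The three replacement methods above can be sketched as a single indexOf-driven loop that copies unchanged runs into a StringBuilder and returns the input itself when the target never occurs. This is a hypothetical illustration of the technique with made-up class names, not the Druid implementation; the empty-target edge case is delegated to String.replace.

```java
final class ReplaceSketch {
  // Replaces every non-overlapping occurrence of target, scanning left to right
  // (the same semantics as String.replace(CharSequence, CharSequence)).
  static String replace(String s, String target, String replacement) {
    if (target.isEmpty()) {
      // Edge case: defer to the JDK, which inserts the replacement at every position.
      return s.replace(target, replacement);
    }
    int pos = s.indexOf(target);
    if (pos < 0) {
      return s; // nothing to replace: return the input without allocating
    }
    StringBuilder sb = new StringBuilder(s.length());
    int prev = 0;
    while (pos >= 0) {
      sb.append(s, prev, pos).append(replacement); // unchanged run, then the replacement
      prev = pos + target.length();
      pos = s.indexOf(target, prev);
    }
    sb.append(s, prev, s.length()); // tail after the last match
    return sb.toString();
  }

  // Single-char variants expressed in terms of replace().
  static String replaceChar(String s, char c, String replacement) {
    return replace(s, String.valueOf(c), replacement);
  }

  static String removeChar(String s, char c) {
    return replaceChar(s, c, "");
  }
}
```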
-
nullToEmptyNonDruidDataString
public static String nullToEmptyNonDruidDataString(@Nullable String string)
Returns the given string if it is non-null; the empty string otherwise. This method should only be used in places where null-to-empty conversion is irrelevant to null handling of the data.
- Parameters:
string - the string to test and possibly return
- Returns:
string itself if it is non-null; "" if it is null
-
emptyToNullNonDruidDataString
@Nullable public static String emptyToNullNonDruidDataString(@Nullable String string)
Returns the given string if it is nonempty; null otherwise. This method should only be used in places where empty-to-null conversion is irrelevant to null handling of the data.
- Parameters:
string - the string to test and possibly return
- Returns:
string itself if it is nonempty; null if it is empty or null
-
utf8Base64
public static String utf8Base64(String input)
Convert an input to base 64 and return the UTF-8 string of that byte array.
- Parameters:
input - The string to convert to base64
- Returns:
the base64 of the input in string form
-
encodeBase64
public static byte[] encodeBase64(byte[] input)
Convert an input byte array into a newly-allocated byte array using the Base64 encoding scheme.
- Parameters:
input - The byte array to convert to base64
- Returns:
the base64 of the input in byte array form
-
encodeBase64String
public static String encodeBase64String(byte[] input)
Convert an input byte array into a string using the Base64 encoding scheme.
- Parameters:
input - The byte array to convert to base64
- Returns:
the base64 of the input in string form
-
decodeBase64
public static byte[] decodeBase64(byte[] input)
Decode an input byte array using the Base64 encoding scheme and return a newly-allocated byte array.
- Parameters:
input - The byte array to decode from base64
- Returns:
a newly-allocated byte array
-
decodeBase64String
public static byte[] decodeBase64String(String input)
Decode an input string using the Base64 encoding scheme and return a newly-allocated byte array.
- Parameters:
input - The string to decode from base64
- Returns:
a newly-allocated byte array
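The Base64 helpers above map naturally onto java.util.Base64. This sketch assumes the standard (not URL-safe) alphabet with padding, which is what Base64.getEncoder() and Base64.getDecoder() use; whether Druid uses the same codec, and the class name, are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64Sketch {
  static byte[] encodeBase64(byte[] input) {
    return Base64.getEncoder().encode(input); // newly-allocated output array
  }

  static String encodeBase64String(byte[] input) {
    return Base64.getEncoder().encodeToString(input);
  }

  static byte[] decodeBase64(byte[] input) {
    return Base64.getDecoder().decode(input); // throws IllegalArgumentException on bad input
  }

  static byte[] decodeBase64String(String input) {
    return Base64.getDecoder().decode(input);
  }

  // utf8Base64: Base64 of the string's UTF-8 bytes, returned as a String.
  static String utf8Base64(String input) {
    return encodeBase64String(input.getBytes(StandardCharsets.UTF_8));
  }
}
```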
-
repeat
public static String repeat(String s, int count)
Returns a string whose value is the concatenation of the string s repeated count times. If count is zero or s is empty, the empty string is returned.
This method may be used to create space padding for formatting text or zero padding for formatting numbers.
- Parameters:
count - number of times to repeat
- Returns:
A string composed of s repeated count times, or the empty string if count is zero or s is empty.
- Throws:
IllegalArgumentException - if count is negative.
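A minimal sketch of repeat under the contract above; the class name is hypothetical. Math.multiplyExact guards the capacity calculation against int overflow, and on Java 11 and later String.repeat(int) offers the same contract.

```java
final class RepeatSketch {
  // Returns s concatenated count times; "" when count is 0 or s is empty.
  static String repeat(String s, int count) {
    if (count < 0) {
      throw new IllegalArgumentException("count is negative: " + count);
    }
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    StringBuilder sb = new StringBuilder(Math.multiplyExact(s.length(), count));
    for (int i = 0; i < count; i++) {
      sb.append(s);
    }
    return sb.toString();
  }
}
```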
-
lpad
@Nonnull public static String lpad(@Nonnull String base, int len, @Nonnull String pad)
Returns the string left-padded with the string pad to a length of len characters. If base is longer than len, the return value is shortened to len characters. This function is migrated from Flink's Scala function with a minor refactor: https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/functions/ScalarFunctions.scala
- Modified to handle an empty pad string.
- Padding to a negative length returns an empty string.
- Parameters:
base - The base string to be padded
len - The length of the padded string
pad - The pad string
- Returns:
the string left-padded with pad to a length of len, or an empty string if len is less than 0.
-
rpad
@Nonnull public static String rpad(@Nonnull String base, int len, @Nonnull String pad)
Returns the string right-padded with the string pad to a length of len characters. If base is longer than len, the return value is shortened to len characters. This function is migrated from Flink's Scala function with a minor refactor: https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/functions/ScalarFunctions.scala
- Modified to handle an empty pad string.
- Modified to only copy the pad string if needed (this implementation mimics lpad).
- Padding to a negative length returns an empty string.
- Parameters:
base - The base string to be padded
len - The length of the padded string
pad - The pad string
- Returns:
the string right-padded with pad to a length of len, or an empty string if len is less than 0.
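A hypothetical sketch of the padding rules described for lpad and rpad: a negative len yields an empty string, a base longer than len is cut to len characters (this sketch keeps the leading characters, as SQL LPAD/RPAD commonly do, which is an assumption here), and the pad string is repeated and then trimmed to fit. How an empty pad is handled is not spelled out above; this sketch assumes base is returned unpadded. Class and method names are illustrative.

```java
final class PadSketch {
  // Left-pads base with repetitions of pad up to len chars.
  static String lpad(String base, int len, String pad) {
    if (len <= 0) {
      return "";
    }
    if (base.length() >= len) {
      return base.substring(0, len); // keep the leading len chars
    }
    if (pad.isEmpty()) {
      return base; // assumption: nothing to pad with
    }
    int padLen = len - base.length();
    StringBuilder sb = new StringBuilder(len);
    while (sb.length() < padLen) {
      sb.append(pad);
    }
    sb.setLength(padLen); // drop pad chars that overshoot
    return sb.append(base).toString();
  }

  // Right-pads base with repetitions of pad up to len chars.
  static String rpad(String base, int len, String pad) {
    if (len <= 0) {
      return "";
    }
    if (base.length() >= len) {
      return base.substring(0, len);
    }
    if (pad.isEmpty()) {
      return base;
    }
    StringBuilder sb = new StringBuilder(len).append(base);
    while (sb.length() < len) {
      sb.append(pad);
    }
    sb.setLength(len);
    return sb.toString();
  }
}
```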
-
chop
@Nullable public static String chop(@Nullable String s, int maxBytes)
Returns the string truncated to maxBytes. If the input string already fits within maxBytes, it is returned unchanged.
- Parameters:
s - The input string to possibly be truncated
maxBytes - The maximum number of bytes that the input string will be truncated to
- Returns:
the string after truncation to maxBytes
-
fastLooseChop
@Nullable public static String fastLooseChop(@Nullable String s, int maxBytes)
Shortens "s" to "maxBytes" chars. This is fast and loose because it counts *chars*, not *bytes*. Use chop(String, int) for slower but accurate chopping.
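The difference between chop and fastLooseChop is easiest to see with non-ASCII text. The sketch below (hypothetical names, assuming well-formed UTF-16 input) counts UTF-8 bytes per code point for chop, so it never splits a character, while the loose variant counts chars and can therefore exceed maxBytes once encoded, or cut a surrogate pair in half.

```java
final class ChopSketch {
  // Keeps the longest prefix of whole code points whose UTF-8 encoding fits in maxBytes.
  static String chop(String s, int maxBytes) {
    if (s == null) {
      return null;
    }
    int bytes = 0;
    int i = 0;
    while (i < s.length()) {
      int cp = s.codePointAt(i);
      // UTF-8 length of one code point: 1 to 4 bytes.
      int cpBytes = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
      if (bytes + cpBytes > maxBytes) {
        return s.substring(0, i);
      }
      bytes += cpBytes;
      i += Character.charCount(cp);
    }
    return s; // the whole string fits
  }

  // "Fast and loose": treats maxBytes as a char count.
  static String fastLooseChop(String s, int maxBytes) {
    if (s == null || s.length() <= maxBytes) {
      return s;
    }
    return s.substring(0, maxBytes);
  }
}
```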
-
-