Package org.apache.druid.segment
Class StringDimensionIndexer
java.lang.Object
  org.apache.druid.segment.DictionaryEncodedColumnIndexer<int[],String>
    org.apache.druid.segment.StringDimensionIndexer
All Implemented Interfaces:
DimensionIndexer<Integer,int[],String>
public class StringDimensionIndexer extends DictionaryEncodedColumnIndexer<int[],String>
-
Field Summary
-
Fields inherited from class org.apache.druid.segment.DictionaryEncodedColumnIndexer
dimLookup, isSparse, sortedLookup
-
Constructor Summary
StringDimensionIndexer(DimensionSchema.MultiValueHandling multiValueHandling, boolean hasBitmapIndexes, boolean hasSpatialIndexes, boolean useMaxMemoryEstimates)
-
Method Summary
Modifier and Type / Method / Description
boolean
checkUnsortedEncodedKeyComponentsEqual(int[] lhs, int[] rhs)
Check if two row value arrays from Row keys are equal.
int
compareUnsortedEncodedKeyComponents(int[] lhs, int[] rhs)
Compares the row values for this DimensionIndexer's dimension from a Row key.
Object
convertUnsortedEncodedKeyComponentToActualList(int[] key)
Given a row value array from a Row key, as described in the documentation for DimensionIndexer.compareUnsortedEncodedKeyComponents(EncodedKeyComponentType, EncodedKeyComponentType), convert the unsorted encoded values to a list of actual values.
long
estimateEncodedKeyComponentSize(int[] keys)
Estimates size of the given key component.
void
fillBitmapsFromUnsortedEncodedKeyComponent(int[] key, int rowNum, MutableBitmap[] bitmapIndexes, BitmapFactory factory)
Helper function for building bitmap indexes for integer-encoded dimensions.
ColumnCapabilities
getColumnCapabilities()
int
getUnsortedEncodedKeyComponentHashCode(int[] key)
Given a row value array from a Row key, generate a hashcode.
DimensionSelector
makeDimensionSelector(DimensionSpec spec, IncrementalIndexRowHolder currEntry, IncrementalIndex.DimensionDesc desc)
Return an object used to read values from this indexer's column as Strings.
EncodedKeyComponent<int[]>
processRowValsToUnsortedEncodedKeyComponent(Object dimValues, boolean reportParseExceptions)
Encodes the given row value(s) of the dimension to be used within a row key.
-
Methods inherited from class org.apache.druid.segment.DictionaryEncodedColumnIndexer
convertUnsortedValuesToSorted, dictionaryEncodesAllValues, getActualValue, getCardinality, getEncodedValue, getMaxValue, getMinValue, getSortedEncodedValueFromUnsorted, getSortedIndexedValues, getUnsortedEncodedValueFromSorted, makeColumnValueSelector, setSparseIndexed, sortedLookup
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.druid.segment.DimensionIndexer
getFormat
-
Constructor Detail
-
StringDimensionIndexer
public StringDimensionIndexer(DimensionSchema.MultiValueHandling multiValueHandling, boolean hasBitmapIndexes, boolean hasSpatialIndexes, boolean useMaxMemoryEstimates)
-
Method Detail
-
processRowValsToUnsortedEncodedKeyComponent
public EncodedKeyComponent<int[]> processRowValsToUnsortedEncodedKeyComponent(@Nullable Object dimValues, boolean reportParseExceptions)
Description copied from interface:DimensionIndexer
Encodes the given row value(s) of the dimension to be used within a row key. It also updates the internal state of the DimensionIndexer, e.g. the dimLookup. For example, the dictionary-encoded String-type column will return an int[] containing dictionary IDs.
Parameters:
dimValues - Value(s) of the dimension in a row. This can either be a single value or a list of values (for multi-valued dimensions).
reportParseExceptions - true if parse exceptions should be reported, false otherwise
Returns:
Encoded dimension value(s) to be used as a component for the row key. Contains an object of the DimensionIndexer and the effective size of the key component in bytes.
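The dictionary encoding described above can be illustrated with a minimal, standalone sketch. It is not Druid's implementation: the class DictionaryEncodingSketch and its methods are hypothetical names, a plain HashMap stands in for dimLookup, and null handling, multi-value handling modes, parse-exception reporting, and size estimation are all omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a dictionary-encoding indexer; not part of Druid. */
final class DictionaryEncodingSketch {
    // Assumption: IDs are handed out in first-seen order, which is why the
    // resulting encoding is "unsorted" with respect to the string values.
    private final Map<String, Integer> valueToId = new HashMap<>();
    private final List<String> idToValue = new ArrayList<>();

    /** Returns the existing ID for a value, or assigns the next unused ID. */
    int idFor(String value) {
        Integer existing = valueToId.get(value);
        if (existing != null) {
            return existing;
        }
        int id = idToValue.size();
        valueToId.put(value, id);
        idToValue.add(value);
        return id;
    }

    /** Encodes a single value or a List of values into an int[] of dictionary IDs. */
    int[] encode(Object dimValues) {
        if (dimValues instanceof List) {
            List<?> values = (List<?>) dimValues;
            int[] ids = new int[values.size()];
            for (int i = 0; i < ids.length; i++) {
                ids[i] = idFor(String.valueOf(values.get(i)));
            }
            return ids;
        }
        return new int[] {idFor(String.valueOf(dimValues))};
    }

    public static void main(String[] args) {
        DictionaryEncodingSketch sketch = new DictionaryEncodingSketch();
        System.out.println(Arrays.toString(sketch.encode("a")));
        System.out.println(Arrays.toString(sketch.encode(Arrays.asList("b", "a"))));
    }
}
```

Note that encoding has a side effect: each previously unseen value grows the dictionary, which corresponds to the "updates the internal state" remark in the description.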
-
estimateEncodedKeyComponentSize
public long estimateEncodedKeyComponentSize(int[] keys)
Estimates size of the given key component. Deprecated method. Use processRowValsToUnsortedEncodedKeyComponent(Object, boolean) and EncodedKeyComponent.getEffectiveSizeBytes().
-
compareUnsortedEncodedKeyComponents
public int compareUnsortedEncodedKeyComponents(int[] lhs, int[] rhs)
Description copied from interface:DimensionIndexer
Compares the row values for this DimensionIndexer's dimension from a Row key. The dimension value arrays within a Row key always use the "unsorted" ordering for encoded values. The row values are passed to this function as an Object; the implementer should cast them to the type appropriate for this dimension. For example, a dictionary-encoded String implementation would cast the Objects to int[] arrays.
When comparing, if the two arrays have different lengths, the shorter array should be ordered first. Otherwise, the implementer of this function should iterate through the unsorted encoded values, converting them to their actual type (e.g., performing a dictionary lookup for a dict-encoded String dimension), and comparing the actual values until a difference is found. Refer to StringDimensionIndexer.compareUnsortedEncodedKeyComponents() for a reference implementation.
The comparison rules used by this method should match the rules used by DimensionHandler.getEncodedValueSelectorComparator(); otherwise incorrect ordering/merging of rows can occur during ingestion, causing issues such as imperfect rollup.
Parameters:
lhs - dimension value array from a Row key
rhs - dimension value array from a Row key
Returns:
comparison of the two arrays
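The comparison rules described above (shorter array first, then compare looked-up values position by position) can be sketched as follows. This is an illustration under stated assumptions, not the Druid source: UnsortedKeyComparisonSketch is a hypothetical name, a List<String> indexed by ID stands in for the dictionary, and null values are not handled.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the comparison contract; not part of Druid. */
final class UnsortedKeyComparisonSketch {
    /**
     * Compares two arrays of unsorted dictionary IDs.
     * The dictionary parameter is an assumption: index i holds the value for ID i.
     */
    static int compare(int[] lhs, int[] rhs, List<String> dictionary) {
        // Rule 1: arrays of different lengths order by length, shorter first.
        if (lhs.length != rhs.length) {
            return Integer.compare(lhs.length, rhs.length);
        }
        // Rule 2: same length, so compare actual values until a difference is found.
        for (int i = 0; i < lhs.length; i++) {
            if (lhs[i] == rhs[i]) {
                continue; // identical IDs map to identical values
            }
            int c = dictionary.get(lhs[i]).compareTo(dictionary.get(rhs[i]));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // ID 0 was assigned to "zebra" first, so ID order differs from value order.
        List<String> dictionary = Arrays.asList("zebra", "apple");
        System.out.println(compare(new int[] {0}, new int[] {1}, dictionary) > 0);
        System.out.println(compare(new int[] {0}, new int[] {0, 1}, dictionary) < 0);
    }
}
```

Comparing the raw IDs instead of the looked-up values would give a different answer here, because "zebra" holds the smaller ID but the larger value. That is why the lookup step matters for unsorted encodings.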
-
checkUnsortedEncodedKeyComponentsEqual
public boolean checkUnsortedEncodedKeyComponentsEqual(int[] lhs, int[] rhs)
Description copied from interface:DimensionIndexer
Check if two row value arrays from Row keys are equal.
Parameters:
lhs - dimension value array from a Row key
rhs - dimension value array from a Row key
Returns:
true if the two arrays are equal
-
getUnsortedEncodedKeyComponentHashCode
public int getUnsortedEncodedKeyComponentHashCode(int[] key)
Description copied from interface:DimensionIndexer
Given a row value array from a Row key, generate a hashcode.
Parameters:
key - dimension value array from a Row key
Returns:
hashcode of the array
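For int[] key components, the equality check and the hash code need to be content-based and consistent with each other, because a Java array's own equals() and hashCode() are identity-based. The sketch below assumes the java.util.Arrays helpers are a suitable way to get that behavior; the class and method names are hypothetical, and this page does not state how the Druid implementation computes either result.

```java
import java.util.Arrays;

/** Hypothetical illustration of content-based equality and hashing; not part of Druid. */
final class KeyComponentEqualitySketch {
    /** True when both arrays hold the same IDs in the same order. */
    static boolean componentsEqual(int[] lhs, int[] rhs) {
        return Arrays.equals(lhs, rhs);
    }

    /** Hash derived from the array contents, so equal arrays hash alike. */
    static int componentHashCode(int[] key) {
        return Arrays.hashCode(key);
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 4};
        int[] b = {1, 3, 4};
        // Two distinct array objects with the same contents.
        System.out.println(a.equals(b));                                   // identity-based
        System.out.println(componentsEqual(a, b));                         // content-based
        System.out.println(componentHashCode(a) == componentHashCode(b));  // consistent hash
    }
}
```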
-
getColumnCapabilities
public ColumnCapabilities getColumnCapabilities()
-
makeDimensionSelector
public DimensionSelector makeDimensionSelector(DimensionSpec spec, IncrementalIndexRowHolder currEntry, IncrementalIndex.DimensionDesc desc)
Description copied from interface:DimensionIndexer
Return an object used to read values from this indexer's column as Strings.
Parameters:
spec - Specifies the output name of a dimension and any extraction functions to be applied.
currEntry - Provides access to the current Row object in the Cursor
desc - Descriptor object for this dimension within an IncrementalIndex
Returns:
A new object that reads rows from currEntry
-
convertUnsortedEncodedKeyComponentToActualList
@Nullable public Object convertUnsortedEncodedKeyComponentToActualList(int[] key)
Description copied from interface:DimensionIndexer
Given a row value array from a Row key, as described in the documentation for DimensionIndexer.compareUnsortedEncodedKeyComponents(EncodedKeyComponentType, EncodedKeyComponentType), convert the unsorted encoded values to a list of actual values. If the key has one element, this method should return a single Object instead of a list.
Parameters:
key - dimension value array from a Row key
Returns:
single value or list containing the actual values corresponding to the encoded values in the input array
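The single-value versus list behavior can be sketched as below. The names are hypothetical, a List<String> indexed by ID stands in for the dictionary, and returning null for an empty or null key is an assumption suggested only by the @Nullable annotation on the method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of decoding a key component; not part of Druid. */
final class DecodeKeyComponentSketch {
    /**
     * Decodes dictionary IDs back to their values.
     * Returns a single String for a one-element key, a List for longer keys,
     * and null for an empty or null key (the null case is an assumption).
     */
    static Object toActual(int[] key, List<String> dictionary) {
        if (key == null || key.length == 0) {
            return null;
        }
        if (key.length == 1) {
            return dictionary.get(key[0]);
        }
        List<String> values = new ArrayList<>(key.length);
        for (int id : key) {
            values.add(dictionary.get(id));
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("a", "b", "c");
        System.out.println(toActual(new int[] {1}, dictionary));
        System.out.println(toActual(new int[] {0, 2}, dictionary));
    }
}
```

Callers therefore have to check the runtime type of the result, since the declared return type is Object.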
-
fillBitmapsFromUnsortedEncodedKeyComponent
public void fillBitmapsFromUnsortedEncodedKeyComponent(int[] key, int rowNum, MutableBitmap[] bitmapIndexes, BitmapFactory factory)
Description copied from interface:DimensionIndexer
Helper function for building bitmap indexes for integer-encoded dimensions. Called by IncrementalIndexAdapter as it iterates through its sequence of rows.
Given a row value array from a Row key, with the current row number indicated by "rowNum", set the index for "rowNum" in the bitmap index for each value that appears in the row value array. For example, if key is an int[] array with values [1,3,4] for a dictionary-encoded String dimension, and rowNum is 27, this function would set bit 27 in bitmapIndexes[1], bitmapIndexes[3], and bitmapIndexes[4].
See StringDimensionIndexer.fillBitmapsFromUnsortedEncodedKeyComponent() for a reference implementation. If a dimension type does not support bitmap indexes, this function will not be called and can be left unimplemented.
Parameters:
key - dimension value array from a Row key
rowNum - current row number
bitmapIndexes - array of bitmaps, indexed by integer dimension value
factory - bitmap factory
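The [1,3,4] and row 27 example from the description can be reproduced with a standalone sketch. Here java.util.BitSet stands in for MutableBitmap, and creating a missing bitmap on first use stands in for whatever the factory parameter does; both substitutions, and the class name, are assumptions for illustration.

```java
import java.util.BitSet;

/** Hypothetical illustration of filling per-value bitmaps; not part of Druid. */
final class FillBitmapsSketch {
    /** Sets bit rowNum in the bitmap of every dictionary ID present in key. */
    static void fillBitmaps(int[] key, int rowNum, BitSet[] bitmapIndexes) {
        for (int id : key) {
            if (bitmapIndexes[id] == null) {
                // Assumption: empty slots are created lazily on first use.
                bitmapIndexes[id] = new BitSet();
            }
            bitmapIndexes[id].set(rowNum);
        }
    }

    public static void main(String[] args) {
        BitSet[] bitmapIndexes = new BitSet[5]; // one slot per dictionary ID 0..4
        fillBitmaps(new int[] {1, 3, 4}, 27, bitmapIndexes);
        for (int id = 0; id < bitmapIndexes.length; id++) {
            System.out.println("bitmapIndexes[" + id + "] = " + bitmapIndexes[id]);
        }
    }
}
```

After the call, each bitmap answers the question "which rows contain this value", which is the inverted-index shape that filtering on a dictionary-encoded dimension relies on.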