Package org.apache.druid.segment
Class DictionaryEncodedColumnIndexer<KeyType,ActualType extends Comparable<ActualType>>
- java.lang.Object
-
- org.apache.druid.segment.DictionaryEncodedColumnIndexer<KeyType,ActualType>
-
- All Implemented Interfaces:
DimensionIndexer<Integer,KeyType,ActualType>
- Direct Known Subclasses:
StringDimensionIndexer
public abstract class DictionaryEncodedColumnIndexer<KeyType,ActualType extends Comparable<ActualType>> extends Object implements DimensionIndexer<Integer,KeyType,ActualType>
Basic structure for indexing dictionary-encoded columns.
-
-
Field Summary
Fields (Modifier and Type, Field):
protected DimensionDictionary<ActualType> dimLookup
protected boolean isSparse
protected SortedDimensionDictionary<ActualType> sortedLookup
-
Constructor Summary
Constructors (Constructor, Description):
DictionaryEncodedColumnIndexer(@NotNull DimensionDictionary<ActualType> dimLookup)
Creates a new DictionaryEncodedColumnIndexer.
-
Method Summary
All Methods, Instance Methods, Concrete Methods (Modifier and Type, Method, Description):
ColumnValueSelector convertUnsortedValuesToSorted(ColumnValueSelector selectorWithUnsortedValues)
Converts dictionary-encoded row values from unspecified (random) encoding order to sorted encoding.
protected boolean dictionaryEncodesAllValues()
Returns true if all values are encoded in dimLookup.
protected ActualType getActualValue(int intermediateValue, boolean idSorted)
int getCardinality()
Get the cardinality of this dimension's values.
protected int getEncodedValue(ActualType fullValue, boolean idSorted)
ActualType getMaxValue()
Get the maximum dimension value seen by this indexer.
ActualType getMinValue()
Get the minimum dimension value seen by this indexer.
int getSortedEncodedValueFromUnsorted(Integer unsortedIntermediateValue)
CloseableIndexed<ActualType> getSortedIndexedValues()
Returns an indexed structure of this dimension's sorted actual values.
Integer getUnsortedEncodedValueFromSorted(Integer sortedIntermediateValue)
Given an encoded value that was ordered by associated actual value, return the equivalent encoded value ordered by time of ingestion.
ColumnValueSelector<?> makeColumnValueSelector(IncrementalIndexRowHolder currEntry, IncrementalIndex.DimensionDesc desc)
Return an object used to read values from this indexer's column.
void setSparseIndexed()
This method will be called while building an IncrementalIndex whenever a known dimension column (either through an explicit schema on the ingestion spec, or auto-discovered while processing rows) is absent in any row that is processed, to allow an indexer to account for any missing rows if necessary.
protected SortedDimensionDictionary<ActualType> sortedLookup()
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.druid.segment.DimensionIndexer
checkUnsortedEncodedKeyComponentsEqual, compareUnsortedEncodedKeyComponents, convertUnsortedEncodedKeyComponentToActualList, fillBitmapsFromUnsortedEncodedKeyComponent, getColumnCapabilities, getFormat, getUnsortedEncodedKeyComponentHashCode, makeDimensionSelector, processRowValsToUnsortedEncodedKeyComponent
-
-
-
-
Field Detail
-
dimLookup
protected final DimensionDictionary<ActualType> dimLookup
-
isSparse
protected volatile boolean isSparse
-
sortedLookup
@Nullable protected SortedDimensionDictionary<ActualType> sortedLookup
-
-
Constructor Detail
-
DictionaryEncodedColumnIndexer
public DictionaryEncodedColumnIndexer(@NotNull DimensionDictionary<ActualType> dimLookup)
Creates a new DictionaryEncodedColumnIndexer.
- Parameters:
dimLookup - Dimension dictionary used to look up dimension values.
-
-
Method Detail
-
setSparseIndexed
public void setSparseIndexed()
Description copied from interface: DimensionIndexer
This method will be called while building an IncrementalIndex whenever a known dimension column (either through an explicit schema on the ingestion spec, or auto-discovered while processing rows) is absent in any row that is processed, to allow an indexer to account for any missing rows if necessary. Useful so that a string DimensionSelector built on top of an IncrementalIndex may accurately report DimensionDictionarySelector.nameLookupPossibleInAdvance() by allowing it to track if it has any implicit null-valued rows. At index persist/merge time all missing columns for a row will be explicitly replaced with the appropriate null or default value.
- Specified by:
setSparseIndexed in interface DimensionIndexer<Integer,KeyType,ActualType extends Comparable<ActualType>>
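The sparse-tracking idea described above can be sketched as follows. This is a minimal illustration, not Druid's implementation: the class `SparseAwareDictionary` and its `add` method are hypothetical, and the rule used in `dictionaryEncodesAllValues()` (the dictionary covers every row unless some row lacked the column and null was never explicitly added) is an assumption made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an indexer's dictionary; not part of Druid.
class SparseAwareDictionary {
    // Maps each value to its ID in order of first appearance.
    // A null key represents an explicitly ingested null value.
    private final Map<String, Integer> ids = new LinkedHashMap<>();

    // Set once any processed row turns out to lack this column.
    private volatile boolean isSparse = false;

    // Returns the existing ID for the value, or assigns the next free one.
    int add(String value) {
        Integer existing = ids.get(value);
        if (existing != null) {
            return existing;
        }
        int id = ids.size();
        ids.put(value, id);
        return id;
    }

    void setSparseIndexed() {
        isSparse = true;
    }

    // Assumed rule: implicit nulls (from rows missing the column) are only
    // covered by the dictionary if null was also added explicitly.
    boolean dictionaryEncodesAllValues() {
        return !isSparse || ids.containsKey(null);
    }
}
```

Under these assumptions, a selector built on such a dictionary could report that name lookup is possible in advance only while `dictionaryEncodesAllValues()` holds.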
-
getSortedEncodedValueFromUnsorted
public int getSortedEncodedValueFromUnsorted(Integer unsortedIntermediateValue)
-
getUnsortedEncodedValueFromSorted
public Integer getUnsortedEncodedValueFromSorted(Integer sortedIntermediateValue)
Description copied from interface: DimensionIndexer
Given an encoded value that was ordered by associated actual value, return the equivalent encoded value ordered by time of ingestion. Using the example in the DimensionIndexer class description: getUnsortedEncodedValueFromSorted(2) would return 0.
- Specified by:
getUnsortedEncodedValueFromSorted in interface DimensionIndexer<Integer,KeyType,ActualType extends Comparable<ActualType>>
- Parameters:
sortedIntermediateValue - value to convert
- Returns:
converted value
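The relationship between the two encodings can be sketched with a small stand-alone class. This is an illustration under stated assumptions, not Druid code: `SortedIdSketch`, `add`, and `sort` are hypothetical names, and the ingestion order "World", "Hello", "Apple" is assumed so that the result agrees with the return value quoted above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of unsorted (ingestion-order) vs. sorted IDs.
class SortedIdSketch {
    private final List<String> byUnsortedId = new ArrayList<>(); // index = unsorted ID
    private final Map<String, Integer> unsortedIds = new HashMap<>();
    private int[] unsortedToSorted;
    private int[] sortedToUnsorted;

    // Assigns IDs in order of first appearance (the "unsorted" encoding).
    int add(String value) {
        Integer existing = unsortedIds.get(value);
        if (existing != null) {
            return existing;
        }
        int id = byUnsortedId.size();
        byUnsortedId.add(value);
        unsortedIds.put(value, id);
        return id;
    }

    // Builds both conversion tables from the values seen so far.
    void sort() {
        List<String> sorted = new ArrayList<>(byUnsortedId);
        Collections.sort(sorted);
        unsortedToSorted = new int[sorted.size()];
        sortedToUnsorted = new int[sorted.size()];
        for (int sortedId = 0; sortedId < sorted.size(); sortedId++) {
            int unsortedId = unsortedIds.get(sorted.get(sortedId));
            sortedToUnsorted[sortedId] = unsortedId;
            unsortedToSorted[unsortedId] = sortedId;
        }
    }

    int getSortedEncodedValueFromUnsorted(int unsortedId) {
        return unsortedToSorted[unsortedId];
    }

    int getUnsortedEncodedValueFromSorted(int sortedId) {
        return sortedToUnsorted[sortedId];
    }
}
```

After adding "World", "Hello", "Apple" in that order and calling `sort()`, the sorted encoding is "Apple"=0, "Hello"=1, "World"=2, so `getUnsortedEncodedValueFromSorted(2)` yields 0 and `getSortedEncodedValueFromUnsorted(0)` yields 2.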
-
getSortedIndexedValues
public CloseableIndexed<ActualType> getSortedIndexedValues()
Description copied from interface: DimensionIndexer
Returns an indexed structure of this dimension's sorted actual values. The integer IDs represent the ordering of the sorted values. Using the example in the class description: "Apple"=0, "Hello"=1, "World"=2
- Specified by:
getSortedIndexedValues in interface DimensionIndexer<Integer,KeyType,ActualType extends Comparable<ActualType>>
- Returns:
Sorted index of actual values
-
getMinValue
public ActualType getMinValue()
Description copied from interface: DimensionIndexer
Get the minimum dimension value seen by this indexer.
NOTE: On an in-memory segment (IncrementalIndex), we can determine min/max values by looking at the stream of row values seen in calls to processSingleRowValToIndexKey(). However, on a disk-backed segment (QueryableIndex), the numeric dimensions do not currently have any supporting index structures that can be used to efficiently determine min/max values. When numeric dimension support is added, the segment format should be changed to store min/max values, to avoid performing a full-column scan to determine these values for numeric dims.
- Specified by:
getMinValue in interface DimensionIndexer<Integer,KeyType,ActualType extends Comparable<ActualType>>
- Returns:
min value
-
getMaxValue
public ActualType getMaxValue()
Description copied from interface: DimensionIndexer
Get the maximum dimension value seen by this indexer.
- Specified by:
getMaxValue in interface DimensionIndexer<Integer,KeyType,ActualType extends Comparable<ActualType>>
- Returns:
max value
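Determining min/max values from the stream of row values, as the getMinValue note describes for in-memory segments, can be sketched as below. `MinMaxTracker` and `observe` are hypothetical names introduced for illustration, and skipping null values is an assumption of this sketch rather than documented Druid behavior.

```java
// Hypothetical running min/max tracker over a stream of values; not Druid code.
class MinMaxTracker<T extends Comparable<T>> {
    private T min;
    private T max;

    // Called once per value seen; nulls are skipped (an assumption).
    void observe(T value) {
        if (value == null) {
            return;
        }
        if (min == null || value.compareTo(min) < 0) {
            min = value;
        }
        if (max == null || value.compareTo(max) > 0) {
            max = value;
        }
    }

    // Both getters return null until a non-null value has been observed.
    T getMinValue() {
        return min;
    }

    T getMaxValue() {
        return max;
    }
}
```

Because each value is compared as it arrives, no second pass over the column is needed, which is the property the note contrasts with a full-column scan on disk-backed segments.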
-
getCardinality
public int getCardinality()
Description copied from interface: DimensionIndexer
Get the cardinality of this dimension's values.
- Specified by:
getCardinality in interface DimensionIndexer<Integer,KeyType,ActualType extends Comparable<ActualType>>
- Returns:
value cardinality
-
makeColumnValueSelector
public ColumnValueSelector<?> makeColumnValueSelector(IncrementalIndexRowHolder currEntry, IncrementalIndex.DimensionDesc desc)
Description copied from interface: DimensionIndexer
Return an object used to read values from this indexer's column.
- Specified by:
makeColumnValueSelector in interface DimensionIndexer<Integer,KeyType,ActualType extends Comparable<ActualType>>
- Parameters:
currEntry - Provides access to the current Row object in the Cursor
desc - Descriptor object for this dimension within an IncrementalIndex
- Returns:
A new object that reads rows from currEntry
-
convertUnsortedValuesToSorted
public ColumnValueSelector convertUnsortedValuesToSorted(ColumnValueSelector selectorWithUnsortedValues)
Description copied from interface: DimensionIndexer
Converts dictionary-encoded row values from unspecified (random) encoding order to sorted encoding. This step is needed to be able to correctly map per-segment encoded values to global values on the next conversion step, DimensionMerger.convertSortedSegmentRowValuesToMergedRowValues(int, org.apache.druid.segment.ColumnValueSelector). The latter method requires sorted encoding values on the input, because DimensionMerger.writeMergedValueDictionary(java.util.List<org.apache.druid.segment.IndexableAdapter>) takes sorted lookups as its input. For columns which do not use the DimensionMerger to merge dictionary-encoded values, this method should provide a selector which is compatible with the expectations of DimensionMerger.processMergedRow(ColumnValueSelector), which might simply be to pass through the 'unsorted' selector.
- Specified by:
convertUnsortedValuesToSorted in interface DimensionIndexer<Integer,KeyType,ActualType extends Comparable<ActualType>>
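The conversion described above amounts to wrapping one selector in another that translates IDs on read. The sketch below shows that wrapping pattern only. It uses `java.util.function.IntSupplier` as a simplified stand-in for a selector that yields a single encoded value per row, and the class name, method name, and `unsortedToSorted` table are all hypothetical rather than taken from Druid.

```java
import java.util.function.IntSupplier;

// Hypothetical illustration of the wrapping pattern; not Druid code.
class SortedIdSelectorSketch {
    // Returns a reader that translates each unsorted ID produced by the
    // underlying reader into its sorted ID, at the moment it is read.
    static IntSupplier convertUnsortedToSorted(IntSupplier unsortedSelector, int[] unsortedToSorted) {
        return () -> unsortedToSorted[unsortedSelector.getAsInt()];
    }
}
```

The wrapper holds no row state of its own, so it follows whatever row the underlying reader currently points at. A pass-through variant, as mentioned for columns that do not use the DimensionMerger, would simply return the original reader.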
-
dictionaryEncodesAllValues
protected boolean dictionaryEncodesAllValues()
Returns true if all values are encoded in dimLookup.
-
sortedLookup
protected SortedDimensionDictionary<ActualType> sortedLookup()
-
getActualValue
@Nullable protected ActualType getActualValue(int intermediateValue, boolean idSorted)
-
getEncodedValue
protected int getEncodedValue(@Nullable ActualType fullValue, boolean idSorted)
-
-