Package org.apache.druid.segment
Class DictionaryEncodedColumnMerger<T extends Comparable<T>>
- java.lang.Object
-
- org.apache.druid.segment.DictionaryEncodedColumnMerger<T>
-
- All Implemented Interfaces:
DimensionMerger, DimensionMergerV9
- Direct Known Subclasses:
StringDimensionMergerV9
public abstract class DictionaryEncodedColumnMerger<T extends Comparable<T>> extends Object implements DimensionMergerV9
Base structure for merging dictionary encoded columns
-
-
Nested Class Summary
Nested Classes
- static class DictionaryEncodedColumnMerger.ConvertingBitmapValues
- protected static interface DictionaryEncodedColumnMerger.ExtendedIndexesMerger
Specifies any additional per-value indexes which should be constructed when writeIndexes(List) is called, on top of the standard bitmap index created with mergeBitmaps(java.util.List<java.nio.IntBuffer>, org.apache.druid.collections.bitmap.BitmapFactory, org.apache.druid.segment.DictionaryEncodedColumnMerger.IndexSeeker[], int).
- protected static class DictionaryEncodedColumnMerger.IdConversionSerializer
- protected static interface DictionaryEncodedColumnMerger.IndexSeeker
- protected static class DictionaryEncodedColumnMerger.IndexSeekerWithConversion
Gets the old dictId from the new dictId; only supports access in order.
- protected static class DictionaryEncodedColumnMerger.IndexSeekerWithoutConversion
- protected static class DictionaryEncodedColumnMerger.PersistedIdConversion
Persistent dictionary id conversion mappings: artifacts created during segment merge which map old dictionary ids to new dictionary ids.
- protected static class DictionaryEncodedColumnMerger.PersistedIdConversions
Closer of DictionaryEncodedColumnMerger.PersistedIdConversion and a parent path which they are stored in, for easy cleanup when the segment is closed.
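The "access in order" restriction on IndexSeekerWithConversion can be pictured with a forward-only cursor over a conversion buffer. The following is a minimal sketch, not the class's actual implementation: the class name InOrderSeekerSketch, the seek method, and the NOT_FOUND sentinel are all hypothetical, and it assumes the buffer is indexed by old dictId and holds new dictIds in ascending order.

```java
import java.nio.IntBuffer;

final class InOrderSeekerSketch {
    // Hypothetical sentinel for "this segment does not contain the value".
    static final int NOT_FOUND = -1;

    // Assumption: index = old dictId, value = new dictId, values ascending.
    private final IntBuffer oldToNew;
    private int oldId = 0;

    InOrderSeekerSketch(IntBuffer oldToNew) {
        this.oldToNew = oldToNew;
    }

    /** Returns the old dictId mapped to newId, or NOT_FOUND. newId must not decrease between calls. */
    int seek(int newId) {
        // The cursor only moves forward, which is why lookups must arrive in order.
        while (oldId < oldToNew.limit() && oldToNew.get(oldId) < newId) {
            oldId++;
        }
        if (oldId < oldToNew.limit() && oldToNew.get(oldId) == newId) {
            return oldId;
        }
        return NOT_FOUND;
    }

    public static void main(String[] args) {
        InOrderSeekerSketch seeker = new InOrderSeekerSketch(IntBuffer.wrap(new int[] {0, 2, 5}));
        for (int newId = 0; newId <= 5; newId++) {
            System.out.println(newId + " -> " + seeker.seek(newId));
        }
    }
}
```

Because the cursor never rewinds, a single pass over all new dictIds costs time linear in the buffer size, at the price of rejecting out-of-order lookups.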
-
Field Summary
-
Constructor Summary
Constructors
- DictionaryEncodedColumnMerger(String dimensionName, String outputName, IndexSpec indexSpec, SegmentWriteOutMedium segmentWriteOutMedium, ColumnCapabilities capabilities, ProgressIndicator progress, File segmentBaseDir, Closer closer)
-
Method Summary
Methods
- protected abstract T coerceValue(T value)
- ColumnValueSelector convertSortedSegmentRowValuesToMergedRowValues(int segmentIndex, ColumnValueSelector source)
Creates a value selector, which converts values with per-segment, _sorted order_ (see DimensionIndexer.convertUnsortedValuesToSorted(org.apache.druid.segment.ColumnValueSelector)) encoding from the given selector to their equivalent representation in the merged set of rows.
- protected abstract Comparator<Pair<Integer,com.google.common.collect.PeekingIterator<T>>> getDictionaryMergingComparator()
- protected DictionaryEncodedColumnMerger.ExtendedIndexesMerger getExtendedIndexesMerger()
- protected abstract Indexed<T> getNullDimValue()
- protected abstract ObjectStrategy<T> getObjectStrategy()
- boolean hasOnlyNulls()
Returns true if this dimension has no data besides nulls.
- protected DictionaryWriter<T> makeDictionaryWriter(String fileName)
- void markAsParent()
Sets this merger as the "parent" of another merger for a "projection", allowing for this merger to preserve any state which might be required for the projection mergers to do their thing.
- protected MutableBitmap mergeBitmaps(List<IntBuffer> segmentRowNumConversions, BitmapFactory bmpFactory, DictionaryEncodedColumnMerger.IndexSeeker[] dictIdSeeker, int dictId)
- void processMergedRow(ColumnValueSelector selector)
Process a column value(s) (potentially multi-value) of a row from the given selector and update the DimensionMerger's internal state.
- protected void setupEncodedValueWriter()
- protected DictionaryEncodedColumnMerger.IndexSeeker[] toIndexSeekers(List<IndexableAdapter> adapters, ArrayList<IntBuffer> dimConversions, String dimension)
- protected void writeDictionary(Iterable<T> dictionaryValues)
- void writeIndexes(List<IntBuffer> segmentRowNumConversions)
Internally construct any index structures relevant to this DimensionMerger.
- void writeMergedValueDictionary(List<IndexableAdapter> adapters)
Given a list of segment adapters, reads the _sorted order_ dictionary encoding information from the adapters, merges it into one sorted order dictionary, and writes this merged dictionary.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.druid.segment.DimensionMergerV9
attachParent, makeColumnDescriptor
-
-
-
-
Field Detail
-
dimensionName
protected final String dimensionName
-
outputName
protected final String outputName
-
progress
protected final ProgressIndicator progress
-
closer
protected final Closer closer
-
indexSpec
protected final IndexSpec indexSpec
-
segmentWriteOutMedium
protected final SegmentWriteOutMedium segmentWriteOutMedium
-
nullRowsBitmap
protected final MutableBitmap nullRowsBitmap
-
capabilities
protected final ColumnCapabilities capabilities
-
dictionarySize
protected int dictionarySize
-
rowCount
protected int rowCount
-
cardinality
protected int cardinality
-
hasNull
protected boolean hasNull
-
writeDictionary
protected boolean writeDictionary
-
bitmapWriter
@Nullable protected GenericIndexedWriter<ImmutableBitmap> bitmapWriter
-
adapters
@Nullable protected List<IndexableAdapter> adapters
-
dictionaryMergeIterator
@Nullable protected DictionaryMergingIterator<T> dictionaryMergeIterator
-
encodedValueSerializer
@Nullable protected ColumnarIntsSerializer encodedValueSerializer
-
dictionaryWriter
@Nullable protected DictionaryWriter<T> dictionaryWriter
-
firstDictionaryValue
@Nullable protected T firstDictionaryValue
-
segmentBaseDir
protected File segmentBaseDir
-
persistedIdConversions
protected @MonotonicNonNull DictionaryEncodedColumnMerger.PersistedIdConversions persistedIdConversions
This becomes non-null if markAsParent() is called, indicating that this column is a base table 'parent' to some projection column, which requires persisting id conversion buffers to temporary files. If there are no projections defined (or no projections which reference this column), then id conversion buffers will be freed after calling writeIndexes(List).
-
-
Constructor Detail
-
DictionaryEncodedColumnMerger
public DictionaryEncodedColumnMerger(String dimensionName, String outputName, IndexSpec indexSpec, SegmentWriteOutMedium segmentWriteOutMedium, ColumnCapabilities capabilities, ProgressIndicator progress, File segmentBaseDir, Closer closer)
-
-
Method Detail
-
getDictionaryMergingComparator
protected abstract Comparator<Pair<Integer,com.google.common.collect.PeekingIterator<T>>> getDictionaryMergingComparator()
-
getObjectStrategy
protected abstract ObjectStrategy<T> getObjectStrategy()
-
markAsParent
public void markAsParent()
Description copied from interface: DimensionMergerV9
Sets this merger as the "parent" of another merger for a "projection", allowing for this merger to preserve any state which might be required for the projection mergers to do their thing. This method MUST be called prior to performing any merge work. Typically, this method is only implemented if DimensionMergerV9.attachParent(DimensionMergerV9, List) requires it.
- Specified by:
markAsParent in interface DimensionMergerV9
-
writeMergedValueDictionary
public void writeMergedValueDictionary(List<IndexableAdapter> adapters) throws IOException
Description copied from interface: DimensionMerger
Given a list of segment adapters:
- Read _sorted order_ (e.g. see IncrementalIndexAdapter.getDimValueLookup(String)) dictionary encoding information from the adapters
- Merge those sorted order dictionaries into one big sorted order dictionary and write this merged dictionary.
The implementer should maintain knowledge of the "index number" of the adapters in the input list, i.e., the position of each adapter in the input list. This "index number" will be used to refer to specific segments later in DimensionMerger.convertSortedSegmentRowValuesToMergedRowValues(int, org.apache.druid.segment.ColumnValueSelector).
- Specified by:
writeMergedValueDictionary in interface DimensionMerger
- Parameters:
adapters - List of adapters to be merged.
- Throws:
IOException
- See Also:
DimensionIndexer.convertUnsortedValuesToSorted(org.apache.druid.segment.ColumnValueSelector)
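The two steps described above (read each segment's sorted dictionary, then merge them into one sorted dictionary) can be illustrated with plain collections. This is a rough sketch under stated assumptions, not this class's implementation: DictionaryMergeSketch, Merged, and merge are hypothetical names, each per-segment dictionary is assumed to be sorted, duplicate-free, and free of nulls, and a TreeSet stands in for a streaming merge of the dictionaries.

```java
import java.nio.IntBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class DictionaryMergeSketch {
    /** The merged dictionary plus one old-id to new-id buffer per input segment. */
    static final class Merged {
        final List<String> dictionary;
        final List<IntBuffer> conversions;

        Merged(List<String> dictionary, List<IntBuffer> conversions) {
            this.dictionary = dictionary;
            this.conversions = conversions;
        }
    }

    /** Merges sorted per-segment dictionaries; a segment's list position is its "index number". */
    static Merged merge(List<List<String>> segmentDictionaries) {
        // The TreeSet sorts and deduplicates; it materializes every value in memory.
        TreeSet<String> sorted = new TreeSet<>();
        for (List<String> dictionary : segmentDictionaries) {
            sorted.addAll(dictionary);
        }
        List<String> merged = new ArrayList<>(sorted);

        List<IntBuffer> conversions = new ArrayList<>();
        for (List<String> dictionary : segmentDictionaries) {
            IntBuffer conversion = IntBuffer.allocate(dictionary.size());
            int newId = 0;
            for (String value : dictionary) {
                // Both lists are sorted, so the merged position only moves forward.
                while (!merged.get(newId).equals(value)) {
                    newId++;
                }
                conversion.put(newId);
            }
            conversion.flip();
            conversions.add(conversion);
        }
        return new Merged(merged, conversions);
    }

    public static void main(String[] args) {
        Merged result = merge(Arrays.asList(Arrays.asList("a", "c"), Arrays.asList("b", "c")));
        System.out.println(result.dictionary);
        for (IntBuffer conversion : result.conversions) {
            System.out.println(Arrays.toString(conversion.array()));
        }
    }
}
```

Keeping the conversion buffers in the same order as the input list is what lets later steps refer to a segment purely by its position.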
-
convertSortedSegmentRowValuesToMergedRowValues
public ColumnValueSelector convertSortedSegmentRowValuesToMergedRowValues(int segmentIndex, ColumnValueSelector source)
Description copied from interface: DimensionMerger
Creates a value selector, which converts values with per-segment, _sorted order_ (see DimensionIndexer.convertUnsortedValuesToSorted(org.apache.druid.segment.ColumnValueSelector)) encoding from the given selector to their equivalent representation in the merged set of rows. This method is used by the index merging process to build the merged sequence of rows. The implementing class is expected to use the merged value metadata constructed during DimensionMerger.writeMergedValueDictionary(List), if applicable. For example, an implementation of this function for a dictionary-encoded String column would convert the segment-specific, sorted order dictionary values within the row to the common merged dictionary values determined during DimensionMerger.writeMergedValueDictionary(List).
- Specified by:
convertSortedSegmentRowValuesToMergedRowValues in interface DimensionMerger
- Parameters:
segmentIndex - indicates which segment the row originated from, in the order established in DimensionMerger.writeMergedValueDictionary(List)
source - the selector from which to take values to convert
- Returns:
a selector with converted values
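For a dictionary-encoded column, the conversion described above amounts to a lookup of each per-segment id in that segment's conversion mapping. The sketch below shows only that lookup, assuming a hypothetical helper toMergedIds and a conversion buffer indexed by per-segment id; it does not model ColumnValueSelector or any Druid type.

```java
import java.nio.IntBuffer;
import java.util.Arrays;

final class RowValueConversionSketch {
    /** Rewrites one row's per-segment dictionary ids into merged dictionary ids. */
    static int[] toMergedIds(int[] segmentRowIds, IntBuffer oldToNew) {
        int[] mergedIds = new int[segmentRowIds.length];
        for (int i = 0; i < segmentRowIds.length; i++) {
            // Absolute get: reads by index without moving the buffer's position.
            mergedIds[i] = oldToNew.get(segmentRowIds[i]);
        }
        return mergedIds;
    }

    public static void main(String[] args) {
        // Assumed mapping for one segment: old id 0 becomes 0, old id 1 becomes 2.
        IntBuffer oldToNew = IntBuffer.wrap(new int[] {0, 2});
        // A multi-value row holding old ids 1 and 0.
        System.out.println(Arrays.toString(toMergedIds(new int[] {1, 0}, oldToNew)));
    }
}
```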
-
processMergedRow
public void processMergedRow(ColumnValueSelector selector) throws IOException
Description copied from interface: DimensionMerger
Process a column value(s) (potentially multi-value) of a row from the given selector and update the DimensionMerger's internal state. After constructing a merged sequence of rows across segments, the index merging process will iterate through these rows and on each iteration, for each column, pass the column value selector to the corresponding DimensionMerger. This allows each DimensionMerger to build its internal view of the sequence of merged rows, to be written out to a segment later.
- Specified by:
processMergedRow in interface DimensionMerger
- Throws:
IOException
-
writeIndexes
public void writeIndexes(@Nullable List<IntBuffer> segmentRowNumConversions) throws IOException
Description copied from interface: DimensionMerger
Internally construct any index structures relevant to this DimensionMerger. After receiving the sequence of merged rows via iterated DimensionMerger.processMergedRow(org.apache.druid.segment.ColumnValueSelector) calls, the DimensionMerger can now build any index structures it needs. For example, a dictionary encoded String implementation would create its bitmap indexes for the merged segment during this step. The index merger will provide a list of row number conversion IntBuffer objects. Each IntBuffer is associated with one of the segments being merged; the position of the IntBuffer in the list corresponds to the position of segment adapters within the input list of DimensionMerger.writeMergedValueDictionary(List). For example, suppose there are two segments A and B. Row 24 from segment A maps to row 99 in the merged sequence of rows. The IntBuffer for segment A would have a mapping of 24 -> 99.
- Specified by:
writeIndexes in interface DimensionMerger
- Parameters:
segmentRowNumConversions - A list of row number conversion IntBuffer objects.
- Throws:
IOException
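The 24 -> 99 mapping above can be made concrete with a small sketch that builds the merged-row bitmap for a single dictionary value. This is an illustration only: RowNumConversionSketch and mergeRows are hypothetical names, java.util.BitSet stands in for a bitmap type, and every old row is assumed to have a valid merged row number.

```java
import java.nio.IntBuffer;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

final class RowNumConversionSketch {
    /**
     * Builds the merged-row bitmap for one value from, per segment, the rows in which
     * the value appears and that segment's row number conversion buffer.
     */
    static BitSet mergeRows(List<int[]> rowsPerSegment, List<IntBuffer> rowNumConversions) {
        BitSet merged = new BitSet();
        for (int segment = 0; segment < rowsPerSegment.size(); segment++) {
            // List position ties the buffer to its segment, as in the description above.
            IntBuffer conversion = rowNumConversions.get(segment);
            for (int oldRow : rowsPerSegment.get(segment)) {
                merged.set(conversion.get(oldRow));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Segment A: 25 rows, with row 24 mapping to merged row 99 (other entries unused here).
        int[] conversionA = new int[25];
        conversionA[24] = 99;
        // Segment B: a single row mapping to merged row 5 (an arbitrary illustrative value).
        int[] conversionB = {5};

        BitSet bitmap = mergeRows(
            Arrays.asList(new int[] {24}, new int[] {0}),
            Arrays.asList(IntBuffer.wrap(conversionA), IntBuffer.wrap(conversionB)));
        System.out.println(bitmap);
    }
}
```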
-
makeDictionaryWriter
protected DictionaryWriter<T> makeDictionaryWriter(String fileName)
-
getExtendedIndexesMerger
@Nullable protected DictionaryEncodedColumnMerger.ExtendedIndexesMerger getExtendedIndexesMerger()
-
setupEncodedValueWriter
protected void setupEncodedValueWriter() throws IOException
- Throws:
IOException
-
writeDictionary
protected void writeDictionary(Iterable<T> dictionaryValues) throws IOException
- Throws:
IOException
-
mergeBitmaps
protected MutableBitmap mergeBitmaps(@Nullable List<IntBuffer> segmentRowNumConversions, BitmapFactory bmpFactory, DictionaryEncodedColumnMerger.IndexSeeker[] dictIdSeeker, int dictId) throws IOException
- Throws:
IOException
-
hasOnlyNulls
public boolean hasOnlyNulls()
Description copied from interface: DimensionMerger
Returns true if this dimension has no data besides nulls. See NullColumnPartSerde for how null-only columns are stored in the segment.
- Specified by:
hasOnlyNulls in interface DimensionMerger
-
toIndexSeekers
protected DictionaryEncodedColumnMerger.IndexSeeker[] toIndexSeekers(List<IndexableAdapter> adapters, ArrayList<IntBuffer> dimConversions, String dimension)
-
-