Package org.apache.druid.segment
Interface DimensionMerger
All Known Subinterfaces:
DimensionMergerV9
All Known Implementing Classes:
AutoTypeColumnMerger, DictionaryEncodedColumnMerger, DoubleDimensionMergerV9, FloatDimensionMergerV9, LongDimensionMergerV9, NestedDataColumnMergerV4, NumericDimensionMergerV9, StringDimensionMergerV9
public interface DimensionMerger
Processing-related interface.

A DimensionMerger is a per-dimension stateful object that encapsulates type-specific operations and data structures used during the segment merging process (i.e., work done by IndexMerger).

This object is responsible for:
- merging encoding dictionaries, if present
- writing the merged column data and any merged indexing structures (e.g., dictionaries, bitmaps) to disk

At a high level, the index merging process can be broken down into the following steps:
- Merge the segments' encoding dictionaries. These need to be merged across segments into a shared space of dictionary mappings: writeMergedValueDictionary(List).
- Merge the rows across segments into a common sequence of rows. This is done outside the scope of this interface, currently in IndexMergerV9.
- After constructing the merged sequence of rows, process each individual row via processMergedRow(org.apache.druid.segment.ColumnValueSelector), potentially continuing to update the internal structures.
- Write the value representation metadata (dictionary, bitmaps), the sequence of row values, and index structures to a merged segment: writeIndexes(java.util.List<java.nio.IntBuffer>).

A class implementing this interface is expected to be highly stateful, updating its internal state as these methods are called.
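The steps above imply a fixed call order on each merger. The following is a minimal sketch of that order, not Druid code: `ToyDimensionMerger`, `RecordingMerger`, and `MergeLifecycleSketch` are hypothetical names, and plain strings stand in for the real IndexableAdapter, ColumnValueSelector, and IntBuffer arguments.

```java
import java.util.ArrayList;
import java.util.List;

class MergeLifecycleSketch {
  /** Plays the role of the index merger: drives one merger through the steps in order. */
  static List<String> run(List<List<String>> segmentDictionaries, List<String> mergedRows) {
    RecordingMerger merger = new RecordingMerger();
    merger.writeMergedValueDictionary(segmentDictionaries); // step 1: merge dictionaries
    // Step 2 (merging rows across segments) happens outside the merger.
    for (String value : mergedRows) {
      merger.processMergedRow(value);                       // step 3: one call per merged row
    }
    merger.writeIndexes();                                  // step 4: write indexes
    return merger.calls;
  }

  public static void main(String[] args) {
    List<String> calls = run(
        List.of(List.of("a", "c"), List.of("b", "c")),
        List.of("a", "b", "c"));
    calls.forEach(System.out::println);
  }
}

/** Hypothetical, simplified stand-in for DimensionMerger; these are not Druid's signatures. */
interface ToyDimensionMerger {
  void writeMergedValueDictionary(List<List<String>> sortedSegmentDictionaries);

  void processMergedRow(String value);

  void writeIndexes();
}

/** Records the order in which it is called, to make the lifecycle visible. */
class RecordingMerger implements ToyDimensionMerger {
  final List<String> calls = new ArrayList<>();

  @Override
  public void writeMergedValueDictionary(List<List<String>> sortedSegmentDictionaries) {
    calls.add("writeMergedValueDictionary");
  }

  @Override
  public void processMergedRow(String value) {
    calls.add("processMergedRow(" + value + ")");
  }

  @Override
  public void writeIndexes() {
    calls.add("writeIndexes");
  }
}
```

Because every call mutates the merger's state, calling these methods out of order (for example, writing indexes before all rows are processed) would leave a real implementation with incomplete data.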
Method Summary

ColumnValueSelector convertSortedSegmentRowValuesToMergedRowValues(int segmentIndex, ColumnValueSelector source)
Creates a value selector that converts values with per-segment, _sorted order_ encoding (see DimensionIndexer.convertUnsortedValuesToSorted(org.apache.druid.segment.ColumnValueSelector)) from the given selector to their equivalent representation in the merged set of rows.

boolean hasOnlyNulls()
Returns true if this dimension has no data besides nulls.

void processMergedRow(ColumnValueSelector selector)
Processes the column value(s) (potentially multi-value) of a row from the given selector and updates the DimensionMerger's internal state.

void writeIndexes(List<IntBuffer> segmentRowNumConversions)
Internally constructs any index structures relevant to this DimensionMerger.

void writeMergedValueDictionary(List<IndexableAdapter> adapters)
Given a list of segment adapters, reads the _sorted order_ dictionary encoding information from the adapters, merges it into one sorted order dictionary, and writes the merged dictionary.
Method Detail
-
writeMergedValueDictionary
void writeMergedValueDictionary(List<IndexableAdapter> adapters) throws IOException
Given a list of segment adapters:
- Read the _sorted order_ dictionary encoding information from the adapters (see, e.g., IncrementalIndexAdapter.getDimValueLookup(String)).
- Merge those sorted order dictionaries into one big sorted order dictionary and write this merged dictionary.

The implementer should maintain knowledge of the "index number" of the adapters in the input list, i.e., the position of each adapter in the input list. This "index number" will be used to refer to specific segments later in convertSortedSegmentRowValuesToMergedRowValues(int, org.apache.druid.segment.ColumnValueSelector).
Parameters:
adapters - List of adapters to be merged.
Throws:
IOException
See Also:
DimensionIndexer.convertUnsortedValuesToSorted(org.apache.druid.segment.ColumnValueSelector)
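As an illustration of the dictionary merge described here, the sketch below merges sorted per-segment dictionaries into one sorted dictionary and keeps a per-segment id conversion table, indexed by each segment's position in the input list. It is a simplified model under stated assumptions: `DictionaryMergeSketch` is a hypothetical class, plain lists of non-null strings stand in for the adapters, and a `TreeSet` is used for brevity rather than whatever merging strategy an actual implementation uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

class DictionaryMergeSketch {
  /** Sorted, de-duplicated values; a value's list position is its merged dictionary id. */
  final List<String> mergedDictionary;

  /** conversions.get(i)[segmentId] is the merged id, where i is the segment's "index number". */
  final List<int[]> conversions = new ArrayList<>();

  DictionaryMergeSketch(List<List<String>> sortedSegmentDictionaries) {
    // Union of all values in sorted order (assumes no null values, which TreeSet rejects).
    TreeSet<String> allValues = new TreeSet<>();
    for (List<String> dictionary : sortedSegmentDictionaries) {
      allValues.addAll(dictionary);
    }
    mergedDictionary = new ArrayList<>(allValues);

    // For each segment, in input order, map its local ids to merged ids.
    for (List<String> dictionary : sortedSegmentDictionaries) {
      int[] conversion = new int[dictionary.size()];
      for (int id = 0; id < dictionary.size(); id++) {
        conversion[id] = Collections.binarySearch(mergedDictionary, dictionary.get(id));
      }
      conversions.add(conversion);
    }
  }

  public static void main(String[] args) {
    DictionaryMergeSketch sketch = new DictionaryMergeSketch(List.of(
        List.of("apple", "cherry"),
        List.of("banana", "cherry", "date")));
    System.out.println(sketch.mergedDictionary);
    // "cherry" is id 1 in both segments but id 2 in the merged dictionary.
    System.out.println(sketch.conversions.get(0)[1]);
    System.out.println(sketch.conversions.get(1)[1]);
  }
}
```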
-
convertSortedSegmentRowValuesToMergedRowValues
ColumnValueSelector convertSortedSegmentRowValuesToMergedRowValues(int segmentIndex, ColumnValueSelector source)
Creates a value selector that converts values with per-segment, _sorted order_ encoding (see DimensionIndexer.convertUnsortedValuesToSorted(org.apache.druid.segment.ColumnValueSelector)) from the given selector to their equivalent representation in the merged set of rows.

This method is used by the index merging process to build the merged sequence of rows. The implementing class is expected to use the merged value metadata constructed during writeMergedValueDictionary(List), if applicable.

For example, an implementation of this method for a dictionary-encoded String column would convert the segment-specific, sorted order dictionary values within the row to the common merged dictionary values determined during writeMergedValueDictionary(List).
Parameters:
segmentIndex - indicates which segment the row originated from, in the order established in writeMergedValueDictionary(List)
source - the selector from which to take values to convert
Returns:
a selector with converted values
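To make the dictionary-encoded example concrete, here is a minimal sketch of such a converting selector. It is an illustration only: `SegmentValueConversionSketch` is a hypothetical class, `IntSupplier` stands in for ColumnValueSelector, and the per-segment conversion tables are assumed to have been built during the dictionary merge.

```java
import java.util.List;
import java.util.function.IntSupplier;

class SegmentValueConversionSketch {
  /** conversions.get(segmentIndex)[segmentId] is the merged dictionary id. */
  private final List<int[]> conversions;

  SegmentValueConversionSketch(List<int[]> conversions) {
    this.conversions = conversions;
  }

  /**
   * Hypothetical analogue of convertSortedSegmentRowValuesToMergedRowValues:
   * wraps the source so each read yields the merged id instead of the segment-local id.
   */
  IntSupplier convert(int segmentIndex, IntSupplier source) {
    int[] conversion = conversions.get(segmentIndex);
    // Conversion is lazy: the source is re-read every time the wrapper is read.
    return () -> conversion[source.getAsInt()];
  }

  public static void main(String[] args) {
    SegmentValueConversionSketch sketch = new SegmentValueConversionSketch(
        List.of(new int[]{0, 2}, new int[]{1, 2, 3}));

    int[] currentId = {0}; // stands in for a selector positioned on the current row
    IntSupplier merged = sketch.convert(1, () -> currentId[0]);

    System.out.println(merged.getAsInt()); // segment 1, local id 0
    currentId[0] = 2;                      // the underlying selector advances to another row
    System.out.println(merged.getAsInt()); // segment 1, local id 2
  }
}
```

Returning a wrapper rather than a converted value matches the description of a selector: the caller can keep advancing the source and read converted values on demand.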
-
processMergedRow
void processMergedRow(ColumnValueSelector selector) throws IOException
Processes the column value(s) (potentially multi-value) of a row from the given selector and updates the DimensionMerger's internal state.

After constructing a merged sequence of rows across segments, the index merging process iterates through these rows and, on each iteration, passes the column value selector for each column to the corresponding DimensionMerger. This allows each DimensionMerger to build its internal view of the sequence of merged rows, to be written out to a segment later.
Throws:
IOException
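The per-row state update can be pictured with a small sketch. This is a simplified model, not Druid's implementation: `ProcessMergedRowSketch` is a hypothetical class, a list of strings stands in for what the selector yields for one row, and tracking whether any non-null value was seen is just one plausible way to back a method like hasOnlyNulls().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ProcessMergedRowSketch {
  /** Internal view of the merged rows; one entry per row, each possibly multi-value. */
  private final List<List<String>> rows = new ArrayList<>();

  private boolean sawNonNull = false;

  /** Hypothetical analogue of processMergedRow: records one row and updates state. */
  void processMergedRow(List<String> values) {
    rows.add(new ArrayList<>(values)); // defensive copy; the caller may reuse its list
    for (String value : values) {
      if (value != null) {
        sawNonNull = true;
      }
    }
  }

  boolean hasOnlyNulls() {
    return !sawNonNull;
  }

  int rowCount() {
    return rows.size();
  }

  public static void main(String[] args) {
    ProcessMergedRowSketch merger = new ProcessMergedRowSketch();
    merger.processMergedRow(Collections.<String>singletonList(null)); // a null-only row
    System.out.println(merger.hasOnlyNulls());
    merger.processMergedRow(Arrays.asList("a", "b"));                 // a multi-value row
    System.out.println(merger.hasOnlyNulls());
    System.out.println(merger.rowCount());
  }
}
```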
-
writeIndexes
void writeIndexes(@Nullable List<IntBuffer> segmentRowNumConversions) throws IOException
Internally constructs any index structures relevant to this DimensionMerger.

After receiving the sequence of merged rows via iterated processMergedRow(org.apache.druid.segment.ColumnValueSelector) calls, the DimensionMerger can now build any index structures it needs. For example, a dictionary-encoded String implementation would create its bitmap indexes for the merged segment during this step.

The index merger will provide a list of row number conversion IntBuffer objects. Each IntBuffer is associated with one of the segments being merged; the position of the IntBuffer in the list corresponds to the position of the segment adapter within the input list of writeMergedValueDictionary(List).

For example, suppose there are two segments, A and B. If row 24 from segment A maps to row 99 in the merged sequence of rows, the IntBuffer for segment A would have a mapping of 24 -> 99.
Parameters:
segmentRowNumConversions - A list of row number conversion IntBuffer objects.
Throws:
IOException
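The 24 -> 99 example can be sketched as follows. This assumes that the buffer index is the row number within the segment and the stored value is the row number in the merged sequence, which is one natural reading of the mapping described above. `RowNumConversionSketch` is a hypothetical class, and `java.util.BitSet` stands in for whatever bitmap type an implementation uses.

```java
import java.nio.IntBuffer;
import java.util.BitSet;
import java.util.List;

class RowNumConversionSketch {
  /**
   * Translates per-segment row bitmaps for a single dictionary value into
   * one bitmap over merged row numbers.
   */
  static BitSet mergeBitmaps(List<BitSet> segmentBitmaps, List<IntBuffer> segmentRowNumConversions) {
    BitSet merged = new BitSet();
    for (int segment = 0; segment < segmentBitmaps.size(); segment++) {
      BitSet bitmap = segmentBitmaps.get(segment);
      // List position of the buffer matches the segment's position in the adapter list.
      IntBuffer conversion = segmentRowNumConversions.get(segment);
      for (int row = bitmap.nextSetBit(0); row >= 0; row = bitmap.nextSetBit(row + 1)) {
        merged.set(conversion.get(row)); // absolute read: segment row -> merged row
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    IntBuffer segmentA = IntBuffer.allocate(25);
    segmentA.put(24, 99); // row 24 of segment A is row 99 of the merged sequence
    IntBuffer segmentB = IntBuffer.allocate(4);
    segmentB.put(3, 7);   // row 3 of segment B is row 7 of the merged sequence

    BitSet rowsInA = new BitSet();
    rowsInA.set(24);
    BitSet rowsInB = new BitSet();
    rowsInB.set(3);

    BitSet merged = mergeBitmaps(List.of(rowsInA, rowsInB), List.of(segmentA, segmentB));
    System.out.println(merged); // prints {7, 99}
  }
}
```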
-
hasOnlyNulls
boolean hasOnlyNulls()
Returns true if this dimension has no data besides nulls. See NullColumnPartSerde for how null-only columns are stored in the segment.