Package org.apache.druid.data.input
Interface SplitHintSpec
All Known Implementing Classes:
FilePerSplitHintSpec, MaxSizeSplitHintSpec, SegmentsSplitHintSpec
public interface SplitHintSpec
In native parallel indexing, the supervisor task partitions input data into splits and assigns each split to a single sub task. How splits are created depends mainly on the input file format, but Druid users sometimes want to give hints that control the amount of data each sub task reads. SplitHintSpec serves this purpose. Implementations can ignore the given hint.
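As a rough illustration of the idea, the sketch below groups input files into splits of at most a fixed number of files, with each split intended for one sub task. The class `SplitSketch`, the method `splitByCount`, and the `maxFilesPerSplit` hint are hypothetical names chosen for this example and are not part of Druid.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "splits"; not a Druid class.
class SplitSketch {
  // Groups inputs into consecutive splits of at most maxFilesPerSplit items.
  static <T> List<List<T>> splitByCount(List<T> inputs, int maxFilesPerSplit) {
    if (maxFilesPerSplit < 1) {
      throw new IllegalArgumentException("maxFilesPerSplit must be at least 1");
    }
    List<List<T>> splits = new ArrayList<>();
    for (int i = 0; i < inputs.size(); i += maxFilesPerSplit) {
      int end = Math.min(i + maxFilesPerSplit, inputs.size());
      // Copy the sublist so each split is independent of the input list.
      splits.add(new ArrayList<>(inputs.subList(i, end)));
    }
    return splits;
  }

  public static void main(String[] args) {
    List<String> files = List.of("a.json", "b.json", "c.json", "d.json", "e.json");
    // Each inner list would be handed to one sub task.
    System.out.println(splitByCount(files, 2));
  }
}
```

With five files and a hint of two files per split, this produces three splits, the last holding a single file.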
Method Summary
Modifier and Type: <T> Iterator<List<T>>
Method: split(Iterator<T> inputIterator, Function<T,InputFileAttribute> inputAttributeExtractor)
Description: Returns an iterator of splits.
Method Detail
split
<T> Iterator<List<T>> split(Iterator<T> inputIterator, Function<T,InputFileAttribute> inputAttributeExtractor)
Returns an iterator of splits. Each split is a list of input files of type T.
Parameters:
inputIterator - an iterator that returns input files.
inputAttributeExtractor - a function that creates an InputFileAttribute for each input file. This may involve a network call, so implementations of SplitHintSpec should use it only if needed, and reuse results if appropriate.
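The contract above can be sketched with a size-based splitter that builds splits lazily and calls the attribute extractor once per file. This is a minimal sketch under stated assumptions, not Druid's MaxSizeSplitHintSpec: `MaxBytesSplitSketch` and `FileAttributeSketch` are hypothetical stand-ins, the attribute is assumed to expose only a size in bytes, and `Function` is assumed to be `java.util.function.Function`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical size-based splitter; not Druid's MaxSizeSplitHintSpec.
class MaxBytesSplitSketch {
  private final long maxBytesPerSplit;

  MaxBytesSplitSketch(long maxBytesPerSplit) {
    this.maxBytesPerSplit = maxBytesPerSplit;
  }

  <T> Iterator<List<T>> split(Iterator<T> inputIterator, Function<T, FileAttributeSketch> inputAttributeExtractor) {
    return new Iterator<List<T>>() {
      private T peeked;          // file read from the input but not yet placed in a split
      private long peekedSize;   // its size, kept so the extractor is not called again
      private boolean hasPeeked;

      @Override
      public boolean hasNext() {
        return hasPeeked || inputIterator.hasNext();
      }

      @Override
      public List<T> next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        List<T> current = new ArrayList<>();
        long currentBytes = 0;
        while (hasPeeked || inputIterator.hasNext()) {
          if (!hasPeeked) {
            peeked = inputIterator.next();
            // The extractor may be expensive, so it is called exactly once per file.
            peekedSize = inputAttributeExtractor.apply(peeked).getSize();
            hasPeeked = true;
          }
          // Close the split if this file would exceed the limit.
          // A file larger than the limit still gets a split of its own.
          if (!current.isEmpty() && currentBytes + peekedSize > maxBytesPerSplit) {
            break;
          }
          current.add(peeked);
          currentBytes += peekedSize;
          hasPeeked = false;
          peeked = null;
        }
        return current;
      }
    };
  }

  public static void main(String[] args) {
    // For brevity, each "file" is represented by its size in bytes.
    List<Long> files = List.of(40L, 50L, 30L, 200L, 10L);
    Iterator<List<Long>> splits = new MaxBytesSplitSketch(100).split(files.iterator(), FileAttributeSketch::new);
    while (splits.hasNext()) {
      System.out.println(splits.next());
    }
  }
}

// Hypothetical stand-in for InputFileAttribute; only a size in bytes is modeled.
class FileAttributeSketch {
  private final long size;

  FileAttributeSketch(long size) {
    this.size = size;
  }

  long getSize() {
    return size;
  }
}
```

Because a file that does not fit in the current split is held over together with its already-fetched size, the extractor is never invoked twice for the same file, which matters when each call is a network round trip.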