Package org.apache.druid.data.input.impl
Interface SplittableInputSource<T>
-
- All Superinterfaces: InputSource
- All Known Implementing Classes: CloudObjectInputSource, CombiningInputSource, HttpInputSource, LocalInputSource
public interface SplittableInputSource<T> extends InputSource
Splittable InputSource. ParallelIndexSupervisorTask can process InputSplits in parallel.
-
-
Field Summary
Fields (Modifier and Type, Field, Description)
- static SplitHintSpec DEFAULT_SPLIT_HINT_SPEC
-
Fields inherited from interface org.apache.druid.data.input.InputSource
TYPE_PROPERTY
-
-
Method Summary
All Methods (Instance Methods, Abstract Methods, Default Methods), listed as Modifier and Type, Method, Description
- Stream<InputSplit<T>> createSplits(InputFormat inputFormat, SplitHintSpec splitHintSpec)
  Creates a Stream of InputSplits.
- int estimateNumSplits(InputFormat inputFormat, SplitHintSpec splitHintSpec)
  Returns an estimated total number of splits to be created via createSplits(InputFormat, SplitHintSpec).
- default SplitHintSpec getSplitHintSpecOrDefault(SplitHintSpec splitHintSpec)
- default boolean isSplittable()
  Returns true if this inputSource can be processed in parallel using ParallelIndexSupervisorTask.
- InputSource withSplit(InputSplit<T> split)
  Helper method for ParallelIndexSupervisorTask.
-
Methods inherited from interface org.apache.druid.data.input.InputSource
getTypes, needsFormat, reader
-
-
-
-
Field Detail
-
DEFAULT_SPLIT_HINT_SPEC
static final SplitHintSpec DEFAULT_SPLIT_HINT_SPEC
-
-
Method Detail
-
isSplittable
default boolean isSplittable()
Description copied from interface: InputSource
Returns true if this inputSource can be processed in parallel using ParallelIndexSupervisorTask. It must be castable to SplittableInputSource and the various SplittableInputSource methods must work as documented.
- Specified by: isSplittable in interface InputSource
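The castability requirement can be illustrated with a minimal sketch. The interfaces below (SketchSource, SketchSplittableSource) and the splitCount method are hypothetical stand-ins, not the Druid types, and the sketch assumes the splittable variant answers true by default.

```java
// Hypothetical stand-ins for InputSource and SplittableInputSource; not the Druid types.
interface SketchSource {
    boolean isSplittable();
}

interface SketchSplittableSource extends SketchSource {
    // Assumption: the splittable variant answers true by default.
    @Override
    default boolean isSplittable() {
        return true;
    }

    // Hypothetical method standing in for the split-related API.
    int splitCount();
}

final class SplittableCheckSketch {
    // Returns the number of units of work: the split count for a splittable
    // source, or 1 when the source has to be read sequentially.
    static int parallelism(SketchSource source) {
        if (source.isSplittable()) {
            // A source that reports true must be castable to the splittable type.
            return ((SketchSplittableSource) source).splitCount();
        }
        return 1;
    }
}
```

A source that returned true from isSplittable without implementing the splittable interface would fail at the cast, which is why the description states both conditions together.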
-
createSplits
Stream<InputSplit<T>> createSplits(InputFormat inputFormat, @Nullable SplitHintSpec splitHintSpec) throws IOException
Creates a Stream of InputSplits. The returned stream should be evaluated lazily to avoid consuming too much memory. This interface also has the related method estimateNumSplits(InputFormat, SplitHintSpec). Implementations should be careful NOT to cache the created splits in memory. Implementations can consider InputFormat.isSplittable() and SplitHintSpec so that splits are created in the same way as in estimateNumSplits(InputFormat, SplitHintSpec).
- Throws: IOException
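The laziness requirement can be illustrated with a minimal sketch. SketchSplit and LazySplitsSketch are hypothetical stand-ins rather than Druid classes, and the counter exists only to make it visible how many splits were actually materialized.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical stand-in for InputSplit<T>; not the Druid class.
final class SketchSplit<T> {
    private final T payload;

    SketchSplit(T payload) {
        this.payload = payload;
    }

    T get() {
        return payload;
    }
}

final class LazySplitsSketch {
    // Counts how many splits have been materialized so far (illustration only).
    static final AtomicInteger CREATED = new AtomicInteger();

    // Builds each split only when the consumer pulls it from the stream.
    // Nothing is collected into a list or cached in a field.
    static Stream<SketchSplit<String>> createSplits(List<String> uris) {
        return uris.stream().map(uri -> {
            CREATED.incrementAndGet();
            return new SketchSplit<>(uri);
        });
    }
}
```

Calling createSplits alone creates no splits in this sketch; a short-circuiting consumer such as findFirst() materializes only the elements it needs.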
-
estimateNumSplits
int estimateNumSplits(InputFormat inputFormat, @Nullable SplitHintSpec splitHintSpec) throws IOException
Returns an estimated total number of splits to be created via createSplits(InputFormat, SplitHintSpec). The estimate does not have to be accurate and can differ from the actual number of InputSplits returned from createSplits(InputFormat, SplitHintSpec). It is used to estimate the progress of a phase in parallel indexing. See TaskMonitor for more details of the progress estimation. This method can be expensive if an implementation iterates over all directories, or whatever substructure it has, to find all input entities. Implementations can consider InputFormat.isSplittable() and SplitHintSpec so that splits are found in the same way as in createSplits(InputFormat, SplitHintSpec).
- Throws: IOException
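One way to keep the estimate consistent with the created splits is to derive both from the same grouping rule. The sketch below assumes a hypothetical hint, maxFilesPerSplit, and uses plain lists of file names in place of Druid types. In this simple case the estimate happens to be exact, which the contract does not require.

```java
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class EstimateSplitsSketch {
    // Hypothetical hint: at most maxFilesPerSplit files go into one split.
    // The estimate is the ceiling of files.size() / maxFilesPerSplit.
    static int estimateNumSplits(List<String> files, int maxFilesPerSplit) {
        return (files.size() + maxFilesPerSplit - 1) / maxFilesPerSplit;
    }

    // Applies the same grouping rule, producing each group lazily as a sublist view.
    static Stream<List<String>> createSplits(List<String> files, int maxFilesPerSplit) {
        int numSplits = estimateNumSplits(files, maxFilesPerSplit);
        return IntStream.range(0, numSplits).mapToObj(i -> {
            int from = i * maxFilesPerSplit;
            int to = Math.min(from + maxFilesPerSplit, files.size());
            return files.subList(from, to);
        });
    }
}
```

With five files and maxFilesPerSplit set to 2, the estimate is 3 and the stream yields groups of 2, 2, and 1 files.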
-
withSplit
InputSource withSplit(InputSplit<T> split)
Helper method for ParallelIndexSupervisorTask. Most implementations can simply create a new instance with the given split.
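The "new instance with the given split" pattern might look like the following sketch. SketchFileSource is a hypothetical stand-in for an input source over file paths, and the split is represented as a plain list rather than an InputSplit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an input source over a list of file paths.
final class SketchFileSource {
    private final List<String> files;

    SketchFileSource(List<String> files) {
        // Defensive copy so that instances never share mutable state.
        this.files = Collections.unmodifiableList(new ArrayList<>(files));
    }

    List<String> getFiles() {
        return files;
    }

    // Leaves this instance untouched and returns a new source narrowed to the split.
    SketchFileSource withSplit(List<String> splitFiles) {
        return new SketchFileSource(splitFiles);
    }
}
```

Returning a fresh instance keeps the original source reusable for the remaining splits.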
-
getSplitHintSpecOrDefault
default SplitHintSpec getSplitHintSpecOrDefault(@Nullable SplitHintSpec splitHintSpec)
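This method has no description here. Its name and the @Nullable parameter suggest a fallback to DEFAULT_SPLIT_HINT_SPEC when no hint is supplied. The sketch below shows that pattern under that assumption only; SketchHint, its maxBytesPerSplit field, and the default value are hypothetical.

```java
// Hypothetical stand-in for SplitHintSpec; not the Druid type.
final class SketchHint {
    final long maxBytesPerSplit;

    SketchHint(long maxBytesPerSplit) {
        this.maxBytesPerSplit = maxBytesPerSplit;
    }
}

final class SplitHintDefaultSketch {
    // Default value chosen for illustration only.
    static final SketchHint DEFAULT_HINT = new SketchHint(1024L);

    // Assumption: a null hint falls back to the default,
    // and a non-null hint is returned unchanged.
    static SketchHint hintOrDefault(SketchHint hint) {
        return hint == null ? DEFAULT_HINT : hint;
    }
}
```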
-
-