public interface SplittableInputSource<T> extends InputSource
A splittable InputSource whose InputSplits can be processed in parallel.

Modifier and Type | Field and Description |
---|---|
static SplitHintSpec | DEFAULT_SPLIT_HINT_SPEC |

Fields inherited from interface InputSource: TYPE_PROPERTY

Modifier and Type | Method and Description |
---|---|
Stream<InputSplit<T>> | createSplits(InputFormat inputFormat, SplitHintSpec splitHintSpec) Creates a Stream of InputSplits. |
int | estimateNumSplits(InputFormat inputFormat, SplitHintSpec splitHintSpec) Returns an estimated total number of splits to be created via createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec). |
default SplitHintSpec | getSplitHintSpecOrDefault(SplitHintSpec splitHintSpec) |
default boolean | isSplittable() Returns true if this inputSource can be processed in parallel using ParallelIndexSupervisorTask. |
InputSource | withSplit(InputSplit<T> split) Helper method for ParallelIndexSupervisorTask. |

Methods inherited from interface InputSource: getTypes, needsFormat, reader

static final SplitHintSpec DEFAULT_SPLIT_HINT_SPEC
default boolean isSplittable()
Returns true if this inputSource can be processed in parallel using ParallelIndexSupervisorTask.
Specified by: isSplittable in interface InputSource
Stream<InputSplit<T>> createSplits(InputFormat inputFormat, @Nullable SplitHintSpec splitHintSpec) throws IOException
Creates a Stream of InputSplits. The returned stream should be evaluated lazily to avoid consuming too much memory.
Note that this interface also has estimateNumSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec), which is related to this method. Implementations should be careful NOT to cache the created splits in memory.
Implementations can consider InputFormat.isSplittable() and SplitHintSpec to create splits in the same way as estimateNumSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec).
Throws: IOException
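As a rough illustration of the lazy, no-caching contract, the following self-contained sketch groups input entities into size-bounded splits one at a time. It does not use Druid's classes: `FileEntity`, this `createSplits(Iterator, long)` and the `maxSplitBytes` limit are hypothetical stand-ins for an input entity, the method above, and a size-based SplitHintSpec. Only the split under construction is held in memory, so the entity iterator could itself be backed by a lazily listed directory.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class LazySplits {
    /** Hypothetical input entity: a path and its size in bytes. */
    record FileEntity(String path, long sizeBytes) {}

    /**
     * Hypothetical createSplits: lazily groups entities into splits whose total size
     * stays within maxSplitBytes (an oversized entity still gets a split of its own).
     * Splits are built on demand as the stream is consumed; none are cached.
     */
    static Stream<List<FileEntity>> createSplits(Iterator<FileEntity> entities, long maxSplitBytes) {
        Iterator<List<FileEntity>> splitIterator = new Iterator<List<FileEntity>>() {
            private FileEntity pending; // entity read ahead but not yet assigned to a split

            @Override
            public boolean hasNext() {
                return pending != null || entities.hasNext();
            }

            @Override
            public List<FileEntity> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                List<FileEntity> split = new ArrayList<>();
                long bytes = 0;
                while (pending != null || entities.hasNext()) {
                    FileEntity next = pending != null ? pending : entities.next();
                    pending = null;
                    if (!split.isEmpty() && bytes + next.sizeBytes() > maxSplitBytes) {
                        pending = next; // this entity starts the next split
                        break;
                    }
                    split.add(next);
                    bytes += next.sizeBytes();
                }
                return split;
            }
        };
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(splitIterator, Spliterator.ORDERED), false);
    }

    public static void main(String[] args) {
        List<FileEntity> files = List.of(
            new FileEntity("a", 40), new FileEntity("b", 50),
            new FileEntity("c", 30), new FileEntity("d", 120));
        createSplits(files.iterator(), 100)
            .map(split -> split.stream().map(FileEntity::path).collect(Collectors.joining(",")))
            .forEach(System.out::println);
    }
}
```

Running `main` should print `a,b`, `c` and `d` on separate lines: `c` starts a new split because 90 + 30 exceeds the 100-byte limit, and the 120-byte `d` still gets a split of its own.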
int estimateNumSplits(InputFormat inputFormat, @Nullable SplitHintSpec splitHintSpec) throws IOException
Returns an estimated total number of splits to be created via createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec). The estimated number of splits doesn't have to be accurate and can differ from the actual number of InputSplits returned from createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec). It is used to estimate the progress of a phase in parallel indexing; see TaskMonitor for more details of the progress estimation.
This method can be expensive if an implementation iterates over all directories, or whatever substructure the source has, to find all input entities.
Implementations can consider InputFormat.isSplittable() and SplitHintSpec to find splits in the same way as createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec).
Throws: IOException
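The sketch below shows how an estimate can follow the same grouping rule as split creation without materializing any split, and how a cheaper approximation can differ from the real count. `countSplits` and `roughEstimate` are hypothetical helpers over plain file sizes, assuming the same greedy size-based grouping as a hypothetical createSplits; neither is part of Druid's API.

```java
import java.util.Arrays;

public class SplitEstimate {
    /**
     * Counts splits with the same greedy rule a hypothetical createSplits would use:
     * keep adding entities until the next one would push the split past maxSplitBytes.
     * Only a running total is kept, so no split is built in memory.
     */
    static int countSplits(long[] sizes, long maxSplitBytes) {
        int splits = 0;
        long currentBytes = 0;
        boolean currentEmpty = true;
        for (long size : sizes) {
            if (!currentEmpty && currentBytes + size > maxSplitBytes) {
                splits++; // close the current split
                currentBytes = 0;
                currentEmpty = true;
            }
            currentBytes += size;
            currentEmpty = false;
        }
        return currentEmpty ? splits : splits + 1;
    }

    /** Cheaper, approximate estimate: total bytes divided by the limit, rounded up. */
    static int roughEstimate(long[] sizes, long maxSplitBytes) {
        long total = Arrays.stream(sizes).sum();
        return (int) ((total + maxSplitBytes - 1) / maxSplitBytes);
    }

    public static void main(String[] args) {
        long[] sizes = {60, 60, 60};
        System.out.println(countSplits(sizes, 100));   // 3
        System.out.println(roughEstimate(sizes, 100)); // 2
    }
}
```

For sizes {60, 60, 60} and a 100-byte limit, `countSplits` returns 3 while `roughEstimate` returns 2. Because the estimate only drives progress reporting, a cheaper figure like this may be acceptable when listing every entity is expensive.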
InputSource withSplit(InputSplit<T> split)
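A minimal sketch of what withSplit might look like for a path-based source, assuming the supervisor task hands each subtask a copy of the input source restricted to one split. `PathSplit` and `PathInputSource` are hypothetical types, not Druid classes; the point illustrated is that withSplit returns a new, narrower source rather than mutating the original.

```java
import java.util.List;

public class WithSplitSketch {
    /** Hypothetical split: the subset of paths one subtask should read. */
    record PathSplit(List<String> paths) {}

    /** Hypothetical immutable input source over a list of paths. */
    record PathInputSource(List<String> paths) {
        PathInputSource {
            paths = List.copyOf(paths); // defensive, immutable copy
        }

        /** Returns a new source reading only the split's paths; this source is unchanged. */
        PathInputSource withSplit(PathSplit split) {
            return new PathInputSource(split.paths());
        }
    }

    public static void main(String[] args) {
        PathInputSource all = new PathInputSource(List.of("a", "b", "c", "d"));
        PathInputSource sub = all.withSplit(new PathSplit(List.of("c", "d")));
        System.out.println(sub.paths()); // [c, d]
        System.out.println(all.paths()); // [a, b, c, d]
    }
}
```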
default SplitHintSpec getSplitHintSpecOrDefault(@Nullable SplitHintSpec splitHintSpec)
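The method name and the @Nullable parameter suggest that getSplitHintSpecOrDefault falls back to DEFAULT_SPLIT_HINT_SPEC when no hint is given. The sketch below shows that pattern with a hypothetical `SizeHint` type, a trimmed-down hypothetical `SplittableSource` interface, and an arbitrary default of 100 bytes; the actual default spec is not described on this page.

```java
public class SplitHintDefaults {
    /** Hypothetical stand-in for SplitHintSpec: a single size limit in bytes. */
    record SizeHint(long maxSplitBytes) {}

    /** Hypothetical, trimmed-down splittable source interface. */
    interface SplittableSource {
        // Arbitrary illustrative default; interface fields are implicitly public static final.
        SizeHint DEFAULT_SPLIT_HINT_SPEC = new SizeHint(100);

        /** Returns the caller's hint, or DEFAULT_SPLIT_HINT_SPEC when the hint is null. */
        default SizeHint getSplitHintSpecOrDefault(SizeHint splitHintSpec) {
            return splitHintSpec == null ? DEFAULT_SPLIT_HINT_SPEC : splitHintSpec;
        }
    }

    public static void main(String[] args) {
        SplittableSource source = new SplittableSource() {}; // no abstract methods to implement
        System.out.println(source.getSplitHintSpecOrDefault(null).maxSplitBytes());               // 100
        System.out.println(source.getSplitHintSpecOrDefault(new SizeHint(1000)).maxSplitBytes()); // 1000
    }
}
```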
Copyright © 2011–2023 The Apache Software Foundation. All rights reserved.