public abstract class CloudObjectInputSource extends AbstractInputSource implements SplittableInputSource<List<CloudObjectLocation>>
| Constructor and Description |
|---|
| CloudObjectInputSource(String scheme, List<URI> uris, List<URI> prefixes, List<CloudObjectLocation> objects, String objectGlob) |
@Nullable public List<CloudObjectLocation> getObjects()
protected abstract InputEntity createEntity(CloudObjectLocation location)
Creates an InputEntity for this input source given a split on a CloudObjectLocation. This is called internally by formattableReader(org.apache.druid.data.input.InputRowSchema, org.apache.druid.data.input.InputFormat, java.io.File) and operates on the output of createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec).
protected abstract CloudObjectSplitWidget getSplitWidget()
Returns a CloudObjectSplitWidget, which is used to implement createSplits(InputFormat, SplitHintSpec).
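Taken together, the constructor and these two abstract methods are what a cloud-specific subclass supplies. The sketch below is not taken from any Druid module: ExampleCloudInputSource, the "examplestore" scheme, and the fetcher/splitWidget constructor parameters are hypothetical, ByteEntity is used only as a convenient byte-array-backed InputEntity, and the class is left abstract so the remaining SplittableInputSource methods (such as withSplit) can stay unimplemented in this sketch.

```java
import java.net.URI;
import java.util.List;
import java.util.function.Function;

import org.apache.druid.data.input.InputEntity;
import org.apache.druid.data.input.impl.ByteEntity;
import org.apache.druid.data.input.impl.CloudObjectInputSource;
import org.apache.druid.data.input.impl.CloudObjectLocation;
import org.apache.druid.data.input.impl.CloudObjectSplitWidget;

/**
 * Hypothetical subclass used only to illustrate the two abstract methods above.
 * Declared abstract so the remaining inherited abstract methods can stay
 * unimplemented in this sketch.
 */
public abstract class ExampleCloudInputSource extends CloudObjectInputSource
{
  // Stand-ins: a real subclass would hold a storage client and build its own
  // CloudObjectSplitWidget around it.
  private final Function<CloudObjectLocation, byte[]> fetcher;
  private final CloudObjectSplitWidget splitWidget;

  public ExampleCloudInputSource(
      List<URI> uris,
      List<URI> prefixes,
      List<CloudObjectLocation> objects,
      String objectGlob,
      Function<CloudObjectLocation, byte[]> fetcher,
      CloudObjectSplitWidget splitWidget
  )
  {
    // "examplestore" is an assumed URI scheme; real subclasses use "s3", "gs", etc.
    super("examplestore", uris, prefixes, objects, objectGlob);
    this.fetcher = fetcher;
    this.splitWidget = splitWidget;
  }

  @Override
  protected InputEntity createEntity(CloudObjectLocation location)
  {
    // Wrap the fetched bytes in a simple InputEntity; production subclasses
    // usually return a streaming, retry-aware entity instead of buffering.
    return new ByteEntity(fetcher.apply(location));
  }

  @Override
  protected CloudObjectSplitWidget getSplitWidget()
  {
    // The widget performs the listing and size lookups used by createSplits.
    return splitWidget;
  }
}
```

A production subclass would wrap its storage client directly and stream object contents rather than buffering them in memory.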
public Stream<InputSplit<List<CloudObjectLocation>>> createSplits(InputFormat inputFormat, @Nullable SplitHintSpec splitHintSpec)
Description copied from interface: SplittableInputSource
Creates a Stream of InputSplits. The returned stream is supposed to be evaluated lazily to avoid consuming too much memory. Note that this interface also has SplittableInputSource.estimateNumSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec), which is related to this method. Implementations should be careful NOT to cache the created splits in memory.
Implementations can consider InputFormat.isSplittable() and SplitHintSpec to create splits in the same way as SplittableInputSource.estimateNumSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec).
Specified by: createSplits in interface SplittableInputSource<List<CloudObjectLocation>>
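Because the stream is meant to be evaluated lazily, callers typically iterate it directly rather than collecting it. A minimal usage sketch, assuming a concrete CloudObjectInputSource instance and an already-built InputFormat; the logSplits helper and its println output are illustrative only, and null is passed for the @Nullable SplitHintSpec.

```java
import java.util.List;

import org.apache.druid.data.input.InputFormat;
import org.apache.druid.data.input.impl.CloudObjectInputSource;
import org.apache.druid.data.input.impl.CloudObjectLocation;

public class SplitWalkExample
{
  /**
   * Walks the splits without collecting the stream, matching the
   * "evaluate lazily" note above.
   */
  static void logSplits(CloudObjectInputSource source, InputFormat format)
  {
    // splitHintSpec is @Nullable; a real caller might pass a SplitHintSpec
    // tuned to the cluster instead of null.
    source.createSplits(format, null)
          .forEach(split -> {
            List<CloudObjectLocation> locations = split.get();
            System.out.println("split with " + locations.size() + " object(s)");
          });
  }
}
```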
public int estimateNumSplits(InputFormat inputFormat, @Nullable SplitHintSpec splitHintSpec)
Description copied from interface: SplittableInputSource
Returns an estimate of the number of splits to be created via SplittableInputSource.createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec). The estimate doesn't have to be accurate and can differ from the actual number of InputSplits returned by SplittableInputSource.createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec). It is used to estimate the progress of a phase in parallel indexing; see TaskMonitor for more details of the progress estimation.
This method can be expensive if an implementation iterates all directories or a similar substructure to find all input entities.
Implementations can consider InputFormat.isSplittable() and SplitHintSpec to find splits in the same way as SplittableInputSource.createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec).
Specified by: estimateNumSplits in interface SplittableInputSource<List<CloudObjectLocation>>
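As a rough illustration of how such an estimate can drive progress reporting (the real accounting lives in TaskMonitor), here is a hypothetical helper; the progress method, its parameters, and the clamping are assumptions, not Druid code.

```java
import org.apache.druid.data.input.InputFormat;
import org.apache.druid.data.input.impl.CloudObjectInputSource;

public class ProgressExample
{
  /**
   * Rough progress ratio in the spirit described above; the estimate may
   * differ from the real split count, so the result is clamped to [0, 1].
   */
  static double progress(CloudObjectInputSource source, InputFormat format, int splitsDone)
  {
    int estimated = source.estimateNumSplits(format, null); // null = no split hint
    if (estimated <= 0) {
      return 0.0;
    }
    return Math.min(1.0, (double) splitsDone / estimated);
  }
}
```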
public boolean needsFormat()
Description copied from interface: InputSource
Returns true if this inputSource supports different InputFormats. Some inputSources, such as LocalInputSource, can store files of any format; these storage types require an InputFormat to be passed so that InputSourceReader can parse the data properly. However, some storage types have a fixed format. For example, the druid inputSource always reads segments. Those inputSources should return false from this method.
Specified by: needsFormat in interface InputSource
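A small sketch of how a caller can honor this contract before building a reader; requireFormatIfNeeded is a hypothetical helper, and since CloudObjectInputSource reads raw objects of arbitrary format, it is expected to return true here.

```java
import org.apache.druid.data.input.InputFormat;
import org.apache.druid.data.input.InputSource;

public class FormatCheckExample
{
  /**
   * Rejects a read attempt when the source needs an InputFormat but none
   * was supplied, per the contract described above.
   */
  static void requireFormatIfNeeded(InputSource source, InputFormat format)
  {
    if (source.needsFormat() && format == null) {
      throw new IllegalArgumentException("This input source requires an inputFormat");
    }
  }
}
```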
protected InputSourceReader formattableReader(InputRowSchema inputRowSchema, InputFormat inputFormat, @Nullable File temporaryDirectory)
Overrides: formattableReader in class AbstractInputSource