Package org.apache.druid.data.input.impl
Class CloudObjectInputSource
- java.lang.Object
-
- org.apache.druid.data.input.AbstractInputSource
-
- org.apache.druid.data.input.impl.CloudObjectInputSource
-
- All Implemented Interfaces:
SplittableInputSource<List<CloudObjectLocation>>, InputSource
public abstract class CloudObjectInputSource extends AbstractInputSource implements SplittableInputSource<List<CloudObjectLocation>>
-
-
Field Summary
-
Fields inherited from interface org.apache.druid.data.input.InputSource
TYPE_PROPERTY
-
Fields inherited from interface org.apache.druid.data.input.impl.SplittableInputSource
DEFAULT_SPLIT_HINT_SPEC
-
-
Constructor Summary
Constructors
CloudObjectInputSource(String scheme, List<URI> uris, List<URI> prefixes, List<CloudObjectLocation> objects, String objectGlob)
-
Method Summary
-
Methods inherited from class org.apache.druid.data.input.AbstractInputSource
fixedFormatReader, reader
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.druid.data.input.InputSource
getTypes, reader
-
Methods inherited from interface org.apache.druid.data.input.impl.SplittableInputSource
getSplitHintSpecOrDefault, isSplittable, withSplit
-
Method Detail
-
getObjects
@Nullable public List<CloudObjectLocation> getObjects()
-
createEntity
protected abstract InputEntity createEntity(CloudObjectLocation location)
Create the correct InputEntity for this input source given a split on a CloudObjectLocation. This is called internally by formattableReader(org.apache.druid.data.input.InputRowSchema, org.apache.druid.data.input.InputFormat, java.io.File) and operates on the output of createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec).
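The relationship described above (a reader walking each split and asking the subclass for one entity per location) can be illustrated with a minimal sketch. It does not use the Druid classes: plain String values stand in for CloudObjectLocation and InputEntity, and the names CreateEntitySketch, entitiesFor, and S3LikeSketch are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types; none of these are Druid classes.
abstract class CreateEntitySketch {
    // Plays the role of createEntity(CloudObjectLocation): one entity per location.
    protected abstract String createEntity(String location);

    // Plays the role of the reader: walks each split and builds an entity
    // for every location the split contains.
    public List<String> entitiesFor(List<List<String>> splits) {
        List<String> entities = new ArrayList<>();
        for (List<String> split : splits) {
            for (String location : split) {
                entities.add(createEntity(location));
            }
        }
        return entities;
    }

    // A storage-specific subclass decides what an entity looks like.
    static final class S3LikeSketch extends CreateEntitySketch {
        @Override
        protected String createEntity(String location) {
            return "s3-entity:" + location;
        }
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
            Arrays.asList("bucket/a.json", "bucket/b.json"),
            Arrays.asList("bucket/c.json"));
        System.out.println(new S3LikeSketch().entitiesFor(splits));
    }
}
```

Only the subclass knows how to open an object in its particular storage system, which is presumably why this method is left abstract.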
-
getSplitWidget
protected abstract CloudObjectSplitWidget getSplitWidget()
Returns CloudObjectSplitWidget, which is used to implement createSplits(InputFormat, SplitHintSpec).
-
createSplits
public Stream<InputSplit<List<CloudObjectLocation>>> createSplits(InputFormat inputFormat, @Nullable SplitHintSpec splitHintSpec)
Description copied from interface: SplittableInputSource
Creates a Stream of InputSplits. The returned stream is supposed to be evaluated lazily to avoid consuming too much memory. Note that this interface also has SplittableInputSource.estimateNumSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec), which is related to this method. Implementations should take care NOT to cache the created splits in memory. Implementations can consider InputFormat.isSplittable() and SplitHintSpec to create splits in the same way as SplittableInputSource.estimateNumSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec).
- Specified by:
createSplits in interface SplittableInputSource<List<CloudObjectLocation>>
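As an illustration of the lazy-evaluation requirement, the following sketch builds each group of objects only when the stream consumer asks for it, and never holds the full list of splits in memory. It is not the Druid implementation: LazySplitSketch and lazySplits are hypothetical names, and grouping by a fixed object count (maxPerSplit) is an assumption made for brevity, whereas a real SplitHintSpec may group by other criteria.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper; not part of Druid.
final class LazySplitSketch {
    // Groups items from the source iterator into lists of at most maxPerSplit.
    // Each list is assembled inside next(), so nothing is precomputed or cached.
    public static <T> Stream<List<T>> lazySplits(final Iterator<T> source, final int maxPerSplit) {
        if (maxPerSplit < 1) {
            throw new IllegalArgumentException("maxPerSplit must be at least 1");
        }
        Iterator<List<T>> batches = new Iterator<List<T>>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public List<T> next() {
                if (!source.hasNext()) {
                    throw new NoSuchElementException();
                }
                List<T> batch = new ArrayList<>();
                while (batch.size() < maxPerSplit && source.hasNext()) {
                    batch.add(source.next());
                }
                return batch;
            }
        };
        // Wrap the iterator in a sequential, ordered stream.
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(batches, Spliterator.ORDERED), false);
    }

    public static void main(String[] args) {
        List<String> objects = Arrays.asList("a", "b", "c", "d", "e");
        lazySplits(objects.iterator(), 2).forEach(System.out::println);
    }
}
```

Because the stream is backed directly by the source iterator, a listing that is itself paginated (for example, a cloud object listing) would only be advanced as far as the consumer reads.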
-
estimateNumSplits
public int estimateNumSplits(InputFormat inputFormat, @Nullable SplitHintSpec splitHintSpec)
Description copied from interface: SplittableInputSource
Returns an estimated total number of splits to be created via SplittableInputSource.createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec). The estimated number of splits doesn't have to be accurate and can be different from the actual number of InputSplits returned from SplittableInputSource.createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec). This will be used to estimate the progress of a phase in parallel indexing. See TaskMonitor for more details of the progress estimation. This method can be expensive if an implementation iterates over all directories or other substructures to find all input entities. Implementations can consider InputFormat.isSplittable() and SplitHintSpec to find splits in the same way as SplittableInputSource.createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec).
- Specified by:
estimateNumSplits in interface SplittableInputSource<List<CloudObjectLocation>>
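To make the role of the estimate concrete, here is a small sketch of how an approximate split count could feed a progress figure. It is an assumption-laden illustration, not the logic in TaskMonitor: SplitEstimateSketch, its two methods, and the count-based grouping are all hypothetical. The clamp in progress reflects the point above that the actual number of splits may differ from the estimate.

```java
// Hypothetical helper; not part of Druid and not TaskMonitor's actual logic.
final class SplitEstimateSketch {
    // Estimates the split count as ceil(objectCount / maxPerSplit).
    public static int estimateNumSplits(long objectCount, int maxPerSplit) {
        if (objectCount < 0 || maxPerSplit < 1) {
            throw new IllegalArgumentException("objectCount must be >= 0 and maxPerSplit >= 1");
        }
        long splits = objectCount / maxPerSplit + (objectCount % maxPerSplit == 0 ? 0 : 1);
        return (int) Math.min(Integer.MAX_VALUE, splits);
    }

    // Fraction of work done, clamped to 1.0 because the estimate may be too low.
    public static double progress(int completedSplits, int estimatedSplits) {
        if (estimatedSplits <= 0) {
            return 0.0;
        }
        return Math.min(1.0, (double) completedSplits / estimatedSplits);
    }

    public static void main(String[] args) {
        int estimated = estimateNumSplits(5, 2);
        System.out.println("estimated splits: " + estimated);
        System.out.println("progress after 1 split: " + progress(1, estimated));
    }
}
```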
-
needsFormat
public boolean needsFormat()
Description copied from interface: InputSource
Returns true if this inputSource supports different InputFormats. Some inputSources, such as LocalInputSource, can store files of any format. These storage types require an InputFormat to be passed so that InputSourceReader can parse data properly. However, some storage types have a fixed format. For example, the druid inputSource always reads segments. These inputSources should return false for this method.
- Specified by:
needsFormat in interface InputSource
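The distinction between format-agnostic and fixed-format sources can be sketched as follows. The types here are hypothetical stand-ins rather than the Druid interfaces, and the dispatch in describeReader is an assumption about how a caller might use the flag: a format is required only when needsFormat() returns true.

```java
// Hypothetical stand-ins; these are not the Druid InputSource types.
final class NeedsFormatSketch {
    interface Source {
        boolean needsFormat();
    }

    // Stores files of any format, so the caller must say how to parse them.
    static final class AnyFormatSource implements Source {
        @Override
        public boolean needsFormat() {
            return true;
        }
    }

    // Always reads one known format, so no format needs to be supplied.
    static final class FixedFormatSource implements Source {
        @Override
        public boolean needsFormat() {
            return false;
        }
    }

    // Chooses a reader strategy based on needsFormat().
    public static String describeReader(Source source, String formatOrNull) {
        if (source.needsFormat()) {
            if (formatOrNull == null) {
                throw new IllegalArgumentException("a format is required for this source");
            }
            return "parse with " + formatOrNull;
        }
        return "fixed format";
    }

    public static void main(String[] args) {
        System.out.println(describeReader(new AnyFormatSource(), "json"));
        System.out.println(describeReader(new FixedFormatSource(), null));
    }
}
```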
-
formattableReader
protected InputSourceReader formattableReader(InputRowSchema inputRowSchema, InputFormat inputFormat, @Nullable File temporaryDirectory)
- Overrides:
formattableReader in class AbstractInputSource
-
-