Package org.apache.druid.data.input.impl
Class CloudObjectInputSource
- java.lang.Object
- org.apache.druid.data.input.AbstractInputSource
- org.apache.druid.data.input.impl.CloudObjectInputSource
- All Implemented Interfaces:
SplittableInputSource<List<CloudObjectLocation>>, InputSource
public abstract class CloudObjectInputSource extends AbstractInputSource implements SplittableInputSource<List<CloudObjectLocation>>
-
-
Field Summary
-
Fields inherited from interface org.apache.druid.data.input.InputSource
TYPE_PROPERTY
-
Fields inherited from interface org.apache.druid.data.input.impl.SplittableInputSource
DEFAULT_SPLIT_HINT_SPEC
-
-
Constructor Summary
Constructors
CloudObjectInputSource(String scheme, List<URI> uris, List<URI> prefixes, List<CloudObjectLocation> objects, String objectGlob)
-
Method Summary
-
Methods inherited from class org.apache.druid.data.input.AbstractInputSource
fixedFormatReader, reader
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.druid.data.input.InputSource
getTypes, reader
-
Methods inherited from interface org.apache.druid.data.input.impl.SplittableInputSource
getSplitHintSpecOrDefault, isSplittable, withSplit
-
Method Detail
-
getObjects
@Nullable public List<CloudObjectLocation> getObjects()
-
createEntity
protected abstract InputEntity createEntity(CloudObjectLocation location)
Create the correct InputEntity for this input source given a split on a CloudObjectLocation. This is called internally by formattableReader(org.apache.druid.data.input.InputRowSchema, org.apache.druid.data.input.InputFormat, java.io.File) and operates on the output of createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec).
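The relationship described above, where a base class walks the locations of a split and delegates entity creation to a subclass, can be sketched as follows. This is only an illustration of the pattern: Location, Entity, BaseSource, DemoSource and entitiesFor are hypothetical stand-ins invented for this example, not Druid types, and the real formattableReader does more than map locations to entities.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CreateEntitySketch {
    // Hypothetical stand-in for CloudObjectLocation (assumption, not the Druid class).
    static final class Location {
        final String bucket;
        final String path;

        Location(String bucket, String path) {
            this.bucket = bucket;
            this.path = path;
        }
    }

    // Hypothetical stand-in for InputEntity.
    interface Entity {
        String describe();
    }

    // The base class owns the iteration; subclasses only decide which entity to build.
    abstract static class BaseSource {
        protected abstract Entity createEntity(Location location);

        List<Entity> entitiesFor(List<Location> split) {
            List<Entity> entities = new ArrayList<>();
            for (Location location : split) {
                entities.add(createEntity(location));
            }
            return entities;
        }
    }

    // A made-up storage-specific subclass.
    static final class DemoSource extends BaseSource {
        @Override
        protected Entity createEntity(Location location) {
            return () -> "demo://" + location.bucket + "/" + location.path;
        }
    }

    public static void main(String[] args) {
        List<Location> split = Arrays.asList(
            new Location("bucket", "a.json"),
            new Location("bucket", "b.json"));
        for (Entity entity : new DemoSource().entitiesFor(split)) {
            System.out.println(entity.describe());
        }
    }
}
```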
-
getSplitWidget
protected abstract CloudObjectSplitWidget getSplitWidget()
Returns CloudObjectSplitWidget, which is used to implement createSplits(InputFormat, SplitHintSpec).
-
createSplits
public Stream<InputSplit<List<CloudObjectLocation>>> createSplits(InputFormat inputFormat, @Nullable SplitHintSpec splitHintSpec)
Description copied from interface: SplittableInputSource
Creates a Stream of InputSplits. The returned stream should be evaluated lazily to avoid consuming too much memory. Note that this interface also has SplittableInputSource.estimateNumSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec), which is related to this method. Implementations should be careful NOT to cache the created splits in memory. Implementations can consider InputFormat.isSplittable() and SplitHintSpec to create splits in the same way as SplittableInputSource.estimateNumSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec).
- Specified by:
createSplits in interface SplittableInputSource<List<CloudObjectLocation>>
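As a rough illustration of lazy split creation, the sketch below groups items from an iterator into batches only when the stream is consumed, so no complete list of splits is held in memory. It uses only the JDK; lazySplits and maxPerSplit are hypothetical names for this example, plain strings stand in for CloudObjectLocation, and nothing here is taken from the Druid implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class LazySplitSketch {
    // Wraps an iterator of objects in a stream of batches. A batch is built only
    // when the stream asks for the next element, so earlier batches are not cached.
    static <T> Stream<List<T>> lazySplits(Iterator<T> objects, int maxPerSplit) {
        if (maxPerSplit < 1) {
            throw new IllegalArgumentException("maxPerSplit must be at least 1");
        }
        Iterator<List<T>> batches = new Iterator<List<T>>() {
            @Override
            public boolean hasNext() {
                return objects.hasNext();
            }

            @Override
            public List<T> next() {
                if (!objects.hasNext()) {
                    throw new NoSuchElementException();
                }
                List<T> batch = new ArrayList<>(maxPerSplit);
                while (batch.size() < maxPerSplit && objects.hasNext()) {
                    batch.add(objects.next());
                }
                return batch;
            }
        };
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(batches, Spliterator.ORDERED | Spliterator.NONNULL),
            false);
    }

    public static void main(String[] args) {
        // Stand-in for a paginated cloud listing (assumption for the example).
        List<String> listing = Arrays.asList("a.json", "b.json", "c.json", "d.json", "e.json");
        lazySplits(listing.iterator(), 2).forEach(System.out::println);
    }
}
```

In a real input source the iterator would typically be backed by a paginated listing call, so consuming only the first few splits would avoid listing the remaining objects.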
-
estimateNumSplits
public int estimateNumSplits(InputFormat inputFormat, @Nullable SplitHintSpec splitHintSpec)
Description copied from interface: SplittableInputSource
Returns an estimated total number of splits to be created via SplittableInputSource.createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec). The estimate doesn't have to be accurate and can differ from the actual number of InputSplits returned from SplittableInputSource.createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec). It is used to estimate the progress of a phase in parallel indexing. See TaskMonitor for more details of the progress estimation. This method can be expensive if an implementation iterates over all directories or other substructures to find all input entities. Implementations can consider InputFormat.isSplittable() and SplitHintSpec to find splits in the same way as SplittableInputSource.createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec).
- Specified by:
estimateNumSplits in interface SplittableInputSource<List<CloudObjectLocation>>
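One way to keep an estimate consistent with split creation is to apply the same grouping rule in both methods, with the estimate counting groups instead of retaining them. The sketch below assumes a hypothetical size-based rule (a split is closed once adding the next object would exceed maxBytes) and represents each object by its size in bytes; it is not the rule Druid applies. For brevity createSplits here returns a materialized list, which a real implementation would avoid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitEstimateSketch {
    // Greedy grouping: start a new split when the next object would push the
    // current split over maxBytes. An oversized object still gets its own split.
    static List<List<Long>> createSplits(List<Long> sizes, long maxBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : sizes) {
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    // Same rule as createSplits, but only counts the groups.
    static int estimateNumSplits(List<Long> sizes, long maxBytes) {
        int closed = 0;
        long currentBytes = 0;
        boolean open = false;
        for (long size : sizes) {
            if (open && currentBytes + size > maxBytes) {
                closed++;
                currentBytes = 0;
            }
            open = true;
            currentBytes += size;
        }
        return open ? closed + 1 : closed;
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(40L, 40L, 40L, 100L, 10L);
        System.out.println("estimated: " + estimateNumSplits(sizes, 100L));
        System.out.println("actual:    " + createSplits(sizes, 100L).size());
    }
}
```

Because both methods walk every object, this estimate has the cost the description warns about; a cheaper estimate may be inexact, which the contract allows.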
-
needsFormat
public boolean needsFormat()
Description copied from interface: InputSource
Returns true if this inputSource supports different InputFormats. Some inputSources, such as LocalInputSource, can store files of any format. These storage types require an InputFormat to be passed so that InputSourceReader can parse data properly. However, some storage types have a fixed format. For example, the druid inputSource always reads segments. These inputSources should return false for this method.
- Specified by:
needsFormat in interface InputSource
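The distinction can be pictured with a small, self-contained sketch in which a caller requires a format only when the source reports that it needs one. Source, describeReader and the returned strings are hypothetical names made up for this example; the sketch illustrates the idea and does not reproduce Druid's reader selection.

```java
import java.util.Objects;

class NeedsFormatSketch {
    // Hypothetical stand-in for an input source (assumption, not the Druid interface).
    interface Source {
        boolean needsFormat();
    }

    // A format is mandatory only for sources that can hold data in any format.
    static String describeReader(Source source, String inputFormat) {
        if (source.needsFormat()) {
            Objects.requireNonNull(inputFormat, "inputFormat");
            return "formattable reader using " + inputFormat;
        }
        return "fixed-format reader";
    }

    public static void main(String[] args) {
        Source anyFormatSource = () -> true;    // e.g. files in object storage
        Source fixedFormatSource = () -> false; // e.g. a source that always reads segments
        System.out.println(describeReader(anyFormatSource, "json"));
        System.out.println(describeReader(fixedFormatSource, null));
    }
}
```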
-
formattableReader
protected InputSourceReader formattableReader(InputRowSchema inputRowSchema, InputFormat inputFormat, @Nullable File temporaryDirectory)
- Overrides:
formattableReader in class AbstractInputSource
-
-