Package org.apache.druid.data.input.impl
Class CloudObjectInputSource
- java.lang.Object
-
- org.apache.druid.data.input.AbstractInputSource
-
- org.apache.druid.data.input.impl.CloudObjectInputSource
-
- All Implemented Interfaces:
SplittableInputSource<List<CloudObjectLocation>>, SystemFieldInputSource, InputSource
public abstract class CloudObjectInputSource extends AbstractInputSource implements SplittableInputSource<List<CloudObjectLocation>>, SystemFieldInputSource
-
-
Field Summary
Fields
Modifier and Type: protected SystemFields
Field: systemFields
-
Fields inherited from interface org.apache.druid.data.input.InputSource
TYPE_PROPERTY
-
Fields inherited from interface org.apache.druid.data.input.impl.SplittableInputSource
DEFAULT_SPLIT_HINT_SPEC
-
Fields inherited from interface org.apache.druid.data.input.impl.systemfield.SystemFieldInputSource
SYSTEM_FIELDS_PROPERTY
-
-
Constructor Summary
Constructors
CloudObjectInputSource(String scheme, List<URI> uris, List<URI> prefixes, List<CloudObjectLocation> objects, String objectGlob, SystemFields systemFields)
-
Method Summary
-
Methods inherited from class org.apache.druid.data.input.AbstractInputSource
fixedFormatReader, reader
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.druid.data.input.InputSource
getTypes, reader
-
Methods inherited from interface org.apache.druid.data.input.impl.SplittableInputSource
getSplitHintSpecOrDefault, isSplittable, withSplit
-
Methods inherited from interface org.apache.druid.data.input.impl.systemfield.SystemFieldInputSource
getSystemFieldValue
-
Field Detail
-
systemFields
protected final SystemFields systemFields
-
-
Method Detail
-
getObjects
@Nullable public List<CloudObjectLocation> getObjects()
-
getConfiguredSystemFields
public Set<SystemField> getConfiguredSystemFields()
Description copied from interface: SystemFieldInputSource
Returns the system fields that this input source is configured to return. This is not the same as the set of fields for which SystemFieldInputSource.getSystemFieldValue(InputEntity, SystemField) returns a non-null value. For example, if a LocalInputSource is configured to return SystemField.BUCKET, then BUCKET appears in this set even though its value is always null. Conversely, if a LocalInputSource is *not* configured to return SystemField.URI, then URI does *not* appear in this set, even though SystemFieldInputSource.getSystemFieldValue(InputEntity, SystemField) would return a non-null value for it.
- Specified by: getConfiguredSystemFields in interface SystemFieldInputSource
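The distinction between "configured" and "non-null" can be illustrated with a minimal sketch. The class below is a hypothetical stand-in written for this example; it does not use or reproduce the Druid classes, and the names ConfiguredSystemFieldsSketch, Field, getConfiguredFields and getFieldValue are assumptions. It models a local-file-like source that can always compute a URI but never has a bucket.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for illustration only; NOT the Druid API.
final class ConfiguredSystemFieldsSketch {
    enum Field { URI, BUCKET, PATH }

    // The fields the user asked for, independent of what can actually be computed.
    private final Set<Field> configured;

    ConfiguredSystemFieldsSketch(Set<Field> configured) {
        EnumSet<Field> copy = EnumSet.noneOf(Field.class);
        copy.addAll(configured);
        this.configured = copy;
    }

    // Analogous in spirit to getConfiguredSystemFields(): reports configuration only.
    Set<Field> getConfiguredFields() {
        return configured;
    }

    // Analogous in spirit to getSystemFieldValue(...): reports what can be computed,
    // whether or not the field was configured.
    String getFieldValue(String filePath, Field field) {
        switch (field) {
            case URI:
                return "file:" + filePath;
            case PATH:
                return filePath;
            default:
                return null; // a local file has no bucket
        }
    }
}
```

With only BUCKET configured, getConfiguredFields() contains BUCKET although its value is null, and does not contain URI although its value is non-null.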
-
createEntity
protected abstract InputEntity createEntity(CloudObjectLocation location)
Creates the correct InputEntity for this input source given a split on a CloudObjectLocation. This is called internally by formattableReader(org.apache.druid.data.input.InputRowSchema, org.apache.druid.data.input.InputFormat, java.io.File) and operates on the output of createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec).
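The relationship described above, where the base class drives the read and a subclass only supplies the entity for each location, resembles a template-method pattern. The following is a minimal sketch under that assumption. Location, Entity, CloudSourceBase, ExampleSource and entitiesFor are hypothetical names invented for this example and are not Druid types or methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; NOT the Druid API.
final class CreateEntitySketch {
    // Stand-in for a cloud object location (bucket plus path).
    static final class Location {
        final String bucket;
        final String path;

        Location(String bucket, String path) {
            this.bucket = bucket;
            this.path = path;
        }
    }

    // Stand-in for an input entity: something that can report where it lives.
    interface Entity {
        String uri();
    }

    // The base class owns the iteration over a split; subclasses only decide
    // how a single location becomes an entity.
    abstract static class CloudSourceBase {
        protected abstract Entity createEntity(Location location);

        final List<Entity> entitiesFor(List<Location> split) {
            List<Entity> out = new ArrayList<>();
            for (Location location : split) {
                out.add(createEntity(location));
            }
            return out;
        }
    }

    // A made-up provider using an invented "example" scheme.
    static final class ExampleSource extends CloudSourceBase {
        @Override
        protected Entity createEntity(Location location) {
            return () -> "example://" + location.bucket + "/" + location.path;
        }
    }
}
```

A provider-specific subclass then needs no knowledge of how splits are produced or consumed; it only maps one location to one entity.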
-
getSplitWidget
protected abstract CloudObjectSplitWidget getSplitWidget()
Returns the CloudObjectSplitWidget, which is used to implement createSplits(InputFormat, SplitHintSpec).
-
createSplits
public Stream<InputSplit<List<CloudObjectLocation>>> createSplits(InputFormat inputFormat, @Nullable SplitHintSpec splitHintSpec)
Description copied from interface: SplittableInputSource
Creates a Stream of InputSplits. The returned stream should be evaluated lazily to avoid consuming too much memory. This interface also has SplittableInputSource.estimateNumSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec), which is related to this method. Implementations should take care NOT to cache the created splits in memory. Implementations can consider InputFormat.isSplittable() and SplitHintSpec to create splits in the same way as SplittableInputSource.estimateNumSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec).
- Specified by: createSplits in interface SplittableInputSource<List<CloudObjectLocation>>
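One way to satisfy the "lazy, not cached" requirement is to wrap a batching iterator in a stream, so that each split is built only when the consumer asks for it. The sketch below assumes a simple size-based grouping rule. SizedObject, LazySplitSketch and the maxSplitSize parameter are hypothetical names for this example and do not describe how this class actually builds its splits.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical stand-ins for illustration only; NOT the Druid API.
final class LazySplitSketch {
    // Stand-in for an object location together with its size in bytes.
    static final class SizedObject {
        final String path;
        final long size;

        SizedObject(String path, long size) {
            this.path = path;
            this.size = size;
        }
    }

    // Groups objects, in listing order, into splits whose total size stays at or
    // below maxSplitSize. A single oversized object forms a split of its own.
    // Nothing is read from `objects` until the returned stream is consumed.
    static Stream<List<SizedObject>> createSplits(Iterator<SizedObject> objects, long maxSplitSize) {
        Iterator<List<SizedObject>> batches = new Iterator<List<SizedObject>>() {
            private SizedObject pending; // read ahead, but not yet placed in a split

            @Override
            public boolean hasNext() {
                return pending != null || objects.hasNext();
            }

            @Override
            public List<SizedObject> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                List<SizedObject> split = new ArrayList<>();
                long total = 0;
                while (pending != null || objects.hasNext()) {
                    SizedObject o = pending != null ? pending : objects.next();
                    pending = null;
                    if (!split.isEmpty() && total + o.size > maxSplitSize) {
                        pending = o; // this object starts the next split
                        break;
                    }
                    split.add(o);
                    total += o.size;
                }
                return split;
            }
        };
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(batches, Spliterator.ORDERED | Spliterator.NONNULL),
            false);
    }
}
```

Because only the current split and one read-ahead object are held at a time, memory use does not grow with the number of objects listed.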
-
estimateNumSplits
public int estimateNumSplits(InputFormat inputFormat, @Nullable SplitHintSpec splitHintSpec)
Description copied from interface: SplittableInputSource
Returns an estimated total number of splits to be created via SplittableInputSource.createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec). The estimate does not have to be accurate and can differ from the actual number of InputSplits returned from SplittableInputSource.createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec). It is used to estimate the progress of a phase in parallel indexing. See TaskMonitor for more details on progress estimation. This method can be expensive if an implementation iterates over all directories or other substructures to find all input entities. Implementations can consider InputFormat.isSplittable() and SplitHintSpec to find splits in the same way as SplittableInputSource.createSplits(org.apache.druid.data.input.InputFormat, org.apache.druid.data.input.SplitHintSpec).
- Specified by: estimateNumSplits in interface SplittableInputSource<List<CloudObjectLocation>>
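To see why an estimate may legitimately differ from the real split count, consider the following sketch. It compares a cheap estimate based on total bytes with the count produced by greedy packing in listing order. Both methods and the class name SplitEstimateSketch are hypothetical and written for this example; they are not how this class computes its estimate.

```java
// Hypothetical illustration only; NOT the Druid API.
final class SplitEstimateSketch {
    // Cheap estimate: ceil(totalBytes / maxSplitSize), or 0 for empty input.
    static int estimateNumSplits(long[] objectSizes, long maxSplitSize) {
        if (objectSizes.length == 0) {
            return 0;
        }
        long total = 0;
        for (long size : objectSizes) {
            total += size;
        }
        return (int) Math.max(1, (total + maxSplitSize - 1) / maxSplitSize);
    }

    // Actual count when objects are packed greedily, in order, without
    // letting a split exceed maxSplitSize (unless one object alone does).
    static int actualNumSplits(long[] objectSizes, long maxSplitSize) {
        int closedSplits = 0;
        long current = 0;
        boolean open = false;
        for (long size : objectSizes) {
            if (open && current + size > maxSplitSize) {
                closedSplits++;
                current = 0;
            }
            current += size;
            open = true;
        }
        return open ? closedSplits + 1 : closedSplits;
    }
}
```

For three 5-byte objects and a limit of 8 bytes, the estimate is 2 while greedy packing yields 3 splits, since no two objects fit together. A progress display driven by the estimate would be off by one split here, which the contract above permits.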
-
needsFormat
public boolean needsFormat()
Description copied from interface: InputSource
Returns true if this inputSource supports different InputFormats. Some inputSources, such as LocalInputSource, can store files of any format. These storage types require an InputFormat to be passed so that InputSourceReader can parse data properly. However, some storage types have a fixed format. For example, the druid inputSource always reads segments. Such inputSources should return false for this method.
- Specified by: needsFormat in interface InputSource
-
formattableReader
protected InputSourceReader formattableReader(InputRowSchema inputRowSchema, InputFormat inputFormat, @Nullable File temporaryDirectory)
- Overrides: formattableReader in class AbstractInputSource
-
-