Package org.apache.beam.sdk.io
Class BoundedSource<T>
- java.lang.Object
  - org.apache.beam.sdk.io.Source<T>
    - org.apache.beam.sdk.io.BoundedSource<T>
- Type Parameters:
T - Type of records read by the source.
- All Implemented Interfaces:
java.io.Serializable, HasDisplayData
- Direct Known Subclasses:
OffsetBasedSource
public abstract class BoundedSource<T> extends Source<T>
A Source that reads a finite amount of input and, because of that, supports some additional operations.
The operations are:
- Splitting into sources that read bundles of a given size: split(long, org.apache.beam.sdk.options.PipelineOptions);
- Size estimation: getEstimatedSizeBytes(org.apache.beam.sdk.options.PipelineOptions);
- The accompanying reader has additional functionality to enable runners to dynamically adapt based on runtime conditions.
  - Progress estimation (BoundedSource.BoundedReader.getFractionConsumed())
  - Tracking of parallelism, to determine whether the current source can be split (BoundedSource.BoundedReader.getSplitPointsConsumed() and BoundedSource.BoundedReader.getSplitPointsRemaining()).
  - Dynamic splitting of the current source (BoundedSource.BoundedReader.splitAtFraction(double)).
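The progress estimation and dynamic splitting operations can be pictured on a simple offset range. The following is a minimal plain-Java sketch, not Beam's implementation: the class RangeProgressSketch and all of its methods are hypothetical, it uses no Beam types, and it assumes exactly one record per offset. It only illustrates how a consumed fraction can be derived from a read position, and why a split request is refused once the reader has already passed the proposed split point.

```java
/** Hypothetical stand-in for a reader over the half-open offset range [start, end). */
final class RangeProgressSketch {
  private final long start;
  private long end;      // exclusive; shrinks when a split succeeds
  private long current;  // next offset to be read; offsets below it are consumed

  RangeProgressSketch(long start, long end) {
    if (end < start) {
      throw new IllegalArgumentException("end must not be smaller than start");
    }
    this.start = start;
    this.end = end;
    this.current = start;
  }

  /** Consumes one offset; returns false once the range is exhausted. */
  boolean advance() {
    if (current >= end) {
      return false;
    }
    current++;
    return true;
  }

  /** Fraction of the (possibly shrunk) range that has been consumed, in [0, 1]. */
  double fractionConsumed() {
    if (end == start) {
      return 1.0; // an empty range is treated as fully consumed
    }
    return (double) (current - start) / (end - start);
  }

  /**
   * Tries to give away the tail of the range. On success this reader keeps
   * [start, splitOffset) and the residual {splitOffset, oldEnd} is returned.
   * Returns null when the request cannot be honoured.
   */
  long[] splitAtFraction(double fraction) {
    if (fraction <= 0.0 || fraction >= 1.0) {
      return null; // nothing sensible to split off
    }
    long splitOffset = start + (long) Math.ceil(fraction * (end - start));
    if (splitOffset <= current || splitOffset >= end) {
      return null; // split point already reached, or the residual would be empty
    }
    long[] residual = {splitOffset, end};
    end = splitOffset;
    return residual;
  }

  public static void main(String[] args) {
    RangeProgressSketch reader = new RangeProgressSketch(0, 100);
    for (int i = 0; i < 30; i++) {
      reader.advance();
    }
    System.out.println("fraction consumed: " + reader.fractionConsumed());
    long[] residual = reader.splitAtFraction(0.5);
    System.out.println("residual: [" + residual[0] + ", " + residual[1] + ")");
    System.out.println("fraction consumed after split: " + reader.fractionConsumed());
    System.out.println("late split accepted: " + (reader.splitAtFraction(0.25) != null));
  }
}
```

In this sketch a successful split shrinks the range that the reader is responsible for, so the consumed fraction reported afterwards is relative to the smaller range.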
- See Also:
- Serialized Form
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | BoundedSource.BoundedReader<T> | A Reader that reads a bounded amount of input and supports some additional operations, such as progress estimation and dynamic work rebalancing.
Nested classes/interfaces inherited from class org.apache.beam.sdk.io.Source
Source.Reader<T>
Constructor Summary
Constructors
Constructor | Description
BoundedSource() |
Method Summary
Modifier and Type | Method | Description
abstract BoundedSource.BoundedReader<T> | createReader(PipelineOptions options) | Returns a new BoundedSource.BoundedReader that reads from this source.
abstract long | getEstimatedSizeBytes(PipelineOptions options) | An estimate of the total size (in bytes) of the data that would be read from this source.
abstract java.util.List<? extends BoundedSource<T>> | split(long desiredBundleSizeBytes, PipelineOptions options) | Splits the source into bundles of approximately desiredBundleSizeBytes.
Methods inherited from class org.apache.beam.sdk.io.Source
getDefaultOutputCoder, getOutputCoder, populateDisplayData, validate
Method Detail
-
split
public abstract java.util.List<? extends BoundedSource<T>> split(long desiredBundleSizeBytes, PipelineOptions options) throws java.lang.Exception
Splits the source into bundles of approximately desiredBundleSizeBytes.
- Throws:
java.lang.Exception
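As a rough illustration of what splitting into bundles of approximately desiredBundleSizeBytes can look like for a source backed by one contiguous byte range, here is a plain-Java sketch. It is an assumption-laden model rather than Beam code: RangeSplitSketch and ByteRange are hypothetical names, no Beam classes are involved, and each returned range stands in for one sub-source. A real source may only be able to cut at record boundaries, which is one reason the bundle size is only approximate.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of size-based splitting; not part of the Beam API. */
final class RangeSplitSketch {

  /** Half-open byte range [start, end); a hypothetical stand-in for a sub-source. */
  static final class ByteRange {
    final long start;
    final long end;

    ByteRange(long start, long end) {
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + start + ", " + end + ")";
    }
  }

  /** Cuts the range into consecutive pieces of at most desiredBundleSizeBytes each. */
  static List<ByteRange> split(ByteRange range, long desiredBundleSizeBytes) {
    if (desiredBundleSizeBytes <= 0) {
      throw new IllegalArgumentException("desiredBundleSizeBytes must be positive");
    }
    List<ByteRange> bundles = new ArrayList<>();
    long start = range.start;
    while (start < range.end) {
      // Compare against the remaining length to avoid overflow in start + size.
      long remaining = range.end - start;
      long end = remaining > desiredBundleSizeBytes ? start + desiredBundleSizeBytes : range.end;
      bundles.add(new ByteRange(start, end));
      start = end;
    }
    return bundles;
  }

  public static void main(String[] args) {
    for (ByteRange bundle : split(new ByteRange(0, 250), 100)) {
      System.out.println(bundle);
    }
  }
}
```

The last bundle is simply whatever remains, so it may be smaller than the requested size; an empty input range yields an empty list.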
-
getEstimatedSizeBytes
public abstract long getEstimatedSizeBytes(PipelineOptions options) throws java.lang.Exception
An estimate of the total size (in bytes) of the data that would be read from this source. This estimate is in terms of external storage size, before any decompression or other processing done by the reader.
If there is no way to estimate the size of the source, implementations MAY return 0L.
- Throws:
java.lang.Exception
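To make "external storage size, before any decompression" concrete, the sketch below estimates the size of a file-backed input from its on-disk length and falls back to 0L when the size cannot be determined. This is a plain-Java illustration under stated assumptions, not Beam code: SizeEstimateSketch and estimatedSizeBytes are hypothetical names, and the input is assumed to be a single local file.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

/** Plain-Java model of a size estimate; not part of the Beam API. */
final class SizeEstimateSketch {

  /** Returns the on-disk size of the file, or 0L when it cannot be determined. */
  static long estimatedSizeBytes(Path file) {
    try {
      return Files.size(file);
    } catch (IOException e) {
      // No way to estimate (for example, the file does not exist).
      return 0L;
    }
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("size-estimate-sketch", ".gz");
    try {
      byte[] raw = new byte[100_000]; // highly compressible payload of zeros
      try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(file))) {
        out.write(raw);
      }
      // The estimate reflects the compressed bytes in storage,
      // not the number of bytes a reader would produce after decompression.
      System.out.println("uncompressed bytes: " + raw.length);
      System.out.println("estimated size:     " + estimatedSizeBytes(file));
    } finally {
      Files.deleteIfExists(file);
    }
  }
}
```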
-
createReader
public abstract BoundedSource.BoundedReader<T> createReader(PipelineOptions options) throws java.io.IOException
Returns a new BoundedSource.BoundedReader that reads from this source.
- Throws:
java.io.IOException