@DefaultAnnotation(org.checkerframework.checker.nullness.qual.NonNull.class)
Package org.apache.beam.sdk.io
Defines transforms for reading and writing common storage formats, including
AvroIO
, and TextIO
.
The classes in this package provide Read
transforms that create PCollections from
existing storage:
PCollection<TableRow> inputData = pipeline.apply(
BigQueryIO.readTableRows().from("clouddataflow-readonly:samples.weather_stations"));
and Write
transforms that persist PCollections to external storage:
PCollection<Integer> numbers = ...;
numbers.apply(TextIO.write().to("gs://my_bucket/path/to/numbers"));
-
Interface Summary Interface Description AvroIO.RecordFormatter<ElementT> Deprecated. Users can achieve the same by providing this transform in aParDo
before using write in AvroIOAvroIO.write(Class)
.AvroSink.DatumWriterFactory<T> AvroSource.DatumReaderFactory<T> CompressedSource.DecompressingChannelFactory Factory interface for creating channels that decompress the content of an underlying channel.FileBasedSink.OutputFileHints Provides hints about how to generate output files, such as a suggested filename suffix (e.g.FileBasedSink.WritableByteChannelFactory Implementations create instances ofWritableByteChannel
used byFileBasedSink
and related classes to allow decorating, or otherwise transforming, the raw data that would normally be written directly to theWritableByteChannel
passed intoFileBasedSink.WritableByteChannelFactory.create(WritableByteChannel)
.FileIO.Sink<ElementT> Specifies how to write elements to individual files inFileIO.write()
andFileIO.writeDynamic()
.FileIO.Write.FileNaming A policy for generating names for shard files.FileSystemRegistrar A registrar that createsFileSystem
instances fromPipelineOptions
.ShardingFunction<UserT,DestinationT> Function for assigningShardedKey
s to input elements for shardedWriteFiles
.TextRowCountEstimator.SamplingStrategy Sampling Strategy shows us when should we stop reading further files.UnboundedSource.CheckpointMark A marker representing the progress and state of anUnboundedSource.UnboundedReader
. -
Class Summary Class Description AvroIO PTransform
s for reading and writing Avro files.AvroIO.Parse<T> AvroIO.ParseAll<T> Deprecated. SeeAvroIO.parseAllGenericRecords(SerializableFunction)
for details.AvroIO.ParseFiles<T> AvroIO.Read<T> Implementation ofAvroIO.read(java.lang.Class<T>)
andAvroIO.readGenericRecords(org.apache.avro.Schema)
.AvroIO.ReadAll<T> Deprecated. SeeAvroIO.readAll(Class)
for details.AvroIO.ReadFiles<T> Implementation ofAvroIO.readFiles(java.lang.Class<T>)
.AvroIO.Sink<ElementT> AvroIO.TypedWrite<UserT,DestinationT,OutputT> Implementation ofAvroIO.write(java.lang.Class<T>)
.AvroIO.Write<T> This class is used as the default return value ofAvroIO.write(java.lang.Class<T>)
AvroSchemaIOProvider An implementation ofSchemaIOProvider
for reading and writing Avro files withAvroIO
.AvroSink<UserT,DestinationT,OutputT> AFileBasedSink
for Avro files.AvroSource<T> Do not use in pipelines directly: most users should useAvroIO.Read
.AvroSource.AvroReader<T> ABlockBasedSource.BlockBasedReader
for reading blocks from Avro files.BlockBasedSource<T> ABlockBasedSource
is aFileBasedSource
where a file consists of blocks of records.BlockBasedSource.Block<T> ABlock
represents a block of records that can be read.BlockBasedSource.BlockBasedReader<T> AReader
that reads records from aBlockBasedSource
.BoundedReadFromUnboundedSource<T> PTransform
that reads a bounded amount of data from anUnboundedSource
, specified as one or both of a maximum number of elements or a maximum period of time to read.BoundedSource<T> ASource
that reads a finite amount of input and, because of that, supports some additional operations.BoundedSource.BoundedReader<T> AReader
that reads a bounded amount of input and supports some additional operations, such as progress estimation and dynamic work rebalancing.ClassLoaderFileSystem A read-onlyFileSystem
implementation looking up resources using a ClassLoader.ClassLoaderFileSystem.ClassLoaderFileSystemRegistrar AutoService
registrar for theClassLoaderFileSystem
.ClassLoaderFileSystem.ClassLoaderResourceId CompressedSource<T> A Source that reads from compressed files.CompressedSource.CompressedReader<T> Reader for aCompressedSource
.CountingSource Most users should useGenerateSequence
instead.CountingSource.CounterMark The checkpoint for an unboundedCountingSource
is simply the last value produced.DefaultFilenamePolicy A defaultFileBasedSink.FilenamePolicy
for windowed and unwindowed files.DefaultFilenamePolicy.Params Encapsulates constructor parameters toDefaultFilenamePolicy
.DefaultFilenamePolicy.ParamsCoder A Coder forDefaultFilenamePolicy.Params
.DynamicAvroDestinations<UserT,DestinationT,OutputT> A specialization ofFileBasedSink.DynamicDestinations
forAvroIO
.DynamicFileDestinations Some helper classes that derive fromFileBasedSink.DynamicDestinations
.FileBasedSink<UserT,DestinationT,OutputT> Abstract class for file-based output.FileBasedSink.DynamicDestinations<UserT,DestinationT,OutputT> A class that allows value-dependent writes inFileBasedSink
.FileBasedSink.FilenamePolicy A naming policy for output files.FileBasedSink.FileResult<DestinationT> Result of a single bundle write.FileBasedSink.FileResultCoder<DestinationT> A coder forFileBasedSink.FileResult
objects.FileBasedSink.WriteOperation<DestinationT,OutputT> Abstract operation that manages the process of writing toFileBasedSink
.FileBasedSink.Writer<DestinationT,OutputT> Abstract writer that writes a bundle to aFileBasedSink
.FileBasedSource<T> A common base class for all file-basedSource
s.FileBasedSource.FileBasedReader<T> Areader
that implements code common to readers ofFileBasedSource
s.FileIO General-purpose transforms for working with files: listing files (matching), reading and writing.FileIO.Match Implementation ofFileIO.match()
.FileIO.MatchAll Implementation ofFileIO.matchAll()
.FileIO.MatchConfiguration Describes configuration for matching filepatterns, such asEmptyMatchTreatment
and continuous watching for matching files.FileIO.ReadableFile A utility class for accessing a potentially compressed file.FileIO.ReadMatches Implementation ofFileIO.readMatches()
.FileIO.Write<DestinationT,UserT> Implementation ofFileIO.write()
andFileIO.writeDynamic()
.FileSystem<ResourceIdT extends ResourceId> File system interface in Beam.FileSystems Clients facingFileSystem
utility.FileSystemUtils GenerateSequence APTransform
that produces longs starting from the given value, and either up to the given limit or untilLong.MAX_VALUE
/ until the given time elapses.GenerateSequence.External Exposes GenerateSequence as an external transform for cross-language usage.GenerateSequence.External.ExternalConfiguration Parameters class to expose the transform to an external SDK.LocalFileSystemRegistrar AutoService
registrar for theLocalFileSystem
.LocalResources Helper functions for producing aResourceId
that references a local file or directory.OffsetBasedSource<T> ABoundedSource
that uses offsets to define starting and ending positions.OffsetBasedSource.OffsetBasedReader<T> ASource.Reader
that implements code common to readers of allOffsetBasedSource
s.Read APTransform
for reading from aSource
.Read.Bounded<T> PTransform
that reads from aBoundedSource
.Read.Builder Helper class for buildingRead
transforms.Read.Unbounded<T> PTransform
that reads from aUnboundedSource
.ReadableFileCoder ACoder
forFileIO.ReadableFile
.ReadAllViaFileBasedSource<T> Reads each file in the inputPCollection
ofFileIO.ReadableFile
using given parameters for splitting files into offset ranges and for creating aFileBasedSource
for a file.ReadAllViaFileBasedSource.ReadFileRangesFnExceptionHandler A class to handle errors which occur during file reads.ShardNameTemplate Standard shard naming templates.Source<T> Base class for defining input formats and creating aSource
for reading the input.Source.Reader<T> The interface that readers of custom input sources must implement.TextIO PTransform
s for reading and writing text files.TextIO.Read Implementation ofTextIO.read()
.TextIO.ReadAll Deprecated. SeeTextIO.readAll()
for details.TextIO.ReadFiles Implementation ofTextIO.readFiles()
.TextIO.Sink Implementation ofTextIO.sink()
.TextIO.TypedWrite<UserT,DestinationT> Implementation ofTextIO.write()
.TextIO.Write This class is used as the default return value ofTextIO.write()
.TextRowCountEstimator This returns a row count estimation for files associated with a file pattern.TextRowCountEstimator.Builder Builder forTextRowCountEstimator
.TextRowCountEstimator.LimitNumberOfFiles This strategy stops sampling if we sample enough number of bytes.TextRowCountEstimator.LimitNumberOfTotalBytes This strategy stops sampling when total number of sampled bytes are more than some threshold.TextRowCountEstimator.SampleAllFiles This strategy samples all the files.TFRecordIO PTransform
s for reading and writing TensorFlow TFRecord files.TFRecordIO.Read Implementation ofTFRecordIO.read()
.TFRecordIO.ReadFiles Implementation ofTFRecordIO.readFiles()
.TFRecordIO.Sink TFRecordIO.Write Implementation ofTFRecordIO.write()
.UnboundedSource<OutputT,CheckpointMarkT extends UnboundedSource.CheckpointMark> ASource
that reads an unbounded amount of input and, because of that, supports some additional operations such as checkpointing, watermarks, and record ids.UnboundedSource.CheckpointMark.NoopCheckpointMark A checkpoint mark that does nothing when finalized.UnboundedSource.UnboundedReader<OutputT> AReader
that reads an unbounded amount of input.WriteFiles<UserT,DestinationT,OutputT> APTransform
that writes to aFileBasedSink
.WriteFilesResult<DestinationT> The result of aWriteFiles
transform. -
Enum Summary Enum Description CompressedSource.CompressionMode Deprecated. UseCompression
insteadCompression Various compression types for reading/writing files.FileBasedSink.CompressionType Deprecated. useCompression
.FileBasedSource.Mode A givenFileBasedSource
represents a file resource of one of these types.FileIO.ReadMatches.DirectoryTreatment Enum to control how directories are handled.TextIO.CompressionType Deprecated. UseCompression
.TFRecordIO.CompressionType Deprecated. UseCompression
. -
Exception Summary Exception Description TextRowCountEstimator.NoEstimationException An exception that will be thrown if the estimator cannot get an estimation of the number of lines.