T - The type to read from the compressed file.

@Experimental(value=SOURCE_SINK)
public class CompressedSource<T> extends FileBasedSource<T>
A CompressedSource wraps a delegate FileBasedSource that can read the decompressed file format.
For example, use the following to read from a gzip-compressed XML file:
XmlSource mySource = XmlSource.from(...);
PCollection<T> collection = p.apply(CompressedSource.readFromSource(mySource,
    CompressedSource.CompressionMode.GZIP));
Or, alternatively:
XmlSource mySource = XmlSource.from(...);
PCollection<T> collection = p.apply(Read.from(CompressedSource.from(mySource)
    .withDecompression(CompressedSource.CompressionMode.GZIP)));
Default compression modes are CompressedSource.CompressionMode.GZIP and CompressedSource.CompressionMode.BZIP2.
User-defined compression types are supported by implementing CompressedSource.DecompressingChannelFactory.
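As a rough illustration of the user-defined case, the sketch below shows what a factory for zlib (DEFLATE) streams might look like. The nested `DecompressingChannelFactory` interface is a hypothetical stand-in declared here so the block compiles without the SDK; its method name and signature are assumptions and should be checked against the real CompressedSource.DecompressingChannelFactory. The names `DeflateFactorySketch`, `DeflateFactory`, `compress`, and `readAll` are likewise illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class DeflateFactorySketch {

  /** Hypothetical stand-in for CompressedSource.DecompressingChannelFactory (signature assumed). */
  interface DecompressingChannelFactory extends Serializable {
    ReadableByteChannel createDecompressingChannel(ReadableByteChannel channel) throws IOException;
  }

  /** Wraps a channel of zlib-compressed bytes in a channel that yields the decompressed bytes. */
  static class DeflateFactory implements DecompressingChannelFactory {
    @Override
    public ReadableByteChannel createDecompressingChannel(ReadableByteChannel channel) {
      return Channels.newChannel(new InflaterInputStream(Channels.newInputStream(channel)));
    }
  }

  /** Helper for the demo: compresses UTF-8 text into the zlib format. */
  static byte[] compress(String text) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DeflaterOutputStream out = new DeflaterOutputStream(bytes)) {
        out.write(text.getBytes(StandardCharsets.UTF_8));
      }
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reads all decompressed bytes through the factory and decodes them as UTF-8. */
  static String readAll(DecompressingChannelFactory factory, byte[] compressed) {
    try (ReadableByteChannel in = factory.createDecompressingChannel(
        Channels.newChannel(new ByteArrayInputStream(compressed)))) {
      ByteArrayOutputStream result = new ByteArrayOutputStream();
      ByteBuffer buffer = ByteBuffer.allocate(256);
      while (in.read(buffer) != -1) {
        // Copy whatever was read in this pass, then reuse the buffer.
        result.write(buffer.array(), 0, buffer.position());
        buffer.clear();
      }
      return new String(result.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] compressed = compress("<record>hello</record>");
    System.out.println(readAll(new DeflateFactory(), compressed));
  }
}
```

With the real SDK, an implementation along these lines would presumably be passed wherever a CompressedSource.DecompressingChannelFactory is accepted, for example to withDecompression or readFromSource.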
| Modifier and Type | Class and Description |
|---|---|
| static class | CompressedSource.CompressedReader<T>: Reader for a CompressedSource. |
| static class | CompressedSource.CompressionMode: Default compression types supported by the CompressedSource. |
| static interface | CompressedSource.DecompressingChannelFactory: Factory interface for creating channels that decompress the content of an underlying channel. |
Nested classes/interfaces inherited from class FileBasedSource: FileBasedSource.FileBasedReader<T>, FileBasedSource.Mode

Nested classes/interfaces inherited from class ByteOffsetBasedSource: ByteOffsetBasedSource.ByteOffsetBasedReader<T>

Nested classes/interfaces inherited from class BoundedSource: BoundedSource.AbstractBoundedReader<T>, BoundedSource.BoundedReader<T>

Nested classes/interfaces inherited from class Source: Source.AbstractReader<T>, Source.Reader<T>

| Modifier and Type | Method and Description |
|---|---|
| CompressedSource<T> | createForSubrangeOfFile(String fileName, long start, long end): Creates a CompressedSource for a subrange of a file. |
| CompressedSource.CompressedReader<T> | createSingleFileReader(PipelineOptions options, com.google.cloud.dataflow.sdk.util.ExecutionContext executionContext): Creates a CompressedReader to read a single file. |
| static <T> CompressedSource<T> | from(FileBasedSource<T> sourceDelegate): Creates a CompressedSource from an underlying FileBasedSource that must be further configured with withDecompression(com.google.cloud.dataflow.sdk.io.CompressedSource.DecompressingChannelFactory). |
| CompressedSource.DecompressingChannelFactory | getChannelFactory() |
| Coder<T> | getDefaultOutputCoder(): Returns the delegate source's default output coder. |
| protected boolean | isSplittable(): Determines whether a single file represented by this source is splittable. |
| boolean | producesSortedKeys(PipelineOptions options): Returns whether the delegate source produces sorted keys. |
| static <T> Read.Bound<T> | readFromSource(FileBasedSource<T> sourceDelegate, CompressedSource.DecompressingChannelFactory channelFactory): Creates a Read transform that reads from a CompressedSource that reads from an underlying FileBasedSource after decompressing it with a CompressedSource.DecompressingChannelFactory. |
| void | validate(): Validates that the delegate source is a valid source and that the channel factory is not null. |
| CompressedSource<T> | withDecompression(CompressedSource.DecompressingChannelFactory channelFactory): Return a CompressedSource that is like this one but will decompress its underlying file with the given CompressedSource.DecompressingChannelFactory. |
Methods inherited from class FileBasedSource: createReader, createSourceForSubrange, getEstimatedSizeBytes, getFileOrPatternSpec, getMaxEndOffset, getMode, splitIntoBundles, toString

Methods inherited from class ByteOffsetBasedSource: getEndOffset, getMinBundleSize, getStartOffset

public static <T> Read.Bound<T> readFromSource(FileBasedSource<T> sourceDelegate, CompressedSource.DecompressingChannelFactory channelFactory)
Creates a Read transform that reads from a CompressedSource that reads from an underlying FileBasedSource after decompressing it with a CompressedSource.DecompressingChannelFactory.

public static <T> CompressedSource<T> from(FileBasedSource<T> sourceDelegate)
Creates a CompressedSource from an underlying FileBasedSource that must be further configured with withDecompression(com.google.cloud.dataflow.sdk.io.CompressedSource.DecompressingChannelFactory).

public CompressedSource<T> withDecompression(CompressedSource.DecompressingChannelFactory channelFactory)
Return a CompressedSource that is like this one but will decompress its underlying file with the given CompressedSource.DecompressingChannelFactory.

public void validate()
validate in class FileBasedSource<T>

public CompressedSource<T> createForSubrangeOfFile(String fileName, long start, long end)
Creates a CompressedSource for a subrange of a file. Called by superclass to create a source for a single file.
createForSubrangeOfFile in class FileBasedSource<T>
Parameters:
fileName - file backing the new FileBasedSource.
start - starting byte offset of the new FileBasedSource.
end - ending byte offset of the new FileBasedSource. May be Long.MAX_VALUE, in which case it will be inferred using FileBasedSource.getMaxEndOffset(com.google.cloud.dataflow.sdk.options.PipelineOptions).

protected final boolean isSplittable() throws Exception
isSplittable in class FileBasedSource<T>
Throws: Exception

public final CompressedSource.CompressedReader<T> createSingleFileReader(PipelineOptions options, com.google.cloud.dataflow.sdk.util.ExecutionContext executionContext)
Creates a CompressedReader to read a single file. Uses the delegate source to create a single file reader for the delegate source.
createSingleFileReader in class FileBasedSource<T>

public final boolean producesSortedKeys(PipelineOptions options) throws Exception
producesSortedKeys in class BoundedSource<T>
Throws: Exception

public final Coder<T> getDefaultOutputCoder()
getDefaultOutputCoder in class Source<T>

public final CompressedSource.DecompressingChannelFactory getChannelFactory()
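
The way from, withDecompression, validate, and getChannelFactory relate to one another can be sketched with simplified stand-in types. Nothing below is the SDK's implementation: `CompressedSourceSketch`, `FakeSource`, `ChannelFactory`, `Wrapper`, `isValid`, and the exception types are all assumptions chosen so the block is self-contained. It only illustrates the pattern the descriptions suggest: `from` yields a source without a channel factory, `withDecompression` returns a configured copy, and `validate` rejects a source whose factory is still null.

```java
import java.util.Objects;

class CompressedSourceSketch {

  /** Hypothetical stand-in for CompressedSource.DecompressingChannelFactory. */
  interface ChannelFactory {}

  /** Hypothetical stand-in for a delegate FileBasedSource. */
  static final class FakeSource {
    final String filePattern;

    FakeSource(String filePattern) {
      this.filePattern = filePattern;
    }

    void validate() {
      if (filePattern == null || filePattern.isEmpty()) {
        throw new IllegalStateException("file pattern must not be empty");
      }
    }
  }

  /** Hypothetical stand-in for CompressedSource; instances are immutable. */
  static final class Wrapper {
    private final FakeSource delegate;
    private final ChannelFactory channelFactory; // stays null until withDecompression is used

    private Wrapper(FakeSource delegate, ChannelFactory channelFactory) {
      this.delegate = delegate;
      this.channelFactory = channelFactory;
    }

    static Wrapper from(FakeSource delegate) {
      return new Wrapper(delegate, null);
    }

    /** Returns a copy configured with the given factory; this instance is left unchanged. */
    Wrapper withDecompression(ChannelFactory factory) {
      return new Wrapper(delegate, factory);
    }

    ChannelFactory getChannelFactory() {
      return channelFactory;
    }

    /** Checks the delegate first, then requires a non-null channel factory. */
    void validate() {
      Objects.requireNonNull(delegate, "delegate source must not be null");
      delegate.validate();
      Objects.requireNonNull(channelFactory, "channel factory must not be null");
    }
  }

  /** Convenience for the demo: reports whether validate() completes without throwing. */
  static boolean isValid(Wrapper source) {
    try {
      source.validate();
      return true;
    } catch (RuntimeException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    Wrapper bare = Wrapper.from(new FakeSource("input-*.xml.gz"));
    Wrapper configured = bare.withDecompression(new ChannelFactory() {});
    System.out.println("bare valid: " + isValid(bare));
    System.out.println("configured valid: " + isValid(configured));
  }
}
```

Returning a new object from `withDecompression` in this sketch mirrors the "like this one but" wording of the method description; whether the SDK copies or mutates internally is not stated in this page.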