public static class TextIO.Read extends Object

A PTransform that reads from a text file (or multiple text files matching a pattern) and returns a PCollection containing the decoding of each of the lines of the text file(s). The default decoding just returns each line as a String, but you may call withCoder(Coder) to change the return type.

Modifier and Type | Class and Description |
---|---|
static class | TextIO.Read.Bound<T>: A PTransform that reads from one or more text files and returns a bounded PCollection containing one element for each line of the input files. |

Modifier and Type | Method and Description |
---|---|
static TextIO.Read.Bound<String> | from(String filepattern): Returns a transform for reading text files that reads from the file(s) with the given filename or filename pattern. |
static TextIO.Read.Bound<String> | named(String name): Returns a transform for reading text files that uses the given step name. |
static <T> TextIO.Read.Bound<T> | withCoder(Coder<T> coder): Returns a transform for reading text files that uses the given Coder<T> to decode each of the lines of the file into a value of type T. |
static TextIO.Read.Bound<String> | withCompressionType(TextIO.CompressionType compressionType): Returns a transform for reading text files that decompresses all input files using the specified compression type. |
static TextIO.Read.Bound<String> | withoutValidation(): Returns a transform for reading text files that has GCS path validation on pipeline creation disabled. |
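
The filename patterns accepted by from(String filepattern) use Java filesystem glob syntax ("*", "?", "[..]"). The following is a minimal standalone sketch of that syntax using only the JDK's PathMatcher; it is not Dataflow SDK code, the class name GlobPatternDemo and the file names are hypothetical, and the SDK's own matching (particularly for gs:// paths) may differ in detail.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only (not part of the SDK).
class GlobPatternDemo {
    /** Returns true if the bare file name matches the given glob pattern. */
    static boolean matches(String glob, String fileName) {
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        // "*" matches any run of characters within a single name.
        System.out.println(matches("numbers-*.txt", "numbers-001.txt"));  // true
        // "?" matches exactly one character, so a two-digit suffix fails.
        System.out.println(matches("numbers-?.txt", "numbers-12.txt"));   // false
        // "[..]" matches one character from the bracketed set or range.
        System.out.println(matches("part-[0-9].txt", "part-7.txt"));      // true
        System.out.println(matches("part-[0-9].txt", "part-x.txt"));      // false
    }
}
```

Checking a pattern this way before passing it to from(...) can help catch a pattern that matches fewer files than intended.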
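
public static TextIO.Read.Bound<String> named(String name)

Returns a transform for reading text files that uses the given step name.

public static TextIO.Read.Bound<String> from(String filepattern)

Returns a transform for reading text files that reads from the file(s) with the given filename or filename pattern. The pattern can be a Google Cloud Storage path of the form "gs://<bucket>/<filepath>" (if running locally or via the Google Cloud Dataflow service). Standard Java Filesystem glob patterns ("*", "?", "[..]") are supported.

public static <T> TextIO.Read.Bound<T> withCoder(Coder<T> coder)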

Returns a transform for reading text files that uses the given Coder<T> to decode each of the lines of the file into a value of type T.

By default, uses StringUtf8Coder, which just returns the text lines as Java strings.

Type Parameters: T - the type of the decoded elements, and the elements of the resulting PCollection

public static TextIO.Read.Bound<String> withoutValidation()

Returns a transform for reading text files that has GCS path validation on pipeline creation disabled.

This can be useful when the GCS input does not exist at pipeline creation time but is expected to be available at execution time.

public static TextIO.Read.Bound<String> withCompressionType(TextIO.CompressionType compressionType)

Returns a transform for reading text files that decompresses all input files using the specified compression type.

If no compression type is specified, the default is TextIO.CompressionType.AUTO. In this mode, the compression type of each file is determined by its extension (e.g., *.gz is gzipped, *.bz2 is bzipped, and all other extensions are uncompressed).
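
The extension rule described for AUTO mode can be sketched as a small standalone function. This is an illustration of the rule as stated above, not the SDK's implementation: the class CompressionByExtension and its Kind enum are hypothetical stand-ins, and treating extensions as case-sensitive is an assumption.

```java
// Hypothetical illustration of extension-based detection (not SDK code).
class CompressionByExtension {
    /** Stand-in for the compression kinds named in the text above. */
    enum Kind { GZIP, BZIP2, UNCOMPRESSED }

    /** Maps a file name to a compression kind by its extension alone. */
    static Kind detect(String fileName) {
        // Assumption: extensions are compared case-sensitively.
        if (fileName.endsWith(".gz")) {
            return Kind.GZIP;
        }
        if (fileName.endsWith(".bz2")) {
            return Kind.BZIP2;
        }
        // Every other extension is treated as uncompressed text.
        return Kind.UNCOMPRESSED;
    }

    public static void main(String[] args) {
        System.out.println(detect("events.gz"));   // GZIP
        System.out.println(detect("events.bz2"));  // BZIP2
        System.out.println(detect("events.txt"));  // UNCOMPRESSED
    }
}
```

Because the rule looks only at the file name, a compressed file with a misleading extension would be read as plain text; calling withCompressionType with an explicit type avoids relying on the extension.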