public static class TextIO.Write
extends java.lang.Object
| Modifier and Type | Class and Description |
|---|---|
| static class | TextIO.Write.Bound<T>: A PTransform that writes a bounded PCollection to a text file (or multiple text files matching a sharding pattern), with each PCollection element being encoded into its own line. |
| Constructor and Description |
|---|
| TextIO.Write() |
| Modifier and Type | Method and Description |
|---|---|
| static TextIO.Write.Bound<java.lang.String> | named(java.lang.String name): Returns a TextIO.Write PTransform with the given step name. |
| static TextIO.Write.Bound<java.lang.String> | to(java.lang.String prefix): Returns a TextIO.Write PTransform that writes to the file(s) with the given prefix. |
| static <T> TextIO.Write.Bound<T> | withCoder(Coder<T> coder): Returns a TextIO.Write PTransform that uses the given Coder<T> to encode each of the elements of the input PCollection<T> into an output text line. |
| static TextIO.Write.Bound<java.lang.String> | withNumShards(int numShards): Returns a TextIO.Write PTransform that uses the provided shard count. |
| static TextIO.Write.Bound<java.lang.String> | withoutSharding(): Returns a TextIO.Write PTransform that forces a single file as output. |
| static TextIO.Write.Bound<java.lang.String> | withoutValidation(): Returns a TextIO.Write PTransform that has GCS path validation on pipeline creation disabled. |
| static TextIO.Write.Bound<java.lang.String> | withShardNameTemplate(java.lang.String shardTemplate): Returns a TextIO.Write PTransform that uses the given shard name template. |
| static TextIO.Write.Bound<java.lang.String> | withSuffix(java.lang.String nameExtension): Returns a TextIO.Write PTransform that writes to the file(s) with the given filename suffix. |
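public static TextIO.Write.Bound<java.lang.String> named(java.lang.String name)
Returns a TextIO.Write PTransform with the given step name.
public static TextIO.Write.Bound<java.lang.String> to(java.lang.String prefix)
Returns a TextIO.Write PTransform that writes to the file(s) with the given prefix. This can be a local filename (if running locally), or a Google Cloud Storage filename of the form "gs://<bucket>/<filepath>" (if running locally or via the Google Cloud Dataflow service).
The files written will begin with this prefix, followed by a shard identifier (see TextIO.Write.Bound.withNumShards(int)), and end in a common extension, if given by TextIO.Write.Bound.withSuffix(java.lang.String).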
public static TextIO.Write.Bound<java.lang.String> withSuffix(java.lang.String nameExtension)
Returns a TextIO.Write PTransform that writes to the file(s) with the given filename suffix.
public static TextIO.Write.Bound<java.lang.String> withNumShards(int numShards)
Returns a TextIO.Write PTransform that uses the provided shard count.
Constraining the number of shards is likely to reduce the performance of a pipeline. Setting this value is not recommended unless you require a specific number of output files.
numShards - the number of shards to use, or 0 to let the system decide.
public static TextIO.Write.Bound<java.lang.String> withShardNameTemplate(java.lang.String shardTemplate)
Returns a TextIO.Write PTransform that uses the given shard name template.
See ShardNameTemplate for a description of shard templates.
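To make the naming scheme concrete, the following is a minimal standalone sketch of how an output file name could be assembled from the prefix, a shard name template, and the suffix. It is an illustration only, not the SDK's implementation: the class `ShardNameSketch`, the method `shardFileName`, and the bucket name `my-bucket` are hypothetical, and the sketch assumes that each run of `S` in the template is replaced by the zero-padded shard index and each run of `N` by the zero-padded shard count.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the SDK: imitates shard file naming.
final class ShardNameSketch {
    // Matches a run of 'S' (shard index) or a run of 'N' (shard count).
    private static final Pattern PLACEHOLDER = Pattern.compile("(S+|N+)");

    static String shardFileName(
            String prefix, String template, String suffix, int shardIndex, int numShards) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer expanded = new StringBuffer();
        while (m.find()) {
            String run = m.group(1);
            int value = run.charAt(0) == 'S' ? shardIndex : numShards;
            // Zero-pad the number to the width of the placeholder run.
            String padded = String.format(Locale.ROOT, "%0" + run.length() + "d", value);
            m.appendReplacement(expanded, Matcher.quoteReplacement(padded));
        }
        m.appendTail(expanded);
        // Assumed layout: prefix, then expanded template, then suffix.
        return prefix + expanded + suffix;
    }

    public static void main(String[] args) {
        // "my-bucket" is a placeholder bucket name used only for illustration.
        for (int i = 0; i < 3; i++) {
            System.out.println(
                shardFileName("gs://my-bucket/output/words", "-SSSSS-of-NNNNN", ".txt", i, 3));
        }
    }
}
```

Under these assumptions, a prefix of `out`, the template `-SSSSS-of-NNNNN`, and the suffix `.txt` give `out-00000-of-00003.txt` for the first of three shards, while an empty template leaves only the prefix and suffix.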
public static TextIO.Write.Bound<java.lang.String> withoutSharding()
Returns a TextIO.Write PTransform that forces a single file as output.
public static <T> TextIO.Write.Bound<T> withCoder(Coder<T> coder)
Returns a TextIO.Write PTransform that uses the given Coder<T> to encode each of the elements of the input PCollection<T> into an output text line.
By default, uses StringUtf8Coder, which writes input Java strings directly as output lines.
T - the type of the elements of the input PCollection
public static TextIO.Write.Bound<java.lang.String> withoutValidation()
Returns a TextIO.Write PTransform that has GCS path validation on pipeline creation disabled.
This can be useful in the case where the GCS output location does not exist at pipeline creation time, but is expected to be available at execution time.
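The per-element encoding described under withCoder can be pictured with a small standalone sketch. It is an analogy rather than SDK code: `LineEncodingSketch` and `encodeAsLines` are hypothetical names, and a plain `Function<T, byte[]>` stands in for a `Coder<T>`. The sketch assumes each element's encoded bytes are written out followed by a newline, which matches the description of each element being encoded into its own line.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration, not part of the SDK.
final class LineEncodingSketch {
    // The encoder argument is a stand-in for a Coder<T>: one element in, bytes out.
    static <T> String encodeAsLines(List<T> elements, Function<T, byte[]> encoder) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (T element : elements) {
            byte[] encoded = encoder.apply(element);
            out.write(encoded, 0, encoded.length);
            out.write('\n'); // one element per output line
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Analogue of the default case: strings written directly as UTF-8 lines.
        Function<String, byte[]> utf8 = s -> s.getBytes(StandardCharsets.UTF_8);
        System.out.print(encodeAsLines(Arrays.asList("alpha", "beta"), utf8));

        // Analogue of a custom coder: integers rendered as decimal text.
        Function<Integer, byte[]> decimal =
            n -> Integer.toString(n).getBytes(StandardCharsets.UTF_8);
        System.out.print(encodeAsLines(Arrays.asList(1, 22, 333), decimal));
    }
}
```

One consequence this sketch suggests: an encoder whose output contains newline bytes would spread a single element over several lines, so a text-friendly encoding is the safer choice for line-oriented output.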