public static class AvroIO.Write extends Object
A PTransform that writes a PCollection to an Avro file (or
multiple Avro files matching a sharding pattern).

| Modifier and Type | Class and Description |
|---|---|
| static class | AvroIO.Write.Bound<T>: A PTransform that writes a bounded PCollection to an Avro file (or multiple Avro files matching a sharding pattern). |

| Constructor and Description |
|---|
| Write() |

| Modifier and Type | Method and Description |
|---|---|
| static AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> | named(String name): Returns an AvroIO.Write PTransform with the given step name. |
| static AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> | to(String prefix): Returns an AvroIO.Write PTransform that writes to the file(s) with the given prefix. |
| static AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> | withNumShards(int numShards): Returns an AvroIO.Write PTransform that uses the provided shard count. |
| static AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> | withoutSharding(): Returns an AvroIO.Write PTransform that forces a single file as output. |
| static AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> | withoutValidation(): Returns an AvroIO.Write PTransform that has GCS path validation on pipeline creation disabled. |
| static <T> AvroIO.Write.Bound<T> | withSchema(Class<T> type): Returns an AvroIO.Write PTransform that writes Avro file(s) containing records whose type is the specified Avro-generated class. |
| static AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> | withSchema(org.apache.avro.Schema schema): Returns an AvroIO.Write PTransform that writes Avro file(s) containing records of the specified schema. |
| static AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> | withSchema(String schema): Returns an AvroIO.Write PTransform that writes Avro file(s) containing records of the specified schema in a JSON-encoded string form. |
| static AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> | withShardNameTemplate(String shardTemplate): Returns an AvroIO.Write PTransform that uses the given shard name template. |
| static AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> | withSuffix(String filenameSuffix): Returns an AvroIO.Write PTransform that writes to the file(s) with the given filename suffix. |

public static AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> named(String name)
public static AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> to(String prefix)
The prefix can be a Google Cloud Storage path of the form
"gs://<bucket>/<filepath>"
(if running locally or via the Google Cloud Dataflow service).
The files written will begin with this prefix, followed by
a shard identifier (see AvroIO.Write.Bound.withNumShards(int)), and end
in a common extension, if given by AvroIO.Write.Bound.withSuffix(java.lang.String).
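The naming rule described above (prefix, then shard identifier, then optional suffix) can be sketched in plain Java. This is only an illustration and not the SDK's implementation: the class `ShardNameSketch`, its methods, the bucket path, and the assumption that a template such as `-SSSSS-of-NNNNN` expands runs of `S` to the zero-padded shard index and runs of `N` to the zero-padded shard count are all hypothetical. See ShardNameTemplate for the authoritative template rules.

```java
import java.util.Locale;

/** Illustrative only: mimics how a sharded output file name could be assembled. */
final class ShardNameSketch {
    private ShardNameSketch() {}

    /**
     * Expands a template such as "-SSSSS-of-NNNNN" (assumed syntax): each run of
     * 'S' becomes the zero-padded shard index and each run of 'N' becomes the
     * zero-padded shard count. All other characters are copied unchanged.
     */
    public static String expandTemplate(String template, int shardIndex, int numShards) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            char c = template.charAt(i);
            if (c == 'S' || c == 'N') {
                // Find the end of this run of identical placeholder characters.
                int j = i;
                while (j < template.length() && template.charAt(j) == c) {
                    j++;
                }
                int value = (c == 'S') ? shardIndex : numShards;
                // The run length determines the zero-padded width.
                out.append(String.format(Locale.ROOT, "%0" + (j - i) + "d", value));
                i = j;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Concatenates prefix, expanded shard identifier, and suffix. */
    public static String shardFileName(
            String prefix, String template, String suffix, int shardIndex, int numShards) {
        return prefix + expandTemplate(template, shardIndex, numShards) + suffix;
    }

    public static void main(String[] args) {
        // Hypothetical prefix and suffix, three shards.
        for (int shard = 0; shard < 3; shard++) {
            System.out.println(
                shardFileName("gs://my-bucket/output/records", "-SSSSS-of-NNNNN", ".avro", shard, 3));
        }
    }
}
```

Under these assumptions, the first printed name is `gs://my-bucket/output/records-00000-of-00003.avro`. An empty template with a single shard yields just the prefix followed by the suffix.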
public static AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> withSuffix(String filenameSuffix)
public static AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> withNumShards(int numShards)
Constraining the number of shards is likely to reduce the performance of a pipeline. Setting this value is not recommended unless you require a specific number of output files.
numShards - the number of shards to use, or 0 to let the system
decide.
public static AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> withShardNameTemplate(String shardTemplate)
See ShardNameTemplate for a description of shard templates.
public static AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> withoutSharding()
Constraining the number of shards is likely to reduce the performance of a pipeline. Setting this value is not recommended unless you require a specific number of output files.
public static <T> AvroIO.Write.Bound<T> withSchema(Class<T> type)
T - the type of the elements of the input PCollection
public static AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> withSchema(org.apache.avro.Schema schema)
public static AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> withSchema(String schema)
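withSchema(String schema) takes the schema in JSON-encoded string form. As a rough illustration of what such a string might look like, the sketch below holds a small Avro record schema in a Java constant. The class `SchemaJsonSketch`, the record name `User`, the namespace, and both fields are hypothetical and chosen only for the example.

```java
/** Illustrative only: a JSON-encoded Avro record schema held in a Java string. */
final class SchemaJsonSketch {
    private SchemaJsonSketch() {}

    // Hypothetical record "User" with a string field and an int field.
    public static final String USER_SCHEMA_JSON =
        "{"
        + "\"type\": \"record\", "
        + "\"name\": \"User\", "
        + "\"namespace\": \"com.example\", "
        + "\"fields\": ["
        + "{\"name\": \"name\", \"type\": \"string\"}, "
        + "{\"name\": \"age\", \"type\": \"int\"}"
        + "]"
        + "}";

    public static void main(String[] args) {
        // Print the schema text that would be handed to withSchema(String).
        System.out.println(USER_SCHEMA_JSON);
    }
}
```

A string like this would be passed as `AvroIO.Write.withSchema(SchemaJsonSketch.USER_SCHEMA_JSON)`; per the signature above, the resulting transform writes GenericRecord elements.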
public static AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> withoutValidation()
This can be useful when the GCS output location does not exist at pipeline creation time but is expected to be available at execution time.