Class WriteFiles<UserT,​DestinationT,​OutputT>

  • All Implemented Interfaces:
    java.io.Serializable, HasDisplayData

    @Experimental(SOURCE_SINK)
    public abstract class WriteFiles<UserT,​DestinationT,​OutputT>
    extends PTransform<PCollection<UserT>,​WriteFilesResult<DestinationT>>
    A PTransform that writes to a FileBasedSink. A write begins with a sequential global initialization of a sink, followed by a parallel write, and ends with a sequential finalization of the write. The output of a write is PDone.

    By default, every bundle in the input PCollection will be processed by a FileBasedSink.WriteOperation, so the number of output will vary based on runner behavior, though at least 1 output will always be produced. The exact parallelism of the write stage can be controlled using withNumShards(int), typically used to control how many files are produced or to globally limit the number of workers connecting to an external service. However, this option can often hurt performance: it adds an additional GroupByKey to the pipeline.

    Example usage with runner-determined sharding:

    p.apply(WriteFiles.to(new MySink(...)));

    Example usage with a fixed number of shards:

    p.apply(WriteFiles.to(new MySink(...)).withNumShards(3));
    See Also:
    Serialized Form