Class FileBasedSink.WriteOperation<DestinationT,​OutputT>

  • Type Parameters:
    OutputT - the type of values written to the sink.
    All Implemented Interfaces:
    java.io.Serializable
    Enclosing class:
    FileBasedSink<UserT,​DestinationT,​OutputT>

    public abstract static class FileBasedSink.WriteOperation<DestinationT,​OutputT>
    extends java.lang.Object
    implements java.io.Serializable
    Abstract operation that manages the process of writing to FileBasedSink.

    The primary responsibilities of the WriteOperation is the management of output files. During a write, FileBasedSink.Writers write bundles to temporary file locations. After the bundles have been written,

    1. finalizeDestination(DestinationT, org.apache.beam.sdk.transforms.windowing.BoundedWindow, java.lang.Integer, java.util.Collection<org.apache.beam.sdk.io.FileBasedSink.FileResult<DestinationT>>) is given a list of the temporary files containing the output bundles.
    2. During finalize, these temporary files are copied to final output locations and named according to a file naming template.
    3. Finally, any temporary files that were created during the write are removed.

    Subclass implementations of WriteOperation must implement createWriter() to return a concrete FileBasedSinkWriter.

    Temporary and Output File Naming:

    During the write, bundles are written to temporary files using the tempDirectory that can be provided via the constructor of WriteOperation. These temporary files will be named {tempDirectory}/{bundleId}, where bundleId is the unique id of the bundle. For example, if tempDirectory is "gs://my-bucket/my_temp_output", the output for a bundle with bundle id 15723 will be "gs://my-bucket/my_temp_output/15723".

    Final output files are written to the location specified by the FileBasedSink.FilenamePolicy. If no filename policy is specified, then the DefaultFilenamePolicy will be used. The directory that the files are written to is determined by the FileBasedSink.FilenamePolicy instance.

    Note that in the case of permanent failure of a bundle's write, no clean up of temporary files will occur.

    If there are no elements in the PCollection being written, no output will be generated.

    See Also:
    Serialized Form
    • Constructor Detail

      • WriteOperation

        public WriteOperation​(FileBasedSink<?,​DestinationT,​OutputT> sink)
        Constructs a WriteOperation using the default strategy for generating a temporary directory from the base output filename.

        Without windowing, the default is a uniquely named subdirectory of the provided tempDirectory, e.g. if tempDirectory is /path/to/foo/, the temporary directory will be /path/to/foo/.temp-beam-$uuid.

        With windowing, the default is a consistent named subdirectory of the provided tempDirectory, e.g. if tempDirectory is /path/to/foo/, the temporary directory will be /path/to/foo/.temp-beam. With windowing, unique subdirectories of the tempDirectory are not beneficial as they cannot be used for cleanup. By using a consistent directory, the created temp files are well-distributed beneath a common directory prefix, across both worker and pipeline executions. This is beneficial for filesystems such as GCS which can reuse autoscaling of the file metadata.

        Parameters:
        sink - the FileBasedSink that will be used to configure this write operation.
      • WriteOperation

        @Experimental(FILESYSTEM)
        public WriteOperation​(FileBasedSink<?,​DestinationT,​OutputT> sink,
                              ResourceId tempDirectory)
        Create a new WriteOperation.
        Parameters:
        sink - the FileBasedSink that will be used to configure this write operation.
        tempDirectory - the base directory to be used for temporary output files.
    • Method Detail

      • buildTemporaryFilename

        @Experimental(FILESYSTEM)
        protected static ResourceId buildTemporaryFilename​(ResourceId tempDirectory,
                                                           java.lang.String filename)
                                                    throws java.io.IOException
        Constructs a temporary file resource given the temporary directory and a filename.
        Throws:
        java.io.IOException
      • getTempDirectory

        public ResourceId getTempDirectory()
      • setWindowedWrites

        public void setWindowedWrites()
        Indicates that the operation will be performing windowed writes.
      • removeTemporaryFiles

        public void removeTemporaryFiles​(java.util.Collection<ResourceId> filenames)
                                  throws java.io.IOException
        Throws:
        java.io.IOException
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object