Class FileBasedSink.WriteOperation<DestinationT,OutputT>
- java.lang.Object
-
- org.apache.beam.sdk.io.FileBasedSink.WriteOperation<DestinationT,OutputT>
-
- Type Parameters:
OutputT
- the type of values written to the sink.
- All Implemented Interfaces:
java.io.Serializable
- Enclosing class:
- FileBasedSink<UserT,DestinationT,OutputT>
public abstract static class FileBasedSink.WriteOperation<DestinationT,OutputT> extends java.lang.Object implements java.io.Serializable
Abstract operation that manages the process of writing to a FileBasedSink.
The primary responsibility of the WriteOperation is the management of output files. During a write, FileBasedSink.Writers write bundles to temporary file locations. After the bundles have been written:
- finalizeDestination(DestinationT, org.apache.beam.sdk.transforms.windowing.BoundedWindow, java.lang.Integer, java.util.Collection<org.apache.beam.sdk.io.FileBasedSink.FileResult<DestinationT>>) is given a list of the temporary files containing the output bundles.
- During finalize, these temporary files are copied to final output locations and named according to a file naming template.
- Finally, any temporary files that were created during the write are removed.
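The three steps above can be sketched on a local filesystem with java.nio.file. This is a minimal stand-in and not Beam code: the class WriteLifecycleSketch, its methods, and the prefix-SSSSS-of-NNNNN naming pattern are assumptions made for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical local-filesystem stand-in for the write lifecycle; not the Beam implementation. */
final class WriteLifecycleSketch {

  /** Writes one bundle to its own temporary file, named after the bundle id. */
  static Path writeBundle(Path tempDir, String bundleId, List<String> records) throws IOException {
    Files.createDirectories(tempDir);
    Path tempFile = tempDir.resolve(bundleId);
    Files.write(tempFile, records, StandardCharsets.UTF_8);
    return tempFile;
  }

  /** Copies temporary files to final names, then removes the temporary files. */
  static List<Path> finalizeOutput(List<Path> tempFiles, Path outputDir, String prefix)
      throws IOException {
    Files.createDirectories(outputDir);
    List<Path> finalFiles = new ArrayList<>();
    int numShards = tempFiles.size();
    for (int shard = 0; shard < numShards; shard++) {
      // Assumed naming template for this sketch: prefix-SSSSS-of-NNNNN.
      String name = String.format("%s-%05d-of-%05d", prefix, shard, numShards);
      Path dest = outputDir.resolve(name);
      Files.copy(tempFiles.get(shard), dest, StandardCopyOption.REPLACE_EXISTING);
      finalFiles.add(dest);
    }
    // Cleanup happens only after every copy has succeeded.
    for (Path temp : tempFiles) {
      Files.deleteIfExists(temp);
    }
    return finalFiles;
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("write-op-sketch");
    List<Path> temps = new ArrayList<>();
    temps.add(writeBundle(root.resolve("temp"), "15723", Arrays.asList("a", "b")));
    temps.add(writeBundle(root.resolve("temp"), "15724", Arrays.asList("c")));
    System.out.println(finalizeOutput(temps, root.resolve("out"), "result"));
  }
}
```

Because the sketch deletes temporary files only after all copies complete, a failure during the copy loop leaves the temporary files in place, which mirrors the note that no cleanup occurs on permanent failure.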
Subclass implementations of WriteOperation must implement createWriter() to return a concrete FileBasedSink.Writer.
Temporary and Output File Naming:
During the write, bundles are written to temporary files using the tempDirectory that can be provided via the constructor of WriteOperation. These temporary files will be named {tempDirectory}/{bundleId}, where bundleId is the unique id of the bundle. For example, if tempDirectory is "gs://my-bucket/my_temp_output", the output for a bundle with bundle id 15723 will be "gs://my-bucket/my_temp_output/15723".
Final output files are written to the location specified by the FileBasedSink.FilenamePolicy. If no filename policy is specified, then the DefaultFilenamePolicy will be used. The directory that the files are written to is determined by the FileBasedSink.FilenamePolicy instance.
Note that in the case of permanent failure of a bundle's write, no cleanup of temporary files will occur.
If there are no elements in the PCollection being written, no output will be generated.
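The {tempDirectory}/{bundleId} naming rule described above can be illustrated with plain string handling. This is a sketch only: the class TempFileNaming and its method are hypothetical, and the real class works with ResourceId values rather than strings.

```java
/** Hypothetical illustration of the {tempDirectory}/{bundleId} rule using plain strings. */
final class TempFileNaming {

  /** Joins the temporary directory and bundle id with exactly one separator. */
  static String temporaryFilename(String tempDirectory, String bundleId) {
    String base = tempDirectory.endsWith("/") ? tempDirectory : tempDirectory + "/";
    return base + bundleId;
  }

  public static void main(String[] args) {
    // Prints gs://my-bucket/my_temp_output/15723
    System.out.println(temporaryFilename("gs://my-bucket/my_temp_output", "15723"));
  }
}
```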
- See Also:
- Serialized Form
-
-
Field Summary
Fields
Modifier and Type | Field | Description
protected FileBasedSink<?,DestinationT,OutputT> | sink | The Sink that this WriteOperation will write to.
protected boolean | windowedWrites | Whether windowed writes are being used.
-
Constructor Summary
Constructors
Constructor | Description
WriteOperation(FileBasedSink<?,DestinationT,OutputT> sink) | Constructs a WriteOperation using the default strategy for generating a temporary directory from the base output filename.
WriteOperation(FileBasedSink<?,DestinationT,OutputT> sink, ResourceId tempDirectory) | Create a new WriteOperation.
-
Method Summary
Modifier and Type | Method | Description
protected static ResourceId | buildTemporaryFilename(ResourceId tempDirectory, java.lang.String filename) | Constructs a temporary file resource given the temporary directory and a filename.
abstract FileBasedSink.Writer<DestinationT,OutputT> | createWriter() | Clients must implement to return a subclass of FileBasedSink.Writer.
protected java.util.List<KV<FileBasedSink.FileResult<DestinationT>,ResourceId>> | finalizeDestination(@Nullable DestinationT dest, @Nullable BoundedWindow window, @Nullable java.lang.Integer numShards, java.util.Collection<FileBasedSink.FileResult<DestinationT>> existingResults) |
FileBasedSink<?,DestinationT,OutputT> | getSink() | Returns the FileBasedSink for this write operation.
ResourceId | getTempDirectory() |
void | removeTemporaryFiles(java.util.Collection<ResourceId> filenames) |
void | setWindowedWrites() | Indicates that the operation will be performing windowed writes.
java.lang.String | toString() |
-
-
-
Field Detail
-
sink
protected final FileBasedSink<?,DestinationT,OutputT> sink
The Sink that this WriteOperation will write to.
-
windowedWrites
@Experimental(FILESYSTEM) protected boolean windowedWrites
Whether windowed writes are being used.
-
-
Constructor Detail
-
WriteOperation
public WriteOperation(FileBasedSink<?,DestinationT,OutputT> sink)
Constructs a WriteOperation using the default strategy for generating a temporary directory from the base output filename.
Without windowing, the default is a uniquely named subdirectory of the provided tempDirectory, e.g. if tempDirectory is /path/to/foo/, the temporary directory will be /path/to/foo/.temp-beam-$uuid.
With windowing, the default is a consistently named subdirectory of the provided tempDirectory, e.g. if tempDirectory is /path/to/foo/, the temporary directory will be /path/to/foo/.temp-beam. With windowing, unique subdirectories of the tempDirectory are not beneficial as they cannot be used for cleanup. By using a consistent directory, the created temp files are well-distributed beneath a common directory prefix, across both worker and pipeline executions. This is beneficial for filesystems such as GCS which can reuse autoscaling of the file metadata.
- Parameters:
sink
- the FileBasedSink that will be used to configure this write operation.
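The two default strategies described for this constructor can be sketched as follows. The class DefaultTempDirectory and its method are hypothetical and operate on plain strings. Only the .temp-beam and .temp-beam-$uuid suffixes come from the description above, and using java.util.UUID for the unique part is an assumption.

```java
import java.util.UUID;

/** Hypothetical sketch of the default temporary-directory choice; not the Beam implementation. */
final class DefaultTempDirectory {

  /** Returns a shared directory for windowed writes, and a unique one otherwise. */
  static String defaultTempDirectory(String tempDirectory, boolean windowedWrites) {
    String base = tempDirectory.endsWith("/") ? tempDirectory : tempDirectory + "/";
    if (windowedWrites) {
      // Consistent name: reused across workers and pipeline executions.
      return base + ".temp-beam";
    }
    // Unique name: assumed here to be a random UUID suffix.
    return base + ".temp-beam-" + UUID.randomUUID();
  }

  public static void main(String[] args) {
    System.out.println(defaultTempDirectory("/path/to/foo/", true));
    System.out.println(defaultTempDirectory("/path/to/foo/", false));
  }
}
```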
-
WriteOperation
@Experimental(FILESYSTEM) public WriteOperation(FileBasedSink<?,DestinationT,OutputT> sink, ResourceId tempDirectory)
Create a new WriteOperation.
- Parameters:
sink
- the FileBasedSink that will be used to configure this write operation.
tempDirectory
- the base directory to be used for temporary output files.
-
-
Method Detail
-
buildTemporaryFilename
@Experimental(FILESYSTEM) protected static ResourceId buildTemporaryFilename(ResourceId tempDirectory, java.lang.String filename) throws java.io.IOException
Constructs a temporary file resource given the temporary directory and a filename.
- Throws:
java.io.IOException
-
getTempDirectory
public ResourceId getTempDirectory()
-
createWriter
public abstract FileBasedSink.Writer<DestinationT,OutputT> createWriter() throws java.lang.Exception
Clients must implement to return a subclass of FileBasedSink.Writer. This method must not mutate the state of the object.
- Throws:
java.lang.Exception
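One way to satisfy the "must not mutate" requirement on createWriter() is to keep the operation's fields final and hand each writer its own state. The sketch below uses hypothetical stand-in types (SketchWriter, SketchWriteOperation, PrefixingWriter, PrefixingWriteOperation) instead of the Beam classes, so it shows the shape of the contract only, not the actual API.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for FileBasedSink.Writer; not the Beam class. */
abstract class SketchWriter {
  abstract void write(String value);
  abstract List<String> written();
}

/** Hypothetical stand-in for WriteOperation: serializable, with an abstract factory method. */
abstract class SketchWriteOperation implements Serializable {
  abstract SketchWriter createWriter() throws Exception;
}

/** Writer that owns its own buffer, so nothing is shared with the operation. */
final class PrefixingWriter extends SketchWriter {
  private final String prefix;
  private final List<String> lines = new ArrayList<>();

  PrefixingWriter(String prefix) {
    this.prefix = prefix;
  }

  @Override
  void write(String value) {
    lines.add(prefix + value);
  }

  @Override
  List<String> written() {
    return lines;
  }
}

/** Operation whose only field is final; createWriter() reads it and never assigns it. */
final class PrefixingWriteOperation extends SketchWriteOperation {
  private final String prefix;

  PrefixingWriteOperation(String prefix) {
    this.prefix = prefix;
  }

  @Override
  SketchWriter createWriter() {
    // A fresh writer per call; the operation itself is left unchanged.
    return new PrefixingWriter(prefix);
  }

  public static void main(String[] args) {
    PrefixingWriteOperation op = new PrefixingWriteOperation("row:");
    SketchWriter writer = op.createWriter();
    writer.write("a");
    System.out.println(writer.written());
  }
}
```

Since every call returns an independent writer, writes made through one writer are not visible through another created from the same operation.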
-
setWindowedWrites
public void setWindowedWrites()
Indicates that the operation will be performing windowed writes.
-
removeTemporaryFiles
public void removeTemporaryFiles(java.util.Collection<ResourceId> filenames) throws java.io.IOException
- Throws:
java.io.IOException
-
finalizeDestination
@Experimental(FILESYSTEM) protected final java.util.List<KV<FileBasedSink.FileResult<DestinationT>,ResourceId>> finalizeDestination(@Nullable DestinationT dest, @Nullable BoundedWindow window, @Nullable java.lang.Integer numShards, java.util.Collection<FileBasedSink.FileResult<DestinationT>> existingResults) throws java.lang.Exception
- Throws:
java.lang.Exception
-
getSink
public FileBasedSink<?,DestinationT,OutputT> getSink()
Returns the FileBasedSink for this write operation.
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
-