T - the type of values written to the sink.
public abstract class FileBasedSink<T> extends Sink<T>
Sink for file-based output. An implementation of FileBasedSink writes file-based
output and defines the format of output files (how values are written, headers/footers, MIME
type, etc.).
At pipeline construction time, the methods of FileBasedSink are called to validate the sink
and to create a Sink.WriteOperation that manages the process of writing to the sink.
The process of writing to a file-based sink is as follows:
Supported file systems are those registered with IOChannelUtils.
Modifier and Type | Class and Description |
---|---|
static class | FileBasedSink.FileBasedWriteOperation<T>: Abstract Sink.WriteOperation that manages the process of writing to a FileBasedSink. |
static class | FileBasedSink.FileBasedWriter<T>: Abstract Sink.Writer that writes a bundle to a FileBasedSink. |
static class | FileBasedSink.FileResult: Result of a single bundle write. |
Sink.WriteOperation<T,WriteT>, Sink.Writer<T,WriteT>
Modifier and Type | Field and Description |
---|---|
protected String | baseOutputFilename: Base filename for final output files. |
protected String | extension: The extension to be used for the final output files. |
protected String | fileNamingTemplate: Naming template for output files. |
Constructor and Description |
---|
FileBasedSink(String baseOutputFilename, String extension): Construct a FileBasedSink with the given base output filename and extension. |
FileBasedSink(String baseOutputFilename, String extension, String fileNamingTemplate): Construct a FileBasedSink with the given base output filename, extension, and file naming template. |
Modifier and Type | Method and Description |
---|---|
abstract FileBasedSink.FileBasedWriteOperation<T> | createWriteOperation(PipelineOptions options): Return a subclass of FileBasedSink.FileBasedWriteOperation that will manage the write to the sink. |
String | getBaseOutputFilename(): Returns the base output filename for this file-based sink. |
void | populateDisplayData(DisplayData.Builder builder): Register display data for the given transform or component. |
void | validate(PipelineOptions options): Perform pipeline-construction-time validation. |
protected final String baseOutputFilename
protected final String extension
protected final String fileNamingTemplate
See ShardNameTemplate for a description of possible naming templates. Default is ShardNameTemplate.INDEX_OF_MAX.
public FileBasedSink(String baseOutputFilename, String extension)
public FileBasedSink(String baseOutputFilename, String extension, String fileNamingTemplate)
See ShardNameTemplate for a description of file naming templates.
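To make the role of fileNamingTemplate more concrete, the following is a minimal sketch of how a shard naming template might be expanded into an output file name. It assumes a template of the form -SSSSS-of-NNNNN, in which a run of S characters is replaced by the zero-padded shard index and a run of N characters by the zero-padded shard count. The class ShardNameSketch, its methods, and the way the base filename, expanded template, and extension are joined are hypothetical illustrations, not part of the FileBasedSink API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of FileBasedSink.
final class ShardNameSketch {
    // Matches a run of 'S' (shard index) or a run of 'N' (shard count).
    private static final Pattern PLACEHOLDER = Pattern.compile("S+|N+");

    // Expands each placeholder run into a number zero-padded to the run's length.
    static String expand(String template, int shardIndex, int shardCount) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the literal text between placeholders unchanged.
            out.append(template, last, m.start());
            int value = m.group().charAt(0) == 'S' ? shardIndex : shardCount;
            out.append(String.format("%0" + m.group().length() + "d", value));
            last = m.end();
        }
        out.append(template, last, template.length());
        return out.toString();
    }

    // Assumption: the final name is base + expanded template + "." + extension.
    static String fileName(String baseOutputFilename, String fileNamingTemplate,
                           String extension, int shardIndex, int shardCount) {
        String suffix = extension.isEmpty() ? "" : "." + extension;
        return baseOutputFilename
                + expand(fileNamingTemplate, shardIndex, shardCount)
                + suffix;
    }
}
```

Under these assumptions, ShardNameSketch.fileName("output", "-SSSSS-of-NNNNN", "txt", 3, 10) produces output-00003-of-00010.txt. Note that this sketch treats every uppercase S or N in the template as a placeholder, which a real implementation may handle differently.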
public String getBaseOutputFilename()
public void validate(PipelineOptions options)
Perform pipeline-construction-time validation. It is recommended to use Preconditions in the implementation of this method.
public abstract FileBasedSink.FileBasedWriteOperation<T> createWriteOperation(PipelineOptions options)
Return a subclass of FileBasedSink.FileBasedWriteOperation that will manage the write to the sink.
createWriteOperation in class Sink<T>
public void populateDisplayData(DisplayData.Builder builder)
Description copied from class: Sink
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect
display data via DisplayData.from(HasDisplayData). Implementations may call
super.populateDisplayData(builder) in order to register display data in the current
namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use
the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
populateDisplayData in interface HasDisplayData
populateDisplayData in class Sink<T>
builder - The builder to populate with display data.
See also: HasDisplayData