public class XmlSink extends Object
A Sink that outputs records as XML-formatted elements. Writes a PCollection of records from JAXB-annotated classes to a single file location.
Given a PCollection containing records of type T that can be marshalled to XML elements, this Sink will produce a single file consisting of a single root element that contains all of the elements in the PCollection.
XML Sinks are created with a base filename to write to, a root element name that will be used for the root element of the output files, and a class to bind to an XML element. This class will be used in the marshalling of records in an input PCollection to their XML representation and must be able to be bound using JAXB annotations (checked at pipeline construction time).
XML Sinks can be written to using the Write transform:
    p.apply(Write.to(
        XmlSink.write()
            .ofRecordClass(Type.class)
            .withRootElement(root_element)
            .toFilenamePrefix(output_filename)));
For example, consider the following class with JAXB annotations:
    import javax.xml.bind.annotation.XmlRootElement;
    import javax.xml.bind.annotation.XmlType;

    @XmlRootElement(name = "word_count_result")
    @XmlType(propOrder = {"word", "frequency"})
    public class WordFrequency {
      private String word;
      private long frequency;

      // JAXB needs a no-argument constructor to bind the class.
      public WordFrequency() { }

      public WordFrequency(String word, long frequency) {
        this.word = word;
        this.frequency = frequency;
      }

      public void setWord(String word) {
        this.word = word;
      }

      public void setFrequency(long frequency) {
        this.frequency = frequency;
      }

      public long getFrequency() {
        return frequency;
      }

      public String getWord() {
        return word;
      }
    }
The following will produce XML output with a root element named "words" from a PCollection of WordFrequency objects:
    p.apply(Write.to(
        XmlSink.write()
            .ofRecordClass(WordFrequency.class)
            .withRootElement("words")
            .toFilenamePrefix(output_file)));
The output will look like:
<words>
<word_count_result>
<word>decreased</word>
<frequency>1</frequency>
</word_count_result>
<word_count_result>
<word>War</word>
<frequency>4</frequency>
</word_count_result>
<word_count_result>
<word>empress'</word>
<frequency>14</frequency>
</word_count_result>
<word_count_result>
<word>stoops</word>
<frequency>6</frequency>
</word_count_result>
...
</words>
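The shape of the output above (one root element enclosing one element per record) can be sketched in plain Java. This is only an illustration under stated assumptions: XmlSink marshals records with JAXB, whereas the hypothetical helpers below (`RootElementSketch`, `escape`, `element`, `wrap`) build the strings by hand and are not part of the SDK.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the SDK's implementation, which uses JAXB marshalling.
class RootElementSketch {
    // Escapes the characters that cannot appear literally in XML text content.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Renders one record as a <word_count_result> element.
    static String element(String word, long frequency) {
        return "<word_count_result>\n"
            + "<word>" + escape(word) + "</word>\n"
            + "<frequency>" + frequency + "</frequency>\n"
            + "</word_count_result>\n";
    }

    // Wraps the already-rendered record elements in a single root element.
    static String wrap(String rootElement, List<String> elements) {
        StringBuilder out = new StringBuilder();
        out.append('<').append(rootElement).append(">\n");
        for (String e : elements) {
            out.append(e);
        }
        out.append("</").append(rootElement).append(">\n");
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(element("decreased", 1), element("War", 4));
        System.out.print(wrap("words", records));
    }
}
```

Running `main` prints a `words` root element containing two `word_count_result` elements, matching the structure shown above apart from whitespace.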
Modifier and Type | Class and Description |
---|---|
static class | XmlSink.Bound<T> - A FileBasedSink that writes objects as XML elements. |
protected static class | XmlSink.XmlWriteOperation<T> - Sink.WriteOperation for XML Sinks. |
protected static class | XmlSink.XmlWriter<T> - A Sink.Writer that can write objects as XML elements. |
Modifier and Type | Field and Description |
---|---|
protected static String | XML_EXTENSION |
Constructor and Description |
---|
XmlSink() |
Modifier and Type | Method and Description |
---|---|
static XmlSink.Bound<?> | write() - Returns a builder for an XmlSink. |
static <T> XmlSink.Bound<T> | writeOf(Class<T> klass, String rootElementName, String baseOutputFilename) - Returns an XmlSink that writes objects as XML entities. |
protected static final String XML_EXTENSION
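public static XmlSink.Bound<?> write()
Returns a builder for an XmlSink. The class to bind, the root element name, and the output filename prefix are configured with XmlSink.Bound.ofRecordClass(java.lang.Class<T>), XmlSink.Bound.withRootElement(java.lang.String), and XmlSink.Bound.toFilenamePrefix(java.lang.String), respectively.
public static <T> XmlSink.Bound<T> writeOf(Class<T> klass, String rootElementName, String baseOutputFilename)
Returns an XmlSink that writes objects as XML entities.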
Output files will have the name {baseOutputFilename}-0000i-of-0000n.xml where n is the number of output bundles that the Dataflow service divides the output into.
klass - the class of the elements to write.
rootElementName - the enclosing root element.
baseOutputFilename - the output filename prefix.
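The output naming pattern described for writeOf, {baseOutputFilename}-0000i-of-0000n.xml, can be sketched as a small formatting helper. This is a minimal sketch, not the SDK's code: it assumes a zero-based shard index i and five-digit zero padding, both inferred from the pattern, and `ShardNameSketch.shardName` is a hypothetical name.

```java
import java.util.Locale;

// Hypothetical sketch of the shard naming pattern; not the SDK's implementation.
class ShardNameSketch {
    // Formats the name of shard i out of n output shards.
    // Assumes i is zero-based and both numbers are padded to five digits.
    static String shardName(String baseOutputFilename, int i, int n) {
        if (n <= 0 || i < 0 || i >= n) {
            throw new IllegalArgumentException("expected 0 <= i < n, got i=" + i + ", n=" + n);
        }
        return String.format(Locale.ROOT, "%s-%05d-of-%05d.xml", baseOutputFilename, i, n);
    }

    public static void main(String[] args) {
        int n = 3;
        for (int i = 0; i < n; i++) {
            // First line printed: out-00000-of-00003.xml
            System.out.println(shardName("out", i, n));
        }
    }
}
```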