public class AvroCoder<T> extends StandardCoder<T>
Type parameters: T - the type of elements handled by this coder
Coder using Avro binary format.
The Avro schema is generated using reflection on the element type, using Avro's org.apache.avro.reflect.ReflectData, and encoded as part of the Coder instance.
For complete details about schema generation and how it can be controlled, please see the org.apache.avro.reflect package. Only concrete classes with a no-argument constructor can be mapped to Avro records. All inherited fields that are not static or transient are used. Fields are not permitted to be null unless annotated by org.apache.avro.reflect.Nullable or by an org.apache.avro.reflect.Union containing null.
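The mapping rules above can be illustrated without Avro itself. The following is a JDK-only sketch, not Avro's ReflectData implementation: it assumes a local @Nullable annotation as a stand-in for org.apache.avro.reflect.Nullable, and the helper names (ReflectRules, recordFields, isMappable, illegalNulls) and the element types are hypothetical. It collects the non-static, non-transient fields of a class and its superclasses, checks for a concrete class with a no-argument constructor, and reports fields that hold null without the annotation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.avro.reflect.Nullable, so the sketch needs only the JDK.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Nullable {}

class ReflectRules {
  // Fields a reflection-based record mapping would use: every non-static, non-transient
  // field declared on the class or on any of its superclasses.
  static List<Field> recordFields(Class<?> type) {
    List<Field> fields = new ArrayList<>();
    for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
      for (Field f : c.getDeclaredFields()) {
        int mods = f.getModifiers();
        if (!Modifier.isStatic(mods) && !Modifier.isTransient(mods) && !f.isSynthetic()) {
          fields.add(f);
        }
      }
    }
    return fields;
  }

  // Only concrete classes with a no-argument constructor can be mapped to records.
  static boolean isMappable(Class<?> type) {
    if (type.isInterface() || Modifier.isAbstract(type.getModifiers())) {
      return false;
    }
    try {
      type.getDeclaredConstructor();
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  // Names of record fields that currently hold null without being marked @Nullable.
  static List<String> illegalNulls(Object value) {
    List<String> problems = new ArrayList<>();
    for (Field f : recordFields(value.getClass())) {
      try {
        f.setAccessible(true);
        if (f.get(value) == null && !f.isAnnotationPresent(Nullable.class)) {
          problems.add(f.getName());
        }
      } catch (IllegalAccessException e) {
        throw new IllegalStateException(e);
      }
    }
    return problems;
  }
}

// Hypothetical element types used to exercise the rules.
class BaseElement {
  long id;                   // inherited field: part of the record
}

class MyCustomElement extends BaseElement {
  static int instances;      // static: not part of the record
  transient String cache;    // transient: not part of the record
  String name;               // null is not allowed
  @Nullable String comment;  // null is allowed
}

class NoDefaultConstructor {
  NoDefaultConstructor(int x) {}
}
```

With these types, a fresh new MyCustomElement() is reported only for its null name field: cache is transient and comment is annotated, while the primitive id can never be null.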
To use, specify the Coder type on a PCollection:
PCollection<MyCustomElement> records =
    input.apply(...)
        .setCoder(AvroCoder.of(MyCustomElement.class));
or annotate the element class using @DefaultCoder.
@DefaultCoder(AvroCoder.class)
public class MyCustomElement {
...
}
The implementation attempts to determine if the Avro encoding of the given type will satisfy
the criteria of Coder.verifyDeterministic() by inspecting both the type and the
Schema provided or generated by Avro. Only coders that are deterministic can be used in
GroupByKey operations.
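Determinism matters because GroupByKey groups keys by their encoded form, so equal values must always produce identical bytes. The sketch below uses a toy encoding (an entry count followed by key/value pairs in iteration order; this is not Avro's binary format) to show how two equal HashMaps can encode differently, and how imposing a sorted order fixes it. The class and method names are hypothetical, and the differing output relies on OpenJDK's HashMap layout: keys 1 and 17 collide in a default 16-bucket table and keep insertion order within that bucket.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class DeterminismDemo {
  // Toy encoding: entry count, then each key/value pair in whatever order the map iterates.
  static byte[] encode(Map<Integer, String> map) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(map.size());
      for (Map.Entry<Integer, String> entry : map.entrySet()) {
        out.writeInt(entry.getKey());
        out.writeUTF(entry.getValue());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Deterministic variant: copying into a TreeMap fixes the order to ascending keys.
  static byte[] encodeSorted(Map<Integer, String> map) {
    return encode(new TreeMap<>(map));
  }

  // Builds a HashMap by inserting the given keys in order, each mapped to "v" + key.
  static Map<Integer, String> mapOf(int... keys) {
    Map<Integer, String> m = new HashMap<>();
    for (int k : keys) {
      m.put(k, "v" + k);
    }
    return m;
  }
}
```

Here mapOf(17, 1) and mapOf(1, 17) are equal maps, yet encode writes their entries in opposite orders; encodeSorted produces the same bytes for both.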
| Modifier and Type | Class and Description |
|---|---|
| protected static class | AvroCoder.AvroDeterminismChecker: Helper class encapsulating the various pieces of state maintained by the recursive walk used for checking if the encoding will be deterministic. |
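The nested AvroDeterminismChecker is described as a recursive walk. A simplified sketch of that idea over plain Java classes follows; its rules (primitives, String and enums are safe, a Map is flagged unless its declared type is a SortedMap, other java.* types are skipped) are assumptions chosen for illustration rather than the checker's real rule set, which also consults the Avro Schema. DeterminismWalk, Stats and Report are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

class DeterminismWalk {
  // Returns reasons why values of `type` may not encode deterministically (empty = none found).
  static List<String> reasons(Class<?> type) {
    List<String> out = new ArrayList<>();
    walk(type, type.getSimpleName(), new HashSet<>(), out);
    return out;
  }

  // Illustrative rules only; `onStack` holds the classes currently being walked.
  private static void walk(Class<?> type, String path, Set<Class<?>> onStack, List<String> out) {
    if (type.isPrimitive() || type == String.class || type.isEnum()) {
      return; // treated as deterministic leaves
    }
    if (type.isArray()) {
      walk(type.getComponentType(), path + "[]", onStack, out);
      return;
    }
    if (Map.class.isAssignableFrom(type)) {
      if (!SortedMap.class.isAssignableFrom(type)) {
        out.add(path + ": " + type.getSimpleName() + " has no guaranteed iteration order");
      }
      return;
    }
    if (type.getName().startsWith("java.") || !onStack.add(type)) {
      return; // out of scope, or a recursive reference already being checked
    }
    for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
      for (Field f : c.getDeclaredFields()) {
        int mods = f.getModifiers();
        if (!Modifier.isStatic(mods) && !Modifier.isTransient(mods) && !f.isSynthetic()) {
          walk(f.getType(), path + "." + f.getName(), onStack, out);
        }
      }
    }
    onStack.remove(type);
  }
}

// Hypothetical element types.
class Stats {
  Map<String, Long> counts;      // flagged: a plain Map promises no iteration order
  TreeMap<String, Long> sorted;  // accepted: a SortedMap iterates in key order
}

class Report {
  String id;
  Stats stats;
  Report previous;               // recursive reference, walked only once
}
```

Tracking the classes on the current path, rather than all classes ever seen, lets the walk terminate on recursive types such as Report.previous while still revisiting a class reached through a different, non-recursive field.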
Nested classes inherited from interface Coder: Coder.Context, Coder.NonDeterministicException

| Modifier and Type | Field and Description |
|---|---|
| static CoderProvider | PROVIDER |
| Modifier | Constructor and Description |
|---|---|
| protected | AvroCoder(Class<T> type, org.apache.avro.Schema schema) |
| Modifier and Type | Method and Description |
|---|---|
| CloudObject | asCloudObject(): Returns the CloudObject that represents this Coder. |
| org.apache.avro.io.DatumReader<T> | createDatumReader(): Returns a new DatumReader that can be used to read from an Avro file directly. |
| org.apache.avro.io.DatumWriter<T> | createDatumWriter(): Returns a new DatumWriter that can be used to write to an Avro file directly. |
| T | decode(InputStream inStream, Coder.Context context): Decodes a value of type T from the given input stream in the given context. |
| void | encode(T value, OutputStream outStream, Coder.Context context): Encodes the given value of type T onto the given output stream in the given context. |
| List<? extends Coder<?>> | getCoderArguments(): If this is a Coder for a parameterized type, returns the list of Coders being used for each of the parameters, or returns null if this cannot be done or this is not a parameterized type. |
| String | getEncodingId(): The encoding identifier is designed to support evolution as per the design of Avro. In order to use this class effectively, carefully read the Avro documentation at Schema Resolution to ensure that the old and new schemas match. |
| org.apache.avro.Schema | getSchema(): Returns the schema used by this coder. |
| static <T> AvroCoder<T> | of(Class<T> clazz): Returns an AvroCoder instance for the provided element class. |
| static <T> AvroCoder<T> | of(Class<T> type, org.apache.avro.Schema schema): Returns an AvroCoder instance for the provided element type using the provided Avro schema. |
| static AvroCoder<org.apache.avro.generic.GenericRecord> | of(org.apache.avro.Schema schema): Returns an AvroCoder instance for the Avro schema. |
| static AvroCoder<?> | of(String classType, String schema) |
| void | verifyDeterministic(): Raises an exception describing reasons why the type may not be deterministically encoded using the given Schema, the directBinaryEncoder, and the ReflectDatumWriter or GenericDatumWriter. |
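As the verifyDeterministic row notes, the bytes are produced by Avro's binary encoder together with a datum writer. To make that layout concrete, here is a hand-written sketch of the Avro binary encoding of a two-field record (a string followed by a long), following the zig-zag varint and length-prefixed string rules of the Avro specification. The record schema and all names are assumptions; this is not AvroCoder's code, which goes through DatumWriter and DatumReader instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class AvroBinarySketch {
  // Assumed schema: {"type": "record", "name": "Element",
  //   "fields": [{"name": "name", "type": "string"}, {"name": "count", "type": "long"}]}
  static final class Element {
    final String name;
    final long count;

    Element(String name, long count) {
      this.name = name;
      this.count = count;
    }
  }

  // int and long are written as zig-zag varints: 7 bits per byte, high bit = "more bytes follow".
  static void writeLong(long n, OutputStream out) throws IOException {
    long z = (n << 1) ^ (n >> 63); // zig-zag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    while ((z & ~0x7FL) != 0) {
      out.write((int) ((z & 0x7F) | 0x80));
      z >>>= 7;
    }
    out.write((int) z);
  }

  static long readLong(InputStream in) throws IOException {
    long z = 0;
    int shift = 0;
    int b;
    do {
      b = in.read();
      if (b < 0) {
        throw new EOFException();
      }
      z |= (long) (b & 0x7F) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return (z >>> 1) ^ -(z & 1); // undo zig-zag
  }

  // A record is just its fields in schema order; a string is a long byte count plus UTF-8 bytes.
  static byte[] encode(Element e) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
      byte[] utf8 = e.name.getBytes(StandardCharsets.UTF_8);
      writeLong(utf8.length, out);
      out.write(utf8);
      writeLong(e.count, out);
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return out.toByteArray();
  }

  static Element decode(byte[] bytes) {
    InputStream in = new ByteArrayInputStream(bytes);
    try {
      int len = (int) readLong(in);
      byte[] utf8 = in.readNBytes(len);
      if (utf8.length != len) {
        throw new EOFException();
      }
      return new Element(new String(utf8, StandardCharsets.UTF_8), readLong(in));
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }
}
```

For example, the record ("a", 1) encodes to the three bytes 0x02 0x61 0x02: the zig-zag length 1, the byte for 'a', then the zig-zag value 1. No field names or delimiters are written, which is why a reader needs the schema to decode the bytes.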
Methods inherited from class StandardCoder: consistentWithEquals, equals, getAllowedEncodings, getComponents, getEncodedElementByteSize, hashCode, isRegisterByteSizeObserverCheap, registerByteSizeObserver, structuralValue, toString, verifyDeterministic, verifyDeterministic

public static final CoderProvider PROVIDER
public static <T> AvroCoder<T> of(Class<T> clazz)
Returns an AvroCoder instance for the provided element class.
Type parameters: T - the element type

public static AvroCoder<org.apache.avro.generic.GenericRecord> of(org.apache.avro.Schema schema)
Returns an AvroCoder instance for the Avro schema. The implicit type is GenericRecord.

public static <T> AvroCoder<T> of(Class<T> type, org.apache.avro.Schema schema)
Returns an AvroCoder instance for the provided element type using the provided Avro schema.
If the type argument is GenericRecord, the schema may be arbitrary. Otherwise, the schema must correspond to the type provided.
Type parameters: T - the element type

public static AvroCoder<?> of(String classType, String schema) throws ClassNotFoundException
Throws: ClassNotFoundException

public String getEncodingId()
The encoding identifier is designed to support evolution as per the design of Avro. In order to use this class effectively, carefully read the Avro documentation at Schema Resolution to ensure that the old and new schemas match.
In particular, this encoding identifier is guaranteed to be the same for AvroCoder instances of the same principal class, and otherwise distinct. The schema is not included in the identifier.
When modifying a class to be encoded as Avro, here are some guidelines; see the above link for greater detail.

- Never remove a required field.
- Only add optional fields, with sensible defaults.

Code consuming this message class should be prepared to support all versions of the class until it is certain that no remaining serialized instances exist.
If backwards incompatible changes must be made, the best recourse is to change the name of your class.
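These guidelines follow from Avro's schema resolution rules for records. The sketch below models those rules on plain maps rather than on Avro data; FieldSpec, resolve, and the use of Java null to mean "no default" are simplifications made for illustration, not Avro APIs. Data written before an optional comment field (default "") was added still resolves, while data missing a field the reader requires is rejected, which is why a required field should never be removed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SchemaResolutionSketch {
  // One field of the reader's record schema. In this sketch a null default means
  // "no default" (a required field), so a genuine null default cannot be expressed.
  static final class FieldSpec {
    final String name;
    final Object defaultValue;

    FieldSpec(String name, Object defaultValue) {
      this.name = name;
      this.defaultValue = defaultValue;
    }
  }

  // Simplified record rules from Avro schema resolution:
  //  - a field the writer wrote but the reader does not declare is ignored;
  //  - a reader field the writer did not write takes the reader's default;
  //  - a reader field with no default that the writer did not write is an error.
  static Map<String, Object> resolve(Map<String, Object> written, List<FieldSpec> readerFields) {
    Map<String, Object> result = new LinkedHashMap<>();
    for (FieldSpec f : readerFields) {
      if (written.containsKey(f.name)) {
        result.put(f.name, written.get(f.name));
      } else if (f.defaultValue != null) {
        result.put(f.name, f.defaultValue);
      } else {
        throw new IllegalArgumentException("no value or default for required field '" + f.name + "'");
      }
    }
    return result;
  }
}
```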
Specified by: getEncodingId in interface Coder<T>
Overrides: getEncodingId in class StandardCoder<T>
See also: StandardCoder.getAllowedEncodings()

public void encode(T value, OutputStream outStream, Coder.Context context) throws IOException
Encodes the given value of type T onto the given output stream in the given context.
Throws: IOException if writing to the OutputStream fails for some reason; CoderException if the value could not be encoded for some reason.

public T decode(InputStream inStream, Coder.Context context) throws IOException
Decodes a value of type T from the given input stream in the given context. Returns the decoded value.
Throws: IOException if reading from the InputStream fails for some reason; CoderException if the value could not be decoded for some reason.

public List<? extends Coder<?>> getCoderArguments()
If this is a Coder for a parameterized type, returns the list of Coders being used for each of the parameters, or returns null if this cannot be done or this is not a parameterized type.

public CloudObject asCloudObject()
Returns the CloudObject that represents this Coder.
Specified by: asCloudObject in interface Coder<T>
Overrides: asCloudObject in class StandardCoder<T>

public void verifyDeterministic() throws Coder.NonDeterministicException
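Throws: Coder.NonDeterministicException if this coder is not deterministic.

public org.apache.avro.io.DatumReader<T> createDatumReader()
Returns a new DatumReader that can be used to read from an Avro file directly.

public org.apache.avro.io.DatumWriter<T> createDatumWriter()
Returns a new DatumWriter that can be used to write to an Avro file directly.

public org.apache.avro.Schema getSchema()
Returns the schema used by this coder.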