T - the type of the values being transcoded
public interface Coder<T> extends Serializable
A Coder<T> defines how to encode values of type T into byte streams and how to decode them back.
All methods of a Coder are required to be thread safe.
Coder instances are serialized during job creation and deserialized
before use, via JSON serialization.
See SerializableCoder for an example of a Coder that adds a custom field to
the Coder serialization. It provides a constructor annotated with
JsonCreator, which is a factory method used when
deserializing a Coder instance.
See KvCoder for an example of a nested Coder type.
The binary format of a Coder is identified by getEncodingId(); be sure to
understand the requirements for evolving coder formats.
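
To make the encode/decode contract and the idea of a nested Coder concrete, here is a minimal sketch that uses only the Java standard library. `MiniCoder`, `StringMiniCoder`, `IntMiniCoder`, and `PairMiniCoder` are hypothetical stand-ins invented for illustration; they are not SDK classes, and they omit the `Coder.Context` argument and the JSON serialization described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class MiniCoderSketch {
    /** Hypothetical, simplified stand-in for Coder<T>; not the SDK interface. */
    interface MiniCoder<T> {
        void encode(T value, OutputStream out) throws IOException;
        T decode(InputStream in) throws IOException;
    }

    /** Stateless, so one instance can be shared safely across threads. */
    static final class StringMiniCoder implements MiniCoder<String> {
        @Override
        public void encode(String value, OutputStream out) throws IOException {
            // writeUTF emits a 2-byte length followed by modified UTF-8 bytes.
            new DataOutputStream(out).writeUTF(value);
        }

        @Override
        public String decode(InputStream in) throws IOException {
            return new DataInputStream(in).readUTF();
        }
    }

    /** Fixed-width, big-endian 4-byte integer encoding. */
    static final class IntMiniCoder implements MiniCoder<Integer> {
        @Override
        public void encode(Integer value, OutputStream out) throws IOException {
            new DataOutputStream(out).writeInt(value);
        }

        @Override
        public Integer decode(InputStream in) throws IOException {
            return new DataInputStream(in).readInt();
        }
    }

    /** Nested coder: delegates to one component coder per type parameter. */
    static final class PairMiniCoder<K, V> implements MiniCoder<Map.Entry<K, V>> {
        private final MiniCoder<K> keyCoder;
        private final MiniCoder<V> valueCoder;

        PairMiniCoder(MiniCoder<K> keyCoder, MiniCoder<V> valueCoder) {
            this.keyCoder = keyCoder;
            this.valueCoder = valueCoder;
        }

        @Override
        public void encode(Map.Entry<K, V> pair, OutputStream out) throws IOException {
            keyCoder.encode(pair.getKey(), out);
            valueCoder.encode(pair.getValue(), out);
        }

        @Override
        public Map.Entry<K, V> decode(InputStream in) throws IOException {
            // Components are read back in the same order they were written.
            K key = keyCoder.decode(in);
            V value = valueCoder.decode(in);
            return new SimpleEntry<K, V>(key, value);
        }
    }

    /** Encodes a pair to bytes and decodes it again. */
    static Map.Entry<String, Integer> roundTripPair(String key, int value) {
        MiniCoder<Map.Entry<String, Integer>> coder =
            new PairMiniCoder<>(new StringMiniCoder(), new IntMiniCoder());
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            coder.encode(new SimpleEntry<String, Integer>(key, value), out);
            return coder.decode(new ByteArrayInputStream(out.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The pair coder holds no mutable state and only composes its component coders, which is one simple way to satisfy the thread-safety requirement.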
| Modifier and Type | Interface and Description |
|---|---|
| static class | Coder.Context - The context in which encoding or decoding is being done. |
| static class | Coder.NonDeterministicException - Exception thrown by verifyDeterministic() if the encoding is not deterministic. |
| Modifier and Type | Method and Description |
|---|---|
| CloudObject | asCloudObject() - Returns the CloudObject that represents this Coder. |
| boolean | consistentWithEquals() - Returns true if the encoded bytes of two objects are equal only when they are also equal according to Object.equals(). |
| T | decode(InputStream inStream, Coder.Context context) - Decodes a value of type T from the given input stream in the given context. |
| void | encode(T value, OutputStream outStream, Coder.Context context) - Encodes the given value of type T onto the given output stream in the given context. |
| Collection<String> | getAllowedEncodings() - A collection of encodings supported by decode(InputStream, Coder.Context) in addition to the encoding from getEncodingId() (which is assumed supported). |
| List<? extends Coder<?>> | getCoderArguments() - If this is a Coder for a parameterized type, returns the list of Coders being used for each of the parameters, or returns null if this cannot be done or this is not a parameterized type. |
| String | getEncodingId() - An identifier for the binary format written by encode(T, OutputStream, Coder.Context). |
| boolean | isRegisterByteSizeObserverCheap(T value, Coder.Context context) - Returns whether registerByteSizeObserver(T, ElementByteSizeObserver, Coder.Context) is cheap enough to call for every element, that is, if this Coder can calculate the byte size of the element to be coded in roughly constant time (or lazily). |
| void | registerByteSizeObserver(T value, ElementByteSizeObserver observer, Coder.Context context) - Notifies the ElementByteSizeObserver about the byte size of the encoded value using this Coder. |
| Object | structuralValue(T value) - Returns an object with an Object.equals() method that represents structural equality on the argument. |
| void | verifyDeterministic() - Throws Coder.NonDeterministicException if the coding is not deterministic. |

void encode(T value, OutputStream outStream, Coder.Context context) throws CoderException, IOException
Encodes the given value of type T onto the given output stream in the given context.
IOException - if writing to the OutputStream fails for some reason
CoderException - if the value could not be encoded for some reason

T decode(InputStream inStream, Coder.Context context) throws CoderException, IOException
Decodes a value of type T from the given input stream in the given context. Returns the decoded value.
IOException - if reading from the InputStream fails for some reason
CoderException - if the value could not be decoded for some reason

List<? extends Coder<?>> getCoderArguments()
If this is a Coder for a parameterized type, returns the list of Coders being used for each of the parameters, or returns null if this cannot be done or this is not a parameterized type.

CloudObject asCloudObject()
Returns the CloudObject that represents this Coder.

void verifyDeterministic() throws Coder.NonDeterministicException
Throws Coder.NonDeterministicException if the coding is not deterministic.
In order for a Coder to be considered deterministic, the following must be true:
- Two values that compare as equal (via Object.equals() or Comparable.compareTo(), if supported) have the same encoding.
- The Coder always produces a canonical encoding, which is the same for an instance of an object even if produced on different computers at different times.
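
The two conditions above can be illustrated without the SDK. The following sketch, under the assumption of a made-up map encoding (entry count, then key and value per entry), shows how two maps that are equal according to Object.equals() can still produce different bytes when entries are written in iteration order, and how sorting the keys first yields a canonical encoding. `CanonicalEncodingSketch` and its methods are hypothetical names, not SDK API.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class CanonicalEncodingSketch {
    /** Writes entries in the map's own iteration order; equal maps may yield different bytes. */
    static byte[] encodeInIterationOrder(Map<String, Integer> map) {
        return write(map);
    }

    /** Writes entries sorted by key, so equal maps always yield the same bytes. */
    static byte[] encodeCanonically(Map<String, Integer> map) {
        return write(new TreeMap<>(map));
    }

    // Assumed illustrative format: entry count, then (key, value) for each entry.
    private static byte[] write(Map<String, Integer> map) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(map.size());
            for (Map.Entry<String, Integer> entry : map.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new LinkedHashMap<>();
        first.put("a", 1);
        first.put("b", 2);
        Map<String, Integer> second = new LinkedHashMap<>();
        second.put("b", 2);
        second.put("a", 1);

        // Map equality ignores insertion order.
        System.out.println(first.equals(second)); // true
        // Iteration-order encoding breaks the first condition.
        System.out.println(Arrays.equals(
            encodeInIterationOrder(first), encodeInIterationOrder(second))); // false
        // Sorting by key restores a canonical encoding.
        System.out.println(Arrays.equals(
            encodeCanonically(first), encodeCanonically(second))); // true
    }
}
```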
Coder.NonDeterministicException - if this coder is not deterministic.

boolean consistentWithEquals()
Returns true if the encoded bytes of two objects are equal only when they are also equal according to Object.equals() (and the type also implements a compatible Object.hashCode()).
This is most notably false for arrays. It will generally be false when Object.equals() compares object identity, rather than performing a semantic/structural comparison.

Object structuralValue(T value) throws Exception
Returns an object with an Object.equals() method that represents structural equality on the argument (and that also implements a compatible Object.hashCode()).
For any two objects of type T, if their encoded bytes are the same, then their structural values are equal according to Object.equals().
Most notably, the structural value for an array coder should perform a structural comparison of the contents of the arrays, rather than the default behavior of comparing according to object identity.
See also consistentWithEquals().
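
As a sketch of the array case, the wrapper below compares byte arrays by content, where a plain byte[] compares by identity. `StructuralBytes` and the static `structuralValue` helper are hypothetical names chosen for this example; how an actual array coder in the SDK builds its structural value is not shown in this document.

```java
import java.util.Arrays;

public class StructuralValueSketch {
    /** Hypothetical wrapper whose equals() and hashCode() look at array contents. */
    static final class StructuralBytes {
        private final byte[] bytes;

        StructuralBytes(byte[] bytes) {
            // Defensive copy so later changes to the caller's array do not alter equality.
            this.bytes = bytes.clone();
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof StructuralBytes
                && Arrays.equals(bytes, ((StructuralBytes) other).bytes);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }
    }

    /** Plays the role of structuralValue(T) for T = byte[] in this sketch. */
    static Object structuralValue(byte[] value) {
        return new StructuralBytes(value);
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};
        // Arrays inherit identity-based equals() from Object.
        System.out.println(a.equals(b)); // false
        // The structural values compare the contents instead.
        System.out.println(structuralValue(a).equals(structuralValue(b))); // true
    }
}
```

This is also why consistentWithEquals() is false for arrays: identical encoded bytes do not imply that Object.equals() returns true on the arrays themselves.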
Exception

boolean isRegisterByteSizeObserverCheap(T value, Coder.Context context)
Returns whether registerByteSizeObserver(T, ElementByteSizeObserver, Coder.Context) is cheap enough to call for every element, that is, if this Coder can calculate the byte size of the element to be coded in roughly constant time (or lazily).
Not intended to be called by user code, but instead by PipelineRunner implementations.
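
The difference between a cheap and an expensive size calculation can be sketched as follows. The encodings assumed here (an 8-byte long, a 4-byte length prefix before UTF-8 string bytes, a 4-byte element count before list elements) are illustrative only, and `ByteSizeSketch` is a hypothetical class; none of this describes the SDK's actual formats or its ElementByteSizeObserver mechanism.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class ByteSizeSketch {
    /** Fixed-width encoding: the size is known without inspecting the value (constant time). */
    static long encodedSizeOfLong(long value) {
        return Long.BYTES;
    }

    /** Assumed format: 4-byte length prefix plus UTF-8 payload; cost grows with string length. */
    static long encodedSizeOfString(String value) {
        return Integer.BYTES + value.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Assumed format: 4-byte element count plus each element; cost grows with the list. */
    static long encodedSizeOfList(List<String> values) {
        long size = Integer.BYTES;
        for (String value : values) {
            size += encodedSizeOfString(value);
        }
        return size;
    }
}
```

Under these assumptions, only the fixed-width case would count as cheap enough to compute for every element; the list case walks all elements and is the kind of work a coder might prefer to defer.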

void registerByteSizeObserver(T value, ElementByteSizeObserver observer, Coder.Context context) throws Exception
Notifies the ElementByteSizeObserver about the byte size of the encoded value using this Coder.
Not intended to be called by user code, but instead by PipelineRunner implementations.
Exception

@Experimental(value=CODER_ENCODING_ID) String getEncodingId()
An identifier for the binary format written by encode(T, OutputStream, Coder.Context).
This value, along with the fully qualified class name, forms an identifier for the binary format of this coder. Whenever this value changes, the new encoding is considered incompatible with the prior format: it is presumed that the prior version of the coder will be unable to correctly read the new format and the new version of the coder will be unable to correctly read the old format.
If the format is changed in a backwards-compatible way (the Coder can still accept data from the prior format), such as by adding optional fields to a Protocol Buffer or Avro definition, and you want Dataflow to understand that the new coder is compatible with the prior coder, this value must remain unchanged. It is then the responsibility of decode(InputStream, Coder.Context) to correctly read data from the prior format.
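
A backwards-compatible format change of the kind described above might look like the following sketch, in which a newer format appends an optional field and the decoder accepts both layouts. Everything here is hypothetical: the `User` type, the field layout, and the `UNKNOWN_AGE` default are invented for illustration, and the decoder assumes the encoded value is the last thing in the stream, so that reaching the end of input means the optional field is absent.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

public class CompatibleFormatSketch {
    static final int UNKNOWN_AGE = -1;

    /** Hypothetical value type used only in this sketch. */
    static final class User {
        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    /** Prior format: the name only. */
    static byte[] encodeOld(String name) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Current format: the name, followed by the newly added age field. */
    static byte[] encodeNew(User user) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(user.name);
            out.writeInt(user.age);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads either format; assumes nothing follows the value in the input. */
    static User decode(byte[] encoded) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            String name = in.readUTF();
            int age;
            try {
                age = in.readInt();
            } catch (EOFException priorFormat) {
                // Data written in the prior format has no age field.
                age = UNKNOWN_AGE;
            }
            return new User(name, age);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the decoder still reads data written in the prior format, a coder evolving this way would keep its encoding id unchanged, following the rule stated above.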

@Experimental(value=CODER_ENCODING_ID) Collection<String> getAllowedEncodings()
A collection of encodings supported by decode(InputStream, Coder.Context) in addition to the encoding from getEncodingId() (which is assumed supported).
This information is not currently used for any purpose. It is descriptive only, and this method is subject to change.
See also getEncodingId().