Extract specific fields from a DStream of Avro records.
This operation extracts specific fields from each Avro record in the DStream. Field values are wrapped in an Option.
the names of the fields to extract
a DStream of sequences containing the field values requested.
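The per-record step of this operation can be sketched in plain Scala as follows. This is only an illustration, not the library's implementation: `FakeRecord`, `extractFields`, and the field names `sessionId` and `pageType` are hypothetical, and the sketch assumes that an unset (null) field is what the Option wrapping turns into `None`.

```scala
// Hypothetical stand-in for an Avro record: a name -> value lookup in which
// a field may be unset (null).
final class FakeRecord(values: Map[String, Any]) {
  def get(name: String): Any = values.getOrElse(name, null)
}

// Sketch of the per-record step: look up each requested field in order and
// wrap the result in Option, so that (by assumption) null becomes None.
def extractFields(record: FakeRecord, fieldNames: String*): Seq[Option[Any]] =
  fieldNames.map(name => Option(record.get(name)))

val record = new FakeRecord(Map("sessionId" -> "abc", "pageType" -> null))
val extracted = extractFields(record, "sessionId", "pageType")
```

Applied to every record in the stream, a step like this yields one sequence per record, with values in the same order as the requested field names.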
Extract serializable values from a DStream of Avro records.
The (serializable) type of the value extracted from a record.
A function to extract the serializable values from a record.
A DStream containing the extracted values.
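As a rough illustration of why the extracted value has to be serializable, the sketch below checks a record stand-in and an extracted value against Java serialization. Everything here is an assumption made for the example: `FakeRecord`, `PageView`, `toPageView`, and `javaSerializable` are hypothetical names, and the stand-in only mimics the fact that Avro records do not implement java.io.Serializable.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for an Avro record; deliberately NOT Serializable.
final class FakeRecord(val location: String, val timestamp: Long)

// A small value type to extract into (case classes are Serializable).
final case class PageView(location: String, timestamp: Long)

// The extraction function: builds a serializable value from a record.
val toPageView: FakeRecord => PageView =
  r => PageView(r.location, r.timestamp)

// Helper: reports whether Java serialization accepts the given object.
def javaSerializable(obj: AnyRef): Boolean =
  try {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    out.writeObject(obj)
    out.close()
    true
  } catch {
    case _: NotSerializableException => false
  }

val record = new FakeRecord("/home", 1L)
val recordOk = javaSerializable(record)             // the raw record stand-in
val valueOk = javaSerializable(toPageView(record))  // the extracted value
```

A function shaped like `toPageView` is the kind of argument this operation expects: it reads what it needs from the record and returns a value that can safely be shipped between nodes.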
View the DStream of Avro records as io.divolte.spark.avro.Record instances.
This operation must perform a deep copy of the Avro record with conversions to ensure that everything can be serialized. If you only wish to access a small subset of the Avro record, it can be more efficient to extract the fields you need using AvroRDDMagnet#fields.
a DStream of io.divolte.spark.avro.Record instances built from the Avro records.
Magnet for operations on a DStream containing Avro records.
This is motivated by the fact that Avro records (ironically) don't implement the java.io.Serializable interface, which means that the only safe operations are those which cannot result in Spark trying to serialize the records.
For convenience, we provide two operations:
the type of the decoded Kafka keys for each message.
the type of the deserialized Avro record.