za.co.absa.abris.avro.read.confluent
This class does not hold a Schema Registry client instance; instead, it relies on SchemaManager. Thus, this method configures the Schema Registry through SchemaManager.
This is provided as a utility so that users never need to invoke SchemaManager directly in their code.
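The delegation pattern described above can be sketched as follows. Note that the names below (SchemaManagerSketch, DeserializerSketch, configureSchemaRegistry) are illustrative stand-ins, not ABRiS's actual API; only the config key "schema.registry.url" follows the standard Confluent client convention.

```scala
// Illustrative stand-ins only: ABRiS's actual SchemaManager API may differ.
// The point is the delegation: the deserializer holds no registry client,
// it just forwards Schema Registry settings to a shared manager object.
object SchemaManagerSketch {
  private var settings: Map[String, String] = Map.empty

  def configure(configs: Map[String, String]): Unit = { settings = configs }

  def registryUrl: Option[String] = settings.get("schema.registry.url")
}

class DeserializerSketch {
  // Utility method mirroring the pattern described above: no client is
  // held here, configuration is simply handed over to the manager.
  def configureSchemaRegistry(configs: Map[String, String]): Unit =
    SchemaManagerSketch.configure(configs)
}
```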
Converts the Avro binary payload into an Avro GenericRecord. Important highlights:
1. It uses the ScalaDatumReader to parse the bytes.
2. It takes into account Confluent's specific metadata included in the payload (e.g. the schema id); thus, it will not work on regular binary Avro records.
3. If a topic is defined in the constructor and access to Schema Registry is configured, the schema retrieved from the latter is considered the writer schema; otherwise, the reader schema passed to the constructor is used as both reader and writer schema (note, therefore, that either the topic or the reader schema must be informed).
4. The Avro DatumReader is cached by schema id; thus, if a new id is received as part of the payload, a new DatumReader is created for that id, with a new schema being retrieved, if and only if the topic is informed and Schema Registry is configured.
5. Although schema changes are supported, bear in mind that this class exists mainly to parse GenericRecords that will later be converted into Spark Rows. That conversion relies on RowEncoders, which need to be instantiated once, outside this class. Thus, even though schema changes can be handled here, they cannot be translated into new RowEncoders, which could lead to anything from exceptions to inconsistencies in the final data.
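Points 2 and 4 above can be sketched in terms of the Confluent wire format, which frames each message as one magic byte (0x0), a 4-byte big-endian schema id, and the Avro binary body. The sketch below splits that header off and caches one entry per schema id; the cache holds plain strings as simplified stand-ins for real DatumReaders, so it illustrates the caching pattern rather than ABRiS's actual implementation.

```scala
import java.nio.ByteBuffer

// Sketch of Confluent-framed payload handling:
// [magic byte 0x0][4-byte big-endian schema id][Avro binary body]
object ConfluentPayload {
  val MagicByte: Byte = 0x0

  // Returns (schemaId, avroBytes), failing fast on a non-Confluent payload.
  def split(payload: Array[Byte]): (Int, Array[Byte]) = {
    require(payload.length >= 5 && payload(0) == MagicByte,
      "Not a Confluent-framed Avro payload")
    val schemaId = ByteBuffer.wrap(payload, 1, 4).getInt
    (schemaId, payload.drop(5))
  }
}

// A per-schema-id cache, as in point 4: a new entry is created only when an
// unseen id arrives (strings stand in for real Avro DatumReader instances).
object ReaderCache {
  private val cache = scala.collection.mutable.Map.empty[Int, String]

  def readerFor(schemaId: Int): String =
    cache.getOrElseUpdate(schemaId, s"reader-for-$schemaId")
}
```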
The only way to overcome the issue described in point 5 is to change Spark itself, so that it could replace the RowEncoder instance on the fly whenever a new schema version is detected.
This class provides methods to deserialize Confluent binary Avro records into Spark Rows with schemas.
Please invest some time in understanding how it works and, above all, read the documentation for the 'deserialize()' method.