A named model for mutations coming from a CDC tool.
Options for a CdcModel:
saveMode specifies the behaviour when saving and the output uri already exists.
format specifies the data format to use.
extraOptions allows specifying any writer-specific options accepted by DataFrameReader/Writer.option.
partitionBy allows specifying columns to be used to partition the data, using different directories for different values.
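As a sketch of how these options might be instantiated (the constructor shape and option types are assumptions based on the fields documented above, not a verified signature):

// A minimal sketch; parameter names follow the documentation above.
val cdcOptions = CdcOptions(
  saveMode = "append",                               // behaviour when the output uri exists
  format = "delta",                                  // data format to use
  extraOptions = Some(Map("mergeSchema" -> "true")), // writer-specific options
  partitionBy = Some(List("year", "month"))          // partition data into directories
)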
Base datastore model.
The HttpModel used by HttpWriter to send data over the HTTP protocol.
The name of the HttpModel
The url to send the request to
The HTTP method to use; one of GET, POST, PUT, PATCH, DELETE
The name of the DataFrame column to be used as HTTP headers; it must be of type Map[String,String]. If None, no headers will be sent with the request, except for the content-type and content-encoding ones
The list of DataFrame columns to be rendered as json in the HTTP request body. If the list is empty, all the fields except the headers field (if any) will be rendered as a json object. If there is only one field, the behaviour is controlled by the structured field
The HttpCompression to use
The format of the request content
Enables the request body logger
Indicates how the request body will be rendered. This configuration has an effect only if the DataFrame contains a single column to be sent, and only if that column is of ArrayType or MapType. If structured is true, the array or map will always be enclosed in a json object; otherwise the map or array will be at the top level of the json document. Input dataframe:
+---------+
|  values |
+---------+
|[3, 4, 5]|
+---------+
Request with structured = true
{"values" : [3, 4, 5]}
Request with structured = false
[3, 4, 5]
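A hedged construction sketch, assuming HttpModel exposes the fields documented above (the names follow the documentation; the exact constructor signature and the HttpCompression value are assumptions):

// A minimal sketch; field names follow the documentation above.
val httpModel = HttpModel(
  name = "my-http-sink",                  // the HttpModel name
  url = "https://example.com/ingest",     // where the request is sent
  method = "POST",                        // one of GET, POST, PUT, PATCH, DELETE
  headersFieldName = Some("headers"),     // Map[String,String] column, or None
  valueFieldsNames = List("values"),      // columns rendered as json in the body
  compression = HttpCompression.Disabled, // assumed HttpCompression value
  mediaType = "application/json",         // format of the request content
  logBody = false,                        // request body logger off
  structured = true                       // wrap a lone array/map column in a json object
)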
A builder able to create instances of IndexModel.
The current Stage of the builder.
The kind of DataStore whose index is being built.
A model for grouping of topics.
The name field specifies the name of the model, which is used as the unique identifier for the model in the models database.
The topicNameField field specifies the field whose contents will be used as the name of the topic to which the message will be sent when writing to Kafka. The field must be of type string. The original field will be left as-is, so your schema must handle it (or you can use valueFieldsNames).
The topicModelNames field contains the names of the topic models that constitute this grouping of topics.
The topic models that constitute this grouping of topics must:
- consist of at least one topic model
- be all different models
- refer to different topics
- use the same settings for everything but partitions and replicas
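A hedged construction sketch, assuming MultiTopicModel mirrors the fields documented above (the constructor shape is an assumption, not a verified signature):

// A minimal sketch; field names follow the documentation above.
val grouping = MultiTopicModel(
  name = "telemetry-by-type",                         // unique identifier in the models database
  topicNameField = "destination_topic",               // string field naming the target topic per message
  topicModelNames = Seq("telemetry-a", "telemetry-b") // at least one, all different, different topics
)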
A model for a pipegraph, a processing pipeline abstraction.
name of the pipegraph
description of the pipegraph
owner of the pipegraph
whether the pipegraph is from the WASP system
time of creation of the pipegraph
components describing processing built on Spark Structured Streaming
dashboard of the pipegraph
DataSource class. The fields must be the same as the ones inside the MongoDB document associated with this model.
A named model for data stored as files on a raw datastore (eg HDFS).
The uri is augmented with time information if timed is true. For writers this means whether to use uri as-is or create timed namespaces (eg for HDFS, a subdirectory) inside it; for readers, whether to read from uri as-is or from the most recent timed namespace inside it.
schema is a json-encoded DataFrame schema, that is, a StructType. See DataType.fromJson and DataType.json.
options control the underlying Spark DataFrameWriter/Reader in the writers/readers using an instance of this model.
the name of the datastore
the uri where the data files reside
whether the uri must be augmented with time information
the schema of the data
the options for the datastore
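For example, a RawModel could be declared with a json-encoded StructType as follows (a sketch assuming the constructor mirrors the parameters documented above):

import org.apache.spark.sql.types._

// The schema is passed as a json-encoded StructType (see DataType.json).
val dataSchema = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("payload", StringType)
))

// A minimal sketch; parameter names follow the documentation above.
val rawModel = RawModel(
  name = "sensor-raw",                          // the name of the datastore
  uri = "hdfs://namenode/user/wasp/sensor-raw", // where the data files reside
  timed = true,                                 // writers create timed namespaces inside uri
  schema = dataSchema.json                      // json-encoded DataFrame schema
)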
Options for a raw datastore.
saveMode specifies the behaviour when saving and the output uri already exists.
format specifies the data format to use.
extraOptions allows specifying any writer-specific options accepted by DataFrameReader/Writer.option.
partitionBy allows specifying columns to be used to partition the data, using different directories for different values.
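To make the mapping concrete, this is how the documented options correspond to calls on Spark's DataFrameWriter (illustrative only; df and uri are assumed to be in scope, and the actual wiring lives inside the writer):

// Illustrative only: the documented options mapped onto Spark's DataFrameWriter.
df.write
  .mode("append")                  // saveMode
  .format("parquet")               // format
  .option("compression", "snappy") // extraOptions, via DataFrameWriter.option
  .partitionBy("year", "month")    // partitionBy
  .save(uri)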
A model for a reader, composed of a name, a datastoreModelName defining the datastore, a datastoreProduct defining the datastore software product to use, and any additional options needed to configure the reader.
Class representing a SqlSource model
The name of the SqlSource model
The name of the connection to use. N.B. it must be present in the jdbc-subConfig
The name of the table
optional - Partition info (column, lowerBound, upperBound)
optional - Number of partitions
optional - Fetch size
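These parameters correspond to the standard options of Spark's JDBC data source; an equivalent direct read might look like this (illustrative only; the url and table are placeholders, and spark is an active SparkSession):

// Illustrative only: the documented parameters mapped onto Spark's JDBC source.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/mydb") // resolved from the named connection
  .option("dbtable", "public.orders")                   // the name of the table
  .option("partitionColumn", "id")                      // partition info: column,
  .option("lowerBound", "0")                            //   lowerBound,
  .option("upperBound", "1000000")                      //   upperBound
  .option("numPartitions", "8")                         // number of partitions
  .option("fetchsize", "1000")                          // fetch size
  .load()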
A streaming processing component that leverages Spark's Structured Streaming API.
unique name of the processing component
group of which the processing component is part
list of inputs for static datasets
streaming output
machine learning models to be used in the processing
strategy model that defines the processing
trigger interval to use, in milliseconds
has no effect at all
A model for a topic, that is, a message queue of some sort. Right now this means just Kafka topics.
the name of the topic, and doubles as the unique identifier for the model in the models database
marks the time at which the model was generated.
the number of partitions used for the topic when wasp creates it
the number of replicas used for the topic when wasp creates it
specifies the format to use when encoding/decoding data to/from messages; allowed values are: avro, plaintext, json, binary
optionally specifies a field whose contents will be used as a message key when writing to Kafka. The field must be of type string or binary. The original field will be left as-is, so your schema must handle it (or you can use valueFieldsNames).
allows you to optionally specify a field whose contents will be used as message headers when writing to Kafka. The field must contain an array of non-null objects which must have a non-null field headerKey of type string and a field headerValue of type binary. The original field will be left as-is, so your schema must handle it (or you can use valueFieldsNames).
allows you to specify a list of field names to be used to filter the fields that get passed to the value encoding; with this you can filter out fields that you don't need in the value, obviating the need to handle them in the schema. This is especially useful when specifying the keyFieldName or headersFieldName. For the avro and json topic data types this is optional; for the plaintext and binary topic data types this field is mandatory and the list must contain a single value field name that has the proper type (string for plaintext and binary for binary).
whether a schema registry should be used to handle schema evolution (this makes sense only for the avro message data type)
the Avro schema to use when encoding the value; for plaintext and binary this field is ignored. For json and avro the field names need to match 1:1 with the valueFieldsNames or with the schema output of the strategy
the compression to use for messages
the subject strategy to use when registering the schema to the schema registry, for the schema registry implementations that support it. This property makes sense only for avro and only if useAvroSchemaManager is set to true
the schema to be used to encode the key as avro
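As an illustration of the DataFrame shape these fields require, the following Spark schema sketch shows a string key field, a headers field of the required array-of-struct type, and a value field (all field names are hypothetical):

import org.apache.spark.sql.types._

// Hypothetical field names; the shapes follow the requirements documented above.
val kafkaReadySchema = StructType(Seq(
  StructField("my_key", StringType),    // keyFieldName target: string or binary
  StructField("my_headers", ArrayType(  // headersFieldName target: array of non-null
    StructType(Seq(                     //   objects with headerKey and headerValue
      StructField("headerKey", StringType, nullable = false),
      StructField("headerValue", BinaryType)
    )),
    containsNull = false
  )),
  StructField("payload", StringType)    // listed in valueFieldsNames
))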
A model for a writer, composed of a name, a datastoreModelName defining the datastore, a datastoreProduct defining the datastore software product to use, and any additional options needed to configure the writer.
Object used to represent all the fields of a generic mutation inside the cdcPlugin. It has been placed here because all the CDC adapters (like Debezium, GoldenGate, etc.) need to know how to map the fields into a compliant DataFrame.
Companion object of IndexModelBuilder, contains the syntax.
import IndexModelBuilder._ when you want to construct an IndexModel.
A named model for mutations coming from a CDC tool. This model should be used together with the Cdc writer plugin in order to write these mutations into a Delta Lake table on HDFS.
uri is the location on HDFS where the Delta Table will be created. schema is a json-encoded DataFrame schema, that is, a StructType. See DataType.fromJson and DataType.json. options control the underlying Spark DeltaLakeWriter in the writers using an instance of this model.
the name of the datastore
the uri where the data are meant to be written
the schema of the data
the options for the datastore
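Putting it together, a CdcModel could be declared like this (a sketch assuming the constructor mirrors the parameters documented above; the schema reuses the json-encoded StructType approach shown for RawModel):

import org.apache.spark.sql.types._

// A minimal sketch; parameter names follow the documentation above.
val mutationSchema = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("operation", StringType)
))

val cdcModel = CdcModel(
  name = "orders-cdc",                          // the name of the datastore
  uri = "hdfs://namenode/user/wasp/orders-cdc", // where the Delta Table will be created
  schema = mutationSchema.json,                 // json-encoded DataFrame schema
  options = cdcOptions                          // e.g. the CdcOptions shown earlier
)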