Convenience method for transforming org.apache.spark.rdd.RDDs into org.apache.spark.sql.DataFrames. This is called once per batch on the org.apache.spark.rdd.RDD generated by the Extractor, and the result is passed to the Loader.
The SQLContext that is used to run this pipeline. NOTE: If the pipeline is running in MemSQL Streamliner, this is an instance of com.memsql.spark.context.MemSQLContext, which has additional metadata about the MemSQL cluster.
The org.apache.spark.rdd.RDD for this batch generated by the Extractor.
The user-defined configuration passed from MemSQL Ops.
A logger instance that is integrated with MemSQL Ops.
An org.apache.spark.sql.DataFrame with the transformed data to be loaded.
Initialization code for your Transformer. This is called after instantiation of your Transformer and before Transformer.transform. The default implementation does nothing.
The SQLContext that is used to run this pipeline. NOTE: If the pipeline is running in MemSQL Streamliner, this is an instance of com.memsql.spark.context.MemSQLContext, which has additional metadata about the MemSQL cluster.
The user-defined configuration passed from MemSQL Ops.
A logger instance that is integrated with MemSQL Ops.
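The call order described above (instantiate, call initialize once, then call transform once per batch) can be sketched in plain Scala. Everything in this sketch is a hypothetical stand-in and not the MemSQL or Spark API: `SketchTransformer`, `PrefixTransformer`, `LifecycleSketch`, the `Map`-based configuration, and the `"prefix"` key are invented for illustration, and batches are modeled as `Seq[String]` rather than RDDs.

```scala
// Hypothetical stand-in for the Transformer contract: a plain-Scala model,
// not the MemSQL Streamliner API. Batches are Seq[String] instead of RDDs.
trait SketchTransformer {
  // Called once after instantiation; the default implementation does nothing.
  def initialize(config: Map[String, String]): Unit = {}

  // Called once per batch.
  def transform(batch: Seq[String], config: Map[String, String]): Seq[String]
}

// Example implementation that reads one setting during initialize
// and reuses it for every batch.
class PrefixTransformer extends SketchTransformer {
  private var prefix: String = ""

  override def initialize(config: Map[String, String]): Unit = {
    // "prefix" is an illustrative configuration key, not a real setting.
    prefix = config.getOrElse("prefix", "")
  }

  override def transform(batch: Seq[String], config: Map[String, String]): Seq[String] =
    batch.map(record => prefix + record)
}

object LifecycleSketch {
  // Drives the lifecycle: initialize once, then transform each batch in order.
  def run(transformer: SketchTransformer,
          config: Map[String, String],
          batches: Seq[Seq[String]]): Seq[Seq[String]] = {
    transformer.initialize(config)
    batches.map(batch => transformer.transform(batch, config))
  }
}
```

Keeping per-pipeline state in initialize, as `PrefixTransformer` does, avoids re-reading the configuration on every batch; whether that matters in a real pipeline depends on how expensive the setup work is.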
Initialization code for this Transformer.
The SQLContext that is used to run this pipeline. NOTE: If the pipeline is running in MemSQL Streamliner, this is an instance of com.memsql.spark.context.MemSQLContext, which has additional metadata about the MemSQL cluster.
The Transformer configuration passed from MemSQL Ops.
A logger instance that is integrated with MemSQL Ops.
Transforms the incoming org.apache.spark.rdd.RDD into an org.apache.spark.sql.DataFrame.
The SQLContext that is used to run this pipeline. NOTE: If the pipeline is running in MemSQL Streamliner, this is an instance of com.memsql.spark.context.MemSQLContext, which has additional metadata about the MemSQL cluster.
The org.apache.spark.rdd.RDD generated by the Extractor for this batch.
The Transformer configuration passed from MemSQL Ops.
A logger instance that is integrated with MemSQL Ops.
An org.apache.spark.sql.DataFrame with the transformed data to be loaded.
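As an illustration of what the body of such a transform might do, the following sketch turns a batch of byte arrays into a single-column DataFrame using only standard Spark SQL calls. It assumes the batch is an `RDD[Array[Byte]]` (suggested by the ByteArrayExtractor name but not stated here) and that each record is UTF-8 text. The object name `ByteArrayToDataFrame`, the helper `decode`, and the column name `data` are hypothetical choices, and the configuration and logger parameters are left out for brevity.

```scala
import java.nio.charset.StandardCharsets

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object ByteArrayToDataFrame {
  // Single nullable string column; the column name "data" is an assumption.
  val schema: StructType =
    StructType(Seq(StructField("data", StringType, nullable = true)))

  // Decodes one record, assuming the bytes are UTF-8 text.
  def decode(bytes: Array[Byte]): String =
    new String(bytes, StandardCharsets.UTF_8)

  // Builds the DataFrame for one batch from the extracted byte arrays.
  def toDataFrame(sqlContext: SQLContext, rdd: RDD[Array[Byte]]): DataFrame = {
    val rows: RDD[Row] = rdd.map(bytes => Row(decode(bytes)))
    sqlContext.createDataFrame(rows, schema)
  }
}
```

Records in another encoding or a structured format such as JSON would need a different `decode` step and a matching schema.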
Convenience wrapper around ByteArrayExtractor for initialization and transformation of extracted org.apache.spark.rdd.RDDs.