NamedRdds - a trait that gives you safe, concurrent creation of and access to named RDDs (the native SparkContext interface only exposes RDDs by their numeric IDs). It makes it easy for jobs running on the same SparkContext to share RDDs. If two jobs simultaneously try to create an RDD with the same name, only one will win, and the other will retrieve the RDD the winner created.
Note that to take advantage of NamedRddSupport, a job must mix in this trait and use its APIs instead of the native RDD cache(); otherwise, we will not know about the names.
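The get-or-create behaviour described above can be modelled without Spark. This is an illustrative sketch under stated assumptions, not the NamedRdds implementation: NamedObjectSupport, getOrElseCreate, SharedContext and DictionaryJob are hypothetical names, a Vector[String] stands in for an RDD, and a java.util.concurrent.ConcurrentHashMap stands in for the store shared by jobs on one SparkContext. Its computeIfAbsent applies the builder at most once per key, which gives the "only one wins" property.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical trait modelling named-object sharing; `T` stands in for an RDD type.
trait NamedObjectSupport[T] {
  // The store must be shared by all jobs running against the same context.
  protected def store: ConcurrentHashMap[String, T]

  // Returns the object registered under `name`, building it with `gen` only if absent.
  // computeIfAbsent applies the function at most once per key; concurrent callers
  // block until it finishes and then receive the same instance.
  def getOrElseCreate(name: String)(gen: => T): T =
    store.computeIfAbsent(name, new java.util.function.Function[String, T] {
      override def apply(key: String): T = gen
    })
}

// Shared state standing in for one long-lived context (an assumption of this sketch).
object SharedContext {
  val named = new ConcurrentHashMap[String, Vector[String]]()
  val builds = new AtomicInteger(0) // counts how many times the expensive build runs
}

// A job mixes in the trait and asks for the object by name instead of caching it itself.
class DictionaryJob extends NamedObjectSupport[Vector[String]] {
  override protected def store: ConcurrentHashMap[String, Vector[String]] = SharedContext.named

  def run(): Vector[String] =
    getOrElseCreate("dictionary") {
      SharedContext.builds.incrementAndGet()
      Vector("a", "b", "c") // pretend this is an expensive RDD computation
    }
}

object NamedObjectDemo {
  // Runs `n` jobs concurrently; returns (number of builds, whether all got the same instance).
  def runConcurrently(n: Int): (Int, Boolean) = {
    val jobs = (1 to n).map(_ => Future(new DictionaryJob().run()))
    val results = Await.result(Future.sequence(jobs), 10.seconds)
    (SharedContext.builds.get, results.forall(_ eq results.head))
  }

  def main(args: Array[String]): Unit = {
    val (builds, sameInstance) = runConcurrently(8)
    println(s"builds=$builds sameInstance=$sameInstance") // builds=1 sameInstance=true
  }
}
```

The ConcurrentHashMap documentation advises keeping the mapping function short and simple, because other updates to the map may block while it runs.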
This is the entry point for a Spark Job Server to execute Spark jobs. This function should create or reuse RDDs and return the result at the end, which the Job Server will cache or display.
Context: a SparkContext or similar for the job. It may be reused across jobs.
Returns: the job result.
Assume that the job succeeds
Always returns SparkJobValid, as this example does no error checking.
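As a rough model of this contract, the sketch below shows a job with a validate step and a runJob entry point, plus an example job whose validate always returns SparkJobValid. SketchJob, SumJob and SketchRunner are hypothetical; SparkJobValid and SparkJobInvalid are local stand-ins named after the results mentioned in the text, and a Seq[Int] replaces the SparkContext so the sketch runs without Spark. The real trait's signatures may differ.

```scala
// Local stand-ins mirroring the validation results named in the text (not the library's definitions).
sealed trait SparkJobValidation
case object SparkJobValid extends SparkJobValidation
final case class SparkJobInvalid(reason: String) extends SparkJobValidation

// Hypothetical job shape: `Ctx` stands in for a SparkContext, the Map for the job config.
trait SketchJob[Ctx] {
  def validate(ctx: Ctx, config: Map[String, String]): SparkJobValidation
  def runJob(ctx: Ctx, config: Map[String, String]): Any
}

// Example job: assume it succeeds, so validate always returns SparkJobValid.
object SumJob extends SketchJob[Seq[Int]] {
  def validate(ctx: Seq[Int], config: Map[String, String]): SparkJobValidation = SparkJobValid
  def runJob(ctx: Seq[Int], config: Map[String, String]): Any = ctx.sum
}

// Minimal runner: validate first, then run and hand back the result for caching or display.
object SketchRunner {
  def submit[Ctx](job: SketchJob[Ctx], ctx: Ctx, config: Map[String, String]): Either[String, Any] =
    job.validate(ctx, config) match {
      case SparkJobValid           => Right(job.runJob(ctx, config))
      case SparkJobInvalid(reason) => Left(reason)
    }

  def main(args: Array[String]): Unit =
    println(submit(SumJob, Seq(1, 2, 3), Map.empty)) // Right(6)
}
```

Keeping validation separate from runJob lets a server reject a badly configured job before it does any work on the shared context.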
A Spark job example that implements the SparkJob trait and can be submitted to the job server.
Set input.string in the job config to the sentence to split and count, for example: input.string = "adsfasdf asdkf safksf a sdfa"
validate() returns SparkJobInvalid if input.string is not set.
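The word-count example's logic can be approximated on plain Scala collections. This is a sketch, not the example's source: WordCountSketch and the error message are hypothetical, the config is modelled as a Map[String, String] rather than the job server's config object, the validation types are local stand-ins, and only the key input.string comes from the text.

```scala
// Sketch of the word-count example's logic on plain Scala collections (no Spark).
object WordCountSketch {
  // Local stand-ins for the validation results named in the text.
  sealed trait Validation
  case object SparkJobValid extends Validation
  final case class SparkJobInvalid(reason: String) extends Validation

  val InputKey = "input.string"

  // Mirrors validate(): reject the job when input.string is missing.
  def validate(config: Map[String, String]): Validation =
    if (config.contains(InputKey)) SparkJobValid
    else SparkJobInvalid(s"No $InputKey config param")

  // Mirrors runJob(): split the sentence on whitespace and count each word.
  // A Spark version could use sc.parallelize(words).countByValue() instead.
  def runJob(config: Map[String, String]): Map[String, Int] =
    config(InputKey)
      .split("\\s+")
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.length }

  def main(args: Array[String]): Unit = {
    val config = Map(InputKey -> "adsfasdf asdkf safksf a sdfa")
    println(validate(config))    // SparkJobValid
    println(validate(Map.empty)) // SparkJobInvalid(No input.string config param)
    println(runJob(config).size) // 5
  }
}
```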