Implementation of a NamedObjectPersister for Broadcast objects.
Sends a response back in streaming fashion using chunked encoding.
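As a rough illustration of the pattern with spray (the names here are illustrative, not the actual implementation), the streaming might look like:

  import spray.http._
  import spray.routing.RequestContext

  // Sketch: stream a result back in chunks over an open connection.
  // Assumes at least one chunk is available.
  def streamResult(ctx: RequestContext, chunks: Iterator[Array[Byte]]): Unit = {
    // Open the chunked response with the first piece of data
    ctx.responder ! ChunkedResponseStart(HttpResponse(entity = HttpEntity(chunks.next())))
    // Send each remaining piece as its own chunk
    chunks.foreach(bytes => ctx.responder ! MessageChunk(bytes))
    // Terminate the chunked stream
    ctx.responder ! ChunkedMessageEnd
  }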
Represents a context based on SparkContext. Examples include: StreamingContext, SQLContext.
The Job Server can spin up not just a vanilla SparkContext, but anything that implements ContextLike.
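For instance, a custom context factory might return something like this sketch (the exact ContextLike members shown here are assumptions based on the description above; check the actual trait for the real contract):

  import spark.jobserver._
  import org.apache.spark.SparkContext
  import org.apache.spark.sql.SQLContext

  // Hypothetical wrapper that makes a SQLContext launchable by the job server
  class SQLContextWithJobServer(sc: SparkContext) extends SQLContext(sc) with ContextLike {
    def sparkContext: SparkContext = sc
    // Only accept jobs written against the SQL job API
    def isValidJob(job: SparkJobBase): Boolean = job.isInstanceOf[SparkSqlJob]
    def stop(): Unit = sc.stop()
  }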
Implementation of a NamedObjectPersister for DataFrame objects.
An actor that manages the data files stored by the job server on disk.
An actor that manages the jars stored by the job server. Threads must not try to load a class from a jar while a new one is replacing it, so using an actor to serialize requests is a perfect fit.
A class to make Java jobs easier to write. In Java:

  public class MySparkJob extends JavaSparkJob {
      public Object runJob(JavaSparkContext jsc, Config jobConfig) { ... }
  }
A cache for SparkJob classes. Jobs are often run repeatedly, and especially for low-latency jobs, why retrieve the jar and load the class every single time?
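A minimal sketch of the caching idea (hypothetical class and key; the real cache also keys on details like upload time and classpath):

  import scala.collection.mutable

  // Cache loaded job classes so repeated runs skip jar retrieval and class loading
  class JobClassCache {
    private val cache = mutable.HashMap.empty[(String, String), Class[_]]

    def getOrLoad(jarPath: String, className: String): Class[_] = synchronized {
      cache.getOrElseUpdate((jarPath, className), {
        val loader = new java.net.URLClassLoader(Array(new java.io.File(jarPath).toURI.toURL))
        loader.loadClass(className)
      })
    }
  }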
The JobManager actor supervises jobs running in a single SparkContext, as well as shared metadata. It creates a SparkContext (or a StreamingContext, etc., depending on the factory class). It also creates and supervises a JobResultActor and JobStatusActor, although an existing JobResultActor can be passed in as well.
num-cpu-cores = 4          # Total # of CPU cores to allocate across the cluster
memory-per-node = 512m     # -Xmx style memory string for total memory to use for executor on one node
dependent-jar-uris = ["local://opt/foo/my-foo-lib.jar"]   # URIs for dependent jars to load for entire context
context-factory = "spark.jobserver.context.DefaultSparkContextFactory"
spark.mesos.coarse = true  # per-context, rather than per-job, resource allocation
rdd-ttl = 24 h             # time-to-live for RDDs in a SparkContext. Don't specify = forever
is-adhoc = false           # true if context is ad-hoc context
context.name = "sql"       # Name of context
spark {
jobserver {
max-jobs-per-context = 16 # Number of jobs that can be run simultaneously per context
}
}
An actor that manages results returned from jobs.
TODO: support multiple subscribers for same JobID
An implementation of the NamedObjects API for the Job Server. Note that this contains code that executes on the same thread as the job. Uses spray-caching to cache references to named objects and to avoid creating the same object multiple times.
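The spray-caching pattern looks roughly like this (a sketch; the value type and method names are illustrative):

  import scala.concurrent.Future
  import scala.concurrent.ExecutionContext.Implicits.global
  import spray.caching.{Cache, LruCache}

  val objectCache: Cache[AnyRef] = LruCache()

  // The cache stores a Future per name, so if two jobs ask for the same name
  // at once, the creation block runs only once and both get the same object.
  def cachedObject(name: String)(create: => AnyRef): Future[AnyRef] =
    objectCache(name) { create }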
An actor that manages job status updates.
This class starts and stops JobManagers / Contexts in-process. It is responsible for watching out for the death of contexts/JobManagers.
Contexts can be configured to be created automatically at job server initialization. Configuration example:
spark {
  contexts {
    olap-demo {
      num-cpu-cores = 4        # Number of cores to allocate. Required.
      memory-per-node = 1024m  # Executor memory per node, -Xmx style eg 512m, 1G, etc.
    }
  }
}
spark {
  jobserver {
    context-creation-timeout = 15 s
    yarn-context-creation-timeout = 40 s
  }
  # Default settings for all context creation
  context-settings {
    spark.mesos.coarse = true
  }
}
A wrapper for named objects of type Broadcast.
A wrapper for named objects of type DataFrame.
Implementations of this abstract class should handle the specifics of each named object's persistence.
NamedObjects - a trait that gives you safe, concurrent creation and access to named objects such as RDDs or DataFrames (the native SparkContext interface only has access to RDDs by numbers). It facilitates easy sharing of data objects amongst jobs sharing the same SparkContext. If two jobs simultaneously try to create a data object with the same name and in the same namespace, only one will win and the other will retrieve the same object.
Note that to take advantage of NamedObjectSupport, a job must mix this in and use the APIs here instead of the native DataFrame/RDD cache(); otherwise we will not know about the names.
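A hedged sketch of what this looks like from inside a job; getOrElseCreate, NamedRDD, and RDDPersister match the names used above, but their exact signatures are assumptions:

  import scala.concurrent.duration._
  import com.typesafe.config.Config
  import org.apache.spark.SparkContext
  import org.apache.spark.storage.StorageLevel
  import spark.jobserver._

  object SharedWordsJob extends SparkJob with NamedObjectSupport {
    implicit val timeout = 30.seconds
    implicit def rddPersister: NamedObjectPersister[NamedRDD[String]] = new RDDPersister[String]

    def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid

    def runJob(sc: SparkContext, config: Config): Any = {
      // Only one concurrent creator "wins"; every caller gets the same named RDD back
      val NamedRDD(words, _, _) = namedObjects.getOrElseCreate("demo:words", {
        NamedRDD(sc.parallelize(Seq("a", "b", "c")),
                 forceComputation = true, storageLevel = StorageLevel.MEMORY_ONLY)
      })
      words.count()
    }
  }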
A test job that accepts a SQLContext, as opposed to the regular SparkContext. Just initializes some dummy data into a table.
A wrapper for named objects of type RDD[T].
Please use NamedObjectSupport instead!
Implementation of a NamedObjectPersister for RDD[T] objects.
This trait is the main API for Spark jobs submitted to the Job Server.
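Concretely, an implementation supplies two methods. This is a sketch of the classic API shape; exact signatures may vary between job server versions:

  import com.typesafe.config.Config
  import org.apache.spark.SparkContext

  trait SparkJob {
    // Runs the job once validation has passed; the return value is the job result
    def runJob(sc: SparkContext, jobConfig: Config): Any

    // Called before runJob; lets a job reject bad input without doing any work
    def validate(sc: SparkContext, config: Config): SparkJobValidation
  }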
Defines a job that runs on a StreamingContext. Note that these jobs are usually long-running, and there is (yet) no way in Spark Job Server to query the status of these jobs.
Message for storing a JAR for an application, given the byte array of the JAR file.
Message for storing one or more local JARs based on the given map, where the key is the appName and the value is the local path to the JAR.
Messages common to all ContextSupervisors.
A test job that accepts a HiveContext, as opposed to the regular SparkContext. Initializes some dummy data into a table, reads it back out, and returns a count. (Will create a Hive metastore at job-server/metastore_db if Hive isn't configured.)
This job simply runs the Hive SQL in the config.
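Such a job can be as small as the following sketch (SparkHiveJob's exact signatures are assumptions):

  import com.typesafe.config.Config
  import org.apache.spark.sql.hive.HiveContext
  import spark.jobserver._

  object HiveSqlJob extends SparkHiveJob {
    def validate(hive: HiveContext, config: Config): SparkJobValidation =
      if (config.hasPath("sql")) SparkJobValid else SparkJobInvalid("No sql config param")

    // Run the Hive SQL from the job config and return the rows
    def runJob(hive: HiveContext, config: Config): Any =
      hive.sql(config.getString("sql")).collect()
  }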
The JobManager is the main entry point for the forked JVM process running an individual SparkContext. It is passed three arguments: $workDir $clusterAddr $configFile
Each forked process has a working directory with log files for that context only, plus a file "context.conf" which contains context-specific settings.
The Spark Job Server is a web service that allows users to submit and run Spark jobs, check status, and view results. It may offer other goodies in the future. It takes only one optional command line arg, a config file to override the default (and you can still use -Dsetting=value to override).

Configuration
spark {
  master = "local"
  jobserver {
    port = 8090
  }
}
A Spark job example that implements the SparkJob trait and can be submitted to the job server.
Set the config with the sentence to split or count: input.string = "adsfasdf asdkf safksf a sdfa"
validate() returns SparkJobInvalid if there is no input.string.
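Putting the pieces together, the job might look like this sketch (assuming the classic SparkJob API shown earlier):

  import com.typesafe.config.Config
  import org.apache.spark.SparkContext
  import spark.jobserver._

  object WordCountExample extends SparkJob {
    def validate(sc: SparkContext, config: Config): SparkJobValidation =
      if (config.hasPath("input.string")) SparkJobValid
      else SparkJobInvalid("No input.string config param")

    def runJob(sc: SparkContext, config: Config): Any = {
      val words = config.getString("input.string").split(" ").toSeq
      sc.parallelize(words).countByValue()   // word -> count
    }
  }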
Message requesting a listing of the available JARs.
A test job that accepts a SQLContext, as opposed to the regular SparkContext. Just initializes some dummy data into a table.
This job simply runs the SQL in the config.
The AkkaClusterSupervisorActor launches Spark Contexts as external processes that connect back with the master node via Akka Cluster.
Currently, when the Supervisor gets a MemberUp message from another actor, it is assumed to be one starting up; the new member will be asked to identify itself, and then the Supervisor will try to initialize it.
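In Akka terms the handshake looks roughly like this (a sketch only; everything beyond MemberUp/Identify/ActorIdentity, including the initialization message, is an assumption):

  import akka.actor.{Actor, ActorIdentity, Identify, RootActorPath}
  import akka.cluster.ClusterEvent.MemberUp

  class SupervisorSketch extends Actor {
    def receive = {
      case MemberUp(member) =>
        // Assume the new member hosts a freshly started JobManager; ask it to identify itself
        context.actorSelection(RootActorPath(member.address) / "user" / "*") ! Identify(member.address)

      case ActorIdentity(_, Some(jobManager)) =>
        // The JobManager answered; hand it its context configuration to initialize
        jobManager ! "Initialize"   // hypothetical message; the real protocol is richer
    }
  }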
See the LocalContextSupervisorActor for normal config options. Here are the ones specific to this class.
Configuration
deploy {
  manager-start-cmd = "./manager_start.sh"
}