Package spark.jobserver

package jobserver

Type Members

  1. class AkkaClusterSupervisorActor extends InstrumentedActor

    The AkkaClusterSupervisorActor launches Spark Contexts as external processes that connect back to the master node via Akka Cluster.

    Currently, when the Supervisor gets a MemberUp message from another actor, that member is assumed to be one that is just starting up; the Supervisor asks it to identify itself and then attempts to initialize it.

    See LocalContextSupervisorActor for the common configuration options; the options below are specific to this class.

    Configuration

    deploy {
      manager-start-cmd = "./manager_start.sh"
    }
  2. class BroadcastPersister[T] extends NamedObjectPersister[NamedBroadcast[T]]

    Implementation of a NamedObjectPersister for Broadcast objects.

  3. trait CORSSupport extends AnyRef

  4. trait ChunkEncodedStreamingSupport extends AnyRef

  5. class ChunkEncodingActor extends Actor with ActorLogging

    Sends a response back in streaming fashion using chunked encoding.

  6. trait ContextLike extends AnyRef

    Represents a context based on SparkContext. Examples include: StreamingContext, SQLContext.

    The Job Server can spin up not just a vanilla SparkContext, but anything that implements ContextLike.
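
    As a rough illustration, a custom context might implement the trait along the lines of the sketch below. The member names (sparkContext, isValidJob, stop) are assumptions made for this sketch, not quoted from the trait definition.

    // Hypothetical sketch only; member names are assumed, not verbatim.
    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SQLContext

    class SQLContextWrapper(val sqlContext: SQLContext) extends ContextLike {
      // The underlying SparkContext that jobs run against
      def sparkContext: SparkContext = sqlContext.sparkContext
      // Accept only jobs written against the SQL job API
      def isValidJob(job: SparkJobBase): Boolean = job.isInstanceOf[SparkSqlJob]
      // Invoked by the job server when this context is shut down
      def stop(): Unit = sqlContext.sparkContext.stop()
    }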

  7. class DataFramePersister extends NamedObjectPersister[NamedDataFrame]

    Implementation of a NamedObjectPersister for DataFrame objects.

  8. class DataManagerActor extends InstrumentedActor

    An Actor that manages the data files stored to disk by the job server.

  9. class JarManager extends InstrumentedActor

    An Actor that manages the jars stored by the job server. It's important that threads do not try to load a class from a jar while a new one is replacing it, so using an actor to serialize requests is perfect.

  10. class JavaSparkJob extends SparkJob

    A class to make Java jobs easier to write. In Java:

    public class MySparkJob extends JavaSparkJob { ... }

  11. class JobCache extends AnyRef

    A cache for SparkJob classes. Jobs are often run repeatedly, and especially for low-latency jobs, why retrieve the jar and load it every single time?

  12. class JobInfoActor extends InstrumentedActor

  13. case class JobJarInfo(constructor: () ⇒ SparkJobBase, className: String, jarFilePath: String) extends Product with Serializable

  14. class JobManagerActor extends InstrumentedActor

    The JobManager actor supervises jobs running in a single SparkContext, as well as shared metadata. It creates a SparkContext (or a StreamingContext, etc., depending on the factory class). It also creates and supervises a JobResultActor and a JobStatusActor, although an existing JobResultActor can be passed in as well.

    contextConfig

    num-cpu-cores = 4         # Total # of CPU cores to allocate across the cluster
    memory-per-node = 512m    # -Xmx style memory string for total memory to use for executor on one node
    dependent-jar-uris = ["local://opt/foo/my-foo-lib.jar"]
                              # URIs for dependent jars to load for entire context
    context-factory = "spark.jobserver.context.DefaultSparkContextFactory"
    spark.mesos.coarse = true  # per-context, rather than per-job, resource allocation
    rdd-ttl = 24 h            # time-to-live for RDDs in a SparkContext.  Don't specify = forever
    is-adhoc = false          # true if context is ad-hoc context
    context.name = "sql"      # Name of context

    global configuration

    spark {
      jobserver {
        max-jobs-per-context = 16      # Number of jobs that can be run simultaneously per context
      }
    }
  15. class JobResultActor extends InstrumentedActor with YammerMetrics

    An actor that manages results returned from jobs.

    TODO: support multiple subscribers for same JobID

  16. class JobServerNamedObjects extends NamedObjects

    An implementation of the NamedObjects API for the Job Server. Note that it contains code that executes on the same thread as the job. It uses spray caching to cache references to named objects and to avoid creating the same object multiple times.

  17. class JobStatusActor extends InstrumentedActor with YammerMetrics

    An actor that manages job status updates.

  18. class LocalContextSupervisorActor extends InstrumentedActor

    This class starts and stops JobManagers / Contexts in-process. It is responsible for watching out for the death of contexts/JobManagers.

    Auto context start configuration

    Contexts can be configured to be created automatically at job server initialization. Configuration example:

    spark {
      contexts {
        olap-demo {
          num-cpu-cores = 4            # Number of cores to allocate.  Required.
          memory-per-node = 1024m      # Executor memory per node, -Xmx style eg 512m, 1G, etc.
        }
      }
    }

    Other configuration

    spark {
      jobserver {
        context-creation-timeout = 15 s
        yarn-context-creation-timeout = 40 s
      }
    
      # Default settings for all context creation
      context-settings {
        spark.mesos.coarse = true
      }
    }
  19. case class NamedBroadcast[T](broadcast: Broadcast[T]) extends NamedObject with Product with Serializable

    Wrapper for named objects of type Broadcast.

  20. case class NamedDataFrame(df: DataFrame, forceComputation: Boolean, storageLevel: StorageLevel) extends NamedObject with Product with Serializable

    Wrapper for named objects of type DataFrame.

  21. trait NamedObject extends AnyRef

  22. abstract class NamedObjectPersister[O <: NamedObject] extends AnyRef

    Implementations of this abstract class should handle the specifics of each named object's persistence.

  23. trait NamedObjectSupport extends AnyRef

  24. trait NamedObjects extends AnyRef

    NamedObjects - a trait that gives you safe, concurrent creation and access to named objects such as RDDs or DataFrames (the native SparkContext interface only has access to RDDs by number). It facilitates easy sharing of data objects amongst jobs sharing the same SparkContext. If two jobs simultaneously try to create a data object with the same name in the same namespace, only one will win and the other will retrieve the same object.

    Note that to take advantage of NamedObjectSupport, a job must mix this in and use the APIs here instead of the native DataFrame/RDD cache(), otherwise we will not know about the names.
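
    As a sketch of the intended usage (the getOrElseCreate method name and its parameters are assumptions inferred from this description, not quoted from the trait):

    // Hypothetical sketch; getOrElseCreate is an assumed method name, and
    // namedObjects is assumed to be provided by NamedObjectSupport.
    import com.typesafe.config.Config
    import org.apache.spark.SparkContext
    import org.apache.spark.storage.StorageLevel

    object SharingJob extends SparkJob with NamedObjectSupport {
      def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid

      def runJob(sc: SparkContext, config: Config): Any = {
        // Either creates "dashboard-data", or retrieves the object another
        // job already created under that name (only one creation wins).
        val NamedRDD(rdd, _, _) = namedObjects.getOrElseCreate("dashboard-data",
          NamedRDD(sc.parallelize(1 to 1000), forceComputation = true,
                   storageLevel = StorageLevel.MEMORY_ONLY))
        rdd.count()
      }
    }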

  25. class NamedObjectsTestJob extends SparkJob with NamedObjectSupport

    A test job that accepts a SQLContext, as opposed to the regular SparkContext. Just initializes some dummy data into a table.

  26. case class NamedRDD[T](rdd: RDD[T], forceComputation: Boolean, storageLevel: StorageLevel) extends NamedObject with Product with Serializable

    Wrapper for named objects of type RDD[T].

  27. trait NamedRddSupport extends NamedObjectSupport

    Note

    Please use NamedObjectSupport instead!

  28. class RDDPersister[T] extends NamedObjectPersister[NamedRDD[T]]

    Implementation of a NamedObjectPersister for RDD[T] objects.

  29. trait SparkHiveJob extends SparkJobBase

  30. trait SparkJob extends SparkJobBase

  31. trait SparkJobBase extends AnyRef

    This trait is the main API for Spark jobs submitted to the Job Server.
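
    Jobs implement a two-phase API: a validate step that can fail fast, followed by runJob. A minimal sketch, assuming the classic SparkJob signatures (a SparkContext plus a Typesafe Config):

    import com.typesafe.config.Config
    import org.apache.spark.SparkContext

    object WordCountJob extends SparkJob {
      // Reject the job up front if required config is missing
      def validate(sc: SparkContext, config: Config): SparkJobValidation =
        if (config.hasPath("input.string")) SparkJobValid
        else SparkJobInvalid("No input.string config param")

      // The return value becomes the job result served back to the caller
      def runJob(sc: SparkContext, config: Config): Any =
        sc.parallelize(config.getString("input.string").split(" ").toSeq)
          .countByValue()
    }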

  32. case class SparkJobInvalid(reason: String) extends SparkJobValidation with Product with Serializable

  33. sealed trait SparkJobValidation extends AnyRef

  34. trait SparkSqlJob extends SparkJobBase

  35. trait SparkStreamingJob extends SparkJobBase

    Defines a job that runs on a StreamingContext. Note that these jobs are usually long-running, and there is not (yet) a way in Spark Job Server to query the status of these jobs.
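
    A sketch of such a job, assuming runJob and validate receive the StreamingContext (the signatures here are assumptions for illustration):

    // Hypothetical sketch; signatures assumed for illustration.
    import com.typesafe.config.Config
    import org.apache.spark.streaming.StreamingContext

    object SocketCountJob extends SparkStreamingJob {
      def validate(ssc: StreamingContext, config: Config): SparkJobValidation =
        SparkJobValid

      def runJob(ssc: StreamingContext, config: Config): Any = {
        ssc.socketTextStream("localhost", 9999).count().print()
        ssc.start()   // the stream keeps running, hence the long-running job
        "streaming job started"
      }
    }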

  36. trait StatusMessage extends AnyRef

  37. case class StoreJar(appName: String, jarBytes: Array[Byte]) extends Product with Serializable

    Message for storing a JAR for an application, given the byte array of the JAR file.

  38. case class StoreLocalJars(localJars: Map[String, String]) extends Product with Serializable

    Message for storing one or more local JARs based on the given map.

    localJars

    Map where the key is the appName and the value is the local path to the JAR.
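
    For example (jarManager below is a hypothetical ActorRef to the JarManager actor):

    // Hypothetical usage; jarManager is an illustrative ActorRef name.
    jarManager ! StoreLocalJars(Map(
      "word-count-app" -> "/opt/jars/word-count.jar",
      "sql-app"        -> "/opt/jars/sql-app.jar"))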

  39. class WebApi extends HttpService with CommonRoutes with DataRoutes with SJSAuthenticator with CORSSupport with ChunkEncodedStreamingSupport

Value Members

  1. object ChunkEncodingActor

  2. object CommonMessages

  3. object ContextSupervisor

    Messages common to all ContextSupervisors.

  4. object DataManagerActor

  5. object HiveLoaderJob extends SparkHiveJob

    A test job that accepts a HiveContext, as opposed to the regular SparkContext. Initializes some dummy data into a table, reads it back out, and returns a count. (Will create a Hive metastore at job-server/metastore_db if Hive isn't configured.)

  6. object HiveTestJob extends SparkHiveJob

    This job simply runs the Hive SQL in the config.

  7. object InvalidJar extends Product with Serializable

  8. object JarStored extends Product with Serializable

  9. object JobInfoActor

  10. object JobManager

    The JobManager is the main entry point for the forked JVM process running an individual SparkContext. It is passed $workDir $clusterAddr $configFile.

    Each forked process has a working directory with log files for that context only, plus a file "context.conf" which contains context-specific settings.

  11. object JobManagerActor

  12. object JobServer

    The Spark Job Server is a web service that allows users to submit and run Spark jobs, check status, and view results. It may offer other goodies in the future. It takes only one optional command-line argument, a config file to override the default (and you can still use -Dsetting=value to override).

    Configuration

    spark {
      master = "local"
      jobserver {
        port = 8090
      }
    }
  13. object JobStatusActor

  14. object KMeansExample extends SparkJob with NamedRddSupport

    A Spark job example that implements the SparkJob trait and can be submitted to the job server.

    Set the config with the sentence to split or count: input.string = "adsfasdf asdkf safksf a sdfa"

    validate() returns SparkJobInvalid if there is no input.string.

  15. object ListJars extends Product with Serializable

    Message requesting a listing of the available JARs.

  16. object NamedObjectsTestJobConfig

  17. object SparkJobValid extends SparkJobValidation with Product with Serializable

  18. object SqlLoaderJob extends SparkSqlJob

    A test job that accepts a SQLContext, as opposed to the regular SparkContext. Just initializes some dummy data into a table.

  19. object SqlTestJob extends SparkSqlJob

    This job simply runs the SQL in the config.

  20. object StreamingTestJob extends SparkStreamingJob

    Annotations
    @VisibleForTesting()
  21. object WebApi

  22. package auth

  23. package context

  24. package io

  25. package routes

  26. package util

