Package com.coxautodata.waimak.spark.app

package app

Type Members

  1. case class AllApps(apps: Seq[String]) extends Product with Serializable

  2. trait BaseEnv extends Env

    Environment which provides a base path into which the application can write its data. Unless overridden, paths will be of the form {uri}/data/{environment}/{project}/{branch}, where environment is the logical environment (e.g. dev, test), project is the name of the application, and branch is the Git branch.

    N.B. when environment is 'prod', the branch is omitted from the path, as we assume it will always be master.

    e.g. hdfs:///data/dev/my_project/feature_abc, hdfs:///data/prod/my_project
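
    A minimal sketch of this path convention (the class and member names below are illustrative assumptions taken from the pattern above, not the trait's actual API):

    case class DemoBaseEnv(uri: String, environment: String, project: String, branch: String) {
      // In 'prod' the branch segment is omitted, as the branch is assumed to always be master
      def basePath: String =
        if (environment == "prod") s"$uri/data/$environment/$project"
        else s"$uri/data/$environment/$project/$branch"
    }

    // DemoBaseEnv("hdfs://", "dev", "my_project", "feature_abc").basePath
    //   == "hdfs:///data/dev/my_project/feature_abc"
    // DemoBaseEnv("hdfs://", "prod", "my_project", "master").basePath
    //   == "hdfs:///data/prod/my_project"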

  3. trait Env extends Logging

    Environment defining a sandbox in which an application can write.

  4. case class EnvironmentAction(action: String, appClassName: String) extends Product with Serializable

  5. trait HiveEnv extends BaseEnv

    Environment which provides databases. By default, there will be a single database of the form {environment}_{project}_{branch}, where environment is the logical environment (e.g. dev, test), project is the name of the application, and branch is the Git branch.

    N.B. when environment is 'prod', the branch is omitted from the database name, as we assume it will always be master.

    e.g. dev_my_project_feature_abc, prod_my_project
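
    A minimal sketch of this naming rule (the object and method below are hypothetical, not part of HiveEnv's API):

    object DatabaseNaming {
      // Hypothetical helper mirroring the convention described above
      def databaseName(environment: String, project: String, branch: String): String =
        if (environment == "prod") s"${environment}_$project"
        else s"${environment}_${project}_$branch"
    }

    // DatabaseNaming.databaseName("dev", "my_project", "feature_abc") == "dev_my_project_feature_abc"
    // DatabaseNaming.databaseName("prod", "my_project", "master") == "prod_my_project"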

  6. case class SingleAppConfig(appClassName: String, dependencies: Seq[String] = Nil) extends Product with Serializable

  7. abstract class SparkApp[E <: Env] extends AnyRef

    During the development lifecycle of Spark applications, it is useful to create sandbox environments comprising paths, Hive databases, etc. which are tied to specific logical environments (e.g. dev, test, prod) and to feature development (i.e. Git branches). For example, when working on a feature called new_feature for a project called my_project, the application should write its data to paths under /data/dev/my_project/new_feature/ and create tables in a database called dev_my_project_new_feature. The actual layout of these environments can be defined by extending Env or one of its subclasses; the final implementation should be a case class whose values (e.g. environment, branch) define the environment.

    This is a generic Spark application which uses an implementation of Env to generate application-specific configuration and subsequently parses this configuration into a case class to be used by the application logic.

    E: the type of the Env implementation (must be a case class)
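
    A self-contained sketch of this pattern, using a stand-in trait in place of the real Env (the trait and its field names are assumptions for illustration):

    // Stand-in trait; a real implementation would extend Env (or BaseEnv / HiveEnv)
    // and implement its abstract members
    trait DemoEnv {
      def environment: String
      def project: String
      def branch: String
    }

    // Final implementation as a case class whose constructor values define the sandbox
    case class MyProjectEnv(environment: String, branch: String) extends DemoEnv {
      val project: String = "my_project"
    }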

  8. abstract class WaimakApp[E <: Env] extends SparkApp[E]

    This is a SparkApp specifically for applications using Waimak.

    E: the type of the Env implementation (must be a case class)

Value Members

  1. object EnvironmentManager

    Performs create and cleanup operations for the Env implementation used by a provided implementation of SparkApp. The following configuration values should be present in the SparkSession:

    spark.waimak.environment.appClassName: the application class to use (must extend SparkApp)

    spark.waimak.environment.action: the environment action to perform (create or cleanup)

    The Env implementation expects configuration values prefixed with spark.waimak.environment.
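
    A hedged example of supplying these values via the SparkSession builder; the class name com.example.MyApp is hypothetical:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("environment-manager-example")
      // the SparkApp implementation whose Env should be created or cleaned up
      .config("spark.waimak.environment.appClassName", "com.example.MyApp")
      // the environment action to perform: create or cleanup
      .config("spark.waimak.environment.action", "create")
      .getOrCreate()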

  2. object MultiAppRunner

    Allows multiple Spark applications to be run in a single main method whilst obeying configured dependency constraints. The following configuration values should be present in the SparkSession:

    spark.waimak.apprunner.apps: a comma-delimited list of the names (identifiers) of all of the applications being run (e.g. myapp1,myapp2)

    spark.waimak.apprunner.{appname}.appClassName: for each application, the application class to use (must extend SparkApp) (e.g. spark.waimak.apprunner.myapp1.appClassName = com.example.MyWaimakApp)

    spark.waimak.apprunner.{appname}.dependencies: for each application, an optional comma-delimited list of dependencies. If omitted, the application will have no dependencies and will not wait for other apps to finish before starting execution. Dependencies must match the names provided in spark.waimak.apprunner.apps (e.g. spark.waimak.apprunner.myapp1.dependencies = myapp2)

    The Env implementation used by the provided SparkApp implementation expects configuration values prefixed with: spark.waimak.environment.{appname}.
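
    A hedged example of this configuration via the SparkSession builder; the app names and com.example.* class names are hypothetical:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("multi-app-runner-example")
      .config("spark.waimak.apprunner.apps", "myapp1,myapp2")
      .config("spark.waimak.apprunner.myapp1.appClassName", "com.example.MyWaimakApp")
      .config("spark.waimak.apprunner.myapp2.appClassName", "com.example.OtherWaimakApp")
      // myapp1 starts only after myapp2 finishes; myapp2 has no dependencies
      .config("spark.waimak.apprunner.myapp1.dependencies", "myapp2")
      .getOrCreate()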
