  1. case class AllApps(apps: Seq[String]) extends Product with Serializable

  2. trait BaseEnv extends Env


    Environment which provides a base path into which the application can write its data. Unless overridden, paths take the form {uri}/data/{environment}/{project}/{branch}, where environment is the logical environment (e.g. dev, test), project is the name of the application and branch is the Git branch.

    N.B. When environment is 'prod', the branch is omitted from the path, as it is assumed to always be master.

    e.g. hdfs:///data/dev/my_project/feature_abc, hdfs:///data/prod/my_project
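
    The path rule above can be sketched in plain Scala. This is a minimal illustration only: BasePathSketch and basePath are hypothetical names, not part of the Waimak API, and any normalisation of names that the library may apply is not modelled.

```scala
// Hypothetical helper, NOT part of the Waimak API: mirrors the documented default layout
// {uri}/data/{environment}/{project}/{branch}.
object BasePathSketch {
  def basePath(uri: String, environment: String, project: String, branch: String): String = {
    val root = s"$uri/data/$environment/$project"
    // For prod the branch segment is dropped, since it is assumed to always be master.
    if (environment == "prod") root else s"$root/$branch"
  }
}
```

    With uri set to hdfs://, the arguments dev, my_project and feature_abc give hdfs:///data/dev/my_project/feature_abc, and prod with my_project gives hdfs:///data/prod/my_project, matching the examples above.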

  3. trait Env extends Logging


    Environment defining a sandbox in which an application can write

  4. case class EnvironmentAction(ids: Seq[String], action: String) extends Product with Serializable

  5. trait HiveEnv extends BaseEnv


    Environment which provides databases. By default, there is a single database named {environment}_{project}_{branch}, where environment is the logical environment (e.g. dev, test), project is the name of the application and branch is the Git branch.

    N.B. When environment is 'prod', the branch is omitted from the database name, as it is assumed to always be master.

    e.g. dev_my_project_feature_abc, prod_my_project
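
    The default database naming rule can be sketched the same way. HiveDbNameSketch and databaseName are hypothetical names used only for illustration; the sketch assumes the inputs are already valid database-name fragments.

```scala
// Hypothetical helper, NOT part of the Waimak API: mirrors the documented default name
// {environment}_{project}_{branch}.
object HiveDbNameSketch {
  def databaseName(environment: String, project: String, branch: String): String =
    // For prod the branch suffix is dropped, since it is assumed to always be master.
    if (environment == "prod") s"${environment}_${project}"
    else s"${environment}_${project}_${branch}"
}
```

    Here dev, my_project and feature_abc give dev_my_project_feature_abc, and prod with my_project gives prod_my_project.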

  6. case class SingleAppConfig(appClassName: String, dependencies: Seq[String] = Nil) extends Product with Serializable

  7. abstract class SparkApp[E <: Env] extends AnyRef


    During the development lifecycle of Spark applications, it is useful to create sandbox environments, comprising paths, Hive databases etc., which are tied to specific logical environments (e.g. dev, test, prod) and to feature development (i.e. Git branches). For example, when working on a feature called new_feature for a project called my_project, the application should write its data to paths under /data/dev/my_project/new_feature/ and create tables in a database called dev_my_project_new_feature. What these environments actually look like can be defined by extending Env or one of its subclasses; the final implementation should be a case class whose values define the environment (i.e. env, branch etc.).

    This is a generic Spark Application which uses an implementation of Env to generate application-specific configuration and subsequently parse this configuration into a case class to be used for the application logic.


    E: the type of the Env implementation (must be a case class)

  8. abstract class WaimakApp[E <: Env with WaimakEnv] extends SparkApp[E]


    This is a SparkApp specifically for applications using Waimak


    E: the type of the WaimakEnv implementation (must be a case class)

  9. trait WaimakEnv extends AnyRef


    Trait for defining Waimak-app specific configuration

  1. object EnvironmentManager


    Performs create and cleanup operations for the Env implementation used by a provided implementation of SparkApp. The following configuration values should be present in the SparkSession:

    spark.waimak.environment.ids: comma-separated unique ids for the environments

    spark.waimak.environment.{environmentid}.appClassName: the application class to use (must extend SparkApp)

    spark.waimak.environment.action: the environment action to perform (create or cleanup)

    The Env implementation expects configuration values prefixed with spark.waimak.environment.{environmentid}.
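
    As a rough illustration of the keys listed for EnvironmentManager, the sketch below holds them in a plain Map and resolves the application class for each environment id. The object and method names are hypothetical, the id myapp1 and the class com.example.MyWaimakApp are placeholder values, and the parsing is an assumption about how comma-separated ids might be read, not the library's own code.

```scala
// Illustrative only: a plain Map stands in for the SparkSession configuration.
object EnvironmentManagerConfSketch {
  private val prefix = "spark.waimak.environment"

  // Placeholder values; only the key names come from the documentation.
  val exampleConf: Map[String, String] = Map(
    s"$prefix.ids" -> "myapp1",
    s"$prefix.myapp1.appClassName" -> "com.example.MyWaimakApp",
    s"$prefix.action" -> "create"
  )

  // Split the comma-separated environment ids, ignoring blanks.
  def ids(conf: Map[String, String]): List[String] =
    conf(s"$prefix.ids").split(",").map(_.trim).filter(_.nonEmpty).toList

  // Look up the application class configured for each environment id.
  def appClassNames(conf: Map[String, String]): Map[String, String] =
    ids(conf).map(id => id -> conf(s"$prefix.$id.appClassName")).toMap

  // The action must be one of the two documented values.
  def action(conf: Map[String, String]): String = {
    val a = conf(s"$prefix.action")
    require(a == "create" || a == "cleanup", s"Unsupported action: $a")
    a
  }
}
```

    In a real job these keys would typically be supplied as Spark configuration, for example through --conf options to spark-submit, rather than built as a Map.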

  2. object MultiAppRunner


    Allows multiple Spark applications to be run in a single main method whilst obeying configured dependency constraints. The following configuration values should be present in the SparkSession:

    spark.waimak.apprunner.apps: a comma-delimited list of the names (identifiers) of all of the applications being run (e.g. myapp1,myapp2)

    spark.waimak.apprunner.{appname}.appClassName: for each application, the application class to use (must extend SparkApp) (e.g. spark.waimak.apprunner.myapp1.appClassName = com.example.MyWaimakApp)

    spark.waimak.apprunner.{appname}.dependencies: for each application, an optional comma-delimited list of dependencies. If omitted, the application will have no dependencies and will not wait for other apps to finish before starting execution. Dependencies must match the names provided in spark.waimak.apprunner.apps (e.g. spark.waimak.apprunner.myapp1.dependencies = myapp2)

    The Env implementation used by the provided SparkApp implementation expects configuration values prefixed with: spark.waimak.environment.{appname}.
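
    The dependency handling described for MultiAppRunner can be illustrated with a small, self-contained sketch. It parses the documented keys from a plain Map and derives one sequential order in which every application comes after its dependencies. AppRunnerConfSketch, AppConf, parse and executionOrder are hypothetical names; the real runner may schedule independent applications differently (for instance concurrently), which this sketch does not model.

```scala
// Illustrative only: NOT the MultiAppRunner implementation.
object AppRunnerConfSketch {
  // Mirrors the shape of SingleAppConfig from the listing above.
  final case class AppConf(appClassName: String, dependencies: Seq[String] = Nil)

  private val prefix = "spark.waimak.apprunner"

  // Split a comma-delimited value, ignoring blanks.
  private def csv(s: String): List[String] =
    s.split(",").map(_.trim).filter(_.nonEmpty).toList

  def parse(conf: Map[String, String]): Map[String, AppConf] = {
    val names = csv(conf(s"$prefix.apps"))
    names.map { name =>
      // The dependencies key is optional; a missing key means no dependencies.
      val deps = conf.get(s"$prefix.$name.dependencies").map(v => csv(v)).getOrElse(Nil)
      require(deps.forall(d => names.contains(d)), s"Unknown dependency for $name")
      name -> AppConf(conf(s"$prefix.$name.appClassName"), deps)
    }.toMap
  }

  // Returns one valid order; apps that become ready together are sorted by name.
  def executionOrder(apps: Map[String, AppConf]): List[String] = {
    @annotation.tailrec
    def loop(remaining: Map[String, AppConf], done: List[String]): List[String] =
      if (remaining.isEmpty) done
      else {
        val ready = remaining.keys
          .filter(n => remaining(n).dependencies.forall(d => done.contains(d)))
          .toList
          .sorted
        require(ready.nonEmpty, "Cyclic dependency between apps")
        loop(remaining -- ready, done ++ ready)
      }
    loop(apps, Nil)
  }
}
```

    For the example values used above (apps myapp1,myapp2 with myapp1 depending on myapp2), executionOrder places myapp2 before myapp1.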
