app

Type Members

case class AllApps(apps: Seq[String]) extends Product with Serializable
trait BaseEnv extends Env

Environment which provides a base path into which the application can write its data. Unless overridden, paths will be of the form {uri}/data/{environment}/{project}/{branch}, where environment is the logical environment (e.g. dev, test), project is the name of the application and branch is the Git branch.
N.B. when environment is 'prod', the branch is omitted from the path, as we assume it will always be master.
e.g. hdfs:///data/dev/my_project/feature_abc, hdfs:///data/prod/my_project
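The default path layout described above can be sketched in plain Scala. This is only an illustration of the documented convention, not Waimak's implementation: the object `BasePathSketch` and the function `basePath` are hypothetical names, and any normalisation the library may apply to the environment, project or branch values is not modelled here.

```scala
// Hypothetical helper (not part of Waimak) mirroring the documented default layout:
// {uri}/data/{environment}/{project}/{branch}
object BasePathSketch {
  def basePath(uri: String, environment: String, project: String, branch: String): String = {
    val prefix = s"$uri/data/$environment/$project"
    // Assumption: an exact match on "prod" decides whether the branch is dropped.
    if (environment == "prod") prefix else s"$prefix/$branch"
  }
}
```

Under these assumptions, `BasePathSketch.basePath("hdfs://", "dev", "my_project", "feature_abc")` yields the first example path above, and passing `"prod"` as the environment yields the second.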
trait Env extends Logging

Environment defining a sandbox in which an application can write
case class EnvironmentAction(ids: Seq[String], action: String) extends Product with Serializable
trait HiveEnv extends BaseEnv

Environment which provides databases. By default, there will be a single database of the form {environment}_{project}_{branch}, where environment is the logical environment (e.g. dev, test), project is the name of the application and branch is the Git branch.
N.B. when environment is 'prod', the branch is omitted from the database name, as we assume it will always be master.
e.g. dev_my_project_feature_abc, prod_my_project
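A minimal sketch of the default database naming rule follows. The names `DatabaseNameSketch` and `databaseName` are hypothetical and exist only for illustration; the real HiveEnv may normalise or validate its inputs in ways not shown here.

```scala
// Hypothetical helper (not part of Waimak) mirroring the documented default:
// {environment}_{project}_{branch}, with the branch dropped for 'prod'.
object DatabaseNameSketch {
  def databaseName(environment: String, project: String, branch: String): String =
    // Assumption: an exact match on "prod" decides whether the branch is dropped.
    if (environment == "prod") s"${environment}_${project}"
    else s"${environment}_${project}_${branch}"
}
```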
case class SingleAppConfig(appClassName: String, dependencies: Seq[String] = Nil) extends Product with Serializable
abstract class SparkApp[E <: Env] extends AnyRef

During the development lifecycle of Spark applications, it is useful to create sandbox environments comprising paths, Hive databases etc. which are tied to specific logical environments (e.g. dev, test, prod) and feature development (i.e. Git branches). For example, when working on a feature called new_feature for a project called my_project, the application should write its data to paths under /data/dev/my_project/new_feature/ and create tables in a database called dev_my_project_new_feature. The actual implementation of what these environments should look like can be defined by extending Env or one of its subclasses; the final implementation should be a case class whose values define the environment (i.e. env, branch etc.).
This is a generic Spark Application which uses an implementation of Env to generate application-specific configuration and subsequently parse this configuration into a case class to be used for the application logic.
E
the type of the Env implementation (must be a case class)
abstract class WaimakApp[E <: Env with WaimakEnv] extends SparkApp[E]

This is a SparkApp specifically for applications using Waimak
E
the type of the WaimakEnv implementation (must be a case class)
trait WaimakEnv extends AnyRef

Trait for defining Waimak-app specific configuration

Value Members

object EnvironmentManager

Performs create and cleanup operations for the Env implementation used by a provided implementation of SparkApp. The following configuration values should be present in the SparkSession:
spark.waimak.environment.ids: comma-separated unique ids for the environments
spark.waimak.environment.{environmentid}.appClassName: the application class to use (must extend SparkApp)
spark.waimak.environment.action: the environment action to perform (create or cleanup)
The Env implementation expects configuration values prefixed with spark.waimak.environment.{environmentid}.
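To make the shape of this configuration concrete, the sketch below holds the keys in a plain `Map` and reads the ids and the action into a local case class. It is an assumption-laden illustration: `EnvironmentConfigSketch`, `parse` and `example` are hypothetical, the local `EnvironmentAction` only mirrors the signature listed on this page, and the ids `myapp1`/`myapp2` are made up. In a real job these values would be set on the SparkSession configuration rather than in a `Map`.

```scala
object EnvironmentConfigSketch {
  // Local stand-in with the same shape as the documented EnvironmentAction.
  final case class EnvironmentAction(ids: Seq[String], action: String)

  private val prefix = "spark.waimak.environment"

  // Reads the comma-separated ids and the action (create or cleanup).
  def parse(conf: Map[String, String]): EnvironmentAction = {
    val ids = conf(s"$prefix.ids").split(",").map(_.trim).filter(_.nonEmpty).toList
    EnvironmentAction(ids, conf(s"$prefix.action"))
  }

  // Illustrative values only; the ids and class name are assumptions.
  val example: Map[String, String] = Map(
    "spark.waimak.environment.ids" -> "myapp1,myapp2",
    "spark.waimak.environment.myapp1.appClassName" -> "com.example.MyWaimakApp",
    "spark.waimak.environment.myapp2.appClassName" -> "com.example.MyWaimakApp",
    "spark.waimak.environment.action" -> "create"
  )
}
```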
object MultiAppRunner

Allows multiple Spark applications to be run in a single main method whilst obeying configured dependency constraints. The following configuration values should be present in the SparkSession:
spark.waimak.apprunner.apps: a comma-delimited list of the names (identifiers) of all of the applications being run (e.g. myapp1,myapp2)
spark.waimak.apprunner.{appname}.appClassName: for each application, the application class to use (must extend SparkApp) (e.g. spark.waimak.apprunner.myapp1.appClassName = com.example.MyWaimakApp)
spark.waimak.apprunner.{appname}.dependencies: for each application, an optional comma-delimited list of dependencies. If omitted, the application will have no dependencies and will not wait for other apps to finish before starting execution. Dependencies must match the names provided in spark.waimak.apprunner.apps (e.g. spark.waimak.apprunner.myapp1.dependencies = myapp2)
The Env implementation used by the provided SparkApp implementation expects configuration values prefixed with: spark.waimak.environment.{appname}.
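The dependency configuration above can be illustrated with a small, library-independent sketch that parses the keys from a plain `Map` and derives one execution order that respects the declared dependencies. Everything here is hypothetical (`AppRunnerConfigSketch`, `parse`, `executionOrder`); the local `SingleAppConfig` only mirrors the signature listed on this page, and how MultiAppRunner actually schedules applications (for example, whether independent apps run concurrently) is not something this sketch claims to reproduce.

```scala
object AppRunnerConfigSketch {
  // Local stand-in with the same shape as the documented SingleAppConfig.
  final case class SingleAppConfig(appClassName: String, dependencies: Seq[String] = Nil)

  private val prefix = "spark.waimak.apprunner"

  private def csv(value: String): List[String] =
    value.split(",").map(_.trim).filter(_.nonEmpty).toList

  // Builds one SingleAppConfig per name listed in spark.waimak.apprunner.apps;
  // a missing dependencies key means the app has no dependencies.
  def parse(conf: Map[String, String]): Map[String, SingleAppConfig] =
    csv(conf(s"$prefix.apps")).map { name =>
      val deps = conf.get(s"$prefix.$name.dependencies").map(csv).getOrElse(Nil)
      name -> SingleAppConfig(conf(s"$prefix.$name.appClassName"), deps)
    }.toMap

  // Repeatedly picks the apps whose dependencies have all completed.
  // Fails if no app can start (a cycle, or a dependency that is not a listed app).
  @scala.annotation.tailrec
  def executionOrder(apps: Map[String, SingleAppConfig], done: List[String] = Nil): List[String] = {
    val remaining = apps.keySet -- done
    if (remaining.isEmpty) done
    else {
      val ready = remaining.filter(n => apps(n).dependencies.forall(done.contains)).toList.sorted
      require(ready.nonEmpty, s"Cyclic or unknown dependencies among: ${remaining.mkString(", ")}")
      executionOrder(apps, done ++ ready)
    }
  }
}
```

With the example values from the text (`myapp1` depending on `myapp2`), `executionOrder` places `myapp2` before `myapp1`.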
