com.coxautodata.waimak.dataflow

DataFlow

abstract class DataFlow[Self <: DataFlow[Self]] extends Logging

Defines a state of the data flow. The state is defined by the inputs that are ready to be consumed and the actions that still need to be executed. In most business-as-usual (BAU) cases, the initial state of the data flow has no inputs, as they need to be produced by the actions. When an action finishes, it can produce 0 or N outputs. To create the next state of the data flow, that action is removed from the data flow and its outputs are added as inputs into the flow. This state transitioning enables restarts of the flow from any point, as well as debug or exploratory runs with already existing, manufactured, captured or materialised inputs.

Inputs are also useful for unit testing, as they give access to all intermediate outputs of actions.
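
The state transition described above can be illustrated with a minimal, self-contained sketch. The `Action` and `FlowState` types below are hypothetical stand-ins written for this example and are not Waimak's real classes; they only model labels, inputs and the list of remaining actions.

```scala
object StateTransitionSketch {
  // Hypothetical stand-in for an action: a name plus its input and output labels.
  final case class Action(name: String, inputLabels: Seq[String], outputLabels: Seq[String])

  // Hypothetical stand-in for a flow state: ready inputs plus actions still to run.
  final case class FlowState(inputs: Map[String, Option[Any]], actions: Seq[Action]) {
    // Next state: the executed action is removed and its outputs become inputs.
    def executed(action: Action, outputs: Seq[Option[Any]]): FlowState = {
      require(outputs.size == action.outputLabels.size, "one output per output label is expected")
      FlowState(inputs ++ action.outputLabels.zip(outputs), actions.filterNot(_ == action))
    }
  }

  val read: Action = Action("read", Seq.empty, Seq("raw"))
  val clean: Action = Action("clean", Seq("raw"), Seq("clean"))

  // Typical initial state: no inputs, everything is produced by actions.
  val initial: FlowState = FlowState(Map.empty, Seq(read, clean))
  val afterRead: FlowState = initial.executed(read, Seq(Some("raw data")))
  val afterClean: FlowState = afterRead.executed(clean, Seq(Some("clean data")))

  // A restart or exploratory run: start from a captured input, skipping "read".
  val restarted: FlowState = FlowState(Map("raw" -> Some("captured raw data")), Seq(clean))
}
```

Because every intermediate output stays in `inputs`, a unit test can inspect `afterClean.inputs("raw")` as well as the final result.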

Self Type
Self
Linear Supertypes
Logging, AnyRef, Any

Instance Constructors

  1. new DataFlow()(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[Self])

Abstract Value Members

  1. abstract def actions(acs: Seq[DataFlowAction]): Self
  2. abstract def actions: Seq[DataFlowAction]

    Actions to execute; these will be scheduled when their inputs become available. Executed actions must be removed from the state.

  3. abstract def executor: DataFlowExecutor

    Current DataFlowExecutor associated with this flow

  4. abstract def flowContext: FlowContext
  5. abstract def inputs(inp: DataFlowEntities): Self
  6. abstract def inputs: DataFlowEntities

    Inputs that were explicitly set or produced by previous actions; these are the inputs for all following actions. Inputs are preserved in the data flow state, even if they are no longer required by the remaining actions. TODO: explore the option of removing inputs that are no longer required by the remaining actions.

  7. abstract def metadataExtensions: Set[DataFlowMetadataExtension[Self]]
  8. abstract def schedulingMeta(sc: SchedulingMeta): Self
  9. abstract def schedulingMeta: SchedulingMeta
  10. abstract def setMetadataExtensions(extensions: Set[DataFlowMetadataExtension[Self]]): Self
  11. abstract def tagState(ts: DataFlowTagState): Self
  12. abstract def tagState: DataFlowTagState
  13. abstract def withExecutor(executor: DataFlowExecutor): Self

    Add a new executor to this flow, replacing the existing one.

    executor

    DataFlowExecutor to add to this flow

Concrete Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def addAction[A <: DataFlowAction](action: A): Self

    Creates a new state of the dataflow by adding an action to it.

    action

    - action to add

    returns

    - new state with action

    Exceptions thrown

    DataFlowException when: 1) at least one of the input labels is not present in the inputs 2) at least one of the input labels is not present in the outputs of existing actions

  5. def addInput(label: String, value: Option[Any]): Self

    Creates a new state of the dataflow by adding an input. Duplicate labels are handled in prepareForExecution().

    label

    - name of the input

    value

    - value of the input

    returns

    - new state with the input

  6. def addInterceptor(interceptor: InterceptorAction, guidToIntercept: String): Self

    Creates a new state of the data flow by replacing the intercepted action with the action that intercepts it. If an existing InterceptorAction is being replaced, the action to replace will differ from the intercepted action held in the InterceptorAction.

  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. def execute(errorOnUnexecutedActions: Boolean = true): (Seq[DataFlowAction], Self)

    Executes this flow using the executor currently set on the flow. See DataFlowExecutor.execute() for more information.

  12. def executed(executed: DataFlowAction, outputs: Seq[Option[Any]]): Self

    Creates a new state of the dataflow by removing the executed action from the actions list and adding its outputs to the inputs.

    executed

    - executed action

    outputs

    - outputs of the executed action

    returns

    - next stage data flow without the executed action, but with its outputs as inputs

    Exceptions thrown

    DataFlowException if number of provided outputs is not equal to the number of output labels of the action

  13. def executionPool(executionPoolName: String)(nestedFlow: (Self) ⇒ Self): Self

    Creates a code block with all actions inside of it being run on the specified execution pool. The same execution pool name can be used multiple times and nested pools are allowed; the name closest to the action will be assigned to it.

    Ex:
      flow.executionPool("pool_1") {
        _.addAction(a1)
          .addAction(a2)
          .executionPool("pool_2") {
            _.addAction(a3)
              .addAction(a4)
          }
          .addAction(a5)
      }

    So actions a1, a2 and a5 will be in pool_1, and actions a3 and a4 in pool_2.

    executionPoolName

    pool name to assign to all actions inside of it, but it can be overwritten by the nested execution pools.
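
The "closest name wins" rule for nested pools can be sketched with a toy model. The `Flow` and `Action` types and the `"default"` pool name below are assumptions made for this illustration only; they are not Waimak's implementation, which tracks this through its scheduling metadata.

```scala
object ExecutionPoolSketch {
  // Hypothetical stand-in: an action remembers the pool that was current when it was added.
  final case class Action(name: String, pool: String)

  final case class Flow(actions: Vector[Action] = Vector.empty, currentPool: String = "default") {
    def addAction(name: String): Flow =
      copy(actions = actions :+ Action(name, currentPool))

    // Runs nestedFlow with poolName as the current pool, then restores the outer pool.
    def executionPool(poolName: String)(nestedFlow: Flow => Flow): Flow =
      nestedFlow(copy(currentPool = poolName)).copy(currentPool = currentPool)
  }

  val flow: Flow = Flow().executionPool("pool_1") {
    _.addAction("a1")
      .addAction("a2")
      .executionPool("pool_2") {
        _.addAction("a3")
          .addAction("a4")
      }
      .addAction("a5")
  }

  val poolsByAction: Map[String, String] = flow.actions.map(a => a.name -> a.pool).toMap
}
```

In this model `a5` ends up in `pool_1` because the outer pool name is restored as soon as the nested block returns.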

  14. def finaliseExecution(): Try[Self]

    A function called just after the flow is executed. By default, the implementation on DataFlow is a no-op; however, it is used in spark.SparkDataFlow to clean up the temporary directory.

  15. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  16. def foldLeftOver[A, S >: Self](foldOver: Iterable[A])(f: (S, A) ⇒ S): S

    Fold left over a collection, where the current DataFlow is the zero value. Lets you perform the fold inline, as part of the chained flow definition.

    foldOver

    Collection to fold over

    f

    Function to apply during the flow

    returns

    A DataFlow produced after repeated applications of f for each element in the collection
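
As a rough illustration of folding inline, the sketch below reproduces the shape of the signature on a hypothetical `Flow` that only records action names. `Flow`, `addAction(name)` and the table names are assumptions made for this example, not part of the library.

```scala
object FoldLeftOverSketch {
  // Hypothetical, simplified flow: only records the names of added actions.
  final case class Flow(actions: Vector[String] = Vector.empty) {
    def addAction(name: String): Flow = copy(actions = actions :+ name)

    // Same shape as the documented signature, with Flow in place of Self:
    // the current flow is the zero value of the fold.
    def foldLeftOver[A, S >: Flow](foldOver: Iterable[A])(f: (S, A) => S): S =
      foldOver.foldLeft[S](this)(f)
  }

  val tables: Seq[String] = Seq("customers", "orders", "payments")

  // The fold sits in the middle of the chain, so the flow definition is not interrupted.
  // Type arguments are given explicitly here to keep inference unambiguous.
  val flow: Flow = Flow()
    .addAction("setup")
    .foldLeftOver[String, Flow](tables)((f, table) => f.addAction(s"load_$table"))
    .addAction("report")
}
```

Without such a method, the chain would have to be split into a separate `tables.foldLeft(flow)(...)` expression before further actions could be added.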

  17. def getActionByGuid(actionGuid: String): DataFlowAction

    Guids are unique; finds the action by guid.

  18. def getActionByOutputLabel(outputLabel: String): DataFlowAction

    Output labels are unique. Finds the action that produces outputLabel.

  19. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  20. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  21. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  22. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  23. def isValidFlowDAG: Try[Self]

    Flow DAG is valid iff:
      1. All output labels and existing input labels are unique
      2. Each action depends on labels that are produced by actions or already present in inputs
      3. Active tags is empty
      4. Active dependencies is zero
      5. No cyclic dependencies in labels
      6. No cyclic dependencies in tags
      7. No cyclic dependencies in label tag combination
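
A few of these conditions (unique labels, resolvable dependencies, no cycles in labels) can be sketched on a toy model. Everything below is an assumption made for illustration: the `Action` type, the `validate` helper and the use of `IllegalStateException` are not Waimak's code, and tag-related conditions are not modelled.

```scala
import scala.util.{Failure, Success, Try}

object FlowDagCheckSketch {
  // Hypothetical stand-in for an action: just its input and output labels.
  final case class Action(name: String, inputs: Set[String], outputs: Set[String])

  def validate(existingInputs: Set[String], actions: Seq[Action]): Try[Unit] = {
    // All labels that exist or will exist: action outputs plus existing inputs.
    val produced: Seq[String] = actions.flatMap(_.outputs.toSeq) ++ existingInputs.toSeq
    val duplicates: Set[String] =
      produced.groupBy(identity).collect { case (label, ls) if ls.size > 1 => label }.toSet
    val missing: Set[String] = actions.flatMap(_.inputs.toSeq).toSet -- produced.toSet

    // Output label -> labels it depends on (the inputs of the producing action).
    val deps: Map[String, Set[String]] =
      actions.flatMap(a => a.outputs.toSeq.map(_ -> a.inputs)).toMap

    // Depth-first walk; revisiting a label already on the current path means a cycle.
    def hasCycle(label: String, path: Set[String]): Boolean =
      path.contains(label) ||
        deps.getOrElse(label, Set.empty[String]).exists(hasCycle(_, path + label))

    if (duplicates.nonEmpty)
      Failure(new IllegalStateException(s"Duplicate labels: ${duplicates.toSeq.sorted.mkString(", ")}"))
    else if (missing.nonEmpty)
      Failure(new IllegalStateException(s"Unresolved labels: ${missing.toSeq.sorted.mkString(", ")}"))
    else if (deps.keys.exists(hasCycle(_, Set.empty)))
      Failure(new IllegalStateException("Cyclic dependency in labels"))
    else Success(())
  }
}
```

Returning a `Try` mirrors the documented return type, so a caller can chain validation with other preparation steps instead of catching exceptions.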

  24. def logAndReturn[A](a: A, msg: String, level: Level): A

    Takes a value of type A and a msg to log, returning a and logging the message at the desired level.

    returns

    a

    Definition Classes
    Logging
  25. def logAndReturn[A](a: A, message: (A) ⇒ String, level: Level): A

    Takes a value of type A and a function message from A to String, and logs the value of invoking message(a) at the level described by the level parameter.

    returns

    a

    Definition Classes
    Logging
    Example:
      logAndReturn(1, (num: Int) => s"number: $num", Info)
      // In the log we would see an entry corresponding to "number: 1"
  26. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  27. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  28. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  29. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  30. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  31. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  32. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  33. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  34. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  35. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  36. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  37. def map[R >: Self](f: (Self) ⇒ R): R

    Transforms the current dataflow by applying a function to it.

    f

    A function that transforms a dataflow object

    returns

    New dataflow

  38. def mapOption[R >: Self](f: (Self) ⇒ Option[R]): R

    Optionally transforms a dataflow depending on the output of the applied function. If the transforming function returns None, the original dataflow is returned.

    f

    A function that returns an Option[DataFlow]

    returns

    DataFlow object that may have been transformed
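
The difference between `map` and `mapOption` can be sketched on a hypothetical, simplified `Flow` that only records action names. The `Flow` type, `addAction(name)` and the `debug` switch are assumptions made for this example.

```scala
object MapOptionSketch {
  // Hypothetical, simplified flow: only records the names of added actions.
  final case class Flow(actions: Vector[String] = Vector.empty) {
    def addAction(name: String): Flow = copy(actions = actions :+ name)

    // Same shapes as the documented signatures, with Flow in place of Self.
    def map[R >: Flow](f: Flow => R): R = f(this)

    // None means "leave the flow unchanged".
    def mapOption[R >: Flow](f: Flow => Option[R]): R = f(this).getOrElse(this)
  }

  // Adds a "show" step only when debug is enabled, without breaking the chain.
  def build(debug: Boolean): Flow =
    Flow()
      .addAction("read")
      .mapOption[Flow](f => if (debug) Some(f.addAction("show")) else None)
      .map[Flow](_.addAction("write"))
}
```

Here `build(false)` skips the optional step entirely, while `build(true)` inserts it between "read" and "write".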

  39. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  40. def nextRunnable(executionPoolsAvailable: Set[String]): Seq[DataFlowAction]

    Returns actions that are ready to run:
      1. have no input labels;
      2. whose inputs have been created;
      3. all actions whose dependent tags have been run;
      4. belong to the available pool.

    Will not include actions that are skipped.

    executionPoolsAvailable

    set of execution pools for which to schedule actions
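
One possible reading of these rules is sketched below: an action is runnable when all of its input labels are available (trivially true when it has none), none of the tags it depends on still has unfinished actions, and its pool is in the available set. The `Action` and `FlowState` types and the `pendingTags` field are assumptions for this example; skipped actions are not modelled.

```scala
object NextRunnableSketch {
  // Hypothetical stand-in for an action with scheduling-relevant attributes only.
  final case class Action(
      name: String,
      inputLabels: Set[String],
      dependsOnTags: Set[String],
      pool: String
  )

  final case class FlowState(
      availableInputs: Set[String],
      pendingTags: Set[String], // tags that still have unfinished actions
      actions: Seq[Action]
  ) {
    def nextRunnable(executionPoolsAvailable: Set[String]): Seq[Action] =
      actions.filter { a =>
        a.inputLabels.subsetOf(availableInputs) &&
        a.dependsOnTags.intersect(pendingTags).isEmpty &&
        executionPoolsAvailable.contains(a.pool)
      }
  }

  val state: FlowState = FlowState(
    availableInputs = Set("raw"),
    pendingTags = Set("writes"),
    actions = Seq(
      Action("generate", Set.empty, Set.empty, "pool_1"),  // no inputs: runnable
      Action("clean", Set("raw"), Set.empty, "pool_1"),    // input ready: runnable
      Action("join", Set("clean", "lookup"), Set.empty, "pool_1"), // inputs missing
      Action("audit", Set.empty, Set("writes"), "pool_1"), // tag dependency pending
      Action("export", Set("raw"), Set.empty, "pool_2")    // pool not available
    )
  )

  val runnable: Seq[String] = state.nextRunnable(Set("pool_1")).map(_.name)
}
```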

  41. final def notify(): Unit

    Definition Classes
    AnyRef
  42. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  43. def prepareForExecution(): Try[Self]

    A function called just before the flow is executed. This function first keeps calling any extension preparation steps, then checks the tagging state of the flow. It can be overridden to add implementation-specific preparation steps; an overriding function should call this function first. Such an override would be responsible for preparing an execution environment, for example by cleaning temporary directories.

  44. def schedulingMeta(mutateState: (SchedulingMetaState) ⇒ SchedulingMetaState)(nestedFlow: (Self) ⇒ Self): Self

    Generic method that can be used to add context and state to all actions inside the block.

    mutateState

    function that adds attributes to the state

    nestedFlow

    all actions inside of this flow will be associated with the mutated state

  45. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  46. def tag(tags: String*)(taggedFlow: (Self) ⇒ Self): Self

    Tag all actions added during the taggedFlow lambda function with any given number of tags. These tags can then be used by the tagDependency() action to create a dependency in the running order of actions by tag.

    tags

    Tags to apply to added actions

    taggedFlow

    An intermediate flow that actions can be added to that will be marked with the tag

  47. def tagDependency(depTags: String*)(tagDependentFlow: (Self) ⇒ Self): Self

    Mark all actions added during the tagDependentFlow lambda function as having a dependency on the tags provided. These actions will only be run once all tagged actions have finished.

    depTags

    Tags to create a dependency on

    tagDependentFlow

    An intermediate flow that actions can be added to that will depend on tagged actions having completed before running
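
How tag() and tagDependency() interact can be sketched with a toy model. The `Flow` and `Action` types, the `activeTags` and `activeDependencies` fields and the `canStart` helper are assumptions made for this illustration and do not reflect Waimak's internal DataFlowTagState.

```scala
object TagSketch {
  // Hypothetical stand-in: an action records its own tags and the tags it waits for.
  final case class Action(name: String, tags: Set[String], dependsOnTags: Set[String])

  final case class Flow(
      actions: Vector[Action] = Vector.empty,
      activeTags: Set[String] = Set.empty,
      activeDependencies: Set[String] = Set.empty
  ) {
    def addAction(name: String): Flow =
      copy(actions = actions :+ Action(name, activeTags, activeDependencies))

    // Actions added inside taggedFlow receive the tags; the outer tags are then restored.
    def tag(tags: String*)(taggedFlow: Flow => Flow): Flow =
      taggedFlow(copy(activeTags = activeTags ++ tags)).copy(activeTags = activeTags)

    // Actions added inside tagDependentFlow wait for every action carrying depTags.
    def tagDependency(depTags: String*)(tagDependentFlow: Flow => Flow): Flow =
      tagDependentFlow(copy(activeDependencies = activeDependencies ++ depTags))
        .copy(activeDependencies = activeDependencies)

    // An action may start only when no unfinished action carries a tag it depends on.
    def canStart(name: String, finished: Set[String]): Boolean =
      actions.find(_.name == name).exists { a =>
        !actions.exists(o => !finished(o.name) && o.tags.intersect(a.dependsOnTags).nonEmpty)
      }
  }

  val flow: Flow = Flow()
    .tag("written") { _.addAction("write_a").addAction("write_b") }
    .tagDependency("written") { _.addAction("publish") }
}
```

In this model "publish" shares no labels with the write actions, so the tag is the only thing that orders it after them.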

  48. def toString(): String

    Definition Classes
    AnyRef → Any
  49. def updateMetadataExtension[S <: DataFlowMetadataExtension[Self]](identifier: DataFlowMetadataExtensionIdentifier, combineStates: (Option[S]) ⇒ Option[S])(implicit arg0: ClassTag[S]): Self

    Add, update or remove a metadata extension from the flow using the identifier argument to find an existing extension.

    S

    Type of the DataFlowMetadataExtension

    identifier

    Identifier of extension to update or remove

    combineStates

    Function that manipulates the extension on the flow. Input will be None if no existing extension with matching identifier exists on the flow. Return None to remove an existing extension with matching identifier from the flow.
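
The add, update and remove cases of combineStates can be sketched on a simplified model. The `Extension` type, the string identifier and the non-generic `updateMetadataExtension` below are assumptions for this example; the real method is generic in S and takes an implicit ClassTag.

```scala
object MetadataExtensionSketch {
  // Hypothetical extension keyed by a string identifier.
  final case class Extension(identifier: String, count: Int)

  final case class Flow(extensions: Set[Extension] = Set.empty) {
    def updateMetadataExtension(
        identifier: String,
        combineStates: Option[Extension] => Option[Extension]
    ): Flow = {
      val existing = extensions.find(_.identifier == identifier)
      val others = extensions.filterNot(_.identifier == identifier)
      // A returned None drops the extension; Some(...) adds or replaces it.
      copy(extensions = others ++ combineStates(existing).toSet)
    }
  }

  // Adds the extension when absent, otherwise increments its counter.
  val increment: Option[Extension] => Option[Extension] = {
    case None    => Some(Extension("cache", 1))
    case Some(e) => Some(e.copy(count = e.count + 1))
  }

  val added: Flow = Flow().updateMetadataExtension("cache", increment)
  val updated: Flow = added.updateMetadataExtension("cache", increment)
  val removed: Flow = updated.updateMetadataExtension("cache", _ => None)
}
```

A single function covering all three cases means callers do not need to check first whether the extension is already present on the flow.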

  50. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  51. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  52. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
