package workflow
Type Members
-
case class
DAG[N <: DAGNode] extends SmartDataLakeLogger with Product with Serializable
A generic directed acyclic graph (DAG) consisting of DAGNodes interconnected with directed DAGEdges.
This DAG can have multiple start nodes and multiple end nodes as well as disconnected parts.
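As a hedged illustration, a graph of this shape can be sketched in plain Scala. The field names (`nodes`, `edges`) and the `startNodes`/`endNodes` helpers below are assumptions for illustration only, not the actual DAG API; they show how multiple start nodes, multiple end nodes, and disconnected parts arise naturally from edge in/out-degrees.

```scala
// Minimal sketch of a generic DAG; field names and helpers are assumptions,
// not the real API of the documented DAG class.
trait DAGNode { def nodeId: String }
case class DAGEdge(fromId: String, toId: String)

case class DAG[N <: DAGNode](nodes: Seq[N], edges: Seq[DAGEdge]) {
  // Start nodes have no incoming edges; end nodes have no outgoing edges.
  def startNodes: Seq[N] = nodes.filterNot(n => edges.exists(_.toId == n.nodeId))
  def endNodes: Seq[N]   = nodes.filterNot(n => edges.exists(_.fromId == n.nodeId))
}

case class TaskNode(nodeId: String) extends DAGNode

object DagDemo extends App {
  // Two connected nodes plus one disconnected node:
  // "a" and "c" are start nodes, "b" and "c" are end nodes.
  val dag = DAG(Seq(TaskNode("a"), TaskNode("b"), TaskNode("c")), Seq(DAGEdge("a", "b")))
  println(dag.startNodes.map(_.nodeId)) // List(a, c)
  println(dag.endNodes.map(_.nodeId))   // List(b, c)
}
```

A lone node like "c" is both a start and an end node, which is how a DAG with disconnected parts is still well-defined.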
-
case class
FileSubFeed(fileRefs: Option[Seq[FileRef]], dataObjectId: DataObjectId, partitionValues: Seq[PartitionValues], isDAGStart: Boolean = false, processedInputFileRefs: Option[Seq[FileRef]] = None) extends SubFeed with Product with Serializable
A FileSubFeed is used to transport references to files between Actions.
- fileRefs
references to the files to be processed
- dataObjectId
id of the DataObject this SubFeed corresponds to
- partitionValues
Values of Partitions transported by this SubFeed
- processedInputFileRefs
used to remember processed input FileRefs for post-processing (e.g. delete after read)
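To make the role of processedInputFileRefs concrete, here is a hedged sketch. FileRef is reduced to a bare path and markProcessed is an invented helper name; the real classes carry more fields. The point is that after an Action has consumed its inputs, the SubFeed can remember them so a later step can post-process them (e.g. delete after read).

```scala
// Simplified stand-ins; the real FileRef/FileSubFeed have more fields.
case class FileRef(path: String)

case class FileSubFeedSketch(
  fileRefs: Option[Seq[FileRef]],
  dataObjectId: String,
  processedInputFileRefs: Option[Seq[FileRef]] = None
) {
  // Invented helper: remember the consumed inputs for post-processing
  // such as delete-after-read.
  def markProcessed: FileSubFeedSketch = copy(processedInputFileRefs = fileRefs)
}

object FileSubFeedDemo extends App {
  val in  = FileSubFeedSketch(Some(Seq(FileRef("/in/part-0001.csv"))), "stageA")
  val out = in.markProcessed
  println(out.processedInputFileRefs.map(_.map(_.path))) // Some(List(/in/part-0001.csv))
}
```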
- case class HadoopFileStateId(path: Path, appName: String, runId: Int, attemptId: Int) extends StateId with Product with Serializable
-
case class
InitSubFeed(dataObjectId: DataObjectId, partitionValues: Seq[PartitionValues]) extends SubFeed with Product with Serializable
An InitSubFeed is used to initialize the first nodes of a DAG.
- dataObjectId
id of the DataObject this SubFeed corresponds to
- partitionValues
Values of Partitions transported by this SubFeed
- class PrimaryKeyConstraintViolationException extends RuntimeException
-
class
ProcessingLogicException extends RuntimeException
Exception to signal that a configured pipeline can't be executed properly.
-
case class
SparkSubFeed(dataFrame: Option[DataFrame], dataObjectId: DataObjectId, partitionValues: Seq[PartitionValues], isDAGStart: Boolean = false, isDummy: Boolean = false, filter: Option[String] = None) extends SubFeed with Product with Serializable
A SparkSubFeed is used to transport DataFrames between Actions.
- dataFrame
Spark DataFrame to be processed. DataFrame should not be saved to state (@transient).
- dataObjectId
id of the DataObject this SubFeed corresponds to
- partitionValues
Values of Partitions transported by this SubFeed
- isDAGStart
true if this subfeed is a start node of the DAG
- isDummy
true if this subfeed only contains a dummy DataFrame. Dummy DataFrames can be used to validate the lineage in the init phase, but not in the exec phase.
- filter
a Spark SQL filter expression. This is used by SparkIncrementalMode.
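The isDummy contract can be sketched as follows, hedged: Spark's DataFrame is replaced by a trivial stub, and requireExecutable is an invented helper name, not part of the real API. The idea is that a dummy DataFrame may flow through the init phase to validate lineage, but must never be executed.

```scala
// Stand-in for Spark's DataFrame; a real one needs a SparkSession.
case class DataFrameStub(rows: Seq[Map[String, Any]])

case class SparkFeedSketch(
  dataFrame: Option[DataFrameStub],
  dataObjectId: String,
  isDummy: Boolean = false,
  filter: Option[String] = None // e.g. a Spark SQL expression like "updated > '2024-01-01'"
) {
  // Invented helper: the exec phase must reject dummy DataFrames.
  def requireExecutable(): Unit =
    require(!isDummy, s"SubFeed for $dataObjectId only carries a dummy DataFrame")
}

object SparkFeedDemo extends App {
  val execFeed = SparkFeedSketch(Some(DataFrameStub(Seq(Map("id" -> 1)))), "tableA")
  execFeed.requireExecutable() // passes: real DataFrame

  val dummyFeed = SparkFeedSketch(Some(DataFrameStub(Nil)), "tableA", isDummy = true)
  // dummyFeed.requireExecutable() would throw IllegalArgumentException
}
```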
-
trait
SubFeed extends DAGResult
A SubFeed transports references to data between Actions. Data can be represented by different technologies like Files or DataFrame.
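The relationship between the trait and its implementations can be sketched like this, hedged: the trait and case class names below are simplified stand-ins, and only two of the shared fields are shown. Each implementation carries a different data technology while exposing the same identity fields to the DAG.

```scala
// Simplified stand-ins for the SubFeed hierarchy; names and fields are
// assumptions for illustration, not the real trait definition.
trait SubFeedLike {
  def dataObjectId: String
  def isDAGStart: Boolean
}

// An InitSubFeed seeds a start node of the DAG, so isDAGStart is always true.
case class InitFeed(dataObjectId: String) extends SubFeedLike {
  val isDAGStart = true
}

// A file-based feed carries file references instead of a DataFrame.
case class FileFeed(dataObjectId: String, paths: Seq[String],
                    isDAGStart: Boolean = false) extends SubFeedLike

object SubFeedDemo extends App {
  val feeds: Seq[SubFeedLike] = Seq(InitFeed("src"), FileFeed("stg", Seq("/data/f1")))
  println(feeds.map(f => s"${f.dataObjectId}:${f.isDAGStart}")) // List(src:true, stg:false)
}
```

Actions can then be written against the common trait and stay agnostic of whether data arrives as files or as a DataFrame.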
Value Members
- object DAG extends SmartDataLakeLogger with Serializable
- object ExecutionPhase extends Enumeration
- object FileSubFeed extends Serializable
- object SparkSubFeed extends Serializable