io.smartdatalake.workflow

SparkSubFeed

case class SparkSubFeed(dataFrame: Option[DataFrame], dataObjectId: DataObjectId, partitionValues: Seq[PartitionValues], isDAGStart: Boolean = false, isSkipped: Boolean = false, isDummy: Boolean = false, filter: Option[String] = None) extends SubFeed with Product with Serializable

A SparkSubFeed is used to transport DataFrames between Actions. A construction sketch is given after the parameter list below.

dataFrame

Spark DataFrame to be processed. The DataFrame should not be saved to state (@transient).

dataObjectId

Id of the DataObject this SubFeed corresponds to.

partitionValues

Values of partitions transported by this SubFeed.

isDAGStart

True if this SubFeed is a start node of the DAG.

isSkipped

True if this SubFeed is the result of a skipped Action.

isDummy

True if this SubFeed only contains a dummy DataFrame. Dummy DataFrames can be used to validate the lineage in the init phase, but not in the exec phase.

filter

A Spark SQL filter expression. This is used by SparkIncrementalMode.
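
A minimal construction sketch. The import paths for DataObjectId and PartitionValues, the DataObject id "stg-table", and the partition column "dt" are assumptions for illustration, not something this page documents:

    import org.apache.spark.sql.SparkSession
    import io.smartdatalake.workflow.SparkSubFeed
    import io.smartdatalake.config.SdlConfigObject.DataObjectId // assumed import path
    import io.smartdatalake.util.hdfs.PartitionValues           // assumed import path

    val session = SparkSession.builder().master("local[*]").getOrCreate()
    import session.implicits._

    val df = Seq(("a", 1), ("b", 2)).toDF("id", "value")

    // Transport df towards the hypothetical DataObject "stg-table",
    // restricted to one partition, with an optional incremental filter.
    val subFeed = SparkSubFeed(
      dataFrame = Some(df),
      dataObjectId = DataObjectId("stg-table"),
      partitionValues = Seq(PartitionValues(Map("dt" -> "20240101"))),
      filter = Some("value > 0") // Spark SQL expression, as used by SparkIncrementalMode
    )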

Linear Supertypes
Serializable, Serializable, Product, Equals, SubFeed, SmartDataLakeLogger, DAGResult, AnyRef, Any

Instance Constructors

  1. new SparkSubFeed(dataFrame: Option[DataFrame], dataObjectId: DataObjectId, partitionValues: Seq[PartitionValues], isDAGStart: Boolean = false, isSkipped: Boolean = false, isDummy: Boolean = false, filter: Option[String] = None)

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def breakLineage(implicit session: SparkSession, context: ActionPipelineContext): SparkSubFeed

    Break lineage. This means discarding an existing DataFrame or list of FileRefs, so that the data is requested again from the DataObject. On the one hand this can be used to break long DataFrame lineages over multiple Actions and instead reread the data from an intermediate table; on the other hand it is needed if partition values or the filter condition are changed.

    Definition Classes
    SparkSubFeed → SubFeed
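
    A hedged usage sketch, assuming an implicit SparkSession and ActionPipelineContext are in scope (as inside an Action) and reusing the subFeed value from the construction sketch above:

        // implicit val session: SparkSession = ...
        // implicit val context: ActionPipelineContext = ...

        // Discard the transported DataFrame so that downstream processing
        // requests the data from the DataObject again.
        val broken: SparkSubFeed = subFeed.breakLineage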
  6. def clearDAGStart(): SparkSubFeed

    Definition Classes
    SparkSubFeed → SubFeed
  7. def clearFilter(breakLineageOnChange: Boolean = true)(implicit session: SparkSession, context: ActionPipelineContext): SparkSubFeed

  8. def clearPartitionValues(breakLineageOnChange: Boolean = true)(implicit session: SparkSession, context: ActionPipelineContext): SparkSubFeed

    Definition Classes
    SparkSubFeed → SubFeed
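
    Both clear methods take a breakLineageOnChange flag. A hedged sketch, with the same implicit assumptions as the breakLineage example above:

        // Drop the filter condition but keep the current DataFrame lineage.
        val noFilter = subFeed.clearFilter(breakLineageOnChange = false)

        // Drop the partition values; with the default breakLineageOnChange = true
        // the lineage is broken, since the DataFrame may no longer match the
        // transported partitions.
        val noPartitions = subFeed.clearPartitionValues()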
  9. def clearSkipped(): SparkSubFeed

    Definition Classes
    SparkSubFeed → SubFeed
  10. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  11. val dataFrame: Option[DataFrame]

    Spark DataFrame to be processed. The DataFrame should not be saved to state (@transient).

  12. val dataObjectId: DataObjectId

    Id of the DataObject this SubFeed corresponds to.

    Definition Classes
    SparkSubFeed → SubFeed
  13. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  14. val filter: Option[String]

    A Spark SQL filter expression. This is used by SparkIncrementalMode.

  15. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  16. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  17. def getFilterCol: Option[Column]
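
    Presumably this parses the filter expression into a Spark Column. A hedged sketch of roughly equivalent user code, not the actual implementation:

        import org.apache.spark.sql.functions.expr
        val filterCol = subFeed.filter.map(expr)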
  18. def hasReusableDataFrame: Boolean

  19. val isDAGStart: Boolean

    True if this SubFeed is a start node of the DAG.

    Definition Classes
    SparkSubFeed → SubFeed
  20. val isDummy: Boolean

    True if this SubFeed only contains a dummy DataFrame. Dummy DataFrames can be used to validate the lineage in the init phase, but not in the exec phase.

  21. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  22. val isSkipped: Boolean

    True if this SubFeed is the result of a skipped Action.

    Definition Classes
    SparkSubFeed → SubFeed
  23. def isStreaming: Option[Boolean]

  24. lazy val logger: Logger

    Attributes
    protected
    Definition Classes
    SmartDataLakeLogger
  25. def movePartitionColumnsLast(partitions: Seq[String]): SparkSubFeed

  26. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  27. final def notify(): Unit

    Definition Classes
    AnyRef
  28. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  29. val partitionValues: Seq[PartitionValues]

    Values of partitions transported by this SubFeed.

    Definition Classes
    SparkSubFeed → SubFeed
  30. def persist: SparkSubFeed

  31. def resultId: String

    Definition Classes
    SubFeed → DAGResult
  32. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  33. def toOutput(dataObjectId: DataObjectId): SparkSubFeed

    Definition Classes
    SparkSubFeed → SubFeed
  34. def union(other: SubFeed)(implicit session: SparkSession, context: ActionPipelineContext): SubFeed

    Definition Classes
    SparkSubFeed → SubFeed
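
    A hedged sketch, where subFeedA and subFeedB are hypothetical SparkSubFeeds targeting the same DataObject and the usual implicits are in scope:

        // Union the transported data of two SubFeeds; partition values are
        // merged via unionPartitionValues (next entry).
        val combined: SubFeed = subFeedA.union(subFeedB)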
  35. def unionPartitionValues(otherPartitionValues: Seq[PartitionValues]): Seq[PartitionValues]

    Definition Classes
    SubFeed
  36. def updatePartitionValues(partitions: Seq[String], breakLineageOnChange: Boolean = true, newPartitionValues: Option[Seq[PartitionValues]] = None)(implicit session: SparkSession, context: ActionPipelineContext): SparkSubFeed

    Definition Classes
    SparkSubFeed → SubFeed
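
    A hedged sketch with the same implicit assumptions; "dt" is a hypothetical partition column of the target DataObject:

        // Align the transported partition values with the target's partition
        // columns; by default a change also breaks the lineage.
        val updated = subFeed.updatePartitionValues(partitions = Seq("dt"))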
  37. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  38. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  39. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
