case class PartitionDiffMode(partitionColNb: Option[Int] = None, alternativeOutputId: Option[DataObjectId] = None, nbOfPartitionValuesPerRun: Option[Int] = None, applyCondition: Option[String] = None, failCondition: Option[String] = None, failConditions: Seq[Condition] = Seq(), selectExpression: Option[String] = None, applyPartitionValuesTransform: Boolean = false, selectAdditionalInputExpression: Option[String] = None) extends ExecutionMode with ExecutionModeWithMainInputOutput with Product with Serializable
Partition difference execution mode lists partitions on the mainInput and mainOutput DataObject and starts loading all missing partitions. Partition columns to be used for the comparison need to be a common 'init' (prefix) of the input and output partition columns. This mode needs mainInput/Output DataObjects that implement CanHandlePartitions in order to list partitions. Partition values are passed to subsequent actions for the partition columns they have in common.
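The comparison can be illustrated with a minimal, self-contained sketch in plain Scala collections. Note that `missingPartitions`, its parameters, and the sample data are illustrative stand-ins for the idea, not the actual SDLB implementation:

```scala
object PartitionDiffSketch {
  type PartitionValues = Map[String, String]

  // Sketch of the partition-diff idea: project both sides onto the common
  // 'init' of partition columns, then keep input partitions whose projection
  // is missing on the output side. An optional cap models nbOfPartitionValuesPerRun.
  def missingPartitions(
      inputPartitions: Seq[PartitionValues],
      outputPartitions: Seq[PartitionValues],
      commonInitCols: Seq[String],
      nbOfPartitionValuesPerRun: Option[Int] = None
  ): Seq[PartitionValues] = {
    def project(pv: PartitionValues): PartitionValues =
      pv.filter { case (col, _) => commonInitCols.contains(col) }
    val existing = outputPartitions.map(project).toSet
    val missing  = inputPartitions.filter(pv => !existing.contains(project(pv)))
    nbOfPartitionValuesPerRun.map(n => missing.take(n)).getOrElse(missing)
  }
}
```

For example, with input partitions dt=2024-01-01..03 and output partition dt=2024-01-01, only the two missing days would be selected for loading.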
- partitionColNb
optional number of partition columns to use as a common 'init'.
- alternativeOutputId
optional alternative outputId of DataObject later in the DAG. This replaces the mainOutputId. It can be used to ensure processing all partitions over multiple actions in case of errors.
- nbOfPartitionValuesPerRun
optional restriction of the number of partition values per run.
- applyCondition
Condition to decide whether the execution mode should be applied. Define a Spark SQL expression working with the attributes of DefaultExecutionModeExpressionData returning a boolean. Default is to apply the execution mode if the given partition values (partition values from the command line or passed from the previous action) are not empty.
- failConditions
List of conditions that fail the application of the execution mode if any of them evaluates to true. Define them as Spark SQL expressions working with the attributes of PartitionDiffModeExpressionData returning a boolean. By default the application of PartitionDiffMode does not fail the action; if there is no data to process, the subsequent actions are skipped. Multiple conditions are evaluated individually, and any single condition may fail the execution mode (or-logic).
- selectExpression
optional expression to define or refine the list of selected output partitions. Define a Spark SQL expression working with the attributes of PartitionDiffModeExpressionData returning a list&lt;map&lt;string,string&gt;&gt;. Default is to return the originally selected output partitions found in attribute selectedOutputPartitionValues.
- applyPartitionValuesTransform
If true, the partition values transform of custom transformations is applied to input partition values before comparing them with output partition values. If enabled, input and output partition columns can differ. Default is to disable the transformation of partition values.
- selectAdditionalInputExpression
optional expression to refine the list of selected input partitions. Note that PartitionDiffMode primarily selects output partitions; the selected output partitions are then transformed back to the input partitions needed to create them. This mapping is one-to-one unless applyPartitionValuesTransform=true. Sometimes additional input data is needed to create the output partitions, e.g. when aggregating a 7-day window for every day. You can customize the selected input partitions by defining a Spark SQL expression working with the attributes of PartitionDiffModeExpressionData returning a list&lt;map&lt;string,string&gt;&gt;. Default is to return the originally selected input partitions found in attribute selectedInputPartitionValues.
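The 7-day-window example above can be sketched in plain Scala. The `dt` partition column, the ISO date format, and the `expandWindow` helper are illustrative assumptions, not part of the SDLB API:

```scala
import java.time.LocalDate

object AdditionalInputSketch {
  type PartitionValues = Map[String, String]

  // If each output partition dt=D aggregates input of days D-6..D, the input
  // partitions needed for the selected output partitions are the union of
  // these 7-day windows -- the kind of refinement a
  // selectAdditionalInputExpression would express as a Spark SQL expression
  // over selectedInputPartitionValues.
  def expandWindow(selected: Seq[PartitionValues], windowDays: Int): Seq[PartitionValues] =
    selected.flatMap { pv =>
      val end = LocalDate.parse(pv("dt"))
      (0 until windowDays).map(offset => Map("dt" -> end.minusDays(offset).toString))
    }.distinct
}
```

For a selected output partition dt=2024-01-07 and a 7-day window, this yields the seven input partitions dt=2024-01-01 through dt=2024-01-07.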
Linear Supertypes
- Serializable
- Product
- Equals
- ExecutionModeWithMainInputOutput
- ExecutionMode
- SmartDataLakeLogger
- AnyRef
- Any
Instance Constructors
- new PartitionDiffMode(partitionColNb: Option[Int] = None, alternativeOutputId: Option[DataObjectId] = None, nbOfPartitionValuesPerRun: Option[Int] = None, applyCondition: Option[String] = None, failCondition: Option[String] = None, failConditions: Seq[Condition] = Seq(), selectExpression: Option[String] = None, applyPartitionValuesTransform: Boolean = false, selectAdditionalInputExpression: Option[String] = None)
Value Members
- def alternativeOutput(implicit context: ActionPipelineContext): Option[DataObject]
  - Definition Classes: ExecutionModeWithMainInputOutput
- val alternativeOutputId: Option[DataObjectId]
  - Definition Classes: PartitionDiffMode → ExecutionModeWithMainInputOutput
- val applyCondition: Option[String]
- val applyPartitionValuesTransform: Boolean
- val failCondition: Option[String]
- val failConditions: Seq[Condition]
- protected lazy val logger: Logger
  - Definition Classes: SmartDataLakeLogger
  - Annotations: @transient()
- val nbOfPartitionValuesPerRun: Option[Int]
- val partitionColNb: Option[Int]
- val selectAdditionalInputExpression: Option[String]
- val selectExpression: Option[String]
- Standard members inherited from AnyRef and Any: !=, ##, ==, asInstanceOf, clone, eq, finalize, getClass, isInstanceOf, ne, notify, notifyAll, synchronized, wait