io.smartdatalake.workflow.action
Action to transform files between two Hadoop DataObjects. The transformation is executed in distributed mode on the Spark executors. A custom file transformer must be given, which reads a file from Hadoop and writes it back to Hadoop.

Constructor parameters:
- inputId: input DataObject
- outputId: output DataObject
- transformer: a custom file transformer, which reads a file from a HadoopFileDataObject and writes it back to another HadoopFileDataObject
- deleteDataAfterRead: whether the input files should be deleted after successful processing
- filesPerPartition: number of files per Spark partition
- executionMode: optional execution mode for this Action
- executionCondition: optional Spark SQL expression evaluated against SubFeedsExpressionData. If true, the Action is executed, otherwise skipped. See Condition for details.
- metricsFailCondition: optional Spark SQL expression evaluated as a where-clause against a DataFrame of metrics. Available columns are dataObjectId, key and value. If any rows pass the where-clause, a MetricCheckFailed exception is thrown.
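For illustration, a custom file transformer might look like the following minimal sketch. It assumes the CustomFileTransformer interface from SDLB's customlogic package, with a transform method that receives an options map, an input stream and an output stream, and returns an optional exception; verify the exact trait and signature in your SDLB version.

  import java.io.{InputStream, OutputStream}
  import scala.io.Source
  import io.smartdatalake.workflow.action.customlogic.CustomFileTransformer

  // Minimal sketch: copy the input file line by line, converting to upper case.
  class UpperCaseFileTransformer extends CustomFileTransformer {
    override def transform(options: Map[String,String], input: InputStream, output: OutputStream): Option[Exception] = {
      try {
        Source.fromInputStream(input, "UTF-8").getLines().foreach { line =>
          output.write((line.toUpperCase + "\n").getBytes("UTF-8"))
        }
        None // success
      } catch {
        case e: Exception => Some(e) // a returned exception fails the action
      }
    }
  }

Since the transformation runs on the Spark executors, such a transformer must be serializable and should not reference driver-side state.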
Adds a runtime event for this Action
Adds a runtime metric for this Action
Applies the executionMode and stores the result in the executionModeResult variable
Stop propagating input FileRefs through the Action and instead get new FileRefs from the DataObject according to the SubFeed's partition values. This is needed to reprocess all files of a path/partition instead of only the FileRefs passed on from the previous Action.
"Transforms" a given FileSubFeed Note usage of doExec to choose between initialization or actual execution.
"Transforms" a given FileSubFeed Note usage of doExec to choose between initialization or actual execution.
subFeed to be processed (referencing files to be read)
prepared output subFeed
true if the action should be executed. If false, only the prerequisites for processing are checked and the output FileRefs that would be created are simulated.
processed output subFeed (referencing files written by this action)
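The init/exec split can be pictured with this standalone sketch (all names are illustrative stand-ins, not SDLB's actual signatures): the same method runs in both phases, and doExec decides whether side effects happen.

  // Illustrative stand-ins for SDLB's FileRef and FileSubFeed.
  case class FileRef(path: String)
  case class FileSubFeed(fileRefs: Seq[FileRef])

  def doTransform(subFeed: FileSubFeed, doExec: Boolean): FileSubFeed = {
    val outputRefs = subFeed.fileRefs.map(r => FileRef(r.path + ".out"))
    if (doExec) outputRefs.foreach(r => println(s"writing ${r.path}")) // exec: really write
    FileSubFeed(outputRefs) // init: only the resulting FileRefs are simulated
  }

  val in = FileSubFeed(Seq(FileRef("data/in.csv")))
  val planned = doTransform(in, doExec = false) // init phase
  val written = doTransform(in, doExec = true)  // exec phase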
Action.exec implementation
SparkSubFeeds to be processed
processed SparkSubFeeds
optional Spark SQL expression evaluated against SubFeedsExpressionData. If true, the Action is executed, otherwise skipped. See Condition for details.
optional execution mode for this Action
Returns the factory that can parse this type (that is, type CO).
Typically, implementations of this method should return the companion object of the implementing class. The companion object in turn should implement FromConfigFactory.
the factory (object) for this class.
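As a sketch of this pattern (MyFileAction is a hypothetical Action subclass; the fromConfig signature and the extract helper are assumed to match SDLB's io.smartdatalake.config package and may differ between versions):

  import com.typesafe.config.Config
  import io.smartdatalake.config.{FromConfigFactory, InstanceRegistry}
  import io.smartdatalake.workflow.action.Action

  // Companion object of the hypothetical MyFileAction class.
  object MyFileAction extends FromConfigFactory[Action] {
    override def fromConfig(config: Config)(implicit instanceRegistry: InstanceRegistry): MyFileAction = {
      // extract is assumed to be the config-mapping helper inherited from FromConfigFactory.
      extract[MyFileAction](config)
    }
  }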
number of files per Spark partition
Get the potential state of input DataObjects when executionMode is DataObjectStateIncrementalMode.
Get latest runtime state
Get summarized runtime information for a given ExecutionId.
ExecutionId to get runtime information for. If empty, runtime information for the last ExecutionId is returned.
Get the latest metrics for all DataObjects and a given SDLExecutionId.
ExecutionId to get metrics for. If empty, metrics for the last ExecutionId are returned.
A unique identifier for this instance.
Action.init implementation
SparkSubFeeds to be processed
processed SparkSubFeeds
Input FileRefDataObject that implements CanCreateInputStream
input DataObject
Input DataObjects. To be implemented by subclasses.
Additional metadata for the Action
optional Spark SQL expression evaluated as a where-clause against a DataFrame of metrics. Available columns are dataObjectId, key and value. If any rows pass the where-clause, a MetricCheckFailed exception is thrown.
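To illustrate the mechanism (the DataFrame below is made-up sample data and the condition string a hypothetical example): any row matching the where-clause causes the check to fail.

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().master("local[*]").appName("metricsCheckDemo").getOrCreate()
  import spark.implicits._

  // Made-up metrics with the columns described above.
  val metrics = Seq(
    ("out1", "records_written", 0L),
    ("out1", "no_data", 1L)
  ).toDF("dataObjectId", "key", "value")

  // Example condition: fail when nothing was written to out1.
  val metricsFailCondition = "dataObjectId = 'out1' and key = 'records_written' and value = 0"
  val shouldFail = metrics.where(metricsFailCondition).count() > 0 // any matching row -> MetricCheckFailed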
provide an implementation of the DAG node id
Output FileRefDataObject that implements CanCreateOutputStream
output DataObject
Output DataObjects. To be implemented by subclasses.
Executes operations needed after executing an action. In this step, any task on input or output DataObjects needed after the main task is executed, e.g. JdbcTableDataObject's postWriteSql or CopyAction's deleteInputData.
Executes operations needed to clean up after an action failed.
Executes operations needed before executing an action. In this step, any phase on input or output DataObjects needed before the main task is executed, e.g. JdbcTableDataObject's preWriteSql.
Checks before initialization of the Action. In this step the execution condition is evaluated, and the Action's init is skipped if the result is false.
Prepare DataObject prerequisites. In this step preconditions are prepared and tested:
- connections can be created
- needed structures exist, e.g. a Kafka topic or JDBC table
This runs during the "prepare" phase of the DAG.
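As a self-contained illustration of such a precondition test (plain JDBC, independent of SDLB's actual prepare signature):

  import java.sql.DriverManager

  // Illustrative prepare-phase check: verify a connection can be created and
  // that a required table exists before the DAG starts executing.
  def prepare(jdbcUrl: String, table: String): Unit = {
    val connection = DriverManager.getConnection(jdbcUrl)
    try {
      val tables = connection.getMetaData.getTables(null, null, table, null)
      require(tables.next(), s"required table $table does not exist")
    } finally connection.close()
  }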
Recursive inputs on FileSubFeeds are not supported, so an empty Seq is set.
Sets the Spark job description for better traceability in the Spark UI.
Note: this sets Spark local properties, which are propagated to the respective executor tasks. We rely on this to match metrics back to Actions and DataObjects. As writing to a DataObject on the driver happens uninterrupted in the same exclusive thread, this is suitable.
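The underlying Spark mechanism looks roughly like this (the property key below is hypothetical, not SDLB's actual key):

  import org.apache.spark.sql.SparkSession

  val session = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
  val sc = session.sparkContext

  // The job description is shown in the Spark UI for all jobs started by this thread.
  sc.setJobDescription("exec myAction")
  // Local properties are propagated to the executor tasks of jobs started by this
  // thread, which allows matching task metrics back to the originating Action.
  sc.setLocalProperty("sdl.actionId", "myAction") // hypothetical property key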
phase description (be short...)
This is displayed in the ASCII graph visualization
a custom file transformer, which reads a file from a HadoopFileDataObject and writes it back to another HadoopFileDataObject
if the input files should be deleted after successful processing
(Since version 2.0.3) Use executionMode = FileIncrementalMoveMode instead.