io.smartdatalake.workflow.action

CustomFileAction

case class CustomFileAction(id: ActionId, inputId: DataObjectId, outputId: DataObjectId, transformer: CustomFileTransformerConfig, deleteDataAfterRead: Boolean = false, filesPerPartition: Int = 10, breakFileRefLineage: Boolean = false, executionMode: Option[ExecutionMode] = None, executionCondition: Option[Condition] = None, metricsFailCondition: Option[String] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends FileSubFeedAction with SmartDataLakeLogger with Product with Serializable

Action to transform files between two Hadoop Data Objects. The transformation is executed in distributed mode on the Spark executors. A custom file transformer must be given, which reads a file from Hadoop and writes it back to Hadoop.
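The transformer contract can be pictured as "one input stream in, one output stream out" per file. The sketch below models that idea with plain `java.io` streams so it runs without Hadoop or Spark. The trait `FileTransformerSketch`, its `transform` signature and the `Option[Exception]` result are hypothetical stand-ins, not the actual SmartDataLakeBuilder transformer API.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}
import java.nio.charset.StandardCharsets

// Hypothetical contract (assumption): read one file from `input`, write the result to `output`.
// Returns Some(exception) on failure and None on success.
trait FileTransformerSketch extends Serializable {
  def transform(options: Map[String, String], input: InputStream, output: OutputStream): Option[Exception]
}

// Example transformer: upper-cases the whole file content. `options` is unused here.
class UpperCaseTransformer extends FileTransformerSketch {
  override def transform(options: Map[String, String], input: InputStream, output: OutputStream): Option[Exception] =
    try {
      val text = new String(input.readAllBytes(), StandardCharsets.UTF_8)
      output.write(text.toUpperCase.getBytes(StandardCharsets.UTF_8))
      output.flush()
      None
    } catch {
      case e: Exception => Some(e)
    }
}

// Runs the transformer on in-memory streams in place of Hadoop files.
object UpperCaseTransformerDemo {
  def run(content: String): String = {
    val in = new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8))
    val out = new ByteArrayOutputStream()
    val result = new UpperCaseTransformer().transform(Map.empty, in, out)
    require(result.isEmpty, s"transform failed: $result")
    out.toString(StandardCharsets.UTF_8.name)
  }
}
```

Keeping the transformer stream-based and `Serializable` is what would let it be shipped to Spark executors and applied to each file independently.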

inputId

input DataObject

outputId

output DataObject

transformer

a custom file transformer, which reads a file from HadoopFileDataObject and writes it back to another HadoopFileDataObject

deleteDataAfterRead

whether the input files should be deleted after successful processing

filesPerPartition

number of files per Spark partition

executionMode

optional execution mode for this Action

executionCondition

optional Spark SQL expression evaluated against SubFeedsExpressionData. If true, the Action is executed; otherwise it is skipped. For details, see Condition.

metricsFailCondition

optional Spark SQL expression evaluated as a where-clause against a DataFrame of metrics. Available columns are dataObjectId, key and value. If any rows pass the where-clause, a MetricCheckFailed exception is thrown.
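The two condition parameters have opposite effects: executionCondition decides whether the Action runs at all, while metricsFailCondition fails the Action when any metrics row matches. The sketch below illustrates only these semantics in plain Scala, replacing the Spark SQL expressions with Scala predicates. `MetricRow`, `MetricCheckFailedSketch`, `ConditionSketch` and the `String` type of `value` are hypothetical; the real parameters are SQL strings evaluated by Spark.

```scala
// Hypothetical metric row mirroring the documented columns dataObjectId, key, value.
final case class MetricRow(dataObjectId: String, key: String, value: String)

// Hypothetical stand-in for the MetricCheckFailed exception.
final class MetricCheckFailedSketch(msg: String) extends RuntimeException(msg)

object ConditionSketch {
  // executionCondition: None means "always execute"; otherwise execute only if the predicate holds.
  // Returns (execute?, optional skip message).
  def evalExecutionCondition[D](condition: Option[D => Boolean], data: D): (Boolean, Option[String]) =
    condition match {
      case None               => (true, None)
      case Some(p) if p(data) => (true, None)
      case Some(_)            => (false, Some("executionCondition evaluated to false, Action skipped"))
    }

  // metricsFailCondition: acts like a where-clause; any matching row fails the check.
  def checkMetrics(metrics: Seq[MetricRow], failCondition: Option[MetricRow => Boolean]): Unit =
    failCondition.foreach { predicate =>
      val violations = metrics.filter(predicate)
      if (violations.nonEmpty)
        throw new MetricCheckFailedSketch(s"metricsFailCondition matched: ${violations.mkString(", ")}")
    }
}
```

In a real configuration the predicate would instead be a SQL string over the same columns, for example a filter on `key` and `value`.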

Linear Supertypes
Serializable, Serializable, Product, Equals, FileSubFeedAction, Action, AtlasExportable, SmartDataLakeLogger, DAGNode, ParsableFromConfig[Action], SdlConfigObject, AnyRef, Any

Instance Constructors

  1. new CustomFileAction(id: ActionId, inputId: DataObjectId, outputId: DataObjectId, transformer: CustomFileTransformerConfig, deleteDataAfterRead: Boolean = false, filesPerPartition: Int = 10, breakFileRefLineage: Boolean = false, executionMode: Option[ExecutionMode] = None, executionCondition: Option[Condition] = None, metricsFailCondition: Option[String] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry)

    inputId

    input DataObject

    outputId

    output DataObject

    transformer

    a custom file transformer, which reads a file from HadoopFileDataObject and writes it back to another HadoopFileDataObject

    deleteDataAfterRead

    whether the input files should be deleted after successful processing

    filesPerPartition

    number of files per Spark partition

    executionMode

    optional execution mode for this Action

    executionCondition

    optional Spark SQL expression evaluated against SubFeedsExpressionData. If true, the Action is executed; otherwise it is skipped. For details, see Condition.

    metricsFailCondition

    optional Spark SQL expression evaluated as a where-clause against a DataFrame of metrics. Available columns are dataObjectId, key and value. If any rows pass the where-clause, a MetricCheckFailed exception is thrown.
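The filesPerPartition constructor parameter controls how many input files each Spark partition handles. The plain-Scala sketch below assumes the partition count is the file count divided by filesPerPartition, rounded up and never below one; the object `FilesPerPartitionSketch` is hypothetical, and the exact rounding used inside CustomFileAction is not stated in this documentation.

```scala
object FilesPerPartitionSketch {
  // Number of partitions so that no partition holds more than filesPerPartition files.
  // Rounding up and the minimum of 1 are assumptions of this sketch.
  def numPartitions(fileCount: Int, filesPerPartition: Int): Int = {
    require(filesPerPartition > 0, "filesPerPartition must be positive")
    math.max(1, (fileCount + filesPerPartition - 1) / filesPerPartition)
  }

  // Groups file paths into chunks of at most filesPerPartition, one chunk per partition.
  def assign(files: Seq[String], filesPerPartition: Int): Seq[Seq[String]] = {
    require(filesPerPartition > 0, "filesPerPartition must be positive")
    files.grouped(filesPerPartition).toSeq
  }
}
```

With Spark, a count like this could be passed as the `numSlices` argument of `SparkContext.parallelize` when distributing the list of files.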

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def addRuntimeEvent(executionId: ExecutionId, phase: ExecutionPhase, state: RuntimeEventState, msg: Option[String] = None, results: Seq[SubFeed] = Seq(), tstmp: LocalDateTime = LocalDateTime.now): Unit

    Adds a runtime event for this Action

    Definition Classes
    Action
  5. def addRuntimeMetrics(executionId: Option[ExecutionId], dataObjectId: Option[DataObjectId], metric: ActionMetrics): Unit

    Adds a runtime metric for this Action

    Definition Classes
    Action
  6. def applyExecutionMode(mainInput: DataObject, mainOutput: DataObject, subFeed: SubFeed, partitionValuesTransform: (Seq[PartitionValues]) ⇒ Map[PartitionValues, PartitionValues])(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Applies the executionMode and stores result in executionModeResult variable

    Attributes
    protected
    Definition Classes
    Action
  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. def atlasName: String

    Definition Classes
    Action → AtlasExportable
  9. def atlasQualifiedName(prefix: String): String

    Definition Classes
    AtlasExportable
  10. val breakFileRefLineage: Boolean

    Stop propagating input FileRefs through the action and instead get new FileRefs from the DataObject according to the SubFeed's partition values. This is needed to reprocess all files of a path/partition instead of the FileRefs passed from the previous Action.

    Definition Classes
    CustomFileAction → FileSubFeedAction
  11. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  12. def doTransform(inputSubFeed: FileSubFeed, outputSubFeed: FileSubFeed, doExec: Boolean)(implicit session: SparkSession, context: ActionPipelineContext): FileSubFeed

    "Transforms" a given FileSubFeed. Note the use of doExec to choose between initialization and actual execution.

    inputSubFeed

    subFeed to be processed (referencing files to be read)

    outputSubFeed

    prepared output subFeed

    doExec

    true if the action should be executed. If false, this only checks the prerequisites for the processing and simulates the output FileRefs that would be created.

    returns

    processed output subFeed (referencing files written by this action)

    Definition Classes
    CustomFileAction → FileSubFeedAction
  13. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  14. final def exec(subFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Seq[SubFeed]

    Action.exec implementation

    subFeeds

    SubFeeds to be processed

    returns

    processed SubFeeds

    Definition Classes
    FileSubFeedAction → Action
  15. val executionCondition: Option[Condition]

    optional Spark SQL expression evaluated against SubFeedsExpressionData. If true, the Action is executed; otherwise it is skipped. For details, see Condition.

    Definition Classes
    CustomFileAction → Action
  16. var executionConditionResult: Option[(Boolean, Option[String])]

    Attributes
    protected
    Definition Classes
    Action
  17. val executionMode: Option[ExecutionMode]

    optional execution mode for this Action

    Definition Classes
    CustomFileAction → Action
  18. var executionModeResult: Option[Try[Option[ExecutionModeResult]]]

    Attributes
    protected
    Definition Classes
    Action
  19. def factory: FromConfigFactory[Action]

    Returns the factory that can parse this type (that is, type CO).

    Typically, implementations of this method should return the companion object of the implementing class. The companion object in turn should implement FromConfigFactory.

    returns

    the factory (object) for this class.

    Definition Classes
    CustomFileAction → ParsableFromConfig
  20. val filesPerPartition: Int

    number of files per Spark partition

  21. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  22. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  23. def getDataObjectsState: Seq[DataObjectState]

    Get potential state of input DataObjects when executionMode is DataObjectStateIncrementalMode.

    Definition Classes
    Action
  24. def getInputDataObject[T <: DataObject](id: DataObjectId)(implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], registry: InstanceRegistry): T

    Attributes
    protected
    Definition Classes
    Action
  25. def getLatestRuntimeEventState: Option[RuntimeEventState]

    Get latest runtime state

    Definition Classes
    Action
  26. def getOutputDataObject[T <: DataObject](id: DataObjectId)(implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], registry: InstanceRegistry): T

    Attributes
    protected
    Definition Classes
    Action
  27. def getRuntimeDataImpl: RuntimeData

    Attributes
    protected
    Definition Classes
    Action
  28. def getRuntimeInfo(executionId: Option[ExecutionId] = None): Option[RuntimeInfo]

    Get summarized runtime information for a given ExecutionId.

    executionId

    ExecutionId to get runtime information for. If empty, runtime information for the last ExecutionId is returned.

    Definition Classes
    Action
  29. def getRuntimeMetrics(executionId: Option[ExecutionId] = None): Map[DataObjectId, Option[ActionMetrics]]

    Get the latest metrics for all DataObjects and a given SDLExecutionId.

    executionId

    ExecutionId to get metrics for. If empty, metrics for the last ExecutionId are returned.

    Definition Classes
    Action
  30. val id: ActionId

    A unique identifier for this instance.

    Definition Classes
    CustomFileAction → Action → SdlConfigObject
  31. final def init(subFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Seq[SubFeed]

    Action.init implementation

    subFeeds

    SubFeeds to be processed

    returns

    processed SubFeeds

    Definition Classes
    FileSubFeedAction → Action
  32. val input: HadoopFileDataObject

    Input FileRefDataObject which implements CanCreateInputStream

    Definition Classes
    CustomFileAction → FileSubFeedAction
  33. val inputId: DataObjectId

    input DataObject

  34. val inputs: Seq[HadoopFileDataObject]

    Input DataObjects. To be implemented by subclasses.

    Definition Classes
    CustomFileAction → Action
  35. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  36. lazy val logger: Logger

    Attributes
    protected
    Definition Classes
    SmartDataLakeLogger
  37. val metadata: Option[ActionMetadata]

    Additional metadata for the Action

    Definition Classes
    CustomFileAction → Action
  38. val metricsFailCondition: Option[String]

    optional Spark SQL expression evaluated as a where-clause against a DataFrame of metrics. Available columns are dataObjectId, key and value. If any rows pass the where-clause, a MetricCheckFailed exception is thrown.

    Definition Classes
    CustomFileAction → Action
  39. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  40. def nodeId: String

    Provides an implementation of the DAG node id.

    Definition Classes
    Action → DAGNode
  41. final def notify(): Unit

    Definition Classes
    AnyRef
  42. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  43. val output: HadoopFileDataObject

    Output FileRefDataObject which implements CanCreateOutputStream

    Definition Classes
    CustomFileAction → FileSubFeedAction
  44. val outputId: DataObjectId

    output DataObject

  45. val outputs: Seq[HadoopFileDataObject]

    Output DataObjects. To be implemented by subclasses.

    Definition Classes
    CustomFileAction → Action
  46. final def postExec(inputSubFeeds: Seq[SubFeed], outputSubFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Executes operations needed after executing an action. In this step, any tasks on Input- or Output-DataObjects that are needed after the main task are executed, e.g. JdbcTableDataObject's postWriteSql or CopyAction's deleteInputData.

    Definition Classes
    FileSubFeedAction → Action
  47. def postExecFailed(implicit session: SparkSession): Unit

    Executes the cleanup operations needed after an action has failed.

    Definition Classes
    Action
  48. def postExecSubFeed(inputSubFeed: SubFeed, outputSubFeed: SubFeed)(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Definition Classes
    FileSubFeedAction
  49. def preExec(subFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Executes operations needed before executing an action. In this step, any operations on Input- or Output-DataObjects that are needed before the main task are executed, e.g. JdbcTableDataObject's preWriteSql.

    Definition Classes
    Action
  50. def preInit(subFeeds: Seq[SubFeed], dataObjectsState: Seq[DataObjectState])(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Checks before initialization of the Action. In this step the execution condition is evaluated, and Action init is skipped if the result is false.

    Definition Classes
    Action
  51. def prepare(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Prepare DataObjects prerequisites. In this step preconditions are prepared and tested:
    - connections can be created
    - needed structures exist, e.g. Kafka topic or Jdbc table

    This runs during the "prepare" phase of the DAG.

    Definition Classes
    FileSubFeedAction → Action
  52. def recursiveInputs: Seq[FileRefDataObject with CanCreateInputStream]

    Recursive inputs are not supported on FileSubFeeds, so an empty Seq is returned.

    Definition Classes
    FileSubFeedAction → Action
  53. def setSparkJobMetadata(operation: Option[String] = None)(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Sets the Spark job description for better traceability in the Spark UI.

    Note: This sets Spark local properties, which are propagated to the respective executor tasks. We rely on this to match metrics back to Actions and DataObjects. As writing to a DataObject on the Driver happens uninterrupted in the same exclusive thread, this is suitable.

    operation

    phase description (be short...)

    Definition Classes
    Action
  54. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  55. final def toString(executionId: Option[ExecutionId]): String

    Definition Classes
    Action
  56. final def toString(): String

    This is displayed in the ASCII graph visualization.

    Definition Classes
    Action → AnyRef → Any
  57. def toStringMedium: String

    Definition Classes
    Action
  58. def toStringShort: String

    Definition Classes
    Action
  59. val transformer: CustomFileTransformerConfig

    a custom file transformer, which reads a file from HadoopFileDataObject and writes it back to another HadoopFileDataObject

  60. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  61. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  62. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
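doTransform, its doExec flag and breakFileRefLineage work together: the action first decides which input FileRefs to use, then either only simulates the output FileRefs (doExec = false) or actually transforms each file (doExec = true). The sketch below models that flow with simplified, hypothetical types (`FileRefSketch`, `FileSubFeedSketch`, `DoTransformSketch`); the output path naming and the fallback to a DataObject listing when no FileRefs were passed are assumptions, not documented behavior of CustomFileAction.

```scala
// Hypothetical, simplified stand-ins for FileRef and FileSubFeed.
final case class FileRefSketch(fullPath: String, partition: Option[String])
final case class FileSubFeedSketch(fileRefs: Option[Seq[FileRefSketch]], partitionValues: Seq[String])

object DoTransformSketch {
  // Uses the FileRefs passed from the previous Action unless breakFileRefLineage is set.
  // Otherwise (and, as an assumption, when none were passed) the DataObject is listed
  // according to the SubFeed's partition values.
  def inputFileRefs(subFeed: FileSubFeedSketch, breakFileRefLineage: Boolean,
                    listDataObject: Seq[String] => Seq[FileRefSketch]): Seq[FileRefSketch] =
    subFeed.fileRefs match {
      case Some(refs) if !breakFileRefLineage => refs
      case _                                  => listDataObject(subFeed.partitionValues)
    }

  // doExec = false: only simulate the output FileRefs that would be created.
  // doExec = true: additionally apply `transformFile` to each (input, output) pair.
  def doTransform(inputSubFeed: FileSubFeedSketch, outputDir: String, doExec: Boolean,
                  breakFileRefLineage: Boolean,
                  listDataObject: Seq[String] => Seq[FileRefSketch],
                  transformFile: (String, String) => Unit): FileSubFeedSketch = {
    val inputs = inputFileRefs(inputSubFeed, breakFileRefLineage, listDataObject)
    // Assumed naming scheme: keep the file name, move it under outputDir.
    val outputs = inputs.map { ref =>
      val fileName = ref.fullPath.split('/').last
      FileRefSketch(s"$outputDir/$fileName", ref.partition)
    }
    if (doExec) inputs.zip(outputs).foreach { case (in, out) => transformFile(in.fullPath, out.fullPath) }
    FileSubFeedSketch(Some(outputs), inputSubFeed.partitionValues)
  }
}
```

Returning the simulated output FileRefs even in the dry run lets downstream steps be initialized without any file being written.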

Deprecated Value Members

  1. val deleteDataAfterRead: Boolean

    whether the input files should be deleted after successful processing

    Definition Classes
    CustomFileAction → FileSubFeedAction
    Annotations
    @deprecated
    Deprecated

    (Since version 2.0.3) use executionMode = FileIncrementalMoveMode instead
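The deprecated deleteDataAfterRead flag means input files are removed only once processing has succeeded. The sketch below shows that "process everything first, delete afterwards" ordering with `java.nio.file`; the object `DeleteAfterReadSketch` is hypothetical, and treating the whole batch as all-or-nothing is an assumption modeled on a post-execution cleanup step, not the documented implementation.

```scala
import java.nio.file.{Files, Path}

object DeleteAfterReadSketch {
  // Phase 1 processes every file; an exception aborts before any deletion happens.
  // Phase 2 (only reached if all files succeeded) deletes the inputs when the flag is set.
  def processThenDelete(files: Seq[Path], deleteDataAfterRead: Boolean)(process: Path => Unit): Unit = {
    files.foreach(process)
    if (deleteDataAfterRead) files.foreach(f => Files.deleteIfExists(f))
  }
}
```

Because a failure keeps all input files in place, a rerun can reprocess them; the replacement suggested above, FileIncrementalMoveMode, handles processed files through the execution mode instead.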
