Package io.smartdatalake.workflow.dataobject

package dataobject

Type Members

  1. class DeltaLakeModulePlugin extends ModulePlugin

  2. case class DeltaLakeTableDataObject(id: DataObjectId, path: Option[String], partitions: Seq[String] = Seq(), options: Option[Map[String, String]] = None, schemaMin: Option[StructType] = None, table: Table, saveMode: SDLSaveMode = SDLSaveMode.Overwrite, allowSchemaEvolution: Boolean = false, retentionPeriod: Option[Int] = None, acl: Option[AclDef] = None, connectionId: Option[ConnectionId] = None, expectedPartitionsCondition: Option[String] = None, housekeepingMode: Option[HousekeepingMode] = None, metadata: Option[DataObjectMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends TransactionalSparkTableDataObject with CanMergeDataFrame with CanEvolveSchema with CanHandlePartitions with HasHadoopStandardFilestore with Product with Serializable


    DataObject of type DeltaLakeTableDataObject. Provides details to access tables in Delta format to an Action. Note that in Spark 2.x the catalog for DeltaTable is not supported; this means that the table's db/name are not used, and it is the path that identifies the table.

    Delta format maintains a transaction log in a separate _delta_log subfolder. The schema is registered in the Metastore by DeltaLakeTableDataObject.

    The following anomalies might occur:
      - Table is registered in the metastore but the path does not exist -> the table is dropped from the metastore.
      - Table is registered in the metastore but the path is empty -> an error is thrown; delete the path to clean up.
      - Table is registered and the path contains parquet files, but the _delta_log subfolder is missing -> the path is converted to delta format.
      - Table is not registered but the path contains parquet files and a _delta_log subfolder -> the table is registered.
      - Table is not registered but the path contains parquet files without a _delta_log subfolder -> the path is converted to delta format and the table is registered.
      - Table is not registered and the path does not exist -> the table is created on write.
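    As a sketch of how such a DataObject is typically declared in a Smart Data Lake Builder HOCON configuration (the object key, database and table names below are illustrative assumptions, not taken from this page):

    ```hocon
    dataObjects {
      btl-delta-table {
        type = DeltaLakeTableDataObject
        path = "~{id}"            # relative path; the connection's pathPrefix is applied
        table {
          db = "default"
          name = "btl_delta_table"
        }
      }
    }
    ```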

    id

    unique name of this data object

    path

    hadoop directory for this table. If it doesn't contain scheme and authority, the connection's pathPrefix is applied. If pathPrefix is not defined or doesn't define scheme and authority, the default scheme and authority are applied.

    partitions

    partition columns for this data object

    options

    Options for Delta Lake tables; see https://docs.delta.io/latest/delta-batch.html and org.apache.spark.sql.delta.DeltaOptions

    schemaMin

    An optional, minimal schema that this DataObject must have to pass schema validation on reading and writing.

    table

    DeltaLake table to be written by this output

    saveMode

    SDLSaveMode to use when writing files; default is "overwrite". Overwrite, Append and Merge are currently supported.

    allowSchemaEvolution

    If set to true, schema evolution is applied automatically when writing to this DataObject with a different schema; otherwise SDL stops with an error.

    retentionPeriod

    Optional delta lake retention threshold in hours. Files required by the table for reading versions younger than retentionPeriod will be preserved and the rest of them will be deleted.

    acl

    override connection permissions for files created in this table's hadoop directory with this connection

    connectionId

    optional id of io.smartdatalake.workflow.connection.HiveTableConnection

    expectedPartitionsCondition

    Optional definition of partitions expected to exist. Define a Spark SQL expression that is evaluated against a PartitionValues instance and returns true or false. Default is to expect all partitions to exist.

    housekeepingMode

    Optional definition of a housekeeping mode applied after every write. E.g. it can be used to cleanup, archive and compact partitions. See HousekeepingMode for available implementations. Default is None.

    metadata

    metadata of this DataObject
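    Putting several of the parameters above together, a fuller configuration might look as follows. This is a hedged sketch: the object key, database, table name, primary key, partition column and the partition condition are all illustrative assumptions, not values from this page.

    ```hocon
    dataObjects {
      btl-sales-delta {
        type = DeltaLakeTableDataObject
        path = "~{id}"
        partitions = [dt]
        table {
          db = "btl"
          name = "sales"
          primaryKey = [id]        # assumed here; a merge needs a key to match records on
        }
        saveMode = Merge
        allowSchemaEvolution = true
        retentionPeriod = 168      # hours, i.e. keep files needed to read the last 7 days of versions
        # evaluated against a PartitionValues instance; syntax assumed for illustration
        expectedPartitionsCondition = "elements['dt'] > '20200101'"
      }
    }
    ```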

Value Members

  1. object DeltaLakeTableDataObject extends FromConfigFactory[DataObject] with Serializable

