SFtpFileRefDataObject

Instance Constructors

new SFtpFileRefDataObject(id: DataObjectId, path: String, connectionId: ConnectionId, partitions: Seq[String] = Seq(), partitionLayout: Option[String] = None, saveMode: SDLSaveMode = SDLSaveMode.Overwrite, expectedPartitionsCondition: Option[String] = None, metadata: Option[DataObjectMetadata] = None)(implicit instanceRegistry: InstanceRegistry)

partitionLayout
Defines how partition values are extracted from the path. Use "%<colname>%" as a token to extract the value of a partition column. With "%<colname:regex>%", a regex can be given to restrict the match. This is especially useful when no character delimits the last token from the rest of the path, or one token from the next.
saveMode
Overwrite or Append new data.
expectedPartitionsCondition
Optional definition of partitions expected to exist. Define a Spark SQL expression that is evaluated against a PartitionValues instance and returns true or false. Default is to expect all partitions to exist.

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def atlasName: String

Definition Classes
DataObject → AtlasExportable
def atlasQualifiedName(prefix: String): String

Definition Classes
AtlasExportable
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
val connectionId: ConnectionId
def createInputStream(path: String)(implicit session: SparkSession): InputStream

Definition Classes
SFtpFileRefDataObject → CanCreateInputStream
def createOutputStream(path: String, overwrite: Boolean)(implicit session: SparkSession, context: ActionPipelineContext): OutputStream

Create an OutputStream for a given path that the Action can write data to.

Definition Classes
SFtpFileRefDataObject → CanCreateOutputStream
def deleteAll(implicit session: SparkSession): Unit

Delete all data. This is used to implement SaveMode.Overwrite.

Definition Classes
FileRefDataObject
def deleteFileRefs(fileRefs: Seq[FileRef])(implicit session: SparkSession): Unit

Delete the given files. This is used to clean up files after they have been processed.

Definition Classes
SFtpFileRefDataObject → FileRefDataObject
def endWritingOutputStreams(partitionValues: Seq[PartitionValues])(implicit session: SparkSession, context: ActionPipelineContext): Unit

This is called after all output streams have been written. It is used, for example, to make sure empty partitions are created as well.

Definition Classes
SFtpFileRefDataObject → CanCreateOutputStream
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
val expectedPartitionsCondition: Option[String]

Optional definition of partitions expected to exist. Define a Spark SQL expression that is evaluated against a PartitionValues instance and returns true or false. Default is to expect all partitions to exist.

Definition Classes
SFtpFileRefDataObject → CanHandlePartitions
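
The effect of expectedPartitionsCondition can be illustrated with a minimal sketch in plain Scala. It is not the library's implementation: the real condition is a Spark SQL expression string, which is modelled here as a Scala predicate, and PartitionValuesSketch and missingPartitions are hypothetical names introduced only for this example.

```scala
object ExpectedPartitionsSketch {
  // Hypothetical stand-in for the library's PartitionValues: column name -> value.
  final case class PartitionValuesSketch(elements: Map[String, String])

  /** Returns the candidate partitions that are expected to exist but do not.
    * Without a condition, every candidate partition is expected to exist.
    */
  def missingPartitions(
      candidates: Seq[PartitionValuesSketch],
      existing: Set[PartitionValuesSketch],
      condition: Option[PartitionValuesSketch => Boolean] = None
  ): Seq[PartitionValuesSketch] = {
    // Assumption: the condition only narrows down which partitions are expected.
    val expected = condition match {
      case Some(predicate) => candidates.filter(predicate)
      case None            => candidates
    }
    expected.filterNot(existing.contains)
  }
}
```

With no condition, a missing partition is reported; with a condition that excludes it, the same partition is no longer expected.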
def extractPartitionValuesFromPath(filePath: String)(implicit session: SparkSession): PartitionValues

Extract partition values from a given file path

Attributes
protected
Definition Classes
FileRefDataObject
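
As an illustration of how a partition layout can drive this extraction, the following sketch compiles a layout into a regular expression with one capturing group per token. It is a simplified model, not the library's code. It assumes tokens are written as %col% or %col:regex% (the angle brackets in the documentation being placeholders), that a token without a regex matches up to the next "/", and that a user regex contains neither "%" nor capturing groups. PartitionLayoutSketch, compile and extract are hypothetical names.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

object PartitionLayoutSketch {
  // Matches %col% or %col:regex%; group 1 is the column name, group 2 the optional regex.
  private val Token: Regex = "%([A-Za-z0-9_]+)(?::([^%]+))?%".r

  // Quote literal layout text so that it is matched verbatim.
  private def literal(s: String): String = if (s.isEmpty) "" else Pattern.quote(s)

  /** Compile a layout into a regex plus the column names in token order. */
  def compile(layout: String): (Regex, Seq[String]) = {
    val pattern = new StringBuilder
    val columns = Seq.newBuilder[String]
    var last = 0
    for (m <- Token.findAllMatchIn(layout)) {
      pattern ++= literal(layout.substring(last, m.start))
      // Assumption: without an explicit regex, a value runs up to the next path separator.
      val valueRegex = Option(m.group(2)).getOrElse("[^/]+")
      pattern ++= "(" + valueRegex + ")"
      columns += m.group(1)
      last = m.end
    }
    pattern ++= literal(layout.substring(last))
    (pattern.toString.r, columns.result())
  }

  /** Extract partition values from a path relative to the base path, if the layout matches. */
  def extract(layout: String, relativePath: String): Option[Map[String, String]] = {
    val (regex, columns) = compile(layout)
    regex.findPrefixMatchOf(relativePath).map { m =>
      columns.zipWithIndex.map { case (col, i) => col -> m.group(i + 1) }.toMap
    }
  }
}
```

For example, the layout "dt=%dt%/type=%type%/" applied to "dt=20200101/type=a/file.csv" yields dt and type. The layout "%year:[0-9]{4}%%month:[0-9]{2}%" shows the case where two tokens have no delimiter between them and the regexes are needed to split "202001".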
def factory: FromConfigFactory[DataObject]

Returns the factory that can parse this type (that is, type CO).
Typically, implementations of this method should return the companion object of the implementing class. The companion object in turn should implement FromConfigFactory.
returns
the factory (object) for this class.

Definition Classes
SFtpFileRefDataObject → ParsableFromConfig
val fileName: String

Definition of fileName. Default is an asterisk (*) to match everything. This is concatenated with the partition layout to search for files.

Definition Classes
FileRefDataObject
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def getConnection[T <: Connection](connectionId: ConnectionId)(implicit registry: InstanceRegistry, ct: ClassTag[T], tt: scala.reflect.api.JavaUniverse.TypeTag[T]): T

Handle class cast exception when getting objects from instance registry

Attributes
protected
Definition Classes
DataObject
def getConnectionReg[T <: Connection](connectionId: ConnectionId, registry: InstanceRegistry)(implicit ct: ClassTag[T], tt: scala.reflect.api.JavaUniverse.TypeTag[T]): T

Attributes
protected
Definition Classes
DataObject
def getFileRefs(partitionValues: Seq[PartitionValues])(implicit session: SparkSession): Seq[FileRef]

List files for the given partition values.
partitionValues
List of partition values to filter by. If empty, all files in the root path of the DataObject are listed.
returns
List of FileRefs

Definition Classes
SFtpFileRefDataObject → FileRefDataObject
def getPartitionString(partitionValues: PartitionValues)(implicit session: SparkSession): Option[String]

Get the partition values formatted according to the partition layout.

Definition Classes
FileRefDataObject
def getPath(implicit session: SparkSession): String

Method for subclasses to override the base path for this DataObject. This is for instance needed if pathPrefix is defined in a connection.

Definition Classes
FileRefDataObject
def getSearchPaths(partitionValues: Seq[PartitionValues])(implicit session: SparkSession): Seq[(PartitionValues, String)]

Prepare the paths to be searched.

Attributes
protected
Definition Classes
FileRefDataObject
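
A rough sketch of how such search paths could be assembled from the base path, the partition layout and fileName is given below. It is illustrative only. The assumptions are that layout tokens look like %col% or %col:regex%, that a token without a value is replaced by the wildcard "*", and that an empty list of partition values results in a single search path. SearchPathSketch and its methods are hypothetical names, not part of the library.

```scala
import scala.util.matching.Regex

object SearchPathSketch {
  // Matches %col% or %col:regex%; group 1 is the column name.
  private val Token: Regex = "%([A-Za-z0-9_]+)(?::[^%]+)?%".r

  /** Replace each layout token by its partition value (assumption: "*" if no value is given). */
  def partitionString(layout: String, values: Map[String, String]): String =
    Token.replaceAllIn(layout, m => Regex.quoteReplacement(values.getOrElse(m.group(1), "*")))

  /** Base path + separator + formatted partition layout + file name pattern. */
  def searchPath(
      basePath: String,
      layout: Option[String],
      values: Map[String, String],
      fileName: String = "*",
      separator: Char = '/'
  ): String = {
    val base = basePath.stripSuffix(separator.toString)
    val partitionPart = layout.map(partitionString(_, values)).getOrElse("")
    s"$base$separator$partitionPart$fileName"
  }

  /** One search path per requested partition value; a single path if none are requested. */
  def searchPaths(
      basePath: String,
      layout: Option[String],
      partitionValues: Seq[Map[String, String]],
      fileName: String = "*"
  ): Seq[(Map[String, String], String)] = {
    val requested = if (partitionValues.isEmpty) Seq(Map.empty[String, String]) else partitionValues
    requested.map(v => v -> searchPath(basePath, layout, v, fileName))
  }
}
```

In this model the layout is expected to end with the path separator, so that fileName can be appended directly.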
def housekeepingMode: Option[HousekeepingMode]

Configure a housekeeping mode, e.g. to clean up, archive and compact partitions. Default is None.

Definition Classes
DataObject
val id: DataObjectId

A unique identifier for this instance.

Definition Classes
SFtpFileRefDataObject → DataObject → SdlConfigObject
implicit val instanceRegistry: InstanceRegistry
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def listPartitions(implicit session: SparkSession, context: ActionPipelineContext): Seq[PartitionValues]

List partitions on data object's root path

Definition Classes
SFtpFileRefDataObject → CanHandlePartitions
lazy val logger: Logger

Attributes
protected
Definition Classes
SmartDataLakeLogger
val metadata: Option[DataObjectMetadata]

Additional metadata for the DataObject

Definition Classes
SFtpFileRefDataObject → DataObject
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
val partitionLayout: Option[String]

Defines how partition values are extracted from the path. Use "%<colname>%" as a token to extract the value of a partition column. With "%<colname:regex>%", a regex can be given to restrict the match. This is especially useful when no character delimits the last token from the rest of the path, or one token from the next.

Definition Classes
SFtpFileRefDataObject → FileRefDataObject
val partitions: Seq[String]

Definition of partition columns

Definition Classes
SFtpFileRefDataObject → CanHandlePartitions
val path: String

The root path of the files that are handled by this DataObject.

Definition Classes
SFtpFileRefDataObject → FileDataObject
def prepare(implicit session: SparkSession, context: ActionPipelineContext): Unit

Prepare and test the DataObject's prerequisites. This runs during the "prepare" operation of the DAG.

Definition Classes
SFtpFileRefDataObject → FileDataObject → DataObject
def relativizePath(filePath: String)(implicit session: SparkSession): String

Make a given path relative to this DataObject's base path.

Definition Classes
SFtpFileRefDataObject → FileDataObject
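
The idea of relativizing a path can be sketched in a few lines of plain Scala. This is an assumption-laden model rather than the library's code. In particular, returning paths outside the base path unchanged is a choice made for this example, and RelativizeSketch is a hypothetical name.

```scala
object RelativizeSketch {
  /** Strip the base path (and one separator) from the front of filePath. */
  def relativize(basePath: String, filePath: String, separator: Char = '/'): String = {
    val base = basePath.stripSuffix(separator.toString)
    if (filePath == base) ""
    else if (filePath.startsWith(base + separator)) filePath.substring(base.length + 1)
    else filePath // assumption: paths outside the base path are returned as-is
  }
}
```

The relative form is what a partition layout would typically be matched against, since the layout describes the path below the root path.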
val saveMode: SDLSaveMode

Overwrite or Append new data.

Definition Classes
SFtpFileRefDataObject → FileRefDataObject
val separator: Char

Default separator for paths.

Attributes
protected
Definition Classes
FileDataObject
def startWritingOutputStreams(partitionValues: Seq[PartitionValues] = Seq())(implicit session: SparkSession, context: ActionPipelineContext): Unit

This is called before any output stream is created to initialize writing. It is used to apply SaveMode, e.g. deleting existing partitions.

Definition Classes
SFtpFileRefDataObject → CanCreateOutputStream
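
The calling order of the three CanCreateOutputStream hooks on this page (startWritingOutputStreams, then createOutputStream per file, then endWritingOutputStreams) can be modelled with a small in-memory target. The sketch is hypothetical throughout: InMemoryTarget, its fields, and the use of partition directory strings instead of PartitionValues are assumptions for illustration, and no SFTP connection is involved.

```scala
import java.io.{ByteArrayOutputStream, OutputStream}
import scala.collection.mutable

object OutputStreamLifecycleSketch {
  /** Hypothetical in-memory target; files are keyed by their path. */
  final class InMemoryTarget(overwriteMode: Boolean) {
    val files = mutable.LinkedHashMap.empty[String, ByteArrayOutputStream]
    val partitionDirs = mutable.LinkedHashSet.empty[String]

    /** Called once before any stream is created: applies the save mode. */
    def startWriting(partitionDirsToWrite: Seq[String]): Unit =
      if (overwriteMode) {
        // Overwrite: drop existing files of the partitions about to be written.
        val stale = files.keys.filter(p => partitionDirsToWrite.exists(d => p.startsWith(d))).toList
        stale.foreach(p => files.remove(p))
      }

    /** Called once per file to be written. */
    def createOutputStream(path: String): OutputStream = {
      val out = new ByteArrayOutputStream()
      files.put(path, out)
      out
    }

    /** Called once after all streams are written: registers partitions, including empty ones. */
    def endWriting(partitionDirsWritten: Seq[String]): Unit =
      partitionDirs ++= partitionDirsWritten
  }
}
```

A caller would invoke startWriting, write and close each stream, and finish with endWriting, so that a partition without any file is still recorded.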
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toStringShort: String

Definition Classes
DataObject
def translateFileRefs(fileRefs: Seq[FileRef])(implicit session: SparkSession, context: ActionPipelineContext): Seq[FileRefMapping]

Given FileRefs of another DataObject, translate their paths to the root path of this DataObject.

Definition Classes
FileRefDataObject
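
One possible reading of this translation, sketched in plain Scala: each source file keeps its file name and partition values, and only the location changes to the target's root path. FileRefSketch and FileRefMappingSketch are hypothetical stand-ins for FileRef and FileRefMapping, and the partition directory function is an assumption. The library's actual behaviour may differ.

```scala
object TranslateSketch {
  // Hypothetical stand-ins for the library's FileRef and FileRefMapping.
  final case class FileRefSketch(fullPath: String, fileName: String, partitionValues: Map[String, String])
  final case class FileRefMappingSketch(src: FileRefSketch, tgt: FileRefSketch)

  /** Map each source file to a path below targetRoot, keeping file name and partition values. */
  def translate(
      refs: Seq[FileRefSketch],
      targetRoot: String,
      partitionDir: Map[String, String] => String
  ): Seq[FileRefMappingSketch] =
    refs.map { src =>
      val root = targetRoot.stripSuffix("/")
      // partitionDir is expected to return "" or a directory ending with "/".
      val tgtPath = s"$root/${partitionDir(src.partitionValues)}${src.fileName}"
      FileRefMappingSketch(src, src.copy(fullPath = tgtPath))
    }
}
```

The result pairs every source reference with its target, which is the shape a copy step needs to move files one by one.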
def validateSchemaHasPartitionCols(df: DataFrame, role: String): Unit

Validate that the schema of a given Spark DataFrame df contains the specified partition columns.
df
The data frame to validate.
role
Role used in the exception message. Set to read or write.

Definition Classes
CanHandlePartitions
Exceptions thrown
SchemaViolationException if the partition columns are not included.
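
The check itself amounts to a set difference between required and available column names. The sketch below shows that logic without Spark. The schema is reduced to a list of column names, and SchemaCheckSketch, SchemaViolationSketch and the message format are hypothetical.

```scala
object SchemaCheckSketch {
  // Hypothetical stand-in for the library's SchemaViolationException.
  final class SchemaViolationSketch(msg: String) extends RuntimeException(msg)

  /** Throw if any required column is absent from the schema's column names. */
  def validateHasCols(
      schemaCols: Seq[String],
      requiredCols: Seq[String],
      role: String,
      kind: String = "partition"
  ): Unit = {
    val present = schemaCols.toSet
    val missing = requiredCols.filterNot(present)
    if (missing.nonEmpty)
      throw new SchemaViolationSketch(s"($role) DataFrame is missing $kind columns: ${missing.mkString(", ")}")
  }
}
```

Passing the role ("read" or "write") into the message makes it clear on which side of an Action the schema mismatch occurred.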
def validateSchemaHasPrimaryKeyCols(df: DataFrame, primaryKeyCols: Seq[String], role: String): Unit

Validate that the schema of a given Spark DataFrame df contains the specified primary key columns.
df
The data frame to validate.
role
Role used in the exception message. Set to read or write.

Definition Classes
CanHandlePartitions
Exceptions thrown
SchemaViolationException if the primary key columns are not included.
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

Related Docs: object SFtpFileRefDataObject | package dataobject

Inherited from Serializable

Inherited from Product

Inherited from Equals

Inherited from CanCreateOutputStream

Inherited from CanCreateInputStream

Inherited from FileRefDataObject

Inherited from FileDataObject

Inherited from CanHandlePartitions

Inherited from DataObject

Inherited from AtlasExportable

Inherited from SmartDataLakeLogger

Inherited from ParsableFromConfig[DataObject]

Inherited from SdlConfigObject

Inherited from AnyRef

Inherited from Any
