Class RDBMExtractionTableConfig

Package: com.coxautodata.waimak.rdbm.ingestion

case class RDBMExtractionTableConfig(tableName: String, pkCols: Option[Seq[String]] = None, lastUpdatedColumn: Option[String] = None, maxRowsPerPartition: Option[Int] = None, forceRetainStorageHistory: Option[Boolean] = None) extends Product with Serializable

Table configuration used for RDBM extraction

tableName

The name of the table

pkCols

Optionally, the primary key columns for this table (not needed if the RDBMExtractor implementation is capable of discovering this information itself)

lastUpdatedColumn

Optionally, the last updated column for this table (not needed if the RDBMExtractor implementation is capable of discovering this information itself)

maxRowsPerPartition

Optionally, the maximum number of rows to be read per Dataset partition for this table. This number will be used to generate predicates to be passed to org.apache.spark.sql.SparkSession.read.jdbc. If this is not set, the DataFrame will have only one partition, which could result in memory issues when extracting large tables. Be careful not to create too many partitions in parallel on a large cluster; otherwise Spark might crash your external database systems. You can also control the maximum number of JDBC connections opened by limiting the number of executors for your application.
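
To illustrate the idea behind this parameter, the sketch below shows how a row cap can be turned into bounded key-range predicates for Spark's predicate-based JDBC read. This is a hypothetical illustration, not Waimak's internal implementation; the table, column names, bounds, and connection details are all assumptions.

```scala
import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("predicate-sketch").getOrCreate()

// Assumed: an integer primary key "id" with known bounds (e.g. from a
// SELECT min(id), max(id) query) and a cap of 100000 rows per partition.
val maxRowsPerPartition = 100000L
val minId = 1L
val maxId = 1000000L

// One predicate per bounded key range; each yields one DataFrame partition
// (and one JDBC connection) reading at most ~maxRowsPerPartition rows.
val predicates: Array[String] =
  (minId to maxId by maxRowsPerPartition).map { lower =>
    val upper = math.min(lower + maxRowsPerPartition - 1, maxId)
    s"id >= $lower AND id <= $upper"
  }.toArray

val props = new Properties()
props.setProperty("user", "extract_user") // placeholder credentials
props.setProperty("password", "***")

val df = spark.read.jdbc("jdbc:postgresql://dbhost/db", "my_table", predicates, props)
```

With a dense key this gives roughly even partitions; with sparse or skewed keys the actual rows per partition can fall well below the cap.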

forceRetainStorageHistory

Optionally specify whether to retain history for this table in the storage layer. Setting this to anything other than None will override the default behaviour, which is:

  • if there is a lastUpdated column (either specified here or found by the RDBMExtractor), retain all history for this table
  • if there is no lastUpdated column, don't retain history for this table (history is removed when the table is compacted). This default exists because, without a lastUpdatedColumn, the table is extracted in full every time extraction is performed, causing the size of the data in storage to grow uncontrollably
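
For illustration, two configurations might look like the following; the table and column names are hypothetical:

```scala
import com.coxautodata.waimak.rdbm.ingestion.RDBMExtractionTableConfig

// A large table whose metadata the extractor cannot discover itself, so the
// primary key and last-updated column are supplied explicitly. The presence
// of a lastUpdated column means history is retained by default.
val ordersConfig = RDBMExtractionTableConfig(
  tableName = "orders",
  pkCols = Some(Seq("order_id")),
  lastUpdatedColumn = Some("last_modified"),
  maxRowsPerPartition = Some(500000) // cap rows per JDBC partition
)

// A small reference table: rely on the RDBMExtractor for metadata, and
// explicitly disable storage-layer history regardless of the default.
val countryConfig = RDBMExtractionTableConfig(
  tableName = "country_codes",
  forceRetainStorageHistory = Some(false)
)
```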
Linear Supertypes
Serializable, Serializable, Product, Equals, AnyRef, Any

Instance Constructors

  1. new RDBMExtractionTableConfig(tableName: String, pkCols: Option[Seq[String]] = None, lastUpdatedColumn: Option[String] = None, maxRowsPerPartition: Option[Int] = None, forceRetainStorageHistory: Option[Boolean] = None)

Value Members

  1. final def !=(arg0: Any): Boolean
     Definition Classes: AnyRef → Any
  2. final def ##(): Int
     Definition Classes: AnyRef → Any
  3. final def ==(arg0: Any): Boolean
     Definition Classes: AnyRef → Any
  4. final def asInstanceOf[T0]: T0
     Definition Classes: Any
  5. def clone(): AnyRef
     Attributes: protected[java.lang]
     Definition Classes: AnyRef
     Annotations: @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean
     Definition Classes: AnyRef
  7. def finalize(): Unit
     Attributes: protected[java.lang]
     Definition Classes: AnyRef
     Annotations: @throws( classOf[java.lang.Throwable] )
  8. val forceRetainStorageHistory: Option[Boolean]
     Optionally specify whether to retain history for this table in the storage layer (see the class description for the default behaviour)
  9. final def getClass(): Class[_]
     Definition Classes: AnyRef → Any
  10. final def isInstanceOf[T0]: Boolean
      Definition Classes: Any
  11. val lastUpdatedColumn: Option[String]
      Optionally, the last updated column for this table (not needed if the RDBMExtractor implementation is capable of discovering this information itself)
  12. val maxRowsPerPartition: Option[Int]
      Optionally, the maximum number of rows to be read per Dataset partition for this table (see the class description for details)
  13. final def ne(arg0: AnyRef): Boolean
      Definition Classes: AnyRef
  14. final def notify(): Unit
      Definition Classes: AnyRef
  15. final def notifyAll(): Unit
      Definition Classes: AnyRef
  16. val pkCols: Option[Seq[String]]
      Optionally, the primary key columns for this table (not needed if the RDBMExtractor implementation is capable of discovering this information itself)
  17. final def synchronized[T0](arg0: ⇒ T0): T0
      Definition Classes: AnyRef
  18. val tableName: String
      The name of the table
  19. final def wait(): Unit
      Definition Classes: AnyRef
      Annotations: @throws( ... )
  20. final def wait(arg0: Long, arg1: Int): Unit
      Definition Classes: AnyRef
      Annotations: @throws( ... )
  21. final def wait(arg0: Long): Unit
      Definition Classes: AnyRef
      Annotations: @throws( ... )
