Package

com.coxautodata.waimak

storage

package storage

Type Members

  1. trait AuditTable extends AnyRef

    Main abstraction for an audit table that a client application must use to store records with a timestamp. It hides all details of the physical storage, so that client applications can work with various file systems (e.g. HDFS, ADLS, S3, local) or key-value stores (e.g. HBase).

    The abstraction can also produce a snapshot of the data, de-duplicated on the primary key and accurate as of a specified moment in time.

    It also surfaces custom attributes initialised during table creation, so that client applications do not need to store the relevant metadata in separate storage. This also simplifies backup, restore and sharing of data between environments.

    Some storage layers can be quite inefficient at storing many appends across multiple files, and storage optimisation (compaction) should not interfere with the normal operation of the application. The application should therefore be able to control when compaction takes place.

    An instance of AuditTable represents a functional state: once the data has been modified through an instance, do not use that instance again.

    There are 2 types of operations on the table:

    1. data extraction - these do not modify the state of the table, so the same instance of AuditTable can be used for multiple data extraction operations;
    2. data mutators - adding data to the table or optimising storage. These lead to a new state of the underlying storage, and the same instance of AuditTable cannot be used for data mutators again.
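
    The functional-state rule and the point-in-time snapshot can be illustrated with a small in-memory model. This is only a sketch: Record, ToyAuditTable and ToyAuditTableDemo are hypothetical names invented for the illustration and are not part of the Waimak API, and the real AuditTable works against physical storage rather than a Vector.

```scala
// Hypothetical toy model; NOT the Waimak AuditTable API.
// A record carries its primary key, a last-updated time (epoch millis) and a payload.
final case class Record(key: String, lastUpdated: Long, value: String)

final case class ToyAuditTable(records: Vector[Record]) {

  // Data mutator: returns a NEW table state; the old instance should be discarded.
  def append(batch: Seq[Record]): ToyAuditTable =
    ToyAuditTable(records ++ batch)

  // Data extraction: keeps, per key, the latest record with lastUpdated <= ts.
  // It does not change state, so it can be called many times on one instance.
  def snapshot(ts: Long): Seq[Record] =
    records
      .filter(_.lastUpdated <= ts)
      .groupBy(_.key)
      .values
      .map(_.maxBy(_.lastUpdated))
      .toSeq
      .sortBy(_.key)
}

object ToyAuditTableDemo {
  val empty: ToyAuditTable = ToyAuditTable(Vector.empty)
  // Each append yields a new state; continue from the returned instance.
  val t1: ToyAuditTable =
    empty.append(Seq(Record("a", 10L, "v1"), Record("b", 20L, "v1")))
  val t2: ToyAuditTable =
    t1.append(Seq(Record("a", 30L, "v2")))
}
```

    In this model, t2.snapshot(25L) still sees the first version of key "a", while t2.snapshot(30L) sees the second, because the later record only becomes visible once the requested moment reaches its timestamp.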

    Created by Alexei Perelighin on 2018/03/03

  2. class AuditTableFile extends AuditTable with Logging

    Implementation of the AuditTable which is backed by append-only block storage such as HDFS.

    Created by Alexei Perelighin on 2018/03/03

  3. case class AuditTableInfo(table_name: String, primary_keys: Seq[String], meta: Map[String, String], retain_history: Boolean) extends Product with Serializable

    Static information about the table that is persisted when the audit table is initialised.

    table_name

    name of the table

    primary_keys

    list of columns that make up the primary key; these are used for snapshot generation and record deduplication

    meta

    application/custom metadata that will not be used in this library.

    retain_history

    whether to retain history for this table. If set to false, the table will be deduplicated on every compaction
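
    As an illustration of the fields above, the following sketch builds an AuditTableInfo value. The case class is re-declared locally, mirroring the documented signature, only so that the snippet compiles on its own; in a real project it would come from com.coxautodata.waimak.storage. The table name, key column and metadata entry are made-up values.

```scala
// Local mirror of the documented signature, declared here only to keep the
// snippet self-contained; the real class lives in com.coxautodata.waimak.storage.
final case class AuditTableInfo(
    table_name: String,
    primary_keys: Seq[String],
    meta: Map[String, String],
    retain_history: Boolean
)

object AuditTableInfoExample {
  // All values below are illustrative assumptions.
  val info: AuditTableInfo = AuditTableInfo(
    table_name = "person",
    primary_keys = Seq("id"),
    meta = Map("source_system" -> "crm"),
    retain_history = true
  )

  // A variant of the same description without history retention.
  val noHistory: AuditTableInfo = info.copy(retain_history = false)
}
```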

  4. case class AuditTableRegionInfo(table_name: String, store_type: String, store_region: String, created_on: Timestamp, is_deprecated: Boolean, count: Long, max_last_updated: Timestamp) extends Product with Serializable

    table_name

    name of the table

    store_type

    cold or hot. Appended regions are added to hot and, after compaction, move into cold. Cold regions can also be compacted

    store_region

    id of the region; for simplicity, at least for now, it is a GUID

    created_on

    timestamp when region was created as a result of an append or compact operation

    is_deprecated

    true if its data was compacted into another region, false if it was not

    count

    number of records in the region, can be used for optimisation and compaction decisions

    max_last_updated

    all records in the audit table contain a column that shows the last updated time; this is used to generate ingestion queries
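
    The count and max_last_updated fields lend themselves to simple decisions over a list of regions. The sketch below is an assumption-laden illustration: the case class is a local mirror of the documented signature, the string literals "hot" and "cold" for store_type are assumed, and shouldCompact, latestUpdate and the maxHotRecords threshold are hypothetical helpers, not Waimak functions.

```scala
import java.sql.Timestamp

// Local mirror of the documented signature, declared only to keep the snippet self-contained.
final case class AuditTableRegionInfo(
    table_name: String,
    store_type: String,
    store_region: String,
    created_on: Timestamp,
    is_deprecated: Boolean,
    count: Long,
    max_last_updated: Timestamp
)

object RegionInfoExample {
  // Hypothetical rule: compact once the live (non-deprecated) hot regions
  // together hold more than maxHotRecords records.
  def shouldCompact(regions: Seq[AuditTableRegionInfo], maxHotRecords: Long): Boolean =
    regions
      .filter(r => r.store_type == "hot" && !r.is_deprecated)
      .map(_.count)
      .sum > maxHotRecords

  // Latest update time across live regions, usable as a lower bound
  // when building the next ingestion query. None when there are no live regions.
  def latestUpdate(regions: Seq[AuditTableRegionInfo]): Option[Timestamp] =
    regions
      .filterNot(_.is_deprecated)
      .map(_.max_last_updated)
      .reduceOption((a, b) => if (a.after(b)) a else b)

  // Made-up sample data: one live hot region, one deprecated hot region, one cold region.
  val sample: Seq[AuditTableRegionInfo] = Seq(
    AuditTableRegionInfo("person", "hot", "r1", new Timestamp(1L), false, 100L, new Timestamp(10L)),
    AuditTableRegionInfo("person", "hot", "r2", new Timestamp(2L), true, 500L, new Timestamp(50L)),
    AuditTableRegionInfo("person", "cold", "r3", new Timestamp(3L), false, 1000L, new Timestamp(30L))
  )
}
```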

  5. trait CompactionPartitionerGenerator extends AnyRef

  6. trait FileStorageOps extends AnyRef

    Contains operations that interact with physical storage. Will also handle commit to the file system.

    Created by Alexei Perelighin on 2018/03/05

  7. class FileStorageOpsWithStaging extends FileStorageOps with Logging

    Implementation around FileSystem and SparkSession with temporary and trash folders.

  8. case class StorageException(text: String, cause: Throwable = null) extends RuntimeException with Product with Serializable

    Thrown by the storage layer.

    Created by Alexei Perelighin on 2018/03/04

Value Members

  1. object AuditTable

  2. object AuditTableFile extends Logging

  3. object CompactionPartitionerGenerator

  4. object Storage

    Contains methods to create and open tables.

    Created by Alexei Perelighin on 2018/04/11

  5. object StorageActions extends Logging

    Created by Vicky Avison on 11/05/18.

  6. object TotalBytesPartitioner extends CompactionPartitionerGenerator

    A compaction partitioner that partitions on the approximate maximum number of bytes to be in each partition file

  7. object TotalCellsPartitioner extends CompactionPartitionerGenerator

    A compaction partitioner that partitions on the approximate maximum number of cells (numRows * numColumns) to be in each partition file
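
    The arithmetic behind both partitioners can be sketched as a ceiling division of a total size by a per-partition maximum. This is not the Waimak implementation: PartitionCountSketch and its methods are hypothetical, and how the real partitioners estimate total bytes or cells is not described here.

```scala
// Hypothetical sketch of the partition-count arithmetic; NOT the Waimak implementation.
object PartitionCountSketch {

  // Ceiling division, never returning fewer than one partition.
  private def partitionsFor(total: Long, maxPerPartition: Long): Int = {
    require(maxPerPartition > 0, "maxPerPartition must be positive")
    math.max(1L, (total + maxPerPartition - 1) / maxPerPartition).toInt
  }

  // Partition count when limiting the approximate bytes per partition file.
  def byTotalBytes(totalBytes: Long, maxBytesPerPartition: Long): Int =
    partitionsFor(totalBytes, maxBytesPerPartition)

  // Partition count when limiting the approximate cells (numRows * numColumns) per partition file.
  def byTotalCells(numRows: Long, numColumns: Int, maxCellsPerPartition: Long): Int =
    partitionsFor(numRows * numColumns, maxCellsPerPartition)
}
```

    Under these assumptions, 1000 bytes with a 256-byte limit needs 4 partitions, and an empty input still yields a single partition.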
