Include all records between the given timestamps.
Returns None if the storage layer contains no data.
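A minimal in-memory sketch of the range query, for illustration only. The `AuditRecord` and `InMemoryAuditStore` classes are hypothetical, Java's `Optional.empty()` stands in for `None`, and treating both bounds as inclusive is an assumption rather than something the description states.

```java
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical record: a primary key, a payload and the timestamp it was stored with.
final class AuditRecord {
    final String key;
    final String value;
    final Timestamp ts;

    AuditRecord(String key, String value, Timestamp ts) {
        this.key = key;
        this.value = value;
        this.ts = ts;
    }
}

// Hypothetical in-memory stand-in for the storage layer.
final class InMemoryAuditStore {
    private final List<AuditRecord> records = new ArrayList<>();

    void add(AuditRecord r) {
        records.add(r);
    }

    // Returns every record whose timestamp lies in [from, to] (both bounds inclusive,
    // an assumption), or Optional.empty() when the store holds no data at all.
    Optional<List<AuditRecord>> allBetween(Timestamp from, Timestamp to) {
        if (records.isEmpty()) {
            return Optional.empty();
        }
        List<AuditRecord> result = new ArrayList<>();
        for (AuditRecord r : records) {
            if (!r.ts.before(from) && !r.ts.after(to)) {
                result.add(r);
            }
        }
        return Optional.of(result);
    }
}
```

Note the distinction the sketch draws: an empty store yields an empty `Optional`, while a non-empty store with no records in the range yields an empty list.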
Appends a new set of records to the audit table.
Fails if it is called a second time on the same instance.
records to append
Column of type java.sql.Timestamp used to de-duplicate records on the primary keys.
Timestamp of when the append happened. It is not used for de-duplication.
(new state of the AuditTable, count of appended records) or error
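The append contract (a new state plus a count of appended records, and failure on a second call on the same instance) can be sketched with a hypothetical immutable `TableState` class. Modelling the error case as an `IllegalStateException` and records as plain strings are assumptions; de-duplication is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable table state: append() returns a successor state plus the
// number of appended records, and refuses to run twice on the same instance.
final class TableState {
    private final List<String> rows;
    private boolean consumed = false; // set once this state has produced a successor

    TableState(List<String> rows) {
        this.rows = Collections.unmodifiableList(new ArrayList<>(rows));
    }

    List<String> rows() {
        return rows;
    }

    // Result pair: (new state of the table, count of appended records).
    static final class AppendResult {
        final TableState newState;
        final long appended;

        AppendResult(TableState newState, long appended) {
            this.newState = newState;
            this.appended = appended;
        }
    }

    synchronized AppendResult append(List<String> newRows) {
        if (consumed) {
            throw new IllegalStateException("append already called on this instance; use the returned state");
        }
        consumed = true;
        List<String> all = new ArrayList<>(rows);
        all.addAll(newRows);
        return new AppendResult(new TableState(all), newRows.size());
    }
}
```

The caller continues with `newState`; the old instance stays readable but cannot be appended to again.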
Requests optimisation (compaction) of the storage layer.
Fails if it is called a second time on the same instance.
Timestamp of when the compaction is requested. It is not used to filter any of the data.
Maximum age of old region files kept in the .Trash folder after a compaction has happened.
Row-count threshold used to decide which regions are small enough to be compacted.
a partitioner function that dictates how many partitions should be generated for a given region
Whether to recompact all regions regardless of size (i.e. ignore smallRegionRowThreshold)
new state of the AuditTable
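A sketch of how the compaction parameters might interact, under stated assumptions. `Region`, `CompactionPlanner` and all of their members are hypothetical. Regions with strictly fewer rows than `smallRegionRowThreshold` are treated as small. The example partitioner allots one partition per fixed number of rows. Trash files count as expired once they are older than `trashMaxAge`. None of these policies is taken from the source.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongUnaryOperator;

// Hypothetical description of a storage region: an id and its row count.
final class Region {
    final String id;
    final long rows;

    Region(String id, long rows) {
        this.id = id;
        this.rows = rows;
    }
}

final class CompactionPlanner {
    // Selects regions to compact: those with strictly fewer rows than the threshold,
    // or every region when recompactAll is true (threshold ignored).
    static List<Region> selectRegions(List<Region> regions, long smallRegionRowThreshold, boolean recompactAll) {
        List<Region> selected = new ArrayList<>();
        for (Region r : regions) {
            if (recompactAll || r.rows < smallRegionRowThreshold) {
                selected.add(r);
            }
        }
        return selected;
    }

    // Illustrative partitioner: ceil(rows / rowsPerPartition) partitions, never fewer than one.
    static LongUnaryOperator rowsBasedPartitioner(long rowsPerPartition) {
        return rows -> Math.max(1, (rows + rowsPerPartition - 1) / rowsPerPartition);
    }

    // True when a file moved to .Trash at trashedAt is older than trashMaxAge at time now.
    static boolean trashExpired(Instant trashedAt, Instant now, Duration trashMaxAge) {
        return Duration.between(trashedAt, now).compareTo(trashMaxAge) > 0;
    }
}
```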
Returns the latest timestamp of the records stored in the audit table.
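A one-method sketch, assuming the timestamps of the stored records are available as a collection and that an empty table yields an empty `Optional`. `LatestTimestamp` is a hypothetical name.

```java
import java.sql.Timestamp;
import java.util.Collection;
import java.util.Comparator;
import java.util.Optional;

final class LatestTimestamp {
    // Greatest timestamp among the stored records; Optional.empty() when nothing is stored.
    // Timestamp's natural ordering also compares the nanosecond part.
    static Optional<Timestamp> latest(Collection<Timestamp> recordTimestamps) {
        return recordTimestamps.stream().max(Comparator.naturalOrder());
    }
}
```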
Initialises the audit table in the storage layer and also persists all of its metadata (name, primary keys, custom meta) to the storage layer.
new state of the table or error
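To illustrate why persisting the metadata alongside the data helps, here is a sketch that writes the name, primary keys and custom attributes to a `java.util.Properties` document and reads them back. The `TableMetadata` class, its property keys and the use of `Properties` as the stored format are all assumptions, not the storage layer's actual format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical metadata holder. Writing it next to the table's data means a backup
// or copy of the table also carries its name, primary keys and custom attributes.
final class TableMetadata {
    final String name;
    final List<String> primaryKeys;
    final Map<String, String> meta;

    TableMetadata(String name, List<String> primaryKeys, Map<String, String> meta) {
        this.name = name;
        this.primaryKeys = List.copyOf(primaryKeys);
        this.meta = Map.copyOf(meta);
    }

    // Serialises to Properties text; custom attributes get a "meta." prefix so they
    // cannot clash with the built-in keys.
    String store() {
        Properties p = new Properties();
        p.setProperty("name", name);
        p.setProperty("primaryKeys", String.join(",", primaryKeys));
        meta.forEach((k, v) -> p.setProperty("meta." + k, v));
        StringWriter out = new StringWriter();
        try {
            p.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory writer
        }
        return out.toString();
    }

    // Parses the text produced by store() back into a TableMetadata.
    static TableMetadata load(String serialised) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(serialised));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> meta = new LinkedHashMap<>();
        for (String key : p.stringPropertyNames()) {
            if (key.startsWith("meta.")) {
                meta.put(key.substring("meta.".length()), p.getProperty(key));
            }
        }
        String keys = p.getProperty("primaryKeys", "");
        List<String> primaryKeys = keys.isEmpty() ? List.of() : List.of(keys.split(","));
        return new TableMetadata(p.getProperty("name"), primaryKeys, meta);
    }
}
```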
Custom attributes assigned by the client application during table creation.
Generates a snapshot that contains only the latest records as of the given timestamp, de-duplicated on the primary keys.
use records that are closest to this timestamp
Returns None if the storage layer contains no data.
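The de-duplication can be sketched in memory: for each primary key, keep the record with the greatest `lastUpdated` value that is not later than the requested timestamp. `Versioned` and `Snapshots` are hypothetical, timestamps are plain epoch milliseconds, and reading "closest to this timestamp" as "latest at or before it" is an assumption.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical record: primary key, value and the lastUpdated timestamp (epoch millis)
// that drives de-duplication.
final class Versioned {
    final String key;
    final String value;
    final long lastUpdated;

    Versioned(String key, String value, long lastUpdated) {
        this.key = key;
        this.value = value;
        this.lastUpdated = lastUpdated;
    }
}

final class Snapshots {
    // Keeps, for every primary key, the record with the greatest lastUpdated that is
    // not after asOf. Returns Optional.empty() when nothing is stored at all.
    static Optional<Map<String, Versioned>> snapshot(Collection<Versioned> stored, long asOf) {
        if (stored.isEmpty()) {
            return Optional.empty();
        }
        Map<String, Versioned> latest = new HashMap<>();
        for (Versioned r : stored) {
            if (r.lastUpdated > asOf) {
                continue; // newer than the requested moment, so not part of this snapshot
            }
            latest.merge(r.key, r, (a, b) -> b.lastUpdated > a.lastUpdated ? b : a);
        }
        return Optional.of(latest);
    }
}
```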
Name of the table.
Updates the metadata for this table.
the new metadata
new state of the AuditTable
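A sketch of the functional-update style, using a hypothetical `TableHandle` class: updating the metadata returns a new instance and leaves the original untouched. Persisting the change to the storage layer is omitted, and replacing (rather than merging) the metadata map is an assumption.

```java
import java.util.Map;

// Hypothetical immutable table handle: updating metadata yields a new instance and
// leaves the original untouched.
final class TableHandle {
    final String tableName;
    final Map<String, String> meta;

    TableHandle(String tableName, Map<String, String> meta) {
        this.tableName = tableName;
        this.meta = Map.copyOf(meta); // defensive, unmodifiable copy
    }

    // Returns the new state; the receiver keeps its old metadata.
    TableHandle updateMeta(Map<String, String> newMeta) {
        return new TableHandle(tableName, newMeta);
    }
}
```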
Main abstraction for an audit table that a client application must use to store records with a timestamp. It hides all details of the physical storage, so that client applications can use various file systems (e.g. HDFS, ADLS, S3, local) or key-value stores (e.g. HBase).
The abstraction can also produce a snapshot of the data, de-duplicated on the primary key and true to a specified moment in time.
It also surfaces the custom attributes set during table creation, so client applications do not need to store the relevant metadata separately. This also simplifies backup, restore and sharing of data between environments.
Some storage layers might be quite inefficient at storing many appends spread across multiple files, and storage optimisation (compaction) should not interfere with the normal operation of the application. The application should therefore be able to control when compaction takes place.
An instance of AuditTable represents a functional state: once its data has been modified, do not use that instance again; use the new state returned by the modifying operation instead.
There are 2 types of operations on the table:
Created by Alexei Perelighin on 2018/03/03