A class to help with comparing checkpoints with each other, where concurrent writers may have written checkpoints with a different number of parts.
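As an illustration of such a comparison, here is a minimal sketch that orders checkpoints by version and then by part count. The name `CheckpointRef` and the choice to count a single-file checkpoint as one part are assumptions made for this example, not a statement of how the library defines the ordering.

```scala
// Hypothetical stand-in for a checkpoint descriptor; not the library's class.
final case class CheckpointRef(version: Long, numParts: Option[Int])

object CheckpointRef {
  // Compare by version first. At equal versions, compare part counts,
  // treating a single-file checkpoint (None) as having one part (assumption).
  implicit val ordering: Ordering[CheckpointRef] =
    Ordering.by((c: CheckpointRef) => (c.version, c.numParts.getOrElse(1)))
}
```

With this ordering in implicit scope, `Seq(...).max` picks the checkpoint with the highest version, preferring the one with more parts when two writers checkpointed the same version.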
Records information about a checkpoint.
Record metrics about a successful commit.
Thrown when files are added that would have been read by the current transaction.
Thrown when the current transaction deletes data that was deleted by a concurrent transaction.
Thrown when the current transaction reads data that was deleted by a concurrent transaction.
Thrown when concurrent transactions both attempt to update the same idempotent transaction.
Thrown when a concurrent transaction has written data after the current transaction read the table.
The basic class for all Tahoe commit conflict exceptions.
This class keeps track of the versions of commits and their timestamps for a Delta table to help with operations like describing the history of a table.
Used to query the current state of the log as well as modify it by adding new atomic collections of actions.
Internally, this class implements an optimistic concurrency control algorithm to handle multiple readers or writers. Any single read is guaranteed to see a consistent snapshot of the table.
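The optimistic approach can be sketched with a toy in-memory log: a writer reads the latest version, attempts to commit the next one atomically, and retries if another writer got there first. Everything here (`ToyLog`, `tryCommit`, `commitWithRetry`, `snapshotAt`) is hypothetical and unrelated to the real API; an actual implementation would also check for logical conflicts before retrying.

```scala
import scala.annotation.tailrec

// Toy in-memory log; version N is the element at index N. Hypothetical names.
final class ToyLog {
  private var commits = Vector.empty[Seq[String]]

  // Latest committed version, or -1 for an empty log.
  def latestVersion: Long = synchronized { commits.length - 1L }

  // Atomically append `actions` as `version`; returns false if that version
  // is not the next free one (for example, a concurrent writer already took it).
  def tryCommit(version: Long, actions: Seq[String]): Boolean = synchronized {
    if (version != commits.length.toLong) false
    else { commits = commits :+ actions; true }
  }

  // A consistent view of all actions up to and including `version`.
  def snapshotAt(version: Long): Seq[String] = synchronized {
    commits.take(version.toInt + 1).flatten
  }
}

object ToyLog {
  // Optimistic loop: read the latest version, try the next one, retry on conflict.
  @tailrec
  def commitWithRetry(log: ToyLog, actions: Seq[String], attemptsLeft: Int = 3): Option[Long] =
    if (attemptsLeft <= 0) None
    else {
      val next = log.latestVersion + 1
      if (log.tryCommit(next, actions)) Some(next)
      else commitWithRetry(log, actions, attemptsLeft - 1)
    }
}
```

Because each commit is all-or-nothing, a reader calling `snapshotAt` never observes a partially written set of actions.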
Options for the Delta data source.
An identifier for a Delta table containing one of the path or the table identifier.
The specification to time travel a Delta table to the given timestamp or version.
An expression that can be evaluated into a timestamp. The expression cannot be a subquery.
The version of the table to time travel to. Must be >= 0.
The API used to perform time travel, e.g. atSyntax, dfReader, or SQL.
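The three parameters above can be modeled roughly as follows. This is a sketch under assumptions: the name `TimeTravelSpec`, its field names, the use of `java.time.Instant` in place of an unevaluated timestamp expression, and the rule that exactly one of timestamp or version is set are all illustrative choices. Only the `>= 0` constraint on the version comes from the description.

```scala
import java.time.Instant

// Hypothetical model of a time travel request; not the library's class.
final case class TimeTravelSpec(
    timestamp: Option[Instant], // stands in for the timestamp expression
    version: Option[Long],
    source: String) {           // e.g. "atSyntax", "dfReader" or "SQL"

  // Assumption: a request targets either a timestamp or a version, never both.
  require(timestamp.isDefined != version.isDefined,
    "Specify exactly one of timestamp or version")

  // From the description: the version must be >= 0.
  require(version.forall(_ >= 0), "Version must be >= 0")
}
```

Validating in the constructor means an invalid specification fails when it is built rather than later, when the table is resolved.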
An initial snapshot with only metadata specified. Useful for creating a DataFrame from an existing parquet table during its conversion to delta.
Thrown when the metadata of the Delta table has changed between the time of read and the time of commit.
Cleans up expired Delta table metadata.
A helper class in building a helpful error message in case of metadata mismatches.
Used to perform a set of reads in a transaction and then commit a set of updates to the state of the log. All reads from the DeltaLog MUST go through this instance rather than directly to the DeltaLog; otherwise they will not be checked for logical conflicts with concurrent updates.
This class is not thread-safe.
Used to perform a set of reads in a transaction and then commit a set of updates to the state of the log. All reads from the DeltaLog MUST go through this instance rather than directly to the DeltaLog; otherwise they will not be checked for logical conflicts with concurrent updates.
This trait is not thread-safe.
Thrown when the protocol version has changed between the time of read and the time of commit.
Record the state of the table as a checksum file along with a commit.
An immutable snapshot of the state of the log at some delta version. Internally this class manages the replay of actions stored in checkpoint or delta files, given an optional starting snapshot.
After resolving any new actions, it caches the result and collects basic information to the driver.
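The replay of actions onto an optional starting snapshot can be pictured as a fold over commits in version order. The sketch below is deliberately simplified: `ToyAction`, `Add`, `Remove` and `ToyReplay` are hypothetical names, and real log actions carry far more information than a path and a size.

```scala
// Simplified actions for illustration only.
sealed trait ToyAction
final case class Add(path: String, sizeInBytes: Long) extends ToyAction
final case class Remove(path: String) extends ToyAction

object ToyReplay {
  // Apply commits in order on top of `start` (the optional starting snapshot).
  // A later Add replaces an earlier one for the same path; a Remove drops it.
  def replay(
      commits: Seq[Seq[ToyAction]],
      start: Map[String, Add] = Map.empty): Map[String, Add] =
    commits.flatten.foldLeft(start) {
      case (files, a: Add)       => files.updated(a.path, a)
      case (files, Remove(path)) => files - path
    }
}
```

Replaying the commits after a checkpoint on top of the state stored in that checkpoint gives the same result as replaying the whole log from the beginning, which is why a starting snapshot can be supplied.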
Trait with helper functions to generate expressions to update target columns, even if they are nested fields.
Verify the state of the table using the checksum files.
Stats calculated within a snapshot, which we store along individual transactions for verification.
The size of the table in bytes
Number of AddFile actions in the snapshot
Number of Metadata actions in the snapshot
Number of Protocol actions in the snapshot
Number of SetTransaction actions in the snapshot
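To make the five statistics concrete, the following sketch computes them from a list of actions. The action classes are simplified stand-ins that only borrow the names used above, and `ToyStats` with its field names is hypothetical; the real classes have more fields and may be structured differently.

```scala
// Simplified stand-ins named after the action types mentioned above.
sealed trait LogAction
final case class AddFile(path: String, size: Long) extends LogAction
final case class Metadata(id: String) extends LogAction
final case class Protocol(minReaderVersion: Int, minWriterVersion: Int) extends LogAction
final case class SetTransaction(appId: String, version: Long) extends LogAction

// Hypothetical container for the per-snapshot statistics.
final case class ToyStats(
    tableSizeBytes: Long,
    numFiles: Long,
    numMetadata: Long,
    numProtocol: Long,
    numTransactions: Long)

object ToyStats {
  // Sum file sizes and count each action type in one snapshot's actions.
  def from(actions: Seq[LogAction]): ToyStats = ToyStats(
    tableSizeBytes = actions.collect { case a: AddFile => a.size }.sum,
    numFiles = actions.count(_.isInstanceOf[AddFile]).toLong,
    numMetadata = actions.count(_.isInstanceOf[Metadata]).toLong,
    numProtocol = actions.count(_.isInstanceOf[Protocol]).toLong,
    numTransactions = actions.count(_.isInstanceOf[SetTransaction]).toLong)
}
```

Verification then amounts to recomputing the statistics for a snapshot and comparing them with the values stored alongside the corresponding commit.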
Contains the list of reservoir configs and validation checks.
A holder object for Delta errors.
Extractor Object for pulling out the full table scan of a Delta table.
Contains many utility methods that can also be executed on Spark executors.
Exhaustive list of operations that can be performed on a Delta table. These operations are tracked as the first line in delta logs, and power DESCRIBE HISTORY for Delta tables.
Extractor Object for pulling out the table scan of a Delta table. It could be a full scan or a partial scan.
Utilities for DeltaTableIdentifier.
Records information about a checkpoint.
the version of this checkpoint
the number of actions in the checkpoint
the number of parts when the checkpoint has multiple parts. None if this is a singular checkpoint