package actions

Type Members

  1. trait Action extends HyperspaceEventLogging with Logging with ActiveSparkSession

    This is a generic Index-Modifying Action interface. It provides APIs to begin and commit operations which logically lock an index from further operations.

    TODO: Action classes can be revisited to make them more generic to support more types of metadata:

    1. Any metadata-dependent logic should be passed in as functions instead.
    2. IndexLogEntry-specific code should be removed.
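
    The begin/commit behaviour described above can be illustrated with a minimal sketch. Everything in it (LogEntry, InMemoryLog, SketchAction, the state names, and the id arithmetic) is a hypothetical stand-in invented for illustration, not the actual Action API. It assumes that the logical lock comes from claiming the next log id with a transient state, so that a second operation started from the same base id fails.

```scala
object ActionLifecycleSketch {
  // Hypothetical, simplified log entry: an id plus a state name.
  final case class LogEntry(id: Int, state: String)

  // Hypothetical in-memory log. Writing an id that already exists fails,
  // which is what makes "begin" behave like a logical lock in this sketch.
  final class InMemoryLog {
    private var entries = Vector.empty[LogEntry]
    def latestId: Int = entries.lastOption.map(_.id).getOrElse(-1)
    def latest: Option[LogEntry] = entries.lastOption
    def write(entry: LogEntry): Boolean =
      if (entries.exists(_.id == entry.id)) false
      else { entries :+= entry; true }
  }

  // Sketch of an index-modifying action: begin -> op -> commit.
  abstract class SketchAction(log: InMemoryLog) {
    def transientState: String
    def finalState: String
    def op(): Unit

    // The log id this action was planned against.
    private val baseId: Int = log.latestId

    def run(): Unit = {
      // begin: claim the next id with a transient state (the logical lock).
      if (!log.write(LogEntry(baseId + 1, transientState)))
        throw new IllegalStateException("Another operation is in progress")
      op()
      // commit: record the stable final state.
      log.write(LogEntry(baseId + 2, finalState))
      ()
    }
  }
}
```

    In this sketch, two actions created against the same log state cannot both run: the first one claims the next id, and the second fails at begin.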
  2. class CancelAction extends Action

    Cancelling an action. This action is used when an index maintenance operation fails and leaves the index in a hanging intermediate state. For example, if a refresh action fails, the index stays in the REFRESHING state, which prevents further index operations. Cancelling brings the index back to its last known stable state.

    Algorithm:

    • Find the last stable active state log entry
    • Save the next log entry with the contents of the last active state log entry
    • TODO: optionally clean up any partial files created during previous jobs
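
    A minimal sketch of the first two steps, under stated assumptions: LogEntry, the set of stable state names, and the choice to leave the log unchanged when no stable entry exists are all hypothetical and are not taken from the CancelAction implementation.

```scala
object CancelSketch {
  // Hypothetical log entry: id, state name, and an opaque payload.
  final case class LogEntry(id: Int, state: String, content: String)

  // Assumed set of stable states for this sketch only.
  val stableStates: Set[String] = Set("ACTIVE", "DELETED")

  // Find the last stable entry and append a copy of it under the next id.
  // If no stable entry exists, this sketch returns the log unchanged.
  def cancel(log: Vector[LogEntry]): Vector[LogEntry] =
    log.reverseIterator.find(e => stableStates(e.state)) match {
      case Some(stable) => log :+ stable.copy(id = log.last.id + 1)
      case None         => log
    }
}
```

    For a log that ends in a REFRESHING entry preceded by an ACTIVE entry, cancel appends a new entry that carries the ACTIVE entry's state and contents.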
  3. class CreateAction extends CreateActionBase with Action
  4. class DeleteAction extends Action
  5. class OptimizeAction extends CreateActionBase with Action

    Optimize Action provides optimization support for indexes: small index files can be merged into larger ones for better index performance.

    Algorithm outline:

    1. Collect all the currently valid index files.
    2. Split the files into small and large, based on a threshold.
    3. Combine the smaller files bucket-wise into one file per bucket.
    4. Update the index snapshot to remove the small files and keep the large files plus the newly created files.
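
    The outline can be sketched on plain file metadata. IndexFile, Plan, and the merged file names below are hypothetical, and the sketch only models which files end up in the new snapshot; it performs no actual file merging.

```scala
object OptimizePlanSketch {
  // Hypothetical description of one index file.
  final case class IndexFile(name: String, bucket: Int, sizeBytes: Long)

  // Small files grouped by bucket (to be merged) and large files (kept as-is).
  final case class Plan(toMerge: Map[Int, Seq[IndexFile]], toKeep: Seq[IndexFile])

  // Step 2: split files into small and large, based on a threshold.
  def plan(files: Seq[IndexFile], thresholdBytes: Long): Plan = {
    val (small, large) = files.partition(_.sizeBytes < thresholdBytes)
    Plan(small.groupBy(_.bucket), large)
  }

  // Steps 3 and 4: one merged file per bucket, plus the untouched large files.
  def newSnapshot(p: Plan): Seq[IndexFile] = {
    val merged = p.toMerge.toSeq.sortBy(_._1).map { case (bucket, fs) =>
      IndexFile(s"merged-bucket-$bucket", bucket, fs.map(_.sizeBytes).sum)
    }
    p.toKeep ++ merged
  }
}
```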

    NOTE: This is an index-only operation. It does not look at the current state of the data at all. If the data was changed after index creation, optimize will NOT include the data changes.

    Available modes:

    • Quick mode: allows for fast optimization. Files smaller than a predefined threshold, "spark.hyperspace.index.optimize.fileSizeThreshold", are picked for compaction.
    • Full mode: allows for slow but complete optimization. ALL index files are picked for compaction.
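
    The difference between the two modes comes down to which files are picked for compaction. The sketch below expresses that selection on file sizes only. The Mode type and the candidates function are hypothetical, and the threshold is passed in as a plain argument rather than read from the configuration key named above.

```scala
object OptimizeModeSketch {
  // Hypothetical representation of the two modes.
  sealed trait Mode
  case object Quick extends Mode
  case object Full extends Mode

  // Pick the file sizes (in bytes) that would be compacted in each mode.
  def candidates(sizes: Seq[Long], mode: Mode, thresholdBytes: Long): Seq[Long] =
    mode match {
      case Quick => sizes.filter(_ < thresholdBytes) // only files below the threshold
      case Full  => sizes                            // every index file
    }
}
```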

  6. class RefreshAction extends RefreshActionBase

    The Index refresh action is used to perform a full rebuild of the index. Consequently, it ends up creating a new version of the index and involves a full scan of the underlying source data.

  7. class RefreshIncrementalAction extends RefreshActionBase

    Action to refresh indexes with newly appended files and deleted files in an incremental way.

    Appended files, that is, data newly arrived in the original source dataset (more specifically, under rootPaths), are handled as follows:

    • Identify the newly added data files.
    • Create a new index version on these files.
    • Update the metadata to reflect the latest snapshot of the index. This snapshot includes all the old and the newly created index files. The source content points to the latest data files.

    Deleted files, that is, original source data files removed between the previous version of the index and now, are handled as follows:

    • Identify the deleted source data files.
    • Use the lineage of the index records to remove any index entry that came from those deleted source data files.
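
    Both cases can be sketched on in-memory collections. IndexRow, Diff, and the readFile function are hypothetical stand-ins: the sketch assumes that each index row records the source file it came from (its lineage), and it models the new index version as rows appended to the surviving old rows.

```scala
object IncrementalRefreshSketch {
  // Hypothetical index row: an indexed key plus its lineage (the source file).
  final case class IndexRow(key: String, sourceFile: String)

  // Source files added and removed since the index was last built.
  final case class Diff(appended: Set[String], deleted: Set[String])

  // Compare the files recorded at index time with the files present now.
  def diff(indexedFiles: Set[String], currentFiles: Set[String]): Diff =
    Diff(currentFiles -- indexedFiles, indexedFiles -- currentFiles)

  // Drop rows whose lineage points at a deleted file, then index only the
  // appended files. readFile is a hypothetical reader returning a file's keys.
  def refresh(rows: Seq[IndexRow], d: Diff, readFile: String => Seq[String]): Seq[IndexRow] = {
    val kept = rows.filterNot(r => d.deleted(r.sourceFile))
    val added = d.appended.toSeq.sorted.flatMap(f => readFile(f).map(k => IndexRow(k, f)))
    kept ++ added
  }
}
```

    Files that are neither appended nor deleted are never re-read, which is what distinguishes this sketch from a full rebuild.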

  8. class RefreshQuickAction extends RefreshActionBase

    Action to refresh index metadata only with newly appended files and deleted files.

  9. class RestoreAction extends Action
  10. class VacuumAction extends Action

Value Members

  1. object Constants
