Class OnlineIndexer

  • All Implemented Interfaces:
    AutoCloseable

    @API(UNSTABLE)
    public class OnlineIndexer
    extends Object
    implements AutoCloseable
    Builds an index online, i.e., concurrently with other database operations. In order to minimize the impact that this build has on other operations, this attempts to minimize the priorities of its transactions. Additionally, it attempts to limit the amount of work it does in each transaction, and that limit decreases as the number of failures for a given build attempt increases.

    As ranges of elements are rebuilt, the fact that the range has been rebuilt is added to a RangeSet associated with the index being built. This RangeSet is used to (a) coordinate work between different builders that might be running on different machines, ensuring that the same work isn't duplicated, and (b) make sure that non-idempotent indexes (like COUNT or SUM_LONG) don't update themselves (or fail to update themselves) incorrectly.

    Unlike many other features in the Record Layer core, this has a retry loop.

    Build an index immediately in the current transaction:

    
     try (OnlineIndexer indexBuilder = OnlineIndexer.forRecordStoreAndIndex(recordStore, "newIndex")) {
         indexBuilder.rebuildIndex(recordStore);
     }
     

    Build an index synchronously across multiple transactions:

    
     try (OnlineIndexer indexBuilder = OnlineIndexer.forRecordStoreAndIndex(recordStore, "newIndex")) {
         indexBuilder.buildIndex();
     }
     
    • Field Detail

      • DEFAULT_LIMIT

        public static final int DEFAULT_LIMIT
        Default number of records to attempt to run in a single transaction.
        See Also:
        Constant Field Values
      • DEFAULT_WRITE_LIMIT_BYTES

        public static final int DEFAULT_WRITE_LIMIT_BYTES
        Default transaction write size limit. Note that the actual write might be "a little" bigger.
        See Also:
        Constant Field Values
      • DEFAULT_RECORDS_PER_SECOND

        public static final int DEFAULT_RECORDS_PER_SECOND
        Default limit to the number of records to attempt in a single second.
        See Also:
        Constant Field Values
      • DEFAULT_MAX_RETRIES

        public static final int DEFAULT_MAX_RETRIES
        Default number of times to retry a single range rebuild.
        See Also:
        Constant Field Values
      • DEFAULT_PROGRESS_LOG_INTERVAL

        public static final int DEFAULT_PROGRESS_LOG_INTERVAL
        Default interval, in milliseconds, at which to log successful progress when building across transactions. -1 means that progress will not be logged.
        See Also:
        Constant Field Values
      • DEFAULT_LEASE_LENGTH_MILLIS

        public static final long DEFAULT_LEASE_LENGTH_MILLIS
        Default length of time, in milliseconds, between the last access and the lease's end time.
        See Also:
        Constant Field Values
      • UNLIMITED

        public static final int UNLIMITED
        Constant indicating that there should be no limit to some usually limited operation.
        See Also:
        Constant Field Values
    • Method Detail

      • getLimit

        public int getLimit()
        Get the current number of records to process in one transaction. This may go up or down while throttledRunAsync(Function, BiFunction, BiConsumer, List) is running, if there are failures committing or repeated successes.
        Returns:
        the current number of records to process in one transaction
      • buildRange

        @Nonnull
        public CompletableFuture<Void> buildRange​(@Nonnull
                                                  FDBRecordStore store,
                                                  @Nullable
                                                  Key.Evaluated start,
                                                  @Nullable
                                                  Key.Evaluated end)
        Builds (transactionally) the index by adding records with primary keys within the given range. This will look for gaps of keys within the given range that haven't yet been rebuilt and then rebuild only those ranges. As a result, if this method is called twice, the first call will build whatever needs to be built, and the second call will notice that there are no ranges left to build, so it will do nothing. In this way, it is idempotent and thus safe to use in retry loops. This method will fail if there is too much work to be done in a single transaction. To build a range that does not fit in a single transaction, use the buildRange() overload that does not take an FDBRecordStore, which runs a retry loop across multiple transactions.
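
        The gap-finding behavior described above can be illustrated with a minimal sketch. It is not the Record Layer's RangeSet: BuiltRangesSketch and its methods are hypothetical names, primary keys are simplified to long values, and the state lives in memory rather than in the database. It only shows why a second call over the same range finds nothing left to build.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy stand-in for a set of built ranges over long keys; NOT the Record Layer RangeSet. */
final class BuiltRangesSketch {
    // Maps the inclusive start of each built range to its exclusive end.
    // Ranges are kept disjoint by merging overlapping or adjacent ranges on insert.
    private final TreeMap<Long, Long> built = new TreeMap<>();

    /** Records [start, end) as built, merging with any ranges it touches. */
    void markBuilt(long start, long end) {
        if (start >= end) {
            return;
        }
        long newStart = start;
        long newEnd = end;
        Map.Entry<Long, Long> floor = built.floorEntry(start);
        if (floor != null && floor.getValue() >= start) {
            newStart = floor.getKey();
            newEnd = Math.max(newEnd, floor.getValue());
        }
        // Absorb every built range that starts inside (or right at the end of) the new range.
        Map<Long, Long> overlapped = built.subMap(newStart, true, end, true);
        for (long overlappedEnd : overlapped.values()) {
            newEnd = Math.max(newEnd, overlappedEnd);
        }
        overlapped.clear();
        built.put(newStart, newEnd);
    }

    /** Returns the sub-ranges of [start, end) that have not been marked built. */
    List<long[]> missingRanges(long start, long end) {
        List<long[]> gaps = new ArrayList<>();
        if (start >= end) {
            return gaps;
        }
        long cursor = start;
        // A built range beginning at or before 'start' may cover the front of the query.
        Map.Entry<Long, Long> floor = built.floorEntry(start);
        if (floor != null && floor.getValue() > cursor) {
            cursor = floor.getValue();
        }
        for (Map.Entry<Long, Long> e : built.subMap(start, false, end, false).entrySet()) {
            if (e.getKey() > cursor) {
                gaps.add(new long[] {cursor, e.getKey()});
            }
            cursor = Math.max(cursor, e.getValue());
        }
        if (cursor < end) {
            gaps.add(new long[] {cursor, end});
        }
        return gaps;
    }

    /** "Builds" every gap in [start, end) and returns how many keys were processed. */
    long buildRange(long start, long end) {
        long processed = 0;
        for (long[] gap : missingRanges(start, end)) {
            processed += gap[1] - gap[0]; // stand-in for indexing the records in the gap
            markBuilt(gap[0], gap[1]);
        }
        return processed;
    }
}
```

        With [10, 20) already marked as built, buildRange(0, 30) in this sketch processes the 20 remaining keys, and an immediate second call processes none.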
        Parameters:
        store - the record store in which to rebuild the range
        start - the (inclusive) beginning primary key of the range to build (or null to go from the beginning)
        end - the (exclusive) end primary key of the range to build (or null to go to the end)
        Returns:
        a future that will be ready when the build has completed
      • buildRange

        @Nonnull
        public CompletableFuture<Void> buildRange​(@Nullable
                                                  Key.Evaluated start,
                                                  @Nullable
                                                  Key.Evaluated end)
        Builds (with a retry loop) the index by adding records with primary keys within the given range. This will look for gaps of keys within the given range that haven't yet been rebuilt and then rebuild only those ranges. It will also limit each transaction to the number of records specified by the limit parameter of this class's constructor. If that limit is too high (i.e., it can't make any progress or errors out on a non-retriable error like transaction_too_large), this method will decrease the limit so that less work is attempted in each transaction. It will also rate limit itself so as not to make too many requests per second.

        Note that it does not have the protections (synchronized sessions and index state precondition) which are imposed on buildIndexAsync() (or its variations), but it does use the created synchronized session if a buildIndexAsync() is running on the OnlineIndexer simultaneously or this range build is used as part of buildIndexAsync() internally.
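
        The shrinking limit and the rate limit can be pictured with a small, self-contained sketch. Everything in it is an assumption made for illustration: the class AdaptiveLimitSketch, the three-quarters shrink factor, and the way the delay is derived from a records-per-second budget are not taken from this page and may differ from what OnlineIndexer actually does.

```java
import java.util.function.IntPredicate;

/** Toy model of a shrinking per-transaction limit; not the OnlineIndexer implementation. */
final class AdaptiveLimitSketch {
    private int limit;
    private final int maxRetries;

    AdaptiveLimitSketch(int initialLimit, int maxRetries) {
        if (initialLimit < 1 || maxRetries < 0) {
            throw new IllegalArgumentException("limit must be >= 1 and maxRetries >= 0");
        }
        this.limit = initialLimit;
        this.maxRetries = maxRetries;
    }

    int getLimit() {
        return limit;
    }

    /**
     * Calls 'attempt' with the current limit until it reports success.
     * 'attempt' is a hypothetical stand-in for one transaction: it returns true
     * if a transaction covering that many records committed.
     */
    int runWithRetries(IntPredicate attempt) {
        for (int tries = 0; tries <= maxRetries; tries++) {
            if (attempt.test(limit)) {
                return limit;
            }
            // Assumed policy: shrink to three quarters, but never below one record.
            limit = (int) Math.max(1L, 3L * limit / 4);
        }
        throw new IllegalStateException("gave up after " + maxRetries + " retries");
    }

    /** Assumed pacing rule: wait long enough that the batch fits the per-second budget. */
    static long delayMillis(int recordsProcessed, int recordsPerSecond) {
        if (recordsPerSecond < 1) {
            throw new IllegalArgumentException("recordsPerSecond must be >= 1");
        }
        return 1000L * Math.max(0, recordsProcessed) / recordsPerSecond;
    }
}
```

        Starting from a limit of 100 with a transaction that only commits at 50 records or fewer, the sketch tries 100, 75, 56, and then succeeds at 42.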

        Parameters:
        start - the (inclusive) beginning primary key of the range to build (or null to go from the beginning)
        end - the (exclusive) end primary key of the range to build (or null to go to the end)
        Returns:
        a future that will be ready when the build has completed
      • buildUnbuiltRange

        @Nonnull
        public CompletableFuture<Key.Evaluated> buildUnbuiltRange​(@Nonnull
                                                                  FDBRecordStore store,
                                                                  @Nullable
                                                                  Key.Evaluated start,
                                                                  @Nullable
                                                                  Key.Evaluated end)
        Builds (transactionally) the index by adding records with primary keys within the given range. This requires that the range is initially "unbuilt", i.e., no records within the given range have yet been processed by the index build job. It is acceptable if there are records within that range that have already been added to the index because they were added to the store after the index was added in write-only mode but have not yet been processed by the index build job.

        Note that this function is not idempotent: if the first run fails with commit_unknown_result but the transaction actually succeeds, running this function again will result in an OnlineIndexer.RecordBuiltRangeException being thrown the second time. Retry loops used by the OnlineIndexer class that call this method handle this contingency.

        For the most part, this method should only be used by those who know what they are doing. It is included because it is less expensive to make this call if one already knows that the range will be unbuilt, but the caller must be ready to handle the circumstance that the range might already be built on a second attempt. Most users should use the buildRange() method with the same parameters when they want to build a range of keys into the index. That method is idempotent, but it is slightly more costly as it first determines which ranges have not yet been built before building them.
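
        The non-idempotent case can be simulated without a database. The sketch below is one possible way for a caller to cope with it, not the handling used inside OnlineIndexer: the class, both exception types, and the "lose the next acknowledgement" switch are hypothetical stand-ins for RecordBuiltRangeException and commit_unknown_result.

```java
import java.util.HashSet;
import java.util.Set;

/** Simulates a build whose commit succeeds while its acknowledgement is lost. */
final class UnbuiltRangeRetrySketch {
    /** Hypothetical stand-in for OnlineIndexer.RecordBuiltRangeException. */
    static final class AlreadyBuiltException extends RuntimeException {
        AlreadyBuiltException(String range) {
            super("range already built: " + range);
        }
    }

    /** Hypothetical stand-in for a commit_unknown_result error. */
    static final class CommitUnknownException extends RuntimeException {
    }

    private final Set<String> builtRanges = new HashSet<>();
    private boolean loseNextAck;

    UnbuiltRangeRetrySketch(boolean loseNextAck) {
        this.loseNextAck = loseNextAck;
    }

    /** Non-idempotent: fails if the range has already been recorded as built. */
    void buildUnbuilt(String range) {
        if (!builtRanges.add(range)) {
            throw new AlreadyBuiltException(range);
        }
        if (loseNextAck) {
            // The write above "committed", but the caller is told the outcome is unknown.
            loseNextAck = false;
            throw new CommitUnknownException();
        }
    }

    /** Retries, treating "already built" after an unknown commit as success. */
    boolean buildWithRetry(String range, int maxAttempts) {
        boolean sawUnknownCommit = false;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                buildUnbuilt(range);
                return true;
            } catch (CommitUnknownException e) {
                sawUnknownCommit = true;
            } catch (AlreadyBuiltException e) {
                if (sawUnknownCommit) {
                    return true; // our own earlier attempt must have committed
                }
                throw e; // someone else built it, or the caller's assumption was wrong
            }
        }
        return false;
    }
}
```

        A caller that never saw an unknown commit still receives the exception, which matches the requirement that the range be unbuilt on entry.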
        Parameters:
        store - the record store in which to rebuild the range
        start - the (inclusive) beginning primary key of the range to build (or null to start from the beginning)
        end - the (exclusive) end primary key of the range to build (or null to go to the end)
        Returns:
        a future with the key of the first record not processed by this range rebuild
        Throws:
        OnlineIndexer.RecordBuiltRangeException - if the given range contains keys already processed by the index build
      • rebuildIndexAsync

        @Nonnull
        public CompletableFuture<Void> rebuildIndexAsync​(@Nonnull
                                                         FDBRecordStore store)
        Transactionally rebuild an entire index. This will (1) delete any data in the index that is already there and (2) rebuild the entire key range for the given index. It will attempt to do this within a single transaction, and it may fail if there are too many records, so this is only safe to do for small record stores. Many large use-cases should use the buildIndexAsync() method along with temporarily changing an index to write-only mode while the index is being rebuilt.
        Parameters:
        store - the record store in which to rebuild the index
        Returns:
        a future that will be ready when the build has completed
      • buildEndpoints

        @Nonnull
        public CompletableFuture<TupleRange> buildEndpoints​(@Nonnull
                                                            FDBRecordStore store)
        Builds (transactionally) the endpoints of an index. This means that it builds everything from the beginning of the key space to the first record and everything from the last record to the end of the key space. There won't be any records within these ranges (except for the last record of the record store), but it does mean that any records added to these ranges in the future will correctly update the index. This means, e.g., that if the workload primarily adds records to the record store after the current last record (perhaps because the primary key is based on an atomic counter or the current time), running this method will be highly contentious, but once it completes, the rest of the index build should happen without any more conflicts. This will return a (possibly null) TupleRange that contains the primary keys of the first and last records within the record store. This can then be used either to build the range right away or to divvy up the remaining ranges between multiple agents working in parallel if one desires.
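
        Dividing the interior range between agents can be sketched as follows. This is an illustration under simplifying assumptions: primary keys are treated as long values rather than tuples, the keys are assumed to be evenly distributed, and RangeSplitSketch is a hypothetical helper that is not part of the Record Layer.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a numeric key range into contiguous pieces; a simplification of tuple key ranges. */
final class RangeSplitSketch {
    /**
     * Splits [first, last) into at most 'agents' contiguous, non-overlapping
     * sub-ranges whose low endpoint is inclusive and high endpoint is exclusive.
     * Assumes (last - first) * agents does not overflow a long.
     */
    static List<long[]> split(long first, long last, int agents) {
        if (agents < 1) {
            throw new IllegalArgumentException("agents must be >= 1");
        }
        List<long[]> parts = new ArrayList<>();
        long total = last - first;
        if (total <= 0) {
            return parts; // nothing in the interior to hand out
        }
        long pieces = Math.min(agents, total); // never create an empty piece
        long begin = first;
        for (long i = 1; i <= pieces; i++) {
            long end = first + total * i / pieces;
            parts.add(new long[] {begin, end});
            begin = end;
        }
        return parts;
    }
}
```

        Each resulting pair could then be handed to a separate agent that builds only its own range.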
        Parameters:
        store - the record store in which to rebuild the index
        Returns:
        a future that will contain the range of records in the interior of the record store
      • buildEndpoints

        @Nonnull
        public CompletableFuture<TupleRange> buildEndpoints()
        Builds (with a retry loop) the endpoints of an index. See the buildEndpoints() method that takes an FDBRecordStore as its parameter for more details. This will retry that method until it gets a non-exceptional result and then return that result.
        Returns:
        a future that will contain the range of records in the interior of the record store
      • stopOngoingOnlineIndexBuildsAsync

        public CompletableFuture<Void> stopOngoingOnlineIndexBuildsAsync()
        Stop any ongoing online index build (only if it uses SynchronizedSessions) by forcefully releasing the lock.
        Returns:
        a future that will be ready when the lock is released
        See Also:
        SynchronizedSession.endAnySession(com.apple.foundationdb.Transaction)
      • stopOngoingOnlineIndexBuilds

        public static void stopOngoingOnlineIndexBuilds​(@Nonnull
                                                        FDBRecordStore recordStore,
                                                        @Nonnull
                                                        Index index)
        Stop any ongoing online index build (only if it uses SynchronizedSessions) by forcefully releasing the lock.
        Parameters:
        recordStore - record store whose index builds need to be stopped
        index - the index whose builds need to be stopped
      • checkAnyOngoingOnlineIndexBuilds

        public boolean checkAnyOngoingOnlineIndexBuilds()
        Synchronous/blocking version of checkAnyOngoingOnlineIndexBuildsAsync().
        Returns:
        true if the index is being built and false otherwise
      • checkAnyOngoingOnlineIndexBuildsAsync

        public CompletableFuture<Boolean> checkAnyOngoingOnlineIndexBuildsAsync()
        Check if the index is being built by any of the OnlineIndexers (only if they use SynchronizedSessions), including this OnlineIndexer.
        Returns:
        a future that will complete to true if the index is being built and false otherwise
      • checkAnyOngoingOnlineIndexBuildsAsync

        public static CompletableFuture<Boolean> checkAnyOngoingOnlineIndexBuildsAsync​(@Nonnull
                                                                                       FDBRecordStore recordStore,
                                                                                       @Nonnull
                                                                                       Index index)
        Check if the index is being built by any of the OnlineIndexers (only if they use SynchronizedSessions).
        Parameters:
        recordStore - record store whose index builds need to be checked
        index - the index to check for ongoing index builds
        Returns:
        a future that will complete to true if the index is being built and false otherwise
      • buildIndexAsync

        @Nonnull
        public CompletableFuture<Void> buildIndexAsync()
        Builds an index across multiple transactions.

        If it is set to use synchronized sessions, it stops with SynchronizedSessionLockedException when there is another runner actively working on the same index. It first checks and updates the index state and clears index data, respecting the OnlineIndexer.IndexStatePrecondition that has been set. It then builds the index across multiple transactions, honoring the rate-limiting parameters set in the constructor of this class. It also retries any retriable errors that it encounters while it runs the build. At the end, it marks the index readable in the store.

        One may consider setting the index state precondition to OnlineIndexer.IndexStatePrecondition.ERROR_IF_DISABLED_CONTINUE_IF_WRITE_ONLY and OnlineIndexer.Builder.setUseSynchronizedSession(boolean) to false, which makes the indexer behave as it did before version 2.8.90.0, but this is not recommended.

        Returns:
        a future that will be ready when the build has completed
        Throws:
        com.apple.foundationdb.synchronizedsession.SynchronizedSessionLockedException - the build is stopped because there may be another build running actively on this index.
      • buildIndex

        public void buildIndex​(boolean markReadable)
        Builds an index across multiple transactions. Synchronous version of buildIndexAsync().
        Parameters:
        markReadable - whether to mark the index as readable after building the index
      • buildIndex

        public void buildIndex()
        Builds an index across multiple transactions. Synchronous version of buildIndexAsync().
      • splitIndexBuildRange

        @API(EXPERIMENTAL)
        @Nonnull
        public List<org.apache.commons.lang3.tuple.Pair<Tuple,​Tuple>> splitIndexBuildRange​(int minSplit,
                                                                                                 int maxSplit)
        Split the index build range to support building an index across multiple transactions in parallel if needed.

        It is blocking and should not be called in asynchronous contexts.

        Parameters:
        minSplit - the minimum number of ranges; the range is not split if it cannot be split into at least minSplit ranges
        maxSplit - the maximum number of splits generated
        Returns:
        a list of split primary key ranges (the low endpoint is inclusive and the high endpoint is exclusive)
      • markReadableIfBuilt

        @API(EXPERIMENTAL)
        @Nonnull
        public CompletableFuture<Boolean> markReadableIfBuilt()
        Mark the index as readable if it is built.
        Returns:
        a future that will complete to true if the index is readable and false otherwise
      • markReadable

        @API(EXPERIMENTAL)
        @Nonnull
        public CompletableFuture<Boolean> markReadable()
        Mark the index as readable.
        Returns:
        a future that will either complete exceptionally if the index cannot be made readable or will contain true if the store was modified and false otherwise
      • asyncToSync

        @API(INTERNAL)
        public <T> T asyncToSync​(@Nonnull
                                 StoreTimer.Wait event,
                                 @Nonnull
                                 CompletableFuture<T> async)
        Wait for an asynchronous task to complete. This returns the result from the future or propagates the error if the future completes exceptionally.
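
        The general pattern of blocking on a future and propagating its error can be sketched with the standard library alone. This is a simplified illustration, not the method's implementation: AsyncToSyncSketch and await are hypothetical names, and the sketch omits the StoreTimer.Wait event that the real method takes for instrumentation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

/** Blocks on a future and rethrows the underlying failure instead of its wrapper. */
final class AsyncToSyncSketch {
    static <T> T await(CompletableFuture<T> async) {
        try {
            return async.join();
        } catch (CompletionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException) {
                throw (RuntimeException) cause; // surface the original unchecked error
            }
            if (cause instanceof Error) {
                throw (Error) cause;
            }
            // Checked causes (or a missing cause) stay wrapped in an unchecked exception.
            throw new RuntimeException(cause != null ? cause : e);
        }
    }
}
```

        Because this blocks the calling thread, a helper like it belongs at the boundary between synchronous and asynchronous code rather than inside an asynchronous callback.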
        Type Parameters:
        T - the task's return type
        Parameters:
        event - the event being waited on (for instrumentation purposes)
        async - the asynchronous task to wait on
        Returns:
        the result of the asynchronous task
      • forRecordStoreAndIndex

        @Nonnull
        public static OnlineIndexer forRecordStoreAndIndex​(@Nonnull
                                                           FDBRecordStore recordStore,
                                                           @Nonnull
                                                           String index)
        Create an online indexer for the given record store and index.
        Parameters:
        recordStore - record store in which to index
        index - name of index to build
        Returns:
        a new online indexer