Class BunchedMap<K,​V>

  • Type Parameters:
    K - type of keys in the map
    V - type of values in the map

    @API(EXPERIMENTAL)
    public class BunchedMap<K,​V>
    extends Object
    An implementation of a FoundationDB-backed map that bunches close keys together to minimize the overhead of storing keys with a common prefix. The most straight-forward way to store a map in FoundationDB is to store one key-value pair in the some subspace of the database for each key and value of the map. However, this can lead to problems if there are either too many keys or if the subspace prefix is too large as that prefix will be repeated many times (once for each key in the map).

    This structure "bunches" adjacent keys together so that one key in the database is responsible for storing multiple entries in the map, which effectively amortizes the cost of this subspace prefix across multiple map entries. In particular, the map will choose "signpost" keys in the map. For each signpost, a key in the database is constructed that is the subspace prefix concatenated with the serialized key. This key is then responsible for storing every entry in the map for which the key is greater than or equal to the signpost key but less than the next signpost key. The signposts are chosen dynamically as keys are added and removed from the map. In particular, there is a target "bunch size" that is a parameter to the map, and upon inserting, the map will see if there is a bunch that the given key can be placed in without exceeding the bunch size. If not, it will create one by adding a new signpost key.

    The cost for bunching entries this way is that it requires that the client perform additional database reads while inserting, so mutations have a higher latency than under the simpler scheme, and two clients attempting to modify the same map are likely to experience contention. It is also more expensive to read a single key from the map (as the read now will also read the data for keys in the same bunch as the desired key). A full scan of the map requires less data be transferred over the wire as the subspace prefix can be sent fewer times, so scan-heavy use-cases might not experience much of an overhead at all.

    Most methods of this class take a subspace. For the most part, these methods assume that there is one durable instance of a BunchedMap within the bounds of the subspace provided. The exception to this is the scanMulti(ReadTransaction, Subspace, SubspaceSplitter, byte[], byte[], byte[], int, boolean) scanMulti()} family of methods. See the documentation on those methods for more information.

    This class is not thread-safe in the general case. Assuming that the serializer and key-comparator are both thread-safe, this class is safe to use from multiple transactions at once or with multiple subspaces concurrently within a single transaction. However, it is unsafe to modify two keys within the same subspace in the same transaction from multiple threads.

    • Constructor Detail

      • BunchedMap

        public BunchedMap​(@Nonnull
                          BunchedSerializer<K,​V> serializer,
                          @Nonnull
                          Comparator<K> keyComparator,
                          int bunchSize)
        Create a bunched map with the given serializer, key comparator, and bunch size. The provided serializer is used to serialize keys and values when writing to the database and to deserialize them when reading. The comparator is used to maintain keys in a sorted order. The sorted order of keys, however, should be consistent with the byte order of serialized keys (when using unsigned lexicographic comparison), as that comparison method is used by the map when it is more efficient. The bunch size is the maximum number of map keys within any bunch of keys within the database. This value is not stored durably in the database, and it is safe to change this value over time (and to have different writers using different values for the bunch size concurrently), though one writer might undo the work of another writer or make different decisions when splitting up values or adding to bunches.
        Parameters:
        serializer - serialize to use when reading or writing data
        keyComparator - comparator used to order keys
        bunchSize - maximum size of bunch within the database
      • BunchedMap

        protected BunchedMap​(@Nonnull
                             BunchedMap<K,​V> model)
        Copy constructor for BunchedMaps. This will create a new BunchedMap with the same internal state as the original one. Because the BunchedMap has entirely immutable state, the primary use case for this is for extending subclasses to ensure that the class gets initialized correctly, which is why this is not a public method.
        Parameters:
        model - original BunchedMap to base the new one on
    • Method Detail

      • instrumentRangeRead

        @Nonnull
        protected CompletableFuture<List<KeyValue>> instrumentRangeRead​(@Nonnull
                                                                        CompletableFuture<List<KeyValue>> readFuture)
        Instrument a range read. The base implementation only returns the original future, but extenders are encouraged to override this method with their own implementations that, for example, records the total numbers of keys read and their sizes.
        Parameters:
        readFuture - a future that will complete to a list of keys and values
        Returns:
        an instrumented future that returns the same values as the original future
      • instrumentWrite

        protected void instrumentWrite​(@Nonnull
                                       byte[] key,
                                       @Nonnull
                                       byte[] value,
                                       @Nullable
                                       byte[] oldValue)
        Instrument a write. The default implementation does nothing, but extenders are encouraged to override this method with their own implementations that, for example, counts how many bytes have been added or removed from the database. Implementations should not modify the values in any of the arrays, and they should also be aware that the oldValue parameter may be null.
        Parameters:
        key - the key being written
        value - the new value being written to the key
        oldValue - the previous value being overwritten or null if the key is new or the previous value unknown
      • instrumentDelete

        protected void instrumentDelete​(@Nonnull
                                        byte[] key,
                                        @Nullable
                                        byte[] oldValue)
        Instrument a delete. The default implementation does nothing, but extenders are encouraged to override this method with their own implementation that, for example, counts how many bytes have been removed from the database. Implementations should not modify the values in any of the arrays, and they should also be aware that the oldValue parameter may be null.
        Parameters:
        key - the key being deleted
        oldValue - the previous value being delete or null if the previous value is unknown
      • put

        @Nonnull
        public CompletableFuture<Optional<V>> put​(@Nonnull
                                                  TransactionContext tcx,
                                                  @Nonnull
                                                  Subspace subspace,
                                                  @Nonnull
                                                  K key,
                                                  @Nonnull
                                                  V value)
        Inserts or updates a key into a map with a new value. This will find an appropriate bunch to insert the key into (or create one if one doesn't exist or if all of the candidates are full). It will do work to make sure that the placement is locally optimal (that is, it will choose between the one or two bunches closest to the key when performing its bunching). It makes no attempt to fix suboptimal bunches elsewhere within the map. If the map already contains key, it will overwrite the existing key with the new value. This will return the old value if one is present.

        Note that this method is not thread-safe if multiple threads call it with the same transaction and subspace. (Multiple calls with different transactions or subspaces are safe.)

        Note that this call is asynchronous. It will return a CompletableFuture that will be completed when this task has completed.

        Parameters:
        tcx - database or transaction to use when performing the insertion
        subspace - subspace within which the map's data are located
        key - key of the map entry to insert
        value - value of the map entry to insert
        Returns:
        a future that will complete with an optional that will either contain the previous value associated with the key or be empty if there was not a previous value
      • containsKey

        @Nonnull
        public CompletableFuture<Boolean> containsKey​(@Nonnull
                                                      TransactionContext tcx,
                                                      @Nonnull
                                                      Subspace subspace,
                                                      @Nonnull
                                                      K key)
        Determines whether a key is contained within the map. This method is safe to run concurrently with other map operations in other threads. However, if there are concurrent put or remove calls, then there are no guarantees as to whether this will return true or false.
        Parameters:
        tcx - database or transaction to use when performing reads
        subspace - subspace within which the map's data are located
        key - the key to check for membership within the map
        Returns:
        a future that will be completed to true if the map contains key and false otherwise
      • get

        @Nonnull
        public CompletableFuture<Optional<V>> get​(@Nonnull
                                                  TransactionContext tcx,
                                                  @Nonnull
                                                  Subspace subspace,
                                                  @Nonnull
                                                  K key)
        Retrieves the value associated with a key from the map. This method is safe to run concurrently with other map operations. However, if there are concurrent put or remove operations, then there are no guarantees as to whether this operation will see the result of the concurrent operation or not.
        Parameters:
        tcx - database or transaction to use when performing reads
        subspace - subspace within which the map's data are located
        key - the key within the map to retrieve the value of
        Returns:
        a future that will be completed with an optional that will be present with the value associated with the key in the database or empty if the key is not contained within the map
      • remove

        @Nonnull
        public CompletableFuture<Optional<V>> remove​(@Nonnull
                                                     TransactionContext tcx,
                                                     @Nonnull
                                                     Subspace subspace,
                                                     @Nonnull
                                                     K key)
        Removes a key from the map. This returns a future that will contain an optional with the old value associated with the key within the map (prior to deletion) if one is present or will be empty if the key was not contained within the map.

        Note that this method is not thread-safe if multiple threads call it with the same transaction and subspace. (Multiple calls with different transactions or subspaces are safe.)

        Note that this call is asynchronous. It will return a CompletableFuture that will be completed when this task has completed.

        Parameters:
        tcx - database or transaction to use when removing the key
        subspace - subspace within which the map's data are located
        key - the key to remove from the map
        Returns:
        a future that will be completed with an optional that will be present with the value associated with the key in the database (prior to removal) or will be empty if the key was not present
      • verifyIntegrity

        @Nonnull
        public CompletableFuture<Void> verifyIntegrity​(@Nonnull
                                                       TransactionContext tcx,
                                                       @Nonnull
                                                       Subspace subspace)
        Verify the integrity of the bunched map. This will read through all of the database keys associated with the map and verify that all of the keys are in order. If it encounters an error, it will complete with an exception. Otherwise, the returned future will complete normally.
        Parameters:
        tcx - database or transaction to use when removing the key
        subspace - subspace within which the map's data are located
        Returns:
        a future that will complete when the integrity check has finished
      • compact

        @Nonnull
        protected CompletableFuture<byte[]> compact​(@Nonnull
                                                    TransactionContext tcx,
                                                    @Nonnull
                                                    Subspace subspace,
                                                    int keyLimit,
                                                    @Nullable
                                                    byte[] continuation)
        Compact the values within the map into as few keys as possible. This will scan through and re-write the keys to be optimal. This feature is experimental at the moment, but it should be used to better pack entries if needed.
        Parameters:
        tcx - database or transaction to use when compacting data
        subspace - subspace within which the map's data are located
        keyLimit - maximum number of database keys to read in a single transaction
        continuation - the continuation returned from a previous call or null to start from the beginning of the subspace
        Returns:
        future that will complete with a continuation that can be used to complete the compaction across multiple transactions (null if finished)
      • getKeyComparator

        @Nonnull
        public Comparator<K> getKeyComparator()
        Get the comparator used to order keys of the map.
        Returns:
        this map's key comparator
      • getBunchSize

        public int getBunchSize()
        Get the maximum number of map entries to encode in a single database key. This property controls how "bunched" the map is. A higher value will use less space at the cost of additional I/O at update time (and at read time in the case of get(TransactionContext, Subspace, Object) operations).
        Returns:
        the maximum number of map entries to encode in a single database key
      • scan

        @Nonnull
        public BunchedMapIterator<K,​V> scan​(@Nonnull
                                                  ReadTransaction tr,
                                                  @Nonnull
                                                  Subspace subspace)
        Scans the map and returns an iterator over all entries. This has the same behavior as the scan() method that takes more parameters, but it will return an iterator that has no limit and always returns entries in ascending order by key.
        Parameters:
        tr - transaction to use when scanning the map
        subspace - subspace in which the map's data are located
        Returns:
        an iterator over the entries in the map
      • scan

        @Nonnull
        public BunchedMapIterator<K,​V> scan​(@Nonnull
                                                  ReadTransaction tr,
                                                  @Nonnull
                                                  Subspace subspace,
                                                  @Nullable
                                                  byte[] continuation)
        Scans the maps and returns an iterator over all entries. This has the same behavior as the scan() method that takes more parameters, but it will return an iterator that has no limit and always returns entries in ascending order by key.
        Parameters:
        tr - transaction to use when scanning the map
        subspace - subspace in which the map's data are located
        continuation - continuation from a previous scan (or null to start from the beginning)
        Returns:
        an iterator over the entries in the map
      • scan

        @Nonnull
        public BunchedMapIterator<K,​V> scan​(@Nonnull
                                                  ReadTransaction tr,
                                                  @Nonnull
                                                  Subspace subspace,
                                                  @Nullable
                                                  byte[] continuation,
                                                  int limit,
                                                  boolean reverse)
        Scans the map and returns an iterator over all entries. All entries will be returned sorted by key. If reverse is true, it will return keys in descending order instead of in ascending order. It will return at most limit keys from the map. Note that because of bunching, this will probably require reading fewer keys from the database. Scans that require multiple transactions can provide a continuation from a previous scan. The returned iterator has a getContinuation() method that can be used to get an appropriate value for that parameter.
        Parameters:
        tr - transaction to use when scanning the map
        subspace - subspace in which the map's data are located
        continuation - continuation from a previous scan (or null to start from the beginning)
        limit - maximum number of keys to return or ReadTransaction.ROW_LIMIT_UNLIMITED if no limit
        reverse - true if keys are wanted in descending instead of ascending order
        Returns:
        an iterator over the entries in the map
      • scanMulti

        @Nonnull
        public <T> BunchedMapMultiIterator<K,​V,​T> scanMulti​(@Nonnull
                                                                        ReadTransaction tr,
                                                                        @Nonnull
                                                                        Subspace subspace,
                                                                        @Nonnull
                                                                        SubspaceSplitter<T> splitter)
        Scans multiple maps and returns an iterator over all of them. This behaves the same was as the scanMulti() method that takes additional subspaceStart and subspaceEnd parameters, but this will always scan from the beginning of subspace to the end, assumes that there is no limit to the number of entries to return, and always returns items in ascending order.
        Type Parameters:
        T - type of the tag of each map subspace
        Parameters:
        tr - transaction to use when scanning the maps
        subspace - subspace containing one or more maps
        splitter - object to determine which map a given key is in
        Returns:
        an iterator over the entries in multiple maps
      • scanMulti

        @Nonnull
        public <T> BunchedMapMultiIterator<K,​V,​T> scanMulti​(@Nonnull
                                                                        ReadTransaction tr,
                                                                        @Nonnull
                                                                        Subspace subspace,
                                                                        @Nonnull
                                                                        SubspaceSplitter<T> splitter,
                                                                        @Nullable
                                                                        byte[] continuation,
                                                                        int limit,
                                                                        boolean reverse)
        Scans multiple maps and returns an iterator over all of them. This behaves the same was as the scanMulti() method that takes additional subspaceStart and subspaceEnd parameters, but this will always scan from the beginning of subspace to the end.
        Type Parameters:
        T - type of the tag of each map subspace
        Parameters:
        tr - transaction to use when scanning the maps
        subspace - subspace containing one or more maps
        splitter - object to determine which map a given key is in
        continuation - continuation from previous scan (or null to start from the beginning)
        limit - maximum number of entries to return
        reverse - true if the entries should be returned in descending order or false otherwise
        Returns:
        an iterator over the entries in multiple maps
      • scanMulti

        @Nonnull
        public <T> BunchedMapMultiIterator<K,​V,​T> scanMulti​(@Nonnull
                                                                        ReadTransaction tr,
                                                                        @Nonnull
                                                                        Subspace subspace,
                                                                        @Nonnull
                                                                        SubspaceSplitter<T> splitter,
                                                                        @Nullable
                                                                        byte[] subspaceStart,
                                                                        @Nullable
                                                                        byte[] subspaceEnd,
                                                                        @Nullable
                                                                        byte[] continuation,
                                                                        int limit,
                                                                        boolean reverse)
        Scans multiple maps and returns an iterator over all of them. The returned iterator will produce one item for each entry in each map. To do so, the maps must be backed by contiguous subspaces within subspace. The scan will start at the first subspace greater than or equal to the subspace formed by concatenating the raw prefix of subspace with subspaceStart (or with nothing if subspaceStart is null) and will end with the last subspace less than the subspace formed by concatenating the raw prefix of subspace with subspaceEnd (or with nothing if subspaceEnd is null). The provided SubspaceSplitter should be able to determine the subspace of the map that a given key contains and (optionally) provide some tag that can be used to associate that map with some value that might be used to distinguish one map from the other (if the application needs it).

        For example, suppose there are ten BunchedMaps that all begin with a raw prefix prefix followed by a Tuple-encoded integer (0 through 10). If one wanted to scan the three maps starting with map 3 and ending with map 6, one would supply the following parameters:

        • subspace should be set to new Subspace(prefix)
        • splitter should be implemented so that if given key within map n, subspaceOf(key) returns subspace.subspace(Tuple.from(n)) and subspaceTag(subspaceOf(key)) returns n.
        • subspaceStart should be set to Tuple.pack(3)
        • subspaceEnd should be set to Tuple.pack(7) (or Tuple.pack(6) concatenated with a 0-byte)

        Furthermore, this method can be used across multiple calls or transactions by setting the continuation parameter to the result of getContinuation() from a previous scan. (One should pass null to restart the scan from the beginning.) The iterator will return at most limit entries. The entries will ordered first by subspace and within a subspace by key. If reverse is true, the items will be returned in descending order. Otherwise, they will be returned in ascending order.

        Type Parameters:
        T - type of the tag of each map subspace
        Parameters:
        tr - transaction to use when scanning the maps
        subspace - subspace containing one or more maps
        splitter - object to determine which map a given key is in
        subspaceStart - inclusive starting suffix of subspace to start the scan
        subspaceEnd - exclusive ending suffix of subspace to end the scan
        continuation - continuation from previous scan (or null to start from the beginning)
        limit - maximum number of entries to return
        reverse - true if the entries should be returned in descending order or false otherwise
        Returns:
        an iterator over the entries in multiple maps