TextIndexMaintainer (fdb-record-layer-core 2.8.88.0 API)

java.lang.Object
- com.apple.foundationdb.record.provider.foundationdb.IndexMaintainer
  - com.apple.foundationdb.record.provider.foundationdb.indexes.StandardIndexMaintainer
    - com.apple.foundationdb.record.provider.foundationdb.indexes.TextIndexMaintainer

```
@API(value=EXPERIMENTAL)
public class TextIndexMaintainer
extends StandardIndexMaintainer
```
The index maintainer class for full-text indexes. This takes an expression whose first column (not counting grouping columns) is of type string. It splits the text found in that column using a TextTokenizer and then writes a separate index key for each token found in the text. This supports queries on the tokenized text, such as:
- All records containing all elements from a set of tokens: Query.field(fieldName).text().containsAll(tokens)
- All records containing any elements from a set of tokens: Query.field(fieldName).text().containsAny(tokens)
- All records containing all elements from a set of tokens within some maximum span: Query.field(fieldName).text().containsAll(tokens, span)
- All records containing an exact phrase (modulo normalization and stop-word removal done by the tokenizer): Query.field(fieldName).text().containsPhrase(phrase)
- All records containing at least one token that begins with a given prefix: Query.field(fieldName).text().containsPrefix(prefix)
- All records containing at least one token that begins with any of a set of prefixes: Query.field(fieldName).text().containsAnyPrefix(prefixes)
- All records containing at least one token that begins with each of a set of prefixes: Query.field(fieldName).text().containsAllPrefixes(prefixes)
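
The query shapes listed above can be made concrete with a small, self-contained toy model. It is only a sketch under stated assumptions: `ToyTextIndex` and its methods are hypothetical, its tokenizer simply lower-cases and splits on non-alphanumeric characters (no stop-word removal), and everything lives in memory rather than in FoundationDB. It shows how containsAll, containsAny, containsPrefix, and containsPhrase relate to an inverted index of token position lists; it does not show the Record Layer's query or planner API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy, in-memory model of the query semantics above. NOT the Record Layer
// implementation: postings map each token to recordId -> positions.
final class ToyTextIndex {
    private final Map<String, Map<Long, List<Integer>>> postings = new TreeMap<>();

    // Hypothetical tokenizer: lower-cases and splits on runs of non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Record every position at which each token occurs.
    void index(long recordId, String text) {
        List<String> tokens = tokenize(text);
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), k -> new TreeMap<>())
                    .computeIfAbsent(recordId, k -> new ArrayList<>())
                    .add(pos);
        }
    }

    private Set<Long> recordsWith(String token) {
        return postings.getOrDefault(token, Collections.emptyMap()).keySet();
    }

    // containsAll: intersection of the per-token record sets.
    Set<Long> containsAll(Collection<String> tokens) {
        Set<Long> result = null;
        for (String t : tokens) {
            if (result == null) {
                result = new TreeSet<>(recordsWith(t));
            } else {
                result.retainAll(recordsWith(t));
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    // containsAny: union of the per-token record sets.
    Set<Long> containsAny(Collection<String> tokens) {
        Set<Long> result = new TreeSet<>();
        for (String t : tokens) {
            result.addAll(recordsWith(t));
        }
        return result;
    }

    // containsPrefix: union over every indexed token that starts with the prefix.
    Set<Long> containsPrefix(String prefix) {
        Set<Long> result = new TreeSet<>();
        for (Map.Entry<String, Map<Long, List<Integer>>> e : postings.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                result.addAll(e.getValue().keySet());
            }
        }
        return result;
    }

    // containsPhrase: all tokens present, at consecutive positions in one record.
    Set<Long> containsPhrase(String phrase) {
        List<String> tokens = tokenize(phrase);
        Set<Long> result = new TreeSet<>();
        if (tokens.isEmpty()) {
            return result;
        }
        for (long id : containsAll(tokens)) {
            for (int start : postings.get(tokens.get(0)).get(id)) {
                boolean match = true;
                for (int i = 1; i < tokens.size() && match; i++) {
                    match = postings.get(tokens.get(i)).get(id).contains(start + i);
                }
                if (match) {
                    result.add(id);
                    break;
                }
            }
        }
        return result;
    }
}
```

Indexing "The quick brown fox" and "A quick red fox jumps", both records match containsAll of "quick" and "fox", but only the first matches the phrase "quick brown", because only there do the two tokens occupy adjacent positions.
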
One can specify a tokenizer by setting the tokenizer name and tokenizer version options on the index. If no tokenizer is given, a DefaultTextTokenizer is used, and if no version is specified, the global minimum tokenizer version is assumed. There should be exactly one TextTokenizer implementation that uses a given name and one TextTokenizerFactory implementation that supplies instances of the tokenizer with that name. This index maintainer stores the tokenizer version used to serialize each record, so if an index's tokenizer version changes, it continues to use the older version to tokenize the fields of any records that were present in the index before the change. This guarantees that, for every record, the same tokenizer version is used when inserting it and when deleting it. To re-tokenize a record after a tokenizer version change, save the existing record (tokenized with the older version) again; it will then be re-indexed using the newer version.
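
The per-record version tracking described above can be illustrated with a toy, in-memory model. Everything in it (the `VersionedTextIndex` class, the two made-up tokenizer versions, the postings map) is hypothetical and not the Record Layer's API or storage format; it only shows why deleting with the stored version removes exactly the tokens that were inserted, and how re-saving a record upgrades it to the current version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of per-record tokenizer-version tracking (hypothetical; not the
// Record Layer's data layout). Version 0 only lower-cases; version 1 also
// strips a trailing "s", standing in for any change between versions.
final class VersionedTextIndex {
    final Map<String, Set<Long>> postings = new TreeMap<>();
    final Map<Long, Integer> storedVersion = new HashMap<>(); // version each record was written with
    int currentVersion;                                        // the index's configured version

    VersionedTextIndex(int currentVersion) {
        this.currentVersion = currentVersion;
    }

    static List<String> tokenize(String text, int version) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (t.isEmpty()) {
                continue;
            }
            if (version >= 1 && t.length() > 1 && t.endsWith("s")) {
                t = t.substring(0, t.length() - 1);
            }
            out.add(t);
        }
        return out;
    }

    // Delete with the version the record was written with, so exactly the tokens
    // that were inserted are removed (version 1 would turn "cats" into "cat" and
    // miss the entry written under version 0).
    void delete(long id, String oldText) {
        Integer version = storedVersion.remove(id);
        if (version == null) {
            return;
        }
        for (String t : tokenize(oldText, version)) {
            Set<Long> ids = postings.get(t);
            if (ids != null) {
                ids.remove(id);
                if (ids.isEmpty()) {
                    postings.remove(t);
                }
            }
        }
    }

    // Saving (or re-saving) re-tokenizes the record with the current version.
    void save(long id, String oldText, String newText) {
        if (oldText != null) {
            delete(id, oldText);
        }
        for (String t : tokenize(newText, currentVersion)) {
            postings.computeIfAbsent(t, k -> new TreeSet<>()).add(id);
        }
        storedVersion.put(id, currentVersion);
    }
}
```

Saving a record at version 0 and then raising `currentVersion` leaves its old entry intact until the record is saved again; at that point the old entry is deleted using version 0 and the new one is written using version 1.
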

Because each update adds a conflict range for each token in each indexed text field of each record, index updates can be particularly taxing on the resolver process within the FoundationDB cluster. Some use cases can therefore benefit from having fewer, larger conflict ranges per transaction to lessen that work. The trade-off is potentially less parallelism, because there is a larger chance of conflicts between records that are updated simultaneously. Note, however, that the underlying data structure of the text index already makes it likely that two records sharing common tokens will conflict when updated simultaneously, so in practice this may not produce many more conflicts. To enable adding conflict ranges over larger areas, set the corresponding index option to true. Warning: This feature is currently experimental and may change at any moment without prior notice.
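
The trade-off described above can be pictured with a deliberately simplified model. It is hypothetical throughout: `ConflictRangeModel` does not reflect how the Record Layer or FoundationDB actually compute conflict ranges (real ranges are half-open byte-key ranges), and here ranges are inclusive intervals over token strings. "Fine" mode adds one point range per token; "coarse" mode adds a single range spanning the smallest to the largest token, which means fewer ranges but also overlap with tokens the record never touched.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of fine-grained versus coarse conflict ranges.
final class ConflictRangeModel {
    // Simplified inclusive [begin, end] interval over token strings.
    static final class Range {
        final String begin;
        final String end;

        Range(String begin, String end) {
            this.begin = begin;
            this.end = end;
        }

        boolean overlaps(Range other) {
            return begin.compareTo(other.end) <= 0 && other.begin.compareTo(end) <= 0;
        }
    }

    // One point range per distinct token: precise, but many ranges to resolve.
    static List<Range> fine(Collection<String> tokens) {
        List<Range> ranges = new ArrayList<>();
        for (String t : new TreeSet<>(tokens)) {
            ranges.add(new Range(t, t));
        }
        return ranges;
    }

    // A single range from the smallest to the largest token: less resolver work,
    // but it also covers tokens this record never wrote.
    static List<Range> coarse(Collection<String> tokens) {
        if (tokens.isEmpty()) {
            return Collections.emptyList();
        }
        TreeSet<String> sorted = new TreeSet<>(tokens);
        return Collections.singletonList(new Range(sorted.first(), sorted.last()));
    }

    // Two transactions conflict in this model if any of their ranges overlap.
    static boolean conflict(List<Range> a, List<Range> b) {
        for (Range x : a) {
            for (Range y : b) {
                if (x.overlaps(y)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With tokens {"apple", "zebra"} for one record and {"mango"} for another, the fine-grained ranges do not overlap, while the single coarse range [apple, zebra] covers "mango" and reports a conflict.
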

Note: This index is under active development and should be considered experimental. It is correctly updated on insert and removal and can be scanned manually, but the query planner selects it to satisfy full-text queries only in limited circumstances. For example, the planner will not select this index if the query involves sorts or if the filter uses the position list to determine the relative positions of tokens within a document.

Field Summary
- Fields inherited from class com.apple.foundationdb.record.provider.foundationdb.indexes.StandardIndexMaintainer
  TOO_LARGE_VALUE_MESSAGE_LIMIT
- Fields inherited from class com.apple.foundationdb.record.provider.foundationdb.IndexMaintainer
  state

Constructor Summary

Constructors
Modifier	Constructor and Description
`protected`	`TextIndexMaintainer(IndexMaintainerState state)`

Method Summary

Modifier and Type	Method and Description
`boolean`	`canDeleteWhere(QueryToKeyMatcher matcher, Key.Evaluated evaluated)` Indicates whether the expression allows for this index to perform a `FDBRecordStoreBase.deleteRecordsWhere(QueryComponent)` operation.
`static int`	`getIndexTokenizerVersion(Index index)` Get the tokenizer version associated with this index.
`static TextTokenizer`	`getTokenizer(Index index)` Get the text tokenizer associated with this index.
`RecordCursor<IndexEntry>`	`scan(IndexScanType scanType, TupleRange range, byte[] continuation, ScanProperties scanProperties)` Scan this index between a range of tokens.
`<M extends Message> CompletableFuture<Void>`	`update(FDBIndexableRecord<M> oldRecord, FDBIndexableRecord<M> newRecord)` Updates an associated text index with the data associated with a new record.
`protected <M extends Message> CompletableFuture<Void>`	`updateIndexKeys(FDBIndexableRecord<M> savedRecord, boolean remove, List<IndexEntry> indexEntries)` Update index according to record keys.

Methods inherited from class com.apple.foundationdb.record.provider.foundationdb.indexes.StandardIndexMaintainer
addedRangeWithKey, canDeleteWhere, canEvaluateAggregateFunction, canEvaluateRecordFunction, checkKeyValueSizes, commonKeys, decodeValue, deleteWhere, evaluateAggregateFunction, evaluateIndex, evaluateRecordFunction, filteredIndexEntries, getExecutor, getGroupedCount, getGroupingCount, getTimer, indexEntryKey, isIdempotent, makeMutable, performOperation, saveIndexEntryAsKeyValue, scan, scanUniquenessViolations, skipUpdateForUnchangedKeys, trimTooLargeTuple, unpackKeyValue, unpackKeyValue, updateIndexKeysFunction, updateOneKey, updateUniquenessViolations, validateEntries, validateMissingEntries, validateOrphanEntries

Methods inherited from class com.apple.foundationdb.record.provider.foundationdb.IndexMaintainer
getIndexSubspace, getSecondarySubspace, unsupportedAggregateFunction, unsupportedRecordFunction

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - TextIndexMaintainer
```
protected TextIndexMaintainer(@Nonnull
                              IndexMaintainerState state)
```
- Method Detail
  - getTokenizer
```
@Nonnull
public static TextTokenizer getTokenizer(@Nonnull
                                                  Index index)
```
    Get the text tokenizer associated with this index. This uses the value of the tokenizer name option to determine the name of the tokenizer and then looks up the tokenizer in the tokenizer registry.
    
    Parameters:
    
    index - the index to get the tokenizer of
    
    Returns:
    
    the tokenizer associated with this index
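
    The name-based lookup described above can be sketched with an in-memory registry. All names here (`ToyTokenizer`, `ToyTokenizerRegistry`, the "default" fallback name, and the option key the caller passes in) are hypothetical stand-ins, not the Record Layer's TextTokenizer, TextTokenizerFactory, or registry classes. The sketch only shows reading a name from the index options, falling back to a default, and rejecting a second implementation registered under the same name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a tokenizer: text in, tokens out.
interface ToyTokenizer {
    List<String> tokenize(String text);
}

// Hypothetical registry: one factory per tokenizer name.
final class ToyTokenizerRegistry {
    static final String DEFAULT_NAME = "default"; // assumption for illustration
    private final Map<String, Supplier<ToyTokenizer>> factories = new HashMap<>();

    // Enforce "exactly one implementation per name".
    void register(String name, Supplier<ToyTokenizer> factory) {
        if (factories.putIfAbsent(name, factory) != null) {
            throw new IllegalStateException("duplicate tokenizer name: " + name);
        }
    }

    // Read the name from the index options (default if absent), then look it up.
    ToyTokenizer getTokenizer(Map<String, String> indexOptions, String nameOptionKey) {
        String name = indexOptions.getOrDefault(nameOptionKey, DEFAULT_NAME);
        Supplier<ToyTokenizer> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("unknown tokenizer: " + name);
        }
        return factory.get();
    }
}
```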
  - getIndexTokenizerVersion
```
public static int getIndexTokenizerVersion(@Nonnull
                                           Index index)
```
    Get the tokenizer version associated with this index. This parses the tokenizer version option and produces an integer value from it. If none is specified, this returns the global minimum tokenizer version.
    
    Parameters:
    
    index - the index to get the tokenizer version of
    
    Returns:
    
    the tokenizer version associated with the given index
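
    Parsing the version option could look roughly like the following sketch. It assumes the index options are a plain string map, uses a hypothetical option key supplied by the caller, and sets `GLOBAL_MIN_VERSION` to 0 purely for illustration; the range check is an extra choice made by this sketch, not documented behavior.

```java
import java.util.Map;

// Hypothetical sketch: parse the version option, defaulting to a minimum.
final class TokenizerVersions {
    // Assumption: 0 is used purely for illustration.
    static final int GLOBAL_MIN_VERSION = 0;

    static int indexTokenizerVersion(Map<String, String> indexOptions, String versionOptionKey) {
        String raw = indexOptions.get(versionOptionKey);
        if (raw == null) {
            return GLOBAL_MIN_VERSION; // no option set: use the minimum version
        }
        // Throws NumberFormatException if the option is not an integer.
        int version = Integer.parseInt(raw.trim());
        if (version < GLOBAL_MIN_VERSION) {
            // Extra validation made by this sketch only.
            throw new IllegalArgumentException("tokenizer version below minimum: " + version);
        }
        return version;
    }
}
```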
  - updateIndexKeys
```
@Nonnull
protected <M extends Message> CompletableFuture<Void> updateIndexKeys(@Nonnull
                                                                               FDBIndexableRecord<M> savedRecord,
                                                                               boolean remove,
                                                                               @Nonnull
                                                                               List<IndexEntry> indexEntries)
```
    Update index according to record keys. This tokenizes the text associated with this record and writes out one index key for each token, with the token's position list as its value. Because writing to the full-text data structures requires reading from the database, this future should be assumed to take a while to complete.
    
    Overrides:
    
    updateIndexKeys in class StandardIndexMaintainer
    
    Type Parameters:
    
    M - the message type of the record
    
    Parameters:
    
    savedRecord - the record being indexed
    
    remove - true if removing from the index
    
    indexEntries - the result of StandardIndexMaintainer.evaluateIndex(com.apple.foundationdb.record.provider.foundationdb.FDBRecord)
    
    Returns:
    
    a future completed when update is done
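
    The key/value shape described above (one key per token, with the position list as the value) can be sketched as follows. This is a hypothetical, in-memory illustration: real index keys also carry grouping columns and the record's primary key in tuple-encoded form, and the "token/primaryKey" string key used here is an assumption made only for readability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of building position lists and applying them to a toy
// key-value "index". Not the Record Layer's storage format.
final class PositionLists {
    // One entry per distinct token; the value lists every 0-based position at
    // which the token occurs, in increasing order.
    static SortedMap<String, List<Integer>> build(List<String> tokens) {
        SortedMap<String, List<Integer>> out = new TreeMap<>();
        for (int pos = 0; pos < tokens.size(); pos++) {
            out.computeIfAbsent(tokens.get(pos), k -> new ArrayList<>()).add(pos);
        }
        return out;
    }

    // Insert (remove == false) or delete (remove == true) one toy key per token.
    static void update(Map<String, List<Integer>> index, long primaryKey,
                       List<String> tokens, boolean remove) {
        for (Map.Entry<String, List<Integer>> e : build(tokens).entrySet()) {
            String key = e.getKey() + "/" + primaryKey; // toy key: token, then primary key
            if (remove) {
                index.remove(key);
            } else {
                index.put(key, e.getValue());
            }
        }
    }
}
```

    For the tokens "to be or not to be", this yields four keys, with "to" mapped to positions [0, 4] and "be" to [1, 5].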
  - update
```
@Nonnull
public <M extends Message> CompletableFuture<Void> update(@Nullable
                                                                   FDBIndexableRecord<M> oldRecord,
                                                                   @Nullable
                                                                   FDBIndexableRecord<M> newRecord)
```
    Updates an associated text index with the data associated with a new record. Unlike most standard indexes, the text index can behave somewhat differently: if a record was previously written with this index but with an older tokenizer version, it will always re-index the record and write index entries to the database even if they are unchanged. The record will then be registered as having been written at the new tokenizer version (so subsequent updates will not have to do any additional updates for unchanged fields).
    
    Overrides:
    
    update in class StandardIndexMaintainer
    
    Type Parameters:
    
    M - type of message
    
    Parameters:
    
    oldRecord - the previously stored record, or null if a new record is being created
    
    newRecord - the new record or null if an old record is being deleted
    
    Returns:
    
    a future that is complete when the record update is done
    
    See Also:
    
    IndexMaintainer.update(FDBIndexableRecord, FDBIndexableRecord)
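
    The update rule described above can be sketched as a small planning function. This is a hypothetical illustration, not the Record Layer's code: `TextUpdatePlan` is made up, token sets are assumed to be precomputed (old tokens with the version the old record was written with, new tokens with the index's current version), and a missing old or new record is represented by an empty set.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical planning step for text-index updates.
final class TextUpdatePlan {
    final Set<String> toRemove;
    final Set<String> toAdd;

    private TextUpdatePlan(Set<String> toRemove, Set<String> toAdd) {
        this.toRemove = toRemove;
        this.toAdd = toAdd;
    }

    static TextUpdatePlan plan(Set<String> oldTokens, int oldVersion,
                               Set<String> newTokens, int newVersion) {
        if (oldVersion < newVersion) {
            // Older tokenizer version: rewrite everything, even unchanged tokens,
            // so the record ends up registered at the new version.
            return new TextUpdatePlan(new TreeSet<>(oldTokens), new TreeSet<>(newTokens));
        }
        // Same version: only touch tokens that actually changed.
        Set<String> remove = new TreeSet<>(oldTokens);
        remove.removeAll(newTokens);
        Set<String> add = new TreeSet<>(newTokens);
        add.removeAll(oldTokens);
        return new TextUpdatePlan(remove, add);
    }
}
```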
  - canDeleteWhere
```
public boolean canDeleteWhere(@Nonnull
                              QueryToKeyMatcher matcher,
                              @Nonnull
                              Key.Evaluated evaluated)
```
    Indicates whether the expression allows for this index to perform a FDBRecordStoreBase.deleteRecordsWhere(QueryComponent) operation. A text index can only delete records that are aligned with its grouping key, because once text from the index has been tokenized, there is no efficient way to remove all of the documents within the grouped part of the index.
    
    Overrides:
    
    canDeleteWhere in class StandardIndexMaintainer
    
    Parameters:
    
    matcher - object to match the grouping key to a query component
    
    evaluated - an evaluated key that might align with this index's grouping key
    
    Returns:
    
    whether the index maintainer can remove all records matching matcher
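
    One way to see why alignment with the grouping key matters is a toy key layout in which each entry's key is the grouping value, then the token, then the primary key. Under that assumption (the layout, `GroupedTextIndex`, and the column-counting form of `canDeleteWhere` are all hypothetical), every entry of one group forms a single contiguous key range that can be cleared without reading any documents, whereas a predicate on a column after the grouping columns, such as the primary key, would require visiting every token's entries.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Toy grouped text index. Assumed key layout: group, token, primary key,
// joined with a separator that sorts below every other character.
final class GroupedTextIndex {
    private static final char SEP = '\u0000';
    final TreeMap<String, String> kv = new TreeMap<>();

    void put(String group, String token, long primaryKey) {
        kv.put(group + SEP + token + SEP + primaryKey, "");
    }

    // Sketch of the rule: a delete may fix only (a prefix of) the grouping columns.
    static boolean canDeleteWhere(int fixedColumns, int groupingCount) {
        return fixedColumns <= groupingCount;
    }

    // Aligned delete: all keys starting with group + SEP lie in [group + SEP,
    // group + (SEP + 1)), so one range clear removes them.
    int deleteGroup(String group) {
        SortedMap<String, String> range = kv.subMap(group + SEP, group + (char) (SEP + 1));
        int removed = range.size();
        range.clear();
        return removed;
    }
}
```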
  - scan
```
@Nonnull
public RecordCursor<IndexEntry> scan(@Nonnull
                                              IndexScanType scanType,
                                              @Nonnull
                                              TupleRange range,
                                              @Nullable
                                              byte[] continuation,
                                              @Nonnull
                                              ScanProperties scanProperties)
```
    Scan this index between a range of tokens. This index type can only be scanned by text token. Otherwise, the range to scan can lie between any two entries in the list, and scans over a prefix are supported by passing a range that uses PREFIX_STRING as both endpoint types. The key of each returned index entry includes the token found in the index, in the column used for the text field of the index's root expression. The value portion of each index entry is a tuple whose first element is the position list for that entry within its associated record's field.
    
    Specified by:
    
    scan in class IndexMaintainer
    
    Parameters:
    
    scanType - the type of scan to perform
    
    range - the range to scan
    
    continuation - any continuation from a previous scan invocation
    
    scanProperties - skip, limit and other properties of the scan
    
    Returns:
    
    a cursor over all index entries in range
    
    Throws:
    
    RecordCoreException - if scanType is not IndexScanType.BY_TEXT_TOKEN
    
    See Also:
    
    TextCursor
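
    A token-ordered scan over a prefix or an arbitrary range can be sketched with a sorted set of tokens. This is a hypothetical toy: `ToyTokenScan`, its `ScanType` enum, and the use of IllegalArgumentException stand in for IndexScanType and RecordCoreException, and real scans return IndexEntry objects with keys, position lists, and continuations rather than bare strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Toy model of token-ordered scans; names are hypothetical stand-ins.
final class ToyTokenScan {
    enum ScanType { BY_VALUE, BY_TEXT_TOKEN }

    private static void requireTextScan(ScanType scanType) {
        if (scanType != ScanType.BY_TEXT_TOKEN) {
            throw new IllegalArgumentException("text index only supports BY_TEXT_TOKEN scans");
        }
    }

    // Prefix scan: seek to the first token >= prefix and stop at the first token
    // that no longer starts with it (tokens are kept in sorted order).
    static List<String> scanPrefix(NavigableSet<String> tokens, ScanType scanType, String prefix) {
        requireTextScan(scanType);
        List<String> out = new ArrayList<>();
        for (String token : tokens.tailSet(prefix, true)) {
            if (!token.startsWith(prefix)) {
                break;
            }
            out.add(token);
        }
        return out;
    }

    // Range scan between any two tokens: inclusive low, exclusive high.
    static List<String> scanRange(NavigableSet<String> tokens, ScanType scanType, String low, String high) {
        requireTextScan(scanType);
        return new ArrayList<>(tokens.subSet(low, true, high, false));
    }
}
```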
