Class TextIndexMaintainer
- java.lang.Object
-
- com.apple.foundationdb.record.provider.foundationdb.IndexMaintainer
-
- com.apple.foundationdb.record.provider.foundationdb.indexes.StandardIndexMaintainer
-
- com.apple.foundationdb.record.provider.foundationdb.indexes.TextIndexMaintainer
-
@API(EXPERIMENTAL) public class TextIndexMaintainer extends StandardIndexMaintainer
The index maintainer class for full-text indexes. This takes an expression whose first column (not counting grouping columns) is of type string. It will split the text found at that column using a TextTokenizer and then write separate index keys for each token found in the text. This supports queries on the tokenized text, such as:- All records containing all elements from a set of tokens:
Query.field(fieldName).text().containsAll(tokens)
- All records containing any elements from a set of tokens:
Query.field(fieldName).text().containsAny(tokens)
- All records containing all elements from a set of tokens within some maximum span:
Query.field(fieldName).text().containsAll(tokens, span)
- All records containing an exact phrase (modulo normalization and stop-word removal done by the tokenizer):
Query.field(fieldName).text().containsPhrase(phrase)
- All records containing at least one token that begins with a given prefix:
Query.field(fieldName).text().containsPrefix(prefix)
- All records containing at least one token that begins with any of a set of prefixes:
Query.field(fieldName).text().containsAnyPrefix(prefixes)
- All records containing at least one token that begins with each of a set of prefixes:
Query.field(fieldName).text().containsAllPrefixes(prefixes)
One can specify a tokenizer to use by setting the "textTokenizerName" and "textTokenizerVersion" options on the index. If no tokenizer is given, it will use a DefaultTextTokenizer, and if no version is specified, it will assume version 0. There should be one TextTokenizer implementation that uses that name and one TextTokenizerFactory implementation that supplies instances of the tokenizer of that name. The version of the tokenizer used to serialize each record is stored by this index maintainer, so if an index's tokenizer version changes, this index maintainer will continue to use the older tokenizer version to tokenize the fields of any records present in the index prior to the version change. This guarantees that for every record, the same tokenizer version is used when inserting it and when deleting it. To re-tokenize a record following a tokenizer version change, take the existing record (tokenized with the older version) and save it again; it will then be re-indexed using the newer version.

Because each update adds a conflict range for each token included in each indexed text field per record, index updates can be particularly taxing on the resolver process within the FoundationDB cluster. Some use cases can therefore benefit from having fewer, larger conflict ranges per transaction to lessen the work done. The trade-off is potentially less parallelism, in that there is a larger chance of conflicts between records that arrive simultaneously. Note, however, that the underlying data structure of the text index already makes it likely that two simultaneously updated records sharing common tokens will conflict, so this option might not actually produce more conflicts in practice. To enable adding conflict ranges over larger areas, set the "textAddAggressiveConflictRanges" option to true.

Warning: This feature is currently experimental and may change at any moment without prior notice.

Note: At the moment, this index is under active development and should be considered experimental. At the current time, this index will be correctly updated on insert and removal and can be manually scanned, but it will only be selected by the query planner in limited circumstances to satisfy full-text queries. For example, the query planner will not select this index if the query involves sorts or if the filter uses the position list to determine the relative positions of tokens within a document.
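The write path described above, one index key per token with that token's position list as the value, can be pictured with a toy sketch. This is an illustration only: the whitespace tokenizer and the TextIndexSketch and tokenize names are stand-ins for this example, not the Record Layer's actual DefaultTextTokenizer.

```java
// Illustrative sketch (not the actual Record Layer implementation) of the
// index-entry shape this maintainer produces: the text field is split into
// tokens, and one index key is written per distinct token, whose value is
// the list of positions at which that token occurs in the field.
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TextIndexSketch {
    // Hypothetical stand-in for a TextTokenizer: lowercase whitespace splitting.
    public static Map<String, List<Integer>> tokenize(String text) {
        Map<String, List<Integer>> positions = new TreeMap<>();
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].isEmpty()) {
                positions.computeIfAbsent(tokens[i], t -> new ArrayList<>()).add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // One "index key" per token; the value is the token's position list.
        System.out.println(tokenize("the quick fox and the lazy dog"));
    }
}
```

A real tokenizer would also perform normalization and stop-word removal, which is why containsPhrase matches phrases only modulo those transformations.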
-
-
Field Summary
-
Fields inherited from class com.apple.foundationdb.record.provider.foundationdb.indexes.StandardIndexMaintainer
TOO_LARGE_VALUE_MESSAGE_LIMIT
-
Fields inherited from class com.apple.foundationdb.record.provider.foundationdb.IndexMaintainer
state
-
-
Constructor Summary
Constructors

protected TextIndexMaintainer(IndexMaintainerState state)
-
Method Summary
All Methods  Static Methods  Instance Methods  Concrete Methods

boolean
canDeleteWhere(QueryToKeyMatcher matcher, Key.Evaluated evaluated)
Indicates whether the expression allows for this index to perform an FDBRecordStoreBase.deleteRecordsWhere(QueryComponent) operation.

static int
getIndexTokenizerVersion(Index index)
Get the tokenizer version associated with this index.

static TextTokenizer
getTokenizer(Index index)
Get the text tokenizer associated with this index.

RecordCursor<IndexEntry>
scan(IndexScanType scanType, TupleRange range, byte[] continuation, ScanProperties scanProperties)
Scan this index between a range of tokens.

<M extends Message> CompletableFuture<Void>
update(FDBIndexableRecord<M> oldRecord, FDBIndexableRecord<M> newRecord)
Updates an associated text index with the data associated with a new record.

protected <M extends Message> CompletableFuture<Void>
updateIndexKeys(FDBIndexableRecord<M> savedRecord, boolean remove, List<IndexEntry> indexEntries)
Update index according to record keys.
-
Methods inherited from class com.apple.foundationdb.record.provider.foundationdb.indexes.StandardIndexMaintainer
addedRangeWithKey, addUniquenessViolation, canDeleteWhere, canEvaluateAggregateFunction, canEvaluateRecordFunction, checkKeyValueSizes, checkUniqueness, commonKeys, decodeValue, deleteWhere, evaluateAggregateFunction, evaluateIndex, evaluateRecordFunction, filteredIndexEntries, getExecutor, getGroupedCount, getGroupingCount, getTimer, indexEntryKey, isIdempotent, makeMutable, performOperation, removeUniquenessViolationsAsync, saveIndexEntryAsKeyValue, scan, scanUniquenessViolations, skipUpdateForUnchangedKeys, trimTooLargeTuple, unpackKeyValue, unpackKeyValue, updateIndexKeysFunction, updateOneKeyAsync, validateEntries, validateMissingEntries, validateOrphanEntries
-
Methods inherited from class com.apple.foundationdb.record.provider.foundationdb.IndexMaintainer
getIndexSubspace, getSecondarySubspace, unsupportedAggregateFunction, unsupportedRecordFunction
-
-
-
-
Constructor Detail
-
TextIndexMaintainer
protected TextIndexMaintainer(@Nonnull IndexMaintainerState state)
-
-
Method Detail
-
getTokenizer
@Nonnull public static TextTokenizer getTokenizer(@Nonnull Index index)
Get the text tokenizer associated with this index. This uses the value of the "textTokenizerName" option to determine the name of the tokenizer and then looks up the tokenizer in the tokenizer registry.
- Parameters:
index - the index to get the tokenizer of
- Returns:
the tokenizer associated with this index
-
getIndexTokenizerVersion
public static int getIndexTokenizerVersion(@Nonnull Index index)
Get the tokenizer version associated with this index. This will parse the "textTokenizerVersion" option and produce an integer value from it. If none is specified, this returns the global minimum tokenizer version.
- Parameters:
index - the index to get the tokenizer version of
- Returns:
the tokenizer version associated with the given index
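The defaulting behavior can be sketched as follows. The class and method names and the GLOBAL_MIN_VERSION constant are assumptions made for this illustration, not the actual implementation:

```java
import java.util.Map;

public class TokenizerVersionSketch {
    // Hypothetical stand-in for the global minimum tokenizer version.
    static final int GLOBAL_MIN_VERSION = 0;

    // Approximates getIndexTokenizerVersion: parse the option if present,
    // otherwise fall back to the global minimum tokenizer version.
    public static int tokenizerVersion(Map<String, String> indexOptions) {
        String value = indexOptions.get("textTokenizerVersion");
        return value == null ? GLOBAL_MIN_VERSION : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        System.out.println(tokenizerVersion(Map.of("textTokenizerVersion", "2")));
        System.out.println(tokenizerVersion(Map.of())); // no option: default version
    }
}
```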
-
updateIndexKeys
@Nonnull protected <M extends Message> CompletableFuture<Void> updateIndexKeys(@Nonnull FDBIndexableRecord<M> savedRecord, boolean remove, @Nonnull List<IndexEntry> indexEntries)
Update index according to record keys. This will tokenize the text associated with this record and write out one index key for each token, containing the position list as its value. Because writing to the full-text data structures requires reading from the database, this future should be assumed to take a while to complete.
- Overrides:
updateIndexKeys in class StandardIndexMaintainer
- Type Parameters:
M - the message type of the record
- Parameters:
savedRecord - the record being indexed
remove - true if removing from index
indexEntries - the result of StandardIndexMaintainer.evaluateIndex(com.apple.foundationdb.record.provider.foundationdb.FDBRecord)
- Returns:
a future completed when the update is done
-
update
@Nonnull public <M extends Message> CompletableFuture<Void> update(@Nullable FDBIndexableRecord<M> oldRecord, @Nullable FDBIndexableRecord<M> newRecord)
Updates an associated text index with the data associated with a new record. Unlike most standard indexes, the text index can behave somewhat differently here: if a record was previously written with this index but with an older tokenizer version, it will always re-index the record and write index entries to the database even if they are unchanged. The record will then be registered as having been written at the new tokenizer version (so subsequent updates will not have to do any additional updates for unchanged fields).
- Overrides:
update in class StandardIndexMaintainer
- Type Parameters:
M - type of message
- Parameters:
oldRecord - the previous stored record or null if a new record is being created
newRecord - the new record or null if an old record is being deleted
- Returns:
a future that is complete when the record update is done
- See Also:
IndexMaintainer.update(FDBIndexableRecord, FDBIndexableRecord)
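The rewrite decision described above, re-indexing whenever the stored tokenizer version lags the index's current version even for unchanged text, can be condensed into a small sketch. All names here are hypothetical illustrations, not the actual method:

```java
public class ReindexDecisionSketch {
    // Approximates the decision described above: index entries are rewritten
    // if the text field changed, OR if the record was last written at an
    // older tokenizer version than the index currently uses.
    public static boolean needsRewrite(boolean textChanged,
                                       int storedTokenizerVersion,
                                       int currentTokenizerVersion) {
        return textChanged || storedTokenizerVersion < currentTokenizerVersion;
    }

    public static void main(String[] args) {
        System.out.println(needsRewrite(false, 0, 1)); // older version: rewrite even if unchanged
        System.out.println(needsRewrite(false, 1, 1)); // same version, unchanged: skip
    }
}
```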
-
canDeleteWhere
public boolean canDeleteWhere(@Nonnull QueryToKeyMatcher matcher, @Nonnull Key.Evaluated evaluated)
Indicates whether the expression allows for this index to perform an FDBRecordStoreBase.deleteRecordsWhere(QueryComponent) operation. A text index can only delete records that are aligned with its grouping key, as once text from the index has been tokenized, there is no way to efficiently remove all documents within the grouped part of the index.
- Overrides:
canDeleteWhere in class StandardIndexMaintainer
- Parameters:
matcher - object to match the grouping key to a query component
evaluated - an evaluated key that might align with this index's grouping key
- Returns:
whether the index maintainer can remove all records matching matcher
-
scan
@Nonnull public RecordCursor<IndexEntry> scan(@Nonnull IndexScanType scanType, @Nonnull TupleRange range, @Nullable byte[] continuation, @Nonnull ScanProperties scanProperties)
Scan this index between a range of tokens. This index type requires that it be scanned only by text token. The range to scan can otherwise be between any two entries in the list, and scans over a prefix are supported by passing a value of range that uses PREFIX_STRING as both endpoint types. The keys returned in the index entry will include the token that was found in the index when scanning, in the column that is used for the text field of the index's root expression. The value portion of each index entry will be a tuple whose first element is the position list for that entry within its associated record's field.
- Specified by:
scan in class IndexMaintainer
- Parameters:
scanType - the type of scan to perform
range - the range to scan
continuation - any continuation from a previous scan invocation
scanProperties - skip, limit and other properties of the scan
- Returns:
a cursor over all index entries in range
- Throws:
RecordCoreException - if scanType is not IndexScanType.BY_TEXT_TOKEN
- See Also:
TextCursor
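Because tokens sharing a prefix form a contiguous run in sorted order, a scan with PREFIX_STRING endpoints can be pictured with an in-memory sorted map standing in for the index subspace. This is only an approximation of the idea; it is not the actual TextCursor implementation, and the names are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class TokenScanSketch {
    // Approximates a BY_TEXT_TOKEN scan whose range uses PREFIX_STRING as
    // both endpoint types: return every token that starts with the prefix.
    public static List<String> scanPrefix(NavigableMap<String, List<Integer>> index,
                                          String prefix) {
        List<String> hits = new ArrayList<>();
        // Matching tokens are contiguous in sorted order, so start at the
        // prefix itself and stop at the first token that no longer matches.
        for (String token : index.tailMap(prefix, true).keySet()) {
            if (!token.startsWith(prefix)) {
                break;
            }
            hits.add(token);
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableMap<String, List<Integer>> index = new TreeMap<>();
        index.put("apple", List.of(0));   // token -> position list
        index.put("apply", List.of(3));
        index.put("banana", List.of(1));
        System.out.println(scanPrefix(index, "app"));
    }
}
```

In the real index, each hit would be an IndexEntry whose value carries the position list shown above as the map value.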
-
-