@API(value=EXPERIMENTAL) public class TextIndexMaintainer extends StandardIndexMaintainer
string
.
It will split the text found at that column using a TextTokenizer and then write separate index keys for each token found in the text. This then supports queries on the tokenized text, such as:
Query.field(fieldName).text().containsAll(tokens)
Query.field(fieldName).text().containsAny(tokens)
Query.field(fieldName).text().containsAll(tokens, span)
Query.field(fieldName).text().containsPhrase(phrase)
Query.field(fieldName).text().containsPrefix(prefix)
Query.field(fieldName).text().containsAnyPrefix(prefixes)
Query.field(fieldName).text().containsAllPrefixes(prefixes)
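The idea of writing one index key per token can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the Record Layer implementation: the class `TokenIndexSketch`, its whitespace-and-punctuation tokenizer, and its method bodies are hypothetical and merely mirror the names of the query predicates listed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a tokenized text index.
class TokenIndexSketch {
    // token -> (record id -> positions of that token within the record's text)
    private final TreeMap<String, TreeMap<Long, List<Integer>>> entries = new TreeMap<>();

    // Assumed tokenizer: lower-case the text and split on non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Write one entry per token, recording where the token occurs in the text.
    void insert(long recordId, String text) {
        List<String> tokens = tokenize(text);
        for (int pos = 0; pos < tokens.size(); pos++) {
            entries.computeIfAbsent(tokens.get(pos), k -> new TreeMap<>())
                   .computeIfAbsent(recordId, k -> new ArrayList<>())
                   .add(pos);
        }
    }

    // Records containing at least one of the tokens.
    Set<Long> containsAny(List<String> tokens) {
        Set<Long> result = new TreeSet<>();
        for (String token : tokens) {
            result.addAll(entries.getOrDefault(token, new TreeMap<>()).keySet());
        }
        return result;
    }

    // Records containing every one of the tokens.
    Set<Long> containsAll(List<String> tokens) {
        Set<Long> result = null;
        for (String token : tokens) {
            Set<Long> ids = entries.getOrDefault(token, new TreeMap<>()).keySet();
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    // Records in which the phrase's tokens occur at consecutive positions.
    Set<Long> containsPhrase(String phrase) {
        List<String> tokens = tokenize(phrase);
        Set<Long> result = new TreeSet<>();
        for (long id : containsAll(tokens)) {
            for (int start : entries.get(tokens.get(0)).get(id)) {
                boolean match = true;
                for (int i = 1; i < tokens.size() && match; i++) {
                    match = entries.get(tokens.get(i)).get(id).contains(start + i);
                }
                if (match) {
                    result.add(id);
                    break;
                }
            }
        }
        return result;
    }
}
```

In this model the phrase check relies on the stored position lists, which is why a phrase query needs more than the bare token-to-record mapping.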
One can specify a tokenizer to use by setting the tokenizer name and tokenizer version options on the index. If no tokenizer is given, it will use a DefaultTextTokenizer, and if no version is specified, it will assume a default version.
There should be one TextTokenizer implementation that uses that name and one TextTokenizerFactory implementation that will supply instances of the tokenizer of that name. The version of the tokenizer used to serialize each record is stored by this index maintainer, so if an index's tokenizer version changes, this index maintainer will continue to use the older tokenizer version to tokenize the fields of any records present in the index prior to the version change. This guarantees that for every record, the same tokenizer version is used when inserting it and when deleting it. To re-tokenize a record following a tokenizer version change, take the existing record (tokenized with the older version) and save it again; the record will then be re-indexed using the newer version.
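The versioning guarantee can be modelled in a few lines. The sketch below is an assumption-laden illustration, not the maintainer's actual code: `VersionedTextIndexSketch`, its two toy tokenizer versions, and the `token/recordId` key format are all hypothetical. It shows why the stored per-record version matters: removing old entries with the version that wrote them leaves no orphaned keys.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical model of an index that remembers which tokenizer version indexed each record.
class VersionedTextIndexSketch {
    private int indexTokenizerVersion = 0;
    private final Map<Long, Integer> versionByRecord = new HashMap<>();
    final TreeSet<String> keys = new TreeSet<>(); // entries of the form "token/recordId"

    // Assumed tokenizers: version 0 splits on whitespace; version 1 also lower-cases.
    static List<String> tokenize(String text, int version) {
        String input = version >= 1 ? text.toLowerCase(Locale.ROOT) : text;
        return Arrays.asList(input.trim().split("\\s+"));
    }

    void setIndexTokenizerVersion(int version) {
        indexTokenizerVersion = version;
    }

    // oldText is null when creating a record; newText is null when deleting one.
    void update(long id, String oldText, String newText) {
        if (oldText != null) {
            // Remove entries using the version that originally wrote them.
            Integer stored = versionByRecord.remove(id);
            int oldVersion = stored != null ? stored : indexTokenizerVersion;
            for (String token : tokenize(oldText, oldVersion)) {
                keys.remove(token + "/" + id);
            }
        }
        if (newText != null) {
            // Saving (again) always indexes with the index's current version.
            versionByRecord.put(id, indexTokenizerVersion);
            for (String token : tokenize(newText, indexTokenizerVersion)) {
                keys.add(token + "/" + id);
            }
        }
    }
}
```

Saving an unchanged record after a version bump removes the old-version keys and writes new-version keys, which corresponds to the re-tokenization behaviour described above.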
Because each update will add a conflict range for each token included in each indexed text field per record, index updates can be particularly taxing on the resolver process within the FoundationDB cluster. Some use cases can therefore benefit from having fewer, larger conflict ranges per transaction to lessen the work done. The trade-off is that there is now potentially less parallelism, in that there is a larger chance of conflicts between records that arrive simultaneously. It should be noted, though, that the underlying data structure of the text index means that two records that share common tokens and are updated simultaneously are already likely to conflict, so this might not actually produce more conflicts in practice. To enable adding conflict ranges over larger areas, set the corresponding option on the index to true. Warning: This feature is currently experimental, and may change at any moment without prior notice.
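The trade-off between many small conflict ranges and one larger range can be sketched with plain string keys. This is a hypothetical illustration only: the class `ConflictRangeSketch`, the string key layout, and the way the larger range is formed (spanning from the smallest to the largest token) are assumptions and are not taken from the index maintainer's code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical comparison of per-token conflict ranges with a single covering range.
class ConflictRangeSketch {
    // A half-open key range [begin, end).
    static final class Range {
        final String begin;
        final String end;

        Range(String begin, String end) {
            this.begin = begin;
            this.end = end;
        }

        boolean overlaps(Range other) {
            return begin.compareTo(other.end) < 0 && other.begin.compareTo(end) < 0;
        }
    }

    // One small range per distinct token: more work for the resolver, fewer false conflicts.
    static List<Range> perToken(String prefix, Collection<String> tokens) {
        List<Range> ranges = new ArrayList<>();
        for (String token : new TreeSet<>(tokens)) {
            ranges.add(new Range(prefix + token, prefix + token + Character.MAX_VALUE));
        }
        return ranges;
    }

    // One range spanning every token (assumes a non-empty collection).
    static Range covering(String prefix, Collection<String> tokens) {
        TreeSet<String> sorted = new TreeSet<>(tokens);
        return new Range(prefix + sorted.first(), prefix + sorted.last() + Character.MAX_VALUE);
    }

    // True if any range in a overlaps any range in b.
    static boolean anyOverlap(List<Range> a, List<Range> b) {
        for (Range x : a) {
            for (Range y : b) {
                if (x.overlaps(y)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With two records whose token sets are disjoint but interleave alphabetically, the per-token ranges do not overlap while the two covering ranges do, which is the reduced parallelism the paragraph above describes.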
Note: At the moment, this index is under active development and should be considered experimental. At the current time, this index will be correctly updated on insert and removal and can be manually scanned, but it will only be selected by the query planner in limited circumstances to satisfy full text queries. For example, the query planner will not select this index if there are sorts involved in the query or if the filter involves using the position list to determine the relative positions of tokens within a document.
TOO_LARGE_VALUE_MESSAGE_LIMIT
state
Modifier | Constructor and Description |
---|---|
protected | TextIndexMaintainer(IndexMaintainerState state) |
Modifier and Type | Method and Description |
---|---|
boolean | canDeleteWhere(QueryToKeyMatcher matcher, Key.Evaluated evaluated) Indicates whether the expression allows for this index to perform a FDBRecordStoreBase.deleteRecordsWhere(QueryComponent) operation. |
static int | getIndexTokenizerVersion(Index index) Get the tokenizer version associated with this index. |
static TextTokenizer | getTokenizer(Index index) Get the text tokenizer associated with this index. |
RecordCursor<IndexEntry> | scan(IndexScanType scanType, TupleRange range, byte[] continuation, ScanProperties scanProperties) Scan this index between a range of tokens. |
<M extends Message> CompletableFuture<Void> | update(FDBIndexableRecord<M> oldRecord, FDBIndexableRecord<M> newRecord) Updates an associated text index with the data associated with a new record. |
protected <M extends Message> CompletableFuture<Void> | updateIndexKeys(FDBIndexableRecord<M> savedRecord, boolean remove, List<IndexEntry> indexEntries) Update index according to record keys. |
addedRangeWithKey, canDeleteWhere, canEvaluateAggregateFunction, canEvaluateRecordFunction, checkKeyValueSizes, commonKeys, decodeValue, deleteWhere, evaluateAggregateFunction, evaluateIndex, evaluateRecordFunction, filteredIndexEntries, getExecutor, getGroupedCount, getGroupingCount, getTimer, indexEntryKey, isIdempotent, makeMutable, performOperation, saveIndexEntryAsKeyValue, scan, scanUniquenessViolations, skipUpdateForUnchangedKeys, trimTooLargeTuple, unpackKeyValue, unpackKeyValue, updateIndexKeysFunction, updateOneKey, updateUniquenessViolations, validateEntries, validateMissingEntries, validateOrphanEntries
getIndexSubspace, getSecondarySubspace, unsupportedAggregateFunction, unsupportedRecordFunction
protected TextIndexMaintainer(@Nonnull IndexMaintainerState state)
@Nonnull public static TextTokenizer getTokenizer(@Nonnull Index index)
index - the index to get the tokenizer of
public static int getIndexTokenizerVersion(@Nonnull Index index)
index - the index to get the tokenizer version of
@Nonnull protected <M extends Message> CompletableFuture<Void> updateIndexKeys(@Nonnull FDBIndexableRecord<M> savedRecord, boolean remove, @Nonnull List<IndexEntry> indexEntries)
updateIndexKeys in class StandardIndexMaintainer
M - the message type of the record
savedRecord - the record being indexed
remove - true if removing from index
indexEntries - the result of StandardIndexMaintainer.evaluateIndex(com.apple.foundationdb.record.provider.foundationdb.FDBRecord)
@Nonnull public <M extends Message> CompletableFuture<Void> update(@Nullable FDBIndexableRecord<M> oldRecord, @Nullable FDBIndexableRecord<M> newRecord)
update in class StandardIndexMaintainer
M - type of message
oldRecord - the previous stored record or null if a new record is being created
newRecord - the new record or null if an old record is being deleted
IndexMaintainer.update(FDBIndexableRecord, FDBIndexableRecord)
public boolean canDeleteWhere(@Nonnull QueryToKeyMatcher matcher, @Nonnull Key.Evaluated evaluated)
Indicates whether the expression allows for this index to perform a FDBRecordStoreBase.deleteRecordsWhere(QueryComponent) operation. A text index can only delete records that are aligned with its grouping key, as once text from the index has been tokenized, there is not a way to efficiently remove all of the documents within the grouped part of the index.
canDeleteWhere in class StandardIndexMaintainer
matcher - object to match the grouping key to a query component
evaluated - an evaluated key that might align with this index's grouping key
matcher
@Nonnull public RecordCursor<IndexEntry> scan(@Nonnull IndexScanType scanType, @Nonnull TupleRange range, @Nullable byte[] continuation, @Nonnull ScanProperties scanProperties)
range that uses PREFIX_STRING as both endpoint types. The keys returned in the index entry will include the token that was found in the index when scanning in the column that is used for the text field of the index's root expression. The value portion of each index entry will be a tuple whose first element is the position list for that entry within its associated record's field.
scan in class IndexMaintainer
scanType - the type of scan to perform
range - the range to scan
continuation - any continuation from a previous scan invocation
scanProperties - skip, limit and other properties of the scan
range
RecordCoreException - if scanType is not IndexScanType.BY_TEXT_TOKEN
TextCursor
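A scan whose two endpoints are string prefixes can be pictured as a range read over sorted token keys. The following is a minimal sketch, assuming that a prefix endpoint means "every token starting with this string"; `PrefixScanSketch` and its `TreeMap`-backed index are hypothetical stand-ins, with each value playing the role of the position list carried in an index entry.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical prefix scan over a sorted token -> position-list map.
class PrefixScanSketch {
    // Returns every entry whose token starts with the given prefix.
    static NavigableMap<String, List<Integer>> scanPrefix(TreeMap<String, List<Integer>> index,
                                                          String prefix) {
        // Character.MAX_VALUE sorts after any other char, so it bounds the prefix range from above.
        return index.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
    }
}
```

Because the tokens are kept in sorted order, the prefix lookup is a single contiguous range read rather than a filter over every token.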