@API(value=EXPERIMENTAL) public abstract class Text extends Object
This type allows the user to specify a "tokenizer name". If one is given, that tokenizer is used to tokenize the query string (if it is not pre-tokenized), and any index used to satisfy the predicate must use the same tokenizer. If no tokenizer is specified, the predicate can be matched against any text index on the field, and the index's tokenizer is applied to the query string. If no suitable index can be found and a full scan with a post-filter has to be done, then a fallback tokenizer is used to tokenize both the query string and the record's text. By default, this is the DefaultTextTokenizer (with name ""), but a different one can be specified.
This should be created by calling the text() method on a query Field or OneOfThem instance. For example, one might call Query.field("text").text() to create a predicate on the text field's contents.
See Also: TextIndexMaintainer, TextTokenizer, DefaultTextTokenizer
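The tokenizer selection rules described above can be summarized in a small stand-alone model. This is only a sketch, not the library's planner: the class TokenizerChoiceSketch, its Choice type, and the choose method are hypothetical names introduced for illustration, and it assumes that when an explicit tokenizer is given but the index uses a different one, the explicit tokenizer is still the one applied during the full scan.

```java
final class TokenizerChoiceSketch {
    /** Result of the choice: the tokenizer applied and whether the index can serve the query. */
    static final class Choice {
        final String tokenizerName;
        final boolean usesIndex;

        Choice(String tokenizerName, boolean usesIndex) {
            this.tokenizerName = tokenizerName;
            this.usesIndex = usesIndex;
        }
    }

    /**
     * @param requested      tokenizer name given on the predicate, or null if none was given
     * @param indexTokenizer tokenizer name of a text index on the field, or null if there is no index
     * @param fallback       tokenizer name used for a full scan when nothing else applies
     */
    static Choice choose(String requested, String indexTokenizer, String fallback) {
        if (requested != null) {
            // An explicit tokenizer always tokenizes the query; the index qualifies only if it matches.
            return new Choice(requested, requested.equals(indexTokenizer));
        }
        if (indexTokenizer != null) {
            // No explicit tokenizer: any text index qualifies and its tokenizer is applied to the query.
            return new Choice(indexTokenizer, true);
        }
        // No usable index: a full scan tokenizes both the query and the record text with the fallback.
        return new Choice(fallback, false);
    }

    public static void main(String[] args) {
        Choice choice = choose(null, "whitespace", "fallback");
        System.out.println(choice.tokenizerName + " usesIndex=" + choice.usesIndex);
    }
}
```

The tokenizer names used in main are placeholders, not names registered by the library.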
Modifier and Type | Method and Description |
---|---|
QueryComponent | contains(String token): Checks if the field contains a token. |
QueryComponent | containsAll(List<String> tokens): Checks if the field contains all of the provided tokens. |
QueryComponent | containsAll(List<String> tokens, int maxDistance): Checks if the field text contains all of the provided tokens within a given number of tokens. |
QueryComponent | containsAll(String tokens): Checks if the field contains all of the provided tokens. |
QueryComponent | containsAll(String tokens, int maxDistance): Checks if the field text contains all of the provided tokens within a given number of tokens. |
QueryComponent | containsAllPrefixes(List<String> tokenPrefixes): Checks if the field contains tokens matching all of the given prefixes. |
QueryComponent | containsAllPrefixes(List<String> tokenPrefixes, boolean strict): Checks if the field contains tokens matching all of the given prefixes. |
QueryComponent | containsAllPrefixes(List<String> tokenPrefixes, boolean strict, long expectedRecords, double falsePositivePercentage): Checks if the field contains tokens matching all of the given prefixes. |
QueryComponent | containsAllPrefixes(String tokenPrefixes): Checks if the field contains tokens matching all of the given prefixes. |
QueryComponent | containsAllPrefixes(String tokenPrefixes, boolean strict): Checks if the field contains tokens matching all of the given prefixes. |
QueryComponent | containsAllPrefixes(String tokenPrefixes, boolean strict, long expectedRecords, double falsePositivePercentage): Checks if the field contains tokens matching all of the given prefixes. |
QueryComponent | containsAny(List<String> tokens): Checks if the field contains any of the provided tokens. |
QueryComponent | containsAny(String tokens): Checks if the field contains any of the provided tokens. |
QueryComponent | containsAnyPrefix(List<String> tokenPrefixes): Checks if the field contains a token that matches any of the given prefixes. |
QueryComponent | containsAnyPrefix(String tokenPrefixes): Checks if the field contains a token that matches any of the given prefixes. |
QueryComponent | containsPhrase(List<String> phraseTokens): Checks if the field text contains the given phrase. |
QueryComponent | containsPhrase(String phrase): Checks if the field contains the given phrase. |
QueryComponent | containsPrefix(String prefix): Checks if the field contains any token matching the provided prefix. |
@Nonnull public QueryComponent contains(@Nonnull String token)
Checks if the field contains a token.
token - the token to search for
@Nonnull public QueryComponent containsPrefix(@Nonnull String prefix)
Checks if the field contains any token matching the provided prefix.
prefix - the prefix to search for
@Nonnull public QueryComponent containsAll(@Nonnull String tokens)
Checks if the field contains all of the provided tokens. Evaluates to Boolean.TRUE if all of the tokens (except stop words) are present in the text field, Boolean.FALSE if any of them are not, and null if either the field is null or the token list contains only stop words or is empty. If the same token appears multiple times in the token list, it needs to appear only once in the searched text to satisfy the filter (i.e., it is not required to appear as many times in the text as in the token list).
tokens - the tokens to search for
@Nonnull public QueryComponent containsAll(@Nonnull List<String> tokens)
Checks if the field contains all of the provided tokens. Behaves like containsAll(String), except that the token list is assumed to have already been tokenized with an appropriate tokenizer. No further sanitization or normalization is performed on the tokens before searching for them in the text.
tokens - the tokens to search for
@Nonnull public QueryComponent containsAll(@Nonnull String tokens, int maxDistance)
Checks if the field text contains all of the provided tokens within a given number of tokens. For example, against a text in which the tokens a and c are two positions apart, containsAll("a c", 2) would return Boolean.TRUE, but containsAll("a c", 1) would return Boolean.FALSE. Stop words in the query string are ignored, and if there are no tokens in the string (or all tokens are stop words), this will evaluate to null. It will also evaluate to null if the field is null. If the same token appears multiple times in the token list, it needs to appear only once in the searched text to satisfy the filter (i.e., it is not required to appear as many times in the text as in the token list).
tokens - the tokens to search for
maxDistance - the maximum distance (expressed in number of tokens) to allow between found tokens
@Nonnull public QueryComponent containsAll(@Nonnull List<String> tokens, int maxDistance)
Checks if the field text contains all of the provided tokens within a given number of tokens. Behaves like containsAll(String, int), except that the token list is assumed to have already been tokenized with an appropriate tokenizer. No further sanitization or normalization is performed on the tokens before searching for them in the text.
tokens - the tokens to search for
maxDistance - the maximum distance (expressed in number of tokens) to allow between found tokens
@Nonnull public QueryComponent containsAllPrefixes(@Nonnull String tokenPrefixes)
Checks if the field contains tokens matching all of the given prefixes. The given String will be tokenized into multiple tokens using an appropriate tokenizer. This variant of containsAllPrefixes is strict, i.e., the planner will ensure that it does not return any false positives when evaluated with an index scan. However, the scan can be made more efficient (if false positives are acceptable) by using one of the other variants of this function and supplying false for the strict parameter.
tokenPrefixes - the token prefixes to search for
See Also: containsAllPrefixes(String, boolean)
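The effect of strictness can be pictured with a small stand-alone model. This is a sketch under stated assumptions rather than the planner's actual behavior: StrictPrefixSketch, matchesAllPrefixes, and evaluate are hypothetical names, records are represented simply as lists of tokens, and the assumption is that a strict evaluation re-checks every candidate produced by an approximate scan while a non-strict one accepts the candidates as they are. Empty prefix lists are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

final class StrictPrefixSketch {
    // Exact check: every prefix must be the start of at least one token of the record.
    static boolean matchesAllPrefixes(List<String> recordTokens, List<String> prefixes) {
        return prefixes.stream()
                .allMatch(prefix -> recordTokens.stream().anyMatch(token -> token.startsWith(prefix)));
    }

    /**
     * @param candidates token lists of the records a (hypothetical) approximate scan produced;
     *                   the scan may have let through records that do not truly match
     * @param prefixes   the token prefixes that must all be matched
     * @param strict     whether false positives must be removed before returning
     */
    static List<List<String>> evaluate(List<List<String>> candidates, List<String> prefixes, boolean strict) {
        if (!strict) {
            // Cheaper: accept whatever the scan produced, false positives included.
            return candidates;
        }
        // Strict: pay for an exact re-check of every candidate.
        List<List<String>> exact = new ArrayList<>();
        for (List<String> recordTokens : candidates) {
            if (matchesAllPrefixes(recordTokens, prefixes)) {
                exact.add(recordTokens);
            }
        }
        return exact;
    }
}
```

With strict set to false the caller gets the candidate list back unchanged, which is why that mode is only appropriate when false positives are acceptable.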
@Nonnull public QueryComponent containsAllPrefixes(@Nonnull String tokenPrefixes, boolean strict)
Checks if the field contains tokens matching all of the given prefixes. The given String will be tokenized into multiple tokens using an appropriate tokenizer. The strict parameter determines whether this comparison is strictly evaluated against an index. If the parameter is set to true, then this will return no false positives, but additional reads may be required to filter out any false positives that occur internally during query execution.
tokenPrefixes - the token prefixes to search for
strict - true if this should not return false positives
@Nonnull public QueryComponent containsAllPrefixes(@Nonnull String tokenPrefixes, boolean strict, long expectedRecords, double falsePositivePercentage)
Checks if the field contains tokens matching all of the given prefixes. The given String will be tokenized into multiple tokens using an appropriate tokenizer. The strict parameter behaves the same way here as it does in the other overload of containsAllPrefixes(). The expectedRecords and falsePositivePercentage parameters can be used to tune the behavior of the underlying probabilistic data structures used during query execution. See the Comparisons.TextContainsAllPrefixesComparison class for more details.
tokenPrefixes - the token prefixes to search for
strict - true if this should not return any false positives
expectedRecords - the expected number of records read for each prefix
falsePositivePercentage - an acceptable percentage of false positives for each token prefix
See Also: Comparisons.TextContainsAllPrefixesComparison, containsAllPrefixes(String, boolean)
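To give a feel for how an expected record count and a false positive target interact, the sketch below applies the standard Bloom filter sizing formulas. It rests on assumptions that this page does not confirm: that the probabilistic structure behaves like a Bloom filter, and that the false positive target is expressed as a probability between 0 and 1. BloomSizingSketch and its methods are hypothetical names.

```java
final class BloomSizingSketch {
    /** Bits needed so that, after inserting the expected records, the false positive rate meets the target. */
    static long optimalBits(long expectedRecords, double falsePositiveProbability) {
        if (expectedRecords <= 0 || falsePositiveProbability <= 0.0 || falsePositiveProbability >= 1.0) {
            throw new IllegalArgumentException("need expectedRecords > 0 and 0 < probability < 1");
        }
        // m = -n * ln(p) / (ln 2)^2
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedRecords * Math.log(falsePositiveProbability) / (ln2 * ln2));
    }

    /** Number of hash functions that minimizes the false positive rate for the given size. */
    static int optimalHashFunctions(long expectedRecords, long bits) {
        // k = (m / n) * ln 2, with at least one hash function
        return Math.max(1, (int) Math.round((double) bits / expectedRecords * Math.log(2)));
    }

    public static void main(String[] args) {
        long bits = optimalBits(1000, 0.01);
        System.out.println("bits=" + bits + " hashes=" + optimalHashFunctions(1000, bits));
    }
}
```

Under these formulas, a larger expected record count or a tighter false positive target both increase the memory used per prefix, which is the trade-off the two parameters expose.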
@Nonnull public QueryComponent containsAllPrefixes(@Nonnull List<String> tokenPrefixes)
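Checks if the field contains tokens matching all of the given prefixes. Behaves like the variant of containsAllPrefixes(String) that takes a single String, but this method assumes the token prefixes given are already tokenized and normalized.
tokenPrefixes - the token prefixes to search for
See Also: containsAllPrefixes(String)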
@Nonnull public QueryComponent containsAllPrefixes(@Nonnull List<String> tokenPrefixes, boolean strict)
Checks if the field contains tokens matching all of the given prefixes. Behaves like the variant of containsAllPrefixes(String, boolean) that takes a single String, but this method assumes the token prefixes given are already tokenized and normalized.
tokenPrefixes - the token prefixes to search for
strict - true if this should not return any false positives
See Also: containsAllPrefixes(String, boolean)
@Nonnull public QueryComponent containsAllPrefixes(@Nonnull List<String> tokenPrefixes, boolean strict, long expectedRecords, double falsePositivePercentage)
Checks if the field contains tokens matching all of the given prefixes. Behaves like the variant of containsAllPrefixes(String, boolean, long, double) that takes a single String, but this method assumes the token prefixes given are already tokenized and normalized.
tokenPrefixes - the token prefixes to search for
strict - true if this should not return any false positives
expectedRecords - the expected number of records read for each prefix
falsePositivePercentage - an acceptable percentage of false positives for each token prefix
See Also: containsAllPrefixes(String, boolean, long, double)
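The practical difference between the String and List variants is who is responsible for tokenizing and normalizing. The sketch below illustrates this with a stand-in tokenizer; it is not the behavior of any tokenizer shipped with the library. PreTokenizedSketch and its methods are hypothetical, and the tokenizer (lower-casing, then splitting on anything that is not a letter or digit) is an assumption chosen only to make the contrast visible. Empty prefix lists are not modeled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class PreTokenizedSketch {
    // Hypothetical stand-in tokenizer: lower-cases, then splits on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // List variant: the prefixes are used exactly as given, with no normalization.
    static boolean containsAllPrefixes(List<String> fieldTokens, List<String> tokenPrefixes) {
        return tokenPrefixes.stream()
                .allMatch(prefix -> fieldTokens.stream().anyMatch(token -> token.startsWith(prefix)));
    }

    // String variant: tokenize and normalize first, then delegate to the list variant.
    static boolean containsAllPrefixes(List<String> fieldTokens, String tokenPrefixes) {
        return containsAllPrefixes(fieldTokens, tokenize(tokenPrefixes));
    }
}
```

In this model a caller who passes an upper-case prefix through the List variant gets no match against lower-cased field tokens, whereas the String variant normalizes the same input before comparing.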
@Nonnull public QueryComponent containsPhrase(@Nonnull String phrase)
Checks if the field contains the given phrase. Evaluates to Boolean.TRUE if all of the tokens (except stop words) can be found in the given document in the correct order, Boolean.FALSE if any cannot, and null if the phrase is empty or contains only stop words or if the field itself is null.
phrase - the phrase to search for
@Nonnull public QueryComponent containsPhrase(@Nonnull List<String> phraseTokens)
Checks if the field text contains the given phrase. Behaves like containsPhrase(String), except that the token list is assumed to have already been tokenized with an appropriate tokenizer. No further sanitization or normalization is performed on the tokens before searching for them in the text. It is assumed that the order of the tokens in the list is the same as the order of the tokens in the original phrase and that there are no gaps (except as indicated by including the empty string to indicate that there was a stop word in the original phrase).
phraseTokens - the tokens to search for in the order they appear in the phrase
@Nonnull public QueryComponent containsAny(@Nonnull String tokens)
Checks if the field contains any of the provided tokens. Evaluates to Boolean.TRUE if any of the tokens (not counting stop words) are present, Boolean.FALSE if none of them are, and null if either the field is null or the token list contains only stop words or is empty.
tokens - the tokens to search for
@Nonnull public QueryComponent containsAny(@Nonnull List<String> tokens)
Checks if the field contains any of the provided tokens. Behaves like containsAny(String), except that the token list is assumed to have already been tokenized with an appropriate tokenizer. No further sanitization or normalization is performed on the tokens before searching for them in the text.
tokens - the tokens to search for
@Nonnull public QueryComponent containsAnyPrefix(@Nonnull String tokenPrefixes)
Checks if the field contains a token that matches any of the given prefixes.
tokenPrefixes - the token prefixes to search for
@Nonnull public QueryComponent containsAnyPrefix(@Nonnull List<String> tokenPrefixes)
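Checks if the field contains a token that matches any of the given prefixes. Behaves like the variant of containsAnyPrefix(String) that takes a single String, except that it assumes the token prefix list has already been tokenized and normalized.
tokenPrefixes - the token prefixes to search for
See Also: containsAnyPrefix(String)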
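The three-valued results documented for containsAll, containsAny, and containsPhrase can be made concrete with a stand-alone model over pre-tokenized lists. This is a sketch, not the library's evaluation code: TextPredicateSketch is a hypothetical class, a null field is modeled as a null token list, stop words are modeled as empty strings in both the field and the query, and "within maxDistance tokens" is assumed to mean that all query tokens fall inside some window of maxDistance + 1 consecutive positions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class TextPredicateSketch {
    // Drop empty strings, which stand in for stop words in a pre-tokenized list.
    private static Set<String> significant(List<String> tokens) {
        return tokens.stream().filter(token -> !token.isEmpty()).collect(Collectors.toSet());
    }

    // null for a null field or a query without significant tokens; otherwise TRUE or FALSE.
    static Boolean containsAll(List<String> fieldTokens, List<String> queryTokens) {
        Set<String> wanted = significant(queryTokens);
        if (fieldTokens == null || wanted.isEmpty()) {
            return null;
        }
        // A set comparison means a repeated query token only has to appear once in the field.
        return new HashSet<>(fieldTokens).containsAll(wanted);
    }

    static Boolean containsAny(List<String> fieldTokens, List<String> queryTokens) {
        Set<String> wanted = significant(queryTokens);
        if (fieldTokens == null || wanted.isEmpty()) {
            return null;
        }
        return fieldTokens.stream().anyMatch(wanted::contains);
    }

    static Boolean containsAll(List<String> fieldTokens, List<String> queryTokens, int maxDistance) {
        if (maxDistance < 0) {
            throw new IllegalArgumentException("maxDistance must not be negative");
        }
        Set<String> wanted = significant(queryTokens);
        if (fieldTokens == null || wanted.isEmpty()) {
            return null;
        }
        // Slide a window of maxDistance + 1 positions over the field and look for all tokens inside it.
        for (int start = 0; start < fieldTokens.size(); start++) {
            int end = (int) Math.min((long) fieldTokens.size(), (long) start + maxDistance + 1L);
            if (new HashSet<>(fieldTokens.subList(start, end)).containsAll(wanted)) {
                return true;
            }
        }
        return false;
    }

    static Boolean containsPhrase(List<String> fieldTokens, List<String> phraseTokens) {
        if (fieldTokens == null || significant(phraseTokens).isEmpty()) {
            return null;
        }
        for (int start = 0; start + phraseTokens.size() <= fieldTokens.size(); start++) {
            boolean match = true;
            for (int i = 0; i < phraseTokens.size() && match; i++) {
                String expected = phraseTokens.get(i);
                // An empty phrase token marks a stop-word gap and matches whatever sits at that position.
                match = expected.isEmpty() || expected.equals(fieldTokens.get(start + i));
            }
            if (match) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> field = Arrays.asList("a", "b", "c");
        System.out.println(containsAll(field, Arrays.asList("a", "c"), 2));
        System.out.println(containsAll(field, Arrays.asList("a", "c"), 1));
    }
}
```

Returning a boxed Boolean keeps the "unknown" case distinct from FALSE, which mirrors the null results described for a null field or a query consisting only of stop words.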