Class XFuzzySuggester
java.lang.Object
  org.apache.lucene.search.suggest.Lookup
    org.apache.lucene.search.suggest.analyzing.XAnalyzingSuggester
      org.apache.lucene.search.suggest.analyzing.XFuzzySuggester

All Implemented Interfaces:
org.apache.lucene.util.Accountable

public final class XFuzzySuggester extends XAnalyzingSuggester
Implements a fuzzy AnalyzingSuggester. The similarity measurement is based on the Damerau-Levenshtein (optimal string alignment) algorithm, though you can explicitly choose classic Levenshtein by passing false for the transpositions parameter.

This suggester matches terms within at most 2 edits. Higher distances are not supported. Note that the fuzzy distance is measured in "byte space" on the bytes returned by the TokenStream's TermToBytesRefAttribute, usually UTF-8. By default the analyzed form must be at least 3 (DEFAULT_MIN_FUZZY_LENGTH) bytes long before any edits are considered. Furthermore, the first 1 (DEFAULT_NON_FUZZY_PREFIX) byte is not allowed to be edited. By default up to 1 (DEFAULT_MAX_EDITS) edit is allowed. If the unicodeAware parameter in the constructor is set to true, maxEdits, minFuzzyLength, transpositions and nonFuzzyPrefix are measured in Unicode code points (actual letters) instead of bytes.

NOTE: This suggester does not boost suggestions that required no edits over suggestions that did require edits. This is a known limitation.

Note: complex query analyzers can have a significant impact on lookup performance. To keep the complexity of the prefix intersection low, avoid query analyzers that drop or inject terms (for example, synonyms). At index time, complex analyzers can safely be used.
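The difference between the two similarity measures named above can be seen with a small, self-contained sketch. It is not the automaton-based implementation used by the suggester; the class EditDistanceSketch and its distance method are hypothetical names introduced for illustration, and the sketch compares Java chars rather than analyzed bytes.

```java
// Illustrative only: a plain dynamic-programming edit distance, not the
// Levenshtein automaton the suggester builds internally.
final class EditDistanceSketch {

    /**
     * Returns the edit distance between a and b.
     * With transpositions = true, swapping two adjacent characters counts as
     * one edit (optimal string alignment); with false, this is classic
     * Levenshtein, where such a swap costs two edits.
     */
    static int distance(String a, String b, boolean transpositions) {
        int n = a.length();
        int m = b.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            d[i][0] = i; // delete all of a's first i characters
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j; // insert all of b's first j characters
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    // adjacent swap treated as a single primitive edit
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        System.out.println(distance("lucene", "lucnee", true));  // prints 1
        System.out.println(distance("lucene", "lucnee", false)); // prints 2
    }
}
```

With a maximum of 1 edit, a typo such as "lucnee" would therefore only be within range of "lucene" when transpositions are enabled.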

Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.search.suggest.analyzing.XAnalyzingSuggester
XAnalyzingSuggester.XBuilder

Field Summary
Fields (Modifier and Type, Field, Description):
static int DEFAULT_MAX_EDITS
    The default maximum number of edits for fuzzy suggestions.
static int DEFAULT_MIN_FUZZY_LENGTH
    The default minimum length of the key passed to XAnalyzingSuggester.lookup(java.lang.CharSequence, java.util.Set<org.apache.lucene.util.BytesRef>, boolean, int) before any edits are allowed.
static int DEFAULT_NON_FUZZY_PREFIX
    The default prefix length where edits are not allowed.
static boolean DEFAULT_TRANSPOSITIONS
    The default transposition value passed to LevenshteinAutomata.
static boolean DEFAULT_UNICODE_AWARE
    Measure maxEdits, minFuzzyLength, transpositions and nonFuzzyPrefix parameters in Unicode code points (actual letters) instead of bytes.

Fields inherited from class org.apache.lucene.search.suggest.analyzing.XAnalyzingSuggester
END_BYTE, EXACT_FIRST, HOLE_CHARACTER, PAYLOAD_SEP, PRESERVE_SEP, SEP_LABEL

Constructor Summary
Constructors (Constructor, Description):
XFuzzySuggester(org.apache.lucene.analysis.Analyzer analyzer)
    Creates a FuzzySuggester instance initialized with default values.
XFuzzySuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.analysis.Analyzer queryAnalyzer)
    Creates a FuzzySuggester instance with an index and a query analyzer, initialized with default values.
XFuzzySuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.util.automaton.Automaton queryPrefix, org.apache.lucene.analysis.Analyzer queryAnalyzer, int options, int maxSurfaceFormsPerAnalyzedForm, int maxGraphExpansions, int maxEdits, boolean transpositions, int nonFuzzyPrefix, int minFuzzyLength, boolean unicodeAware, org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>> fst, boolean hasPayloads, int maxAnalyzedPathsForOneInput, int sepLabel, int payloadSep, int endByte, int holeCharacter)
    Creates a FuzzySuggester instance.

Method Summary
Methods (Modifier and Type, Method, Description):
protected org.apache.lucene.util.automaton.Automaton convertAutomaton(org.apache.lucene.util.automaton.Automaton a)
protected java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>>> getFullPrefixPaths(java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>>> prefixPaths, org.apache.lucene.util.automaton.Automaton lookupAutomaton, org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>> fst)
    Returns all completion paths to initialize the search.
org.apache.lucene.analysis.TokenStreamToAutomaton getTokenStreamToAutomaton()

Methods inherited from class org.apache.lucene.search.suggest.analyzing.XAnalyzingSuggester
build, decodeWeight, encodeWeight, get, getCount, getMaxAnalyzedPathsForOneInput, getTempDir, load, load, lookup, ramBytesUsed, store, store, toFiniteStrings

Field Detail
DEFAULT_UNICODE_AWARE
public static final boolean DEFAULT_UNICODE_AWARE
Measure maxEdits, minFuzzyLength, transpositions and nonFuzzyPrefix parameters in Unicode code points (actual letters) instead of bytes.
See Also:
Constant Field Values
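
To illustrate why the unit of measurement matters, the following sketch compares the UTF-8 byte length of a string with its length in Unicode code points. It uses only the Java standard library; the class LengthUnitsSketch and its helper methods are hypothetical names for illustration and are not part of this API.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: shows how the same text has different lengths
// depending on whether bytes or code points are counted.
final class LengthUnitsSketch {

    /** Length in UTF-8 bytes, the unit used when unicodeAware is false. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Length in Unicode code points, the unit used when unicodeAware is true. */
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9";        // "café": the last letter takes 2 bytes in UTF-8
        String nihon = "\u65e5\u672c";    // two CJK characters, 3 bytes each in UTF-8

        System.out.println(utf8Length(cafe) + " bytes, " + codePointLength(cafe) + " code points");   // 5 bytes, 4 code points
        System.out.println(utf8Length(nihon) + " bytes, " + codePointLength(nihon) + " code points"); // 6 bytes, 2 code points
    }
}
```

Under a minimum fuzzy length of 3, the two-character CJK key reaches the threshold when counted in bytes (6) but not when counted in code points (2), so the choice of unit changes whether edits are considered at all.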

DEFAULT_MIN_FUZZY_LENGTH
public static final int DEFAULT_MIN_FUZZY_LENGTH
The default minimum length of the key passed to XAnalyzingSuggester.lookup(java.lang.CharSequence, java.util.Set<org.apache.lucene.util.BytesRef>, boolean, int) before any edits are allowed.
See Also:
Constant Field Values

DEFAULT_NON_FUZZY_PREFIX
public static final int DEFAULT_NON_FUZZY_PREFIX
The default prefix length where edits are not allowed.
See Also:
Constant Field Values
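
The interaction of the minimum fuzzy length and the non-fuzzy prefix can be sketched as follows. This is an interpretation of the documented behaviour, not the suggester's actual code: the class FuzzyGateSketch, its constants and its methods are hypothetical, the constant values (3, 1 and 1) are the defaults quoted in the class description, and the rule that a key no longer than the protected prefix is matched exactly is an assumption. Lengths are counted in Java chars, which equals the byte count for the ASCII keys used here.

```java
// Illustrative only: models which part of a lookup key may be edited.
final class FuzzyGateSketch {
    // Values taken from the class description; the names are hypothetical.
    static final int MIN_FUZZY_LENGTH = 3;
    static final int NON_FUZZY_PREFIX = 1;
    static final int MAX_EDITS = 1;

    /** Number of edits permitted for a key of the given length. */
    static int allowedEdits(int keyLength) {
        // Assumption: keys that are too short, or entirely covered by the
        // protected prefix, must match exactly.
        if (keyLength < MIN_FUZZY_LENGTH || keyLength <= NON_FUZZY_PREFIX) {
            return 0;
        }
        return MAX_EDITS;
    }

    /** Splits a key into the exact-match prefix and the remainder open to edits. */
    static String[] split(String key) {
        int cut = Math.min(NON_FUZZY_PREFIX, key.length());
        return new String[] {key.substring(0, cut), key.substring(cut)};
    }

    public static void main(String[] args) {
        for (String key : new String[] {"lu", "luc", "lucene"}) {
            String[] parts = split(key);
            System.out.println(key + ": edits=" + allowedEdits(key.length())
                    + ", exact=\"" + parts[0] + "\", fuzzy=\"" + parts[1] + "\"");
        }
    }
}
```

For the key "lucene" this yields one permitted edit, with "l" required to match exactly and "ucene" open to that edit; the two-character key "lu" gets no edits.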

DEFAULT_MAX_EDITS
public static final int DEFAULT_MAX_EDITS
The default maximum number of edits for fuzzy suggestions.
See Also:
Constant Field Values

DEFAULT_TRANSPOSITIONS
public static final boolean DEFAULT_TRANSPOSITIONS
The default transposition value passed to LevenshteinAutomata.
See Also:
Constant Field Values

Constructor Detail
XFuzzySuggester
public XFuzzySuggester(org.apache.lucene.analysis.Analyzer analyzer)
Creates a FuzzySuggester instance initialized with default values.
Parameters:
analyzer - the analyzer used for this suggester

XFuzzySuggester
public XFuzzySuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.analysis.Analyzer queryAnalyzer)
Creates a FuzzySuggester instance with an index and a query analyzer, initialized with default values.
Parameters:
indexAnalyzer - Analyzer that will be used for analyzing suggestions while building the index.
queryAnalyzer - Analyzer that will be used for analyzing query text during lookup.

XFuzzySuggester
public XFuzzySuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.util.automaton.Automaton queryPrefix, org.apache.lucene.analysis.Analyzer queryAnalyzer, int options, int maxSurfaceFormsPerAnalyzedForm, int maxGraphExpansions, int maxEdits, boolean transpositions, int nonFuzzyPrefix, int minFuzzyLength, boolean unicodeAware, org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>> fst, boolean hasPayloads, int maxAnalyzedPathsForOneInput, int sepLabel, int payloadSep, int endByte, int holeCharacter)
Creates a FuzzySuggester instance.
Parameters:
indexAnalyzer - Analyzer that will be used for analyzing suggestions while building the index.
queryAnalyzer - Analyzer that will be used for analyzing query text during lookup.
options - see XAnalyzingSuggester.EXACT_FIRST, XAnalyzingSuggester.PRESERVE_SEP
maxSurfaceFormsPerAnalyzedForm - Maximum number of surface forms to keep for a single analyzed form. When there are too many surface forms, the lowest weighted ones are discarded.
maxGraphExpansions - Maximum number of graph paths to expand from the analyzed form. Set this to -1 for no limit.
maxEdits - must be >= 0 and <= LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE.
transpositions - true if transpositions should be treated as a primitive edit operation. If this is false, comparisons will implement the classic Levenshtein algorithm.
nonFuzzyPrefix - length of common (non-fuzzy) prefix (see default DEFAULT_NON_FUZZY_PREFIX)
minFuzzyLength - minimum length of lookup key before any edits are allowed (see default DEFAULT_MIN_FUZZY_LENGTH)
sepLabel - separation label
payloadSep - payload separator byte
endByte - end marker byte
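
The range constraint on maxEdits can be read as a simple argument check. The following is a minimal sketch under stated assumptions, not the constructor's actual code: the class FuzzyArgsSketch, the constant MAX_SUPPORTED_DISTANCE and the method checkMaxEdits are hypothetical names, and the upper bound of 2 is taken from the class description, which states that distances above 2 edits are not supported.

```java
// Illustrative only: a stand-alone version of the documented maxEdits range.
final class FuzzyArgsSketch {
    // Assumed stand-in for LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE;
    // the class description says at most 2 edits are supported.
    static final int MAX_SUPPORTED_DISTANCE = 2;

    /** Returns maxEdits unchanged if it lies in [0, MAX_SUPPORTED_DISTANCE]. */
    static int checkMaxEdits(int maxEdits) {
        if (maxEdits < 0 || maxEdits > MAX_SUPPORTED_DISTANCE) {
            throw new IllegalArgumentException(
                    "maxEdits must be between 0 and " + MAX_SUPPORTED_DISTANCE + ", got " + maxEdits);
        }
        return maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(checkMaxEdits(1)); // accepted
        try {
            checkMaxEdits(3);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Callers that accept an edit distance from user input may want to perform a check of this kind before constructing the suggester, so that an out-of-range value is reported early.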
Method Detail
getFullPrefixPaths
protected java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>>> getFullPrefixPaths(java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>>> prefixPaths, org.apache.lucene.util.automaton.Automaton lookupAutomaton, org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>> fst) throws java.io.IOException
Description copied from class: XAnalyzingSuggester
Returns all completion paths to initialize the search.
Overrides:
getFullPrefixPaths in class XAnalyzingSuggester
Throws:
java.io.IOException

convertAutomaton
protected org.apache.lucene.util.automaton.Automaton convertAutomaton(org.apache.lucene.util.automaton.Automaton a)
Overrides:
convertAutomaton in class XAnalyzingSuggester

getTokenStreamToAutomaton
public org.apache.lucene.analysis.TokenStreamToAutomaton getTokenStreamToAutomaton()
Overrides:
getTokenStreamToAutomaton in class XAnalyzingSuggester