Class XAnalyzingSuggester
- java.lang.Object
  - org.apache.lucene.search.suggest.Lookup
    - org.apache.lucene.search.suggest.analyzing.XAnalyzingSuggester
- All Implemented Interfaces: org.apache.lucene.util.Accountable
- Direct Known Subclasses: XFuzzySuggester
public class XAnalyzingSuggester extends org.apache.lucene.search.suggest.Lookup
Suggester that first analyzes the surface form, adds the analyzed form to a weighted FST, and then does the same thing at lookup time. This means lookup is based on the analyzed form while suggestions are still the surface form(s).
This can result in powerful suggester functionality. For example, if you use an analyzer that removes stop words, then the partial text "ghost chr..." could see the suggestion "The Ghost of Christmas Past". Note that position increments MUST NOT be preserved for this example to work, so you should call the constructor with the preservePositionIncrements parameter set to false.
If SynonymFilter is used to map wifi and wireless network to hotspot, then the partial text "wirele..." could suggest "wifi router". Token normalization such as stemming or accent removal would allow suggestions to ignore such variations.
When two matching suggestions have the same weight, they are tie-broken by the analyzed form. If their analyzed form is the same then the order is undefined.
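The analyze-then-look-up idea described above can be illustrated without Lucene. The following is a toy sketch, not this class's implementation: a plain list stands in for the weighted FST, and the hypothetical analyze helper (lowercasing plus stop-word removal that leaves no gaps) stands in for an Analyzer with preservePositionIncrements set to false. All class and method names here are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration only: a list stands in for the weighted FST.
class ToyAnalyzingSuggester {
    // Hypothetical stop-word list; a real Analyzer would supply its own.
    private static final Set<String> STOP_WORDS = Set.of("the", "of", "a");

    private static final class Entry {
        final String analyzed;
        final String surface;
        final long weight;

        Entry(String analyzed, String surface, long weight) {
            this.analyzed = analyzed;
            this.surface = surface;
            this.weight = weight;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Stand-in for an Analyzer: lowercase, split on whitespace, drop stop words.
    // Removed tokens leave no gap behind.
    static String analyze(String text) {
        StringBuilder sb = new StringBuilder();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(token);
        }
        return sb.toString();
    }

    // Index the analyzed form, but remember the surface form for display.
    void add(String surface, long weight) {
        entries.add(new Entry(analyze(surface), surface, weight));
    }

    // Match on the analyzed form, return surface forms.
    List<String> lookup(String key, int num) {
        String prefix = analyze(key);
        List<String> results = new ArrayList<>();
        if (prefix.isEmpty()) {
            return results; // empty lookups return no results
        }
        List<Entry> matches = new ArrayList<>();
        for (Entry e : entries) {
            if (e.analyzed.startsWith(prefix)) {
                matches.add(e);
            }
        }
        // Higher weight first; ties are broken by the analyzed form.
        matches.sort((x, y) -> {
            int byWeight = Long.compare(y.weight, x.weight);
            return byWeight != 0 ? byWeight : x.analyzed.compareTo(y.analyzed);
        });
        for (Entry e : matches.subList(0, Math.min(num, matches.size()))) {
            results.add(e.surface);
        }
        return results;
    }

    public static void main(String[] args) {
        ToyAnalyzingSuggester suggester = new ToyAnalyzingSuggester();
        suggester.add("The Ghost of Christmas Past", 50);
        suggester.add("Ghostbusters", 10);
        System.out.println(suggester.lookup("ghost chr", 5)); // [The Ghost of Christmas Past]
    }
}
```

Because "The" and "of" are dropped at both build time and lookup time, the prefix "ghost chr" matches the analyzed form "ghost christmas past", while the returned suggestion is still the original surface form.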
There are some limitations:
- A lookup from a query like "net" in English won't be any different from "net " (i.e., the user added a trailing space), because analyzers don't reflect whether they have seen a token separator.
- If you're using StopFilter and the user intends to type "fast apple" but has so far typed only "fast a", then, again because the analyzer doesn't convey whether it has seen a token separator after the "a", StopFilter will remove that "a", causing far more matches than you'd expect.
- Lookups with the empty string return no results instead of all results.
-
-
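The first two limitations listed above can be reproduced with a few lines of plain Java. This is a minimal sketch under stated assumptions: tokenize and removeStopWords are hypothetical stand-ins for a whitespace tokenizer and StopFilter, not Lucene calls, and the stop-word list is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Shows why token output alone cannot tell "net" from "net ",
// and why a partially typed "a" can vanish as a stop word.
class SeparatorBlindAnalysis {
    // Hypothetical stop-word list for the example.
    private static final Set<String> STOP_WORDS = Set.of("a", "the", "of");

    // Stand-in for a whitespace tokenizer with lowercasing.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stand-in for StopFilter.
    static List<String> removeStopWords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The trailing space leaves no trace in the token list.
        System.out.println(tokenize("net").equals(tokenize("net "))); // true
        // The user meant "fast apple", but the partial "a" is removed as a stop word.
        System.out.println(removeStopWords(tokenize("fast a"))); // [fast]
    }
}
```

After stop-word removal the query "fast a" is indistinguishable from "fast", so every suggestion starting with "fast" matches, not only those continuing with a word that begins with "a".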
Nested Class Summary
Nested Classes
- static class XAnalyzingSuggester.XBuilder
-
Field Summary
Fields
- static int END_BYTE
  Marks end of the analyzed input and start of dedup byte.
- static int EXACT_FIRST
  Include this flag in the options parameter of the full XAnalyzingSuggester constructor to always return the exact match first, regardless of score.
- static int HOLE_CHARACTER
- static int PAYLOAD_SEP
- static int PRESERVE_SEP
  Include this flag in the options parameter of the full XAnalyzingSuggester constructor to preserve token separators when matching.
- static int SEP_LABEL
  Represents the separation between tokens, if PRESERVE_SEP was specified.
-
Constructor Summary
Constructors
- XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer analyzer)
  Calls the full XAnalyzingSuggester constructor with the defaults of AnalyzingSuggester(analyzer, analyzer, EXACT_FIRST | PRESERVE_SEP, 256, -1).
- XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.analysis.Analyzer queryAnalyzer)
  Calls the full XAnalyzingSuggester constructor with the defaults of AnalyzingSuggester(indexAnalyzer, queryAnalyzer, EXACT_FIRST | PRESERVE_SEP, 256, -1).
- XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.util.automaton.Automaton queryPrefix, org.apache.lucene.analysis.Analyzer queryAnalyzer, int options, int maxSurfaceFormsPerAnalyzedForm, int maxGraphExpansions, boolean preservePositionIncrements, org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>> fst, boolean hasPayloads, int maxAnalyzedPathsForOneInput, int sepLabel, int payloadSep, int endByte, int holeCharacter)
  Creates a new suggester.
-
Method Summary
All Methods (static, instance, concrete)
- void build(org.apache.lucene.search.suggest.InputIterator iterator)
- protected org.apache.lucene.util.automaton.Automaton convertAutomaton(org.apache.lucene.util.automaton.Automaton a)
- static int decodeWeight(long encoded)
  cost -> weight
- static int encodeWeight(long value)
  weight -> cost
- java.lang.Object get(java.lang.CharSequence key)
  Returns the weight associated with an input string, or null if it does not exist.
- long getCount()
- protected java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>>> getFullPrefixPaths(java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>>> prefixPaths, org.apache.lucene.util.automaton.Automaton lookupAutomaton, org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>> fst)
  Returns all completion paths to initialize the search.
- int getMaxAnalyzedPathsForOneInput()
- protected static org.apache.lucene.store.FSDirectory getTempDir()
- org.apache.lucene.analysis.TokenStreamToAutomaton getTokenStreamToAutomaton()
- boolean load(java.io.InputStream input)
- boolean load(org.apache.lucene.store.DataInput input)
- java.util.List<org.apache.lucene.search.suggest.Lookup.LookupResult> lookup(java.lang.CharSequence key, java.util.Set<org.apache.lucene.util.BytesRef> contexts, boolean onlyMorePopular, int num)
- long ramBytesUsed()
  Returns byte size of the underlying FST.
- boolean store(java.io.OutputStream output)
- boolean store(org.apache.lucene.store.DataOutput output)
- java.util.Set<org.apache.lucene.util.IntsRef> toFiniteStrings(org.apache.lucene.analysis.TokenStream stream)
-
-
-
Field Detail
-
EXACT_FIRST
public static final int EXACT_FIRST
Include this flag in the options parameter of the full XAnalyzingSuggester constructor to always return the exact match first, regardless of score. This has no performance impact but could result in low-quality suggestions.
- See Also: Constant Field Values
-
PRESERVE_SEP
public static final int PRESERVE_SEP
Include this flag in the options parameter of the full XAnalyzingSuggester constructor to preserve token separators when matching.
- See Also: Constant Field Values
-
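The two option flags above are combined with bitwise OR, as in the EXACT_FIRST | PRESERVE_SEP default used by the shorter constructors. The sketch below shows only the bitmask mechanics. The numeric values are assumptions chosen for illustration (this page defers the real ones to Constant Field Values), and isSet is a hypothetical helper, not a method of this class.

```java
// Bitmask mechanics for an int options parameter.
class SuggesterOptionsSketch {
    // Assumed values for illustration only; see Constant Field Values for the real ones.
    static final int EXACT_FIRST = 1;
    static final int PRESERVE_SEP = 2;

    // Hypothetical helper: true if the given flag bit is present in options.
    static boolean isSet(int options, int flag) {
        return (options & flag) != 0;
    }

    public static void main(String[] args) {
        int defaults = EXACT_FIRST | PRESERVE_SEP; // both flags enabled
        int sepOnly = PRESERVE_SEP;                // exact-first disabled

        System.out.println(isSet(defaults, EXACT_FIRST)); // true
        System.out.println(isSet(sepOnly, EXACT_FIRST));  // false
        System.out.println(isSet(sepOnly, PRESERVE_SEP)); // true
    }
}
```

Passing 0 as options would, under the same reading, disable both behaviors.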
SEP_LABEL
public static final int SEP_LABEL
Represents the separation between tokens, if PRESERVE_SEP was specified.
- See Also: Constant Field Values
-
END_BYTE
public static final int END_BYTE
Marks end of the analyzed input and start of dedup byte.
- See Also: Constant Field Values
-
PAYLOAD_SEP
public static final int PAYLOAD_SEP
- See Also: Constant Field Values
-
HOLE_CHARACTER
public static final int HOLE_CHARACTER
- See Also: Constant Field Values
-
-
Constructor Detail
-
XAnalyzingSuggester
public XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer analyzer)
Calls the full XAnalyzingSuggester constructor with the defaults of AnalyzingSuggester(analyzer, analyzer, EXACT_FIRST | PRESERVE_SEP, 256, -1): the same analyzer for indexing and querying, options EXACT_FIRST | PRESERVE_SEP, at most 256 surface forms per analyzed form, and no limit (-1) on graph expansions.
- Parameters:
  analyzer - Analyzer that will be used for analyzing suggestions while building the index.
-
XAnalyzingSuggester
public XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.analysis.Analyzer queryAnalyzer)
Calls the full XAnalyzingSuggester constructor with the defaults of AnalyzingSuggester(indexAnalyzer, queryAnalyzer, EXACT_FIRST | PRESERVE_SEP, 256, -1): options EXACT_FIRST | PRESERVE_SEP, at most 256 surface forms per analyzed form, and no limit (-1) on graph expansions.
- Parameters:
  indexAnalyzer - Analyzer that will be used for analyzing suggestions while building the index.
  queryAnalyzer - Analyzer that will be used for analyzing query text during lookup.
-
XAnalyzingSuggester
public XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.util.automaton.Automaton queryPrefix, org.apache.lucene.analysis.Analyzer queryAnalyzer, int options, int maxSurfaceFormsPerAnalyzedForm, int maxGraphExpansions, boolean preservePositionIncrements, org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>> fst, boolean hasPayloads, int maxAnalyzedPathsForOneInput, int sepLabel, int payloadSep, int endByte, int holeCharacter)
Creates a new suggester.
- Parameters:
  indexAnalyzer - Analyzer that will be used for analyzing suggestions while building the index.
  queryAnalyzer - Analyzer that will be used for analyzing query text during lookup.
  options - see EXACT_FIRST, PRESERVE_SEP
  maxSurfaceFormsPerAnalyzedForm - Maximum number of surface forms to keep for a single analyzed form. When there are too many surface forms, the lowest-weighted ones are discarded.
  maxGraphExpansions - Maximum number of graph paths to expand from the analyzed form. Set this to -1 for no limit.
-
-
Method Detail
-
ramBytesUsed
public long ramBytesUsed()
Returns byte size of the underlying FST.
-
getMaxAnalyzedPathsForOneInput
public int getMaxAnalyzedPathsForOneInput()
-
convertAutomaton
protected org.apache.lucene.util.automaton.Automaton convertAutomaton(org.apache.lucene.util.automaton.Automaton a)
-
getTokenStreamToAutomaton
public org.apache.lucene.analysis.TokenStreamToAutomaton getTokenStreamToAutomaton()
-
getTempDir
protected static org.apache.lucene.store.FSDirectory getTempDir()
-
build
public void build(org.apache.lucene.search.suggest.InputIterator iterator) throws java.io.IOException
- Specified by: build in class org.apache.lucene.search.suggest.Lookup
- Throws: java.io.IOException
-
store
public boolean store(java.io.OutputStream output) throws java.io.IOException
- Overrides: store in class org.apache.lucene.search.suggest.Lookup
- Throws: java.io.IOException
-
getCount
public long getCount()
- Specified by: getCount in class org.apache.lucene.search.suggest.Lookup
-
load
public boolean load(java.io.InputStream input) throws java.io.IOException
- Overrides: load in class org.apache.lucene.search.suggest.Lookup
- Throws: java.io.IOException
-
lookup
public java.util.List<org.apache.lucene.search.suggest.Lookup.LookupResult> lookup(java.lang.CharSequence key, java.util.Set<org.apache.lucene.util.BytesRef> contexts, boolean onlyMorePopular, int num)
- Specified by: lookup in class org.apache.lucene.search.suggest.Lookup
-
store
public boolean store(org.apache.lucene.store.DataOutput output) throws java.io.IOException
- Specified by: store in class org.apache.lucene.search.suggest.Lookup
- Throws: java.io.IOException
-
load
public boolean load(org.apache.lucene.store.DataInput input) throws java.io.IOException
- Specified by: load in class org.apache.lucene.search.suggest.Lookup
- Throws: java.io.IOException
-
getFullPrefixPaths
protected java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>>> getFullPrefixPaths(java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>>> prefixPaths, org.apache.lucene.util.automaton.Automaton lookupAutomaton, org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>> fst) throws java.io.IOException
Returns all completion paths to initialize the search.
- Throws: java.io.IOException
-
toFiniteStrings
public java.util.Set<org.apache.lucene.util.IntsRef> toFiniteStrings(org.apache.lucene.analysis.TokenStream stream) throws java.io.IOException
- Throws: java.io.IOException
-
get
public java.lang.Object get(java.lang.CharSequence key)
Returns the weight associated with an input string, or null if it does not exist. Unsupported in this implementation (it will throw an UnsupportedOperationException).
- Parameters:
  key - input string
- Returns: the weight associated with the input string, or null if it does not exist.
-
decodeWeight
public static int decodeWeight(long encoded)
cost -> weight
- Parameters:
  encoded - Cost
- Returns: Weight
-
encodeWeight
public static int encodeWeight(long value)
weight -> cost
- Parameters:
  value - Weight
- Returns: Cost
-
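The page does not state how decodeWeight and encodeWeight convert between weight and cost. A plausible reading, sketched below as an assumption rather than the documented behavior, is the inversion cost = Integer.MAX_VALUE - weight: a search that prefers the lowest cost then returns the highest-weighted suggestions first. The class name, the range check, and the exception type are all illustrative choices.

```java
// Sketch of a weight <-> cost inversion; the formula is an assumption.
class WeightCostSketch {
    // weight -> cost: larger weights become smaller costs.
    static int encodeWeight(long value) {
        if (value < 0 || value > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("cannot encode value: " + value);
        }
        return Integer.MAX_VALUE - (int) value;
    }

    // cost -> weight: the inverse of encodeWeight.
    static int decodeWeight(long encoded) {
        return (int) (Integer.MAX_VALUE - encoded);
    }

    public static void main(String[] args) {
        int cost = encodeWeight(42);
        System.out.println(decodeWeight(cost)); // 42
        // A heavier suggestion gets a lower cost.
        System.out.println(encodeWeight(100) < encodeWeight(1)); // true
    }
}
```

Under this scheme the round trip decodeWeight(encodeWeight(w)) returns w for every weight in the range 0 to Integer.MAX_VALUE.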
-