public class XAnalyzingSuggester
extends org.apache.lucene.search.suggest.Lookup
This can result in powerful suggester functionality. For example, if you use an analyzer that removes stop words, then the partial text "ghost chr..." could see the suggestion "The Ghost of Christmas Past". Note that position increments MUST NOT be preserved for this example to work, so you should call the constructor with the preservePositionIncrements parameter set to false.
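The stop-word example can be illustrated with a toy model. The sketch below is not Lucene's analysis chain or FST lookup: the class name, the stop-word list, and the "_" placeholder that stands in for a preserved position increment are all assumptions made for illustration. It only shows why "ghost chr" can prefix-match the analyzed form of "The Ghost of Christmas Past" when removed stop words leave no hole behind.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: NOT Lucene's analyzer or suggester implementation.
class StopWordPrefixSketch {
    // Hypothetical stop-word list chosen for this example.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "of", "a"));

    // Lowercases, splits on whitespace, and drops stop words.
    // When preserveHoles is true, each removed stop word leaves a "_" placeholder,
    // loosely imitating preserved position increments.
    static String analyze(String text, boolean preserveHoles) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            if (STOP_WORDS.contains(raw)) {
                if (preserveHoles) {
                    tokens.add("_");
                }
            } else {
                tokens.add(raw);
            }
        }
        return String.join(" ", tokens);
    }

    // A suggestion matches when its analyzed form starts with the analyzed partial text.
    static boolean matches(String partial, String suggestion, boolean preserveHoles) {
        return analyze(suggestion, preserveHoles).startsWith(analyze(partial, preserveHoles));
    }

    public static void main(String[] args) {
        String suggestion = "The Ghost of Christmas Past";
        System.out.println(matches("ghost chr", suggestion, false));
        System.out.println(matches("ghost chr", suggestion, true));
    }
}
```

In this model the hole left by "The" at the start of the suggestion is what prevents the match when holes are preserved, which mirrors the reason given above for setting preservePositionIncrements to false.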
If SynonymFilter is used to map wifi and wireless network to hotspot then the partial text "wirele..." could suggest "wifi router". Token normalization like stemmers, accent removal, etc., would allow suggestions to ignore such variations.
When two matching suggestions have the same weight, they are tie-broken by the analyzed form. If their analyzed form is the same then the order is undefined.
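The tie-breaking rule can be sketched as a comparator. This is an illustration under assumptions: the Candidate type and its fields are hypothetical, and analyzed forms are compared as plain strings here, whereas the suggester works on the analyzed bytes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative ordering only; Candidate is a hypothetical type, not part of the API.
class TieBreakSketch {
    static final class Candidate {
        final String surface;   // text shown to the user
        final String analyzed;  // analyzed form used for matching
        final long weight;

        Candidate(String surface, String analyzed, long weight) {
            this.surface = surface;
            this.analyzed = analyzed;
            this.weight = weight;
        }
    }

    // Higher weight first; equal weights fall back to the analyzed form.
    // Candidates with equal weight AND equal analyzed form compare as 0,
    // so their relative order is not defined by this rule.
    static final Comparator<Candidate> ORDER = (a, b) -> {
        int byWeight = Long.compare(b.weight, a.weight);
        return byWeight != 0 ? byWeight : a.analyzed.compareTo(b.analyzed);
    };

    // Returns the surface forms in ranked order without modifying the input list.
    static List<String> rank(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(ORDER);
        List<String> surfaces = new ArrayList<>();
        for (Candidate c : sorted) {
            surfaces.add(c.surface);
        }
        return surfaces;
    }
}
```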
There are some limitations:
If you're using StopFilter, and the user intends to type "fast apple" but so far has typed only "fast a", StopFilter will remove that "a", because the analyzer doesn't convey whether it has seen a token separator after the "a". This causes far more matches than you'd expect.
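The "fast a" limitation can be reproduced with a small simulation. The class, the stop-word list, and the prefix-matching rule below are assumptions for illustration and do not reflect how StopFilter or the suggester are implemented; the point is only that dropping the trailing "a" widens the prefix from "fast a" to "fast".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy simulation of the limitation; NOT the real StopFilter.
class StopFilterLimitationSketch {
    // Hypothetical stop-word list for this example.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "the"));

    // Lowercases and splits on whitespace, optionally dropping stop words.
    static String analyze(String text, boolean removeStopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (raw.isEmpty() || (removeStopWords && STOP_WORDS.contains(raw))) {
                continue;
            }
            tokens.add(raw);
        }
        return String.join(" ", tokens);
    }

    // Returns every suggestion whose analyzed form starts with the analyzed partial text.
    static List<String> lookup(String partial, List<String> suggestions, boolean removeStopWords) {
        String prefix = analyze(partial, removeStopWords);
        List<String> hits = new ArrayList<>();
        for (String s : suggestions) {
            if (analyze(s, removeStopWords).startsWith(prefix)) {
                hits.add(s);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("fast apple", "fast car", "slow apple");
        // The unfinished "a" is treated as a stop word, so the prefix becomes just "fast".
        System.out.println(lookup("fast a", titles, true));
        // Without stop-word removal the prefix stays "fast a".
        System.out.println(lookup("fast a", titles, false));
    }
}
```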
Modifier and Type | Class and Description |
---|---|
static class | XAnalyzingSuggester.XBuilder |
Modifier and Type | Field and Description |
---|---|
static int | END_BYTE: Marks end of the analyzed input and start of dedup byte. |
static int | EXACT_FIRST: Include this flag in the options parameter to #XAnalyzingSuggester(Analyzer,Analyzer,int,int,int,boolean,FST,boolean,int,int,int,int,int) to always return the exact match first, regardless of score. |
static int | HOLE_CHARACTER |
static int | PAYLOAD_SEP |
static int | PRESERVE_SEP: Include this flag in the options parameter to #XAnalyzingSuggester(Analyzer,Analyzer,int,int,int,boolean,FST,boolean,int,int,int,int,int) to preserve token separators when matching. |
static int | SEP_LABEL: Represents the separation between tokens, if PRESERVE_SEP was specified. |
Constructor and Description |
---|
XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer analyzer): Calls #XAnalyzingSuggester(Analyzer,Analyzer,int,int,int,boolean,FST,boolean,int,int,int,int,int): AnalyzingSuggester(analyzer, analyzer, EXACT_FIRST \| PRESERVE_SEP, 256, -1) |
XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.analysis.Analyzer queryAnalyzer): Calls #XAnalyzingSuggester(Analyzer,Analyzer,int,int,int,boolean,FST,boolean,int,int,int,int,int): AnalyzingSuggester(indexAnalyzer, queryAnalyzer, EXACT_FIRST \| PRESERVE_SEP, 256, -1) |
XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.util.automaton.Automaton queryPrefix, org.apache.lucene.analysis.Analyzer queryAnalyzer, int options, int maxSurfaceFormsPerAnalyzedForm, int maxGraphExpansions, boolean preservePositionIncrements, org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>> fst, boolean hasPayloads, int maxAnalyzedPathsForOneInput, int sepLabel, int payloadSep, int endByte, int holeCharacter): Creates a new suggester. |
Modifier and Type | Method and Description |
---|---|
void | build(org.apache.lucene.search.suggest.InputIterator iterator) |
protected org.apache.lucene.util.automaton.Automaton | convertAutomaton(org.apache.lucene.util.automaton.Automaton a) |
static int | decodeWeight(long encoded): cost -> weight |
static int | encodeWeight(long value): weight -> cost |
java.lang.Object | get(java.lang.CharSequence key): Returns the weight associated with an input string, or null if it does not exist. |
long | getCount() |
protected java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>>> | getFullPrefixPaths(java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>>> prefixPaths, org.apache.lucene.util.automaton.Automaton lookupAutomaton, org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>> fst): Returns all completion paths to initialize the search. |
int | getMaxAnalyzedPathsForOneInput() |
protected static org.apache.lucene.store.FSDirectory | getTempDir() |
org.apache.lucene.analysis.TokenStreamToAutomaton | getTokenStreamToAutomaton() |
boolean | load(org.apache.lucene.store.DataInput input) |
boolean | load(java.io.InputStream input) |
java.util.List<org.apache.lucene.search.suggest.Lookup.LookupResult> | lookup(java.lang.CharSequence key, java.util.Set<org.apache.lucene.util.BytesRef> contexts, boolean onlyMorePopular, int num) |
long | ramBytesUsed(): Returns byte size of the underlying FST. |
boolean | store(org.apache.lucene.store.DataOutput output) |
boolean | store(java.io.OutputStream output) |
java.util.Set<org.apache.lucene.util.IntsRef> | toFiniteStrings(org.apache.lucene.analysis.TokenStream stream) |
public static final int EXACT_FIRST
Include this flag in the options parameter to #XAnalyzingSuggester(Analyzer,Analyzer,int,int,int,boolean,FST,boolean,int,int,int,int,int) to always return the exact match first, regardless of score. This has no performance impact but could result in low-quality suggestions.
public static final int PRESERVE_SEP
Include this flag in the options parameter to #XAnalyzingSuggester(Analyzer,Analyzer,int,int,int,boolean,FST,boolean,int,int,int,int,int) to preserve token separators when matching.
public static final int SEP_LABEL
Represents the separation between tokens, if PRESERVE_SEP was specified.
public static final int END_BYTE
Marks end of the analyzed input and start of dedup byte.
public static final int PAYLOAD_SEP
public static final int HOLE_CHARACTER
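Because EXACT_FIRST and PRESERVE_SEP are combined with a bitwise OR in the options parameter, they behave as bit flags. The sketch below shows how such flags are combined and tested. The numeric values 1 and 2 and the helper isSet are assumptions for illustration; read the actual values from the constants on XAnalyzingSuggester.

```java
// Bit-flag sketch; the constant VALUES here are assumed, not taken from this page.
class OptionFlagsSketch {
    static final int EXACT_FIRST = 1;   // assumed value
    static final int PRESERVE_SEP = 2;  // assumed value

    // Hypothetical helper: true when the given flag bit is present in options.
    static boolean isSet(int options, int flag) {
        return (options & flag) != 0;
    }

    public static void main(String[] args) {
        // Same combination the convenience constructors pass: EXACT_FIRST | PRESERVE_SEP.
        int options = EXACT_FIRST | PRESERVE_SEP;
        System.out.println(isSet(options, EXACT_FIRST));
        System.out.println(isSet(options, PRESERVE_SEP));
        // Passing only one flag leaves the other behavior disabled.
        System.out.println(isSet(EXACT_FIRST, PRESERVE_SEP));
    }
}
```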
public XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer analyzer)
Calls #XAnalyzingSuggester(Analyzer,Analyzer,int,int,int,boolean,FST,boolean,int,int,int,int,int): AnalyzingSuggester(analyzer, analyzer, EXACT_FIRST | PRESERVE_SEP, 256, -1)
analyzer - Analyzer that will be used for analyzing suggestions while building the index.
public XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.analysis.Analyzer queryAnalyzer)
Calls #XAnalyzingSuggester(Analyzer,Analyzer,int,int,int,boolean,FST,boolean,int,int,int,int,int): AnalyzingSuggester(indexAnalyzer, queryAnalyzer, EXACT_FIRST | PRESERVE_SEP, 256, -1)
indexAnalyzer - Analyzer that will be used for analyzing suggestions while building the index.
queryAnalyzer - Analyzer that will be used for analyzing query text during lookup.
public XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.util.automaton.Automaton queryPrefix, org.apache.lucene.analysis.Analyzer queryAnalyzer, int options, int maxSurfaceFormsPerAnalyzedForm, int maxGraphExpansions, boolean preservePositionIncrements, org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>> fst, boolean hasPayloads, int maxAnalyzedPathsForOneInput, int sepLabel, int payloadSep, int endByte, int holeCharacter)
Creates a new suggester.
indexAnalyzer - Analyzer that will be used for analyzing suggestions while building the index.
queryAnalyzer - Analyzer that will be used for analyzing query text during lookup.
options - see EXACT_FIRST, PRESERVE_SEP
maxSurfaceFormsPerAnalyzedForm - Maximum number of surface forms to keep for a single analyzed form. When there are too many surface forms we discard the lowest weighted ones.
maxGraphExpansions - Maximum number of graph paths to expand from the analyzed form. Set this to -1 for no limit.
public long ramBytesUsed()
Returns byte size of the underlying FST.
public int getMaxAnalyzedPathsForOneInput()
protected org.apache.lucene.util.automaton.Automaton convertAutomaton(org.apache.lucene.util.automaton.Automaton a)
public org.apache.lucene.analysis.TokenStreamToAutomaton getTokenStreamToAutomaton()
protected static org.apache.lucene.store.FSDirectory getTempDir()
public void build(org.apache.lucene.search.suggest.InputIterator iterator) throws java.io.IOException
build in class org.apache.lucene.search.suggest.Lookup
Throws: java.io.IOException
public boolean store(java.io.OutputStream output) throws java.io.IOException
store in class org.apache.lucene.search.suggest.Lookup
Throws: java.io.IOException
public long getCount()
getCount in class org.apache.lucene.search.suggest.Lookup
public boolean load(java.io.InputStream input) throws java.io.IOException
load in class org.apache.lucene.search.suggest.Lookup
Throws: java.io.IOException
public java.util.List<org.apache.lucene.search.suggest.Lookup.LookupResult> lookup(java.lang.CharSequence key, java.util.Set<org.apache.lucene.util.BytesRef> contexts, boolean onlyMorePopular, int num)
lookup in class org.apache.lucene.search.suggest.Lookup
public boolean store(org.apache.lucene.store.DataOutput output) throws java.io.IOException
store in class org.apache.lucene.search.suggest.Lookup
Throws: java.io.IOException
public boolean load(org.apache.lucene.store.DataInput input) throws java.io.IOException
load in class org.apache.lucene.search.suggest.Lookup
Throws: java.io.IOException
protected java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>>> getFullPrefixPaths(java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>>> prefixPaths, org.apache.lucene.util.automaton.Automaton lookupAutomaton, org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>> fst) throws java.io.IOException
Returns all completion paths to initialize the search.
Throws: java.io.IOException
public java.util.Set<org.apache.lucene.util.IntsRef> toFiniteStrings(org.apache.lucene.analysis.TokenStream stream) throws java.io.IOException
Throws: java.io.IOException
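public java.lang.Object get(java.lang.CharSequence key)
Returns the weight associated with an input string, or null if it does not exist. Unsupported in this implementation (throws UnsupportedOperationException).
key - input string
Returns: the weight associated with the input string, or null if it does not exist.
public static int decodeWeight(long encoded)
cost -> weight
encoded - Cost
public static int encodeWeight(long value)
weight -> cost
value - Weight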
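The page only states the direction of the two conversions (weight -> cost and cost -> weight). A minimal sketch of one way such a pair can work is shown below, assuming the cost is Integer.MAX_VALUE minus the weight so that a higher weight becomes a lower cost. That formula, the range check, and the exception type are assumptions for illustration, not a statement of what XAnalyzingSuggester does.

```java
// Sketch of an invertible weight <-> cost mapping; the formula is an assumption.
class WeightCostSketch {
    // weight -> cost: higher weights map to lower costs.
    static int encodeWeight(long value) {
        if (value < 0 || value > Integer.MAX_VALUE) {
            // Assumed behavior: reject weights outside the non-negative int range.
            throw new IllegalArgumentException("cannot encode value: " + value);
        }
        return Integer.MAX_VALUE - (int) value;
    }

    // cost -> weight: inverse of encodeWeight for costs it produced.
    static int decodeWeight(long encoded) {
        return (int) (Integer.MAX_VALUE - encoded);
    }

    public static void main(String[] args) {
        int cost = encodeWeight(42);
        System.out.println(decodeWeight(cost)); // round-trips back to 42
    }
}
```

Inverting the weight this way lets a structure that searches for the lowest-cost paths return the highest-weighted suggestions first.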