XAnalyzingSuggester (core 5.4.0 API)

java.lang.Object
- org.apache.lucene.search.suggest.Lookup
- - org.apache.lucene.search.suggest.analyzing.XAnalyzingSuggester

All Implemented Interfaces:

org.apache.lucene.util.Accountable

Direct Known Subclasses:

XFuzzySuggester
```
public class XAnalyzingSuggester
extends org.apache.lucene.search.suggest.Lookup
```
Suggester that first analyzes the surface form, adds the analyzed form to a weighted FST, and then does the same thing at lookup time. This means lookup is based on the analyzed form while suggestions are still the surface form(s).
This can result in powerful suggester functionality. For example, if you use an analyzer that removes stop words, then the partial text "ghost chr..." could produce the suggestion "The Ghost of Christmas Past". Note that position increments MUST NOT be preserved for this example to work, so call the constructor with the preservePositionIncrements parameter set to false.
If SynonymFilter is used to map wifi and wireless network to hotspot then the partial text "wirele..." could suggest "wifi router". Token normalization like stemmers, accent removal, etc., would allow suggestions to ignore such variations.
When two matching suggestions have the same weight, they are tie-broken by the analyzed form. If their analyzed form is the same then the order is undefined.
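The behavior described above can be illustrated with a small, self-contained model. This is a sketch only, not the FST-based implementation: `ToyAnalyzingSuggester`, its `analyze` method, and the stop-word list are hypothetical stand-ins for a real Analyzer and the weighted FST. It stores each surface form under its analyzed form, matches the analyzed query as a prefix, and orders hits by weight, breaking ties on the analyzed form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Toy model only: a plain list stands in for the weighted FST, and
// analyze() stands in for an Analyzer (lowercasing plus stop-word removal).
final class ToyAnalyzingSuggester {
    private static final Set<String> STOP_WORDS = Set.of("the", "of", "a");

    private static final class Entry {
        final String analyzed;
        final String surface;
        final long weight;

        Entry(String analyzed, String surface, long weight) {
            this.analyzed = analyzed;
            this.surface = surface;
            this.weight = weight;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Hypothetical analyzer: lowercase, split on whitespace, drop stop words.
    // Removed words leave no gap, i.e. position increments are not preserved.
    static String analyze(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).trim().split("\\s+"))
                .filter(t -> !t.isEmpty() && !STOP_WORDS.contains(t))
                .collect(Collectors.joining(" "));
    }

    void add(String surface, long weight) {
        entries.add(new Entry(analyze(surface), surface, weight));
    }

    // Matches on the analyzed form but returns the original surface forms.
    List<String> lookup(String query, int num) {
        String prefix = analyze(query);
        if (prefix.isEmpty()) {
            // Simplification: a query that analyzes to nothing returns no results.
            return List.of();
        }
        return entries.stream()
                .filter(e -> e.analyzed.startsWith(prefix))
                .sorted(Comparator.comparingLong((Entry e) -> e.weight).reversed()
                        .thenComparing(e -> e.analyzed))
                .limit(num)
                .map(e -> e.surface)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ToyAnalyzingSuggester s = new ToyAnalyzingSuggester();
        s.add("The Ghost of Christmas Past", 50);
        s.add("Ghost Town", 20);
        System.out.println(s.lookup("ghost chr", 5)); // [The Ghost of Christmas Past]
    }
}
```

Because the stop words "the" and "of" are dropped on both the indexing and the lookup side, the query "ghost chr" is a prefix of the analyzed form "ghost christmas past" even though it is not a prefix of the surface form.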
There are some limitations:
- A lookup from a query like "net" in English won't be any different than "net " (i.e., the user added a trailing space) because analyzers don't reflect when they've seen a token separator and when they haven't.
- If you're using StopFilter and the user intends to type "fast apple" but has so far typed only "fast a", StopFilter will remove that "a", again because the analyzer doesn't convey whether it has seen a token separator after the "a". This causes far more matches than you'd expect.
- Lookups with the empty string return no results instead of all results.
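The first two limitations come from the analyzer's output carrying tokens only. A minimal sketch, assuming a hypothetical whitespace tokenizer with stop-word removal (`SeparatorBlindAnalyzer` is not part of this API), shows why the suggester cannot tell the inputs apart:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical analyzer: it returns only tokens, so a trailing separator
// in the input is invisible to whoever consumes the result.
final class SeparatorBlindAnalyzer {
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "the");

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "net" and "net " produce identical token lists.
        System.out.println(analyze("net").equals(analyze("net "))); // true
        // The unfinished word "a" is treated as a stop word and removed.
        System.out.println(analyze("fast a"));     // [fast]
        System.out.println(analyze("fast apple")); // [fast, apple]
    }
}
```

With "fast a" reduced to the single token "fast", every suggestion starting with "fast" matches, not just those whose second word starts with "a".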

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`XAnalyzingSuggester.XBuilder`

Nested classes/interfaces inherited from class org.apache.lucene.search.suggest.Lookup
org.apache.lucene.search.suggest.Lookup.LookupPriorityQueue, org.apache.lucene.search.suggest.Lookup.LookupResult

Field Summary

Fields
Modifier and Type	Field and Description
`static int`	`END_BYTE` Marks end of the analyzed input and start of dedup byte.
`static int`	`EXACT_FIRST` Include this flag in the options parameter of the XAnalyzingSuggester constructor to always return the exact match first, regardless of score.
`static int`	`HOLE_CHARACTER`
`static int`	`PAYLOAD_SEP`
`static int`	`PRESERVE_SEP` Include this flag in the options parameter of the XAnalyzingSuggester constructor to preserve token separators when matching.
`static int`	`SEP_LABEL` Represents the separation between tokens, if PRESERVE_SEP was specified.

Fields inherited from class org.apache.lucene.search.suggest.Lookup
CHARSEQUENCE_COMPARATOR

Constructor Summary

Constructors
Constructor and Description
`XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer analyzer)` Calls the full constructor with `analyzer` as both the index and query analyzer, `options` set to `EXACT_FIRST | PRESERVE_SEP`, `maxSurfaceFormsPerAnalyzedForm` set to 256, and `maxGraphExpansions` set to -1.
`XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.analysis.Analyzer queryAnalyzer)` Calls the full constructor with `indexAnalyzer`, `queryAnalyzer`, `options` set to `EXACT_FIRST | PRESERVE_SEP`, `maxSurfaceFormsPerAnalyzedForm` set to 256, and `maxGraphExpansions` set to -1.
`XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.util.automaton.Automaton queryPrefix, org.apache.lucene.analysis.Analyzer queryAnalyzer, int options, int maxSurfaceFormsPerAnalyzedForm, int maxGraphExpansions, boolean preservePositionIncrements, org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>> fst, boolean hasPayloads, int maxAnalyzedPathsForOneInput, int sepLabel, int payloadSep, int endByte, int holeCharacter)` Creates a new suggester.

Method Summary

Modifier and Type	Method and Description
`void`	`build(org.apache.lucene.search.suggest.InputIterator iterator)`
`protected org.apache.lucene.util.automaton.Automaton`	`convertAutomaton(org.apache.lucene.util.automaton.Automaton a)`
`static int`	`decodeWeight(long encoded)` cost -> weight
`static int`	`encodeWeight(long value)` weight -> cost
`java.lang.Object`	`get(java.lang.CharSequence key)` Returns the weight associated with an input string, or null if it does not exist.
`long`	`getCount()`
`protected java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>>>`	`getFullPrefixPaths(java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>>> prefixPaths, org.apache.lucene.util.automaton.Automaton lookupAutomaton, org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>> fst)` Returns all completion paths to initialize the search.
`int`	`getMaxAnalyzedPathsForOneInput()`
`protected static org.apache.lucene.store.FSDirectory`	`getTempDir()`
`org.apache.lucene.analysis.TokenStreamToAutomaton`	`getTokenStreamToAutomaton()`
`boolean`	`load(org.apache.lucene.store.DataInput input)`
`boolean`	`load(java.io.InputStream input)`
`java.util.List<org.apache.lucene.search.suggest.Lookup.LookupResult>`	`lookup(java.lang.CharSequence key, java.util.Set<org.apache.lucene.util.BytesRef> contexts, boolean onlyMorePopular, int num)`
`long`	`ramBytesUsed()` Returns byte size of the underlying FST.
`boolean`	`store(org.apache.lucene.store.DataOutput output)`
`boolean`	`store(java.io.OutputStream output)`
`java.util.Set<org.apache.lucene.util.IntsRef>`	`toFiniteStrings(org.apache.lucene.analysis.TokenStream stream)`

Methods inherited from class org.apache.lucene.search.suggest.Lookup
build, lookup, lookup

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.lucene.util.Accountable
getChildResources

Field Detail
- EXACT_FIRST
```
public static final int EXACT_FIRST
```
  Include this flag in the options parameter of the XAnalyzingSuggester constructor to always return the exact match first, regardless of score. This has no performance impact but could result in low-quality suggestions.
  
  See Also:
  
  Constant Field Values
- PRESERVE_SEP
```
public static final int PRESERVE_SEP
```
  Include this flag in the options parameter of the XAnalyzingSuggester constructor to preserve token separators when matching.
  
  See Also:
  
  Constant Field Values
- SEP_LABEL
```
public static final int SEP_LABEL
```
  Represents the separation between tokens, if PRESERVE_SEP was specified.
  
  See Also:
  
  Constant Field Values
- END_BYTE
```
public static final int END_BYTE
```
  Marks end of the analyzed input and start of dedup byte.
  
  See Also:
  
  Constant Field Values
- PAYLOAD_SEP
```
public static final int PAYLOAD_SEP
```
  See Also:
  
  Constant Field Values
- HOLE_CHARACTER
```
public static final int HOLE_CHARACTER
```
  See Also:
  
  Constant Field Values
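EXACT_FIRST and PRESERVE_SEP are combined into a single `options` value with bitwise OR. The sketch below shows how such a bitmask is typically built and tested. The class `SuggesterOptions`, its helper methods, and the numeric values 1 and 2 are assumptions made for illustration; the actual values are the ones listed under Constant Field Values.

```java
// Illustrative flag handling. The constant values are assumed here and are
// not taken from this page; only the OR/AND pattern is the point.
final class SuggesterOptions {
    static final int EXACT_FIRST = 1;  // assumed value
    static final int PRESERVE_SEP = 2; // assumed value

    // True if the exact match should always be returned first.
    static boolean exactFirst(int options) {
        return (options & EXACT_FIRST) != 0;
    }

    // True if token separators should be preserved when matching.
    static boolean preserveSep(int options) {
        return (options & PRESERVE_SEP) != 0;
    }

    public static void main(String[] args) {
        int defaults = EXACT_FIRST | PRESERVE_SEP; // combination used by the short constructors
        System.out.println(exactFirst(defaults));  // true
        System.out.println(preserveSep(defaults)); // true
        System.out.println(preserveSep(EXACT_FIRST)); // false
    }
}
```

Passing 0 as `options` would disable both behaviors under this scheme.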

Constructor Detail
- XAnalyzingSuggester
```
public XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer analyzer)
```
  Calls the full constructor with `analyzer` as both the index and query analyzer, `options` set to `EXACT_FIRST | PRESERVE_SEP`, `maxSurfaceFormsPerAnalyzedForm` set to 256, and `maxGraphExpansions` set to -1.
  
  Parameters:
  
  analyzer - Analyzer that will be used for analyzing suggestions while building the index.
- XAnalyzingSuggester
```
public XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer,
                           org.apache.lucene.analysis.Analyzer queryAnalyzer)
```
  Calls the full constructor with `indexAnalyzer`, `queryAnalyzer`, `options` set to `EXACT_FIRST | PRESERVE_SEP`, `maxSurfaceFormsPerAnalyzedForm` set to 256, and `maxGraphExpansions` set to -1.
  
  Parameters:
  
  indexAnalyzer - Analyzer that will be used for analyzing suggestions while building the index.
  
  queryAnalyzer - Analyzer that will be used for analyzing query text during lookup
- XAnalyzingSuggester
```
public XAnalyzingSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer,
                           org.apache.lucene.util.automaton.Automaton queryPrefix,
                           org.apache.lucene.analysis.Analyzer queryAnalyzer,
                           int options,
                           int maxSurfaceFormsPerAnalyzedForm,
                           int maxGraphExpansions,
                           boolean preservePositionIncrements,
                           org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>> fst,
                           boolean hasPayloads,
                           int maxAnalyzedPathsForOneInput,
                           int sepLabel,
                           int payloadSep,
                           int endByte,
                           int holeCharacter)
```
  Creates a new suggester.
  
  Parameters:
  
  indexAnalyzer - Analyzer that will be used for analyzing suggestions while building the index.
  
  queryAnalyzer - Analyzer that will be used for analyzing query text during lookup
  
  options - see EXACT_FIRST, PRESERVE_SEP
  
  maxSurfaceFormsPerAnalyzedForm - Maximum number of surface forms to keep for a single analyzed form. When there are too many surface forms we discard the lowest weighted ones.
  
  maxGraphExpansions - Maximum number of graph paths to expand from the analyzed form. Set this to -1 for no limit.
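  The effect of maxSurfaceFormsPerAnalyzedForm can be pictured as a per-analyzed-form top-N selection. The following is a minimal sketch of that idea only, not the suggester's build code: `SurfaceFormPruner` and `prune` are hypothetical names, and the alphabetical tie-break is an assumption added to make the output deterministic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of "keep the highest-weighted surface forms, discard the rest"
// for the surface forms that share one analyzed form.
final class SurfaceFormPruner {
    static List<String> prune(Map<String, Long> weightedSurfaceForms, int maxSurfaceForms) {
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(weightedSurfaceForms.entrySet());
        sorted.sort((a, b) -> {
            int byWeight = Long.compare(b.getValue(), a.getValue()); // highest weight first
            // Assumed tie-break: alphabetical by surface form.
            return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
        });
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < Math.min(maxSurfaceForms, sorted.size()); i++) {
            kept.add(sorted.get(i).getKey());
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Long> forms = new LinkedHashMap<>();
        forms.put("wifi router", 10L);
        forms.put("wifi hotspot", 30L);
        forms.put("wifi adapter", 20L);
        System.out.println(prune(forms, 2)); // [wifi hotspot, wifi adapter]
    }
}
```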

Method Detail

ramBytesUsed
```
public long ramBytesUsed()
```
Returns byte size of the underlying FST.

getMaxAnalyzedPathsForOneInput
```
public int getMaxAnalyzedPathsForOneInput()
```

convertAutomaton
```
protected org.apache.lucene.util.automaton.Automaton convertAutomaton(org.apache.lucene.util.automaton.Automaton a)
```

getTokenStreamToAutomaton
```
public org.apache.lucene.analysis.TokenStreamToAutomaton getTokenStreamToAutomaton()
```

getTempDir
```
protected static org.apache.lucene.store.FSDirectory getTempDir()
```

build
```
public void build(org.apache.lucene.search.suggest.InputIterator iterator)
           throws java.io.IOException
```
Specified by:

build in class org.apache.lucene.search.suggest.Lookup

Throws:

java.io.IOException

store
```
public boolean store(java.io.OutputStream output)
              throws java.io.IOException
```
Overrides:

store in class org.apache.lucene.search.suggest.Lookup

Throws:

java.io.IOException

getCount
```
public long getCount()
```
Specified by:

getCount in class org.apache.lucene.search.suggest.Lookup

load
```
public boolean load(java.io.InputStream input)
             throws java.io.IOException
```
Overrides:

load in class org.apache.lucene.search.suggest.Lookup

Throws:

java.io.IOException

lookup
```
public java.util.List<org.apache.lucene.search.suggest.Lookup.LookupResult> lookup(java.lang.CharSequence key,
                                                                                   java.util.Set<org.apache.lucene.util.BytesRef> contexts,
                                                                                   boolean onlyMorePopular,
                                                                                   int num)
```
Specified by:

lookup in class org.apache.lucene.search.suggest.Lookup

store
```
public boolean store(org.apache.lucene.store.DataOutput output)
              throws java.io.IOException
```
Specified by:

store in class org.apache.lucene.search.suggest.Lookup

Throws:

java.io.IOException

load
```
public boolean load(org.apache.lucene.store.DataInput input)
             throws java.io.IOException
```
Specified by:

load in class org.apache.lucene.search.suggest.Lookup

Throws:

java.io.IOException

getFullPrefixPaths
```
protected java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>>> getFullPrefixPaths(
        java.util.List<org.apache.lucene.search.suggest.analyzing.FSTUtil.Path<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>>> prefixPaths,
        org.apache.lucene.util.automaton.Automaton lookupAutomaton,
        org.apache.lucene.util.fst.FST<org.apache.lucene.util.fst.PairOutputs.Pair<java.lang.Long,org.apache.lucene.util.BytesRef>> fst)
        throws java.io.IOException
```
Returns all completion paths to initialize the search.

Throws:

java.io.IOException

toFiniteStrings
```
public java.util.Set<org.apache.lucene.util.IntsRef> toFiniteStrings(org.apache.lucene.analysis.TokenStream stream)
                                                              throws java.io.IOException
```
Throws:

java.io.IOException

get
```
public java.lang.Object get(java.lang.CharSequence key)
```
Returns the weight associated with an input string, or null if it does not exist. This implementation does not support the operation and throws an UnsupportedOperationException.

Parameters:

key - input string

Returns:

the weight associated with the input string, or null if it does not exist.

decodeWeight
```
public static int decodeWeight(long encoded)
```
Decodes a cost back into the weight it was encoded from (cost -> weight).

Parameters:

encoded - Cost

Returns:

Weight

encodeWeight
```
public static int encodeWeight(long value)
```
Encodes a weight as a cost (weight -> cost).

Parameters:

value - Weight

Returns:

Cost
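encodeWeight and decodeWeight are inverses of each other. This page does not state the formula, so the following is a minimal sketch under an assumption: the cost is `Integer.MAX_VALUE` minus the weight, so that higher weights become lower costs. `WeightCodec` is a hypothetical stand-alone class and does not call the suggester.

```java
// Assumed encoding: cost = Integer.MAX_VALUE - weight. Treat the arithmetic
// as an illustration of an invertible weight/cost mapping, not as the
// documented behavior of XAnalyzingSuggester.
final class WeightCodec {
    // weight -> cost: a higher weight yields a lower cost.
    static int encodeWeight(long value) {
        if (value < 0 || value > Integer.MAX_VALUE) {
            throw new UnsupportedOperationException("cannot encode value: " + value);
        }
        return Integer.MAX_VALUE - (int) value;
    }

    // cost -> weight: reverses encodeWeight.
    static int decodeWeight(long encoded) {
        return (int) (Integer.MAX_VALUE - encoded);
    }

    public static void main(String[] args) {
        int cost = encodeWeight(42);
        System.out.println(decodeWeight(cost));                 // 42
        System.out.println(encodeWeight(10) < encodeWeight(5)); // true
    }
}
```

Under this assumption, a search that prefers the lowest cost returns the highest-weighted suggestions first.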
