Class WordBreakSpellChecker
java.lang.Object
  org.apache.lucene.search.spell.WordBreakSpellChecker
public class WordBreakSpellChecker extends java.lang.Object
A spell checker whose sole function is to offer suggestions by combining multiple terms into one word and/or breaking terms into multiple words.
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | WordBreakSpellChecker.BreakSuggestionSortMethod | Determines the order to list word break suggestions
-
Field Summary
Fields
Modifier and Type | Field | Description
static Term | SEPARATOR_TERM | Term that can be used to prohibit adjacent terms from being combined
-
Constructor Summary
Constructors
Constructor | Description
WordBreakSpellChecker() | Creates a new spellchecker with default configuration values
-
Method Summary
Modifier and Type | Method | Description
int | getMaxChanges() | Returns the maximum number of changes to perform on the input.
int | getMaxCombineWordLength() | Returns the maximum length of a combined suggestion.
int | getMaxEvaluations() | Returns the maximum number of word combinations to evaluate.
int | getMinBreakWordLength() | Returns the minimum size of a broken word.
int | getMinSuggestionFrequency() | Returns the minimum frequency a term must have to be part of a suggestion.
void | setMaxChanges(int maxChanges) | The maximum number of changes (word breaks or combinations) to make on the original term(s).
void | setMaxCombineWordLength(int maxCombineWordLength) | The maximum length of a suggestion made by combining 1 or more original terms.
void | setMaxEvaluations(int maxEvaluations) | The maximum number of word combinations to evaluate.
void | setMinBreakWordLength(int minBreakWordLength) | The minimum length to break words down to.
void | setMinSuggestionFrequency(int minSuggestionFrequency) | The minimum frequency a term must have to be included as part of a suggestion.
SuggestWord[][] | suggestWordBreaks(Term term, int maxSuggestions, IndexReader ir, SuggestMode suggestMode, WordBreakSpellChecker.BreakSuggestionSortMethod sortMethod) | Generate suggestions by breaking the passed-in term into multiple words.
CombineSuggestion[] | suggestWordCombinations(Term[] terms, int maxSuggestions, IndexReader ir, SuggestMode suggestMode) | Generate suggestions by combining one or more of the passed-in terms into single words.
-
-
-
Field Detail
-
SEPARATOR_TERM
public static final Term SEPARATOR_TERM
Term that can be used to prohibit adjacent terms from being combined
-
-
Method Detail
-
suggestWordBreaks
public SuggestWord[][] suggestWordBreaks(Term term, int maxSuggestions, IndexReader ir, SuggestMode suggestMode, WordBreakSpellChecker.BreakSuggestionSortMethod sortMethod) throws java.io.IOException
Generate suggestions by breaking the passed-in term into multiple words. The scores returned are equal to the number of word breaks needed, so a lower score is generally preferred over a higher score.
- Parameters:
suggestMode - default = SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX
sortMethod - default = WordBreakSpellChecker.BreakSuggestionSortMethod.NUM_CHANGES_THEN_MAX_FREQUENCY
- Returns:
- one or more arrays of words formed by breaking up the original term
- Throws:
java.io.IOException
- If there is a low-level I/O error.
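The scoring described for suggestWordBreaks can be illustrated with a small standalone model. The sketch below is not Lucene's implementation and uses no Lucene classes: the class WordBreakSketch, the Candidate type, and the in-memory frequency map standing in for the IndexReader are all hypothetical. It assumes that a piece is usable when its frequency is at least minSuggestionFrequency, that each piece must be at least minBreakWordLength characters long, and that NUM_CHANGES_THEN_MAX_FREQUENCY means "fewest breaks first, then highest single-word frequency first".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only: NOT Lucene's implementation, and no Lucene classes are used.
class WordBreakSketch {

    /** One candidate split of the original term (hypothetical helper type). */
    static final class Candidate {
        final List<String> words; // the pieces, in order
        final int breaks;         // number of word breaks, i.e. the "score"
        final int maxFreq;        // highest frequency among the pieces

        Candidate(List<String> words, int maxFreq) {
            this.words = words;
            this.breaks = words.size() - 1;
            this.maxFreq = maxFreq;
        }
    }

    /** The freqs map is an assumed stand-in for term frequencies read from an index. */
    static List<Candidate> suggestBreaks(String term, Map<String, Integer> freqs,
            int maxChanges, int minBreakWordLength, int minSuggestionFrequency) {
        List<Candidate> out = new ArrayList<>();
        split(term, new ArrayList<String>(), 0, freqs, maxChanges,
                Math.max(1, minBreakWordLength), minSuggestionFrequency, out);
        // Fewest breaks first, then highest single-word frequency first.
        out.sort((a, b) -> a.breaks != b.breaks
                ? Integer.compare(a.breaks, b.breaks)
                : Integer.compare(b.maxFreq, a.maxFreq));
        return out;
    }

    private static void split(String rest, List<String> prefix, int prefixMaxFreq,
            Map<String, Integer> freqs, int breaksLeft, int minLen, int minFreq,
            List<Candidate> out) {
        if (breaksLeft < 1) {
            return; // no changes left to spend
        }
        for (int i = minLen; i <= rest.length() - minLen; i++) {
            String left = rest.substring(0, i);
            String right = rest.substring(i);
            int leftFreq = freqs.getOrDefault(left, 0);
            if (leftFreq < minFreq) {
                continue; // the left piece is not a usable word
            }
            int maxSoFar = Math.max(prefixMaxFreq, leftFreq);
            int rightFreq = freqs.getOrDefault(right, 0);
            if (rightFreq >= minFreq) {
                List<String> words = new ArrayList<>(prefix);
                words.add(left);
                words.add(right);
                out.add(new Candidate(words, Math.max(maxSoFar, rightFreq)));
            }
            // Spend one break on the left piece and keep splitting the remainder.
            prefix.add(left);
            split(right, prefix, maxSoFar, freqs, breaksLeft - 1, minLen, minFreq, out);
            prefix.remove(prefix.size() - 1);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = new HashMap<>();
        freqs.put("now", 50);
        freqs.put("here", 40);
        freqs.put("no", 30);
        freqs.put("where", 20);
        for (Candidate c : suggestBreaks("nowhere", freqs, 1, 1, 1)) {
            System.out.println(c.words + " breaks=" + c.breaks + " maxFreq=" + c.maxFreq);
        }
    }
}
```

In this model, raising maxChanges allows splits into more than two words, and raising minBreakWordLength prunes splits that contain very short pieces.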
-
suggestWordCombinations
public CombineSuggestion[] suggestWordCombinations(Term[] terms, int maxSuggestions, IndexReader ir, SuggestMode suggestMode) throws java.io.IOException
Generate suggestions by combining one or more of the passed-in terms into single words. The returned CombineSuggestion contains both a SuggestWord and an array detailing which passed-in terms were involved in creating this combination. The scores returned are equal to the number of word combinations needed, which is also one less than the length of the array CombineSuggestion.originalTermIndexes. Generally, a suggestion with a lower score is preferred over one with a higher score.
To prevent two adjacent terms from being combined (for instance, if one is mandatory and the other is prohibited), separate the two terms with SEPARATOR_TERM.
When suggestMode equals SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX, each suggestion will include at least one term not in the index.
When suggestMode equals SuggestMode.SUGGEST_MORE_POPULAR, each suggestion will have the same or better frequency than the most popular included term.
- Returns:
- an array of words generated by combining original terms
- Throws:
java.io.IOException
- If there is a low-level I/O error.
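The relationship between the score, originalTermIndexes, and the separator term can be sketched with a standalone model. Again, this is not Lucene's implementation: WordCombineSketch, Combo, the "<SEP>" sentinel (a stand-in for SEPARATOR_TERM), and the frequency map replacing the IndexReader are all hypothetical, and the model ignores suggestMode entirely. It assumes a combined word is suggested when its frequency is at least minSuggestionFrequency and its length does not exceed maxCombineWordLength.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only: NOT Lucene's implementation, and no Lucene classes are used.
class WordCombineSketch {

    /** Hypothetical sentinel playing the role of SEPARATOR_TERM. */
    static final String SEPARATOR = "<SEP>";

    /** One combined suggestion (hypothetical helper type). */
    static final class Combo {
        final String word;               // the combined word
        final int[] originalTermIndexes; // positions of the input terms that were joined
        final int score;                 // number of combinations = indexes.length - 1

        Combo(String word, int[] originalTermIndexes) {
            this.word = word;
            this.originalTermIndexes = originalTermIndexes;
            this.score = originalTermIndexes.length - 1;
        }
    }

    /** The freqs map is an assumed stand-in for term frequencies read from an index. */
    static List<Combo> suggestCombinations(String[] terms, Map<String, Integer> freqs,
            int maxChanges, int maxCombineWordLength, int minSuggestionFrequency) {
        List<Combo> out = new ArrayList<>();
        for (int start = 0; start < terms.length; start++) {
            if (SEPARATOR.equals(terms[start])) {
                continue; // a separator never starts a combination
            }
            StringBuilder combined = new StringBuilder(terms[start]);
            // Each extra term joined costs one change.
            for (int end = start + 1; end < terms.length && end - start <= maxChanges; end++) {
                if (SEPARATOR.equals(terms[end])) {
                    break; // never combine across a separator
                }
                combined.append(terms[end]);
                if (combined.length() > maxCombineWordLength) {
                    break; // longer combinations would only grow further
                }
                String word = combined.toString();
                if (freqs.getOrDefault(word, 0) >= minSuggestionFrequency) {
                    int[] indexes = new int[end - start + 1];
                    for (int k = 0; k < indexes.length; k++) {
                        indexes[k] = start + k;
                    }
                    out.add(new Combo(word, indexes));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = new HashMap<>();
        freqs.put("lighthouse", 7);
        freqs.put("housekeeper", 3);
        String[] terms = {"light", "house", "keeper"};
        for (Combo c : suggestCombinations(terms, freqs, 1, 20, 1)) {
            System.out.println(c.word + " score=" + c.score
                    + " terms=" + java.util.Arrays.toString(c.originalTermIndexes));
        }
    }
}
```

Placing the sentinel between "light" and "house" in this model suppresses the "lighthouse" suggestion while still allowing "housekeeper", which mirrors the documented purpose of SEPARATOR_TERM.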
-
getMinSuggestionFrequency
public int getMinSuggestionFrequency()
Returns the minimum frequency a term must have to be part of a suggestion.
- See Also:
setMinSuggestionFrequency(int)
-
getMaxCombineWordLength
public int getMaxCombineWordLength()
Returns the maximum length of a combined suggestion.
- See Also:
setMaxCombineWordLength(int)
-
getMinBreakWordLength
public int getMinBreakWordLength()
Returns the minimum size of a broken word.
- See Also:
setMinBreakWordLength(int)
-
getMaxChanges
public int getMaxChanges()
Returns the maximum number of changes to perform on the input.
- See Also:
setMaxChanges(int)
-
getMaxEvaluations
public int getMaxEvaluations()
Returns the maximum number of word combinations to evaluate.
- See Also:
setMaxEvaluations(int)
-
setMinSuggestionFrequency
public void setMinSuggestionFrequency(int minSuggestionFrequency)
The minimum frequency a term must have to be included as part of a suggestion. Default=1. Not applicable when used with SuggestMode.SUGGEST_MORE_POPULAR.
- See Also:
getMinSuggestionFrequency()
-
setMaxCombineWordLength
public void setMaxCombineWordLength(int maxCombineWordLength)
The maximum length of a suggestion made by combining 1 or more original terms. Default=20
- See Also:
getMaxCombineWordLength()
-
setMinBreakWordLength
public void setMinBreakWordLength(int minBreakWordLength)
The minimum length to break words down to. Default=1
- See Also:
getMinBreakWordLength()
-
setMaxChanges
public void setMaxChanges(int maxChanges)
The maximum number of changes (word breaks or combinations) to make on the original term(s). Default=1
- See Also:
getMaxChanges()
-
setMaxEvaluations
public void setMaxEvaluations(int maxEvaluations)
The maximum number of word combinations to evaluate. Default=1000. A higher value might improve result quality. A lower value might improve performance.
- See Also:
getMaxEvaluations()
-
-