Class DictionaryCompoundWordTokenFilter
- java.lang.Object
-
- org.apache.lucene.util.AttributeSource
-
- org.apache.lucene.analysis.TokenStream
-
- org.apache.lucene.analysis.TokenFilter
-
- org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
-
- org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter
-
- All Implemented Interfaces:
Closeable
,AutoCloseable
public class DictionaryCompoundWordTokenFilter extends CompoundWordTokenFilterBase
ATokenFilter
that decomposes compound words found in many Germanic languages."Donaudampfschiff" becomes Donau, dampf, schiff so that you can find "Donaudampfschiff" even when you only enter "schiff". It uses a brute-force algorithm to achieve this.
You must specify the required
Version
compatibility when creating CompoundWordTokenFilterBase:- As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
-
-
Field Summary
-
Fields inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
DEFAULT_MAX_SUBWORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MIN_WORD_SIZE
-
-
Constructor Summary
Constructors Constructor Description DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, CharArraySet dictionary)
Creates a newDictionaryCompoundWordTokenFilter
DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
Creates a newDictionaryCompoundWordTokenFilter
-
Method Summary
-
Methods inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
incrementToken, reset
-
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end
-
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
-
-
-
-
Constructor Detail
-
DictionaryCompoundWordTokenFilter
public DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, CharArraySet dictionary)
Creates a newDictionaryCompoundWordTokenFilter
- Parameters:
matchVersion
- Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.input
- theTokenStream
to processdictionary
- the word dictionary to match against.
-
DictionaryCompoundWordTokenFilter
public DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
Creates a newDictionaryCompoundWordTokenFilter
- Parameters:
matchVersion
- Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.input
- theTokenStream
to processdictionary
- the word dictionary to match against.minWordSize
- only words longer than this get processedminSubwordSize
- only subwords longer than this get to the output streammaxSubwordSize
- only subwords shorter than this get to the output streamonlyLongestMatch
- Add only the longest matching subword to the stream
-
-