Package org.apache.lucene.analysis.miscellaneous
Miscellaneous TokenStreams
Class Summary
ASCIIFoldingFilter: This class converts alphabetic, numeric, and symbolic Unicode characters that are not in the "Basic Latin" Unicode block (the first 128 code points, U+0000 to U+007F) into their ASCII equivalents, if such equivalents exist.
ASCIIFoldingFilterFactory: Factory for ASCIIFoldingFilter.
CapitalizationFilter: A filter to apply normal capitalization rules to tokens.
CapitalizationFilterFactory: Factory for CapitalizationFilter.
CodepointCountFilter: Removes words that are too long or too short from the stream, measuring length in Unicode code points.
CodepointCountFilterFactory: Factory for CodepointCountFilter.
EmptyTokenStream: An always-exhausted token stream.
HyphenatedWordsFilter: When plain text is extracted from documents, many words are often hyphenated and broken across two lines; this filter joins such words back together.
HyphenatedWordsFilterFactory: Factory for HyphenatedWordsFilter.
KeepWordFilter: A TokenFilter that only keeps tokens whose text is contained in the set of required words.
KeepWordFilterFactory: Factory for KeepWordFilter.
KeywordMarkerFilter: Marks terms as keywords via the KeywordAttribute.
KeywordMarkerFilterFactory: Factory for KeywordMarkerFilter.
KeywordRepeatFilter: This TokenFilter emits each incoming token twice, once as a keyword and once as a non-keyword; in other words, once with KeywordAttribute.setKeyword(boolean) set to true and once set to false.
KeywordRepeatFilterFactory: Factory for KeywordRepeatFilter.
LengthFilter: Removes words that are too long or too short from the stream, measuring length in Java chars (UTF-16 code units).
LengthFilterFactory: Factory for LengthFilter.
LimitTokenCountAnalyzer: This Analyzer limits the number of tokens while indexing.
LimitTokenCountFilter: This TokenFilter limits the number of tokens while indexing.
LimitTokenCountFilterFactory: Factory for LimitTokenCountFilter.
LimitTokenPositionFilter: This TokenFilter limits its emitted tokens to those with positions that are not greater than the configured limit.
LimitTokenPositionFilterFactory: Factory for LimitTokenPositionFilter.
PatternAnalyzer: Deprecated. (4.0) Use the pattern-based analysis in the analysis/pattern package instead.
PatternKeywordMarkerFilter: Marks terms that match a given pattern as keywords via the KeywordAttribute.
PerFieldAnalyzerWrapper: This analyzer is used to facilitate scenarios where different fields require different analysis techniques.
PrefixAndSuffixAwareTokenFilter: Links two PrefixAwareTokenFilter instances.
PrefixAwareTokenFilter: Joins two token streams and leaves the last token of the first stream available to be used when updating the token values in the second stream based on that token.
RemoveDuplicatesTokenFilter: A TokenFilter which filters out tokens that have the same position and term text as the previous token in the stream.
RemoveDuplicatesTokenFilterFactory: Factory for RemoveDuplicatesTokenFilter.
ScandinavianFoldingFilter: This filter folds the Scandinavian characters åÅäæÄÆ->a and öÖøØ->o.
ScandinavianFoldingFilterFactory: Factory for ScandinavianFoldingFilter.
ScandinavianNormalizationFilter: This filter normalizes use of the interchangeable Scandinavian characters æÆäÄöÖøØ and their folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.
ScandinavianNormalizationFilterFactory: Factory for ScandinavianNormalizationFilter.
SetKeywordMarkerFilter: Marks terms that are contained in a given set as keywords via the KeywordAttribute.
SingleTokenTokenStream: A TokenStream containing a single token.
StemmerOverrideFilter: Provides the ability to override any KeywordAttribute-aware stemmer with custom dictionary-based stemming.
StemmerOverrideFilter.Builder: This builder builds an FST for the StemmerOverrideFilter.
StemmerOverrideFilter.StemmerOverrideMap: A read-only, 4-byte FST-backed map that allows fast case-insensitive key-value lookups for StemmerOverrideFilter.
StemmerOverrideFilterFactory: Factory for StemmerOverrideFilter.
TrimFilter: Trims leading and trailing whitespace from tokens in the stream.
TrimFilterFactory: Factory for TrimFilter.
WordDelimiterFilter: Splits words into subwords and performs optional transformations on subword groups.
WordDelimiterFilterFactory: Factory for WordDelimiterFilter.
WordDelimiterIterator: A BreakIterator-like API for iterating over subwords in text, according to WordDelimiterFilter rules.
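
The ASCIIFoldingFilter entry above is easiest to picture on concrete strings. The following is a minimal sketch, not Lucene's implementation: the class AsciiFoldingSketch is hypothetical, and it approximates folding with java.text.Normalizer (canonical decomposition, then removal of combining marks). That is a different technique from a hand-built character table, so characters with no decomposition, such as Ø, pass through this sketch unchanged.

```java
import java.text.Normalizer;

/**
 * Hypothetical illustration only: approximates ASCII folding with Unicode
 * canonical decomposition. It is not Lucene's ASCIIFoldingFilter, and it
 * leaves characters that have no decomposition (e.g. U+00D8) unchanged.
 */
final class AsciiFoldingSketch {
    static String fold(String token) {
        // NFD splits a precomposed letter such as U+00E9 into 'e' + U+0301 (combining acute).
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        // Drop every nonspacing combining mark (general category Mn), keeping the base letters.
        return decomposed.replaceAll("\\p{Mn}", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("caf\u00e9"));  // cafe
        System.out.println(fold("na\u00efve")); // naive
        System.out.println(fold("\u00d8l"));    // unchanged: U+00D8 has no canonical decomposition
    }
}
```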
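
CodepointCountFilter and LengthFilter share the same one-line description, and the difference lies in the unit of length. The hypothetical LengthCheckSketch below is a simplified illustration, not the Lucene API: it applies the same inclusive [min, max] bounds once in UTF-16 chars and once in code points, using a character outside the Basic Multilingual Plane (U+1D11E, two chars but one code point) to show where the two checks disagree.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not the Lucene API: contrasts token length measured in
 * UTF-16 chars with token length measured in Unicode code points.
 * Both bounds are treated as inclusive.
 */
final class LengthCheckSketch {
    /** Keeps tokens whose length in UTF-16 chars lies in [min, max]. */
    static List<String> byChars(List<String> tokens, int min, int max) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            int len = t.length(); // counts UTF-16 code units
            if (len >= min && len <= max) {
                kept.add(t);
            }
        }
        return kept;
    }

    /** Keeps tokens whose length in Unicode code points lies in [min, max]. */
    static List<String> byCodePoints(List<String> tokens, int min, int max) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            int len = t.codePointCount(0, t.length()); // a surrogate pair counts once
            if (len >= min && len <= max) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // U+1D11E (musical G clef) is outside the BMP: 2 chars but 1 code point.
        List<String> tokens = List.of("a", "ab", "\uD834\uDD1E");
        System.out.println(byChars(tokens, 2, 5).size());      // 2: "ab" and the G clef
        System.out.println(byCodePoints(tokens, 2, 5).size()); // 1: only "ab"
    }
}
```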
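
The HyphenatedWordsFilter entry describes re-joining words split at a line break. A minimal sketch of that idea on a plain list of token strings follows; HyphenJoinSketch is a hypothetical name, the sketch assumes the tokenizer keeps the trailing hyphen on the first fragment (as whitespace tokenization would), and keeping the hyphen when the stream ends mid-word is a design choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of re-joining hyphenated words, assuming the tokenizer
 * keeps the trailing '-' on the first fragment (e.g. whitespace tokenization).
 */
final class HyphenJoinSketch {
    static List<String> join(List<String> tokens) {
        List<String> out = new ArrayList<>();
        StringBuilder pending = null; // fragments of a word broken by a line-end hyphen
        for (String t : tokens) {
            if (t.endsWith("-")) {
                if (pending == null) {
                    pending = new StringBuilder();
                }
                pending.append(t, 0, t.length() - 1); // drop the hyphen
            } else if (pending == null) {
                out.add(t); // ordinary token, passed through
            } else {
                out.add(pending.append(t).toString()); // last fragment completes the word
                pending = null;
            }
        }
        if (pending != null) {
            out.add(pending.append('-').toString()); // stream ended mid-word: keep the hyphen
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(join(List.of("the", "docu-", "ment", "is", "long"))); // [the, document, is, long]
    }
}
```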
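
The KeywordRepeatFilter entry says each token is emitted once with the keyword flag set and once without it. The point of doing so is that a stemmer which respects the flag sees one copy it must leave alone and one it may stem. The sketch below models this with a hypothetical Tok class standing in for a token plus its KeywordAttribute, and a deliberately trivial stemmer that strips one trailing "s"; neither is Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of keyword repetition feeding a keyword-aware (toy) stemmer. */
final class KeywordRepeatSketch {
    /** Stand-in for a token plus its keyword flag (KeywordAttribute in Lucene). */
    static final class Tok {
        final String text;
        final boolean keyword;

        Tok(String text, boolean keyword) {
            this.text = text;
            this.keyword = keyword;
        }
    }

    /** Emits every token twice: first flagged as a keyword, then not. */
    static List<Tok> repeat(List<String> tokens) {
        List<Tok> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(new Tok(t, true));
            out.add(new Tok(t, false));
        }
        return out;
    }

    /** Toy stemmer (strips one trailing 's') that leaves keyword tokens untouched. */
    static List<String> stem(List<Tok> toks) {
        List<String> out = new ArrayList<>();
        for (Tok tok : toks) {
            boolean strip = !tok.keyword && tok.text.length() > 3 && tok.text.endsWith("s");
            out.add(strip ? tok.text.substring(0, tok.text.length() - 1) : tok.text);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(stem(repeat(List.of("cats")))); // [cats, cat]
    }
}
```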
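
LimitTokenCountFilter and LimitTokenPositionFilter differ when several tokens share a position, for example a synonym stacked on the original word with a position increment of 0. The hypothetical LimitSketch below works on parallel lists of token texts and position increments (an assumption of this sketch, not how Lucene represents a stream) and contrasts "at most N tokens" with "positions not greater than N".

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch contrasting a token-count limit with a token-position limit.
 * Each token has a position increment; 0 means "same position as the previous token".
 */
final class LimitSketch {
    /** Keeps at most maxTokens tokens, whatever their positions. */
    static List<String> limitCount(List<String> texts, int maxTokens) {
        return new ArrayList<>(texts.subList(0, Math.min(maxTokens, texts.size())));
    }

    /** Keeps tokens whose (1-based) position is not greater than maxPosition. */
    static List<String> limitPosition(List<String> texts, List<Integer> posIncs, int maxPosition) {
        List<String> kept = new ArrayList<>();
        int position = 0;
        for (int i = 0; i < texts.size(); i++) {
            position += posIncs.get(i);
            if (position > maxPosition) {
                break; // increments are non-negative, so later tokens cannot qualify either
            }
            kept.add(texts.get(i));
        }
        return kept;
    }

    public static void main(String[] args) {
        // "fast" is stacked on "quick" (increment 0), so both sit at position 1.
        List<String> texts = List.of("quick", "fast", "brown", "fox");
        List<Integer> incs = List.of(1, 0, 1, 1);
        System.out.println(limitCount(texts, 2));          // [quick, fast]
        System.out.println(limitPosition(texts, incs, 2)); // [quick, fast, brown]
    }
}
```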
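
PerFieldAnalyzerWrapper boils down to a lookup: use the analyzer registered for the field if there is one, otherwise fall back to a default. The sketch below shows only that dispatch. PerFieldSketch is hypothetical, and "analyzers" are modeled as plain Function&lt;String, List&lt;String&gt;&gt; values rather than Lucene Analyzer objects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of per-field analyzer dispatch with a default fallback. */
final class PerFieldSketch {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> fieldAnalyzers;

    PerFieldSketch(Function<String, List<String>> defaultAnalyzer,
                   Map<String, Function<String, List<String>>> fieldAnalyzers) {
        this.defaultAnalyzer = defaultAnalyzer;
        this.fieldAnalyzers = fieldAnalyzers;
    }

    /** Uses the field's own analyzer if one is registered, else the default. */
    List<String> analyze(String field, String text) {
        return fieldAnalyzers.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    public static void main(String[] args) {
        // Default: lowercase and split on whitespace.
        Function<String, List<String>> lowercaseWords =
            s -> Arrays.asList(s.toLowerCase(Locale.ROOT).split("\\s+"));
        // For an "id" field: keep the whole value as a single token.
        Function<String, List<String>> wholeValue = s -> List.of(s);
        PerFieldSketch wrapper = new PerFieldSketch(lowercaseWords, Map.of("id", wholeValue));
        System.out.println(wrapper.analyze("body", "Hello World")); // [hello, world]
        System.out.println(wrapper.analyze("id", "AB-12 X"));       // [AB-12 X]
    }
}
```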
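
The RemoveDuplicatesTokenFilter entry defines a duplicate as a token with the same position and the same text as the previous token. A minimal sketch that follows that wording literally is shown below; RemoveDuplicatesSketch is hypothetical, and it again assumes parallel lists of texts and position increments, where an increment of 0 means "same position".

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch following the summary's wording: a token is dropped when it
 * sits at the same position (increment 0) and has the same text as the previous token.
 */
final class RemoveDuplicatesSketch {
    static List<String> dedupe(List<String> texts, List<Integer> posIncs) {
        List<String> kept = new ArrayList<>();
        String previous = null;
        for (int i = 0; i < texts.size(); i++) {
            String text = texts.get(i);
            boolean duplicate = posIncs.get(i) == 0 && text.equals(previous);
            if (!duplicate) {
                kept.add(text);
            }
            previous = text;
        }
        return kept;
    }

    public static void main(String[] args) {
        // e.g. a keyword copy and a stemmed copy of "fox" that came out identical
        System.out.println(dedupe(List.of("fox", "fox", "jumps", "jump"), List.of(1, 0, 1, 0)));
        // [fox, jumps, jump]
    }
}
```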
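
StemmerOverrideFilter works together with a KeywordAttribute-aware stemmer: a dictionary hit supplies the stem and marks the token as a keyword, so the stemmer that follows leaves it alone. The two-stage sketch below illustrates that interplay with a plain Map instead of an FST; StemmerOverrideSketch, its Tok class, and the toy stemmer are hypothetical, and lookups here are case-sensitive for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of dictionary-based stemmer overrides: a dictionary hit
 * replaces the token and flags it as a keyword; the toy stemmer skips keywords.
 */
final class StemmerOverrideSketch {
    /** Stand-in for a token plus its keyword flag. */
    static final class Tok {
        final String text;
        final boolean keyword;

        Tok(String text, boolean keyword) {
            this.text = text;
            this.keyword = keyword;
        }
    }

    /** Stage 1: apply dictionary overrides and flag the overridden tokens. */
    static List<Tok> override(List<String> tokens, Map<String, String> dictionary) {
        List<Tok> out = new ArrayList<>();
        for (String t : tokens) {
            String stem = dictionary.get(t); // case-sensitive lookup in this sketch
            out.add(stem != null ? new Tok(stem, true) : new Tok(t, false));
        }
        return out;
    }

    /** Stage 2: toy stemmer (strips one trailing 's') that respects the keyword flag. */
    static List<String> stem(List<Tok> toks) {
        List<String> out = new ArrayList<>();
        for (Tok tok : toks) {
            boolean strip = !tok.keyword && tok.text.length() > 3 && tok.text.endsWith("s");
            out.add(strip ? tok.text.substring(0, tok.text.length() - 1) : tok.text);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> dict = Map.of("mice", "mouse", "news", "news");
        // Without the override, the toy stemmer would turn "news" into "new".
        System.out.println(stem(override(List.of("mice", "cats", "news"), dict))); // [mouse, cat, news]
    }
}
```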
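
The WordDelimiterFilter and WordDelimiterIterator entries describe splitting words into subwords. The sketch below is a much-simplified, hypothetical splitter (WordDelimiterSketch is not Lucene API) that assumes three break rules: non-alphanumeric characters act as delimiters, a lowercase-to-uppercase change starts a new subword, and so does a letter/digit transition. The optional transformations on subword groups (such as catenating parts) are not modeled, and the sketch works on chars, not code points.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified subword splitter: breaks on non-alphanumeric chars,
 * on lower-to-upper case changes, and on letter/digit transitions.
 */
final class WordDelimiterSketch {
    static List<String> split(String word) {
        List<String> parts = new ArrayList<>();
        int start = -1; // start index of the current subword, or -1 between subwords
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                // Delimiter: close the open subword, if any.
                if (start >= 0) {
                    parts.add(word.substring(start, i));
                    start = -1;
                }
            } else if (start < 0) {
                start = i; // first character of a new subword
            } else {
                char prev = word.charAt(i - 1); // alphanumeric, since we are inside a subword
                boolean caseChange = Character.isLowerCase(prev) && Character.isUpperCase(c);
                boolean digitChange = Character.isDigit(prev) != Character.isDigit(c);
                if (caseChange || digitChange) {
                    parts.add(word.substring(start, i));
                    start = i;
                }
            }
        }
        if (start >= 0) {
            parts.add(word.substring(start)); // flush the final subword
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("PowerShot")); // [Power, Shot]
        System.out.println(split("Wi-Fi"));     // [Wi, Fi]
        System.out.println(split("SD500"));     // [SD, 500]
    }
}
```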