Index
All Classes and Interfaces|All Packages|Constant Field Values
A
- accept() - Method in class org.codelibs.analysis.en.ReloadableStopFilter
- accept() - Method in class org.codelibs.analysis.ja.CharTypeFilter
- accept() - Method in class org.codelibs.analysis.StopTokenFilter
- accept(String, String) - Method in class org.codelibs.analysis.ja.StopTokenPrefixFilter
- accept(String, String) - Method in class org.codelibs.analysis.ja.StopTokenSuffixFilter
- accept(String, String) - Method in class org.codelibs.analysis.StopTokenFilter
-
Determines whether the given text should be accepted based on a comparison with a stop word.
- add(char) - Method in class org.codelibs.analysis.en.FlexiblePorterStemmer
-
Add a character to the word being stemmed.
- advance() - Method in class org.codelibs.analysis.ja.KanjiNumberFilter.NumberBuffer
-
Advances the position index by one.
- ALPHANUM - Static variable in class org.codelibs.analysis.en.AlphaNumWordFilter
-
Token type constant for alphanumeric tokens
- AlphaNumWordFilter - Class in org.codelibs.analysis.en
-
Token filter that concatenates adjacent alphanumeric and numeric tokens.
- AlphaNumWordFilter(TokenStream) - Constructor for class org.codelibs.analysis.en.AlphaNumWordFilter
-
Creates a new AlphaNumWordFilter.
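The AlphaNumWordFilter entries describe joining adjacent alphanumeric and numeric tokens up to a maximum length. The sketch below shows that merging rule over a plain token list. It is not the filter's actual code, and it rests on three assumptions:

- The ALPHANUM and NUM constants hold StandardTokenizer's `<ALPHANUM>` and `<NUM>` type names.
- "Adjacent" means one token's end offset equals the next token's start offset.
- A merged token is typed ALPHANUM.

The `Token` record and the `concatenate` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class AlphaNumConcatSketch {
    // Assumption: the filter's ALPHANUM/NUM constants carry StandardTokenizer's type names.
    static final String ALPHANUM = "<ALPHANUM>";
    static final String NUM = "<NUM>";

    // Hypothetical token holder: text, type, and character offsets.
    record Token(String text, String type, int start, int end) {}

    static List<Token> concatenate(List<Token> tokens, int maxTokenLength) {
        List<Token> out = new ArrayList<>();
        Token current = null;
        for (Token next : tokens) {
            boolean join = current != null
                    && isWordLike(current) && isWordLike(next)
                    && current.end() == next.start() // no gap between the two tokens
                    && current.text().length() + next.text().length() <= maxTokenLength;
            if (join) {
                // Merge into one token spanning both offset ranges
                current = new Token(current.text() + next.text(), ALPHANUM, current.start(), next.end());
            } else {
                if (current != null) {
                    out.add(current);
                }
                current = next;
            }
        }
        if (current != null) {
            out.add(current);
        }
        return out;
    }

    private static boolean isWordLike(Token t) {
        return ALPHANUM.equals(t.type()) || NUM.equals(t.type());
    }
}
```

Here is what the sketch does with a few sample tokens:

- Input: `abc` (0-3, ALPHANUM), `123` (3-6, NUM) and `xyz` (7-10, ALPHANUM), with a maximum length of 255.
  - Result: `abc123` (0-6) followed by `xyz`.
  - `xyz` stays separate because of the offset gap between 6 and 7.
- The same input with a maximum length of 4.
  - Result: all three tokens come back unchanged.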
B
- BufferedCharFilter - Class in org.codelibs.analysis
-
Abstract base class for character filters that buffer input before processing.
- BufferedCharFilter(Reader) - Constructor for class org.codelibs.analysis.BufferedCharFilter
-
Creates a new BufferedCharFilter.
- bufferedInput - Variable in class org.codelibs.analysis.BufferedCharFilter
-
The reader containing the processed buffered input
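BufferedCharFilter works in three stages:

1. Read the whole input.
2. Transform it once with processInput.
3. Serve reads from bufferedInput.

That shape can be sketched with java.io alone. This is a minimal sketch, and it assumes the whole input is buffered on the first read. BufferedTransformReader is a hypothetical name. It extends plain java.io.Reader rather than Lucene's CharFilter, so it does no offset correction.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical sketch: read all input up front, transform it once, then serve reads from the result. */
abstract class BufferedTransformReader extends Reader {
    private final Reader input;
    private Reader bufferedInput; // created lazily on the first read

    BufferedTransformReader(Reader input) {
        this.input = input;
    }

    /** Subclasses rewrite the complete input text. */
    protected abstract CharSequence processInput(CharSequence text);

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (bufferedInput == null) {
            // Drain the underlying reader completely before transforming
            StringBuilder sb = new StringBuilder();
            char[] chunk = new char[1024];
            int n;
            while ((n = input.read(chunk, 0, chunk.length)) != -1) {
                sb.append(chunk, 0, n);
            }
            bufferedInput = new StringReader(processInput(sb).toString());
        }
        return bufferedInput.read(cbuf, off, len);
    }

    @Override
    public void close() throws IOException {
        input.close();
    }
}
```

Buffering everything keeps the transformation simple, since processInput sees the full text. The trade-off is memory proportional to the input size.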
C
- charAt(int) - Method in class org.codelibs.analysis.ja.KanjiNumberFilter.NumberBuffer
-
Returns the character at the specified index.
- CharTypeFilter - Class in org.codelibs.analysis.ja
-
Token filter that accepts tokens based on character type criteria.
- CharTypeFilter(TokenStream, boolean, boolean, boolean) - Constructor for class org.codelibs.analysis.ja.CharTypeFilter
-
Creates a new CharTypeFilter.
- concatenateTerms(AttributeSource.State) - Method in class org.codelibs.analysis.ConcatenationFilter
-
Concatenates the current token with the previous token.
- ConcatenationFilter - Class in org.codelibs.analysis
-
Abstract base class for token filters that concatenate adjacent tokens.
- ConcatenationFilter(TokenStream) - Constructor for class org.codelibs.analysis.ConcatenationFilter
-
Creates a new ConcatenationFilter.
- current - Variable in class org.codelibs.analysis.ConcatenationFilter
-
State for storing lookahead tokens
- current - Variable in class org.codelibs.analysis.en.AlphaNumWordFilter
-
Current state of the token stream for lookahead processing
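ConcatenationFilter's processing reads like a template:

- isTarget() picks a token that may start a run.
- isConcatenated() decides whether each following token is appended to that run.

The sketch below shows that loop over plain strings. It assumes both predicates see only the token text; the real filters work on TokenStream attributes and offsets. ConcatenationSketch and concatenate are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of the isTarget/isConcatenated contract over a plain list of terms. */
final class ConcatenationSketch {
    static List<String> concatenate(List<String> terms,
                                    Predicate<String> isTarget,
                                    Predicate<String> isConcatenated) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < terms.size()) {
            StringBuilder term = new StringBuilder(terms.get(i));
            int next = i + 1;
            if (isTarget.test(terms.get(i))) {
                // Absorb following tokens for as long as they qualify for concatenation
                while (next < terms.size() && isConcatenated.test(terms.get(next))) {
                    term.append(terms.get(next));
                    next++;
                }
            }
            out.add(term.toString());
            i = next;
        }
        return out;
    }
}
```

Here is a number-style rule in use, loosely in the spirit of NumberConcatenationFilter:

- Predicates: `isTarget = t -> t.matches("\\d+")` and `isConcatenated = t -> t.matches("[\\d,.]+")`.
- Input terms: `price`, `1`, `,`, `000`, `yen`.
- Result: `price`, `1,000`, `yen`.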
D
- DEFAULT_MAX_TOKEN_LENGTH - Static variable in class org.codelibs.analysis.en.AlphaNumWordFilter
-
Default maximum token length (255 characters)
F
- FlexiblePorterStemFilter - Class in org.codelibs.analysis.en
-
Token filter that applies the Porter stemming algorithm with configurable steps.
- FlexiblePorterStemFilter(TokenStream, boolean, boolean, boolean, boolean, boolean, boolean) - Constructor for class org.codelibs.analysis.en.FlexiblePorterStemFilter
-
Creates a new FlexiblePorterStemFilter with configurable stemming steps.
- FlexiblePorterStemmer - Class in org.codelibs.analysis.en
-
Stemmer, implementing the Porter Stemming Algorithm. The Stemmer class transforms a word into its root form.
- FlexiblePorterStemmer() - Constructor for class org.codelibs.analysis.en.FlexiblePorterStemmer
-
Creates a new FlexiblePorterStemmer with all steps enabled.
- FlexiblePorterStemmer(boolean, boolean, boolean, boolean, boolean, boolean) - Constructor for class org.codelibs.analysis.en.FlexiblePorterStemmer
-
Creates a new FlexiblePorterStemmer with configurable steps.
G
- get() - Method in interface org.codelibs.analysis.ja.PosConcatenationFilter.PartOfSpeechSupplier
-
Retrieves the part-of-speech tag for the current token.
- getMaxTokenLength() - Method in class org.codelibs.analysis.en.AlphaNumWordFilter
-
Gets the maximum token length for concatenated tokens.
- getResultBuffer() - Method in class org.codelibs.analysis.en.FlexiblePorterStemmer
-
Returns a reference to a character buffer containing the results of the stemming process.
- getResultLength() - Method in class org.codelibs.analysis.en.FlexiblePorterStemmer
-
Returns the length of the word resulting from the stemming process.
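PosConcatenationFilter takes two pieces: a Set&lt;String&gt; of POS tags, and a PartOfSpeechSupplier whose get() reports the current token's tag. The sketch below shows one way they might combine into a membership check. It assumes get() returns a String and that a token qualifies when its tag is in the set. PosCheckSketch and hasConfiguredPos are hypothetical names, and the nested interface only mirrors the documented shape. In a Kuromoji-based chain, a supplier could plausibly read the tag from PartOfSpeechAttribute.getPartOfSpeech().

```java
import java.util.Set;

/** Hypothetical sketch of a POS-driven concatenation check. */
final class PosCheckSketch {
    /** Mirrors the documented shape: a single get() returning the current token's POS tag. */
    @FunctionalInterface
    interface PartOfSpeechSupplier {
        String get();
    }

    private final Set<String> posTags;
    private final PartOfSpeechSupplier supplier;

    PosCheckSketch(Set<String> posTags, PartOfSpeechSupplier supplier) {
        this.posTags = posTags;
        this.supplier = supplier;
    }

    /** True when the current token's POS tag is one of the configured tags. */
    boolean hasConfiguredPos() {
        String pos = supplier.get();
        return pos != null && posTags.contains(pos);
    }
}
```

Passing a supplier instead of a tag value lets the filter ask for the tag of whichever token is current at the moment of the check.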
I
- ignoreCase - Variable in class org.codelibs.analysis.StopTokenFilter
-
Whether to ignore case when matching stop words
- incrementToken() - Method in class org.codelibs.analysis.ConcatenationFilter
- incrementToken() - Method in class org.codelibs.analysis.en.AlphaNumWordFilter
- incrementToken() - Method in class org.codelibs.analysis.en.FlexiblePorterStemFilter
- incrementToken() - Method in class org.codelibs.analysis.ja.KanjiNumberFilter
- isArabicNumeral(char) - Method in class org.codelibs.analysis.ja.KanjiNumberFilter
-
Returns true if and only if the given character is an Arabic numeral.
- isConcatenated() - Method in class org.codelibs.analysis.ConcatenationFilter
-
Determines if the next token should be concatenated with the current token.
- isConcatenated() - Method in class org.codelibs.analysis.ja.NumberConcatenationFilter
- isConcatenated() - Method in class org.codelibs.analysis.ja.PatternConcatenationFilter
- isConcatenated() - Method in class org.codelibs.analysis.ja.PosConcatenationFilter
- isKeyword() - Method in class org.codelibs.analysis.en.ReloadableKeywordMarkerFilter
- isNumeral(char) - Method in class org.codelibs.analysis.ja.KanjiNumberFilter
-
Returns true if and only if the given character is a numeral.
- isNumeral(String) - Method in class org.codelibs.analysis.ja.KanjiNumberFilter
-
Returns true if and only if the given string is a numeral.
- isNumeralPunctuation(char) - Method in class org.codelibs.analysis.ja.KanjiNumberFilter
-
Returns true if and only if the given character is numeral punctuation.
- isNumeralPunctuation(String) - Method in class org.codelibs.analysis.ja.KanjiNumberFilter
-
Returns true if and only if the given string is numeral punctuation.
- isTarget() - Method in class org.codelibs.analysis.ConcatenationFilter
-
Determines if the current token should be processed for concatenation.
- isTarget() - Method in class org.codelibs.analysis.ja.NumberConcatenationFilter
- isTarget() - Method in class org.codelibs.analysis.ja.PatternConcatenationFilter
- isTarget() - Method in class org.codelibs.analysis.ja.PosConcatenationFilter
- IterationMarkCharFilter - Class in org.codelibs.analysis.ja
-
Character filter that expands Japanese iteration marks (odoriji).
- IterationMarkCharFilter(Reader) - Constructor for class org.codelibs.analysis.ja.IterationMarkCharFilter
-
Creates a new IterationMarkCharFilter.
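Odoriji expansion can be sketched as a pass that replaces each iteration mark with the character before it. This minimal sketch handles only three marks:

- 々 (U+3005)
- ゝ (U+309D)
- ヽ (U+30FD)

It leaves the voiced marks ゞ and ヾ untouched, so it may be narrower than IterationMarkCharFilter. IterationMarkSketch.expand is a hypothetical name. In the BufferedCharFilter design, logic like this would live in processInput.

```java
/** Hypothetical sketch: expands 々, ゝ and ヽ by repeating the preceding character. */
final class IterationMarkSketch {
    static String expand(CharSequence input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean mark = c == '\u3005'   // kanji iteration mark
                    || c == '\u309D'       // hiragana iteration mark
                    || c == '\u30FD';      // katakana iteration mark
            if (mark && out.length() > 0) {
                out.append(out.charAt(out.length() - 1)); // repeat the previous output character
            } else {
                out.append(c);             // ordinary character, or a mark with nothing before it
            }
        }
        return out.toString();
    }
}
```

With this sketch:

- `時々` becomes `時時`.
- `こゝろ` becomes `こころ`.
- A mark at the very start of the input is kept as-is.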
K
- KanjiNumberFilter - Class in org.codelibs.analysis.ja
-
Token filter that normalizes Japanese numbers.
- KanjiNumberFilter(TokenStream) - Constructor for class org.codelibs.analysis.ja.KanjiNumberFilter
-
Creates a new KanjiNumberFilter.
- KanjiNumberFilter.NumberBuffer - Class in org.codelibs.analysis.ja
-
Buffer that holds a Japanese number string together with a position index marking how far it has been parsed.
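The parser is split into parseMediumKanjiNumeral (tens, hundreds, thousands) and parseLargeKanjiNumeral (ten thousand and above). That split mirrors how kanji numbers are read:

- 十, 百 and 千 build a section below 10,000.
- 万, 億 and 兆 multiply that section.

The sketch below shows this two-level accumulation. It is a simplification, not KanjiNumberFilter's code:

- It handles only kanji digits and units.
- Back-to-back digits are read positionally, as in 二〇二四.
- It ignores decimal points, numeral punctuation, full-width Arabic digits and overflow.

KanjiNumeralSketch.parse is a hypothetical name.

```java
/** Hypothetical, simplified kanji numeral parser (kanji digits and units only). */
final class KanjiNumeralSketch {
    private static final String DIGITS = "〇一二三四五六七八九";

    static long parse(String s) {
        long total = 0;    // value already committed by large units (万, 億, 兆)
        long section = 0;  // value below the current large unit, built with 十, 百, 千
        long digits = 0;   // digits read since the last unit, read positionally
        boolean hasDigits = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int d = DIGITS.indexOf(c);
            if (d >= 0) {
                digits = digits * 10 + d;
                hasDigits = true;
                continue;
            }
            long medium = mediumUnit(c);
            if (medium > 0) {
                section += (hasDigits ? digits : 1) * medium; // a bare 千 means 1000
            } else {
                long large = largeUnit(c);
                if (large == 0) {
                    throw new IllegalArgumentException("not a kanji numeral: " + c);
                }
                section += digits;
                total += (section == 0 ? 1 : section) * large; // a bare 万 means 10000
                section = 0;
            }
            digits = 0;
            hasDigits = false;
        }
        return total + section + digits;
    }

    private static long mediumUnit(char c) {
        switch (c) {
            case '十': return 10;
            case '百': return 100;
            case '千': return 1_000;
            default: return 0;
        }
    }

    private static long largeUnit(char c) {
        switch (c) {
            case '万': return 10_000L;
            case '億': return 100_000_000L;
            case '兆': return 1_000_000_000_000L;
            default: return 0;
        }
    }
}
```

For example:

- `parse("二千二十四")` and `parse("二〇二四")` both give 2024.
- `parse("三百五十万")` gives 3500000.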
L
- length() - Method in class org.codelibs.analysis.ja.KanjiNumberFilter.NumberBuffer
-
Returns the length of the buffer.
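KanjiNumberFilter.NumberBuffer pairs a string with a parse position through charAt, length, position and advance. That lets parsing helpers consume characters and leave the position where they stopped. The class below is a minimal re-creation of that shape, not the library's class. It assumes charAt takes an absolute index into the string. The NumberCursor class and its readArabicDigits consumer are hypothetical names, added to show the cursor in use.

```java
/** Hypothetical re-creation of the documented NumberBuffer shape: a string plus a parse position. */
final class NumberCursor {
    private final String string;
    private int position; // index of the next unparsed character

    NumberCursor(String string) {
        this.string = string;
        this.position = 0;
    }

    char charAt(int index) { return string.charAt(index); }
    int length() { return string.length(); }
    int position() { return position; }
    void advance() { position++; }

    /** Example consumer: reads a run of ASCII digits starting at the current position. */
    long readArabicDigits() {
        long value = 0;
        while (position < length() && charAt(position) >= '0' && charAt(position) <= '9') {
            value = value * 10 + (charAt(position) - '0');
            advance();
        }
        return value;
    }
}
```

After `readArabicDigits()` on `123万`, the value is 123 and `position()` points at `万`, ready for the next helper.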
M
- MAX_TOKEN_LENGTH_LIMIT - Static variable in class org.codelibs.analysis.en.AlphaNumWordFilter
-
Upper limit for the configurable maximum token length (1MB)
- maxTokenLength - Variable in class org.codelibs.analysis.en.AlphaNumWordFilter
-
Maximum length for concatenated tokens
N
- normalizedWords - Variable in class org.codelibs.analysis.StopTokenFilter
-
Array of stop words to match against (normalized to lowercase if ignoreCase is true)
- normalizeNumber(String) - Method in class org.codelibs.analysis.ja.KanjiNumberFilter
-
Normalizes a Japanese number
- NUM - Static variable in class org.codelibs.analysis.en.AlphaNumWordFilter
-
Token type constant for numeric tokens
- NumberBuffer(String) - Constructor for class org.codelibs.analysis.ja.KanjiNumberFilter.NumberBuffer
-
Creates a new NumberBuffer.
- NumberConcatenationFilter - Class in org.codelibs.analysis.ja
-
A token filter that concatenates tokens containing only numeric characters (digits).
- NumberConcatenationFilter(TokenStream, CharArraySet) - Constructor for class org.codelibs.analysis.ja.NumberConcatenationFilter
-
Constructs a NumberConcatenationFilter with the specified input token stream and word set.
O
- offsetAtt - Variable in class org.codelibs.analysis.ConcatenationFilter
-
The offset attribute for managing token offsets
- org.codelibs.analysis - package org.codelibs.analysis
- org.codelibs.analysis.en - package org.codelibs.analysis.en
- org.codelibs.analysis.ja - package org.codelibs.analysis.ja
P
- parseLargeKanjiNumeral(KanjiNumberFilter.NumberBuffer) - Method in class org.codelibs.analysis.ja.KanjiNumberFilter
-
Parse large kanji numerals (ten thousand or larger)
- parseMediumKanjiNumeral(KanjiNumberFilter.NumberBuffer) - Method in class org.codelibs.analysis.ja.KanjiNumberFilter
-
Parse medium kanji numerals (tens, hundreds or thousands)
- PatternConcatenationFilter - Class in org.codelibs.analysis.ja
-
A token filter that uses regular expression patterns to determine token concatenation behavior.
- PatternConcatenationFilter(TokenStream, Pattern, Pattern) - Constructor for class org.codelibs.analysis.ja.PatternConcatenationFilter
-
Constructs a PatternConcatenationFilter with the specified input token stream and patterns.
- PosConcatenationFilter - Class in org.codelibs.analysis.ja
-
A token filter that determines concatenation behavior based on part-of-speech (POS) tags.
- PosConcatenationFilter(TokenStream, Set<String>, PosConcatenationFilter.PartOfSpeechSupplier) - Constructor for class org.codelibs.analysis.ja.PosConcatenationFilter
-
Constructs a PosConcatenationFilter with the specified input token stream, POS tags, and supplier.
- PosConcatenationFilter.PartOfSpeechSupplier - Interface in org.codelibs.analysis.ja
-
Functional interface that supplies part-of-speech (POS) tag information for the current token.
- position() - Method in class org.codelibs.analysis.ja.KanjiNumberFilter.NumberBuffer
-
Returns the current position index.
- processInput(CharSequence) - Method in class org.codelibs.analysis.BufferedCharFilter
-
Processes the buffered input and returns the transformed character sequence.
- processInput(CharSequence) - Method in class org.codelibs.analysis.ja.IterationMarkCharFilter
- processInput(CharSequence) - Method in class org.codelibs.analysis.ja.ProlongedSoundMarkCharFilter
- processToken() - Method in class org.codelibs.analysis.ConcatenationFilter
-
Processes the current token, potentially concatenating it with following tokens.
- ProlongedSoundMarkCharFilter - Class in org.codelibs.analysis.ja
-
A character filter that normalizes various dash and hyphen characters to Japanese prolonged sound marks when they appear after Hiragana, Katakana, or Katakana phonetic extension characters.
- ProlongedSoundMarkCharFilter(Reader) - Constructor for class org.codelibs.analysis.ja.ProlongedSoundMarkCharFilter
-
Constructs a ProlongedSoundMarkCharFilter with the default replacement character (U+30FC).
- ProlongedSoundMarkCharFilter(Reader, char) - Constructor for class org.codelibs.analysis.ja.ProlongedSoundMarkCharFilter
-
Constructs a ProlongedSoundMarkCharFilter with a custom replacement character.
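The ProlongedSoundMarkCharFilter description amounts to a one-character lookbehind. A dash-like character becomes the replacement (U+30FC by default) when the character before it falls in one of these blocks:

- Hiragana (U+3040 to U+309F)
- Katakana (U+30A0 to U+30FF)
- Katakana Phonetic Extensions (U+31F0 to U+31FF)

This is a minimal sketch. The set of dash characters it normalizes is an assumption, not the filter's documented list. ProlongedSoundMarkSketch is a hypothetical name.

```java
/** Hypothetical sketch of dash-to-prolonged-sound-mark normalization. */
final class ProlongedSoundMarkSketch {
    // Assumed set of dash-like characters; the real filter's list may differ
    private static final String DASHES = "\u002D\u2010\u2012\u2013\u2014\u2015\u2212\uFF0D\uFF70";

    static String normalize(CharSequence input) {
        return normalize(input, '\u30FC'); // KATAKANA-HIRAGANA PROLONGED SOUND MARK
    }

    static String normalize(CharSequence input, char replacement) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Look at the already-written output, so a run of dashes after kana is replaced as a whole
            if (DASHES.indexOf(c) >= 0 && out.length() > 0 && isKana(out.charAt(out.length() - 1))) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    private static boolean isKana(char c) {
        return (c >= '\u3040' && c <= '\u309F')   // Hiragana
            || (c >= '\u30A0' && c <= '\u30FF')   // Katakana
            || (c >= '\u31F0' && c <= '\u31FF');  // Katakana Phonetic Extensions
    }
}
```

With this sketch:

- `カ-ド` becomes `カード`.
- `A-B` is left unchanged, because `A` is not kana.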
R
- read(char[], int, int) - Method in class org.codelibs.analysis.BufferedCharFilter
- ReloadableKeywordMarkerFilter - Class in org.codelibs.analysis.en
-
A keyword marker filter that can dynamically reload its keyword set from a file.
- ReloadableKeywordMarkerFilter(TokenStream, Path, long) - Constructor for class org.codelibs.analysis.en.ReloadableKeywordMarkerFilter
-
Constructs a ReloadableKeywordMarkerFilter with the specified input stream, keyword file path, and reload interval.
- ReloadableStopFilter - Class in org.codelibs.analysis.en
-
A stop word filter that can dynamically reload its stop word set from a file.
- ReloadableStopFilter(TokenStream, Path, boolean, long) - Constructor for class org.codelibs.analysis.en.ReloadableStopFilter
-
Constructs a ReloadableStopFilter with the specified input stream, stop word file path, case sensitivity, and reload interval.
- reset() - Method in class org.codelibs.analysis.en.FlexiblePorterStemmer
-
Resets the stemmer so it can stem another word.
- reset() - Method in class org.codelibs.analysis.en.ReloadableKeywordMarkerFilter
- reset() - Method in class org.codelibs.analysis.en.ReloadableStopFilter
- reset() - Method in class org.codelibs.analysis.ja.KanjiNumberFilter
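ReloadableKeywordMarkerFilter and ReloadableStopFilter take a file path and a reload interval, and both override reset(). The sketch below shows one plausible way to combine those pieces. It assumes:

- The interval is in milliseconds.
- The file holds one word per line in UTF-8.
- The file is re-read only when the interval has elapsed and its last-modified time has changed.

ReloadingWordSet and refreshIfDue are hypothetical names. A filter could call refreshIfDue() from its reset().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical word set that re-reads its file at most once per interval, and only if it changed. */
final class ReloadingWordSet {
    private final Path path;
    private final boolean ignoreCase;
    private final long reloadIntervalMillis;  // assumed unit: milliseconds
    private Set<String> words = Set.of();
    private FileTime loadedModifiedTime;      // file mtime at the last load; null before the first
    private long lastCheckMillis;

    ReloadingWordSet(Path path, boolean ignoreCase, long reloadIntervalMillis) {
        this.path = path;
        this.ignoreCase = ignoreCase;
        this.reloadIntervalMillis = reloadIntervalMillis;
        reload(System.currentTimeMillis());
    }

    /** Call where a filter would reset(): checks the file only once the interval has elapsed. */
    void refreshIfDue() {
        long now = System.currentTimeMillis();
        if (now - lastCheckMillis >= reloadIntervalMillis) {
            reload(now);
        }
    }

    boolean contains(String term) {
        return words.contains(ignoreCase ? term.toLowerCase(Locale.ROOT) : term);
    }

    private void reload(long now) {
        lastCheckMillis = now;
        try {
            FileTime modified = Files.getLastModifiedTime(path);
            if (modified.equals(loadedModifiedTime)) {
                return; // unchanged since the last load
            }
            Set<String> loaded = new HashSet<>();
            for (String line : Files.readAllLines(path, StandardCharsets.UTF_8)) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    loaded.add(ignoreCase ? word.toLowerCase(Locale.ROOT) : word);
                }
            }
            words = loaded; // swap in the new set only after a complete read
            loadedModifiedTime = modified;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Comparing the last-modified time avoids re-parsing an unchanged file on every interval. Building a fresh set before swapping it in means a failed read leaves the old words in place.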
S
- setMaxTokenLength(int) - Method in class org.codelibs.analysis.en.AlphaNumWordFilter
-
Sets the maximum token length for concatenated tokens.
- stem() - Method in class org.codelibs.analysis.en.FlexiblePorterStemmer
-
Stem the word placed into the Stemmer buffer through calls to add().
- stem(char[]) - Method in class org.codelibs.analysis.en.FlexiblePorterStemmer
-
Stem a word contained in a char[].
- stem(char[], int) - Method in class org.codelibs.analysis.en.FlexiblePorterStemmer
-
Stem a word contained in a leading portion of a char[] array.
- stem(char[], int, int) - Method in class org.codelibs.analysis.en.FlexiblePorterStemmer
-
Stem a word contained in a portion of a char[] array.
- stem(int) - Method in class org.codelibs.analysis.en.FlexiblePorterStemmer
-
Stem the word in the buffer starting at the given offset.
- stem(String) - Method in class org.codelibs.analysis.en.FlexiblePorterStemmer
-
Stem a word provided as a String.
- StopTokenFilter - Class in org.codelibs.analysis
-
Abstract base class for stop token filters that match tokens against a word list.
- StopTokenFilter(TokenStream, String[], boolean) - Constructor for class org.codelibs.analysis.StopTokenFilter
-
Constructs a StopTokenFilter with the specified input stream, stop words, and case sensitivity.
- StopTokenPrefixFilter - Class in org.codelibs.analysis.ja
-
A stop token filter that removes tokens beginning with any of the specified prefix words.
- StopTokenPrefixFilter(TokenStream, String[], boolean) - Constructor for class org.codelibs.analysis.ja.StopTokenPrefixFilter
-
Constructs a StopTokenPrefixFilter with the specified input stream, prefix words, and case sensitivity.
- StopTokenSuffixFilter - Class in org.codelibs.analysis.ja
-
A stop token filter that removes tokens ending with any of the specified suffix words.
- StopTokenSuffixFilter(TokenStream, String[], boolean) - Constructor for class org.codelibs.analysis.ja.StopTokenSuffixFilter
-
Constructs a StopTokenSuffixFilter with the specified input stream, suffix words, and case sensitivity.
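StopTokenFilter's accept(String, String) compares a token against one stop word at a time. The prefix and suffix subclasses differ only in that comparison, which is the template-method pattern. The sketch below works over plain strings and does not use Lucene's TokenStream. It assumes:

- accept returns false to reject.
- A token is dropped as soon as any stop word matches.
- ignoreCase lowercases both sides with Locale.ROOT.

The class names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical base class: a token is kept only if accept(text, word) holds for every stop word. */
abstract class StopMatcherSketch {
    private final String[] normalizedWords;
    private final boolean ignoreCase;

    StopMatcherSketch(String[] words, boolean ignoreCase) {
        this.ignoreCase = ignoreCase;
        this.normalizedWords = new String[words.length];
        for (int i = 0; i < words.length; i++) {
            // Lowercase the stop words once up front when matching is case-insensitive
            normalizedWords[i] = ignoreCase ? words[i].toLowerCase(Locale.ROOT) : words[i];
        }
    }

    /** Returns true if the token should be kept. */
    boolean accept(String token) {
        String text = ignoreCase ? token.toLowerCase(Locale.ROOT) : token;
        for (String word : normalizedWords) {
            if (!accept(text, word)) {
                return false; // matched a stop word: drop the token
            }
        }
        return true;
    }

    /** Subclasses define what matching a stop word means; false rejects the token. */
    abstract boolean accept(String text, String word);
}

/** Rejects tokens that start with a stop word. */
class PrefixStopMatcher extends StopMatcherSketch {
    PrefixStopMatcher(String[] words, boolean ignoreCase) { super(words, ignoreCase); }

    @Override
    boolean accept(String text, String word) { return !text.startsWith(word); }
}

/** Rejects tokens that end with a stop word. */
class SuffixStopMatcher extends StopMatcherSketch {
    SuffixStopMatcher(String[] words, boolean ignoreCase) { super(words, ignoreCase); }

    @Override
    boolean accept(String text, String word) { return !text.endsWith(word); }
}
```

Normalizing the stop words once in the constructor means each token is lowercased only once, however many words it is compared against.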
T
- termAtt - Variable in class org.codelibs.analysis.ConcatenationFilter
-
The term attribute for accessing and modifying token text
- termAtt - Variable in class org.codelibs.analysis.StopTokenFilter
-
Character term attribute for accessing the current token's text
- toString() - Method in class org.codelibs.analysis.en.FlexiblePorterStemmer
-
After a word has been stemmed, it can be retrieved with toString(), or a reference to the internal buffer can be obtained with getResultBuffer() and getResultLength() (which is generally more efficient).
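FlexiblePorterStemmer has a buffer protocol and per-step switches. The protocol has three steps:

1. Add characters.
2. Call stem().
3. Read either toString() or getResultBuffer() with getResultLength().

Both ideas can be illustrated with the first step of Porter's algorithm alone. This is a minimal sketch, not the library's stemmer. It implements only step 1a, behind a single flag:

- sses becomes ss.
- ies becomes i.
- ss stays unchanged.
- Any other trailing s is removed.

Step1aStemmer is a hypothetical name. The real constructors take six step flags, and this sketch does not assume how they map to Porter's steps.

```java
import java.util.Arrays;

/** Hypothetical sketch: only Porter's step 1a, switchable by a flag, behind an add()/stem() buffer API. */
final class Step1aStemmer {
    private final boolean step1a;
    private char[] b = new char[8];
    private int len;

    Step1aStemmer(boolean step1a) {
        this.step1a = step1a;
    }

    void reset() {
        len = 0;
    }

    void add(char ch) {
        if (len == b.length) {
            b = Arrays.copyOf(b, len * 2); // grow the buffer when full
        }
        b[len++] = ch;
    }

    /** Applies step 1a in place; returns true if the word changed. */
    boolean stem() {
        if (!step1a) {
            return false;
        }
        int before = len;
        if (endsWith("sses") || endsWith("ies")) {
            len -= 2;                     // caresses -> caress, ponies -> poni
        } else if (!endsWith("ss") && endsWith("s")) {
            len -= 1;                     // cats -> cat, while caress is left alone
        }
        return len != before;
    }

    private boolean endsWith(String suffix) {
        int n = suffix.length();
        if (n > len) {
            return false;
        }
        for (int i = 0; i < n; i++) {
            if (b[len - n + i] != suffix.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    char[] getResultBuffer() { return b; }

    int getResultLength() { return len; }

    @Override
    public String toString() {
        return new String(b, 0, len);
    }
}
```

With the flag on, `caresses`, `ponies` and `cats` become `caress`, `poni` and `cat`. Reading getResultBuffer() up to getResultLength() avoids the String allocation that toString() makes.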
W
- words - Variable in class org.codelibs.analysis.ja.NumberConcatenationFilter
-
Set of words used to determine concatenation behavior