Package org.apache.lucene.analysis.core
Class LowerCaseTokenizer
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.util.CharTokenizer
org.apache.lucene.analysis.core.LetterTokenizer
org.apache.lucene.analysis.core.LowerCaseTokenizer
- All Implemented Interfaces:
Closeable, AutoCloseable
LowerCaseTokenizer performs the function of LetterTokenizer
and LowerCaseFilter together. It divides text at non-letters and converts
the resulting tokens to lower case. While it is functionally equivalent to the combination
of LetterTokenizer and LowerCaseFilter, there is a performance advantage
to doing the two tasks at once, hence this (redundant) implementation.
Note: this works reasonably well for most European languages, but poorly for some Asian languages, where words are not separated by spaces.
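The behavior described above can be illustrated without Lucene on the classpath. The following is a minimal sketch that uses only the JDK; the class LowerCaseLetterSplitSketch and its tokenize method are hypothetical names introduced for illustration, not Lucene API. It assumes that "letter" means Character.isLetter and that lower-casing means Character.toLowerCase, and it ignores everything else a real Tokenizer does (attributes, offsets, any token-length limit).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: mimics the documented behavior
// (split at non-letters, lower-case each token) using only the JDK.
class LowerCaseLetterSplitSketch {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                // Letters extend the current token, lower-cased as they are added.
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                // Any non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [hello, world, it, s]
        System.out.println(tokenize("Hello, World! It's 2024."));
    }
}
```

Note that the apostrophe and the digits are non-letters, so "It's" yields two tokens and "2024" yields none.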
You must specify the required Version compatibility when creating LowerCaseTokenizer:
- As of 3.1, CharTokenizer uses an int-based API to normalize and detect token characters. See CharTokenizer.isTokenChar(int) and CharTokenizer.normalize(int) for details.
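To show why an int-based API matters, here is a small JDK-only sketch. The class CodePointApiSketch is hypothetical, and its two methods only borrow the names isTokenChar(int) and normalize(int). Their bodies are an assumption about what a letter-splitting, lower-casing tokenizer would do, not a copy of the Lucene source. With int code points, characters outside the Basic Multilingual Plane are handled as a single unit instead of two surrogate chars.

```java
// Hypothetical stand-ins for the int-based hooks; not the Lucene classes.
class CodePointApiSketch {

    // Assumed token test: a code point belongs to a token if it is a letter.
    static boolean isTokenChar(int c) {
        return Character.isLetter(c);
    }

    // Assumed normalization: lower-case the code point.
    static int normalize(int c) {
        return Character.toLowerCase(c);
    }

    public static void main(String[] args) {
        int capital = 0x10400; // DESERET CAPITAL LETTER LONG I, outside the BMP
        char[] units = Character.toChars(capital); // two UTF-16 surrogate chars

        // A char-based check sees only a surrogate, which is not a letter.
        System.out.println(Character.isLetter(units[0])); // false

        // The int-based check sees the whole code point.
        System.out.println(isTokenChar(capital)); // true

        // Lower-casing also works on the full code point.
        System.out.println(Integer.toHexString(normalize(capital))); // 10428
    }
}
```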
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
-
Constructor Summary
Constructors
LowerCaseTokenizer(Version matchVersion, Reader in)
- Construct a new LowerCaseTokenizer.
LowerCaseTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in)
- Construct a new LowerCaseTokenizer using a given AttributeSource.AttributeFactory.
-
Method Summary
Methods inherited from class org.apache.lucene.analysis.util.CharTokenizer
end, incrementToken, reset
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
-
Constructor Details
-
LowerCaseTokenizer
public LowerCaseTokenizer(Version matchVersion, Reader in)
Construct a new LowerCaseTokenizer.
- Parameters:
matchVersion
- Lucene version to match. See the Version compatibility note above.
in
- the input to split up into tokens
-
LowerCaseTokenizer
public LowerCaseTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in)
Construct a new LowerCaseTokenizer using a given AttributeSource.AttributeFactory.
- Parameters:
matchVersion
- Lucene version to match. See the Version compatibility note above.
factory
- the attribute factory to use for this Tokenizer
in
- the input to split up into tokens
-