Package org.apache.lucene.analysis.core
Class LetterTokenizer
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.util.CharTokenizer
org.apache.lucene.analysis.core.LetterTokenizer
- All Implemented Interfaces:
Closeable, AutoCloseable
- Direct Known Subclasses:
ArabicLetterTokenizer, LowerCaseTokenizer
A LetterTokenizer is a tokenizer that divides text at non-letters. That is,
it defines tokens as maximal strings of adjacent letters, as defined by the
java.lang.Character.isLetter() predicate.
Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.
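The splitting rule described above can be sketched in plain Java without any Lucene dependency. This is only an illustration of "maximal strings of adjacent letters": the class LetterSplitSketch and its tokenize method are hypothetical names, not part of Lucene, and the sketch ignores offsets, attributes, and any token-length limit the real tokenizer may apply.

```java
import java.util.ArrayList;
import java.util.List;

class LetterSplitSketch {
    // Hypothetical helper (not a Lucene API): collects maximal runs of letters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);          // read a full code point
            if (Character.isLetter(cp)) {
                current.appendCodePoint(cp);       // extend the current token
            } else if (current.length() > 0) {
                tokens.add(current.toString());    // a non-letter closes the token
                current.setLength(0);
            }
            i += Character.charCount(cp);          // advance by 1 or 2 chars
        }
        if (current.length() > 0) {
            tokens.add(current.toString());        // flush the trailing token
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Digits, punctuation, and spaces all act as separators.
        System.out.println(tokenize("Hello, world! It's 2024."));
        // prints [Hello, world, It, s]
    }
}
```

Under the same rule, a run of CJK ideographs with no intervening non-letters comes back as a single token, because ideographs count as letters. This matches the limitation noted for languages that do not separate words with spaces.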
You must specify the required Version compatibility when creating LetterTokenizer:
- As of 3.1, CharTokenizer uses an int-based API to normalize and detect token characters. See CharTokenizer.isTokenChar(int) and CharTokenizer.normalize(int) for details.
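A short standalone sketch may help show why an int-based (code point) API matters. The class CodePointSketch below is hypothetical and does not extend CharTokenizer; its two static methods only mirror the shape of isTokenChar(int) and normalize(int), and lowercasing in normalize is an illustrative choice rather than LetterTokenizer's documented behavior. The example uses U+10400 (DESERET CAPITAL LETTER LONG I), a letter outside the Basic Multilingual Plane that Java stores as a surrogate pair.

```java
class CodePointSketch {
    // Stand-in with the same shape as CharTokenizer.isTokenChar(int).
    static boolean isTokenChar(int c) {
        return Character.isLetter(c);
    }

    // Stand-in with the same shape as CharTokenizer.normalize(int);
    // lowercasing is an assumption made for illustration.
    static int normalize(int c) {
        return Character.toLowerCase(c);
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x10400));

        // A char-based check sees only half of the surrogate pair.
        System.out.println(Character.isLetter(s.charAt(0)));      // prints false

        // The int-based check sees the whole code point.
        System.out.println(isTokenChar(s.codePointAt(0)));        // prints true

        // Normalization also operates on the full code point.
        System.out.println(Integer.toHexString(normalize(s.codePointAt(0)))); // prints 10428
    }
}
```

A char-based predicate would treat each surrogate as a non-letter and drop such characters from every token, which is the kind of problem a code point API avoids.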
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
-
Constructor Summary
Constructors
LetterTokenizer(Version matchVersion, Reader in): Construct a new LetterTokenizer.
LetterTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in): Construct a new LetterTokenizer using a given AttributeSource.AttributeFactory.
-
Method Summary
Methods inherited from class org.apache.lucene.analysis.util.CharTokenizer
end, incrementToken, reset
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
-
Constructor Details
-
LetterTokenizer
Construct a new LetterTokenizer.
- Parameters:
matchVersion - Lucene version to match. See above.
in - the input to split up into tokens
-
LetterTokenizer
Construct a new LetterTokenizer using a given AttributeSource.AttributeFactory.
- Parameters:
matchVersion - Lucene version to match. See above.
factory - the attribute factory to use for this Tokenizer
in - the input to split up into tokens
-