Package org.apache.lucene.analysis.ar
Class ArabicLetterTokenizer
- java.lang.Object
  - org.apache.lucene.util.AttributeSource
    - org.apache.lucene.analysis.TokenStream
      - org.apache.lucene.analysis.Tokenizer
        - org.apache.lucene.analysis.util.CharTokenizer
          - org.apache.lucene.analysis.core.LetterTokenizer
            - org.apache.lucene.analysis.ar.ArabicLetterTokenizer
All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
@Deprecated public class ArabicLetterTokenizer extends LetterTokenizer
Deprecated. (3.1) Use StandardTokenizer instead.
Tokenizer that breaks text into runs of letters and diacritics.
The problem with the standard LetterTokenizer is that it fails on diacritics: a combining mark is not a letter, so the token ends at the mark. Handling similar to this is necessary for Indic scripts, Hebrew, Thaana, etc.
You must specify the required Version compatibility when creating ArabicLetterTokenizer:
- As of 3.1, CharTokenizer uses an int based API to normalize and detect token characters. See isTokenChar(int) and CharTokenizer.normalize(int) for details.
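The behavior described above can be illustrated without Lucene on the classpath. The following is a minimal sketch, not the library's implementation: it assumes that a "diacritic" means a code point in the Unicode NON_SPACING_MARK category, and the class name LetterAndMarkRuns and both method names are hypothetical. It walks the input by int code point, mirroring the int based API mentioned in the version note.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; it does not use or reproduce Lucene classes.
final class LetterAndMarkRuns {

    // Assumption: a token character is either a letter or a Unicode
    // nonspacing mark (general category Mn), which covers Arabic diacritics
    // such as U+0650 ARABIC KASRA.
    public static boolean isTokenChar(int codePoint) {
        return Character.isLetter(codePoint)
                || Character.getType(codePoint) == Character.NON_SPACING_MARK;
    }

    // Splits text into maximal runs of token characters, iterating by
    // code point (int) rather than by char so that supplementary
    // characters are inspected as a whole.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isTokenChar(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // A non-token character closes the current run.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // KAF, KASRA (diacritic), TEH, ALEF, BEH, then digits and Latin text.
        String input = "\u0643\u0650\u062A\u0627\u0628 123 abc";
        for (String token : tokenize(input)) {
            System.out.println(token);
        }
    }
}
```

With a letters-only predicate, the kasra in the sample input would end the first token after a single letter; accepting nonspacing marks keeps the vocalized word together as one token.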
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
-
-
Constructor Summary
Constructors
Constructor / Description
ArabicLetterTokenizer(Version matchVersion, java.io.Reader in)
Deprecated. Construct a new ArabicLetterTokenizer.
ArabicLetterTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, java.io.Reader in)
Deprecated. Construct a new ArabicLetterTokenizer using a given AttributeSource.AttributeFactory.
-
Method Summary
-
Methods inherited from class org.apache.lucene.analysis.util.CharTokenizer
end, incrementToken, reset
-
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
-
-
-
-
Constructor Detail
-
ArabicLetterTokenizer
public ArabicLetterTokenizer(Version matchVersion, java.io.Reader in)
Deprecated. Construct a new ArabicLetterTokenizer.
Parameters:
matchVersion - Lucene version to match; see the Version compatibility note in the class description above
in - the input to split up into tokens
-
ArabicLetterTokenizer
public ArabicLetterTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, java.io.Reader in)
Deprecated. Construct a new ArabicLetterTokenizer using a given AttributeSource.AttributeFactory.
Parameters:
matchVersion - Lucene version to match; see the Version compatibility note in the class description above
factory - the attribute factory to use for this Tokenizer
in - the input to split up into tokens
-
-