Package org.apache.lucene.analysis.ar
Class ArabicStemmer
- java.lang.Object
  - org.apache.lucene.analysis.ar.ArabicStemmer
public class ArabicStemmer extends Object
Stemmer for Arabic.
Stemming is done in place for efficiency, operating on a term buffer.
Stemming is defined as:
- Removal of attached definite article, conjunction, and prepositions.
- Stemming of common suffixes.
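The two steps above can be sketched as follows. This is a minimal, self-contained illustration of in-place affix stripping on a char buffer, not the Lucene source: the class name `InPlaceAffixStripper`, the small affix tables, and the `MIN_REMAINING` guard are all assumptions made for the example.

```java
/** Illustrative sketch only; names and tables are assumptions, not the Lucene source. */
final class InPlaceAffixStripper {
    // Hypothetical prefix table: conjunction + definite article (WAW ALEF LAM),
    // then the bare definite article (ALEF LAM). Longer entries come first.
    static final char[][] PREFIXES = {
        "\u0648\u0627\u0644".toCharArray(),
        "\u0627\u0644".toCharArray(),
    };
    // Hypothetical suffix table: ALEF TEH, then TEH MARBUTA.
    static final char[][] SUFFIXES = {
        "\u0627\u062A".toCharArray(),
        "\u0629".toCharArray(),
    };
    // Assumed guard: keep at least this many characters after stripping.
    static final int MIN_REMAINING = 2;

    /** Strips at most one prefix, then any matching suffixes; returns the new length. */
    static int stem(char[] s, int len) {
        len = stripPrefix(s, len);
        return stripSuffixes(s, len);
    }

    static int stripPrefix(char[] s, int len) {
        for (char[] p : PREFIXES) {
            if (len - p.length >= MIN_REMAINING && regionEquals(s, 0, p)) {
                // Shift the remainder left over the prefix; no new array is allocated.
                System.arraycopy(s, p.length, s, 0, len - p.length);
                return len - p.length;
            }
        }
        return len;
    }

    static int stripSuffixes(char[] s, int len) {
        for (char[] x : SUFFIXES) {
            if (len - x.length >= MIN_REMAINING && regionEquals(s, len - x.length, x)) {
                len -= x.length; // dropping a suffix only shrinks the logical length
            }
        }
        return len;
    }

    private static boolean regionEquals(char[] s, int from, char[] affix) {
        for (int i = 0; i < affix.length; i++) {
            if (s[from + i] != affix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        char[] buf = "\u0627\u0644\u0643\u062A\u0627\u0628".toCharArray();
        int newLen = stem(buf, buf.length);
        // The stem now occupies buf[0..newLen).
        System.out.println(new String(buf, 0, newLen));
    }
}
```

Because the buffer is edited in place, the caller reads the result from the first `newLen` characters; anything past that index is left over from the original word.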
Constructor Summary
- ArabicStemmer()
Method Summary
- int stem(char[] s, int len): Stem an input buffer of Arabic text.
- int stemPrefix(char[] s, int len): Stem a prefix off an Arabic word.
- int stemSuffix(char[] s, int len): Stem suffix(es) off an Arabic word.
Field Detail
ALEF
public static final char ALEF
- See Also: Constant Field Values

BEH
public static final char BEH
- See Also: Constant Field Values

TEH_MARBUTA
public static final char TEH_MARBUTA
- See Also: Constant Field Values

TEH
public static final char TEH
- See Also: Constant Field Values

FEH
public static final char FEH
- See Also: Constant Field Values

KAF
public static final char KAF
- See Also: Constant Field Values

LAM
public static final char LAM
- See Also: Constant Field Values

NOON
public static final char NOON
- See Also: Constant Field Values

HEH
public static final char HEH
- See Also: Constant Field Values

WAW
public static final char WAW
- See Also: Constant Field Values

YEH
public static final char YEH
- See Also: Constant Field Values

prefixes
public static final char[][] prefixes

suffixes
public static final char[][] suffixes
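This page lists the character constants and the two affix tables without their values. As a rough illustration of how such fields might fit together, the sketch below assumes each constant holds the standard Unicode code point of the Arabic letter it is named after, and builds small `char[][]` tables from them. The class name `ArabicLetterSketch` and the table contents are hypothetical and are not taken from the library.

```java
/** Illustrative sketch; the values and table contents here are assumptions. */
final class ArabicLetterSketch {
    // Standard Unicode code points for the Arabic letters with these names.
    static final char ALEF = '\u0627';
    static final char BEH = '\u0628';
    static final char TEH_MARBUTA = '\u0629';
    static final char TEH = '\u062A';
    static final char FEH = '\u0641';
    static final char KAF = '\u0643';
    static final char LAM = '\u0644';
    static final char NOON = '\u0646';
    static final char HEH = '\u0647';
    static final char WAW = '\u0648';
    static final char YEH = '\u064A';

    // Hypothetical prefix table: each row is one affix, spelled letter by letter.
    static final char[][] PREFIXES = {
        {ALEF, LAM},       // definite article
        {WAW, ALEF, LAM},  // conjunction + definite article
        {BEH, ALEF, LAM},  // preposition + definite article
    };

    // Hypothetical suffix table.
    static final char[][] SUFFIXES = {
        {ALEF, TEH},
        {TEH_MARBUTA},
    };

    public static void main(String[] args) {
        // Print each prefix as a sequence of code points.
        for (char[] p : PREFIXES) {
            StringBuilder sb = new StringBuilder();
            for (char c : p) {
                sb.append(String.format("U+%04X ", (int) c));
            }
            System.out.println(sb.toString().trim());
        }
    }
}
```

Spelling affixes with named constants rather than string literals keeps right-to-left text out of the source file, which can make the tables easier to review.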
Method Detail
-
stem
public int stem(char[] s, int len)
Stem an input buffer of Arabic text.
- Parameters:
  - s - input buffer
  - len - length of input buffer
- Returns:
  - length of input buffer after stemming
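The calling convention (buffer and length in, new length out) can be sketched as below. To keep the example self-contained, it uses a hypothetical `BufferStemmer` interface with the same shape as `stem(char[] s, int len)` and a toy stemmer; neither is part of the library. Where Lucene is on the classpath, a method reference such as `new ArabicStemmer()::stem` should fit the same interface.

```java
/** Illustrative sketch of the buffer-and-length calling convention. */
final class StemCallSketch {
    /** Hypothetical stand-in with the same shape as stem(char[] s, int len). */
    interface BufferStemmer {
        int stem(char[] s, int len);
    }

    /** Copies a term into a buffer, stems it in place, and rebuilds a String. */
    static String stemToString(BufferStemmer stemmer, String term) {
        char[] buf = term.toCharArray();
        int newLen = stemmer.stem(buf, buf.length);
        // Only buf[0..newLen) is meaningful; anything past newLen is stale.
        return new String(buf, 0, newLen);
    }

    public static void main(String[] args) {
        // Toy stemmer for the demo: drops a trailing TEH MARBUTA (U+0629) if present.
        BufferStemmer toy =
            (s, len) -> (len > 2 && s[len - 1] == '\u0629') ? len - 1 : len;
        System.out.println(stemToString(toy, "\u0645\u0643\u062A\u0628\u0629"));
    }
}
```

The key point is that the return value must be used: the array itself keeps its original size, so building a String from the whole array would include leftover characters.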
stemPrefix
public int stemPrefix(char[] s, int len)
Stem a prefix off an Arabic word.
- Parameters:
  - s - input buffer
  - len - length of input buffer
- Returns:
  - new length of input buffer after stemming
stemSuffix
public int stemSuffix(char[] s, int len)
Stem suffix(es) off an Arabic word.
- Parameters:
  - s - input buffer
  - len - length of input buffer
- Returns:
  - new length of input buffer after stemming