Package com.worksap.nlp.sudachi
Interface Morpheme
-
public interface Morpheme
A morpheme.
-
-
Method Summary
Modifier and Type   Method                               Description
int                 begin()                              Returns the start index of the morpheme.
String              dictionaryForm()                     Returns the dictionary form of the morpheme.
int                 end()                                Returns the offset after the last character of the morpheme.
int                 getDictionaryId()                    Returns the ID of the dictionary containing the morpheme.
int[]               getSynonymGroupIds()                 Returns the IDs of the synonym groups of the morpheme.
int                 getWordId()                          Returns the ID of the morpheme.
boolean             isOOV()                              Returns whether the morpheme is out-of-vocabulary (OOV) or not.
String              normalizedForm()                     Returns the normalized form of the morpheme.
List<String>        partOfSpeech()                       Returns the part of speech of the morpheme.
short               partOfSpeechId()                     Returns the ID of the part of speech of the morpheme.
String              readingForm()                        Returns the reading form of the morpheme.
List<Morpheme>      split(Tokenizer.SplitMode mode)      Splits the morpheme in another splitting mode.
String              surface()                            Returns the text of the morpheme.
-
-
-
Method Detail
-
begin
int begin()
Returns the start index of the morpheme. When the input text is normalized, some morphemes have the same start index.
- Returns:
- the index of the first character of the morpheme
-
end
int end()
Returns the offset after the last character of the morpheme. When the input text is normalized, some morphemes have the same end index.
- Returns:
- the offset after the last character of the morpheme
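Because end() is the offset after the last character, begin() and end() together describe a half-open range. The sketch below illustrates that convention only; it does not call Sudachi. Span and FixedSpan are hypothetical stand-ins that mirror the two accessors, the spans are hand-written rather than produced by a tokenizer, and it is assumed that the offsets index into the original input string.

```java
import java.util.Arrays;
import java.util.List;

final class OffsetDemo {
    // Hypothetical stand-in exposing only the two offset accessors described above.
    interface Span {
        int begin();
        int end();
    }

    // Minimal immutable implementation used for illustration.
    static final class FixedSpan implements Span {
        private final int begin;
        private final int end;

        FixedSpan(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        public int begin() { return begin; }
        public int end() { return end; }
    }

    // Cuts the text with the half-open range [begin, end).
    static String slice(String text, Span span) {
        return text.substring(span.begin(), span.end());
    }

    public static void main(String[] args) {
        String text = "abcdef";
        // Hand-written spans; a real tokenizer would supply these.
        List<Span> spans = Arrays.asList(
                new FixedSpan(0, 2), new FixedSpan(2, 5), new FixedSpan(5, 6));
        for (Span s : spans) {
            System.out.println(s.begin() + ".." + s.end() + " " + slice(text, s));
        }
    }
}
```

With a half-open range, end() - begin() is the length of the covered text, and the end of one span can equal the begin of the next without overlap.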
-
surface
String surface()
Returns the text of the morpheme. When the input text is normalized, some morphemes have the same surface.
- Returns:
- the text of the morpheme
-
partOfSpeech
List<String> partOfSpeech()
Returns the part of speech of the morpheme.
- Returns:
- the part of speech of the morpheme
-
partOfSpeechId
short partOfSpeechId()
Returns the ID of the part of speech of the morpheme.
- Returns:
- the ID of the part of speech of the morpheme
-
dictionaryForm
String dictionaryForm()
Returns the dictionary form of the morpheme. 'Dictionary form' means the lemma of a word, called '終止形' in Japanese.
- Returns:
- the dictionary form of the morpheme
-
normalizedForm
String normalizedForm()
Returns the normalized form of the morpheme. This method returns a form in which inconsistent spellings and inflected forms are normalized.
- Returns:
- the normalized form of the morpheme
-
readingForm
String readingForm()
Returns the reading form of the morpheme. This method returns the reading ('フリガナ') written in katakana. If the morpheme is OOV, it returns an empty string.
- Returns:
- the reading form of the morpheme
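Since the reading is an empty string for OOV morphemes, callers may want a fallback. The following is a minimal sketch of one option, falling back to the surface text. Token is a hypothetical stand-in that mirrors surface(), readingForm() and isOOV(); the sample strings are ASCII placeholders, not real dictionary output.

```java
final class ReadingFallback {
    // Hypothetical stand-in with the same accessor names as the interface documented here.
    interface Token {
        String surface();
        String readingForm();
        boolean isOOV();
    }

    // Builds a fixed token for illustration.
    static Token token(final String surface, final String reading, final boolean oov) {
        return new Token() {
            public String surface() { return surface; }
            public String readingForm() { return reading; }
            public boolean isOOV() { return oov; }
        };
    }

    // Returns the reading, or the surface text when the reading is empty.
    static String readingOrSurface(Token t) {
        String reading = t.readingForm();
        return reading.isEmpty() ? t.surface() : reading;
    }

    public static void main(String[] args) {
        Token known = token("word", "WORD-READING", false);
        Token unknown = token("xyzzy", "", true);
        System.out.println(readingOrSurface(known));
        System.out.println(readingOrSurface(unknown));
    }
}
```

Testing the reading for emptiness, rather than calling isOOV(), also covers any non-OOV entry whose reading happens to be empty. Whether such entries exist is not stated here.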
-
split
List<Morpheme> split(Tokenizer.SplitMode mode)
Splits the morpheme in another splitting mode. If mode is the same as the one used in Tokenizer.tokenize(Tokenizer.SplitMode,String), or if the morpheme cannot be split any further, this method returns this.
- Parameters:
mode - a mode of splitting
- Returns:
- the list of split morphemes
- See Also:
Tokenizer.tokenize(Tokenizer.SplitMode,String)
-
isOOV
boolean isOOV()
Returns whether the morpheme is out-of-vocabulary (OOV) or not.
- Returns:
- true if, and only if, the morpheme is OOV
-
getWordId
int getWordId()
Returns the ID of the morpheme. The IDs change when the dictionaries are updated or the combination of dictionaries changes. If the morpheme is OOV, it returns an undefined value.
- Returns:
- the word ID
-
getDictionaryId
int getDictionaryId()
Returns the ID of the dictionary containing the morpheme. If the morpheme is in the system dictionary, it returns 0. If the morpheme is OOV, it returns a negative value.
- Returns:
- the dictionary ID
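The return value conventions above can be folded into a small classifier. This is only a sketch: the Source enum and classify method are hypothetical names, and treating positive IDs as "some non-system dictionary" is an assumption, since the description only specifies 0 and negative values.

```java
final class DictionaryIdDemo {
    // Hypothetical categories for illustration.
    enum Source { SYSTEM, OTHER_DICTIONARY, OOV }

    // Maps a dictionary ID to a category.
    static Source classify(int dictionaryId) {
        if (dictionaryId < 0) {
            return Source.OOV;              // negative: out-of-vocabulary
        }
        if (dictionaryId == 0) {
            return Source.SYSTEM;           // 0: system dictionary
        }
        return Source.OTHER_DICTIONARY;     // assumption: positive means a non-system dictionary
    }

    public static void main(String[] args) {
        for (int id : new int[] {-1, 0, 1}) {
            System.out.println(id + " -> " + classify(id));
        }
    }
}
```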
-
getSynonymGroupIds
int[] getSynonymGroupIds()
Returns the IDs of the synonym groups of the morpheme. If the morpheme has synonyms, it returns the IDs of the synonym groups.
- Returns:
- the array of synonym group IDs
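One possible use of the returned array is to check whether two morphemes belong to a common synonym group. The sketch below works on plain int[] values such as those returned by getSynonymGroupIds(). The helper name shareGroup and the sample IDs are made up, and it assumes a morpheme without synonyms yields an empty array, which is not stated above.

```java
import java.util.HashSet;
import java.util.Set;

final class SynonymGroups {
    // Returns true if the two ID arrays have at least one group ID in common.
    static boolean shareGroup(int[] a, int[] b) {
        Set<Integer> seen = new HashSet<>();
        for (int id : a) {
            seen.add(id);
        }
        for (int id : b) {
            if (seen.contains(id)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] first = {101, 205};   // made-up group IDs
        int[] second = {205, 300};
        int[] none = {};
        System.out.println(shareGroup(first, second));
        System.out.println(shareGroup(first, none));
    }
}
```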
-
-