Package org.predict4all.nlp.language
Interface LanguageModel
All Known Implementing Classes:
AbstractLanguageModel, FrenchLanguageModel
public interface LanguageModel
Represents a model specific to the input language.
This model helps NLP tasks perform better by supplying language-specific parameters.
For example, getAverageWordLength() is useful for optimizing the tokenizer.
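As a rough illustration of how an average word length could help a tokenizer, the sketch below presizes its buffers from that value. It is not the library's tokenizer: `WordLengthSource` is a hypothetical stand-in that exposes only getAverageWordLength(), so the example compiles without predict4all, and the value 6 is an assumed figure for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class AverageWordLengthSketch {

    // Hypothetical stand-in for LanguageModel, reduced to one method
    // so this sketch does not depend on the predict4all library.
    interface WordLengthSource {
        int getAverageWordLength();
    }

    // Splits text on whitespace, presizing buffers from the average word length.
    static List<String> tokenize(String text, WordLengthSource model) {
        int avg = Math.max(1, model.getAverageWordLength());
        // Estimated token count: each word is followed by roughly one separator.
        List<String> tokens = new ArrayList<>(text.length() / (avg + 1) + 1);
        // Room for words up to twice the average length before the buffer grows.
        StringBuilder current = new StringBuilder(avg * 2);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        WordLengthSource model = () -> 6; // assumed value, for illustration only
        System.out.println(tokenize("le chat dort", model));
    }
}
```

The presizing only affects allocation behavior; the tokens returned are the same whatever value the model reports.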
Method Summary
Modifier and Type     Method                                                       Description
int                   getAverageVocabularySize()                                   Average total vocabulary size (different existing words)
int                   getAverageWordLength()
BaseWordDictionary    getBaseWordDictionary(TrainingConfiguration configuration)
String                getId()
StopWordDictionary    getStopWordDictionary(TrainingConfiguration configuration)
TokenMatcher[]        getTokenMatchersForNGram()
TokenMatcher[]        getTokenMatchersForSemanticAnalysis()
Set<String>           getValidOneCharWords()
Method Detail
-
getId
String getId()
- Returns:
- identifier for this language model (e.g. ISO code)
-
getAverageWordLength
int getAverageWordLength()
- Returns:
- the average word length for this language (may be rounded up)
-
getAverageVocabularySize
int getAverageVocabularySize()
Average total vocabulary size (different existing words)
- Returns:
- the average vocabulary size for this language.
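This page does not say how the vocabulary size is consumed. One plausible use, shown here purely as an assumption, is presizing a word-count map so it does not rehash while a corpus is read. `VocabularySizeSource` is a hypothetical stand-in exposing only getAverageVocabularySize(), and the value 1000 is made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class VocabularySizeSketch {

    // Hypothetical stand-in for LanguageModel, reduced to one method
    // so this sketch does not depend on the predict4all library.
    interface VocabularySizeSource {
        int getAverageVocabularySize();
    }

    // Counts word occurrences in a map presized from the expected vocabulary size.
    static Map<String, Integer> countWords(String[] words, VocabularySizeSource model) {
        // HashMap grows once its size exceeds capacity * 0.75 (the default load factor),
        // so divide the expected entry count by 0.75 to avoid resizing.
        int expected = Math.max(1, model.getAverageVocabularySize());
        int capacity = (int) (expected / 0.75f) + 1;
        Map<String, Integer> counts = new HashMap<>(capacity);
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        VocabularySizeSource model = () -> 1000; // assumed value, for illustration only
        String[] words = {"le", "chat", "le"};
        System.out.println(countWords(words, model));
    }
}
```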
-
getTokenMatchersForSemanticAnalysis
TokenMatcher[] getTokenMatchersForSemanticAnalysis()
-
getTokenMatchersForNGram
TokenMatcher[] getTokenMatchersForNGram()
-
getStopWordDictionary
StopWordDictionary getStopWordDictionary(TrainingConfiguration configuration)
-
getBaseWordDictionary
BaseWordDictionary getBaseWordDictionary(TrainingConfiguration configuration)