Interface LanguageModel

  • All Known Implementing Classes:
    AbstractLanguageModel, FrenchLanguageModel

    public interface LanguageModel
    Represents a model specific to the input language.
    The model helps NLP tasks perform better by supplying language-specific parameters.
    For example, getAverageWordLength() can be used to tune a tokenizer.
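
One way a tokenizer might take advantage of getAverageWordLength() is to pre-size its token list. The sketch below is only an illustration: WordLengthSource is a hypothetical stand-in that declares just the one getter it needs (the real LanguageModel has more methods), and TokenBufferSizing, estimateTokenCount and the whitespace split are assumptions, not part of the documented API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: declares only the getter used below.
// The real LanguageModel interface declares additional methods.
interface WordLengthSource {
    int getAverageWordLength();
}

final class TokenBufferSizing {

    // Rough estimate of how many tokens a text of the given length may yield,
    // assuming one separator character per word. This heuristic is an
    // illustrative assumption, not something the library prescribes.
    static int estimateTokenCount(int textLength, WordLengthSource model) {
        int charsPerToken = Math.max(1, model.getAverageWordLength()) + 1;
        return Math.max(1, textLength / charsPerToken);
    }

    // Naive whitespace tokenizer that pre-sizes its result list from the estimate.
    static List<String> tokenize(String text, WordLengthSource model) {
        List<String> tokens = new ArrayList<>(estimateTokenCount(text.length(), model));
        for (String candidate : text.trim().split("\\s+")) {
            if (!candidate.isEmpty()) {
                tokens.add(candidate);
            }
        }
        return tokens;
    }
}
```

Pre-sizing only avoids some list reallocations; the tokens produced are the same whatever value the model reports.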
    • Method Detail

      • getId

        String getId()
        Returns:
        the identifier of this language model (e.g. an ISO language code)
      • getAverageWordLength

        int getAverageWordLength()
        Returns:
        the average word length for this language (may be rounded up)
      • getAverageVocabularySize

        int getAverageVocabularySize()
        Average total vocabulary size, i.e. the number of distinct existing words.
        Returns:
        the average vocabulary size for this language.
      • getValidOneCharWords

        Set<String> getValidOneCharWords()
      • getTokenMatchersForSemanticAnalysis

        TokenMatcher[] getTokenMatchersForSemanticAnalysis()
      • getTokenMatchersForNGram

        TokenMatcher[] getTokenMatchersForNGram()
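
As a rough illustration of how the getters listed above fit together, the sketch below implements a local stand-in interface, LanguageModelSketch, that mirrors four of the documented signatures. The two TokenMatcher[] methods are left out because TokenMatcher is not described on this page. EnglishLanguageModelSketch, OneCharFilter and every value returned are hypothetical placeholders. The filter also assumes, from the method name alone, that getValidOneCharWords() lists the single-character tokens that count as real words.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Local stand-in mirroring four documented getters so the sketch compiles alone.
interface LanguageModelSketch {
    String getId();
    int getAverageWordLength();
    int getAverageVocabularySize();
    Set<String> getValidOneCharWords();
}

// Hypothetical implementation; all values are illustrative placeholders.
final class EnglishLanguageModelSketch implements LanguageModelSketch {
    private static final Set<String> ONE_CHAR_WORDS =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("a", "i")));

    @Override public String getId() { return "en"; }
    @Override public int getAverageWordLength() { return 5; }
    @Override public int getAverageVocabularySize() { return 100_000; }
    @Override public Set<String> getValidOneCharWords() { return ONE_CHAR_WORDS; }
}

final class OneCharFilter {
    // Keeps every token longer than one character; a one-character token is
    // kept only if the model lists it (assumed meaning of the getter).
    static boolean isPlausibleWord(String token, LanguageModelSketch model) {
        if (token.length() != 1) {
            return true;
        }
        return model.getValidOneCharWords().contains(token.toLowerCase(Locale.ROOT));
    }
}
```

A real implementation would also have to supply the two TokenMatcher[] methods, and would probably extend AbstractLanguageModel if that class provides shared defaults.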