Class NGramWordPredictorUtils


  • public class NGramWordPredictorUtils
    extends java.lang.Object
    Utility class for predicting words with n-gram dictionaries.
    • Method Summary

      Modifier and Type Method Description
      Triple<int[], java.lang.Boolean, java.lang.Boolean> createPrefixFor(java.util.List<Token> rawTokensList, WordPrefixDetected wordPrefixFound, int wantedOrder, boolean addUnknownWordToDictionary)
      Creates the prefix for a given raw context (token list). The prefix is meant to be used when exploring the n-gram trie.
      The method uses only the last sentence, detects the word currently being written, and retrieves a context of the wanted order.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • createPrefixFor

        public Triple<int[], java.lang.Boolean, java.lang.Boolean> createPrefixFor(java.util.List<Token> rawTokensList,
                                                                                  WordPrefixDetected wordPrefixFound,
                                                                                  int wantedOrder,
                                                                                  boolean addUnknownWordToDictionary)
        Creates the prefix for a given raw context (token list). The prefix is meant to be used when exploring the n-gram trie.
        The method uses only the last sentence, detects the word currently being written, and retrieves a context of the wanted order.
        Parameters:
        rawTokensList - raw token list (retrieved from parsing a raw text input)
        wordPrefixFound - if a word has been started, should contain the started prefix
        wantedOrder - the wanted prefix order
        addUnknownWordToDictionary - if true, when a word in the given token list is unknown, the method WordDictionary.putUserWord(Token) will be called to retrieve the word id
        Returns:
        a triple whose left element is the prefix, containing the last (wantedOrder-1) words of the given token list.
        The resulting prefix will contain Tag.UNKNOWN if the token list contains an unknown word and addUnknownWordToDictionary is false, and will contain Tag.START if the given token list is not long enough or if the current sentence is too short.
        The middle boolean is true if the resulting prefix contains only Tag.START entries.
        The right boolean is true if an unknown word was found in the prefix (even if it was added to the dictionary).
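
The return contract described above can be illustrated with a small, self-contained sketch. This is not the library's implementation. The class PrefixSketch, the Result holder (standing in for Triple), the ids START_ID and UNKNOWN_ID (standing in for Tag.START and Tag.UNKNOWN), the "." sentence separator, the left-side padding, and the id allocation for new words are all assumptions made for illustration. Handling of the word currently being written (wordPrefixFound) is not modeled, and wantedOrder is assumed to be at least 1.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PrefixSketch {
    // Hypothetical ids standing in for Tag.START and Tag.UNKNOWN.
    public static final int START_ID = 0;
    public static final int UNKNOWN_ID = 1;

    /** Simplified stand-in for the Triple returned by createPrefixFor. */
    public static final class Result {
        public final int[] prefix;          // left: word ids of the context
        public final boolean onlyStart;     // middle: prefix holds only START ids
        public final boolean unknownFound;  // right: an unknown word was seen

        Result(int[] prefix, boolean onlyStart, boolean unknownFound) {
            this.prefix = prefix;
            this.onlyStart = onlyStart;
            this.unknownFound = unknownFound;
        }
    }

    public static Result buildPrefix(List<String> words, Map<String, Integer> dictionary,
                                     int wantedOrder, boolean addUnknownWord) {
        // Keep only the words after the last sentence separator (assumed to be ".").
        int sentenceStart = words.lastIndexOf(".") + 1;
        List<String> sentence = words.subList(sentenceStart, words.size());

        // The prefix holds (wantedOrder - 1) ids, padded with START_ID (assumed: on the left).
        int[] prefix = new int[wantedOrder - 1];
        Arrays.fill(prefix, START_ID);
        boolean onlyStart = true;
        boolean unknownFound = false;

        // Copy the ids of the last words of the sentence, right-aligned in the prefix.
        int count = Math.min(prefix.length, sentence.size());
        for (int i = 0; i < count; i++) {
            String word = sentence.get(sentence.size() - count + i);
            Integer id = dictionary.get(word);
            if (id == null) {
                unknownFound = true; // reported even when the word is then added
                if (addUnknownWord) {
                    id = dictionary.size() + 2; // hypothetical id allocation
                    dictionary.put(word, id);
                } else {
                    id = UNKNOWN_ID;
                }
            }
            prefix[prefix.length - count + i] = id;
            onlyStart = false;
        }
        return new Result(prefix, onlyStart, unknownFound);
    }

    public static void main(String[] args) {
        Map<String, Integer> dictionary = new HashMap<>();
        dictionary.put("the", 2);
        dictionary.put("cat", 3);
        Result r = buildPrefix(Arrays.asList("hello", ".", "the", "cat"), dictionary, 4, false);
        // prints "[0, 2, 3] false false"
        System.out.println(Arrays.toString(r.prefix) + " " + r.onlyStart + " " + r.unknownFound);
    }
}
```

In this sketch, the last sentence has only two words while three are wanted, so the first slot is filled with the START id. With an unknown word and addUnknownWord set to false, the corresponding slot would hold the UNKNOWN id and the right flag would be true.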