Package org.predict4all.nlp.ngram
Class NGramWordPredictorUtils
java.lang.Object
    org.predict4all.nlp.ngram.NGramWordPredictorUtils
public class NGramWordPredictorUtils extends java.lang.Object
Utility class for predicting words with n-gram dictionaries.
Constructor Summary
Constructor: NGramWordPredictorUtils(WordDictionary wordDictionary, PredictionParameter predictionParameter)
Method Summary
Modifier and Type: Triple<int[],java.lang.Boolean,java.lang.Boolean>
Method: createPrefixFor(java.util.List<Token> rawTokensList, WordPrefixDetected wordPrefixFound, int wantedOrder, boolean addUnknownWordToDictionary)
Description: Creates the prefix for a given raw context (token list). The context is meant to be used when exploring the n-gram trie.
The method uses only the last sentence, detects the word currently being written, and retrieves a context of the wanted order.
Constructor Detail
NGramWordPredictorUtils
public NGramWordPredictorUtils(WordDictionary wordDictionary, PredictionParameter predictionParameter)
Method Detail
createPrefixFor
public Triple<int[],java.lang.Boolean,java.lang.Boolean> createPrefixFor(java.util.List<Token> rawTokensList, WordPrefixDetected wordPrefixFound, int wantedOrder, boolean addUnknownWordToDictionary)
Creates the prefix for a given raw context (token list). The context is meant to be used when exploring the n-gram trie.
The method uses only the last sentence, detects the word currently being written, and retrieves a context of the wanted order.
Parameters:
rawTokensList - raw token list (retrieved from parsing a raw text input)
wordPrefixFound - if a word is started, should contain the started prefix
wantedOrder - the wanted prefix order
addUnknownWordToDictionary - if true, when a word in the given token list is unknown, the method WordDictionary.putUserWord(Token) will be called to retrieve the word id
Returns:
a triple whose left element is the prefix, containing the last (wantedOrder-1) words of the given token list.
The resulting prefix will contain Tag.UNKNOWN if the token list contains an unknown word and addUnknownWordToDictionary is false, and will contain Tag.START if the given token list is not long enough, or if the current sentence is too short.
The middle boolean is true if the resulting prefix contains only the START tag.
The right boolean is true if an unknown word was found in the prefix (even if it was added to the dictionary).
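The behaviour described above can be illustrated with a small standalone sketch. It is not the library's implementation: the class PrefixSketch, the Result holder, the START and UNKNOWN ids, the "." sentence separator and the nextFreeId helper are all hypothetical stand-ins, words are plain strings rather than Token objects, and the word currently being written is assumed to have been removed from the list already.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PrefixSketch {
    // Hypothetical ids standing in for Tag.START and Tag.UNKNOWN.
    static final int START = 0;
    static final int UNKNOWN = 1;

    /** Hypothetical stand-in for Triple<int[], Boolean, Boolean>. */
    static final class Result {
        final int[] prefix;          // left: the (wantedOrder-1) word ids
        final boolean onlyStart;     // middle: prefix holds only START
        final boolean unknownFound;  // right: an unknown word was seen

        Result(int[] prefix, boolean onlyStart, boolean unknownFound) {
            this.prefix = prefix;
            this.onlyStart = onlyStart;
            this.unknownFound = unknownFound;
        }
    }

    static Result buildPrefix(List<String> words, Map<String, Integer> dictionary,
                              int wantedOrder, boolean addUnknown) {
        // Keep only the last sentence ("." is an assumed separator).
        int sentenceStart = words.lastIndexOf(".") + 1;
        List<String> sentence = words.subList(sentenceStart, words.size());

        // Pad with START, then overwrite the tail with the last known words.
        int prefixLength = Math.max(0, wantedOrder - 1);
        int[] prefix = new int[prefixLength];
        Arrays.fill(prefix, START);
        int copied = Math.min(prefixLength, sentence.size());

        boolean unknownFound = false;
        for (int k = 0; k < copied; k++) {
            String word = sentence.get(sentence.size() - copied + k);
            Integer id = dictionary.get(word);
            if (id == null) {
                unknownFound = true;
                if (addUnknown) {
                    // Stand-in for registering a user word and getting its id.
                    id = nextFreeId(dictionary);
                    dictionary.put(word, id);
                } else {
                    id = UNKNOWN;
                }
            }
            prefix[prefixLength - copied + k] = id;
        }
        return new Result(prefix, copied == 0, unknownFound);
    }

    // Hypothetical id allocation: one above the largest id in use.
    static int nextFreeId(Map<String, Integer> dictionary) {
        int max = UNKNOWN;
        for (int id : dictionary.values()) {
            max = Math.max(max, id);
        }
        return max + 1;
    }

    public static void main(String[] args) {
        Map<String, Integer> dictionary = new HashMap<>();
        dictionary.put("the", 2);
        dictionary.put("cat", 3);
        Result r = buildPrefix(Arrays.asList("hello", ".", "the", "dog"), dictionary, 4, false);
        System.out.println(Arrays.toString(r.prefix)
                + " onlyStart=" + r.onlyStart + " unknownFound=" + r.unknownFound);
    }
}
```

In this sketch a wantedOrder of 4 yields a prefix of three ids. Because the last sentence holds only two words, the first slot stays at START, and the unknown word is either mapped to UNKNOWN or given a new id, depending on the addUnknown flag.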