Class WordDictionary


  • public class WordDictionary
    extends Object
    Represents a word dictionary.
    This dictionary identifies each sequence of chars as a unique "word" and keeps information about that word.
    Each word is identified by a single int ID to save memory and space.
    The dictionary itself is identified by a UUID, which is used to verify consistency when loading a user dictionary.
    Note that a Word added to a WordDictionary cannot be removed: its ID must stay consistent, and it may already have been used in an AbstractNGramDictionary. However, you can disable a word with Word.setForceInvalid(boolean, boolean).
    • Constructor Detail

    • Method Detail

      • getWordId

        public int getWordId(String wordStr)
        Gets a word ID.
        Note that this method always returns an ID: if there is no word in the dictionary for the given text, it returns the Tag.UNKNOWN id.
        Parameters:
        wordStr - the word content
        Returns:
        the word ID if found in the dictionary, otherwise the Tag.UNKNOWN id
      • getWord

        public Word getWord(String wordStr)
        Gets the word entity for a given text.
        Note that this method never returns null: if there is no word in the dictionary for the given text, it returns the TagWord for Tag.UNKNOWN.
        Parameters:
        wordStr - the word content
        Returns:
        the word if found in the dictionary, otherwise the TagWord for Tag.UNKNOWN
      • getWord

        public Word getWord(int wordId)
        Gets a word entity from its ID.
        Unlike getWord(String), this method returns null if there is no word for the given ID.
        Parameters:
        wordId - the word ID
        Returns:
        the word associated with the given ID, or null if there is none
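The three lookup methods above report a miss in different ways. The following is a minimal sketch of that contract only, not the library's implementation: `LookupSketch`, its `put` helper, the `UNKNOWN_ID` value and the use of plain strings in place of Word entities are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, NOT the real WordDictionary: it only mirrors the
// documented miss behaviour of the three lookup methods.
class LookupSketch {
    // Assumed sentinel standing in for the Tag.UNKNOWN id.
    static final int UNKNOWN_ID = 0;

    private final Map<String, Integer> idsByText = new HashMap<>();
    private final List<String> textsById = new ArrayList<>();

    LookupSketch() {
        textsById.add("<unknown>"); // slot 0 is reserved for the sentinel entry
    }

    // Hypothetical helper: registers a text and returns its new int ID.
    int put(String text) {
        int id = textsById.size();
        textsById.add(text);
        idsByText.put(text, id);
        return id;
    }

    // Like getWordId(String): always returns an ID, the sentinel on a miss.
    int getWordId(String text) {
        return idsByText.getOrDefault(text, UNKNOWN_ID);
    }

    // Like getWord(String): never null, the sentinel entry on a miss.
    String getWord(String text) {
        return textsById.get(getWordId(text));
    }

    // Like getWord(int): null when no word has this ID.
    String getWord(int id) {
        return (id >= 0 && id < textsById.size()) ? textsById.get(id) : null;
    }
}
```

With this shape, callers of the text-based lookups compare against the sentinel, while callers of the ID-based lookup need a null check.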
      • size

        public int size()
        The number of words stored in this dictionary.
        Returns:
        the stored word count
      • putUserWord

        public Word putUserWord(String wordStr)
        Manually adds a user word to this dictionary.
        This creates the associated word entity.
        This method does not check whether the dictionary already contains a word with the same text: check this before calling it (using getWord(String)).
        Parameters:
        wordStr - the word to add
        Returns:
        the created user word
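Because putUserWord(String) performs no duplicate check, the caller has to guard the call. The sketch below illustrates that guard under stated assumptions: `UserWordSketch`, `getOrCreate` and the `UNKNOWN_ID` value are hypothetical names, and the stand-in returns int IDs rather than Word entities to stay self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, NOT the real WordDictionary.
class UserWordSketch {
    // Assumed sentinel standing in for the Tag.UNKNOWN id.
    static final int UNKNOWN_ID = 0;

    private final Map<String, Integer> ids = new HashMap<>();
    private int nextId = 1;

    // Returns the sentinel when the text is not in the dictionary.
    int getWordId(String text) {
        return ids.getOrDefault(text, UNKNOWN_ID);
    }

    // Mirrors the documented behaviour: always allocates a new ID,
    // without checking whether the text is already present.
    int putUserWord(String text) {
        int id = nextId++;
        ids.put(text, id);
        return id;
    }

    // Caller-side guard: look the text up first, create it only on a miss.
    int getOrCreate(String text) {
        int existing = getWordId(text);
        return existing != UNKNOWN_ID ? existing : putUserWord(text);
    }
}
```

Calling `getOrCreate` twice with the same text yields the same ID, whereas two direct `putUserWord` calls would allocate two.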
      • getValidWordForPredictionByPrefix

        public Map<BiIntegerKey,NextWord> getValidWordForPredictionByPrefix(String wordPrefix,
                                                                                  PredictionParameter predictionParameter,
                                                                                  int wantedWordCount,
                                                                                  Set<Integer> wordIdsToExclude)
        Returns all the words that start with a given prefix.
        The returned map is not sorted.
        Parameters:
        wordPrefix - prefix
        predictionParameter - prediction parameters (used to validate the words to predict)
        wantedWordCount - the number of words wanted (used to stop the search)
        wordIdsToExclude - IDs of the words to exclude from the result
        Returns:
        a map of all the words starting with the given prefix
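The interplay of the prefix, the wanted count and the exclusion set can be sketched as a simple scan. This is an illustration under assumptions, not the library's algorithm: `PrefixSearchSketch.byPrefix` is a hypothetical helper, it uses Integer keys and String values in place of BiIntegerKey and NextWord, and it leaves out the validation driven by PredictionParameter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, NOT the real getValidWordForPredictionByPrefix.
class PrefixSearchSketch {
    // Collects up to wantedCount words starting with prefix, skipping excluded IDs.
    static Map<Integer, String> byPrefix(Map<Integer, String> wordsById,
                                         String prefix,
                                         int wantedCount,
                                         Set<Integer> idsToExclude) {
        Map<Integer, String> result = new HashMap<>(); // unsorted, as documented
        for (Map.Entry<Integer, String> entry : wordsById.entrySet()) {
            if (result.size() >= wantedCount) {
                break; // enough words collected: stop the search early
            }
            if (idsToExclude.contains(entry.getKey())) {
                continue; // excluded by ID
            }
            if (entry.getValue().startsWith(prefix)) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}
```

Since the result is unsorted and the search stops once enough words are found, which matching words are returned when the limit is hit depends on iteration order in this sketch.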
      • putUserWord

        public int putUserWord(Token token)
      • incrementUserWord

        public void incrementUserWord(int wordId)
      • isTokenValidToCreateUserWord

        public boolean isTokenValidToCreateUserWord(Token token)
      • putWordTraining

        public int putWordTraining(String word)
      • loadDictionary

        public static WordDictionary loadDictionary(LanguageModel languageModel,
                                                    File dictionaryFile)
                                             throws IOException
        Creates a word dictionary from a dictionary data file previously created by the training algorithm.
        This method should not be called on a user dictionary file; use loadUserDictionary(File) instead.
        Parameters:
        languageModel - the language model contained in this dictionary
        dictionaryFile - the dictionary data file
        Returns:
        the loaded dictionary
        Throws:
        IOException - if the data file doesn't exist or if an IO problem occurs
      • loadUserDictionary

        public void loadUserDictionary(File userDictionaryFile)
                                throws IOException,
                                       WordDictionaryMatchingException
        Loads a user dictionary on top of an existing trained dictionary.
        This supplements this dictionary with custom words from the user, or with existing words whose parameters were modified.
        This should be called on a dictionary file previously saved with saveUserDictionary(File).
        Parameters:
        userDictionaryFile - the user dictionary data file
        Throws:
        IOException - if the data file doesn't exist or if an IO problem occurs
        WordDictionaryMatchingException - if the loaded user dictionary doesn't match this dictionary: a user dictionary should always be loaded on the same trained dictionary that was used when it was saved
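The matching check described above can be pictured as storing the trained dictionary's UUID with the user data and comparing it on load. The sketch below is one possible shape, not the library's file format: `UserDictionarySketch`, `MismatchException`, the byte layout and the in-memory byte array (used instead of a File) are all assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical stand-in, NOT the real save/load code or file format.
class UserDictionarySketch {
    // Hypothetical counterpart of WordDictionaryMatchingException.
    static class MismatchException extends Exception {
        MismatchException(String message) {
            super(message);
        }
    }

    // Writes the trained dictionary's UUID first, then the user words.
    static byte[] save(UUID baseDictionaryId, List<String> userWords) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(baseDictionaryId.getMostSignificantBits());
            out.writeLong(baseDictionaryId.getLeastSignificantBits());
            out.writeInt(userWords.size());
            for (String word : userWords) {
                out.writeUTF(word);
            }
        }
        return bytes.toByteArray();
    }

    // Refuses data that was saved against a different trained dictionary.
    static List<String> load(UUID baseDictionaryId, byte[] data)
            throws IOException, MismatchException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long mostSignificant = in.readLong();
            long leastSignificant = in.readLong();
            UUID savedId = new UUID(mostSignificant, leastSignificant);
            if (!savedId.equals(baseDictionaryId)) {
                throw new MismatchException("user dictionary was saved for " + savedId);
            }
            int count = in.readInt();
            List<String> words = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                words.add(in.readUTF());
            }
            return words;
        }
    }
}
```

Checking the identifier before reading any words matters because user words refer to int IDs that are only meaningful within the trained dictionary they were saved against.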