Class StaticNGramTrieDictionary

    • Constructor Detail

      • StaticNGramTrieDictionary

        public StaticNGramTrieDictionary()
    • Method Detail

      • getNodeForPrefix

        public StaticNGramTrieNode getNodeForPrefix​(int[] prefix,
                                                    int index)
        Description copied from class: AbstractNGramDictionary
        Retrieves the node for a given prefix.
        For example, for prefix = [1,2], this returns the trie node corresponding to {2}.
        The children of the returned node may not have been loaded.
        Specified by:
        getNodeForPrefix in class AbstractNGramDictionary<StaticNGramTrieNode>
        Parameters:
        prefix - the node prefix
        index - index of the first word of the prefix to use (0 takes the full prefix)
        Returns:
        the node found for the given prefix, or null if no node exists for that prefix
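
        The lookup described above can be illustrated with a minimal sketch. It assumes a simplified in-memory trie keyed by word id; the PrefixLookupSketch class, its Node type, and the addPath helper are hypothetical and are not part of the library, whose StaticNGramTrieNode may load its children lazily.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified trie; NOT the library's StaticNGramTrieNode.
class PrefixLookupSketch {
    static final class Node {
        final Map<Integer, Node> children = new HashMap<>();
    }

    final Node root = new Node();

    // Walks prefix[index..] from the root and returns the last node reached,
    // or null as soon as a word id has no matching child.
    Node getNodeForPrefix(int[] prefix, int index) {
        Node current = root;
        for (int i = index; i < prefix.length; i++) {
            current = current.children.get(prefix[i]);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    // Helper used only to build an example trie (an assumption of this sketch).
    Node addPath(int... wordIds) {
        Node current = root;
        for (int id : wordIds) {
            current = current.children.computeIfAbsent(id, k -> new Node());
        }
        return current;
    }
}
```

        In this sketch, getNodeForPrefix(new int[]{1, 2}, 0) returns the node for word 2 stored under word 1, while the same call with index = 1 skips the first word and returns the top-level node for word 2.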
      • open

        public static StaticNGramTrieDictionary open​(File dictionaryFile)
                                              throws IOException
        Create a static ngram dictionary from a given file.
        Parameters:
        dictionaryFile - file that contains the dictionary
        Returns:
        the static ngram dictionary, initialized and ready to use
        Throws:
        IOException - if the dictionary can't be loaded
      • putAndIncrementBy

        public void putAndIncrementBy​(int[] ngram,
                                      int index,
                                      int increment)
        Description copied from class: AbstractNGramDictionary
        Adds the given ngram to the dictionary and increments its count.
        If the ngram is already in the dictionary, only its count is incremented.
        Specified by:
        putAndIncrementBy in class AbstractNGramDictionary<StaticNGramTrieNode>
        Parameters:
        ngram - the ngram to put in the dictionary
        index - start index of the ngram, i.e. the position from which the ngram is considered valid (for example, set index = 1 to skip the first word of the ngram)
        increment - the increment value
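
        A minimal sketch of the put-or-increment behaviour, under the assumption that counts are stored on the last node of the ngram path. The PutAndIncrementSketch class, its Node type, and the countOf helper are hypothetical; the library may maintain additional state that is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified trie; NOT the library's implementation.
class PutAndIncrementSketch {
    static final class Node {
        final Map<Integer, Node> children = new HashMap<>();
        int count;
    }

    final Node root = new Node();

    // Creates any missing nodes along ngram[index..] and adds `increment`
    // to the count of the last node. An existing ngram is only incremented.
    void putAndIncrementBy(int[] ngram, int index, int increment) {
        if (index >= ngram.length) {
            return; // nothing left to insert (assumption of this sketch)
        }
        Node current = root;
        for (int i = index; i < ngram.length; i++) {
            current = current.children.computeIfAbsent(ngram[i], k -> new Node());
        }
        current.count += increment;
    }

    // Helper for the example: count stored for an ngram, 0 if it is absent.
    int countOf(int... ngram) {
        Node current = root;
        for (int id : ngram) {
            current = current.children.get(id);
            if (current == null) {
                return 0;
            }
        }
        return current.count;
    }
}
```

        With this sketch, calling putAndIncrementBy(new int[]{1, 2, 3}, 1, 5) counts the bigram [2, 3] and leaves the trigram [1, 2, 3] untouched.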
      • saveDictionary

        public void saveDictionary​(File dictionaryFile)
                            throws IOException
        Description copied from class: AbstractNGramDictionary
        Save this dictionary to a file.
        The dictionary is saved using word ids only, which means that the same word dictionary must be loaded when this dictionary is opened later.
        Specified by:
        saveDictionary in class AbstractNGramDictionary<StaticNGramTrieNode>
        Parameters:
        dictionaryFile - the file where the dictionary should be saved
        Throws:
        IOException - if the dictionary can't be saved
      • updateProbabilities

        public void updateProbabilities​(int[] prefix,
                                        int prefixIndex,
                                        double[] d)
        Description copied from class: AbstractNGramDictionary
        Updates the probabilities in this dictionary for a specific ngram prefix: this updates the probabilities of the prefix's children and the backoff weight of the parent node.
        This is much more efficient than AbstractNGramDictionary.updateProbabilities(double[]).
        Specified by:
        updateProbabilities in class AbstractNGramDictionary<StaticNGramTrieNode>
        Parameters:
        prefix - prefix of the node that should be updated
        prefixIndex - prefix start index (0 = full prefix, 1 = skip the first word in the prefix, etc.)
        d - the d parameter for absolute discounting algorithm.
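
        As an illustration of what such an update computes, here is a sketch of textbook absolute discounting for the children of one node. The exact formula used by the library is not stated here, so this is an assumption: each child probability is max(count - d, 0) / total, and the parent's backoff weight is the probability mass removed by the discount. The AbsoluteDiscountingSketch class and its method names are hypothetical, and a single d (the value for the children's order) is passed instead of the full double[] array.

```java
// Hypothetical sketch of textbook absolute discounting; NOT the library's code.
final class AbsoluteDiscountingSketch {

    // Discounted probability of each child: max(count - d, 0) / total.
    static double[] discountedProbabilities(int[] childCounts, double d) {
        long total = 0;
        for (int c : childCounts) {
            total += c;
        }
        double[] probabilities = new double[childCounts.length];
        if (total == 0) {
            return probabilities; // no observations: all probabilities stay 0
        }
        for (int i = 0; i < childCounts.length; i++) {
            probabilities[i] = Math.max(childCounts[i] - d, 0.0) / total;
        }
        return probabilities;
    }

    // Backoff weight of the parent: d * (children with count > 0) / total.
    // Assumes 0 < d < 1, so every observed child gives up exactly d.
    static double backoffWeight(int[] childCounts, double d) {
        long total = 0;
        int seen = 0;
        for (int c : childCounts) {
            total += c;
            if (c > 0) {
                seen++;
            }
        }
        // With no observations, all mass goes to the lower order (assumption).
        return total == 0 ? 1.0 : d * seen / total;
    }
}
```

        Under these assumptions, the child probabilities and the backoff weight of a node sum to 1, which is what lets the backoff weight scale the lower-order distribution.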
      • computeD

        public double[] computeD​(TrainingConfiguration configuration)
        Description copied from class: AbstractNGramDictionary
        Compute the optimal value for d (absolute discounting parameter).
        Usually, d is computed with the formula:
        D = C1 / (C1 + 2 * C2)
        where C1 is the number of ngrams with count == 1 and C2 is the number of ngrams with count == 2. These values are computed for each order (index 0 = unigram, index 1 = bigram, etc.).
        Specified by:
        computeD in class AbstractNGramDictionary<StaticNGramTrieNode>
        Parameters:
        configuration - configuration to use to compute D (can set min/max values and a D value)
        Returns:
        computed d value for this dictionary
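
        The formula above can be sketched as follows. The ComputeDSketch class and its countsByOrder input are hypothetical (the library derives the counts from the trie and applies the TrainingConfiguration bounds, which are not modelled here), and the fallback to 0 when an order has no ngram with count 1 or 2 is an assumption of this sketch.

```java
// Hypothetical sketch of D = C1 / (C1 + 2 * C2) per order; NOT the library's code.
final class ComputeDSketch {

    // countsByOrder[k] holds the counts of all ngrams of order k + 1
    // (index 0 = unigrams, index 1 = bigrams, etc.).
    static double[] computeD(int[][] countsByOrder) {
        double[] d = new double[countsByOrder.length];
        for (int order = 0; order < countsByOrder.length; order++) {
            long c1 = 0; // ngrams seen exactly once
            long c2 = 0; // ngrams seen exactly twice
            for (int count : countsByOrder[order]) {
                if (count == 1) {
                    c1++;
                } else if (count == 2) {
                    c2++;
                }
            }
            double denominator = c1 + 2.0 * c2;
            // Assumption: use 0 when the formula is undefined for this order.
            d[order] = denominator == 0 ? 0.0 : c1 / denominator;
        }
        return d;
    }
}
```

        For example, an order with counts {1, 1, 2, 3} has C1 = 2 and C2 = 1, which gives D = 2 / (2 + 2 * 1) = 0.5.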