Class StaticNGramTrieDictionary
java.lang.Object
  org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary<StaticNGramTrieNode>
    org.predict4all.nlp.ngram.dictionary.StaticNGramTrieDictionary
All Implemented Interfaces: AutoCloseable
public class StaticNGramTrieDictionary extends AbstractNGramDictionary<StaticNGramTrieNode>
Represents a static ngram dictionary whose trie nodes are loaded "on demand" while browsing through the nodes.
This dictionary is read-only and cannot be updated or saved: methods like updateProbabilities(double[]) and putAndIncrementBy(int[], int) are not supported by this dictionary.
This dictionary is created from a saved TrainingNGramDictionary.
Field Summary
-
Fields inherited from class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
DICTIONARY_INFORMATION_BYTE_COUNT, maxOrder, rootNode
Constructor Summary
Constructor: StaticNGramTrieDictionary()
-
Method Summary
Modifier and Type / Method / Description
boolean checkChildrenLoading(StaticNGramTrieNode node)
    Checks that the children of a given node are loaded into memory (and can be used).
void close()
double[] computeD(TrainingConfiguration configuration)
    Computes the optimal value for d (absolute discounting parameter).
    Usually d is computed with the formula:
    D = C1 / (C1 + 2 * C2)
    where C1 = number of ngrams with count == 1, and C2 = number of ngrams with count == 2.
StaticNGramTrieNode getNodeForPrefix(int[] prefix, int index)
    Retrieves the node for a given prefix.
    For example, prefix = [1,2] returns the trie node corresponding to {2}.
    The children of the returned node may not have been loaded.
static StaticNGramTrieDictionary open(File dictionaryFile)
    Creates a static ngram dictionary from a given file.
protected void openDictionary(File dictionaryFile)
    Opens a dictionary from a file.
    To use the dictionary, the same WordDictionary used to save it should be used.
void putAndIncrementBy(int[] ngram, int increment)
    Adds a given ngram to the dictionary and increments its count.
    If the ngram is already in the dictionary, just increments its count.
    This calls AbstractNGramDictionary.putAndIncrementBy(int[], int, int) with index = 0.
void putAndIncrementBy(int[] ngram, int index, int increment)
    Adds a given ngram to the dictionary and increments its count.
    If the ngram is already in the dictionary, just increments its count.
void saveDictionary(File dictionaryFile)
    Saves this dictionary to a file.
    The dictionary is saved using word ids only, which means that the same word dictionary should be loaded if this dictionary is opened later.
void updateProbabilities(double[] d)
    Updates all probabilities in this dictionary.
    Can take a while if there are a lot of nodes in the dictionary.
void updateProbabilities(int[] prefix, int prefixIndex, double[] d)
    Updates probabilities in this dictionary for a specific ngram prefix: this updates the probabilities of the prefix children and the backoff weight of the parent node.
    This is much more optimized than AbstractNGramDictionary.updateProbabilities(double[]).
-
Methods inherited from class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
compact, getMaxOrder, getNextWord, getProbability, getRawProbability, getRoot, listNextWords, readDictionaryInformation, writeDictionaryInfo
Method Detail
-
getNodeForPrefix
public StaticNGramTrieNode getNodeForPrefix(int[] prefix, int index)
Description copied from class: AbstractNGramDictionary
Retrieves the node for a given prefix.
For example, prefix = [1,2] returns the trie node corresponding to {2}.
The children of the returned node may not have been loaded.
Specified by:
getNodeForPrefix in class AbstractNGramDictionary<StaticNGramTrieNode>
Parameters:
prefix - the node prefix
index - index of the first word to use in prefix (to take the full prefix, index should be 0)
Returns:
the node found for the given prefix, or null if no node exists for that prefix
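The role of the index parameter can be illustrated with a minimal in-memory trie. This is only a sketch: PrefixLookupSketch, Node and childOf are hypothetical names invented for illustration, and the real StaticNGramTrieNode loads its children from a file rather than holding them in a HashMap.

```java
import java.util.HashMap;
import java.util.Map;

final class PrefixLookupSketch {
    /** Hypothetical in-memory trie node: children are keyed by word id. */
    static final class Node {
        final Map<Integer, Node> children = new HashMap<>();
    }

    /** Returns the child of parent for wordId, creating it if needed (helper for building examples). */
    static Node childOf(Node parent, int wordId) {
        return parent.children.computeIfAbsent(wordId, id -> new Node());
    }

    /**
     * Walks prefix[index..] from the root and returns the node of the last word,
     * or null as soon as a step is missing. When index == prefix.length the loop
     * does not run and the root is returned (a choice made for this sketch only).
     */
    static Node getNodeForPrefix(Node root, int[] prefix, int index) {
        Node current = root;
        for (int i = index; i < prefix.length && current != null; i++) {
            current = current.children.get(prefix[i]);
        }
        return current;
    }

    public static void main(String[] args) {
        Node root = new Node();
        Node two = childOf(childOf(root, 1), 2); // builds the path 1 -> 2

        // Full prefix (index = 0): reaches the node for word 2 under word 1.
        System.out.println(getNodeForPrefix(root, new int[] {1, 2}, 0) == two);
        // index = 1 skips word 1, and the root has no direct child 2 here.
        System.out.println(getNodeForPrefix(root, new int[] {1, 2}, 1) == null);
    }
}
```

With index = 0 the whole prefix is followed from the root; a larger index starts the walk further into the array, which lets a caller look up a shorter context without copying the array.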
-
checkChildrenLoading
public boolean checkChildrenLoading(StaticNGramTrieNode node)
Description copied from class: AbstractNGramDictionary
Checks that the children of a given node are loaded into memory (and can be used).
Specified by:
checkChildrenLoading in class AbstractNGramDictionary<StaticNGramTrieNode>
Parameters:
node - the node whose children loading should be checked
Returns:
true if this node has children and these children are loaded
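The idea of children loaded "on demand" can be sketched with a node that keeps a loader instead of its children until they are first requested. Everything below is an assumption made for illustration: LazyChildrenSketch, LazyNode and the Supplier-based loader are hypothetical, and this page does not state whether the real checkChildrenLoading triggers loading itself or only reports the current state. The sketch only reports the state.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class LazyChildrenSketch {
    /** Hypothetical node whose children are fetched the first time they are requested. */
    static final class LazyNode {
        private final Supplier<Map<Integer, LazyNode>> loader; // null for a leaf
        private Map<Integer, LazyNode> children;               // null until loaded

        LazyNode(Supplier<Map<Integer, LazyNode>> loader) {
            this.loader = loader;
        }

        /** Loads the children on first access; a leaf always returns an empty map. */
        Map<Integer, LazyNode> getChildren() {
            if (children == null && loader != null) {
                children = loader.get();
            }
            return children == null ? Collections.<Integer, LazyNode>emptyMap() : children;
        }
    }

    /** True only if the node has children and they are already in memory. */
    static boolean checkChildrenLoading(LazyNode node) {
        return node.children != null && !node.children.isEmpty();
    }

    public static void main(String[] args) {
        LazyNode leaf = new LazyNode(null);
        LazyNode parent = new LazyNode(() -> {
            // Stands in for reading the children from the dictionary file.
            Map<Integer, LazyNode> loaded = new HashMap<>();
            loaded.put(7, leaf);
            return loaded;
        });

        System.out.println(checkChildrenLoading(parent)); // children not requested yet
        parent.getChildren();
        System.out.println(checkChildrenLoading(parent)); // children now in memory
    }
}
```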
-
openDictionary
protected void openDictionary(File dictionaryFile) throws IOException
Description copied from class: AbstractNGramDictionary
Opens a dictionary from a file.
To use the dictionary, the same WordDictionary used to save it should be used.
Specified by:
openDictionary in class AbstractNGramDictionary<StaticNGramTrieNode>
Parameters:
dictionaryFile - the file containing a dictionary
Throws:
IOException - if the dictionary can't be opened
-
open
public static StaticNGramTrieDictionary open(File dictionaryFile) throws IOException
Creates a static ngram dictionary from a given file.
Parameters:
dictionaryFile - file that contains the dictionary
Returns:
the static ngram dictionary, initialized and ready to use
Throws:
IOException - if the dictionary can't be loaded
-
putAndIncrementBy
public void putAndIncrementBy(int[] ngram, int index, int increment)
Description copied from class: AbstractNGramDictionary
Adds a given ngram to the dictionary and increments its count.
If the ngram is already in the dictionary, just increments its count.
Specified by:
putAndIncrementBy in class AbstractNGramDictionary<StaticNGramTrieNode>
Parameters:
ngram - the ngram to put in the dictionary
index - index of the ngram start (the index at which the ngram becomes valid: for example, to skip the first ngram word, set index = 1)
increment - the increment value
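The description above is inherited from AbstractNGramDictionary. For a writable trie, the effect of index and increment could look like the following sketch. TrieInsertSketch and Node are hypothetical names, and incrementing only the last node of the path is an assumption of this sketch, not something this page confirms about the library.

```java
import java.util.HashMap;
import java.util.Map;

final class TrieInsertSketch {
    /** Hypothetical mutable trie node holding a count and children keyed by word id. */
    static final class Node {
        final Map<Integer, Node> children = new HashMap<>();
        int count;
    }

    /**
     * Follows ngram[index..] from the root, creating missing nodes,
     * then adds increment to the count of the last node reached.
     */
    static void putAndIncrementBy(Node root, int[] ngram, int index, int increment) {
        Node current = root;
        for (int i = index; i < ngram.length; i++) {
            current = current.children.computeIfAbsent(ngram[i], id -> new Node());
        }
        if (current != root) { // nothing to count when no word was consumed
            current.count += increment;
        }
    }

    /** Two-argument form: same operation with index = 0. */
    static void putAndIncrementBy(Node root, int[] ngram, int increment) {
        putAndIncrementBy(root, ngram, 0, increment);
    }

    public static void main(String[] args) {
        Node root = new Node();
        int[] ngram = {1, 2, 3};
        putAndIncrementBy(root, ngram, 1, 2); // skips word 1, stores the path 2 -> 3
        putAndIncrementBy(root, ngram, 1, 3); // same path again, count grows
        System.out.println(root.children.get(2).children.get(3).count); // 5
    }
}
```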
-
putAndIncrementBy
public void putAndIncrementBy(int[] ngram, int increment)
Description copied from class: AbstractNGramDictionary
Adds a given ngram to the dictionary and increments its count.
If the ngram is already in the dictionary, just increments its count.
This calls AbstractNGramDictionary.putAndIncrementBy(int[], int, int) with index = 0.
Specified by:
putAndIncrementBy in class AbstractNGramDictionary<StaticNGramTrieNode>
Parameters:
ngram - the ngram to put in the dictionary
increment - the increment value
-
saveDictionary
public void saveDictionary(File dictionaryFile) throws IOException
Description copied from class: AbstractNGramDictionary
Saves this dictionary to a file.
The dictionary is saved using word ids only, which means that the same word dictionary should be loaded if this dictionary is opened later.
Specified by:
saveDictionary in class AbstractNGramDictionary<StaticNGramTrieNode>
Parameters:
dictionaryFile - the file where the dictionary should be saved
Throws:
IOException - if the dictionary can't be saved
-
updateProbabilities
public void updateProbabilities(double[] d)
Description copied from class: AbstractNGramDictionary
Updates all probabilities in this dictionary.
Can take a while if there are a lot of nodes in the dictionary.
Specified by:
updateProbabilities in class AbstractNGramDictionary<StaticNGramTrieNode>
Parameters:
d - the d parameter for the absolute discounting algorithm
-
updateProbabilities
public void updateProbabilities(int[] prefix, int prefixIndex, double[] d)
Description copied from class: AbstractNGramDictionary
Updates probabilities in this dictionary for a specific ngram prefix: this updates the probabilities of the prefix children and the backoff weight of the parent node.
This is much more optimized than AbstractNGramDictionary.updateProbabilities(double[]).
Specified by:
updateProbabilities in class AbstractNGramDictionary<StaticNGramTrieNode>
Parameters:
prefix - prefix of the node that should be updated
prefixIndex - prefix start index (0 = full prefix, 1 = skip the first word in prefix, etc.)
d - the d parameter for the absolute discounting algorithm
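As a rough illustration of what updating one prefix involves, the sketch below applies the textbook form of absolute discounting to the children of a single node: each child probability is max(count - d, 0) / total, and the discounted mass d * (number of distinct children) / total becomes the backoff weight. This is an assumed formulation, not necessarily the exact computation performed by the library, and AbsoluteDiscountSketch and Updated are hypothetical names. A single d is passed here, whereas the method above takes one value per order.

```java
final class AbsoluteDiscountSketch {
    /** Result of updating one node: child probabilities plus the node's backoff weight. */
    static final class Updated {
        final double[] childProbabilities;
        final double backoffWeight;

        Updated(double[] childProbabilities, double backoffWeight) {
            this.childProbabilities = childProbabilities;
            this.backoffWeight = backoffWeight;
        }
    }

    /**
     * Textbook absolute discounting for the children of one prefix node.
     * childCounts[i] is the count of the i-th child, d the discount for that order.
     */
    static Updated update(int[] childCounts, double d) {
        long total = 0;
        int distinct = 0;
        for (int count : childCounts) {
            total += count;
            if (count > 0) {
                distinct++;
            }
        }
        double[] probabilities = new double[childCounts.length];
        if (total == 0) {
            // No observation under this prefix: all the mass goes to the backoff.
            return new Updated(probabilities, 1.0);
        }
        for (int i = 0; i < childCounts.length; i++) {
            probabilities[i] = Math.max(childCounts[i] - d, 0.0) / total;
        }
        return new Updated(probabilities, d * distinct / total);
    }

    public static void main(String[] args) {
        Updated updated = update(new int[] {3, 1}, 0.5);
        System.out.println(updated.childProbabilities[0]); // 0.625
        System.out.println(updated.childProbabilities[1]); // 0.125
        System.out.println(updated.backoffWeight);         // 0.25
    }
}
```

With counts of at least 1 and d at most 1, the child probabilities and the backoff weight sum to 1, which is why only the children and the parent's backoff weight need to change when one prefix is updated.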
-
computeD
public double[] computeD(TrainingConfiguration configuration)
Description copied from class: AbstractNGramDictionary
Computes the optimal value for d (absolute discounting parameter).
Usually d is computed with the formula:
D = C1 / (C1 + 2 * C2)
where C1 = number of ngrams with count == 1, and C2 = number of ngrams with count == 2. These values are computed for each order (index 0 = unigram, index 1 = bigram, etc.).
Specified by:
computeD in class AbstractNGramDictionary<StaticNGramTrieNode>
Parameters:
configuration - configuration to use to compute D (can set min/max values and a D value)
Returns:
the computed d values for this dictionary
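The formula above can be worked through per order with a short sketch. DiscountSketch is a hypothetical name, the count arrays are made-up inputs, and the minD/maxD parameters stand in for the min/max values that the description says the configuration can set. How the library actually reads them from TrainingConfiguration is not shown here.

```java
import java.util.Arrays;

final class DiscountSketch {
    /**
     * Computes D = C1 / (C1 + 2 * C2) for each order.
     * c1[i] and c2[i] are the numbers of ngrams of order i + 1 seen exactly
     * once and exactly twice (index 0 = unigram, index 1 = bigram, etc.).
     * minD and maxD are assumed bounds applied to the result.
     */
    static double[] computeD(long[] c1, long[] c2, double minD, double maxD) {
        double[] d = new double[c1.length];
        for (int order = 0; order < c1.length; order++) {
            double denominator = c1[order] + 2.0 * c2[order];
            // The formula is undefined without any such ngram: fall back to the lower bound.
            double raw = denominator == 0 ? minD : c1[order] / denominator;
            d[order] = Math.max(minD, Math.min(maxD, raw));
        }
        return d;
    }

    public static void main(String[] args) {
        long[] c1 = {100, 400}; // ngrams seen once, per order
        long[] c2 = {50, 100};  // ngrams seen twice, per order
        // Unigram: 100 / (100 + 2 * 50) = 0.5, bigram: 400 / (400 + 2 * 100) = 2/3
        System.out.println(Arrays.toString(computeD(c1, c2, 0.1, 0.9)));
    }
}
```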