java.lang.Object
- org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary<T>

Type Parameters:

T - type of trie node stored in this dictionary.

All Implemented Interfaces:

AutoCloseable

Direct Known Subclasses:

StaticNGramTrieDictionary, TrainingNGramDictionary
```
public abstract class AbstractNGramDictionary<T extends AbstractNGramTrieNode<T>>
extends Object
implements AutoCloseable
```
Represents an ngram dictionary in an abstract way: a dictionary can be static or dynamic.
Each type of dictionary may or may not support a given operation, such as saving the dictionary or updating probabilities.

The dictionary has a maxOrder that represents the maximum ngram order that can be found in the dictionary. The order of an ngram corresponds to its rank: 1 = unigram, 2 = bigram, etc. The order is not bounded to a maximum value, but in practice it is never more than 5.

Dictionaries are represented as a trie, and different kinds of trie are available. Each type of dictionary is associated with a different type of AbstractNGramTrieNode (e.g. a dynamic dictionary is associated with a dynamic trie node).
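
To make the trie representation concrete, here is a minimal sketch of an ngram trie keyed by word ids. `ToyTrieNode` and `ToyNGramTrie` are hypothetical names invented for this illustration; they are not the library's AbstractNGramTrieNode or any of its dictionary classes, and the real storage layout may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy node: NOT the library's AbstractNGramTrieNode.
class ToyTrieNode {
    int count;
    final Map<Integer, ToyTrieNode> children = new HashMap<>();
}

// Hypothetical toy dictionary holding a root node and a max order.
class ToyNGramTrie {
    final ToyTrieNode root = new ToyTrieNode();
    final int maxOrder;

    ToyNGramTrie(int maxOrder) {
        this.maxOrder = maxOrder;
    }

    // Walk (and create) one node per word id, then increment the last node's count.
    void put(int[] ngram, int increment) {
        if (ngram.length == 0 || ngram.length > maxOrder) {
            throw new IllegalArgumentException("ngram order must be in 1.." + maxOrder);
        }
        ToyTrieNode node = root;
        for (int wordId : ngram) {
            node = node.children.computeIfAbsent(wordId, id -> new ToyTrieNode());
        }
        node.count += increment;
    }

    // Return the node reached by following the prefix, or null if it does not exist.
    ToyTrieNode find(int[] prefix) {
        ToyTrieNode node = root;
        for (int wordId : prefix) {
            node = node.children.get(wordId);
            if (node == null) {
                return null;
            }
        }
        return node;
    }
}
```

In this sketch the children of the root are the unigrams, so the root holds the whole vocabulary, and `find(new int[]{1, 2})` returns the node for word 2 reached through word 1.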

Field Summary

Fields
Modifier and Type	Field	Description
`protected static int`	`DICTIONARY_INFORMATION_BYTE_COUNT`	Byte count needed to save general information about this dictionary.
`protected int`	`maxOrder`	Max order possible to store in this dictionary. Could be retrieved by opening the dictionary, or set by user as a limit.
`protected T`	`rootNode`	Root node of this dictionary (this node contains as children the whole vocabulary)

Constructor Summary

Constructors
Constructor Description

AbstractNGramDictionary(T rootNode, int maxOrderP)
Construct a dictionary with a given root node and a max possible order.

Method Summary

All Methods Instance Methods Abstract Methods Concrete Methods
Modifier and Type	Method	Description
`abstract boolean`	`checkChildrenLoading(T node)`	To check that the children of a given node are loaded into memory (and can be used)
`void`	`compact()`	Compact the nodes in this dictionary (this will call `AbstractNGramTrieNode.compact()` on root)
`abstract double[]`	`computeD(TrainingConfiguration configuration)`	Compute the optimal value for d (absolute discounting parameter). Usually d is computed with the formula D = C1 / (C1 + 2 * C2), where C1 = number of ngrams with count == 1, and C2 = number of ngrams with count == 2.
`int`	`getMaxOrder()`
`TIntHashSet`	`getNextWord(int[] prefix)`	Return the immediate next words for a given prefix (without any filter)
`abstract T`	`getNodeForPrefix(int[] prefix, int index)`	Retrieves the node for a given prefix. For example, prefix = [1,2] will return the trie node corresponding to {2}. The children of the returned node may not have been loaded.
`double`	`getProbability(int[] prefix, int index, int length, int wordId)`	Return the probability of a word for a given prefix. Given index = 0 and length = prefix.length will return the maximum order probability (e.g. prefix.length = 3, will return probability for order 3)
`double`	`getRawProbability(int[] prefix, int index, int length, int wordId)`
`T`	`getRoot()`
`void`	`listNextWords(int[] prefix, WordDictionary wordDictionary, PredictionParameter predictionParameter, Set<Integer> wordsToExclude, Map<BiIntegerKey,NextWord> resultSet, int wantedCount, boolean unigramLevel)`	Goes through each ngram dictionary order to find the next possible words for a given prefix. Starts with the highest order for the given prefix (e.g. prefix length == 3 means order 4) and, if the wantedCount is not reached, goes to the lower orders to find further next words.
`protected abstract void`	`openDictionary(File dictionaryFile)`	Open a dictionary from a file. To use the dictionary, the same `WordDictionary` used to save it should be used.
`abstract void`	`putAndIncrementBy(int[] ngram, int increment)`	Adds a given ngram to the dictionary and increments its count. If the ngram is already in the dictionary, only its count is incremented. This will call `putAndIncrementBy(int[], int, int)` with index = 0.
`abstract void`	`putAndIncrementBy(int[] ngram, int index, int increment)`	Adds a given ngram to the dictionary and increments its count. If the ngram is already in the dictionary, only its count is incremented.
`protected void`	`readDictionaryInformation(ByteBuffer byteBuffer)`	Read the general information for this dictionary from a given buffer (doesn't do any check)
`abstract void`	`saveDictionary(File dictionaryFile)`	Save this dictionary to a file. The dictionary is saved with word ids only, which means that the same word dictionary should be loaded if this dictionary is opened later.
`abstract void`	`updateProbabilities(double[] d)`	Update all the probabilities in this dictionary. Can take a while if there are a lot of nodes in the dictionary.
`abstract void`	`updateProbabilities(int[] prefix, int prefixIndex, double[] d)`	Update probabilities in this dictionary for a specific ngram prefix: this will update the probabilities of the prefix children, and update the backoff weight of the parent node. This is much faster than `updateProbabilities(double[])`.
`protected void`	`writeDictionaryInfo(ByteBuffer buffWrite)`	Write the general information for this dictionary to a given buffer

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface java.lang.AutoCloseable
close

- Field Detail
  - DICTIONARY_INFORMATION_BYTE_COUNT
```
protected static final int DICTIONARY_INFORMATION_BYTE_COUNT
```
    Byte count needed to save general information about this dictionary. (e.g. max order)
    
    See Also:
    
    Constant Field Values
  - maxOrder
```
protected int maxOrder
```
    Max order possible to store in this dictionary.
    Could be retrieved by opening the dictionary, or set by user as a limit.
  - rootNode
```
protected final T rootNode
```
    Root node of this dictionary (this node contains as children the whole vocabulary)
- Constructor Detail
  - AbstractNGramDictionary
```
public AbstractNGramDictionary(T rootNode,
                               int maxOrderP)
```
    Construct a dictionary with a given root node and a max possible order.
    
    Parameters:
    
    rootNode - the root node to use for this dictionary
    
    maxOrderP - max possible order for this dictionary.
- Method Detail
  - getRoot
```
public T getRoot()
```
    Returns:
    
    the root for this dictionary
  - getMaxOrder
```
public int getMaxOrder()
```
    Returns:
    
    the max possible order for this dictionary
  - compact
```
public void compact()
```
    Compact the nodes in this dictionary (this will call AbstractNGramTrieNode.compact() on root)
  - getNodeForPrefix
```
public abstract T getNodeForPrefix(int[] prefix,
                                   int index)
```
    Retrieves the node for a given prefix.
    For example, prefix = [1,2] will return the trie node corresponding to {2}.
    The children of the returned node may not have been loaded.
    
    Parameters:
    
    prefix - the node prefix
    
    index - first word in prefix index (to take the full prefix, index should be = 0)
    
    Returns:
    
    the node found for the given prefix, or null if there is no existing node for such prefix
  - checkChildrenLoading
```
public abstract boolean checkChildrenLoading(T node)
```
    To check that the children of a given node are loaded into memory (and can be used)
    
    Parameters:
    
    node - the node to check children loading on
    
    Returns:
    
    true if there are children for this node, and these children are loaded.
  - putAndIncrementBy
```
public abstract void putAndIncrementBy(int[] ngram,
                                       int index,
                                       int increment)
```
    Adds a given ngram to the dictionary and increments its count.
    If the ngram is already in the dictionary, only its count is incremented.
    
    Parameters:
    
    ngram - the ngram to put in dictionary
    
    index - index of the ngram start (the index from which the ngram becomes valid: for example, to skip the first ngram word, set index = 1)
    
    increment - the increment value
  - putAndIncrementBy
```
public abstract void putAndIncrementBy(int[] ngram,
                                       int increment)
```
    Adds a given ngram to the dictionary and increments its count.
    If the ngram is already in the dictionary, only its count is incremented.
    This will call putAndIncrementBy(int[], int, int) with index = 0.
    
    Parameters:
    
    ngram - the ngram to put in dictionary
    
    increment - the increment value
  - saveDictionary
```
public abstract void saveDictionary(File dictionaryFile)
                             throws IOException
```
    Save this dictionary to a file.
    The dictionary is saved with word ids only, which means that the same word dictionary should be loaded if this dictionary is opened later.
    
    Parameters:
    
    dictionaryFile - the file where dictionary should be saved.
    
    Throws:
    
    IOException - if dictionary can't be saved
  - openDictionary
```
protected abstract void openDictionary(File dictionaryFile)
                                throws IOException
```
    Open a dictionary from a file.
    To use the dictionary, the same WordDictionary used to save it should be used.
    
    Parameters:
    
    dictionaryFile - the file containing a dictionary.
    
    Throws:
    
    IOException - if dictionary can't be opened
  - updateProbabilities
```
public abstract void updateProbabilities(double[] d)
```
    Update all the probabilities in this dictionary.
    Can take a while if there are a lot of nodes in the dictionary.
    
    Parameters:
    
    d - the d parameter for absolute discounting algorithm.
  - updateProbabilities
```
public abstract void updateProbabilities(int[] prefix,
                                         int prefixIndex,
                                         double[] d)
```
    Update probabilities in this dictionary for a specific ngram prefix: this will update the probabilities of the prefix children, and update the backoff weight of the parent node.
    This is much faster than updateProbabilities(double[]).
    
    Parameters:
    
    prefix - prefix of the node that should be updated
    
    prefixIndex - prefix start index (0 = full prefix, 1 = skip the first word in prefix, etc.)
    
    d - the d parameter for absolute discounting algorithm.
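    
    As an illustration of what updating the children probabilities and the backoff weight can mean, here is a sketch of the textbook interpolated form of absolute discounting for a single prefix. This is an assumption about the general technique, not a description of the library's actual implementation; `AbsoluteDiscountSketch` and its methods are hypothetical names. It assumes a non-empty `childCounts` array in which every count is at least 1, and a single d value between 0 and 1.

```java
// Hypothetical helper illustrating textbook absolute discounting; not library code.
class AbsoluteDiscountSketch {

    // Sum of the counts of every word seen after the prefix.
    static long total(int[] childCounts) {
        long sum = 0;
        for (int c : childCounts) {
            sum += c;
        }
        return sum;
    }

    // Backoff weight of the prefix node: the probability mass freed by
    // subtracting d from each of the childCounts.length distinct children.
    static double backoffWeight(int[] childCounts, double d) {
        return d * childCounts.length / total(childCounts);
    }

    // Discounted probability of a child with the given count, interpolated
    // with the probability of the same word at the lower order.
    static double probability(int count, int[] childCounts, double d,
                              double lowerOrderProbability) {
        double discounted = Math.max(count - d, 0.0) / total(childCounts);
        return discounted + backoffWeight(childCounts, d) * lowerOrderProbability;
    }
}
```

    With child counts {3, 1} and d = 0.5, the discounted shares are 2.5/4 and 0.5/4, and the remaining 0.25 is the backoff weight redistributed through the lower order.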
  - computeD
```
public abstract double[] computeD(TrainingConfiguration configuration)
```
    Compute the optimal value for d (absolute discounting parameter).
    Usually d is computed with the formula:
    D = C1 / (C1 + 2 * C2)
    where C1 = number of ngrams with count == 1, and C2 = number of ngrams with count == 2. These values are computed for each order (index 0 = unigram, index 1 = bigram, etc.)
    
    Parameters:
    
    configuration - configuration to use to compute D (can set min/max values and a D value)
    
    Returns:
    
    computed d value for this dictionary
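    
    The formula above can be sketched as follows. `DiscountSketch` is a hypothetical helper, not part of the library: it takes the raw ngram counts per order as plain arrays, and the `fallback` value used when an order has no ngram with count 1 or 2 is an assumption made for this example (the real method reads its bounds from the TrainingConfiguration).

```java
// Hypothetical helper illustrating D = C1 / (C1 + 2 * C2); not library code.
class DiscountSketch {

    // countsPerOrder[i] holds the occurrence counts of all ngrams of order i + 1.
    // Returns one d value per order (index 0 = unigram, index 1 = bigram, ...).
    static double[] computeD(int[][] countsPerOrder, double fallback) {
        double[] d = new double[countsPerOrder.length];
        for (int order = 0; order < countsPerOrder.length; order++) {
            long c1 = 0; // number of ngrams seen exactly once
            long c2 = 0; // number of ngrams seen exactly twice
            for (int count : countsPerOrder[order]) {
                if (count == 1) {
                    c1++;
                } else if (count == 2) {
                    c2++;
                }
            }
            long denominator = c1 + 2 * c2;
            // Assumed behavior: use the fallback when the formula is undefined.
            d[order] = denominator == 0 ? fallback : (double) c1 / denominator;
        }
        return d;
    }
}
```

    For example, an order with two ngrams seen once and one ngram seen twice gives D = 2 / (2 + 2 * 1) = 0.5.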
  - listNextWords
```
public void listNextWords(int[] prefix,
                          WordDictionary wordDictionary,
                          PredictionParameter predictionParameter,
                          Set<Integer> wordsToExclude,
                          Map<BiIntegerKey,NextWord> resultSet,
                          int wantedCount,
                          boolean unigramLevel)
```
    Goes through each ngram dictionary order to find the next possible words for a given prefix.
    Starts with the highest order for the given prefix (e.g. prefix length == 3 means order 4) and, if the wantedCount is not reached, goes to the lower orders to find further next words.
    
    Parameters:
    
    prefix - the prefix to detect word after (words ids, represent a ngram prefix)
    
    wordDictionary - word dictionary (useful only if prefixDetected is not null)
    
    predictionParameter - prediction parameter (can be used to validate words)
    
    wordsToExclude - a list of words that shouldn't be included in the result set
    
    resultSet - map that will contain every next word found
    
    wantedCount - wanted next word count (a higher count will take more time)
    
    unigramLevel - if true, this will go down to the unigram level (whole vocabulary) if there are not enough results; this can be time consuming as the unigram level contains the whole word dictionary
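    
    The order-by-order fallback described above can be sketched like this. `BackoffSketch` is a hypothetical stand-in that replaces the trie with a plain map from prefix to candidate word ids and ignores probabilities, WordDictionary and PredictionParameter; it only illustrates the traversal order, not the library's actual scoring or filtering.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the highest-order-first traversal; not library code.
class BackoffSketch {

    // followers maps a prefix (list of word ids) to the word ids seen after it;
    // the empty prefix stands for the unigram level (whole vocabulary).
    static List<Integer> nextWords(Map<List<Integer>, List<Integer>> followers,
                                   int[] prefix, Set<Integer> wordsToExclude,
                                   int wantedCount, boolean unigramLevel) {
        Set<Integer> result = new LinkedHashSet<>();
        // start = 0 uses the full prefix (highest order); each step drops the oldest word.
        for (int start = 0; start <= prefix.length && result.size() < wantedCount; start++) {
            if (start == prefix.length && !unigramLevel) {
                break; // the unigram level is only visited when requested
            }
            List<Integer> key = new ArrayList<>();
            for (int i = start; i < prefix.length; i++) {
                key.add(prefix[i]);
            }
            for (int wordId : followers.getOrDefault(key, Collections.emptyList())) {
                if (result.size() >= wantedCount) {
                    break;
                }
                if (!wordsToExclude.contains(wordId)) {
                    result.add(wordId); // duplicates from higher orders are ignored
                }
            }
        }
        return new ArrayList<>(result);
    }
}
```

    Words found at a higher order keep their position in the result, and lower orders only fill the remaining slots.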
  - getNextWord
```
public TIntHashSet getNextWord(int[] prefix)
                        throws IOException
```
    Return the immediate next words for a given prefix (without any filter)
    
    Parameters:
    
    prefix - the prefix (previous N words)
    
    Returns:
    
    a set containing the next word for the given prefix, or null if there is no existing ngram in the dictionary for this prefix
    
    Throws:
    
    IOException - if children can't be read
  - getProbability
```
public double getProbability(int[] prefix,
                             int index,
                             int length,
                             int wordId)
```
    Return the probability of a word for a given prefix.
    With index = 0 and length = prefix.length, returns the maximum order probability (e.g. prefix.length = 3, will return probability for order 3)
    
    Parameters:
    
    prefix - the words before the given word (the prefix)
    
    index - the index in the given prefix (will change the result order)
    
    length - the given prefix length (will change the result order).
    
    wordId - the word we want the probability for
    
    Returns:
    
    the probability for the given word (0.0 - 1.0)
  - getRawProbability
```
public double getRawProbability(int[] prefix,
                                int index,
                                int length,
                                int wordId)
```
  - readDictionaryInformation
```
protected void readDictionaryInformation(ByteBuffer byteBuffer)
```
    Read the general information for this dictionary from a given buffer (doesn't do any check)
    
    Parameters:
    
    byteBuffer - the byte buffer from which the dictionary information is read
  - writeDictionaryInfo
```
protected void writeDictionaryInfo(ByteBuffer buffWrite)
```
    Write the general information for this dictionary to a given buffer
    
    Parameters:
    
    buffWrite - the byte buffer to which the information is written
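    
    A round trip of the general information through a ByteBuffer might look like the sketch below. It assumes, for illustration only, that the information consists of the max order stored as a single 4-byte int; the actual layout and the actual value of DICTIONARY_INFORMATION_BYTE_COUNT are not stated on this page. `HeaderSketch` and its members are hypothetical names.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of writing and reading a dictionary header; not library code.
class HeaderSketch {

    // Assumption: the header holds only the max order, stored as one 4-byte int.
    static final int INFO_BYTE_COUNT = Integer.BYTES;

    // Write the general information at the buffer's current position.
    static void writeInfo(ByteBuffer buffer, int maxOrder) {
        buffer.putInt(maxOrder);
    }

    // Read the general information back, without any validation.
    static int readInfo(ByteBuffer buffer) {
        return buffer.getInt();
    }

    // Write then read the header through the same buffer.
    static int roundTrip(int maxOrder) {
        ByteBuffer buffer = ByteBuffer.allocate(INFO_BYTE_COUNT);
        writeInfo(buffer, maxOrder);
        buffer.flip(); // switch the buffer from writing to reading
        return readInfo(buffer);
    }
}
```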
