Package org.predict4all.nlp.ngram.trie
Class AbstractNGramTrieNode<T extends AbstractNGramTrieNode<?>>
- java.lang.Object
-
- org.predict4all.nlp.ngram.trie.AbstractNGramTrieNode<T>
-
- Type Parameters:
- T: type of this node's children (typically the same type as this node)
- Direct Known Subclasses:
DynamicNGramTrieNode, StaticNGramTrieNode
public abstract class AbstractNGramTrieNode<T extends AbstractNGramTrieNode<?>> extends java.lang.Object
Represents a node in a trie that stores ngrams. A trie is used to save memory because ngram information is highly redundant: ngrams that share a prefix share the corresponding nodes.
For example, the trie for the sentences "this is sentence. this is what" contains the following nodes (showing 3-grams only):
- this
--- is
----- sentence
----- what
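The prefix sharing in the example above can be sketched as follows. This is a minimal illustration, not the library's implementation: `SketchTrieNode` and `TrieSketchDemo` are hypothetical names, and words are keyed by string here, whereas the real nodes are keyed by word id.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified trie node (not the library's class). */
final class SketchTrieNode {
    // Children keyed by word; the real trie keys children by an int word id.
    final Map<String, SketchTrieNode> children = new TreeMap<>();
    int count;

    /** Inserts one ngram below this node, incrementing the count of every node on the path. */
    void insert(List<String> ngram) {
        SketchTrieNode node = this;
        for (String word : ngram) {
            node = node.children.computeIfAbsent(word, w -> new SketchTrieNode());
            node.count++;
        }
    }

    /** Returns the total number of nodes below this one. */
    int size() {
        int total = 0;
        for (SketchTrieNode child : children.values()) {
            total += 1 + child.size();
        }
        return total;
    }
}

final class TrieSketchDemo {
    /** Builds the trie for the two 3-grams of the example sentences. */
    static SketchTrieNode build() {
        SketchTrieNode root = new SketchTrieNode();
        root.insert(List.of("this", "is", "sentence"));
        root.insert(List.of("this", "is", "what"));
        return root;
    }

    public static void main(String[] args) {
        // The shared prefix "this is" is stored once: 4 nodes for 6 inserted words.
        System.out.println(build().size());
    }
}
```

The two 3-grams share the path `this` then `is`, so only the last word of each needs its own node.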
A trie node can be static or dynamic; the two kinds serve different purposes:
- Static: static trie nodes are loaded on demand while browsing the trie structure. Their frequencies and backoff weights are precomputed and used as is by the dictionary. Static nodes don't support insertion or removal. They are useful for browsing a huge ngram trie with limited memory use.
- Dynamic: dynamic trie nodes are fully loaded (which means that the whole trie is loaded into memory) and they support insertion and removal. Their frequencies and backoff weights (bow) are computed, and can be recomputed dynamically because the count values are loaded. They are useful for training an ngram model (counting) or when the ngram trie is small (e.g. a user ngram model).
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
protected TrieNodeMap<T> children
Represents the child nodes of this node. Each child is stored under its value (= word id) and represents a possible next value. To save memory, the map is created on demand, so even if this node has children, the map can be null if the children are not loaded yet.
protected double childrenBackoffWeight
Backoff weight for the frequencies of this node's children.
protected int childrenPosition
Position of the child nodes in the file. A position in a FileChannel is a long, but to save memory the value is stored as an int (a trie file never contains more than Integer.MAX_VALUE bytes).
static int DYNAMIC_TRIE_NODE_SIZE_BYTE
Dynamic node byte size (4 integers). Integers: word id, children size, children position, count.
protected double frequency
Computed frequency for this node.
static int STATIC_TRIE_NODE_SIZE_BYTE
Static node byte size (3 integers, 2 doubles). Integers: word id, children size, children position. Doubles: frequency, backoff weight.
-
Constructor Summary
Constructors (Constructor, Description):
AbstractNGramTrieNode()
-
Method Summary
Methods (Modifier and Type, Method, Description):
void compact()
Compacts the children of this node (if this node has children).
TrieNodeMap<T> getChildren()
double getChildrenBackoffWeight()
abstract int getChildrenSize()
double getFrequency()
-
-
-
Field Detail
-
STATIC_TRIE_NODE_SIZE_BYTE
public static final int STATIC_TRIE_NODE_SIZE_BYTE
Static node byte size (3 integers, 2 doubles).
Integers: word id, children size, children position.
Doubles: frequency, backoff weight.
- See Also:
- Constant Field Values
-
DYNAMIC_TRIE_NODE_SIZE_BYTE
public static final int DYNAMIC_TRIE_NODE_SIZE_BYTE
Dynamic node byte size (4 integers). Integers: word id, children size, children position, count.
- See Also:
- Constant Field Values
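The arithmetic behind the two size constants can be checked with a short sketch. It assumes each constant is simply the sum of the listed field sizes, and that the fields are written in the order listed; `NodeSizeSketch` and its members are hypothetical names, and the actual on-disk layout used by the library may differ.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of the fixed-size node records described above. */
final class NodeSizeSketch {
    // 3 ints (word id, children size, children position) + 2 doubles (frequency, backoff weight).
    static final int STATIC_SIZE = 3 * Integer.BYTES + 2 * Double.BYTES;
    // 4 ints (word id, children size, children position, count).
    static final int DYNAMIC_SIZE = 4 * Integer.BYTES;

    /** Writes one static record; the field order is an assumption taken from the description. */
    static ByteBuffer encodeStatic(int wordId, int childrenSize, int childrenPosition,
                                   double frequency, double backoffWeight) {
        ByteBuffer buffer = ByteBuffer.allocate(STATIC_SIZE);
        buffer.putInt(wordId).putInt(childrenSize).putInt(childrenPosition);
        buffer.putDouble(frequency).putDouble(backoffWeight);
        return buffer;
    }

    /** Writes one dynamic record; the field order is an assumption taken from the description. */
    static ByteBuffer encodeDynamic(int wordId, int childrenSize, int childrenPosition, int count) {
        ByteBuffer buffer = ByteBuffer.allocate(DYNAMIC_SIZE);
        buffer.putInt(wordId).putInt(childrenSize).putInt(childrenPosition).putInt(count);
        return buffer;
    }

    public static void main(String[] args) {
        System.out.println("static record bytes: " + STATIC_SIZE);
        System.out.println("dynamic record bytes: " + DYNAMIC_SIZE);
    }
}
```

Under these assumptions a static record takes 28 bytes and a dynamic record 16 bytes, and each buffer is exactly full after its fields are written.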
-
childrenPosition
protected int childrenPosition
Contains the position of the child nodes in the file.
A position in a FileChannel is a long, but to save memory the value is stored as an int (a trie file never contains more than Integer.MAX_VALUE bytes).
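A minimal sketch of the long-to-int narrowing described above, assuming the writer wants to fail fast if a file ever grows past the int range. `PositionSketch` and its methods are hypothetical; the source does not say how the library performs the conversion.

```java
/** Hypothetical helper showing a checked long-to-int conversion for file positions. */
final class PositionSketch {
    /**
     * Narrows a channel position (a long, as returned by FileChannel.position())
     * to the int stored in the node. Math.toIntExact throws ArithmeticException
     * instead of silently truncating when the value does not fit in an int.
     */
    static int toStoredPosition(long channelPosition) {
        return Math.toIntExact(channelPosition);
    }

    /** Widens the stored int back to the long expected by FileChannel.position(long). */
    static long toChannelPosition(int storedPosition) {
        return storedPosition;
    }

    public static void main(String[] args) {
        int stored = toStoredPosition(1024L);
        System.out.println(toChannelPosition(stored));
    }
}
```

Storing an int rather than a long saves 4 bytes per node, which adds up in a trie with millions of nodes.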
-
children
protected TrieNodeMap<T extends AbstractNGramTrieNode<?>> children
Represents the child nodes of this node.
Each child is stored under its value (= word id) and represents a possible next value.
To save memory, the map is created on demand, so even if this node has children, the map can be null if the children are not loaded yet.
-
frequency
protected double frequency
Computed frequency for this node
-
childrenBackoffWeight
protected double childrenBackoffWeight
Backoff weight for the frequencies of this node's children.
-
-
Method Detail
-
getFrequency
public double getFrequency()
- Returns:
- this node's computed frequency
-
getChildrenBackoffWeight
public double getChildrenBackoffWeight()
- Returns:
- the backoff weight for this node's children
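To show how a frequency and a children backoff weight are typically combined, here is a sketch of a generic textbook backoff lookup. It is not taken from the library: `BackoffSketch`, `Node`, `find` and `probability` are hypothetical, the frequency is treated as a conditional probability, and words are keyed by string rather than word id.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a generic backoff lookup (not the library's algorithm). */
final class BackoffSketch {
    static final class Node {
        final Map<String, Node> children = new HashMap<>();
        double frequency;                   // assumed: P(word | path leading to this node)
        double childrenBackoffWeight = 1.0; // applied when a child lookup misses

        Node child(String word) {
            return children.computeIfAbsent(word, w -> new Node());
        }
    }

    /** Follows the given words from the root; returns null if the path does not exist. */
    static Node find(Node root, List<String> words) {
        Node node = root;
        for (String word : words) {
            node = node.children.get(word);
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    /** P(word | context): use the stored frequency if the ngram exists, else back off. */
    static double probability(Node root, List<String> context, String word) {
        Node contextNode = find(root, context);
        Node hit = contextNode == null ? null : contextNode.children.get(word);
        if (hit != null) {
            return hit.frequency;
        }
        if (context.isEmpty()) {
            return 0.0; // word unknown even as a unigram
        }
        // Drop the oldest context word and scale by the context's backoff weight.
        double weight = contextNode == null ? 1.0 : contextNode.childrenBackoffWeight;
        return weight * probability(root, context.subList(1, context.size()), word);
    }

    public static void main(String[] args) {
        Node root = new Node();
        root.child("what").frequency = 0.1;
        Node thisNode = root.child("this");
        thisNode.childrenBackoffWeight = 0.5;
        thisNode.child("is").frequency = 0.9;
        System.out.println(probability(root, List.of("this"), "is"));
        System.out.println(probability(root, List.of("this"), "what"));
    }
}
```

In this sketch the backoff weight lives on the context node, which matches the idea of a weight that applies to the frequencies of that node's children.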
-
getChildren
public TrieNodeMap<T> getChildren()
- Returns:
- this node's children (can be null if this node has no children, or if the children are not loaded)
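Because the returned map can be null, callers need a null check before looking up a child. The sketch below illustrates the pattern with a plain `java.util.Map`; `LazyChildrenSketch`, `addChild` and `findChild` are hypothetical and do not reflect the actual `TrieNodeMap` API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical node whose children map is created on demand. */
final class LazyChildrenSketch {
    // Stays null until the first child is added, to avoid allocating a map for leaf nodes.
    private Map<Integer, LazyChildrenSketch> children;

    /** Returns the children map, or null if no child has been added yet. */
    Map<Integer, LazyChildrenSketch> getChildren() {
        return children;
    }

    /** Adds (or returns) the child for the given word id, creating the map if needed. */
    LazyChildrenSketch addChild(int wordId) {
        if (children == null) {
            children = new HashMap<>();
        }
        return children.computeIfAbsent(wordId, id -> new LazyChildrenSketch());
    }

    /** Null-safe lookup: returns null both when the map is absent and when the child is absent. */
    LazyChildrenSketch findChild(int wordId) {
        Map<Integer, LazyChildrenSketch> map = getChildren();
        return map == null ? null : map.get(wordId);
    }

    public static void main(String[] args) {
        LazyChildrenSketch node = new LazyChildrenSketch();
        System.out.println(node.findChild(7) == null);
        node.addChild(7);
        System.out.println(node.findChild(7) != null);
    }
}
```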
-
getChildrenSize
public abstract int getChildrenSize()
- Returns:
- the number of distinct children (not the total children count)
-
compact
public void compact()
Compacts the children of this node (if this node has children).
-
-