Class AbstractNGramTrieNode<T extends AbstractNGramTrieNode<?>>

  • Type Parameters:
    T - node children type (typically this node type)
    Direct Known Subclasses:
    DynamicNGramTrieNode, StaticNGramTrieNode

    public abstract class AbstractNGramTrieNode<T extends AbstractNGramTrieNode<?>>
    extends java.lang.Object
    Represents a node in a trie structure that stores ngrams. A trie is used to save memory because ngram data is highly redundant.

    For example, the trie for the sentences "this is sentence. this is what" contains the following nodes (showing 3-grams only):

    - this
    --- is
    ----- sentence
    ----- what
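
    The listing above can be reproduced with a minimal sketch. It is not the library's implementation: the Node class, insertNGrams and render are hypothetical names, and children are keyed by the word itself here, whereas the real nodes key children by word id.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NGramTrieSketch {

    /** Hypothetical node type: one entry per distinct next word, plus an occurrence count. */
    static final class Node {
        final Map<String, Node> children = new TreeMap<>();
        int count;
    }

    /** Inserts every ngram of exactly the given order found in one tokenized sentence. */
    static void insertNGrams(Node root, List<String> tokens, int order) {
        for (int start = 0; start + order <= tokens.size(); start++) {
            Node current = root;
            for (String word : tokens.subList(start, start + order)) {
                // A shared prefix reuses the existing node instead of creating a new one.
                current = current.children.computeIfAbsent(word, w -> new Node());
                current.count++;
            }
        }
    }

    /** Renders the trie with the same dash indentation as the listing above. */
    static void render(Node node, int depth, StringBuilder out) {
        for (Map.Entry<String, Node> entry : node.children.entrySet()) {
            for (int i = 0; i < 2 * depth + 1; i++) {
                out.append('-');
            }
            out.append(' ').append(entry.getKey()).append('\n');
            render(entry.getValue(), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        Node root = new Node();
        insertNGrams(root, Arrays.asList("this", "is", "sentence"), 3);
        insertNGrams(root, Arrays.asList("this", "is", "what"), 3);
        StringBuilder out = new StringBuilder();
        render(root, 0, out);
        System.out.print(out);
    }
}
```

    Both 3-grams share the "this" and "is" nodes, so only the last word of each ngram needs a node of its own.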

    A trie node can be static or dynamic, and each kind has a different application:

    • Static: static trie nodes are loaded on demand while browsing the trie structure. Their frequencies and backoff weights are precomputed and used "as is" by the dictionary. Static nodes do not support insertion or removal. They are useful for browsing a huge ngram trie with limited memory use.
    • Dynamic: dynamic trie nodes are fully loaded (which means that the whole trie is loaded into memory) and they support insertion and removal. Their frequencies and backoff weights are computed, and can be recomputed dynamically because the count values are loaded. They are useful for training an ngram model (counting) or when the ngram trie is small (e.g. a user ngram model).
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected TrieNodeMap<T> children
      Represents the child nodes of this node.
      Each child is stored under its value (= word id) and represents a possible next value.
      To save memory, the map is created on demand, so even if this node has children, the map can be null if the children are not loaded yet.
      protected double childrenBackoffWeight
      Backoff weight applied to the frequencies of this node's children
      protected int childrenPosition
      Contains the position of the children nodes in the file.
      A position in a FileChannel is a long, but to save memory the value is stored as an int (a trie file never contains more than Integer.MAX_VALUE bytes).
      static int DYNAMIC_TRIE_NODE_SIZE_BYTE
      Dynamic node byte size (4 integers).
      Integer : word id, children size, children position, count.
      protected double frequency
      Computed frequency for this node
      static int STATIC_TRIE_NODE_SIZE_BYTE
      Static node byte size (3 integers, 2 doubles).
      Integer : word id, children size, children position.
      Double : frequency, backoff weight.
    • Field Detail

      • STATIC_TRIE_NODE_SIZE_BYTE

        public static final int STATIC_TRIE_NODE_SIZE_BYTE
        Static node byte size (3 integers, 2 doubles).
        Integer : word id, children size, children position.
        Double : frequency, backoff weight.
        See Also:
        Constant Field Values
      • DYNAMIC_TRIE_NODE_SIZE_BYTE

        public static final int DYNAMIC_TRIE_NODE_SIZE_BYTE
        Dynamic node byte size (4 integers).
        Integer : word id, children size, children position, count.
        See Also:
        Constant Field Values
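
        The two sizes can be worked out from the field descriptions alone. The sketch below is an illustration only: the constant and method names are hypothetical, the actual values are the ones listed under Constant Field Values, and the idea that records are stored back to back is an assumption that this page does not confirm.

```java
public class TrieNodeSizeSketch {

    // Static node: 3 integers (word id, children size, children position)
    // plus 2 doubles (frequency, backoff weight).
    static final int STATIC_NODE_BYTES = 3 * Integer.BYTES + 2 * Double.BYTES;

    // Dynamic node: 4 integers (word id, children size, children position, count).
    static final int DYNAMIC_NODE_BYTES = 4 * Integer.BYTES;

    /**
     * Byte offset of the record at the given index, ASSUMING fixed-size records
     * stored contiguously from firstRecordPosition (hypothetical layout).
     */
    static long recordOffset(int firstRecordPosition, int index, int recordBytes) {
        return (long) firstRecordPosition + (long) index * recordBytes;
    }

    public static void main(String[] args) {
        System.out.println("static node bytes:  " + STATIC_NODE_BYTES);
        System.out.println("dynamic node bytes: " + DYNAMIC_NODE_BYTES);
        System.out.println("third static record offset: " + recordOffset(100, 2, STATIC_NODE_BYTES));
    }
}
```

        With 4-byte integers and 8-byte doubles, this gives 28 bytes for a static node and 16 bytes for a dynamic node.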
      • childrenPosition

        protected int childrenPosition
        Contains the position of the children nodes in the file.
        A position in a FileChannel is a long, but to save memory the value is stored as an int (a trie file never contains more than Integer.MAX_VALUE bytes).
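
        A minimal sketch of how an int position can be used with a FileChannel, whose positional read takes a long. The class and method names are hypothetical, and the default big-endian byte order of ByteBuffer is an assumption; this page does not describe the byte order of the trie file.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChildrenPositionSketch {

    /** Reads one int stored at an int position; the position is widened to long for the channel. */
    static int readIntAt(FileChannel channel, int position) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES);
        while (buffer.hasRemaining()) {
            // FileChannel.read(ByteBuffer, long) expects an absolute long position.
            long absolute = (long) position + buffer.position();
            if (channel.read(buffer, absolute) < 0) {
                throw new EOFException("position " + position + " is past the end of the file");
            }
        }
        buffer.flip();
        return buffer.getInt();
    }

    /** Writes a value at the given position of a temporary file, then reads it back. */
    static int roundTrip(int position, int value) {
        try {
            Path file = Files.createTempFile("trie-sketch", ".bin");
            try (FileChannel channel = FileChannel.open(
                    file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                ByteBuffer data = ByteBuffer.allocate(Integer.BYTES).putInt(value);
                data.flip();
                while (data.hasRemaining()) {
                    channel.write(data, (long) position + data.position());
                }
                return readIntAt(channel, position);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(8, 42));
    }
}
```

        Storing the position as an int halves the memory used by this field per node, at the cost of limiting the file size to Integer.MAX_VALUE bytes.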
      • children

        protected TrieNodeMap<T> children
        Represents the child nodes of this node.
        Each child is stored under its value (= word id) and represents a possible next value.
        To save memory, the map is created on demand, so even if this node has children, the map can be null if the children are not loaded yet.
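
        Because the map can be null, callers need a null check before reading it. The sketch below illustrates the on-demand pattern with a hypothetical Node class backed by a java.util.HashMap; the real class uses TrieNodeMap, whose API is not described on this page.

```java
import java.util.HashMap;
import java.util.Map;

public class LazyChildrenSketch {

    /** Hypothetical node whose children map is only allocated when first needed. */
    static final class Node {
        private Map<Integer, Node> children; // stays null until a child is added

        /** May return null, mirroring the contract described above. */
        Map<Integer, Node> getChildren() {
            return children;
        }

        /** Creates the map on demand, then returns the child for the given word id. */
        Node getOrCreateChild(int wordId) {
            if (children == null) {
                children = new HashMap<>();
            }
            return children.computeIfAbsent(wordId, id -> new Node());
        }

        /** Null-safe lookup: returns null when the map or the child is absent. */
        Node findChild(int wordId) {
            return children == null ? null : children.get(wordId);
        }
    }

    public static void main(String[] args) {
        Node root = new Node();
        System.out.println(root.getChildren() == null); // no map allocated yet
        root.getOrCreateChild(12);
        System.out.println(root.findChild(12) != null);
    }
}
```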
      • frequency

        protected double frequency
        Computed frequency for this node
      • childrenBackoffWeight

        protected double childrenBackoffWeight
        Backoff weight applied to the frequencies of this node's children
    • Constructor Detail

      • AbstractNGramTrieNode

        public AbstractNGramTrieNode()
    • Method Detail

      • getFrequency

        public double getFrequency()
        Returns:
        the computed frequency of this node
      • getChildrenBackoffWeight

        public double getChildrenBackoffWeight()
        Returns:
        the backoff weight of this node's children
      • getChildren

        public TrieNodeMap<T> getChildren()
        Returns:
        the children of this node (can be null if this node has no children, or if the children are not loaded)
      • getChildrenSize

        public abstract int getChildrenSize()
        Returns:
        the number of distinct children (not the total children count)
      • compact

        public void compact()
        Compacts the children of this node (if this node has children).