com.globalmentor.collections.AbstractSuffixTree<CharSequenceSuffixTree.CharSequenceEdge>

com.globalmentor.collections.CharSequenceSuffixTree

All Implemented Interfaces: SuffixTree

public class CharSequenceSuffixTree extends AbstractSuffixTree<CharSequenceSuffixTree.CharSequenceEdge>

A suffix tree for a sequence of characters.

This class builds a suffix tree from a sequence of characters in O(N) time following Ukkonen's algorithm. The first version of this implementation closely followed Mark Nelson's explanation and C++ algorithm presented in Fast String Searching With Suffix Trees.

Notes:

- This implementation uses exclusive rather than inclusive end positions. Exclusive end positions are more intuitive, make calculations easier, and interact nicely with the Java API.
- The original implementation needlessly tied the edge splitting logic to a particular suffix. While this doesn't affect functionality, it fails to isolate the process of splitting an edge at a particular location, which is completely independent of any suffix; all that needs to be known is the point along the edge at which the split should occur. This implementation also splits an edge by creating two new edges rather than merely modifying one.
- The original article used a "suffix" class that held a node and character indexes. This implementation reverts to the more general "state" terminology used by Ukkonen. Furthermore, the end character index has been changed to exclusive, which makes a state "length" property more natural. It also makes the "explicit" state more readily apparent: it is the state in which start==end. Finally, these modifications reduce the state canonization logic to simply "consume edges until the next edge is not small enough to consume or the state is explicit".
- The original implementation kept a record of the current last character being added. Because the suffix/state has its endpoint incremented with every iteration, a separate last-character variable is redundant.
- The algorithm here checks to see when the state start has gone past the end; this signals that the current iteration is finished, and there is no need to loop around and check for an edge emanating from the current active node, because the state was explicit in the previous iteration so an edge had to have been created.
- The traditional algorithm has been modified slightly to construct an explicit suffix tree on the last round without the need of a unique "dummy character" appended to the string. This may result in some edges that are empty, as well as an extra, empty edge emanating from the root node representing the empty string suffix.
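
The notes on exclusive end positions and edge splitting can be illustrated with a small, self-contained sketch. The class ExclusiveSpanSketch and all of its members are hypothetical names invented for this illustration; they are not part of this library, and the code only assumes a half-open range [start, end) as described above.

```java
/** Hypothetical sketch: a half-open span [start, end) over a character sequence. */
final class ExclusiveSpanSketch {
  final int start; // inclusive
  final int end; // exclusive

  ExclusiveSpanSketch(int start, int end) {
    if (start < 0 || end < start) {
      throw new IllegalArgumentException("Invalid span: " + start + ", " + end);
    }
    this.start = start;
    this.end = end;
  }

  /** With an exclusive end, the length needs no "+ 1" correction. */
  int length() {
    return end - start;
  }

  /** A span covering no characters (start == end) corresponds to an explicit state. */
  boolean isExplicit() {
    return start == end;
  }

  /** The same indexes can be passed unchanged to CharSequence.subSequence(). */
  CharSequence of(CharSequence text) {
    return text.subSequence(start, end);
  }

  /** Splits into a near span of the given length and a far span of the remainder. */
  ExclusiveSpanSketch[] split(int length) {
    if (length < 0 || length > length()) {
      throw new IllegalArgumentException("Invalid split length: " + length);
    }
    int mid = start + length; // shared boundary: end of the near span, start of the far span
    return new ExclusiveSpanSketch[] {new ExclusiveSpanSketch(start, mid), new ExclusiveSpanSketch(mid, end)};
  }

  public static void main(String[] args) {
    String text = "banana";
    ExclusiveSpanSketch span = new ExclusiveSpanSketch(1, 4); // covers "ana"
    ExclusiveSpanSketch[] halves = span.split(1);
    // prints "ana -> a + na"
    System.out.println(span.of(text) + " -> " + halves[0].of(text) + " + " + halves[1].of(text));
  }
}
```

Because the boundary index is shared, the two resulting spans always add up to the original length, and splitting depends only on the position along the span, not on any particular suffix.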

Author:

Garret Wilson

Nested Class Summary

Nested Classes

| Modifier and Type | Class | Description |
| --- | --- | --- |
| protected static class | CharSequenceSuffixTree.AbstractEdgeKey | An abstract base class that implements hashing and equality for an edge key. |
| protected class | CharSequenceSuffixTree.CharSequenceEdge | Represents an edge between a parent node and a child node in a suffix tree. |
| protected class | CharSequenceSuffixTree.CharSequenceNode | Represents a node in a suffix tree. |
| protected static interface | CharSequenceSuffixTree.EdgeKey | A key identifying an edge of a node, uniquely identified by its parent node and first character (as no node in a suffix tree contains more than one edge starting with the same character). |
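
The idea of identifying an edge by its parent node and first character can be sketched as a small hash key. The class EdgeKeySketch, its fields, and the addEdge helper below are hypothetical and exist only for this illustration; the actual EdgeKey and AbstractEdgeKey types may be structured differently. The sketch also assumes non-empty edge labels.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a map key made of a parent node index and an edge's first character. */
final class EdgeKeySketch {
  private final int parentNodeIndex;
  private final char firstChar;

  EdgeKeySketch(int parentNodeIndex, char firstChar) {
    this.parentNodeIndex = parentNodeIndex;
    this.firstChar = firstChar;
  }

  @Override
  public boolean equals(Object object) {
    if (this == object) {
      return true;
    }
    if (!(object instanceof EdgeKeySketch)) {
      return false;
    }
    EdgeKeySketch other = (EdgeKeySketch) object;
    return parentNodeIndex == other.parentNodeIndex && firstChar == other.firstChar;
  }

  @Override
  public int hashCode() {
    return 31 * parentNodeIndex + firstChar; // consistent with equals()
  }

  /** Registers a non-empty edge label under its key, rejecting a duplicate key. */
  static void addEdge(Map<EdgeKeySketch, String> edges, int parentNodeIndex, String label) {
    EdgeKeySketch key = new EdgeKeySketch(parentNodeIndex, label.charAt(0));
    if (edges.putIfAbsent(key, label) != null) {
      throw new IllegalStateException("An edge with the same parent node and first character already exists.");
    }
  }

  public static void main(String[] args) {
    Map<EdgeKeySketch, String> edges = new HashMap<>();
    addEdge(edges, 0, "banana");
    addEdge(edges, 0, "na");
    System.out.println(edges.get(new EdgeKeySketch(0, 'n'))); // prints "na"
  }
}
```

Looking up an edge then takes a single hash lookup on the pair (parent node, next character), which is the step needed when walking down the tree one edge at a time.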

Nested classes/interfaces inherited from class com.globalmentor.collections.AbstractSuffixTree
AbstractSuffixTree.AbstractNode

Nested classes/interfaces inherited from interface com.globalmentor.collections.SuffixTree
SuffixTree.Edge, SuffixTree.Node
Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| protected void | addEdge(CharSequenceSuffixTree.CharSequenceEdge edge) | Adds an edge to the tree. |
| static CharSequenceSuffixTree | create(CharSequence charSequence) | Suffix tree builder factory method which creates a new, explicit suffix tree for a given character sequence. |
| protected static CharSequenceSuffixTree | create(CharSequence charSequence, boolean explicit) | Suffix tree builder factory method which creates a new suffix tree for a given character sequence. |
| protected CharSequenceSuffixTree.CharSequenceEdge | createEdge(SuffixTree.Node parentNode, SuffixTree.Node childNode, int start, int end) | Creates a new edge. |
| protected AbstractSuffixTree<CharSequenceSuffixTree.CharSequenceEdge>.AbstractNode | createNode(int index) | Creates a new node. |
| CharSequence | getCharSequence() | |
| Collection<? extends SuffixTree.Edge> | getEdges() | |
| CharSequenceSuffixTree.CharSequenceNode | getNode(int nodeIndex) | Retrieves the identified node. |
| CharSequenceSuffixTree.CharSequenceNode | getRootNode() | Retrieves the root node of the tree. |
| protected void | removeEdge(CharSequenceSuffixTree.CharSequenceEdge edge) | Removes an edge from the tree. |
| protected CharSequenceSuffixTree.CharSequenceNode | splitEdge(CharSequenceSuffixTree.CharSequenceEdge edge, int length) | Splits an edge into two. |
| boolean | startsWith(CharSequence charSequence) | Compares a character sequence with the characters starting at the root node and continuing along child edges. |
| boolean | startsWith(CharSequence charSequence, int start, int end) | Compares part of a character sequence with the characters starting at the root node and continuing along child edges. |

Methods inherited from class com.globalmentor.collections.AbstractSuffixTree
addEdge, addNode, getNodeCount, getNodes, isExplicit

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- getCharSequence
  
  public CharSequence getCharSequence()
  
  Returns:
  
  The character sequence represented by the suffix tree.
- getRootNode
  
  public CharSequenceSuffixTree.CharSequenceNode getRootNode()
  
  Description copied from interface: SuffixTree
  
  Retrieves the root node of the tree. This is a convenience method to retrieve the node with index zero.
  
  Specified by:
  
  getRootNode in interface SuffixTree
  
  Overrides:
  
  getRootNode in class AbstractSuffixTree<CharSequenceSuffixTree.CharSequenceEdge>
  
  Returns:
  
  The root node of the tree.
- getNode
  
  public CharSequenceSuffixTree.CharSequenceNode getNode(int nodeIndex)
  
  Description copied from interface: SuffixTree
  
  Retrieves the identified node.
  
  Specified by:
  
  getNode in interface SuffixTree
  
  Overrides:
  
  getNode in class AbstractSuffixTree<CharSequenceSuffixTree.CharSequenceEdge>
  
  Parameters:
  
  nodeIndex - The index of the node to retrieve.
  
  Returns:
  
  The identified node.
- createNode
  
  protected AbstractSuffixTree<CharSequenceSuffixTree.CharSequenceEdge>.AbstractNode createNode(int index)
  
  Description copied from class: AbstractSuffixTree
  
  Creates a new node. By default a node is considered a leaf node.
  
  Specified by:
  
  createNode in class AbstractSuffixTree<CharSequenceSuffixTree.CharSequenceEdge>
  
  Parameters:
  
  index - The index of the node to create.
  
  Returns:
  
  The newly created node.
- getEdges
  
  public Collection<? extends SuffixTree.Edge> getEdges()
  
  Returns:
  
  A read-only iterable of edges in the tree.
- createEdge
  
  protected CharSequenceSuffixTree.CharSequenceEdge createEdge(SuffixTree.Node parentNode, SuffixTree.Node childNode, int start, int end)
  
  Creates a new edge.
  
  Specified by:
  
  createEdge in class AbstractSuffixTree<CharSequenceSuffixTree.CharSequenceEdge>
  
  Parameters:
  
  parentNode - The parent node representing the root end of the edge.
  
  childNode - The child node representing the leaf end of the edge.
  
  start - The position of the start element, inclusive.
  
  end - The position of the end element, exclusive.
  
  Returns:
  
  The newly created edge.
  
  Throws:
  
  ClassCastException - if the given parent node and/or child node is not an instance of CharSequenceSuffixTree.CharSequenceNode.
- addEdge
  
  protected void addEdge(CharSequenceSuffixTree.CharSequenceEdge edge)
  
  Adds an edge to the tree.
  
  Specified by:
  
  addEdge in class AbstractSuffixTree<CharSequenceSuffixTree.CharSequenceEdge>
  
  Parameters:
  
  edge - The edge to add.
  
  Throws:
  
  NullPointerException - if the given edge is null.
  
  IllegalStateException - if there already exists an edge with the same parent node and first character.
- removeEdge
  
  protected void removeEdge(CharSequenceSuffixTree.CharSequenceEdge edge)
  
  Removes an edge from the tree. If the edge does not exist, no action occurs.
  
  Specified by:
  
  removeEdge in class AbstractSuffixTree<CharSequenceSuffixTree.CharSequenceEdge>
  
  Parameters:
  
  edge - The edge to remove.
  
  Throws:
  
  NullPointerException - if the given edge is null.
- splitEdge
  
  protected CharSequenceSuffixTree.CharSequenceNode splitEdge(CharSequenceSuffixTree.CharSequenceEdge edge, int length)
  
  Description copied from class: AbstractSuffixTree
  
  Splits an edge into two. The first, near edge will be of the given length; the second, far edge will be of the remaining length (that is, the length of the original edge minus the given length). A new node will be created as the mid-point between the original edge nodes, becoming the child node of the first edge and the parent node of the second edge.
  
  Overrides:
  
  splitEdge in class AbstractSuffixTree<CharSequenceSuffixTree.CharSequenceEdge>
  
  Parameters:
  
  edge - The edge to split.
  
  length - The length of the first, near edge; that is, the position along the edge at which to split.
  
  Returns:
  
  The created node splitting the edge.
- startsWith
  
  public boolean startsWith(CharSequence charSequence)
  
  Compares a character sequence with the characters starting at the root node and continuing along child edges.
  
  Parameters:
  
  charSequence - The character sequence to compare.
  
  Returns:
  
  true if there is a path matching the given character sequence starting at the root node and continuing along child edges.
  
  See Also:
  
  CharSequenceSuffixTree.CharSequenceNode.startsWith(CharSequence)
  
  CharSequenceSuffixTree.CharSequenceEdge.startsWith(CharSequence)
- startsWith
  
  public boolean startsWith(CharSequence charSequence, int start, int end)
  
  Compares part of a character sequence with the characters starting at the root node and continuing along child edges.
  
  Parameters:
  
  charSequence - The character sequence to compare.
  
  start - The start of the character sequence to compare, inclusive.
  
  end - The end of the character sequence to compare, exclusive.
  
  Returns:
  
  true if there is a path matching the given character sequence starting at the root node and continuing along child edges.
  
  Throws:
  
  StringIndexOutOfBoundsException - if start or end are negative or greater than length(), or start is greater than end.
  
  See Also:
  
  CharSequenceSuffixTree.CharSequenceNode.startsWith(CharSequence, int, int)
  
  CharSequenceSuffixTree.CharSequenceEdge.startsWith(CharSequence, int, int)
- create
  
  public static CharSequenceSuffixTree create(CharSequence charSequence)
  
  Suffix tree builder factory method which creates a new, explicit suffix tree for a given character sequence. The created suffix tree will have one more leaf node than the number of characters in the sequence, because there will exist an empty edge from the root indicating the empty string.
  
  Parameters:
  
  charSequence - The character sequence for which a suffix tree should be built.
  
  Returns:
  
  The new suffix tree for the given character sequence.
  
  Throws:
  
  NullPointerException - if the given character sequence is null.
- create
  
  protected static CharSequenceSuffixTree create(CharSequence charSequence, boolean explicit)
  
  Suffix tree builder factory method which creates a new suffix tree for a given character sequence. If an explicit suffix tree is requested, the created suffix tree will have one more leaf node than the number of characters in the sequence, because there will exist an empty edge from the root indicating the empty string.
  
  Parameters:
  
  charSequence - The character sequence for which a suffix tree should be built.
  
  explicit - Whether an explicit suffix tree should be constructed.
  
  Returns:
  
  The new suffix tree for the given character sequence.
  
  Throws:
  
  NullPointerException - if the given character sequence is null.
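
The leaf count described for create and the matching performed by startsWith can be pictured with a deliberately naive sketch. It assumes the usual suffix-tree property that every path from the root spells the beginning of some suffix, so a query matches exactly when it occurs somewhere in the text. The class NaiveSuffixSketch and its methods are hypothetical, use no part of this library, and run in quadratic time rather than the O(N) construction described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: enumerates suffixes directly instead of building a tree. */
final class NaiveSuffixSketch {

  /** Lists every suffix of the text, including the empty suffix. */
  static List<String> suffixes(CharSequence text) {
    List<String> result = new ArrayList<>();
    for (int start = 0; start <= text.length(); start++) { // "<=" includes the empty suffix
      result.add(text.subSequence(start, text.length()).toString());
    }
    return result;
  }

  /** Reports whether the query is the beginning of at least one suffix of the text. */
  static boolean matchesSomeSuffix(CharSequence text, CharSequence query) {
    String prefix = query.toString();
    for (String suffix : suffixes(text)) {
      if (suffix.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(suffixes("banana").size()); // prints "7": six characters plus the empty suffix
    System.out.println(matchesSomeSuffix("banana", "nan")); // prints "true"
    System.out.println(matchesSomeSuffix("banana", "nab")); // prints "false"
  }
}
```

In an explicit tree each of these suffixes would end at its own leaf, which is consistent with the statement that the tree has one more leaf node than the number of characters in the sequence.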
