org.carrot2.text.suffixtree
Class SuffixTree

java.lang.Object
  extended by org.carrot2.text.suffixtree.SuffixTree

public final class SuffixTree
extends Object

Builds a suffix tree (or generalized suffix tree) on a sequence of any integers (or objects that can be represented as unique integers). A direct implementation of Esko Ukkonen's algorithm, but optimized for Java to use primitive data types instead of objects (or boxed types).

See Also:
"E. Ukkonen, On-line construction of suffix trees, Algorithmica, 1995, volume 14, number 3, pages 249-260."

Nested Class Summary
static interface SuffixTree.IProgressCallback
          Progress callback is invoked when iterating forward through the input sequence elements.
static interface SuffixTree.IStateCallback
          A callback invoked when new states are added to the tree.
static interface SuffixTree.IVisitor
          Visitor interface for traversals.
static class SuffixTree.VisitorAdapter
          Empty implementation recursively walking the entire suffix tree.
 
Field Summary
static int NO_EDGE
          Marker for the state's last edge in transitions.
 
Constructor Summary
SuffixTree(ISequence sequence, SuffixTree.IStateCallback newStateCallback, SuffixTree.IProgressCallback progressCallback)
          Build a suffix tree for a given input sequence of symbols.
 
Method Summary
 boolean containsSuffix(ISequence seq)
           
 int findEdge(int state, int symbol)
          Find a transition from state state, labeled with a given symbol.
 int firstEdge(int state)
          Returns the index of the first edge from a given state or NO_EDGE if a given state has no edges.
 int getEndIndex(int edge)
          Returns the edge label's end index (inclusive).
 int getRootState()
          For procedural traversals (not visitors).
 int getStartIndex(int edge)
          Returns the edge label's start index (inclusive).
 int getStatesCount()
           
 int getToState(int edge)
          Returns the target state for a given edge.
 int getTransitionsCount()
           
 boolean isLeaf(int state)
          Check if state is a leaf (has no outgoing edges).
 int nextEdge(int edge)
          Returns the index of the next edge (sibling) or NO_EDGE if edge is the last edge in its state.
 void visit(SuffixTree.IVisitor visitor)
          Walks the states and edges of the suffix tree, depth-first.
 void visitState(int state, SuffixTree.IVisitor visitor)
          Start visiting from a given state.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NO_EDGE

public static final int NO_EDGE
Marker for the state's last edge in transitions.

See Also:
Constant Field Values
Constructor Detail

SuffixTree

public SuffixTree(ISequence sequence,
                  SuffixTree.IStateCallback newStateCallback,
                  SuffixTree.IProgressCallback progressCallback)
Build a suffix tree for a given input sequence of symbols.

Method Detail

getTransitionsCount

public final int getTransitionsCount()
Returns:
Return the number of transitions (edges) in the tree.

getStatesCount

public final int getStatesCount()
Returns:
Return the number of states in the tree.

containsSuffix

public boolean containsSuffix(ISequence seq)
Returns:
true if this suffix tree has a path from the root state to a leaf state corresponding to a given sequence of objects. This indicates the input sequence had a suffix identical to sequence.

visit

public final void visit(SuffixTree.IVisitor visitor)
Walks the states and edges of the suffix tree, depth-first.


visitState

public final void visitState(int state,
                             SuffixTree.IVisitor visitor)
Start visiting from a given state.


getRootState

public int getRootState()
For procedural traversals (not visitors).


isLeaf

public final boolean isLeaf(int state)
Check if state is a leaf (has no outgoing edges).


firstEdge

public final int firstEdge(int state)
Returns the index of the first edge from a given state or NO_EDGE if a given state has no edges. Does not perform any sanity check on the input state.


nextEdge

public final int nextEdge(int edge)
Returns the index of the next edge (sibling) or NO_EDGE if edge is the last edge in its state.


findEdge

public final int findEdge(int state,
                          int symbol)
Find a transition from state state, labeled with a given symbol. NO_EDGE is returned if there is no such edge.


getToState

public int getToState(int edge)
Returns the target state for a given edge.


getStartIndex

public int getStartIndex(int edge)
Returns the edge label's start index (inclusive).


getEndIndex

public int getEndIndex(int edge)
Returns the edge label's end index (inclusive).



Copyright (c) Dawid Weiss, Stanislaw Osinski