Class GramSplitter


  • public class GramSplitter
    extends java.lang.Object
    A class which splits consecutive word character sequences into overlapping character n-grams. For example "en gul bille sang" split into 2-grams becomes "en gu ul bi il ll le sa an ng", and split into 3-grams becomes "en gul bil ill lle san ang".
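The splitting rule can be illustrated with a minimal hypothetical sketch. `NGramSketch` is an illustrative class, not part of this API, and it makes two assumptions. First, a "word character" is anything `Character.isLetterOrDigit` accepts; the real class may classify characters differently. Second, a word no longer than n is emitted whole, which is inferred from "en" appearing unchanged in the 3-gram example. The sketch works on `char`s, so it does not handle supplementary code points.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: splits runs of word characters into overlapping n-grams. */
public class NGramSketch {

    // Assumption: letters and digits are word characters; everything else separates words.
    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c);
    }

    public static List<String> split(String input, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be at least 1, got " + n);
        List<String> grams = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            // Skip separators between words
            if (!isWordChar(input.charAt(i))) { i++; continue; }
            int start = i;
            while (i < input.length() && isWordChar(input.charAt(i))) i++;
            int end = i; // exclusive end of the current word
            if (end - start <= n) {
                grams.add(input.substring(start, end)); // short word: emitted whole
            } else {
                // Slide a window of size n one character at a time across the word
                for (int s = start; s + n <= end; s++) grams.add(input.substring(s, s + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", split("en gul bille sang", 2))); // en gu ul bi il ll le sa an ng
        System.out.println(String.join(" ", split("en gul bille sang", 3))); // en gul bil ill lle san ang
    }
}
```

Grams never cross word boundaries, because each run of word characters is windowed independently.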

    This class is thread-safe.

    Author:
    bratseth
    • Method Summary

      Modifier and Type | Method | Description
      GramSplitter.GramSplitterIterator | split(java.lang.String input, int n) | Splits the input into grams of size n and returns an iterator over grams represented as [start index, length] pairs into the input string.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

    • Method Detail

      • split

        public GramSplitter.GramSplitterIterator split(java.lang.String input,
                                                       int n)
        Splits the input into grams of size n and returns an iterator over grams represented as [start index,length] pairs into the input string.

        The iterator is implemented as a sliding view over the input string rather than being backed by a list, which makes it space-efficient for large strings.
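A sliding view of this kind can be sketched as an iterator that computes each [start index, length] pair on demand and stores only a few cursor fields. `GramViewSketch` is a hypothetical illustration, not the actual `GramSplitter.GramSplitterIterator`. It assumes `int[]` pairs, letter/digit word characters, and whole emission of words no longer than n. It also enforces the documented argument checks and stays read-only through the default `Iterator.remove()`.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Objects;

/** Hypothetical sketch: lazily yields grams as [start, length] pairs without building a list. */
public class GramViewSketch implements Iterator<int[]> {

    private final String input;
    private final int n;
    private int scan = 0;      // next index not yet assigned to a word
    private int gramStart = 0; // start of the next full-size gram in the current word
    private int wordEnd = 0;   // exclusive end of the current word
    private int[] next;        // pair to return next, or null when exhausted

    public GramViewSketch(String input, int n) {
        this.input = Objects.requireNonNull(input, "input cannot be null");
        if (n < 1) throw new IllegalArgumentException("n must be at least 1, got " + n);
        this.n = n;
        this.next = advance();
    }

    // Assumption: letters and digits are word characters.
    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c);
    }

    /** Computes the next [start, length] pair, or null when the input is exhausted. */
    private int[] advance() {
        if (gramStart + n <= wordEnd) { // another full-size gram fits in the current word
            return new int[] {gramStart++, n};
        }
        // Current word exhausted: move to the next run of word characters
        while (scan < input.length() && !isWordChar(input.charAt(scan))) scan++;
        if (scan >= input.length()) return null;
        int start = scan;
        while (scan < input.length() && isWordChar(input.charAt(scan))) scan++;
        wordEnd = scan;
        if (wordEnd - start <= n) { // short word: a single gram covering the whole word
            gramStart = wordEnd;
            return new int[] {start, wordEnd - start};
        }
        gramStart = start + 1;
        return new int[] {start, n};
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public int[] next() {
        if (next == null) throw new NoSuchElementException();
        int[] gram = next;
        next = advance();
        return gram;
    }

    // remove() is not overridden: Iterator's default throws UnsupportedOperationException,
    // so the view is read-only.

    public static void main(String[] args) {
        String text = "en gul bille sang";
        StringBuilder out = new StringBuilder();
        for (Iterator<int[]> it = new GramViewSketch(text, 3); it.hasNext(); ) {
            int[] g = it.next();
            out.append(text, g[0], g[0] + g[1]).append(' '); // map the pair back to a substring
        }
        System.out.println(out.toString().trim()); // en gul bil ill lle san ang
    }
}
```

Because callers receive offsets rather than substrings, no gram text is copied unless the caller asks for it, and memory use does not grow with the number of grams.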

        Parameters:
        input - the input string to be split, cannot be null
        n - the gram size, a positive integer
        Returns:
        a read only iterator over the resulting grams
        Throws:
        java.lang.NullPointerException - if input==null
        java.lang.IllegalArgumentException - if n is less than 1