Class GramSplitter

java.lang.Object
com.yahoo.language.process.GramSplitter

public class GramSplitter extends Object
Splits consecutive sequences of word characters into overlapping character n-grams. For example, "en gul bille sang" split into 2-grams becomes "en gu ul bi il ll le sa an ng", and split into 3-grams becomes "en gul bil ill lle san ang". As the 3-gram example shows, a word shorter than n (here "en") is emitted whole.
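
The splitting rule in the example above can be sketched in plain Java. This is an illustration only, not the GramSplitter implementation: the class name NGramSketch and the method grams are hypothetical, and Character.isLetterOrDigit is assumed here as a stand-in for whatever definition of "word character" the real class uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of com.yahoo.language.process.
final class NGramSketch {

    /** Returns the overlapping n-grams of every run of letters/digits in input. */
    static List<String> grams(String input, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be at least 1");
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            // Skip characters that do not belong to a word (assumed: not a letter or digit).
            if (!Character.isLetterOrDigit(input.charAt(i))) {
                i++;
                continue;
            }
            // Find the exclusive end of the current word.
            int end = i;
            while (end < input.length() && Character.isLetterOrDigit(input.charAt(end))) {
                end++;
            }
            if (end - i <= n) {
                // A word no longer than n yields a single gram: the word itself.
                result.add(input.substring(i, end));
            } else {
                // Slide a window of width n across the word, one character at a time.
                for (int start = i; start + n <= end; start++) {
                    result.add(input.substring(start, start + n));
                }
            }
            i = end;
        }
        return result;
    }

    public static void main(String[] args) {
        // prints "en gu ul bi il ll le sa an ng"
        System.out.println(String.join(" ", grams("en gul bille sang", 2)));
        // prints "en gul bil ill lle san ang"
        System.out.println(String.join(" ", grams("en gul bille sang", 3)));
    }
}
```

Unlike the class documented here, this sketch materializes all grams in a list, which is simpler to read but uses memory proportional to the input.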

This class is thread-safe.

Author:
bratseth
  • Constructor Details

  • Method Details

    • split

      public GramSplitter.GramSplitterIterator split(String input, int n)
      Splits the input into grams of size n and returns an iterator over the grams, each represented as a [start index, length] pair referring to the input string.

      The iterator is implemented as a sliding view over the input string rather than being backed by a list, so no copy of the grams is stored and memory use stays low for large strings.

      Parameters:
      input - the input string to be split, cannot be null
      n - the gram size, a positive integer
      Returns:
      a read-only iterator over the resulting grams
      Throws:
      NullPointerException - if input is null
      IllegalArgumentException - if n is less than 1
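
A minimal sketch of the "sliding view" idea described for split, assuming plain Java only. It is not the actual GramSplitterIterator: the class name GramViewSketch is hypothetical, each gram is returned as an int[] {start, length} for illustration, and Character.isLetterOrDigit is an assumed stand-in for the real word-character test. The iterator keeps two indices into the input and allocates no list of grams.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Objects;

// Hypothetical illustration; not the iterator returned by GramSplitter.split.
final class GramViewSketch implements Iterator<int[]> {

    private final String input;
    private final int n;
    private int pos = 0;      // start index of the next gram, or where to resume scanning
    private int wordEnd = 0;  // exclusive end of the word currently being split

    GramViewSketch(String input, int n) {
        this.input = Objects.requireNonNull(input, "input cannot be null");
        if (n < 1) throw new IllegalArgumentException("n must be at least 1, got " + n);
        this.n = n;
    }

    @Override
    public boolean hasNext() {
        if (pos >= wordEnd) {
            // The current word is exhausted: skip separators, then locate the next word.
            while (pos < input.length() && !Character.isLetterOrDigit(input.charAt(pos))) pos++;
            wordEnd = pos;
            while (wordEnd < input.length() && Character.isLetterOrDigit(input.charAt(wordEnd))) wordEnd++;
        }
        return pos < wordEnd;
    }

    @Override
    public int[] next() {
        if (!hasNext()) throw new NoSuchElementException();
        // A word shorter than n produces one gram covering the whole word.
        int length = Math.min(n, wordEnd - pos);
        int[] gram = {pos, length};
        // After the last full window of a word, jump to the end of that word.
        pos = (pos + n >= wordEnd) ? wordEnd : pos + 1;
        return gram;
    }

    // remove() is not overridden, so the default throws UnsupportedOperationException,
    // which makes this iterator read-only.

    public static void main(String[] args) {
        String text = "en gul bille sang";
        GramViewSketch grams = new GramViewSketch(text, 3);
        while (grams.hasNext()) {
            int[] g = grams.next();
            // Resolve the [start, length] pair against the original string.
            System.out.println(g[0] + "," + g[1] + " -> " + text.substring(g[0], g[0] + g[1]));
        }
    }
}
```

Returning index pairs rather than substrings lets the caller decide whether to copy any characters at all, which is the space saving the description above refers to.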