Package com.yahoo.language.process
Class GramSplitter
java.lang.Object
    com.yahoo.language.process.GramSplitter
public class GramSplitter extends Object
A class which splits consecutive word character sequences into overlapping character n-grams. For example, "en gul bille sang" split into 2-grams becomes "en gu ul bi il ll le sa an ng", and split into 3-grams becomes "en gul bil ill lle san ang". As the 3-gram case shows, a word shorter than n ("en") is kept whole as a single gram. This class is thread safe.
Author:
    bratseth

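The splitting rule in the class description can be illustrated with a small standalone sketch. It is not the GramSplitter implementation and does not use CharacterClasses. The class name GramSketch is hypothetical. A word character is assumed to be anything Character.isLetterOrDigit accepts, and each UTF-16 char is treated as one character. As in the documented example, a word no longer than n is emitted whole.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of splitting word character runs into overlapping
 * character n-grams (not the Vespa GramSplitter). Assumptions: a word
 * character is one accepted by Character.isLetterOrDigit, and each UTF-16
 * char counts as one character (surrogate pairs are not handled specially).
 */
final class GramSketch {

    static List<String> split(String input, int n) {
        if (input == null) throw new NullPointerException("input cannot be null");
        if (n < 1) throw new IllegalArgumentException("n must be at least 1, got " + n);
        List<String> grams = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            // Skip separators between words.
            while (i < input.length() && !Character.isLetterOrDigit(input.charAt(i))) i++;
            int start = i;
            // Advance to the exclusive end of the current run of word characters.
            while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
            int length = i - start;
            if (length == 0) break; // only separators were left
            if (length <= n) {
                grams.add(input.substring(start, i)); // a word no longer than n is kept whole
            } else {
                // Slide a window of width n one character at a time across the word.
                for (int g = start; g + n <= i; g++) grams.add(input.substring(g, g + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", split("en gul bille sang", 2))); // prints "en gu ul bi il ll le sa an ng"
        System.out.println(String.join(" ", split("en gul bille sang", 3))); // prints "en gul bil ill lle san ang"
    }
}
```

Collecting substrings into a List keeps the sketch short. The documented split method instead returns an iterator over [start index, length] pairs into the input string.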
Nested Class Summary
Nested Classes
Modifier and Type   Class                               Description
static class        GramSplitter.Gram                   An immutable start index and length pair.
static class        GramSplitter.GramSplitterIterator

Constructor Summary
Constructors
Constructor                                       Description
GramSplitter(CharacterClasses characterClasses)

Method Summary
Modifier and Type                   Method                       Description
GramSplitter.GramSplitterIterator   split(String input, int n)   Splits the input into grams of size n and returns an iterator over grams represented as [start index, length] pairs into the input string.

Constructor Detail

GramSplitter
public GramSplitter(CharacterClasses characterClasses)

Method Detail

split
public GramSplitter.GramSplitterIterator split(String input, int n)
Splits the input into grams of size n and returns an iterator over grams represented as [start index, length] pairs into the input string. The iterator is a sliding view over the input string rather than a list of precomputed grams, so it stays space efficient even for large strings.
Parameters:
    input - the input string to be split, cannot be null
    n - the gram size, a positive integer
Returns:
    a read only iterator over the resulting grams
Throws:
    NullPointerException - if input == null
    IllegalArgumentException - if n is less than 1
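The sliding-view idea in the method detail can be sketched as an Iterator that computes each gram on demand, keeping only a few int fields of state instead of a list of grams. This is a hypothetical illustration, not the library's GramSplitterIterator. The names SlidingGramIterator and Span are invented here, and word characters are again assumed to be those accepted by Character.isLetterOrDigit. The exceptions mirror the documented contract: NullPointerException for a null input and IllegalArgumentException for n less than 1.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.StringJoiner;

/**
 * Hypothetical sliding-view gram iterator (not the Vespa class). Grams are
 * computed lazily as (start, length) spans into the input; no list is built.
 * Assumption: word characters are those accepted by Character.isLetterOrDigit.
 */
final class SlidingGramIterator implements Iterator<SlidingGramIterator.Span> {

    /** Immutable start index and length pair into the input string. */
    static final class Span {
        final int start;
        final int length;

        Span(int start, int length) {
            this.start = start;
            this.length = length;
        }

        String extractFrom(String s) {
            return s.substring(start, start + length);
        }
    }

    private final String input;
    private final int n;
    private int wordEnd = 0;   // exclusive end of the word currently being split
    private int gramStart = 0; // start of the next n-gram inside the current word
    private Span pending;      // lookahead filled by hasNext(), null if not computed yet

    SlidingGramIterator(String input, int n) {
        this.input = Objects.requireNonNull(input, "input cannot be null");
        if (n < 1) throw new IllegalArgumentException("n must be at least 1, got " + n);
        this.n = n;
    }

    private Span computeNext() {
        // Slide one character further if another full gram fits in the current word.
        if (gramStart + n <= wordEnd) {
            return new Span(gramStart++, n);
        }
        // Otherwise locate the next run of word characters after the current word.
        int start = wordEnd;
        while (start < input.length() && !Character.isLetterOrDigit(input.charAt(start))) start++;
        if (start >= input.length()) return null; // no more words
        int end = start;
        while (end < input.length() && Character.isLetterOrDigit(input.charAt(end))) end++;
        wordEnd = end;
        if (end - start <= n) {
            gramStart = end; // a word no longer than n is a single gram
            return new Span(start, end - start);
        }
        gramStart = start + 1;
        return new Span(start, n);
    }

    @Override
    public boolean hasNext() {
        if (pending == null) pending = computeNext();
        return pending != null;
    }

    @Override
    public Span next() {
        if (!hasNext()) throw new NoSuchElementException();
        Span result = pending;
        pending = null;
        return result;
    }

    public static void main(String[] args) {
        String text = "en gul bille sang";
        StringJoiner grams = new StringJoiner(" ");
        for (Iterator<Span> it = new SlidingGramIterator(text, 3); it.hasNext(); ) {
            grams.add(it.next().extractFrom(text));
        }
        System.out.println(grams); // prints "en gul bil ill lle san ang"
    }
}
```

Because the sketch does not override Iterator.remove, calling remove() throws UnsupportedOperationException, which keeps the iterator read only.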