Package com.yahoo.language.process
Class GramSplitter
java.lang.Object
com.yahoo.language.process.GramSplitter
A class which splits consecutive word character sequences into overlapping character n-grams.
For example "en gul bille sang" split into 2-grams becomes
"en gu ul bi il ll le sa an ng", and split into 3-grams becomes "en gul bil ill lle san ang".
This class is thread-safe.
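The splitting rule described above can be illustrated with a minimal, self-contained sketch. This is not the GramSplitter implementation: the class name GramSketch and the method grams are hypothetical, and the sketch assumes that a "word character" is whatever Character.isLetterOrDigit accepts, which may differ from the character classes the real class uses. It also builds a list of strings for readability rather than returning index pairs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of com.yahoo.language.process.
final class GramSketch {

    /** Returns the overlapping n-grams of every run of letters/digits in the input. */
    static List<String> grams(String input, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be at least 1, got " + n);
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            // Assumption: word characters are letters and digits (char-based, ignores surrogate pairs)
            if (!Character.isLetterOrDigit(input.charAt(i))) {
                i++;
                continue;
            }
            // Find the end (exclusive) of the current word
            int end = i;
            while (end < input.length() && Character.isLetterOrDigit(input.charAt(end))) end++;
            if (end - i <= n) {
                // Words no longer than n are kept whole, like "en" in the 3-gram example
                result.add(input.substring(i, end));
            } else {
                // Slide a window of size n over the word, one character at a time
                for (int start = i; start + n <= end; start++) {
                    result.add(input.substring(start, start + n));
                }
            }
            i = end;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", grams("en gul bille sang", 2))); // en gu ul bi il ll le sa an ng
        System.out.println(String.join(" ", grams("en gul bille sang", 3))); // en gul bil ill lle san ang
    }
}
```

Grams never span a separator, so each word is windowed independently; this is why "en" survives unchanged in both outputs.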
- Author:
  bratseth
-
Nested Class Summary
Modifier and Type | Class | Description
static final class | | An immutable start index and length pair
static class | |
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
 | split | Splits the input into grams of size n and returns an iterator over grams represented as [start index, length] pairs into the input string.
-
Constructor Details
-
GramSplitter
-
-
Method Details
-
split
Splits the input into grams of size n and returns an iterator over grams represented as [start index, length] pairs into the input string.
The iterator is implemented as a sliding view over the input string rather than being backed by a list, which makes this space efficient for large strings.
- Parameters:
  input - the input string to be split, cannot be null
  n - the gram size, a positive integer
- Returns:
  a read-only iterator over the resulting grams
- Throws:
  NullPointerException - if input is null
  IllegalArgumentException - if n is less than 1
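To make the "sliding view" idea concrete, here is a hedged sketch of an iterator that yields [start index, length] pairs lazily instead of materializing a list. It is an illustration under stated assumptions, not the library's code: GramWindowSketch, Span, and extractFrom are hypothetical names, and word characters are assumed to be those accepted by Character.isLetterOrDigit. The argument checks mirror the documented contract (null input and n less than 1 are rejected).

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Objects;

// Hypothetical illustration only; not part of com.yahoo.language.process.
final class GramWindowSketch implements Iterator<GramWindowSketch.Span> {

    /** Hypothetical stand-in for an immutable [start index, length] pair. */
    static final class Span {
        final int start;
        final int length;

        Span(int start, int length) {
            this.start = start;
            this.length = length;
        }

        String extractFrom(String input) {
            return input.substring(start, start + length);
        }
    }

    private final String input;
    private final int n;
    private int pos = 0; // where the next gram search begins
    private Span next;   // lookahead; null means "not computed yet" or "exhausted"

    GramWindowSketch(String input, int n) {
        this.input = Objects.requireNonNull(input, "input cannot be null");
        if (n < 1) throw new IllegalArgumentException("n must be at least 1, got " + n);
        this.n = n;
    }

    @Override
    public boolean hasNext() {
        if (next == null) next = advance();
        return next != null;
    }

    @Override
    public Span next() {
        if (!hasNext()) throw new NoSuchElementException();
        Span result = next;
        next = null;
        return result;
    }

    // remove() is not overridden, so the Iterator default throws
    // UnsupportedOperationException, which keeps the iterator read-only.

    private boolean isWordChar(int index) {
        // Assumption: letters and digits are word characters
        return Character.isLetterOrDigit(input.charAt(index));
    }

    private Span advance() {
        // Skip separators before the next gram
        while (pos < input.length() && !isWordChar(pos)) pos++;
        if (pos >= input.length()) return null;
        // Grow the window up to n word characters
        int end = pos;
        while (end < input.length() && end - pos < n && isWordChar(end)) end++;
        Span span = new Span(pos, end - pos);
        // If the window touched the end of the word, jump past it; otherwise slide by one
        if (end >= input.length() || !isWordChar(end)) pos = end;
        else pos++;
        return span;
    }

    public static void main(String[] args) {
        String text = "en gul bille sang";
        GramWindowSketch grams = new GramWindowSketch(text, 3);
        StringBuilder out = new StringBuilder();
        while (grams.hasNext()) {
            if (out.length() > 0) out.append(' ');
            out.append(grams.next().extractFrom(text));
        }
        System.out.println(out); // en gul bil ill lle san ang
    }
}
```

Because only the current position and one lookahead pair are held, memory use stays constant regardless of input length; the caller extracts substrings from the original string only when needed.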
-