Create a new NGrams instance.
low: the smallest size of the generated n-grams
high: the largest size of the generated n-grams, or -1 for the full length of the input Seq[String]
sep: a string separator used to join individual tokens
Transform a collection of sentences, where each row is a Seq[String] of the words / tokens, into a collection containing all the n-grams that can be constructed from each row. The feature representation is an n-hot encoding (see NHotEncoder) constructed from an expanded vocabulary of all of the generated n-grams.

N-grams are generated based on a specified range of low to high (inclusive) and are joined by the given sep (default is " "). For example, with low = 2, high = 3 and sep = "", row ["a", "b", "c", "d", "e"] would produce ["ab", "bc", "cd", "de", "abc", "bcd", "cde"].

As with NHotEncoder, missing values are transformed to [0.0, 0.0, ...].
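The behaviour described above can be illustrated with a small standalone sketch in plain Scala. It is not the library's implementation: the object NGramsSketch and the helpers ngrams and nHot are hypothetical names introduced here, and the sorted vocabulary order and the use of Option to model missing values are assumptions made for the illustration.

```scala
object NGramsSketch {
  // Hypothetical helper: expand one row of tokens into all n-grams
  // with sizes in the inclusive range low..high, joined by sep.
  def ngrams(tokens: Seq[String], low: Int, high: Int, sep: String = " "): Seq[String] = {
    require(low >= 1, "low must be at least 1")
    // high == -1 means "up to the full length of the input row".
    val max = if (high == -1) tokens.length else math.min(high, tokens.length)
    for {
      n <- low to max
      gram <- tokens.sliding(n)
    } yield gram.mkString(sep)
  }

  // Hypothetical helper: n-hot encode rows against the expanded vocabulary.
  // Assumption: the vocabulary is the sorted set of all generated n-grams.
  // Missing rows (None) become all-zero vectors.
  def nHot(
    rows: Seq[Option[Seq[String]]],
    low: Int,
    high: Int,
    sep: String = " "
  ): (Seq[String], Seq[Seq[Double]]) = {
    val expanded = rows.map(_.map(ngrams(_, low, high, sep)))
    val vocab = expanded.flatten.flatten.distinct.sorted
    val vectors = expanded.map {
      case Some(grams) =>
        val present = grams.toSet
        vocab.map(g => if (present(g)) 1.0 else 0.0)
      case None =>
        vocab.map(_ => 0.0)
    }
    (vocab, vectors)
  }
}
```

Calling NGramsSketch.ngrams(Seq("a", "b", "c", "d", "e"), 2, 3, "") reproduces the example row above, with the 2-grams listed before the 3-grams.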