-
Determines the class of a given character.
A class which splits consecutive word character sequences into overlapping character n-grams.
This interface provides NFKC normalization of Strings through the underlying linguistics library.
Interface providing segmentation.
Interface providing stemming of single words.
Language-sensitive tokenization of a text string.
Interface for providers of text transformations such as accent removal.
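The n-gram entry above can be illustrated with a minimal sketch. It is not the library's own splitter: the class name GramSketch and the method grams are hypothetical, "word character" is assumed to mean a letter or digit, and emitting a word shorter than the gram size as a single gram is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the documented package.
final class GramSketch {

    /** Splits each run of letters/digits in text into overlapping grams of length n. */
    static List<String> grams(String text, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be positive");
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip characters that are not word characters (assumed: letters and digits).
            while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) i++;
            int start = i;
            // Consume one consecutive run of word characters.
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
            addGrams(text.substring(start, i), n, result);
        }
        return result;
    }

    // Works on UTF-16 code units; supplementary characters are not handled specially.
    private static void addGrams(String word, int n, List<String> out) {
        if (word.isEmpty()) return;
        if (word.length() <= n) { // assumption: short words become a single gram
            out.add(word);
            return;
        }
        for (int k = 0; k + n <= word.length(); k++) {
            out.add(word.substring(k, k + n));
        }
    }

    public static void main(String[] args) {
        System.out.println(grams("en gul bille", 2)); // prints [en, gu, ul, bi, il, ll, le]
    }
}
```

Grams never span a word boundary in this sketch, so whitespace and punctuation only act as separators.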
-
Determines the class of a given character.
An embedder converts a text string to a tensor.
An immutable start index and length pair.
Interface providing segmentation.
An immutable list of special tokens - strings which should override the normal tokenizer semantics and be tokenized into a single token.
An immutable special token.
A list of strings which does not allow for duplicate elements.
Interface providing stemming of single words.
An enum of the stemming modes which can be requested.
A single token produced by the tokenizer.
Language-sensitive tokenization of a text string.
List of token scripts.
An enumeration of token types.
-
Determines the class of a given character.
A class which splits consecutive word character sequences into overlapping character n-grams.
This interface provides NFKC normalization of Strings through the underlying linguistics library.
Interface providing segmentation.
Immutable named lists of "special tokens" - strings which should override the normal tokenizer semantics and be tokenized into a single token.
Interface providing stemming of single words.
An enum of the stemming modes which can be requested.
A single token produced by the tokenizer.
Language-sensitive tokenization of a text string.
List of token scripts.
An enumeration of token types.
Interface for providers of text transformations such as accent removal.
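The NFKC normalization and accent removal entries can be illustrated with the JDK alone. This is a sketch under assumptions, not the behaviour of the underlying linguistics library: the class name NormalizeSketch and its methods are hypothetical, and accent removal is approximated here by decomposing to NFD and dropping combining marks.

```java
import java.text.Normalizer;

// Hypothetical illustration only; not part of the documented package.
final class NormalizeSketch {

    /** Applies Unicode NFKC normalization (compatibility decomposition, then composition). */
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    /** Assumed approach to accent removal: decompose, then strip combining marks. */
    static String removeAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(nfkc("\uFB01"));              // the "fi" ligature becomes "fi"
        System.out.println(removeAccents("caf\u00e9"));  // "café" becomes "cafe"
    }
}
```

Characters with no canonical decomposition (for example "ø") pass through this approach unchanged, which is one reason a dedicated transformer may behave differently.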