Filters out tokens composed of fewer than minLength characters.
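The behaviour described above can be sketched in plain Scala. This is a stand-alone illustration, not the library's class: the object name `MinLengthSketch` and the function `minLengthFilter` are hypothetical, and tokens are assumed to be plain strings.

```scala
object MinLengthSketch {
  // Hypothetical helper (not the library's API): keeps only tokens
  // that have at least `minLength` characters.
  def minLengthFilter(minLength: Int)(tokens: Seq[String]): Seq[String] =
    tokens.filter(_.length >= minLength)

  // Tokens shorter than three characters are dropped.
  val kept: Seq[String] = minLengthFilter(3)(Seq("a", "to", "the", "tokenizer"))
}
```

Currying the length parameter means a configured filter can be built once and reused across many documents.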
Filter that removes rare words, i.e. words that occur in fewer than threshold documents. Syntax: new RemoveRareWords(10) apply (data)
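A document-frequency filter of this kind might look roughly like the following. This is a minimal sketch under stated assumptions, not the library's implementation: `RareWordsSketch` and `removeRareWords` are hypothetical names, and the corpus is assumed to be an in-memory sequence of already tokenized documents.

```scala
object RareWordsSketch {
  // Hypothetical helper (not the library's API): drops every token that
  // appears in fewer than `threshold` distinct documents of the corpus.
  def removeRareWords(threshold: Int)(docs: Seq[Seq[String]]): Seq[Seq[String]] = {
    // Document frequency: each token is counted at most once per document.
    val docFreq: Map[String, Int] =
      docs.flatMap(_.distinct)
        .groupBy(identity)
        .map { case (token, occurrences) => token -> occurrences.size }

    docs.map(_.filter(token => docFreq.getOrElse(token, 0) >= threshold))
  }

  val corpus: Seq[Seq[String]] = Seq(
    Seq("the", "cat", "sat"),
    Seq("the", "dog", "sat"),
    Seq("the", "bird", "flew")
  )

  // With threshold 2, only "the" (3 documents) and "sat" (2 documents) survive.
  val filtered: Seq[Seq[String]] = removeRareWords(2)(corpus)
}
```

Note that the count is per document, not per occurrence: a word repeated many times within a single document still counts as rare.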
Filter that removes stop words.
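As a rough illustration, stop-word removal can be written as a set lookup. The names `StopWordSketch` and `removeStopWords` are hypothetical, and the five-word stop list is an assumption made for the example; the list the library actually uses is not given here.

```scala
object StopWordSketch {
  // Tiny illustrative stop list (an assumption for this example only).
  val stopWords: Set[String] = Set("a", "an", "the", "of", "and")

  // Hypothetical helper: drops tokens whose lower-cased form is in the stop list.
  def removeStopWords(tokens: Seq[String]): Seq[String] =
    tokens.filterNot(token => stopWords.contains(token.toLowerCase))

  val result: Seq[String] = removeStopWords(Seq("The", "history", "of", "tokenizers"))
}
```

The comparison is case-insensitive in this sketch; whether the real filter normalizes case is not stated in the description.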
An enumeration over token types (see the inner objects of the TokenType companion object), based on regex patterns originally defined by Steven Bethard.
A generic (loadable) transformation of a tokenized input text.
A filter that only accepts word and number tokens.
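One way to picture such a filter is a pair of regular expressions applied to each token. The sketch below is hypothetical: `WordsAndNumbersSketch`, `wordsAndNumbersOnly`, and both patterns are assumptions chosen for illustration, and they are not the token-type patterns the library uses.

```scala
object WordsAndNumbersSketch {
  // Assumed patterns for illustration only.
  private val Word = "\\p{L}+"                  // one or more Unicode letters
  private val Number = "[+-]?\\d+([.,]\\d+)*"   // digits with optional sign and separators

  // Hypothetical helper: keeps tokens that fully match either pattern.
  def wordsAndNumbersOnly(tokens: Seq[String]): Seq[String] =
    tokens.filter(token => token.matches(Word) || token.matches(Number))

  // Punctuation-only tokens are rejected.
  val result: Seq[String] =
    wordsAndNumbersOnly(Seq("Price", ":", "42", "3.14", "--", "euros", "!"))
}
```

`String.matches` requires the whole token to match, so a mixed token such as "abc123" would be rejected by this sketch.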