annotators

Type Members

class ChunkTokenizer extends Tokenizer
class ChunkTokenizerModel extends TokenizerModel
class Chunker extends AnnotatorModel[Chunker]

This annotator matches a pattern of part-of-speech tags in order to return meaningful phrases from the document.
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/ChunkerTestSpec.scala for reference on how to use this API.
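As an illustration of what matching a pattern of part-of-speech tags can look like, here is a minimal plain-Scala sketch. It is not the Chunker implementation: the `ChunkSketch` object, the `<TAG>` pattern syntax, and the trick of running a regular expression over a concatenated tag string are assumptions made for this example, and it assumes tags contain no regex metacharacters.

```scala
object ChunkSketch {
  // tagged: (word, posTag) pairs in sentence order.
  // pattern: a tag pattern such as "<DT>?<JJ>*<NN>+" (hypothetical syntax for this sketch).
  def chunks(tagged: Seq[(String, String)], pattern: String): List[String] = {
    // Encode the sentence as one string of tags, e.g. "<DT><JJ><NN>".
    val tagString = tagged.map { case (_, tag) => s"<$tag>" }.mkString
    // Wrap every <TAG> in a non-capturing group so ?, * and + apply to the whole tag.
    val regex = pattern.replaceAll("<([^>]+)>", "(?:<$1>)").r
    regex.findAllMatchIn(tagString)
      .filter(m => m.end > m.start) // skip empty matches
      .map { m =>
        // Each '<' before the match start corresponds to one earlier token.
        val first = tagString.substring(0, m.start).count(_ == '<')
        val size = m.matched.count(_ == '<')
        tagged.slice(first, first + size).map(_._1).mkString(" ")
      }
      .toList
  }
}
```

With the tagged sentence `the/DT quick/JJ fox/NN jumps/VBZ over/IN a/DT dog/NN` and the pattern `<DT>?<JJ>*<NN>+`, this sketch returns the phrases `the quick fox` and `a dog`.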
class DateMatcher extends AnnotatorModel[DateMatcher] with DateMatcherUtils

Matches standard date formats and converts them into a provided format. Reads different forms of date and time expressions and converts them to the provided date format. Extracts only ONE date per sentence; use it with a sentence detector for more matches.
Reads the following kinds of dates:
1978-01-28, 1984/04/02, 1/02/1980, 2/28/79, The 31st of April in the year 2008, "Fri, 21 Nov 1997", "Jan 21, '97", Sun, Nov 21, jan 1st, next thursday, last wednesday, today, tomorrow, yesterday, next week, next month, next year, day after, the day before, 0600h, 06:00 hours, 6pm, 5:30 a.m., at 5, 12:59, 23:59, 1988/11/23 6pm, next week at 7.30, 5 am tomorrow
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/DateMatcherTestSpec.scala for further reference on how to use this API
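To make the "one date per sentence, converted to a provided format" behavior concrete, here is a minimal plain-Scala sketch. It is not the DateMatcher implementation: `DateMatchSketch`, `firstDate`, and the default output pattern are hypothetical, and only two of the input shapes listed above (`1978-01-28` and `1984/04/02`) are handled.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object DateMatchSketch {
  // Handles only year-month-day dates separated by '-' or '/'.
  private val yearFirst = """(\d{4})[-/](\d{1,2})[-/](\d{1,2})""".r

  // Returns at most one date per sentence, rendered with the given output pattern.
  def firstDate(sentence: String, outputFormat: String = "yyyy/MM/dd"): Option[String] = {
    val formatter = DateTimeFormatter.ofPattern(outputFormat)
    val dates = for {
      m <- yearFirst.findAllMatchIn(sentence).toList
      // Invalid calendar dates (e.g. month 13) are silently skipped.
      date <- Try(LocalDate.of(m.group(1).toInt, m.group(2).toInt, m.group(3).toInt)).toOption
    } yield date.format(formatter)
    dates.headOption
  }
}
```

Because only the first date in each sentence is kept, a text with several dates would need to be split into sentences first, which mirrors the advice above to combine the annotator with a sentence detector.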
trait DateMatcherUtils extends Params
class DocumentNormalizer extends AnnotatorModel[DocumentNormalizer]

Annotator which normalizes raw text from tagged text, e.g. scraped web pages or XML documents, from document type columns into Sentence. Removes all dirty characters from text following one or more input regex patterns. Can remove unwanted characters according to a specific policy. Can apply lower case normalization.
See the DocumentNormalizer test class for examples of usage.
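The cleanup described above (regex-driven removal plus optional lower casing) can be sketched in plain Scala as follows. This is an illustration under assumptions, not the DocumentNormalizer code: `DocumentCleanSketch`, its parameter names, and the default tag-stripping pattern are all hypothetical.

```scala
object DocumentCleanSketch {
  // Applies each regex pattern in turn, replacing matches with `replacement`,
  // then collapses runs of whitespace and optionally lower-cases the result.
  def clean(
      text: String,
      patterns: Seq[String] = Seq("<[^>]*>"), // assumed default: strip markup tags
      replacement: String = " ",
      lowercase: Boolean = false): String = {
    val stripped = patterns.foldLeft(text)((acc, p) => acc.replaceAll(p, replacement))
    val collapsed = stripped.replaceAll("\\s+", " ").trim
    if (lowercase) collapsed.toLowerCase else collapsed
  }
}
```

For example, `DocumentCleanSketch.clean("<div><p>Hello <b>World</b></p></div>", lowercase = true)` yields `hello world`.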
class Lemmatizer extends AnnotatorApproach[LemmatizerModel]

Class to find standardized lemmas from words. Uses a user-provided or default dictionary.
Retrieves lemmas out of words with the objective of returning a base dictionary word. Retrieves the significant part of a word.
lemmaDict: A dictionary of predefined lemmas must be provided.
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/LemmatizerTestSpec.scala for examples of how to use this API
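A dictionary-based lemma lookup of the kind described above can be sketched in plain Scala. This is not the Lemmatizer implementation: `LemmaSketch`, the `lemma -> form1 form2` line layout, the delimiters, and the choice to leave unknown words unchanged are assumptions made for the example.

```scala
import java.util.regex.Pattern

object LemmaSketch {
  // Parses lines such as "be -> is are was" into a map from inflected form to lemma.
  def parseDictionary(
      lines: Seq[String],
      keyDelimiter: String = "->",
      valueDelimiter: String = " "): Map[String, String] =
    lines.flatMap { line =>
      line.split(Pattern.quote(keyDelimiter), 2) match {
        case Array(lemma, forms) =>
          forms.trim.split(Pattern.quote(valueDelimiter)).toList
            .map(_.trim).filter(_.nonEmpty)
            .map(form => form -> lemma.trim)
        case _ => Nil // lines without the key delimiter are ignored
      }
    }.toMap

  // Replaces each token with its lemma; tokens missing from the dictionary pass through.
  def lemmatize(tokens: Seq[String], dictionary: Map[String, String]): Seq[String] =
    tokens.map(token => dictionary.getOrElse(token, token))
}
```

Building the form-to-lemma map once up front keeps each lookup constant time, which is why the sketch separates parsing from lemmatizing.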
class LemmatizerModel extends AnnotatorModel[LemmatizerModel]

Class to find standardized lemmas from words. Uses a user-provided or default dictionary.
Retrieves lemmas out of words with the objective of returning a base dictionary word. Retrieves the significant part of a word.
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/LemmatizerTestSpec.scala for examples of how to use this API
class MultiDateMatcher extends AnnotatorModel[MultiDateMatcher] with DateMatcherUtils

Matches standard date formats and converts them into a provided format.
class NGramGenerator extends AnnotatorModel[NGramGenerator]

A feature transformer that converts the input array of strings (annotatorType TOKEN) into an array of n-grams (annotatorType CHUNK). Null values in the input array are ignored. It returns an array of n-grams where each n-gram is represented by a space-separated string of words.
When the input is empty, an empty array is returned. When the input array length is less than n (number of elements per n-gram), no n-grams are returned.
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/NGramGeneratorTestSpec.scala for reference on how to use this API.
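The edge cases described above (nulls ignored, empty input, fewer tokens than n) can be illustrated with a short plain-Scala sketch. It is not the NGramGenerator source; `NGramSketch` and `nGrams` are hypothetical names used only here.

```scala
object NGramSketch {
  // Builds space-separated n-grams from a token sequence.
  def nGrams(tokens: Seq[String], n: Int): List[String] = {
    require(n > 0, "n must be positive")
    val present = tokens.filter(_ != null) // null entries are ignored
    // Guard explicitly: sliding(n) would otherwise emit one short window.
    if (present.length < n) Nil
    else present.sliding(n).map(_.mkString(" ")).toList
  }
}
```

For example, `NGramSketch.nGrams(Seq("a", "b", "c"), 2)` gives `List("a b", "b c")`, while `NGramSketch.nGrams(Seq("a"), 2)` gives an empty list.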
class Normalizer extends AnnotatorApproach[NormalizerModel]

Annotator that cleans out tokens. Requires stems, hence tokens. Removes all dirty characters from text following a regex pattern and transforms words based on a provided dictionary.
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/NormalizerTestSpec.scala for examples on how to use the API
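The two steps described above, regex cleanup followed by a dictionary transformation, can be sketched in plain Scala. This is an illustration only, not the Normalizer code: `NormalizeSketch`, the default cleanup pattern, and the `slang` map parameter are assumptions.

```scala
object NormalizeSketch {
  // Removes characters matching `cleanupPattern`, drops tokens that become empty,
  // optionally lower-cases, then rewrites tokens found in the `slang` dictionary.
  def normalize(
      tokens: Seq[String],
      cleanupPattern: String = "[^A-Za-z]", // assumed default: keep ASCII letters only
      lowercase: Boolean = true,
      slang: Map[String, String] = Map.empty): Seq[String] =
    tokens
      .map(_.replaceAll(cleanupPattern, ""))
      .filter(_.nonEmpty)
      .map(token => if (lowercase) token.toLowerCase else token)
      .map(token => slang.getOrElse(token, token))
}
```

For example, normalizing `Seq("Hello!!", "...", "u")` with `slang = Map("u" -> "you")` produces `Seq("hello", "you")`: the punctuation-only token disappears and the slang entry is rewritten.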
class NormalizerModel extends AnnotatorModel[NormalizerModel]

Annotator that cleans out tokens. Requires stems, hence tokens.
Removes all dirty characters from text following a regex pattern and transforms words based on a provided dictionary.
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/NormalizerTestSpec.scala for examples on how to use the API
trait ReadablePretrainedLemmatizer extends ParamsAndFeaturesReadable[LemmatizerModel] with HasPretrained[LemmatizerModel]
trait ReadablePretrainedStopWordsCleanerModel extends ParamsAndFeaturesReadable[StopWordsCleaner] with HasPretrained[StopWordsCleaner]
trait ReadablePretrainedTextMatcher extends ParamsAndFeaturesReadable[TextMatcherModel] with HasPretrained[TextMatcherModel]
trait ReadablePretrainedTokenizer extends ParamsAndFeaturesReadable[TokenizerModel] with HasPretrained[TokenizerModel]
class RecursiveTokenizer extends AnnotatorApproach[RecursiveTokenizerModel] with ParamsAndFeaturesWritable
class RecursiveTokenizerModel extends AnnotatorModel[RecursiveTokenizerModel] with ParamsAndFeaturesWritable
class RegexMatcher extends AnnotatorApproach[RegexMatcherModel]

Uses a reference file to match a set of regular expressions and put them inside a provided key. The file must be comma separated.
Matches regular expressions and maps them to specified values, which are optionally provided.
Rules are provided from an external source file.
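As a rough picture of how comma-separated rules can be read and applied, here is a plain-Scala sketch. It is not the RegexMatcher implementation: `RegexRulesSketch`, the `regex,identifier` line layout, and the decision to split on the last delimiter are assumptions made for this example.

```scala
import scala.util.matching.Regex

object RegexRulesSketch {
  final case class Rule(pattern: Regex, identifier: String)
  final case class Hit(text: String, identifier: String)

  // Parses lines of the form "regex,identifier". Splitting on the LAST delimiter
  // lets the regex itself contain commas, e.g. \d{1,3}. The identifier is optional.
  def parseRules(lines: Seq[String], delimiter: String = ","): Seq[Rule] =
    lines.map(_.trim).filter(_.nonEmpty).map { line =>
      val cut = line.lastIndexOf(delimiter)
      if (cut < 0) Rule(line.r, "")
      else Rule(line.substring(0, cut).r, line.substring(cut + delimiter.length).trim)
    }

  // Returns every match of every rule, in rule order.
  def matchAll(text: String, rules: Seq[Rule]): Seq[Hit] =
    for {
      rule <- rules
      matched <- rule.pattern.findAllIn(text).toList
    } yield Hit(matched, rule.identifier)
}
```

One limitation of this sketch: a rule with no identifier whose regex contains a comma would be split incorrectly, so a real rules file would need an unambiguous delimiter.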
class RegexMatcherModel extends AnnotatorModel[RegexMatcherModel]

Matches regular expressions and maps them to specified values, which are optionally provided. Rules are provided from an external source file.
class RegexTokenizer extends AnnotatorModel[RegexTokenizer]

A tokenizer that splits text by regex pattern.

See also
RegexTokenizer
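Splitting text by a regex pattern can be sketched in a few lines of plain Scala. This is only an illustration of the idea, not this class's implementation; `RegexSplitSketch`, the default whitespace pattern, and the `toLowercase` flag are assumptions.

```scala
object RegexSplitSketch {
  // Splits `text` wherever `pattern` matches and drops empty pieces.
  def tokenize(text: String, pattern: String = "\\s+", toLowercase: Boolean = false): List[String] = {
    val input = if (toLowercase) text.toLowerCase else text
    input.split(pattern).filter(_.nonEmpty).toList
  }
}
```

For example, `RegexSplitSketch.tokenize("a-b_c", "[-_]")` gives `List("a", "b", "c")`.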
class Stemmer extends AnnotatorModel[Stemmer]

Hard stemming of words, cutting them off into standard word references.
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/StemmerTestSpec.scala for examples of how to use this API
class StopWordsCleaner extends AnnotatorModel[StopWordsCleaner]

This annotator takes a sequence of strings (e.g. the output of a Tokenizer, Normalizer, Lemmatizer, or Stemmer) and drops all the stop words from the input sequences.
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/StopWordsCleanerTestSpec.scala for example of how to use this API.
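Dropping stop words from a token sequence amounts to a set-membership filter. The plain-Scala sketch below illustrates this; it is not the StopWordsCleaner code, and `StopWordsSketch`, its tiny stop-word list, and the `caseSensitive` flag are assumptions.

```scala
object StopWordsSketch {
  // A deliberately tiny list for illustration; real lists are much longer.
  val defaultStopWords: Set[String] = Set("a", "an", "the", "is", "of", "and")

  // Keeps only the tokens that are not stop words.
  def clean(
      tokens: Seq[String],
      stopWords: Set[String] = defaultStopWords,
      caseSensitive: Boolean = false): Seq[String] =
    if (caseSensitive) tokens.filterNot(stopWords.contains)
    else {
      val lowered = stopWords.map(_.toLowerCase)
      tokens.filterNot(token => lowered.contains(token.toLowerCase))
    }
}
```

For example, `StopWordsSketch.clean(Seq("The", "cat", "is", "on", "the", "mat"))` returns `Seq("cat", "on", "mat")`.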
class TextMatcher extends AnnotatorApproach[TextMatcherModel] with ParamsAndFeaturesWritable

Annotator to match entire phrases (by token) provided in a file against a Document.
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/TextMatcherTestSpec.scala for reference on how to use this API
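Matching phrases "by token" means a phrase only matches whole tokens, never part of a word. The plain-Scala sketch below illustrates that idea with a simple sliding comparison; it is not the TextMatcher implementation, and `PhraseMatchSketch`, its whitespace tokenization, and the `Match` case class are assumptions.

```scala
object PhraseMatchSketch {
  // Token indices are zero-based and inclusive.
  final case class Match(phrase: String, startToken: Int, endToken: Int)

  private def tokenize(text: String): Vector[String] =
    text.split("\\s+").filter(_.nonEmpty).toVector

  // Finds every occurrence of every phrase as a run of consecutive document tokens.
  def findPhrases(document: String, phrases: Seq[String], caseSensitive: Boolean = true): Seq[Match] = {
    def norm(s: String): String = if (caseSensitive) s else s.toLowerCase
    val docTokens = tokenize(document).map(norm)
    for {
      phrase <- phrases
      phraseTokens = tokenize(phrase).map(norm)
      if phraseTokens.nonEmpty
      start <- 0 to (docTokens.length - phraseTokens.length)
      if docTokens.slice(start, start + phraseTokens.length) == phraseTokens
    } yield Match(phrase, start, start + phraseTokens.length - 1)
  }
}
```

Because the comparison is on whole tokens, the phrase `brown` does not match inside `brownie`. This naive scan is quadratic in the worst case; it is meant to show the behavior, not an efficient search structure.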
class TextMatcherModel extends AnnotatorModel[TextMatcherModel]

Extracts entities out of provided phrases
class Token2Chunk extends AnnotatorModel[Token2Chunk]
class Tokenizer extends AnnotatorApproach[TokenizerModel]

Tokenizes raw text in document type columns into TokenizedSentence.
This class represents a non-fitted tokenizer. Fitting it will cause the internal RuleFactory to construct the rules for tokenizing from the input configuration.
Identifies tokens with tokenization open standards. A few rules will help customize it if the defaults do not fit user needs.
See the Tokenizer test class for examples of usage.
class TokenizerModel extends AnnotatorModel[TokenizerModel]

Tokenizes raw text into word pieces, tokens. Identifies tokens with tokenization open standards. A few rules will help customize it if the defaults do not fit user needs.
This class represents an already fitted Tokenizer model.
See the Tokenizer test class for examples of usage.

Value Members

object ChunkTokenizer extends DefaultParamsReadable[ChunkTokenizer] with Serializable
object ChunkTokenizerModel extends ParamsAndFeaturesReadable[ChunkTokenizerModel] with Serializable
object Chunker extends DefaultParamsReadable[Chunker] with Serializable
object DateMatcher extends DefaultParamsReadable[DateMatcher] with Serializable
object DocumentNormalizer extends DefaultParamsReadable[DocumentNormalizer] with Serializable
object EnglishStemmer
object Lemmatizer extends DefaultParamsReadable[Lemmatizer] with Serializable
object LemmatizerModel extends ReadablePretrainedLemmatizer with Serializable
object MultiDateMatcher extends DefaultParamsReadable[MultiDateMatcher] with Serializable
object NGramGenerator extends ParamsAndFeaturesReadable[NGramGenerator] with Serializable
object Normalizer extends DefaultParamsReadable[Normalizer] with Serializable
object NormalizerModel extends ParamsAndFeaturesReadable[NormalizerModel] with Serializable
object RegexMatcher extends DefaultParamsReadable[RegexMatcher] with Serializable
object RegexMatcherModel extends ParamsAndFeaturesReadable[RegexMatcherModel] with Serializable
object Stemmer extends DefaultParamsReadable[Stemmer] with Serializable
object StopWordsCleaner extends ParamsAndFeaturesReadable[StopWordsCleaner] with ReadablePretrainedStopWordsCleanerModel with Serializable
object TextMatcher extends DefaultParamsReadable[TextMatcher] with Serializable
object TextMatcherModel extends ReadablePretrainedTextMatcher with Serializable
object Token2Chunk extends DefaultParamsReadable[Token2Chunk] with Serializable
object Tokenizer extends DefaultParamsReadable[Tokenizer] with Serializable
object TokenizerModel extends ReadablePretrainedTokenizer with Serializable
package btm
package classifier
package common
package keyword
package ld
package ner
package param
package parser
package pos
package sbd
package sda
package sentence_detector_dl
package seq2seq
package spell
package ws
