Package com.johnsnowlabs.nlp.annotators

package annotators

Type Members

  1. class ChunkTokenizer extends Tokenizer

  2. class ChunkTokenizerModel extends TokenizerModel

  3. class Chunker extends AnnotatorModel[Chunker]

    This annotator matches a pattern of part-of-speech tags in order to return meaningful phrases from documents.

    See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/ChunkerTestSpec.scala for reference on how to use this API.
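
    A minimal sketch of Chunker usage, assuming Spark NLP and Spark SQL are on the classpath; column names and the example sentence are illustrative, and `PerceptronModel.pretrained()` downloads a POS model on first use:

```scala
// Chunk noun phrases by matching a pattern of POS tags.
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.{Chunker, Tokenizer}
import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronModel
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val documentAssembler = new DocumentAssembler()
  .setInputCol("text").setOutputCol("document")
val sentence = new SentenceDetector()
  .setInputCols("document").setOutputCol("sentence")
val tokenizer = new Tokenizer()
  .setInputCols("sentence").setOutputCol("token")
val pos = PerceptronModel.pretrained() // downloads a pretrained POS tagger
  .setInputCols("sentence", "token").setOutputCol("pos")

// <DT>?<JJ>*<NN>+ matches an optional determiner, any adjectives, then nouns
val chunker = new Chunker()
  .setInputCols("sentence", "pos")
  .setOutputCol("chunk")
  .setRegexParsers(Array("<DT>?<JJ>*<NN>+"))

val pipeline = new Pipeline().setStages(
  Array(documentAssembler, sentence, tokenizer, pos, chunker))

val data = spark.createDataFrame(
  Seq(Tuple1("The little yellow dog barked at the cat"))).toDF("text")
val result = pipeline.fit(data).transform(data)
result.selectExpr("chunk.result").show(false)
```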

  4. class DateMatcher extends AnnotatorModel[DateMatcher] with DateMatcherUtils

    Matches standard date formats into a provided format: reads different forms of date and time expressions and converts them to the provided date format. Extracts only ONE date per sentence; use a sentence detector for more matches.

    Reads the following kinds of dates:

    1978-01-28, 1984/04/02, 1/02/1980, 2/28/79, The 31st of April in the year 2008, "Fri, 21 Nov 1997", "Jan 21, '97", Sun, Nov 21, jan 1st, next thursday, last wednesday, today, tomorrow, yesterday, next week, next month, next year, day after, the day before, 0600h, 06:00 hours, 6pm, 5:30 a.m., at 5, 12:59, 23:59, 1988/11/23 6pm, next week at 7.30, 5 am tomorrow

    See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/DateMatcherTestSpec.scala for further reference on how to use this API.
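
    A minimal sketch of DateMatcher in a pipeline, assuming Spark NLP is on the classpath; column names are illustrative, and the default output date format is used:

```scala
// Extract one date per sentence from free-form text.
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.DateMatcher
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val documentAssembler = new DocumentAssembler()
  .setInputCol("text").setOutputCol("document")

val dateMatcher = new DateMatcher()
  .setInputCols("document")
  .setOutputCol("date")

val pipeline = new Pipeline().setStages(Array(documentAssembler, dateMatcher))

val data = spark.createDataFrame(Seq(
  Tuple1("I saw him yesterday and he told me he will visit us next week")
)).toDF("text")

val result = pipeline.fit(data).transform(data)
result.selectExpr("date.result").show(false)
```

    Without a preceding sentence detector, only one date is returned for the whole document even though two expressions appear in the text.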

  5. trait DateMatcherUtils extends Params

  6. class Lemmatizer extends AnnotatorApproach[LemmatizerModel]

    Class to find standardized lemmas from words. Uses a user-provided or default dictionary.

    Retrieves lemmas out of words with the objective of returning a base dictionary word, i.e. the significant part of a word.

    lemmaDict: a dictionary of predefined lemmas must be provided.

    See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/LemmatizerTestSpec.scala for examples of how to use this API
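
    A minimal sketch of training a Lemmatizer from a dictionary, assuming Spark NLP is on the classpath; "lemmas.txt" is a hypothetical path to a file where each line maps a lemma to its inflected forms (key delimiter "->", value delimiter tab):

```scala
// Fit a Lemmatizer on a user-provided dictionary and lemmatize tokens.
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.{Lemmatizer, Tokenizer}
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val documentAssembler = new DocumentAssembler()
  .setInputCol("text").setOutputCol("document")
val tokenizer = new Tokenizer()
  .setInputCols("document").setOutputCol("token")

val lemmatizer = new Lemmatizer()
  .setInputCols("token")
  .setOutputCol("lemma")
  .setDictionary("lemmas.txt", "->", "\t") // hypothetical dictionary file

val pipeline = new Pipeline().setStages(
  Array(documentAssembler, tokenizer, lemmatizer))
val data = spark.createDataFrame(Seq(Tuple1("He was picking flowers"))).toDF("text")
val result = pipeline.fit(data).transform(data)
result.selectExpr("lemma.result").show(false)
```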

  7. class LemmatizerModel extends AnnotatorModel[LemmatizerModel]

    Class to find standardized lemmas from words. Uses a user-provided or default dictionary.

    Retrieves lemmas out of words with the objective of returning a base dictionary word, i.e. the significant part of a word.

    See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/LemmatizerTestSpec.scala for examples of how to use this API.

  8. class MultiDateMatcher extends AnnotatorModel[MultiDateMatcher] with DateMatcherUtils

    Matches standard date formats into a provided format. In contrast to DateMatcher, it can extract multiple dates per input text.

  9. class NGramGenerator extends AnnotatorModel[NGramGenerator]

    A feature transformer that converts the input array of strings (annotatorType TOKEN) into an array of n-grams (annotatorType CHUNK). Null values in the input array are ignored. It returns an array of n-grams where each n-gram is represented by a space-separated string of words.

    When the input is empty, an empty array is returned. When the input array length is less than n (number of elements per n-gram), no n-grams are returned.

    See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/NGramGeneratorTestSpec.scala for reference on how to use this API.
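
    A minimal sketch generating bigrams from tokens, assuming Spark NLP is on the classpath; column names are illustrative:

```scala
// Turn the token stream into 2-grams (space-separated token pairs).
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.{NGramGenerator, Tokenizer}
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val documentAssembler = new DocumentAssembler()
  .setInputCol("text").setOutputCol("document")
val tokenizer = new Tokenizer()
  .setInputCols("document").setOutputCol("token")

val ngrams = new NGramGenerator()
  .setInputCols("token")
  .setOutputCol("ngrams")
  .setN(2) // each n-gram is a space-separated string of 2 tokens

val pipeline = new Pipeline().setStages(
  Array(documentAssembler, tokenizer, ngrams))
val data = spark.createDataFrame(Seq(Tuple1("to be or not to be"))).toDF("text")
val result = pipeline.fit(data).transform(data)
// e.g. "to be", "be or", "or not", "not to", "to be"
result.selectExpr("ngrams.result").show(false)
```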

  10. class Normalizer extends AnnotatorApproach[NormalizerModel]

    Annotator that cleans out tokens (and therefore requires tokens as input). Removes all dirty characters from text following a regex pattern and transforms words based on a provided dictionary.

    See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/NormalizerTestSpec.scala for examples on how to use the API.
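
    A minimal sketch of Normalizer usage, assuming Spark NLP is on the classpath; the cleanup pattern and column names are illustrative:

```scala
// Lowercase tokens and strip characters matching a cleanup regex.
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.{Normalizer, Tokenizer}
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val documentAssembler = new DocumentAssembler()
  .setInputCol("text").setOutputCol("document")
val tokenizer = new Tokenizer()
  .setInputCols("document").setOutputCol("token")

val normalizer = new Normalizer()
  .setInputCols("token")
  .setOutputCol("normalized")
  .setLowercase(true)
  .setCleanupPatterns(Array("""[^\w\d\s]""")) // matched characters are removed

val pipeline = new Pipeline().setStages(
  Array(documentAssembler, tokenizer, normalizer))
val data = spark.createDataFrame(
  Seq(Tuple1("John and Peter are brothers!!"))).toDF("text")
val result = pipeline.fit(data).transform(data)
result.selectExpr("normalized.result").show(false)
```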

  11. class NormalizerModel extends AnnotatorModel[NormalizerModel]

    Annotator that cleans out tokens (and therefore requires tokens as input).

    Removes all dirty characters from text following a regex pattern and transforms words based on a provided dictionary.

    See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/NormalizerTestSpec.scala for examples on how to use the API.

  12. trait ReadablePretrainedLemmatizer extends ParamsAndFeaturesReadable[LemmatizerModel] with HasPretrained[LemmatizerModel]

  13. trait ReadablePretrainedTextMatcher extends ParamsAndFeaturesReadable[TextMatcherModel] with HasPretrained[TextMatcherModel]

  14. trait ReadablePretrainedTokenizer extends ParamsAndFeaturesReadable[TokenizerModel] with HasPretrained[TokenizerModel]

  15. class RecursiveTokenizer extends AnnotatorApproach[RecursiveTokenizerModel] with ParamsAndFeaturesWritable

  16. class RecursiveTokenizerModel extends AnnotatorModel[RecursiveTokenizerModel] with ParamsAndFeaturesWritable

  17. class RegexMatcher extends AnnotatorApproach[RegexMatcherModel]

    Uses a reference file to match a set of regular expressions and put them inside a provided key. The file must be comma separated.

    Matches regular expressions and maps them to optionally provided values; rules are provided from an external source file.
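
    A minimal sketch of RegexMatcher usage, assuming Spark NLP is on the classpath. "rules.txt" is a hypothetical comma-separated rules file where each line holds a regex and its identifier (e.g. `the\s\w+, followed by 'the'`); the rule-setter name may differ across Spark NLP versions (e.g. `setExternalRules`):

```scala
// Match every occurrence of the file's regexes against the document.
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.RegexMatcher
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val documentAssembler = new DocumentAssembler()
  .setInputCol("text").setOutputCol("document")

val regexMatcher = new RegexMatcher()
  .setInputCols("document")
  .setOutputCol("regex_matches")
  .setStrategy("MATCH_ALL")   // or MATCH_FIRST / MATCH_COMPLETE
  .setRules("rules.txt", ",") // hypothetical rules file, comma delimiter

val pipeline = new Pipeline().setStages(Array(documentAssembler, regexMatcher))
val data = spark.createDataFrame(Seq(Tuple1("The quick brown fox"))).toDF("text")
val result = pipeline.fit(data).transform(data)
result.selectExpr("regex_matches.result").show(false)
```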

  18. class RegexMatcherModel extends AnnotatorModel[RegexMatcherModel]

    Matches regular expressions and maps them to optionally provided values; rules are provided from an external source file.

  19. class Stemmer extends AnnotatorModel[Stemmer]

    Hard stemming of words, cutting them off into standard word references.

    See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/StemmerTestSpec.scala for examples of how to use this API.

  20. class StopWordsCleaner extends AnnotatorModel[StopWordsCleaner]

    This annotator takes a sequence of strings (e.g. the output of a Tokenizer, Normalizer, Lemmatizer, or Stemmer) and drops all the stop words from the input sequences.

    See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/StopWordsCleanerTestSpec.scala for an example of how to use this API.
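
    A minimal sketch of StopWordsCleaner usage, assuming Spark NLP is on the classpath; the stop-word list shown is illustrative (a default list is used when none is set):

```scala
// Drop stop words from the token stream.
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.{StopWordsCleaner, Tokenizer}
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val documentAssembler = new DocumentAssembler()
  .setInputCol("text").setOutputCol("document")
val tokenizer = new Tokenizer()
  .setInputCols("document").setOutputCol("token")

val stopWords = new StopWordsCleaner()
  .setInputCols("token")
  .setOutputCol("cleanTokens")
  .setStopWords(Array("this", "is", "and", "a")) // illustrative list
  .setCaseSensitive(false)

val pipeline = new Pipeline().setStages(
  Array(documentAssembler, tokenizer, stopWords))
val data = spark.createDataFrame(
  Seq(Tuple1("This is a simple sentence"))).toDF("text")
val result = pipeline.fit(data).transform(data)
result.selectExpr("cleanTokens.result").show(false)
```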

  21. class TextMatcher extends AnnotatorApproach[TextMatcherModel] with ParamsAndFeaturesWritable

    Annotator to match entire phrases (by token) provided in a file against a Document.

    See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/TextMatcherTestSpec.scala for reference on how to use this API.

  22. class TextMatcherModel extends AnnotatorModel[TextMatcherModel]

    Extracts entities out of provided phrases.

  23. class Token2Chunk extends AnnotatorModel[Token2Chunk]

  24. class Tokenizer extends AnnotatorApproach[TokenizerModel]

    Tokenizes raw text in document type columns into TokenizedSentence.

    This class represents a non-fitted tokenizer. Fitting it will cause the internal RuleFactory to construct the rules for tokenizing from the input configuration.

    Identifies tokens using open tokenization standards. A few rules will help customizing it if defaults do not fit user needs.

    See the Tokenizer test class for examples of usage.
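
    A minimal sketch of fitting a Tokenizer (which produces a TokenizerModel), assuming Spark NLP is on the classpath; the exception entry is illustrative and keeps "New York" as a single token:

```scala
// Fit a Tokenizer and tokenize raw text; fitting builds the internal RuleFactory.
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val documentAssembler = new DocumentAssembler()
  .setInputCol("text").setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")
  .setExceptions(Array("New York")) // protected from splitting

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer))
val data = spark.createDataFrame(
  Seq(Tuple1("I moved to New York last year."))).toDF("text")
val result = pipeline.fit(data).transform(data)
result.selectExpr("token.result").show(false)
```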

  25. class TokenizerModel extends AnnotatorModel[TokenizerModel]

    Tokenizes raw text into word pieces, tokens. Identifies tokens using open tokenization standards. A few rules will help customizing it if defaults do not fit user needs.

    This class represents an already fitted Tokenizer model.

    See the Tokenizer test class for examples of usage.

Value Members

  1. object ChunkTokenizer extends DefaultParamsReadable[ChunkTokenizer] with Serializable
  2. object ChunkTokenizerModel extends ParamsAndFeaturesReadable[ChunkTokenizerModel] with Serializable
  3. object Chunker extends DefaultParamsReadable[Chunker] with Serializable
  4. object DateMatcher extends DefaultParamsReadable[DateMatcher] with Serializable
  5. object EnglishStemmer
  6. object Lemmatizer extends DefaultParamsReadable[Lemmatizer] with Serializable
  7. object LemmatizerModel extends ReadablePretrainedLemmatizer with Serializable
  8. object MultiDateMatcher extends DefaultParamsReadable[MultiDateMatcher] with Serializable
  9. object NGramGenerator extends ParamsAndFeaturesReadable[NGramGenerator] with Serializable
  10. object Normalizer extends DefaultParamsReadable[Normalizer] with Serializable
  11. object NormalizerModel extends ParamsAndFeaturesReadable[NormalizerModel] with Serializable
  12. object RegexMatcher extends DefaultParamsReadable[RegexMatcher] with Serializable
  13. object RegexMatcherModel extends ParamsAndFeaturesReadable[RegexMatcherModel] with Serializable
  14. object Stemmer extends DefaultParamsReadable[Stemmer] with Serializable
  15. object StopWordsCleaner extends ParamsAndFeaturesReadable[StopWordsCleaner] with Serializable
  16. object TextMatcher extends DefaultParamsReadable[TextMatcher] with Serializable
  17. object TextMatcherModel extends ReadablePretrainedTextMatcher with Serializable
  18. object Token2Chunk extends DefaultParamsReadable[Token2Chunk] with Serializable
  19. object Tokenizer extends DefaultParamsReadable[Tokenizer] with Serializable
  20. object TokenizerModel extends ReadablePretrainedTokenizer with Serializable
  21. package btm
  22. package classifier
  23. package common
  24. package ld
  25. package ner
  26. package param
  27. package parser
  28. package pos
  29. package sbd
  30. package sda
  31. package spell