Package com.johnsnowlabs.nlp.annotators.classifier.dl

Visibility
  1. Public
  2. All

Type Members

  1. class AlbertForSequenceClassification extends AnnotatorModel[AlbertForSequenceClassification] with HasBatchedAnnotate[AlbertForSequenceClassification] with WriteTensorflowModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasClassifierActivationProperties

    AlbertForSequenceClassification can load ALBERT Models with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for multi-class document classification tasks.

    Pretrained models can be loaded with pretrained of the companion object:

    val sequenceClassifier = AlbertForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")

    The default model is "albert_base_sequence_classifier_imdb", if no name is provided.

    For available pretrained models please see the Models Hub.

    To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For extended examples of usage, see the AlbertForSequenceClassificationTestSpec.
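
    For a model exported from the Transformers library, the import path looks roughly like the following. This is a minimal sketch: the SavedModel layout and export steps are the ones described in the discussion linked above, and the paths here are placeholders.

    import com.johnsnowlabs.nlp.annotators.classifier.dl.AlbertForSequenceClassification

    // Load a TensorFlow SavedModel exported from Transformers (placeholder path)
    val importedClassifier = AlbertForSequenceClassification
      .loadSavedModel("/tmp/albert_sequence_classification_savedmodel", spark)
      .setInputCols("token", "document")
      .setOutputCol("label")

    // Save once, then reload it like any other Spark NLP model
    importedClassifier.write.overwrite().save("/tmp/albert_seq_cls_spark_nlp")
    val reloaded = AlbertForSequenceClassification.load("/tmp/albert_seq_cls_spark_nlp")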

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val sequenceClassifier = AlbertForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setCaseSensitive(true)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      sequenceClassifier
    ))
    
    val data = Seq("John Lennon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("label.result").show(false)
    +--------------------+
    |result              |
    +--------------------+
    |[neg, neg]          |
    |[pos, pos, pos, pos]|
    +--------------------+
    See also

    Annotators Main Page for a list of transformer based classifiers

    AlbertForSequenceClassification for sequence-level classification

  2. class AlbertForTokenClassification extends AnnotatorModel[AlbertForTokenClassification] with HasBatchedAnnotate[AlbertForTokenClassification] with WriteTensorflowModel with WriteSentencePieceModel with HasCaseSensitiveProperties

    AlbertForTokenClassification can load ALBERT Models with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks.

    Pretrained models can be loaded with pretrained of the companion object:

    val tokenClassifier = AlbertForTokenClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")

    The default model is "albert_base_token_classifier_conll03", if no name is provided.

    For available pretrained models please see the Models Hub.

    To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For extended examples of usage, see the AlbertForTokenClassificationTestSpec.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val tokenClassifier = AlbertForTokenClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setCaseSensitive(true)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      tokenClassifier
    ))
    
    val data = Seq("John Lennon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("label.result").show(false)
    +------------------------------------------------------------------------------------+
    |result                                                                              |
    +------------------------------------------------------------------------------------+
    |[B-PER, I-PER, O, O, O, B-LOC, O, O, O, B-LOC, O, O, O, O, B-PER, O, O, O, O, B-LOC]|
    +------------------------------------------------------------------------------------+
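
    The IOB tags in "label" can be grouped into full entity chunks with the NerConverter annotator. A short sketch continuing the pipeline above (the "ner_chunk" column name is an illustrative choice):

    import com.johnsnowlabs.nlp.annotators.ner.NerConverter

    // Merges consecutive B-/I- tags into single entity chunks
    val nerConverter = new NerConverter()
      .setInputCols("document", "token", "label")
      .setOutputCol("ner_chunk")

    val chunkPipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      tokenClassifier,
      nerConverter
    ))

    chunkPipeline.fit(data).transform(data)
      .selectExpr("explode(ner_chunk.result) as entity")
      .show(false)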
    See also

    Annotators Main Page for a list of transformer based classifiers

    AlbertForTokenClassification for token-level classification

  3. class BertForSequenceClassification extends AnnotatorModel[BertForSequenceClassification] with HasBatchedAnnotate[BertForSequenceClassification] with WriteTensorflowModel with HasCaseSensitiveProperties with HasClassifierActivationProperties

    BertForSequenceClassification can load BERT Models with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for multi-class document classification tasks.

    Pretrained models can be loaded with pretrained of the companion object:

    val sequenceClassifier = BertForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")

    The default model is "bert_base_sequence_classifier_imdb", if no name is provided.

    For available pretrained models please see the Models Hub.

    To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For extended examples of usage, see the BertForSequenceClassificationTestSpec.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val sequenceClassifier = BertForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setCaseSensitive(true)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      sequenceClassifier
    ))
    
    val data = Seq("John Lennon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("label.result").show(false)
    +--------------------+
    |result              |
    +--------------------+
    |[neg, neg]          |
    |[pos, pos, pos, pos]|
    +--------------------+
    See also

    Annotators Main Page for a list of transformer based classifiers

    BertForSequenceClassification for sequence-level classification

  4. class BertForTokenClassification extends AnnotatorModel[BertForTokenClassification] with HasBatchedAnnotate[BertForTokenClassification] with WriteTensorflowModel with HasCaseSensitiveProperties

    BertForTokenClassification can load BERT Models with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks.

    Pretrained models can be loaded with pretrained of the companion object:

    val tokenClassifier = BertForTokenClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")

    The default model is "bert_base_token_classifier_conll03", if no name is provided.

    For available pretrained models please see the Models Hub.

    To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For extended examples of usage, see the BertForTokenClassificationTestSpec.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val tokenClassifier = BertForTokenClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setCaseSensitive(true)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      tokenClassifier
    ))
    
    val data = Seq("John Lennon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("label.result").show(false)
    +------------------------------------------------------------------------------------+
    |result                                                                              |
    +------------------------------------------------------------------------------------+
    |[B-PER, I-PER, O, O, O, B-LOC, O, O, O, B-LOC, O, O, O, O, B-PER, O, O, O, O, B-LOC]|
    +------------------------------------------------------------------------------------+
    See also

    Annotators Main Page for a list of transformer based classifiers

    BertForTokenClassification for token-level classification

  5. class ClassifierDLApproach extends AnnotatorApproach[ClassifierDLModel] with ParamsAndFeaturesWritable

    Trains a ClassifierDL for generic Multi-class Text Classification.

    ClassifierDL uses the state-of-the-art Universal Sentence Encoder as input for text classification. The ClassifierDL annotator uses a deep learning model (DNN) built inside TensorFlow and supports up to 100 classes.

    For instantiated/pretrained models, see ClassifierDLModel.

    For extended examples of usage, see the Spark NLP Workshop and the ClassifierDLTestSpec.

    Example

    In this example, the training data "sentiment.csv" has the form

    text,label
    This movie is the best movie I have watched ever! In my opinion this movie can win an award.,0
    This was a terrible movie! The acting was bad really bad!,1
    ...

    Then training can be done like so:

    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.embeddings.UniversalSentenceEncoder
    import com.johnsnowlabs.nlp.annotators.classifier.dl.ClassifierDLApproach
    import org.apache.spark.ml.Pipeline
    
    val smallCorpus = spark.read.option("header","true").csv("src/test/resources/classifier/sentiment.csv")
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val useEmbeddings = UniversalSentenceEncoder.pretrained()
      .setInputCols("document")
      .setOutputCol("sentence_embeddings")
    
    val docClassifier = new ClassifierDLApproach()
      .setInputCols("sentence_embeddings")
      .setOutputCol("category")
      .setLabelColumn("label")
      .setBatchSize(64)
      .setMaxEpochs(20)
      .setLr(5e-3f)
      .setDropout(0.5f)
    
    val pipeline = new Pipeline()
      .setStages(
        Array(
          documentAssembler,
          useEmbeddings,
          docClassifier
        )
      )
    
    val pipelineModel = pipeline.fit(smallCorpus)
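
    Once fitted, the resulting PipelineModel can score new text. A small usage sketch (the sample sentence is illustrative):

    import spark.implicits._

    val testData = Seq("This movie was a complete waste of time.").toDF("text")
    pipelineModel.transform(testData)
      .select("category.result")
      .show(false)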
    See also

    SentimentDLApproach for sentiment analysis

    MultiClassifierDLApproach for multi-class classification

  6. class ClassifierDLModel extends AnnotatorModel[ClassifierDLModel] with HasSimpleAnnotate[ClassifierDLModel] with WriteTensorflowModel with HasStorageRef with ParamsAndFeaturesWritable

    ClassifierDL for generic Multi-class Text Classification.

    ClassifierDL uses the state-of-the-art Universal Sentence Encoder as input for text classification. The ClassifierDL annotator uses a deep learning model (DNN) built inside TensorFlow and supports up to 100 classes.

    This is the instantiated model of the ClassifierDLApproach. For training your own model, please see the documentation of that class.

    Pretrained models can be loaded with pretrained of the companion object:

    val classifierDL = ClassifierDLModel.pretrained()
      .setInputCols("sentence_embeddings")
      .setOutputCol("classification")

    The default model is "classifierdl_use_trec6", if no name is provided. It uses embeddings from the UniversalSentenceEncoder and is trained on the TREC-6 dataset. For available pretrained models please see the Models Hub.
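
    A specific model can also be requested by name and language. A minimal sketch using the pretrained overload that takes the model name and a language code:

    // Explicitly select the default TREC-6 model for English
    val classifierDL = ClassifierDLModel.pretrained("classifierdl_use_trec6", "en")
      .setInputCols("sentence_embeddings")
      .setOutputCol("classification")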

    For extended examples of usage, see the Spark NLP Workshop and the ClassifierDLTestSpec.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.annotator.SentenceDetector
    import com.johnsnowlabs.nlp.annotators.classifier.dl.ClassifierDLModel
    import com.johnsnowlabs.nlp.embeddings.UniversalSentenceEncoder
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val sentence = new SentenceDetector()
      .setInputCols("document")
      .setOutputCol("sentence")
    
    val useEmbeddings = UniversalSentenceEncoder.pretrained()
      .setInputCols("document")
      .setOutputCol("sentence_embeddings")
    
    val sarcasmDL = ClassifierDLModel.pretrained("classifierdl_use_sarcasm")
      .setInputCols("sentence_embeddings")
      .setOutputCol("sarcasm")
    
    val pipeline = new Pipeline()
      .setStages(Array(
        documentAssembler,
        sentence,
        useEmbeddings,
        sarcasmDL
      ))
    
    val data = Seq(
      "I'm ready!",
      "If I could put into words how much I love waking up at 6 am on Mondays I would."
    ).toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.selectExpr("explode(arrays_zip(sentence, sarcasm)) as out")
      .selectExpr("out.sentence.result as sentence", "out.sarcasm.result as sarcasm")
      .show(false)
    +-------------------------------------------------------------------------------+-------+
    |sentence                                                                       |sarcasm|
    +-------------------------------------------------------------------------------+-------+
    |I'm ready!                                                                     |normal |
    |If I could put into words how much I love waking up at 6 am on Mondays I would.|sarcasm|
    +-------------------------------------------------------------------------------+-------+
    See also

    SentimentDLModel for sentiment analysis

    MultiClassifierDLModel for multi-class classification

  7. class DeBertaForSequenceClassification extends AnnotatorModel[DeBertaForSequenceClassification] with HasBatchedAnnotate[DeBertaForSequenceClassification] with WriteTensorflowModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasClassifierActivationProperties

    DeBertaForSequenceClassification can load DeBERTa v2 & v3 Models with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for multi-class document classification tasks.

    Pretrained models can be loaded with pretrained of the companion object:

    val sequenceClassifier = DeBertaForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")

    The default model is "deberta_v3_xsmall_sequence_classifier_imdb", if no name is provided.

    For available pretrained models please see the Models Hub.

    To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For extended examples of usage, see the DeBertaForSequenceClassificationTestSpec.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val sequenceClassifier = DeBertaForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setCaseSensitive(true)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      sequenceClassifier
    ))
    
    val data = Seq("John Lennon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("label.result").show(false)
    +--------------------+
    |result              |
    +--------------------+
    |[neg, neg]          |
    |[pos, pos, pos, pos]|
    +--------------------+
    See also

    Annotators Main Page for a list of transformer based classifiers

    DeBertaForSequenceClassification for sequence-level classification

  8. class DeBertaForTokenClassification extends AnnotatorModel[DeBertaForTokenClassification] with HasBatchedAnnotate[DeBertaForTokenClassification] with WriteTensorflowModel with WriteSentencePieceModel with HasCaseSensitiveProperties

    DeBertaForTokenClassification can load DeBERTa v2 & v3 Models with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks.

    Pretrained models can be loaded with pretrained of the companion object:

    val tokenClassifier = DeBertaForTokenClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")

    The default model is "deberta_xsmall_token_classifier_conll03", if no name is provided.

    For available pretrained models please see the Models Hub.

    Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For extended examples of usage, see the DeBertaForTokenClassificationTestSpec.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val tokenClassifier = DeBertaForTokenClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setCaseSensitive(true)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      tokenClassifier
    ))
    
    val data = Seq("John Lennon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("label.result").show(false)
    +------------------------------------------------------------------------------------+
    |result                                                                              |
    +------------------------------------------------------------------------------------+
    |[B-PER, I-PER, O, O, O, B-LOC, O, O, O, B-LOC, O, O, O, O, B-PER, O, O, O, O, B-LOC]|
    +------------------------------------------------------------------------------------+
    See also

    Annotators Main Page for a list of transformer based classifiers

    DeBertaForTokenClassification for token-level classification

  9. class DistilBertForSequenceClassification extends AnnotatorModel[DistilBertForSequenceClassification] with HasBatchedAnnotate[DistilBertForSequenceClassification] with WriteTensorflowModel with HasCaseSensitiveProperties with HasClassifierActivationProperties

    DistilBertForSequenceClassification can load DistilBERT Models with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for multi-class document classification tasks.

    Pretrained models can be loaded with pretrained of the companion object:

    val sequenceClassifier = DistilBertForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")

    The default model is "distilbert_base_sequence_classifier_imdb", if no name is provided.

    For available pretrained models please see the Models Hub.

    To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For extended examples of usage, see the DistilBertForSequenceClassificationTestSpec.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val sequenceClassifier = DistilBertForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setCaseSensitive(true)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      sequenceClassifier
    ))
    
    val data = Seq("John Lennon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("label.result").show(false)
    +--------------------+
    |result              |
    +--------------------+
    |[neg, neg]          |
    |[pos, pos, pos, pos]|
    +--------------------+
    See also

    Annotators Main Page for a list of transformer based classifiers

    DistilBertForSequenceClassification for sequence-level classification

  10. class DistilBertForTokenClassification extends AnnotatorModel[DistilBertForTokenClassification] with HasBatchedAnnotate[DistilBertForTokenClassification] with WriteTensorflowModel with HasCaseSensitiveProperties

    DistilBertForTokenClassification can load DistilBERT Models with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks.

    Pretrained models can be loaded with pretrained of the companion object:

    val tokenClassifier = DistilBertForTokenClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")

    The default model is "distilbert_base_token_classifier_conll03", if no name is provided.

    For available pretrained models please see the Models Hub.

    To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For extended examples of usage, see the DistilBertForTokenClassificationTestSpec.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val tokenClassifier = DistilBertForTokenClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setCaseSensitive(true)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      tokenClassifier
    ))
    
    val data = Seq("John Lennon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("label.result").show(false)
    +------------------------------------------------------------------------------------+
    |result                                                                              |
    +------------------------------------------------------------------------------------+
    |[B-PER, I-PER, O, O, O, B-LOC, O, O, O, B-LOC, O, O, O, O, B-PER, O, O, O, O, B-LOC]|
    +------------------------------------------------------------------------------------+
    See also

    Annotators Main Page for a list of transformer based classifiers

    DistilBertForTokenClassification for token-level classification

  11. class LongformerForSequenceClassification extends AnnotatorModel[LongformerForSequenceClassification] with HasBatchedAnnotate[LongformerForSequenceClassification] with WriteTensorflowModel with HasCaseSensitiveProperties with HasClassifierActivationProperties

    LongformerForSequenceClassification can load Longformer Models with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for multi-class document classification tasks.

    Pretrained models can be loaded with pretrained of the companion object:

    val sequenceClassifier = LongformerForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")

    The default model is "longformer_base_sequence_classifier_imdb", if no name is provided.

    For available pretrained models please see the Models Hub.

    To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For extended examples of usage, see the LongformerForSequenceClassificationTestSpec.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val sequenceClassifier = LongformerForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setCaseSensitive(true)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      sequenceClassifier
    ))
    
    val data = Seq("John Lennon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("label.result").show(false)
    +--------------------+
    |result              |
    +--------------------+
    |[neg, neg]          |
    |[pos, pos, pos, pos]|
    +--------------------+
    See also

    Annotators Main Page for a list of transformer based classifiers

    LongformerForSequenceClassification for sequence-level classification

  12. class LongformerForTokenClassification extends AnnotatorModel[LongformerForTokenClassification] with HasBatchedAnnotate[LongformerForTokenClassification] with WriteTensorflowModel with HasCaseSensitiveProperties

    LongformerForTokenClassification can load Longformer Models with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks.

    Pretrained models can be loaded with pretrained of the companion object:

    val tokenClassifier = LongformerForTokenClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")

    The default model is "longformer_base_token_classifier_conll03", if no name is provided.

    For available pretrained models please see the Models Hub.

    To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For extended examples of usage, see the LongformerForTokenClassificationTestSpec.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val tokenClassifier = LongformerForTokenClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setCaseSensitive(true)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      tokenClassifier
    ))
    
    val data = Seq("John Lennon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("label.result").show(false)
    +------------------------------------------------------------------------------------+
    |result                                                                              |
    +------------------------------------------------------------------------------------+
    |[B-PER, I-PER, O, O, O, B-LOC, O, O, O, B-LOC, O, O, O, O, B-PER, O, O, O, O, B-LOC]|
    +------------------------------------------------------------------------------------+
    See also

    Annotators Main Page for a list of transformer based classifiers

    LongformerForTokenClassification for token-level classification

  13. class MultiClassifierDLApproach extends AnnotatorApproach[MultiClassifierDLModel] with ParamsAndFeaturesWritable

    Trains a MultiClassifierDL for Multi-label Text Classification.

    MultiClassifierDL uses a Bidirectional GRU with a convolutional model that we have built inside TensorFlow and supports up to 100 classes.

    For instantiated/pretrained models, see MultiClassifierDLModel.

    The input to MultiClassifierDL are Sentence Embeddings such as the state-of-the-art UniversalSentenceEncoder, BertSentenceEmbeddings, or SentenceEmbeddings.

    In machine learning, multi-label classification and the strongly related problem of multi-output classification are variants of the classification problem where multiple labels may be assigned to each instance. Multi-label classification is a generalization of multiclass classification, which is the single-label problem of categorizing instances into precisely one of more than two classes; in the multi-label problem there is no constraint on how many of the classes the instance can be assigned to. Formally, multi-label classification is the problem of finding a model that maps inputs x to binary vectors y (assigning a value of 0 or 1 for each element (label) in y).
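
    As a concrete illustration of that definition, each instance's label set can be encoded as a binary vector over the full label inventory. A toy sketch, independent of the annotator API:

    // Toy encoding: a label set becomes a 0/1 vector over all known classes
    val allLabels = Seq("toxic", "obscene", "insult", "threat")
    val instanceLabels = Set("toxic", "obscene")

    val y: Seq[Int] = allLabels.map(l => if (instanceLabels.contains(l)) 1 else 0)
    // y == Seq(1, 1, 0, 0) -- unlike multi-class, several elements may be 1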

    For extended examples of usage, see the Spark NLP Workshop and the MultiClassifierDLTestSpec.

    Example

    In this example, the training data has the form (Note: labels can be arbitrary)

    mr,ref
    "name[Alimentum], area[city centre], familyFriendly[no], near[Burger King]",Alimentum is an adult establish found in the city centre area near Burger King.
    "name[Alimentum], area[city centre], familyFriendly[yes]",Alimentum is a family-friendly place in the city centre.
    ...

    It needs some pre-processing first, so the labels are of type Array[String]. This can be done like so:

    import spark.implicits._
    import com.johnsnowlabs.nlp.annotators.classifier.dl.MultiClassifierDLApproach
    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.embeddings.UniversalSentenceEncoder
    import org.apache.spark.ml.Pipeline
    import org.apache.spark.sql.functions.{col, udf}
    
    // Process training data to create text with associated array of labels
    def splitAndTrim = udf { labels: String =>
      labels.split(", ").map(x=>x.trim)
    }
    
    val smallCorpus = spark.read
      .option("header", true)
      .option("inferSchema", true)
      .option("mode", "DROPMALFORMED")
      .csv("src/test/resources/classifier/e2e.csv")
      .withColumn("labels", splitAndTrim(col("mr")))
      .withColumn("text", col("ref"))
      .drop("mr")
    
    smallCorpus.printSchema()
    // root
    // |-- ref: string (nullable = true)
    // |-- labels: array (nullable = true)
    // |    |-- element: string (containsNull = true)
    // |-- text: string (nullable = true)
    
    // Then create pipeline for training
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
      .setCleanupMode("shrink")
    
    val embeddings = UniversalSentenceEncoder.pretrained()
      .setInputCols("document")
      .setOutputCol("embeddings")
    
    val docClassifier = new MultiClassifierDLApproach()
      .setInputCols("embeddings")
      .setOutputCol("category")
      .setLabelColumn("labels")
      .setBatchSize(128)
      .setMaxEpochs(10)
      .setLr(1e-3f)
      .setThreshold(0.5f)
      .setValidationSplit(0.1f)
    
    val pipeline = new Pipeline()
      .setStages(
        Array(
          documentAssembler,
          embeddings,
          docClassifier
        )
      )
    
    val pipelineModel = pipeline.fit(smallCorpus)
    See also

    SentimentDLApproach for sentiment analysis

    ClassifierDLApproach for single-class classification

    Multi-label classification on Wikipedia

  14. class MultiClassifierDLModel extends AnnotatorModel[MultiClassifierDLModel] with HasSimpleAnnotate[MultiClassifierDLModel] with WriteTensorflowModel with HasStorageRef with ParamsAndFeaturesWritable

    MultiClassifierDL for Multi-label Text Classification.

    MultiClassifierDL uses a Bidirectional GRU with a convolutional model that we have built inside TensorFlow and supports up to 100 classes. The input to MultiClassifierDL is Sentence Embeddings such as the state-of-the-art UniversalSentenceEncoder, BertSentenceEmbeddings, or SentenceEmbeddings.

    This is the instantiated model of the MultiClassifierDLApproach. For training your own model, please see the documentation of that class.

    Pretrained models can be loaded with pretrained of the companion object:

    val multiClassifier = MultiClassifierDLModel.pretrained()
      .setInputCols("sentence_embeddings")
      .setOutputCol("categories")

    The default model is "multiclassifierdl_use_toxic", if no name is provided. It uses embeddings from the UniversalSentenceEncoder and classifies toxic comments. The data is based on the Jigsaw Toxic Comment Classification Challenge. For available pretrained models please see the Models Hub.

    In machine learning, multi-label classification and the strongly related problem of multi-output classification are variants of the classification problem where multiple labels may be assigned to each instance. Multi-label classification is a generalization of multiclass classification, which is the single-label problem of categorizing instances into precisely one of more than two classes; in the multi-label problem there is no constraint on how many of the classes the instance can be assigned to. Formally, multi-label classification is the problem of finding a model that maps inputs x to binary vectors y (assigning a value of 0 or 1 for each element (label) in y).

    For extended examples of usage, see the Spark NLP Workshop and the MultiClassifierDLTestSpec.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.annotators.classifier.dl.MultiClassifierDLModel
    import com.johnsnowlabs.nlp.embeddings.UniversalSentenceEncoder
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val useEmbeddings = UniversalSentenceEncoder.pretrained()
      .setInputCols("document")
      .setOutputCol("sentence_embeddings")
    
    val multiClassifierDl = MultiClassifierDLModel.pretrained()
      .setInputCols("sentence_embeddings")
      .setOutputCol("classifications")
    
    val pipeline = new Pipeline()
      .setStages(Array(
        documentAssembler,
        useEmbeddings,
        multiClassifierDl
      ))
    
    val data = Seq(
      "This is pretty good stuff!",
      "Wtf kind of crap is this"
    ).toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("text", "classifications.result").show(false)
    +--------------------------+----------------+
    |text                      |result          |
    +--------------------------+----------------+
    |This is pretty good stuff!|[]              |
    |Wtf kind of crap is this  |[toxic, obscene]|
    +--------------------------+----------------+
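
    Which labels are emitted depends on the model's decision threshold (0.5 by default). A short sketch raising it so that only high-confidence labels are returned:

    // Raise the threshold to keep only high-confidence labels
    val strictClassifier = MultiClassifierDLModel.pretrained()
      .setInputCols("sentence_embeddings")
      .setOutputCol("classifications")
      .setThreshold(0.7f)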
    See also

    SentimentDLModel for sentiment analysis

    ClassifierDLModel for single-class classification

    Multi-label classification on Wikipedia

  15. trait ReadAlbertForSequenceTensorflowModel extends ReadTensorflowModel with ReadSentencePieceModel
  16. trait ReadAlbertForTokenTensorflowModel extends ReadTensorflowModel with ReadSentencePieceModel
  17. trait ReadBertForSequenceTensorflowModel extends ReadTensorflowModel
  18. trait ReadBertForTokenTensorflowModel extends ReadTensorflowModel
  19. trait ReadClassifierDLTensorflowModel extends ReadTensorflowModel
  20. trait ReadDeBertaForSequenceTensorflowModel extends ReadTensorflowModel with ReadSentencePieceModel
  21. trait ReadDeBertaForTokenTensorflowModel extends ReadTensorflowModel with ReadSentencePieceModel
  22. trait ReadDistilBertForSequenceTensorflowModel extends ReadTensorflowModel
  23. trait ReadDistilBertForTokenTensorflowModel extends ReadTensorflowModel
  24. trait ReadLongformerForSequenceTensorflowModel extends ReadTensorflowModel
  25. trait ReadLongformerForTokenTensorflowModel extends ReadTensorflowModel
  26. trait ReadMultiClassifierDLTensorflowModel extends ReadTensorflowModel
  27. trait ReadRoBertaForSequenceTensorflowModel extends ReadTensorflowModel
  28. trait ReadRoBertaForTokenTensorflowModel extends ReadTensorflowModel
  29. trait ReadSentimentDLTensorflowModel extends ReadTensorflowModel
  30. trait ReadXlmRoBertaForSequenceTensorflowModel extends ReadTensorflowModel with ReadSentencePieceModel
  31. trait ReadXlmRoBertaForTokenTensorflowModel extends ReadTensorflowModel with ReadSentencePieceModel
  32. trait ReadXlnetForSequenceTensorflowModel extends ReadTensorflowModel with ReadSentencePieceModel
  33. trait ReadXlnetForTokenTensorflowModel extends ReadTensorflowModel with ReadSentencePieceModel
  34. trait ReadablePretrainedAlbertForSequenceModel extends ParamsAndFeaturesReadable[AlbertForSequenceClassification] with HasPretrained[AlbertForSequenceClassification]
  35. trait ReadablePretrainedAlbertForTokenModel extends ParamsAndFeaturesReadable[AlbertForTokenClassification] with HasPretrained[AlbertForTokenClassification]
  36. trait ReadablePretrainedBertForSequenceModel extends ParamsAndFeaturesReadable[BertForSequenceClassification] with HasPretrained[BertForSequenceClassification]
  37. trait ReadablePretrainedBertForTokenModel extends ParamsAndFeaturesReadable[BertForTokenClassification] with HasPretrained[BertForTokenClassification]
  38. trait ReadablePretrainedClassifierDL extends ParamsAndFeaturesReadable[ClassifierDLModel] with HasPretrained[ClassifierDLModel]
  39. trait ReadablePretrainedDeBertaForSequenceModel extends ParamsAndFeaturesReadable[DeBertaForSequenceClassification] with HasPretrained[DeBertaForSequenceClassification]
  40. trait ReadablePretrainedDeBertaForTokenModel extends ParamsAndFeaturesReadable[DeBertaForTokenClassification] with HasPretrained[DeBertaForTokenClassification]
  41. trait ReadablePretrainedDistilBertForSequenceModel extends ParamsAndFeaturesReadable[DistilBertForSequenceClassification] with HasPretrained[DistilBertForSequenceClassification]
  42. trait ReadablePretrainedDistilBertForTokenModel extends ParamsAndFeaturesReadable[DistilBertForTokenClassification] with HasPretrained[DistilBertForTokenClassification]
  43. trait ReadablePretrainedLongformerForSequenceModel extends ParamsAndFeaturesReadable[LongformerForSequenceClassification] with HasPretrained[LongformerForSequenceClassification]
  44. trait ReadablePretrainedLongformerForTokenModel extends ParamsAndFeaturesReadable[LongformerForTokenClassification] with HasPretrained[LongformerForTokenClassification]
  45. trait ReadablePretrainedMultiClassifierDL extends ParamsAndFeaturesReadable[MultiClassifierDLModel] with HasPretrained[MultiClassifierDLModel]
  46. trait ReadablePretrainedRoBertaForSequenceModel extends ParamsAndFeaturesReadable[RoBertaForSequenceClassification] with HasPretrained[RoBertaForSequenceClassification]
  47. trait ReadablePretrainedRoBertaForTokenModel extends ParamsAndFeaturesReadable[RoBertaForTokenClassification] with HasPretrained[RoBertaForTokenClassification]
  48. trait ReadablePretrainedSentimentDL extends ParamsAndFeaturesReadable[SentimentDLModel] with HasPretrained[SentimentDLModel]
  49. trait ReadablePretrainedXlmRoBertaForSequenceModel extends ParamsAndFeaturesReadable[XlmRoBertaForSequenceClassification] with HasPretrained[XlmRoBertaForSequenceClassification]
  50. trait ReadablePretrainedXlmRoBertaForTokenModel extends ParamsAndFeaturesReadable[XlmRoBertaForTokenClassification] with HasPretrained[XlmRoBertaForTokenClassification]
  51. trait ReadablePretrainedXlnetForSequenceModel extends ParamsAndFeaturesReadable[XlnetForSequenceClassification] with HasPretrained[XlnetForSequenceClassification]
  52. trait ReadablePretrainedXlnetForTokenModel extends ParamsAndFeaturesReadable[XlnetForTokenClassification] with HasPretrained[XlnetForTokenClassification]
  53. class RoBertaForSequenceClassification extends AnnotatorModel[RoBertaForSequenceClassification] with HasBatchedAnnotate[RoBertaForSequenceClassification] with WriteTensorflowModel with HasCaseSensitiveProperties with HasClassifierActivationProperties

    RoBertaForSequenceClassification can load RoBERTa Models with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for multi-class document classification tasks.

    Pretrained models can be loaded with pretrained of the companion object:

    val sequenceClassifier = RoBertaForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")

    The default model is "roberta_base_sequence_classifier_imdb", if no name is provided.

    For available pretrained models please see the Models Hub.

    To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For extended examples of usage, see the RoBertaForSequenceClassificationTestSpec.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val sequenceClassifier = RoBertaForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setCaseSensitive(true)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      sequenceClassifier
    ))
    
    val data = Seq("John Lennon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("label.result").show(false)
    +--------------------+
    |result              |
    +--------------------+
    |[neg, neg]          |
    |[pos, pos, pos, pos]|
    +--------------------+
    See also

    Annotators Main Page for a list of transformer based classifiers

    RoBertaForSequenceClassification for sequence-level classification

  54. class RoBertaForTokenClassification extends AnnotatorModel[RoBertaForTokenClassification] with HasBatchedAnnotate[RoBertaForTokenClassification] with WriteTensorflowModel with HasCaseSensitiveProperties

    RoBertaForTokenClassification can load RoBERTa Models with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks.

    Pretrained models can be loaded with pretrained of the companion object:

    val tokenClassifier = RoBertaForTokenClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")

    The default model is "roberta_base_token_classifier_conll03", if no name is provided.

    For available pretrained models please see the Models Hub.

    To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For extended examples of usage, see the RoBertaForTokenClassificationTestSpec.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val tokenClassifier = RoBertaForTokenClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setCaseSensitive(true)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      tokenClassifier
    ))
    
    val data = Seq("John Lennon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("label.result").show(false)
    +------------------------------------------------------------------------------------+
    |result                                                                              |
    +------------------------------------------------------------------------------------+
    |[B-PER, I-PER, O, O, O, B-LOC, O, O, O, B-LOC, O, O, O, O, B-PER, O, O, O, O, B-LOC]|
    +------------------------------------------------------------------------------------+
    See also

    Annotators Main Page for a list of transformer based classifiers

    RoBertaForTokenClassification for token-level classification

  55. class SentimentDLApproach extends AnnotatorApproach[SentimentDLModel] with ParamsAndFeaturesWritable

    Trains a SentimentDL, an annotator for multi-class sentiment analysis.

    In natural language processing, sentiment analysis is the task of classifying the affective state or subjective view of a text. A common example is whether a product review or a tweet can be interpreted positively or negatively.

    For the instantiated/pretrained models, see SentimentDLModel.

    For extended examples of usage, see the Spark NLP Workshop and the SentimentDLTestSpec.

    Example

    In this example, sentiment.csv is in the form

    text,label
    This movie is the best movie I have watched ever! In my opinion this movie can win an award.,0
    This was a terrible movie! The acting was bad really bad!,1

    The model can then be trained with

    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.annotator.UniversalSentenceEncoder
    import com.johnsnowlabs.nlp.annotators.classifier.dl.{SentimentDLApproach, SentimentDLModel}
    import org.apache.spark.ml.Pipeline
    
    val smallCorpus = spark.read.option("header", "true").csv("src/test/resources/classifier/sentiment.csv")
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val useEmbeddings = UniversalSentenceEncoder.pretrained()
      .setInputCols("document")
      .setOutputCol("sentence_embeddings")
    
    val docClassifier = new SentimentDLApproach()
      .setInputCols("sentence_embeddings")
      .setOutputCol("sentiment")
      .setLabelColumn("label")
      .setBatchSize(32)
      .setMaxEpochs(1)
      .setLr(5e-3f)
      .setDropout(0.5f)
    
    val pipeline = new Pipeline()
      .setStages(
        Array(
          documentAssembler,
          useEmbeddings,
          docClassifier
        )
      )
    
    val pipelineModel = pipeline.fit(smallCorpus)
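
    The approach also exposes a decision threshold and a fallback label for low-confidence predictions. A hedged sketch of those two setters (the setter names come from the SentimentDL parameter list; the default values shown are assumptions):

    // If no class score reaches the threshold, the fallback label is emitted instead
    val tunedClassifier = new SentimentDLApproach()
      .setInputCols("sentence_embeddings")
      .setOutputCol("sentiment")
      .setLabelColumn("label")
      .setThreshold(0.6f)
      .setThresholdLabel("neutral")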
    See also

    MultiClassifierDLApproach for general multi-class classification

    ClassifierDLApproach for general single-class classification

  56. class SentimentDLModel extends AnnotatorModel[SentimentDLModel] with HasSimpleAnnotate[SentimentDLModel] with WriteTensorflowModel with HasStorageRef with ParamsAndFeaturesWritable

    SentimentDL, an annotator for multi-class sentiment analysis.

    In natural language processing, sentiment analysis is the task of classifying the affective state or subjective view of a text. A common example is whether a product review or a tweet can be interpreted positively or negatively.

    This is the instantiated model of the SentimentDLApproach. For training your own model, please see the documentation of that class.

    Pretrained models can be loaded with pretrained of the companion object:

    val sentiment = SentimentDLModel.pretrained()
      .setInputCols("sentence_embeddings")
      .setOutputCol("sentiment")

    The default model is "sentimentdl_use_imdb", if no name is provided. It is an English sentiment analysis model trained on the IMDB dataset. For available pretrained models please see the Models Hub.

    For extended examples of usage, see the Spark NLP Workshop and the SentimentDLTestSpec.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.annotator.UniversalSentenceEncoder
    import com.johnsnowlabs.nlp.annotators.classifier.dl.SentimentDLModel
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val useEmbeddings = UniversalSentenceEncoder.pretrained()
      .setInputCols("document")
      .setOutputCol("sentence_embeddings")
    
    val sentiment = SentimentDLModel.pretrained("sentimentdl_use_twitter")
      .setInputCols("sentence_embeddings")
      .setThreshold(0.7F)
      .setOutputCol("sentiment")
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      useEmbeddings,
      sentiment
    ))
    
    val data = Seq(
      "Wow, the new video is awesome!",
      "bruh what a damn waste of time"
    ).toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("text", "sentiment.result").show(false)
    +------------------------------+----------+
    |text                          |result    |
    +------------------------------+----------+
    |Wow, the new video is awesome!|[positive]|
    |bruh what a damn waste of time|[negative]|
    +------------------------------+----------+
    See also

    MultiClassifierDLModel for general multi-class classification

    ClassifierDLModel for general single-class classification

  57. class XlmRoBertaForSequenceClassification extends AnnotatorModel[XlmRoBertaForSequenceClassification] with HasBatchedAnnotate[XlmRoBertaForSequenceClassification] with WriteTensorflowModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasClassifierActivationProperties

    XlmRoBertaForSequenceClassification can load XLM-RoBERTa Models with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for multi-class document classification tasks.

    Pretrained models can be loaded with pretrained of the companion object:

    val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")

    The default model is "xlm_roberta_base_sequence_classifier_imdb", if no name is provided.

    For available pretrained models please see the Models Hub.

    To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For extended examples of usage, see the XlmRoBertaForSequenceClassificationTestSpec.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setCaseSensitive(true)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      sequenceClassifier
    ))
    
    val data = Seq("John Lennon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("label.result").show(false)
    +--------------------+
    |result              |
    +--------------------+
    |[neg, neg]          |
    |[pos, pos, pos, pos]|
    +--------------------+
    See also

    Annotators Main Page for a list of transformer based classifiers

    XlmRoBertaForSequenceClassification for sequence-level classification

  58. class XlmRoBertaForTokenClassification extends AnnotatorModel[XlmRoBertaForTokenClassification] with HasBatchedAnnotate[XlmRoBertaForTokenClassification] with WriteTensorflowModel with WriteSentencePieceModel with HasCaseSensitiveProperties

    XlmRoBertaForTokenClassification can load XLM-RoBERTa models with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks.

    Pretrained models can be loaded with pretrained of the companion object:

    val tokenClassifier = XlmRoBertaForTokenClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")

    The default model is "xlm_roberta_base_token_classifier_conll03", if no name is provided.

    For available pretrained models please see the Models Hub.

    For extended examples of usage, see the XlmRoBertaForTokenClassificationTestSpec. To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val tokenClassifier = XlmRoBertaForTokenClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setCaseSensitive(true)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      tokenClassifier
    ))
    
    val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("label.result").show(false)
    +------------------------------------------------------------------------------------+
    |result                                                                              |
    +------------------------------------------------------------------------------------+
    |[B-PER, I-PER, O, O, O, B-LOC, O, O, O, B-LOC, O, O, O, O, B-PER, O, O, O, O, B-LOC]|
    +------------------------------------------------------------------------------------+
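    The raw IOB tags shown above are usually merged into entity chunks in a follow-up step. A sketch using NerConverter with the stage and column names from the example above:

    // Merge IOB-tagged tokens (B-PER, I-PER, ...) into whole entity chunks
    val nerConverter = new NerConverter()
      .setInputCols("document", "token", "label")
      .setOutputCol("ner_chunk")

    val chunkPipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      tokenClassifier,
      nerConverter
    ))
    // "ner_chunk.result" then holds entities such as "John Lenon" and "London"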
    See also

    Annotators Main Page for a list of transformer based classifiers

    XlmRoBertaForTokenClassification for token-level classification

  59. class XlnetForSequenceClassification extends AnnotatorModel[XlnetForSequenceClassification] with HasBatchedAnnotate[XlnetForSequenceClassification] with WriteTensorflowModel with WriteSentencePieceModel with HasCaseSensitiveProperties with HasClassifierActivationProperties

    XlnetForSequenceClassification can load XLNet models with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for multi-class document classification tasks.

    Pretrained models can be loaded with pretrained of the companion object:

    val sequenceClassifier = XlnetForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")

    The default model is "xlnet_base_sequence_classifier_imdb", if no name is provided.

    For available pretrained models please see the Models Hub.

    To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For extended examples of usage, see the XlnetForSequenceClassificationTestSpec.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val sequenceClassifier = XlnetForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setCaseSensitive(true)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      sequenceClassifier
    ))
    
    val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("label.result").show(false)
    +--------------------+
    |result              |
    +--------------------+
    |[neg, neg]          |
    |[pos, pos, pos, pos]|
    +--------------------+
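    The output above contains one prediction per sentence. If a single document-level label is preferred, the per-sentence probabilities can be averaged; a sketch assuming the coalesceSentences parameter exposed by the sequence classifiers:

    // Average probabilities across sentences and emit one label per document
    val docClassifier = XlnetForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setCoalesceSentences(true)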
    See also

    Annotators Main Page for a list of transformer based classifiers

    XlnetForSequenceClassification for sequence-level classification

  60. class XlnetForTokenClassification extends AnnotatorModel[XlnetForTokenClassification] with HasBatchedAnnotate[XlnetForTokenClassification] with WriteTensorflowModel with WriteSentencePieceModel with HasCaseSensitiveProperties

    XlnetForTokenClassification can load XLNet models with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks.

    Pretrained models can be loaded with pretrained of the companion object:

    val tokenClassifier = XlnetForTokenClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")

    The default model is "xlnet_base_token_classifier_conll03", if no name is provided.

    For available pretrained models please see the Models Hub.

    For extended examples of usage, see the XlnetForTokenClassificationTestSpec. To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val tokenClassifier = XlnetForTokenClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setCaseSensitive(true)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      tokenClassifier
    ))
    
    val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("label.result").show(false)
    +------------------------------------------------------------------------------------+
    |result                                                                              |
    +------------------------------------------------------------------------------------+
    |[B-PER, I-PER, O, O, O, B-LOC, O, O, O, B-LOC, O, O, O, O, B-PER, O, O, O, O, B-LOC]|
    +------------------------------------------------------------------------------------+
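    Because the annotator mixes in HasBatchedAnnotate, rows are fed to TensorFlow in batches, and both the batch size and the maximum sequence length can be tuned. A sketch with illustrative values:

    // Trade memory for throughput via the batch size; tokens beyond
    // maxSentenceLength are truncated (the model's hard limit is 512).
    val tunedClassifier = XlnetForTokenClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setBatchSize(8)
      .setMaxSentenceLength(256)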
    See also

    Annotators Main Page for a list of transformer based classifiers

    XlnetForTokenClassification for token-level classification

Value Members

  1. object AlbertForSequenceClassification extends ReadablePretrainedAlbertForSequenceModel with ReadAlbertForSequenceTensorflowModel with Serializable

    This is the companion object of AlbertForSequenceClassification. Please refer to that class for the documentation. A brief loading sketch covering the companion objects in this list follows the final entry.

  2. object AlbertForTokenClassification extends ReadablePretrainedAlbertForTokenModel with ReadAlbertForTokenTensorflowModel with Serializable

    This is the companion object of AlbertForTokenClassification. Please refer to that class for the documentation.

  3. object BertForSequenceClassification extends ReadablePretrainedBertForSequenceModel with ReadBertForSequenceTensorflowModel with Serializable

    This is the companion object of BertForSequenceClassification. Please refer to that class for the documentation.

  4. object BertForTokenClassification extends ReadablePretrainedBertForTokenModel with ReadBertForTokenTensorflowModel with Serializable

    This is the companion object of BertForTokenClassification. Please refer to that class for the documentation.

  5. object ClassifierDLApproach extends DefaultParamsReadable[ClassifierDLApproach] with Serializable

    This is the companion object of ClassifierDLApproach. Please refer to that class for the documentation.

  6. object ClassifierDLModel extends ReadablePretrainedClassifierDL with ReadClassifierDLTensorflowModel with Serializable

    This is the companion object of ClassifierDLModel. Please refer to that class for the documentation.

  7. object DeBertaForSequenceClassification extends ReadablePretrainedDeBertaForSequenceModel with ReadDeBertaForSequenceTensorflowModel with Serializable

    This is the companion object of DeBertaForSequenceClassification. Please refer to that class for the documentation.

  8. object DeBertaForTokenClassification extends ReadablePretrainedDeBertaForTokenModel with ReadDeBertaForTokenTensorflowModel with Serializable

    This is the companion object of DeBertaForTokenClassification. Please refer to that class for the documentation.

  9. object DistilBertForSequenceClassification extends ReadablePretrainedDistilBertForSequenceModel with ReadDistilBertForSequenceTensorflowModel with Serializable

    This is the companion object of DistilBertForSequenceClassification. Please refer to that class for the documentation.

  10. object DistilBertForTokenClassification extends ReadablePretrainedDistilBertForTokenModel with ReadDistilBertForTokenTensorflowModel with Serializable

    This is the companion object of DistilBertForTokenClassification. Please refer to that class for the documentation.

  11. object LongformerForSequenceClassification extends ReadablePretrainedLongformerForSequenceModel with ReadLongformerForSequenceTensorflowModel with Serializable

    This is the companion object of LongformerForSequenceClassification. Please refer to that class for the documentation.

  12. object LongformerForTokenClassification extends ReadablePretrainedLongformerForTokenModel with ReadLongformerForTokenTensorflowModel with Serializable

    This is the companion object of LongformerForTokenClassification. Please refer to that class for the documentation.

  13. object MultiClassifierDLModel extends ReadablePretrainedMultiClassifierDL with ReadMultiClassifierDLTensorflowModel with Serializable

    This is the companion object of MultiClassifierDLModel. Please refer to that class for the documentation.

  14. object RoBertaForSequenceClassification extends ReadablePretrainedRoBertaForSequenceModel with ReadRoBertaForSequenceTensorflowModel with Serializable

    This is the companion object of RoBertaForSequenceClassification. Please refer to that class for the documentation.

  15. object RoBertaForTokenClassification extends ReadablePretrainedRoBertaForTokenModel with ReadRoBertaForTokenTensorflowModel with Serializable

    This is the companion object of RoBertaForTokenClassification. Please refer to that class for the documentation.

  16. object SentimentApproach extends DefaultParamsReadable[SentimentDLApproach]

    This is the companion object of SentimentApproach. Please refer to that class for the documentation.

  17. object SentimentDLModel extends ReadablePretrainedSentimentDL with ReadSentimentDLTensorflowModel with Serializable

    This is the companion object of SentimentDLModel. Please refer to that class for the documentation.

  18. object XlmRoBertaForSequenceClassification extends ReadablePretrainedXlmRoBertaForSequenceModel with ReadXlmRoBertaForSequenceTensorflowModel with Serializable

    This is the companion object of XlmRoBertaForSequenceClassification. Please refer to that class for the documentation.

  19. object XlmRoBertaForTokenClassification extends ReadablePretrainedXlmRoBertaForTokenModel with ReadXlmRoBertaForTokenTensorflowModel with Serializable

    This is the companion object of XlmRoBertaForTokenClassification. Please refer to that class for the documentation.

  20. object XlnetForSequenceClassification extends ReadablePretrainedXlnetForSequenceModel with ReadXlnetForSequenceTensorflowModel with Serializable

    This is the companion object of XlnetForSequenceClassification. Please refer to that class for the documentation.

  21. object XlnetForTokenClassification extends ReadablePretrainedXlnetForTokenModel with ReadXlnetForTokenTensorflowModel with Serializable

    This is the companion object of XlnetForTokenClassification. Please refer to that class for the documentation.
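    As noted above, the companion objects are the loading entry points: the ReadablePretrained* traits contribute pretrained, while the DefaultParamsReadable companions (the *Approach objects) contribute Spark ML's load for annotators previously saved to disk. A combined sketch, with placeholder column names and path:

    // Download a pretrained model through its companion object ...
    val sentiment = SentimentDLModel.pretrained()
      .setInputCols("sentence_embeddings")
      .setOutputCol("sentiment")

    // ... or restore a saved, untrained approach via Spark ML's load
    val approach = ClassifierDLApproach.load("/tmp/classifierdl_approach")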
