dl - Spark NLP 3.3.2 ScalaDoc - com.johnsnowlabs.nlp.annotators.ner.dl

Type Members

class NerDLApproach extends AnnotatorApproach[NerDLModel] with NerApproach[NerDLApproach] with Logging with ParamsAndFeaturesWritable

This Named Entity Recognition annotator trains generic NER models based on neural networks.

The neural network architecture is Char CNNs - BiLSTM - CRF, which achieves state-of-the-art results on most datasets.

For instantiated/pretrained models, see NerDLModel.

The training data should be a labeled Spark Dataset in the CoNLL 2003 IOB format, with Annotation type columns. The data should have columns of type DOCUMENT, TOKEN and WORD_EMBEDDINGS, plus an additional label column of annotator type NAMED_ENTITY. Apart from the label column, these can be produced with, for example,

a SentenceDetector,
a Tokenizer and
a WordEmbeddingsModel (any embeddings can be chosen, e.g. BertEmbeddings for BERT based embeddings).
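To make the expected input concrete, here is a minimal plain-Scala sketch of the CoNLL 2003 layout: one token per line with four whitespace-separated fields (token, POS tag, chunk tag, NER label), blank lines between sentences, and -DOCSTART- lines between documents. ConllSketch, TaggedToken and sample are hypothetical names used only for illustration. In a real pipeline the CoNLL reader shown in the example below does this work and also builds the annotation columns.

```scala
// Illustrative sketch only: these names are not part of Spark NLP.
object ConllSketch {
  final case class TaggedToken(token: String, label: String)

  // A few lines in CoNLL 2003 style: TOKEN POS CHUNK NER-LABEL
  val sample: Seq[String] = Seq(
    "-DOCSTART- -X- -X- O",
    "",
    "EU NNP B-NP B-ORG",
    "rejects VBZ B-VP O",
    "German JJ B-NP B-MISC",
    "call NN I-NP O",
    "",
    "Peter NNP B-NP B-PER",
    "Blackburn NNP I-NP I-PER"
  )

  /** Groups raw CoNLL lines into sentences of (token, NER label) pairs. */
  def parse(lines: Seq[String]): Seq[Seq[TaggedToken]] = {
    val sentences = Seq.newBuilder[Seq[TaggedToken]]
    var current = Vector.empty[TaggedToken]

    // Emit the sentence collected so far, if any.
    def flush(): Unit = if (current.nonEmpty) {
      sentences += current
      current = Vector.empty
    }

    lines.foreach { raw =>
      val line = raw.trim
      if (line.isEmpty || line.startsWith("-DOCSTART-")) flush()
      else {
        val fields = line.split("\\s+")
        // The first field is the token, the last one is the NER label.
        current = current :+ TaggedToken(fields.head, fields.last)
      }
    }
    flush()
    sentences.result()
  }
}
```

Calling ConllSketch.parse(ConllSketch.sample) yields two sentences, the first with the labels B-ORG, O, B-MISC, O and the second with B-PER, I-PER.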

For extended examples of usage, see the Spark NLP Workshop and the NerDLSpec.

Example

import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
import com.johnsnowlabs.nlp.embeddings.BertEmbeddings
import com.johnsnowlabs.nlp.annotators.ner.dl.NerDLApproach
import com.johnsnowlabs.nlp.training.CoNLL
import org.apache.spark.ml.Pipeline

// This CoNLL dataset already includes sentence, token and label
// columns with their respective annotator types. If a custom dataset is used,
// these columns need to be created first, for example with:

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

// Then the training can start
val embeddings = BertEmbeddings.pretrained()
  .setInputCols("sentence", "token")
  .setOutputCol("embeddings")

val nerTagger = new NerDLApproach()
  .setInputCols("sentence", "token", "embeddings")
  .setLabelColumn("label")
  .setOutputCol("ner")
  .setMaxEpochs(1)
  .setRandomSeed(0)
  .setVerbose(0)

val pipeline = new Pipeline().setStages(Array(
  embeddings,
  nerTagger
))

// We use the sentences, tokens and labels from the CoNLL dataset
val conll = CoNLL()
val trainingData = conll.readDataset(spark, "src/test/resources/conll2003/eng.train")

val pipelineModel = pipeline.fit(trainingData)

See also: NerConverter to further process the results
NerCrfApproach for a generic CRF approach

class NerDLModel extends AnnotatorModel[NerDLModel] with HasBatchedAnnotate[NerDLModel] with WriteTensorflowModel with HasStorageRef with ParamsAndFeaturesWritable

This Named Entity recognition annotator is a generic NER model based on Neural Networks.

The neural network architecture is Char CNNs - BiLSTM - CRF, which achieves state-of-the-art results on most datasets.

This is the instantiated model of the NerDLApproach. For training your own model, please see the documentation of that class.

Pretrained models can be loaded with the pretrained method of the companion object:

val nerModel = NerDLModel.pretrained()
  .setInputCols("sentence", "token", "embeddings")
  .setOutputCol("ner")

The default model is "ner_dl", if no name is provided.

For available pretrained models please see the Models Hub. Additionally, pretrained pipelines are available for this module, see Pipelines.

Note that some pretrained models require specific types of embeddings, depending on the embeddings they were trained with. For example, the default model "ner_dl" requires the WordEmbeddings "glove_100d".

For extended examples of usage, see the Spark NLP Workshop and the NerDLSpec.

Example

import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
import com.johnsnowlabs.nlp.embeddings.WordEmbeddingsModel
import com.johnsnowlabs.nlp.annotators.ner.dl.NerDLModel
import org.apache.spark.ml.Pipeline

// First extract the prerequisites for the NerDLModel
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val embeddings = WordEmbeddingsModel.pretrained()
  .setInputCols("sentence", "token")
  .setOutputCol("embeddings")

// Then NER can be extracted
val nerTagger = NerDLModel.pretrained()
  .setInputCols("sentence", "token", "embeddings")
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentence,
  tokenizer,
  embeddings,
  nerTagger
))

val data = Seq("U.N. official Ekeus heads for Baghdad.").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("ner.result").show(false)
+------------------------------------+
|result                              |
+------------------------------------+
|[B-ORG, O, O, B-PER, O, O, B-LOC, O]|
+------------------------------------+

See also: NerConverter to further process the results
NerCrfModel for a generic CRF approach
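The ner column holds one IOB tag per token, so consecutive tags still have to be merged into entity chunks, which is the kind of post-processing NerConverter is meant for. The idea can be sketched in plain Scala as follows. This is not NerConverter's implementation: IobChunkSketch, Chunk and toChunks are hypothetical names, and the lenient handling of a stray I- tag (treated as the start of a new chunk) is an assumption made for this sketch.

```scala
// Illustrative sketch only: these names are not part of Spark NLP.
object IobChunkSketch {
  final case class Chunk(entity: String, text: String)

  /** Merges aligned tokens and IOB tags into entity chunks. */
  def toChunks(tokens: Seq[String], tags: Seq[String]): Seq[Chunk] = {
    require(tokens.length == tags.length, "tokens and tags must align")
    val chunks = Seq.newBuilder[Chunk]
    // The chunk currently being built: (entity type, words so far).
    var open: Option[(String, Vector[String])] = None

    // Emit the open chunk, if any, and reset the state.
    def close(): Unit = {
      open.foreach { case (entity, words) =>
        chunks += Chunk(entity, words.mkString(" "))
      }
      open = None
    }

    tokens.zip(tags).foreach { case (token, tag) =>
      (tag, open) match {
        // I-X directly after a chunk of type X continues that chunk.
        case (t, Some((entity, words))) if t == s"I-$entity" =>
          open = Some((entity, words :+ token))
        // B-X, or an I-X with no matching open chunk, starts a new chunk.
        case (t, _) if t.startsWith("B-") || t.startsWith("I-") =>
          close()
          open = Some((t.drop(2), Vector(token)))
        // "O" (or anything else) ends the current chunk.
        case _ =>
          close()
      }
    }
    close()
    chunks.result()
  }
}
```

For the tokens John, Smith, visited, New, York, "." tagged B-PER, I-PER, O, B-LOC, I-LOC, O, toChunks returns Chunk("PER", "John Smith") and Chunk("LOC", "New York").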

trait ReadablePretrainedNerDL extends ParamsAndFeaturesReadable[NerDLModel] with HasPretrained[NerDLModel]
trait ReadsNERGraph extends ParamsAndFeaturesReadable[NerDLModel] with ReadTensorflowModel
trait WithGraphResolver extends AnyRef

Value Members

object LoadsContrib
object NerDLApproach extends DefaultParamsReadable[NerDLApproach] with WithGraphResolver with Serializable

This is the companion object of NerDLApproach. Please refer to that class for the documentation.
object NerDLModel extends ReadablePretrainedNerDL with ReadsNERGraph with Serializable

This is the companion object of NerDLModel. Please refer to that class for the documentation.
object NerDLModelPythonReader
