Package

com.johnsnowlabs.nlp.annotators

sentence_detector_dl

package sentence_detector_dl

Type Members

  1. case class Metrics(accuracy: Double, recall: Double, precision: Double, f1: Double) extends Product with Serializable

  2. trait ReadablePretrainedSentenceDetectorDL extends ParamsAndFeaturesReadable[SentenceDetectorDLModel] with HasPretrained[SentenceDetectorDLModel]

  3. trait ReadsSentenceDetectorDLGraph extends ParamsAndFeaturesReadable[SentenceDetectorDLModel] with ReadTensorflowModel

  4. class SentenceDetectorDLApproach extends AnnotatorApproach[SentenceDetectorDLModel]

    Trains an annotator that detects sentence boundaries using a deep learning approach.

    For pretrained models see SentenceDetectorDLModel.

    Currently, only the CNN model is supported for training. In future releases, the model architecture is intended to be configurable with setModelArchitecture.

    The default model "cnn" is based on the paper Deep-EOS: General-Purpose Neural Networks for Sentence Boundary Detection (2020, Stefan Schweter, Sajawel Ahmed) and uses a CNN architecture. The original implementation was modified slightly to handle broken sentences and some characters that cannot occur at the end of a line.

    The extracted sentences are returned together in an Array, or exploded into separate rows if explodeSentences is set to true.

    For extended examples of usage, see the Spark NLP Workshop and the SentenceDetectorDLSpec.

    Example

    Training requires data in which each data point is a sentence.

    In this example, the train.txt file has the following form:

    ...
    Slightly more moderate language would make our present situation – namely the lack of progress – a little easier.
    His political successors now have great responsibilities to history and to the heritage of values bequeathed to them by Nelson Mandela.
    ...

    where each line is one sentence. Training can then be started like so:

    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLApproach
    import org.apache.spark.ml.Pipeline
    
    val trainingData = spark.read.text("train.txt").toDF("text")
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val sentenceDetector = new SentenceDetectorDLApproach()
      .setInputCols(Array("document"))
      .setOutputCol("sentences")
      .setEpochsNumber(100)
    
    val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector))
    
    val model = pipeline.fit(trainingData)

    See also

    SentenceDetector for rule-based (non-deep-learning) extraction

    SentenceDetectorDLModel for pretrained models
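
A fitted pipeline is typically saved and reused. The following is a minimal sketch, not taken from the library documentation, that trains with explodeSentences enabled, persists the result with the standard Spark ML writer, and reloads it. The output path is hypothetical, and train.txt is assumed to contain one sentence per line as in the example above.

```scala
import com.johnsnowlabs.nlp.SparkNLP
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLApproach
import org.apache.spark.ml.{Pipeline, PipelineModel}

val spark = SparkNLP.start()
import spark.implicits._

// Assumption: train.txt holds one sentence per line.
val trainingData = spark.read.text("train.txt").toDF("text")

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDetector = new SentenceDetectorDLApproach()
  .setInputCols(Array("document"))
  .setOutputCol("sentences")
  .setEpochsNumber(100)
  .setExplodeSentences(true) // emit one output row per detected sentence

val model = new Pipeline()
  .setStages(Array(documentAssembler, sentenceDetector))
  .fit(trainingData)

// Hypothetical output location; adjust to your environment.
val modelPath = "/tmp/sentence_detector_dl_pipeline"
model.write.overwrite().save(modelPath)

// Reload the fitted pipeline and apply it to unseen text.
val reloaded = PipelineModel.load(modelPath)
val sample = Seq("John loves Mary. Mary loves Peter.").toDF("text")
reloaded.transform(sample).select("sentences.result").show(false)
```

Saving the whole PipelineModel rather than the single annotator keeps the DocumentAssembler configuration together with the trained detector.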

  5. class SentenceDetectorDLEncoder extends Serializable

  6. class SentenceDetectorDLEncoderParam extends Param[SentenceDetectorDLEncoder]

  7. class SentenceDetectorDLModel extends AnnotatorModel[SentenceDetectorDLModel] with HasSimpleAnnotate[SentenceDetectorDLModel] with HasStorageRef with ParamsAndFeaturesWritable with WriteTensorflowModel

    Annotator that detects sentence boundaries using a deep learning approach.

    Instantiated model of the SentenceDetectorDLApproach.

    Pretrained models can be loaded with the pretrained method of the companion object:

    val sentenceDL = SentenceDetectorDLModel.pretrained()
      .setInputCols("document")
      .setOutputCol("sentencesDL")

    If no name is provided, the default model is "sentence_detector_dl". For available pretrained models, see the Models Hub.

    The extracted sentences are returned together in an Array, or exploded into separate rows if explodeSentences is set to true.

    For extended examples of usage, see the Spark NLP Workshop and the SentenceDetectorDLSpec.

    Example

    In this example, the rule-based SentenceDetector is compared to the SentenceDetectorDLModel. In a pipeline, SentenceDetectorDLModel can be used as a drop-in replacement for the SentenceDetector.

    import spark.implicits._
    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.annotator.SentenceDetector
    import com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLModel
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val sentence = new SentenceDetector()
      .setInputCols("document")
      .setOutputCol("sentences")
    
    val sentenceDL = SentenceDetectorDLModel
      .pretrained("sentence_detector_dl", "en")
      .setInputCols("document")
      .setOutputCol("sentencesDL")
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      sentence,
      sentenceDL
    ))
    
    val data = Seq("""John loves Mary.Mary loves Peter
      Peter loves Helen .Helen loves John;
      Total: four people involved.""").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.selectExpr("explode(sentences.result) as sentences").show(false)
    +----------------------------------------------------------+
    |sentences                                                 |
    +----------------------------------------------------------+
    |John loves Mary.Mary loves Peter\n     Peter loves Helen .|
    |Helen loves John;                                         |
    |Total: four people involved.                              |
    +----------------------------------------------------------+
    
    result.selectExpr("explode(sentencesDL.result) as sentencesDL").show(false)
    +----------------------------+
    |sentencesDL                 |
    +----------------------------+
    |John loves Mary.            |
    |Mary loves Peter            |
    |Peter loves Helen .         |
    |Helen loves John;           |
    |Total: four people involved.|
    +----------------------------+

    See also

    SentenceDetector for rule-based (non-deep-learning) extraction

    SentenceDetectorDLApproach for training a model yourself
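
The explodeSentences option mentioned above can be illustrated with a short sketch. It assumes the pretrained "sentence_detector_dl" English model can be downloaded and that a session is started with SparkNLP.start(). With the option enabled, each detected sentence should land in its own row instead of sharing one array per document. The sample text is made up for illustration.

```scala
import com.johnsnowlabs.nlp.SparkNLP
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLModel
import org.apache.spark.ml.Pipeline

val spark = SparkNLP.start()
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel
  .pretrained("sentence_detector_dl", "en")
  .setInputCols("document")
  .setOutputCol("sentencesDL")
  .setExplodeSentences(true) // one row per sentence instead of one array per document

// Illustrative input, not taken from the original example.
val data = Seq("John loves Mary. Mary loves Peter.").toDF("text")

val exploded = new Pipeline()
  .setStages(Array(documentAssembler, sentenceDL))
  .fit(data)
  .transform(data)

// Each row now carries the annotation for a single sentence.
exploded.select("sentencesDL.result").show(false)
```

Exploding inside the annotator avoids a separate explode step downstream, at the cost of repeating the other columns of the document on every sentence row.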
