Package

com.johnsnowlabs.nlp.annotators.spell

norvig

package norvig

Type Members

  1. class NorvigSweetingApproach extends AnnotatorApproach[NorvigSweetingModel] with NorvigSweetingParams

    Trains an annotator that retrieves tokens and automatically corrects those not found in an English dictionary.

    The Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance. It is six orders of magnitude faster than the standard approach (deletes + transposes + replaces + inserts) and is language independent. A dictionary of correct spellings must be provided with setDictionary, either as a text file or directly as an ExternalResource, where each word is parsed by a regex pattern.
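
    To make the idea concrete, the following is a minimal sketch of symmetric-delete lookup in plain Scala. It is an illustration only, not the Spark NLP implementation: the object SymmetricDeleteSketch and its methods deletes, buildIndex and candidates are hypothetical names, and the sketch leaves out the final edit-distance check and frequency ranking that a real spell checker would apply to the candidates.

    ```scala
    // Illustrative sketch only; all names here are hypothetical and not part of Spark NLP.
    object SymmetricDeleteSketch {

      // All strings obtained by deleting up to `maxDistance` characters from `word`,
      // including the word itself. Single-character words are not reduced further.
      def deletes(word: String, maxDistance: Int): Set[String] =
        if (maxDistance == 0 || word.length <= 1) Set(word)
        else {
          val oneDelete = word.indices
            .map(i => word.substring(0, i) + word.substring(i + 1))
            .toSet
          oneDelete.flatMap(deletes(_, maxDistance - 1)) + word
        }

      // Precomputation: map every delete variant to the dictionary words that produce it.
      def buildIndex(dictionary: Seq[String], maxDistance: Int): Map[String, Set[String]] =
        dictionary
          .flatMap(w => deletes(w, maxDistance).map(d => d -> w))
          .groupBy(_._1)
          .map { case (variant, pairs) => variant -> pairs.map(_._2).toSet }

      // Lookup: a dictionary word is a candidate if it shares a delete variant with the input.
      // Only deletes are generated for the input; no replaces, inserts or transposes.
      def candidates(index: Map[String, Set[String]], word: String, maxDistance: Int): Set[String] =
        deletes(word, maxDistance).flatMap(d => index.getOrElse(d, Set.empty[String]))
    }
    ```

    For a dictionary containing gummy, gummic and gummier indexed at distance 1, looking up "gumy" returns only gummy, while "gummi" returns both gummy and gummic. The match with gummy is a replacement, found because both words share the delete variant "gumm".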

    Inspired by the Norvig model and SymSpell.

    For instantiated/pretrained models, see NorvigSweetingModel.

    For extended examples of usage, see the Spark NLP Workshop and the NorvigSweetingTestSpec.

    Example

    In this example, the dictionary "words.txt" has the following form:

    ...
    gummy
    gummic
    gummier
    gummiest
    gummiferous
    ...

    This dictionary is then set to be the basis of the spell checker.

    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.annotators.Tokenizer
    import com.johnsnowlabs.nlp.annotators.spell.norvig.NorvigSweetingApproach
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val spellChecker = new NorvigSweetingApproach()
      .setInputCols("token")
      .setOutputCol("spell")
      .setDictionary("src/test/resources/spell/words.txt")
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      spellChecker
    ))
    
    // trainingData is assumed to be an existing DataFrame with a "text" column
    val pipelineModel = pipeline.fit(trainingData)

    See also

    ContextSpellCheckerApproach for a deep-learning-based approach

    SymmetricDeleteApproach for an alternative approach to spell checking

  2. class NorvigSweetingModel extends AnnotatorModel[NorvigSweetingModel] with HasSimpleAnnotate[NorvigSweetingModel] with NorvigSweetingParams

    This annotator retrieves tokens and automatically corrects those not found in an English dictionary. Inspired by the Norvig model and SymSpell.

    The Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance. It is six orders of magnitude faster than the standard approach (deletes + transposes + replaces + inserts) and is language independent.
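
    For contrast, the standard approach named above can be sketched in plain Scala as follows. This is a hedged illustration of Norvig-style candidate generation at edit distance 1, not code taken from Spark NLP; the object NorvigEditsSketch, the method edits1 and the lowercase a-z alphabet are assumptions made for the example.

    ```scala
    // Illustrative sketch only; names and alphabet are hypothetical, not Spark NLP's API.
    object NorvigEditsSketch {
      private val alphabet = 'a' to 'z'

      // Every candidate one edit away from `word`.
      // Duplicates are kept so the raw number of generated candidates stays visible.
      def edits1(word: String): Seq[String] = {
        // All ways to split the word into a left and a right part.
        val splits = (0 to word.length).map(i => (word.take(i), word.drop(i)))

        val deletes    = for ((l, r) <- splits if r.nonEmpty) yield l + r.tail
        val transposes = for ((l, r) <- splits if r.length > 1) yield l + r(1) + r(0) + r.drop(2)
        val replaces   = for ((l, r) <- splits if r.nonEmpty; c <- alphabet) yield l + c + r.tail
        val inserts    = for ((l, r) <- splits; c <- alphabet) yield l + c + r

        deletes ++ transposes ++ replaces ++ inserts
      }
    }
    ```

    For a word of length n over a 26-letter alphabet this generates 54n + 25 raw candidates at distance 1 (241 for the four-letter input "gumy"), each of which must be checked against the dictionary. A symmetric-delete lookup generates only the n single-character deletes of the input at the same distance.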

    This is the instantiated model of the NorvigSweetingApproach. For training your own model, please see the documentation of that class.

    Pretrained models can be loaded with the pretrained method of the companion object:

    val spellChecker = NorvigSweetingModel.pretrained()
      .setInputCols("token")
      .setOutputCol("spell")
      .setDoubleVariants(true)

    If no name is provided, the default model is "spellcheck_norvig". For available pretrained models, see the Models Hub.

    For extended examples of usage, see the Spark NLP Workshop and the NorvigSweetingTestSpec.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.annotators.Tokenizer
    import com.johnsnowlabs.nlp.annotators.spell.norvig.NorvigSweetingModel
    
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val spellChecker = NorvigSweetingModel.pretrained()
      .setInputCols("token")
      .setOutputCol("spell")
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      spellChecker
    ))
    
    val data = Seq("somtimes i wrrite wordz erong.").toDF("text")
    val result = pipeline.fit(data).transform(data)
    result.select("spell.result").show(false)
    +--------------------------------------+
    |result                                |
    +--------------------------------------+
    |[sometimes, i, write, words, wrong, .]|
    +--------------------------------------+

    See also

    ContextSpellCheckerModel for a deep-learning-based approach

    SymmetricDeleteModel for an alternative approach to spell checking

  3. trait NorvigSweetingParams extends Params

    These are the configs for the NorvigSweeting model

    See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/spell/norvig/NorvigSweetingTestSpec.scala for further reference on how to use this API

  4. trait ReadablePretrainedNorvig extends ParamsAndFeaturesReadable[NorvigSweetingModel] with HasPretrained[NorvigSweetingModel]

Value Members

  1. object NorvigSweetingApproach extends DefaultParamsReadable[NorvigSweetingApproach] with Serializable

    This is the companion object of NorvigSweetingApproach. Please refer to that class for the documentation.

  2. object NorvigSweetingModel extends ReadablePretrainedNorvig with Serializable

    This is the companion object of NorvigSweetingModel. Please refer to that class for the documentation.
