Package

com.johnsnowlabs.nlp.annotators.sda

pragmatic

package pragmatic

Type Members

  1. class PragmaticScorer extends Serializable

    Scorer is a rule-based implementation inspired by http://fjavieralba.com/basic-sentiment-analysis-with-python.html. Its strategy is to tag the words of a sentence using a dictionary, and then examine that sentence context to find amplifiers.
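
    The tag-then-amplify strategy described above can be illustrated with a minimal, self-contained sketch. This is not the PragmaticScorer implementation: the object name ScorerSketch, the toy dictionary, the "increment" class, and the weights 1.0, -1.0 and 2.0 are all assumptions chosen for illustration.

    ```scala
    // Hypothetical, simplified illustration; NOT the library's PragmaticScorer.
    object ScorerSketch {
      // Assumed toy dictionary mapping a word to its class.
      val dictionary: Map[String, String] = Map(
        "cool" -> "positive",
        "bad" -> "negative",
        "very" -> "increment" // assumed amplifier class
      )

      def score(tokens: Seq[String]): Double = {
        // Step 1: tag every token with its dictionary class, if it has one.
        val tags: Vector[Option[String]] =
          tokens.map(t => dictionary.get(t.toLowerCase)).toVector

        // Step 2: sum the base scores, doubling a word's score when the
        // token directly before it is tagged as an amplifier.
        tags.zipWithIndex.foldLeft(0.0) { case (total, (tag, i)) =>
          val base = tag match {
            case Some("positive") => 1.0
            case Some("negative") => -1.0
            case _ => 0.0
          }
          val amplified = i > 0 && tags(i - 1).contains("increment")
          total + (if (amplified) base * 2.0 else base)
        }
      }
    }
    ```

    With these assumed weights, ScorerSketch.score(Seq("very", "cool")) sums to 2.0, while a sentence with no dictionary words scores 0.0.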

  2. class SentimentDetector extends AnnotatorApproach[SentimentDetectorModel]

    Trains a rule-based sentiment detector, which calculates a score based on predefined keywords.

    A dictionary of predefined sentiment keywords must be provided with setDictionary, where each line contains a word and its class (either positive or negative), separated by a delimiter. The dictionary can be set either as a delimited text file or directly as an ExternalResource.
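
    To make the expected line format concrete, here is a minimal sketch of how such lines could be split into word and class pairs. The helper parseDictionary is hypothetical and is not part of Spark NLP; the library reads the dictionary itself once setDictionary is called.

    ```scala
    // Hypothetical helper, not a Spark NLP function: splits "word<delimiter>class" lines.
    def parseDictionary(lines: Seq[String], delimiter: String): Map[String, String] =
      lines
        .map(_.trim)
        .filter(_.nonEmpty) // skip blank lines
        .map { line =>
          // Split on the last delimiter so the class is always the final field.
          val idx = line.lastIndexOf(delimiter)
          require(idx > 0, s"Malformed dictionary line: $line")
          line.substring(0, idx).trim -> line.substring(idx + delimiter.length).trim
        }
        .toMap

    val dict = parseDictionary(
      Seq("cool,positive", "superb,positive", "bad,negative", "uninspired,negative"),
      ","
    )
    ```

    Here dict maps each keyword to the string "positive" or "negative".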

    By default, the result is the label "positive" if the sentiment score is >= 0 and "negative" otherwise. To retrieve the raw sentiment scores instead, set enableScore to true.
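
    The default labeling rule can be written as a small function. This is a sketch of the documented behavior only: the name toLabel is hypothetical and the function is not taken from the library source.

    ```scala
    // Hypothetical helper mirroring the documented default rule; not a library function.
    def toLabel(score: Double): String =
      if (score >= 0) "positive" else "negative"
    ```

    Note that, under this rule, a score of exactly 0 (for example, a sentence with no dictionary words) is labeled "positive".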

    For extended examples of usage, see the Spark NLP Workshop and the SentimentTestSpec.

    Example

    In this example, the dictionary default-sentiment-dict.txt has the form

    ...
    cool,positive
    superb,positive
    bad,negative
    uninspired,negative
    ...

    where each sentiment keyword is separated from its class by ",".

    import spark.implicits._
    import com.johnsnowlabs.nlp.DocumentAssembler
    import com.johnsnowlabs.nlp.annotator.Tokenizer
    import com.johnsnowlabs.nlp.annotators.Lemmatizer
    import com.johnsnowlabs.nlp.annotators.sda.pragmatic.SentimentDetector
    import com.johnsnowlabs.nlp.util.io.ReadAs
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val lemmatizer = new Lemmatizer()
      .setInputCols("token")
      .setOutputCol("lemma")
      .setDictionary("src/test/resources/lemma-corpus-small/lemmas_small.txt", "->", "\t")
    
    val sentimentDetector = new SentimentDetector()
      .setInputCols("lemma", "document")
      .setOutputCol("sentimentScore")
      .setDictionary("src/test/resources/sentiment-corpus/default-sentiment-dict.txt", ",", ReadAs.TEXT)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      lemmatizer,
      sentimentDetector,
    ))
    
    val data = Seq(
      "The staff of the restaurant is nice",
      "I recommend others to avoid because it is too expensive"
    ).toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.selectExpr("sentimentScore.result").show(false)
    +----------+  //  +------+ for enableScore set to true
    |result    |  //  |result|
    +----------+  //  +------+
    |[positive]|  //  |[1.0] |
    |[negative]|  //  |[-2.0]|
    +----------+  //  +------+

    See also

    ViveknSentimentApproach for an alternative approach to sentiment extraction

  3. class SentimentDetectorModel extends AnnotatorModel[SentimentDetectorModel] with HasSimpleAnnotate[SentimentDetectorModel]

    Rule-based sentiment detector, which calculates a score based on predefined keywords.

    This is the instantiated model of the SentimentDetector. For training your own model, please see the documentation of that class.

    A dictionary of predefined sentiment keywords must be provided with setDictionary, where each line contains a word and its class (either positive or negative), separated by a delimiter. The dictionary can be set either as a delimited text file or directly as an ExternalResource.

    By default, the result is the label "positive" if the sentiment score is >= 0 and "negative" otherwise. To retrieve the raw sentiment scores instead, set enableScore to true.

    For extended examples of usage, see the Spark NLP Workshop and the SentimentTestSpec.

    See also

    ViveknSentimentApproach for an alternative approach to sentiment extraction

Value Members

  1. object SentimentDetector extends DefaultParamsReadable[SentimentDetector] with Serializable

    This is the companion object of SentimentDetector. Please refer to that class for the documentation.

  2. object SentimentDetectorModel extends ParamsAndFeaturesReadable[SentimentDetectorModel] with Serializable
