Vivekn inspired sentiment analysis model
Content feature limit, to boost performance on very dirty text (Default: disabled with -1)
Gets the content feature limit, to boost performance on very dirty text (Default: disabled with -1)
Gets the proportion of feature content to be considered relevant (Default: 0.5)
Input annotation columns currently used
Gets the name of the annotation column that will be generated
Gets the proportion to look ahead in unimportant features (Default: 0.025)
Proportion of feature content to be considered relevant (Default: 0.5)
Input annotator type: TOKEN, DOCUMENT
Columns that contain the annotations necessary to run this annotator. AnnotatorType is used for both input and output columns if not specified.
Detects negations and transforms them into not_ form
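The not_ transformation can be illustrated with a minimal Python sketch. This is not Spark NLP's implementation: the cue list, the punctuation that ends a negated clause, and the function name `mark_negations` are all assumptions made for illustration.

```python
import re

# Hypothetical negation cues; the real annotator may use a different list.
NEGATION_CUES = {"not", "no", "never", "cannot"}
# Assumption: a single punctuation token ends the negated clause.
CLAUSE_END = re.compile(r"^[.,;:!?]$")


def mark_negations(tokens):
    """Prefix tokens that follow a negation cue with 'not_' until the clause ends."""
    marked = []
    negating = False
    for token in tokens:
        lowered = token.lower()
        if CLAUSE_END.match(token):
            negating = False         # punctuation closes the negated scope
            marked.append(token)
        elif lowered in NEGATION_CUES or lowered.endswith("n't"):
            negating = not negating  # a second cue cancels the first
            marked.append(token)
        elif negating:
            marked.append("not_" + token)
        else:
            marked.append(token)
    return marked
```

With this sketch, the tokens `this is not good , but fine` become `this is not not_good , but fine`, so "good" and "not_good" are counted as separate features.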
Output annotator type: SENTIMENT
Removes infrequent scenarios from scope. The higher the value, the better the performance (Default: 1)
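As a rough illustration of what pruning infrequent features might look like, the Python sketch below drops tokens whose corpus count does not exceed a threshold. The function name `prune_features`, the exclusive comparison, and the convention that 0 disables pruning are assumptions, not a description of the annotator's internals.

```python
from collections import Counter


def prune_features(documents, prune_threshold=1):
    """Count tokens across documents and keep those seen more than prune_threshold times.

    Assumption: the threshold is exclusive, so the default of 1 drops tokens
    that occur only once, and 0 keeps every token.
    """
    counts = Counter(token for doc in documents for token in doc)
    return {token: n for token, n in counts.items() if n > prune_threshold}
```

On a small corpus such as `[["good", "movie"], ["good", "plot"]]`, a threshold of 1 keeps only "good", which shows why pruning can discard most of the vocabulary when training data is scarce.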
Column with the sentiment result of every row. Must be "positive" or "negative"
Sets the content feature limit, to boost performance on very dirty text (Default: disabled with -1)
Sets the proportion of feature content to be considered relevant (Default: 0.5)
Overrides the required annotator columns if different from the default
Overrides the annotation column name when transforming
When training on small data, you may want to disable this so that infrequent words are not cut off
Column with each row's sentiment label for training. If not set, external sources need to be set instead. Must be 'positive' or 'negative'
Sets the proportion to look ahead in unimportant features (Default: 0.025)
Requirement for pipeline transformation validation. It is called on fit()
Proportion to look ahead in unimportant features (Default: 0.025)
Takes a Dataset and checks whether all the required annotation types are present.
to be validated
True if all the required types are present, else false
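The check described above amounts to a subset test. A minimal Python sketch, assuming the annotation type of each column is available as a plain mapping (the function name and the dict representation are hypothetical, not the library's API):

```python
def has_required_annotations(column_types, required_types):
    """Return True if every required annotator type appears among the columns.

    column_types: hypothetical mapping of column name to annotator type.
    required_types: iterable of annotator types the annotator needs.
    """
    return set(required_types).issubset(column_types.values())
```

For this annotator the required types would be "token" and "document", so a dataset that only carries a "document" column would fail the check.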
A list of (hyper-)parameter keys this annotator can take. Users can set and get the parameter values through setters and getters, respectively.
Required input and expected output annotator types
Trains a sentiment analyser inspired by the algorithm by Vivek Narayanan https://github.com/vivekn/sentiment/.
The algorithm is based on the paper "Fast and accurate sentiment classification using an enhanced Naive Bayes model".
The analyzer requires sentence boundaries to give a score in context. Tokenization is needed to make sure tokens are within bounds. Transitivity requirements must also be satisfied.
The training data needs to consist of a column for normalized text and a label column (either "positive" or "negative").
For extended examples of usage, see the Spark NLP Workshop and the ViveknSentimentTestSpec.
Example
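A minimal sketch of a training pipeline, written against the Spark NLP Python API. The column names "text", "train_sentiment" and "result_sentiment" and the helper name `build_sentiment_pipeline` are illustrative assumptions. The imports are placed inside the function so that the definition loads even where Spark NLP is not installed.

```python
def build_sentiment_pipeline():
    """Assemble an untrained pipeline ending in ViveknSentimentApproach."""
    # Lazy imports: pyspark and sparknlp are only needed when the function is called.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import Tokenizer, Normalizer, ViveknSentimentApproach

    # Raw text -> DOCUMENT annotations
    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    # DOCUMENT -> TOKEN annotations
    token = Tokenizer().setInputCols(["document"]).setOutputCol("token")
    # Normalized tokens, as the training data is expected to be normalized text
    normalizer = Normalizer().setInputCols(["token"]).setOutputCol("normal")
    # The approach takes DOCUMENT and TOKEN inputs plus a label column
    vivekn = (
        ViveknSentimentApproach()
        .setInputCols(["document", "normal"])
        .setSentimentCol("train_sentiment")
        .setOutputCol("result_sentiment")
    )
    return Pipeline(stages=[document, token, normalizer, vivekn])
```

With an active Spark session (for example one created by `sparknlp.start()`), calling `build_sentiment_pipeline().fit(training_df)` on a DataFrame that has "text" and "train_sentiment" columns would train the model. The label column must contain only "positive" or "negative".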
See also: SentimentDetector, for an alternative approach to sentiment detection