- Bigram - Class in smile.nlp
-
Bigrams or digrams are groups of two words, and are very commonly used
as the basis for simple statistical analysis of text.
- Bigram(String, String) - Constructor for class smile.nlp.Bigram
-
Constructor.
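As an illustration of the "simple statistical analysis" mentioned for Bigram, the following is a minimal sketch that counts adjacent word pairs using only the Java standard library. The class name `BigramCountSketch` and the space-joined string key are assumptions made for this example; it does not use or reproduce the smile.nlp.Bigram class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of smile.nlp.
final class BigramCountSketch {
    /** Counts adjacent word pairs; each key joins the two words with one space. */
    static Map<String, Integer> countBigrams(String[] words) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + 1 < words.length; i++) {
            counts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = "the cat sat on the cat".split(" ");
        // "the cat" occurs twice; every other pair occurs once.
        System.out.println(countBigrams(words));
    }
}
```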
- BigramCollocation - Class in smile.nlp.collocation
-
Collocations are expressions of multiple words which commonly co-occur.
- BigramCollocation(String, String, int, double) - Constructor for class smile.nlp.collocation.BigramCollocation
-
Constructor.
- BigramCollocationFinder - Class in smile.nlp.collocation
-
Tools to identify collocations (words that often appear consecutively) within
corpora.
- BigramCollocationFinder(int) - Constructor for class smile.nlp.collocation.BigramCollocationFinder
-
Constructor.
- BM25 - Class in smile.nlp.relevance
-
The BM25 weighting scheme, often called Okapi weighting, after the system in
which it was first implemented, was developed as a way of building a
probabilistic model sensitive to term frequency and document length while
not introducing too many additional parameters into the model.
- BM25() - Constructor for class smile.nlp.relevance.BM25
-
Default constructor with k1 = 1.2, b = 0.75, delta = 1.0.
- BM25(double, double, double) - Constructor for class smile.nlp.relevance.BM25
-
Constructor.
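To make the roles of k1, b, and delta concrete, here is a minimal sketch of a textbook BM25+ style score for a single term. It is not Smile's implementation: the class name `Bm25Sketch`, the parameter order, and the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)) are all assumptions chosen for illustration.

```java
// Hypothetical sketch of a BM25+ style score; not the smile.nlp.relevance.BM25 code.
final class Bm25Sketch {
    /**
     * @param tf        frequency of the term in the document
     * @param docLen    length of the document in words
     * @param avgDocLen average document length in the corpus
     * @param numDocs   number of documents in the corpus (N)
     * @param docFreq   number of documents containing the term (n)
     */
    static double score(double tf, double docLen, double avgDocLen,
                        long numDocs, long docFreq,
                        double k1, double b, double delta) {
        if (tf <= 0.0) {
            return 0.0; // a term absent from the document contributes nothing
        }
        // Assumed IDF variant; it stays non-negative for any 0 <= n <= N.
        double idf = Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
        // Document length normalization, controlled by b.
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        // Saturating term-frequency component plus the delta lower bound.
        return idf * ((k1 + 1.0) * tf / (tf + norm) + delta);
    }

    public static void main(String[] args) {
        // Uses the default parameter values listed above: k1 = 1.2, b = 0.75, delta = 1.0.
        System.out.println(score(3, 120, 100, 1000, 25, 1.2, 0.75, 1.0));
    }
}
```

With b = 0 the document length has no effect, and larger k1 lets repeated occurrences of a term keep adding to the score for longer before it saturates.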
- BreakIteratorSentenceSplitter - Class in smile.nlp.tokenizer
-
A sentence splitter based on the java.text.BreakIterator, which supports
multiple natural languages (selected by locale setting).
- BreakIteratorSentenceSplitter() - Constructor for class smile.nlp.tokenizer.BreakIteratorSentenceSplitter
-
Constructor for the default locale.
- BreakIteratorSentenceSplitter(Locale) - Constructor for class smile.nlp.tokenizer.BreakIteratorSentenceSplitter
-
Constructor for the given locale.
- BreakIteratorTokenizer - Class in smile.nlp.tokenizer
-
A word tokenizer based on the java.text.BreakIterator, which supports
multiple natural languages (selected by locale setting).
- BreakIteratorTokenizer() - Constructor for class smile.nlp.tokenizer.BreakIteratorTokenizer
-
Constructor for the default locale.
- BreakIteratorTokenizer(Locale) - Constructor for class smile.nlp.tokenizer.BreakIteratorTokenizer
-
Constructor for the given locale.
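Both BreakIterator-based entries above build on java.text.BreakIterator. The sketch below shows how that standard class can segment text into sentences and words for a chosen locale. The class `BreakIteratorSketch` and its method names are hypothetical; the Smile classes may trim or filter segments differently.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper built directly on java.text.BreakIterator.
final class BreakIteratorSketch {
    static List<String> sentences(String text, Locale locale) {
        return segments(BreakIterator.getSentenceInstance(locale), text);
    }

    static List<String> words(String text, Locale locale) {
        return segments(BreakIterator.getWordInstance(locale), text);
    }

    /** Walks the boundaries and collects the trimmed, non-empty segments. */
    private static List<String> segments(BreakIterator it, String text) {
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end).trim();
            if (!segment.isEmpty()) {
                out.add(segment); // whitespace-only segments are dropped
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Hello world. How are you?";
        System.out.println(sentences(text, Locale.US));
        System.out.println(words(text, Locale.US));
    }
}
```

Note that the word instance reports punctuation marks as segments of their own, so a caller that wants only words has to filter them out.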
- get(String) - Static method in class smile.nlp.pos.EnglishPOSLexicon
-
Returns the part-of-speech tags for the given word, or null if the word
is not in the dictionary.
- get(K[]) - Method in class smile.nlp.Trie
-
Returns the associated value of a given key.
- get(K) - Method in class smile.nlp.Trie
-
Returns the node of a given key.
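For readers unfamiliar with the data structure behind the two Trie lookups above, here is a minimal sketch of a trie keyed by arrays, where each array element selects one child on the path from the root. The class `TrieSketch` is hypothetical and far simpler than smile.nlp.Trie; in particular it assumes values are never null.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal trie; not the smile.nlp.Trie implementation.
final class TrieSketch<K, V> {
    private static final class Node<K, V> {
        final Map<K, Node<K, V>> children = new HashMap<>();
        V value; // null means no entry ends at this node
    }

    private final Node<K, V> root = new Node<>();
    private int size;

    /** Stores a non-null value under the given key sequence. */
    void put(K[] key, V value) {
        Node<K, V> node = root;
        for (K k : key) {
            node = node.children.computeIfAbsent(k, x -> new Node<K, V>());
        }
        if (node.value == null) {
            size++; // first value stored at this path
        }
        node.value = value;
    }

    /** Returns the value for the key sequence, or null if there is none. */
    V get(K[] key) {
        Node<K, V> node = root;
        for (K k : key) {
            node = node.children.get(k);
            if (node == null) {
                return null;
            }
        }
        return node.value;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        TrieSketch<String, Integer> trie = new TrieSketch<>();
        trie.put(new String[] {"new", "york"}, 1);
        System.out.println(trie.get(new String[] {"new", "york"}));
    }
}
```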
- getAbbreviation(String) - Method in interface smile.nlp.dictionary.Abbreviations
-
Returns the abbreviation for a word.
- getAnchor() - Method in interface smile.nlp.AnchorText
-
Returns the anchor text if any.
- getAnchor() - Method in class smile.nlp.SimpleText
-
Returns the anchor text if any.
- getAverageDocumentSize() - Method in interface smile.nlp.Corpus
-
Returns the average size of documents in the corpus.
- getAverageDocumentSize() - Method in class smile.nlp.SimpleCorpus
-
- getBigramFrequency(Bigram) - Method in interface smile.nlp.Corpus
-
Returns the total frequency of the bigram in the corpus.
- getBigramFrequency(Bigram) - Method in class smile.nlp.SimpleCorpus
-
- getBigrams() - Method in interface smile.nlp.Corpus
-
Returns an iterator over the bigrams in the corpus.
- getBigrams() - Method in class smile.nlp.SimpleCorpus
-
- getBody() - Method in class smile.nlp.Text
-
Returns the body of the text.
- getChild(K[], int) - Method in class smile.nlp.Trie.Node
-
- getChild(K) - Method in class smile.nlp.Trie.Node
-
- getDefault() - Static method in class smile.nlp.pos.HMMPOSTagger
-
Returns the default English POS tagger.
- getFull(String) - Method in interface smile.nlp.dictionary.Abbreviations
-
Returns the full word for a given abbreviation.
- getID() - Method in class smile.nlp.Text
-
Returns the ID of the document in the corpus.
- getInstance() - Static method in class smile.nlp.dictionary.EnglishPunctuations
-
Returns the singleton instance.
- getInstance() - Static method in class smile.nlp.tokenizer.PennTreebankTokenizer
-
Returns the singleton instance.
- getInstance() - Static method in class smile.nlp.tokenizer.SimpleParagraphSplitter
-
Returns the singleton instance.
- getInstance() - Static method in class smile.nlp.tokenizer.SimpleSentenceSplitter
-
Returns the singleton instance.
- getKey() - Method in class smile.nlp.Trie.Node
-
- getNumBigrams() - Method in interface smile.nlp.Corpus
-
Returns the number of bigrams in the corpus.
- getNumBigrams() - Method in class smile.nlp.SimpleCorpus
-
- getNumDocuments() - Method in interface smile.nlp.Corpus
-
Returns the number of documents in the corpus.
- getNumDocuments() - Method in class smile.nlp.SimpleCorpus
-
- getNumTerms() - Method in interface smile.nlp.Corpus
-
Returns the number of unique terms in the corpus.
- getNumTerms() - Method in class smile.nlp.SimpleCorpus
-
- getTermFrequency(String) - Method in interface smile.nlp.Corpus
-
Returns the total frequency of the term in the corpus.
- getTermFrequency(String) - Method in class smile.nlp.SimpleCorpus
-
- getTerms() - Method in interface smile.nlp.Corpus
-
Returns an iterator over the terms in the corpus.
- getTerms() - Method in class smile.nlp.SimpleCorpus
-
- getTitle() - Method in class smile.nlp.Text
-
Returns the title of the text.
- getValue(String) - Static method in enum smile.nlp.pos.PennTreebankPOS
-
Returns an enum value from a string.
- getValue() - Method in class smile.nlp.Trie.Node
-
- score() - Method in class smile.nlp.collocation.BigramCollocation
-
Returns the chi-square statistical score of the collocation.
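As background for the chi-square score mentioned above, this is a minimal sketch of Pearson's chi-square statistic for a 2x2 contingency table of bigram counts. The class `ChiSquareSketch` and the cell names are assumptions for this example; whether Smile computes its score in exactly this form is not stated here.

```java
// Hypothetical sketch of Pearson's chi-square for a 2x2 table; not Smile's code.
final class ChiSquareSketch {
    /**
     * @param o11 count of (w1, w2), the bigram itself
     * @param o12 count of bigrams starting with w1 but not followed by w2
     * @param o21 count of bigrams ending in w2 but not preceded by w1
     * @param o22 count of bigrams containing neither w1 first nor w2 second
     */
    static double chiSquare(long o11, long o12, long o21, long o22) {
        double n = (double) o11 + o12 + o21 + o22;
        double diff = (double) o11 * o22 - (double) o12 * o21;
        double denom = (double) (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22);
        if (denom == 0.0) {
            return 0.0; // an empty row or column carries no evidence of association
        }
        return n * diff * diff / denom;
    }

    public static void main(String[] args) {
        // The words always co-occur, so the statistic is large relative to n = 20.
        System.out.println(chiSquare(10, 0, 0, 10));
    }
}
```

A value near zero means the two words co-occur about as often as independence would predict; larger values suggest a collocation.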
- score(int, int, double, int, int, double, int, int, double, long, long) - Method in class smile.nlp.relevance.BM25
-
Returns a relevance score between a term and a document based on a corpus.
- score(double, long, long) - Method in class smile.nlp.relevance.BM25
-
Returns a relevance score between a term and a document based on a corpus.
- score(double, int, double, long, long) - Method in class smile.nlp.relevance.BM25
-
Returns a relevance score between a term and a document based on a corpus.
- score() - Method in class smile.nlp.relevance.Relevance
-
Returns the relevance score.
- search(String) - Method in interface smile.nlp.Corpus
-
Returns an iterator over the set of documents containing the given term.
- search(RelevanceRanker, String) - Method in interface smile.nlp.Corpus
-
Returns an iterator over the set of documents containing the given term
in descending order of relevance.
- search(RelevanceRanker, String[]) - Method in interface smile.nlp.Corpus
-
Returns an iterator over the set of documents containing (at least one
of) the given terms in descending order of relevance.
- search(String) - Method in class smile.nlp.SimpleCorpus
-
- search(RelevanceRanker, String) - Method in class smile.nlp.SimpleCorpus
-
- search(RelevanceRanker, String[]) - Method in class smile.nlp.SimpleCorpus
-
- SentenceSplitter - Interface in smile.nlp.tokenizer
-
A sentence splitter segments text into sentences (a string of words
satisfying the grammatical rules of a language).
- setAnchor(String) - Method in interface smile.nlp.AnchorText
-
Sets the anchor text.
- setAnchor(String) - Method in class smile.nlp.SimpleText
-
Sets the anchor text.
- setBody(String) - Method in class smile.nlp.Text
-
- setID(String) - Method in class smile.nlp.Text
-
- setTitle(String) - Method in class smile.nlp.Text
-
- SimpleCorpus - Class in smile.nlp
-
A simple in-memory implementation of a corpus for small datasets.
- SimpleCorpus() - Constructor for class smile.nlp.SimpleCorpus
-
Constructor.
- SimpleCorpus(SentenceSplitter, Tokenizer, StopWords, Punctuations) - Constructor for class smile.nlp.SimpleCorpus
-
Constructor.
- SimpleDictionary - Class in smile.nlp.dictionary
-
A simple implementation of the dictionary interface.
- SimpleDictionary(String) - Constructor for class smile.nlp.dictionary.SimpleDictionary
-
Constructor.
- SimpleParagraphSplitter - Class in smile.nlp.tokenizer
-
This is a simple paragraph splitter.
- SimpleSentenceSplitter - Class in smile.nlp.tokenizer
-
This is a simple sentence splitter for English.
- SimpleText - Class in smile.nlp
-
A list-of-words representation of documents.
- SimpleText(String, String, String, String[]) - Constructor for class smile.nlp.SimpleText
-
Constructor.
- SimpleTokenizer - Class in smile.nlp.tokenizer
-
A word tokenizer that tokenizes English sentences with some differences from
TreebankWordTokenizer, notably in its handling of not-contractions.
- SimpleTokenizer() - Constructor for class smile.nlp.tokenizer.SimpleTokenizer
-
Constructor.
- SimpleTokenizer(boolean) - Constructor for class smile.nlp.tokenizer.SimpleTokenizer
-
Constructor.
- size() - Method in interface smile.nlp.Corpus
-
Returns the number of words in the corpus.
- size() - Method in interface smile.nlp.dictionary.Dictionary
-
Returns the number of elements in this dictionary.
- size() - Method in enum smile.nlp.dictionary.EnglishDictionary
-
- size() - Method in class smile.nlp.dictionary.EnglishPunctuations
-
- size() - Method in enum smile.nlp.dictionary.EnglishStopWords
-
- size() - Method in class smile.nlp.dictionary.SimpleDictionary
-
- size() - Method in class smile.nlp.SimpleCorpus
-
- size() - Method in class smile.nlp.SimpleText
-
- size() - Method in interface smile.nlp.TextTerms
-
Returns the number of words.
- size() - Method in class smile.nlp.Trie
-
Returns the number of entries.
- smile.nlp - package smile.nlp
-
Natural language processing.
- smile.nlp.collocation - package smile.nlp.collocation
-
Collocation finding algorithms.
- smile.nlp.dictionary - package smile.nlp.dictionary
-
Common dictionaries such as stop words, punctuation, common English words, etc.
- smile.nlp.keyword - package smile.nlp.keyword
-
- smile.nlp.pos - package smile.nlp.pos
-
Part-of-speech taggers.
- smile.nlp.relevance - package smile.nlp.relevance
-
Term-document relevance ranking algorithms.
- smile.nlp.stemmer - package smile.nlp.stemmer
-
English word stemmer algorithms.
- smile.nlp.tokenizer - package smile.nlp.tokenizer
-
Sentence splitter and word tokenizer.
- split(String) - Method in class smile.nlp.tokenizer.BreakIteratorSentenceSplitter
-
- split(String) - Method in class smile.nlp.tokenizer.BreakIteratorTokenizer
-
- split(String) - Method in interface smile.nlp.tokenizer.ParagraphSplitter
-
Splits text into paragraphs.
- split(String) - Method in class smile.nlp.tokenizer.PennTreebankTokenizer
-
- split(String) - Method in interface smile.nlp.tokenizer.SentenceSplitter
-
Splits text into sentences.
- split(String) - Method in class smile.nlp.tokenizer.SimpleParagraphSplitter
-
- split(String) - Method in class smile.nlp.tokenizer.SimpleSentenceSplitter
-
- split(String) - Method in class smile.nlp.tokenizer.SimpleTokenizer
-
- split(String) - Method in interface smile.nlp.tokenizer.Tokenizer
-
Divides the given string into a list of substrings.
- stem(String) - Method in class smile.nlp.stemmer.LancasterStemmer
-
- stem(String) - Method in class smile.nlp.stemmer.PorterStemmer
-
- stem(String) - Method in interface smile.nlp.stemmer.Stemmer
-
Transforms a word into its root form.
- Stemmer - Interface in smile.nlp.stemmer
-
A Stemmer transforms a word into its root form.
- StopWords - Interface in smile.nlp.dictionary
-
A set of stop words in some language.
- stripPluralParticiple(String) - Method in class smile.nlp.stemmer.PorterStemmer
-
Removes plurals and participles.