A B C D E F G H I L M N O P R S T U V W 

A

Abbreviations - Interface in smile.nlp.dictionary
A dictionary interface for abbreviations.
add(Text) - Method in class smile.nlp.SimpleCorpus
Add a document to the corpus.
addAnchor(String) - Method in interface smile.nlp.AnchorText
Add a link label to the anchor text.
addAnchor(String) - Method in class smile.nlp.SimpleText
 
addChild(K[], V, int) - Method in class smile.nlp.Trie.Node
 
AnchorText - Interface in smile.nlp
The anchor text is the visible, clickable text in a hyperlink.
apply(String) - Method in class smile.nlp.embedding.Word2Vec
Returns the vector embedding of a word.

B

Bigram - Class in smile.nlp
Bigrams or digrams are groups of two words, and are very commonly used as the basis for simple statistical analysis of text.
Bigram(String, String) - Constructor for class smile.nlp.Bigram
Constructor.
Bigram - Class in smile.nlp.collocation
Collocations are expressions of multiple words which commonly co-occur.
Bigram(String, String, int, double) - Constructor for class smile.nlp.collocation.Bigram
Constructor.
BM25 - Class in smile.nlp.relevance
The BM25 weighting scheme, often called Okapi weighting, after the system in which it was first implemented, was developed as a way of building a probabilistic model sensitive to term frequency and document length while not introducing too many additional parameters into the model.
BM25() - Constructor for class smile.nlp.relevance.BM25
Default constructor with k1 = 1.2, b = 0.75, delta = 1.0.
BM25(double, double, double) - Constructor for class smile.nlp.relevance.BM25
Constructor.
body - Variable in class smile.nlp.Text
The text body.
BreakIteratorSentenceSplitter - Class in smile.nlp.tokenizer
A sentence splitter based on the java.text.BreakIterator, which supports multiple natural languages (selected by locale setting).
BreakIteratorSentenceSplitter() - Constructor for class smile.nlp.tokenizer.BreakIteratorSentenceSplitter
Constructor for the default locale.
BreakIteratorSentenceSplitter(Locale) - Constructor for class smile.nlp.tokenizer.BreakIteratorSentenceSplitter
Constructor for the given locale.
BreakIteratorTokenizer - Class in smile.nlp.tokenizer
A word tokenizer based on the java.text.BreakIterator, which supports multiple natural languages (selected by locale setting).
BreakIteratorTokenizer() - Constructor for class smile.nlp.tokenizer.BreakIteratorTokenizer
Constructor for the default locale.
BreakIteratorTokenizer(Locale) - Constructor for class smile.nlp.tokenizer.BreakIteratorTokenizer
Constructor for the given locale.
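
The two BreakIterator-based entries above delegate to the JDK's java.text.BreakIterator. As a rough illustration of that mechanism only (not of Smile's own implementation), the following sketch splits text into sentences with the JDK class directly. The class name SentenceSplitSketch and its split method are hypothetical names chosen for this example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of smile.nlp: shows locale-aware sentence
// segmentation using only the JDK's java.text.BreakIterator.
final class SentenceSplitSketch {
    /** Splits text into trimmed, non-empty sentences for the given locale. */
    static List<String> split(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) span is one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        for (String s : split("Hello world. How are you?", Locale.US)) {
            System.out.println(s);
        }
    }
}
```

Word tokenization follows the same loop with BreakIterator.getWordInstance(locale) in place of the sentence instance, although the word iterator also reports whitespace and punctuation spans that a tokenizer would normally filter out.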

C

compareTo(Bigram) - Method in class smile.nlp.collocation.Bigram
 
compareTo(NGram) - Method in class smile.nlp.collocation.NGram
 
compareTo(Relevance) - Method in class smile.nlp.relevance.Relevance
 
contains(String) - Method in interface smile.nlp.dictionary.Dictionary
Returns true if this dictionary contains the specified word.
contains(String) - Method in enum smile.nlp.dictionary.EnglishDictionary
 
contains(String) - Method in class smile.nlp.dictionary.EnglishPunctuations
 
contains(String) - Method in enum smile.nlp.dictionary.EnglishStopWords
 
contains(String) - Method in class smile.nlp.dictionary.SimpleDictionary
 
CooccurrenceKeywords - Interface in smile.nlp.keyword
Keyword extraction from a single document using word co-occurrence statistical information.
Corpus - Interface in smile.nlp
A corpus is a collection of documents.
count - Variable in class smile.nlp.collocation.Bigram
The frequency of bigram in the corpus.
count - Variable in class smile.nlp.collocation.NGram
The frequency of n-gram in the corpus.

D

Dictionary - Interface in smile.nlp.dictionary
A dictionary is a set of words in some natural language.
dimension() - Method in class smile.nlp.embedding.Word2Vec
Returns the dimension of vector space.

E

EnglishDictionary - Enum in smile.nlp.dictionary
A concise dictionary of common terms in English.
EnglishPOSLexicon - Class in smile.nlp.pos
An English lexicon with part-of-speech tags.
EnglishPunctuations - Class in smile.nlp.dictionary
Punctuation marks in English.
EnglishStopWords - Enum in smile.nlp.dictionary
Several sets of English stop words.
equals(Object) - Method in class smile.nlp.Bigram
 
equals(Object) - Method in class smile.nlp.NGram
 
equals(Object) - Method in class smile.nlp.SimpleText
 

F

fit(String[][], PennTreebankPOS[][]) - Static method in class smile.nlp.pos.HMMPOSTagger
Fits an HMM POS tagger by maximum likelihood estimation.

G

get(String) - Method in class smile.nlp.embedding.Word2Vec
Returns the vector embedding of a word.
get(String) - Static method in class smile.nlp.pos.EnglishPOSLexicon
Returns part-of-speech tags for given word, or null if the word does not exist in the dictionary.
get(K[]) - Method in class smile.nlp.Trie
Returns the associated value of a given key.
get(K) - Method in class smile.nlp.Trie
Returns the node of a given key.
getAbbreviation(String) - Method in interface smile.nlp.dictionary.Abbreviations
Returns the abbreviation for a word.
getAnchor() - Method in interface smile.nlp.AnchorText
Returns the anchor text if any.
getAnchor() - Method in class smile.nlp.SimpleText
Returns the anchor text if any.
getAverageDocumentSize() - Method in interface smile.nlp.Corpus
Returns the average size of documents in the corpus.
getAverageDocumentSize() - Method in class smile.nlp.SimpleCorpus
 
getBigramFrequency(Bigram) - Method in interface smile.nlp.Corpus
Returns the total frequency of the bigram in the corpus.
getBigramFrequency(Bigram) - Method in class smile.nlp.SimpleCorpus
 
getBigrams() - Method in interface smile.nlp.Corpus
Returns an iterator over the bigrams in the corpus.
getBigrams() - Method in class smile.nlp.SimpleCorpus
 
getChild(K[], int) - Method in class smile.nlp.Trie.Node
 
getChild(K) - Method in class smile.nlp.Trie.Node
 
getDefault() - Static method in class smile.nlp.pos.HMMPOSTagger
Returns the default English POS tagger.
getFull(String) - Method in interface smile.nlp.dictionary.Abbreviations
Returns the full word for a given abbreviation.
getInstance() - Static method in class smile.nlp.dictionary.EnglishPunctuations
Returns the singleton instance.
getInstance() - Static method in class smile.nlp.normalizer.SimpleNormalizer
Returns the singleton instance.
getInstance() - Static method in class smile.nlp.tokenizer.PennTreebankTokenizer
Returns the singleton instance.
getInstance() - Static method in class smile.nlp.tokenizer.SimpleParagraphSplitter
Returns the singleton instance.
getInstance() - Static method in class smile.nlp.tokenizer.SimpleSentenceSplitter
Returns the singleton instance.
getKey() - Method in class smile.nlp.Trie.Node
 
getNumBigrams() - Method in interface smile.nlp.Corpus
Returns the number of bigrams in the corpus.
getNumBigrams() - Method in class smile.nlp.SimpleCorpus
 
getNumDocuments() - Method in interface smile.nlp.Corpus
Returns the number of documents in the corpus.
getNumDocuments() - Method in class smile.nlp.SimpleCorpus
 
getNumTerms() - Method in interface smile.nlp.Corpus
Returns the number of unique terms in the corpus.
getNumTerms() - Method in class smile.nlp.SimpleCorpus
 
getTermFrequency(String) - Method in interface smile.nlp.Corpus
Returns the total frequency of the term in the corpus.
getTermFrequency(String) - Method in class smile.nlp.SimpleCorpus
 
getTerms() - Method in interface smile.nlp.Corpus
Returns an iterator over the terms in the corpus.
getTerms() - Method in class smile.nlp.SimpleCorpus
 
getValue(String) - Static method in enum smile.nlp.pos.PennTreebankPOS
Returns an enum value from a string.
getValue() - Method in class smile.nlp.Trie.Node
 
GloVe - Class in smile.nlp.embedding
Global Vectors for Word Representation.
GloVe() - Constructor for class smile.nlp.embedding.GloVe
 

H

hashCode() - Method in class smile.nlp.Bigram
 
hashCode() - Method in class smile.nlp.NGram
 
hashCode() - Method in class smile.nlp.SimpleText
 
HMMPOSTagger - Class in smile.nlp.pos
Part-of-speech tagging with hidden Markov model.
HMMPOSTagger() - Constructor for class smile.nlp.pos.HMMPOSTagger
Constructor.

I

id - Variable in class smile.nlp.Text
The id of document in the corpus.
iterator() - Method in interface smile.nlp.dictionary.Dictionary
Returns an iterator over the elements in this dictionary.
iterator() - Method in enum smile.nlp.dictionary.EnglishDictionary
 
iterator() - Method in class smile.nlp.dictionary.EnglishPunctuations
 
iterator() - Method in enum smile.nlp.dictionary.EnglishStopWords
 
iterator() - Method in class smile.nlp.dictionary.SimpleDictionary
 

L

LancasterStemmer - Class in smile.nlp.stemmer
The Paice/Husk Lancaster stemming algorithm.
LancasterStemmer() - Constructor for class smile.nlp.stemmer.LancasterStemmer
Constructor with default rules.
LancasterStemmer(boolean) - Constructor for class smile.nlp.stemmer.LancasterStemmer
Constructor with default rules.
LancasterStemmer(InputStream) - Constructor for class smile.nlp.stemmer.LancasterStemmer
Constructor with customized rules.
LancasterStemmer(InputStream, boolean) - Constructor for class smile.nlp.stemmer.LancasterStemmer
Constructor with customized rules.

M

main(String[]) - Static method in class smile.nlp.pos.HMMPOSTagger
Train the default model on WSJ and BROWN datasets.
maxtf() - Method in class smile.nlp.SimpleText
 
maxtf() - Method in interface smile.nlp.TextTerms
Returns the maximum term frequency over all terms in the document.

N

NGram - Class in smile.nlp.collocation
An n-gram is a contiguous sequence of n words from a given sequence of text.
NGram(String[], int) - Constructor for class smile.nlp.collocation.NGram
Constructor.
NGram - Class in smile.nlp
An n-gram is a contiguous sequence of n words from a given sequence of text.
NGram(String[]) - Constructor for class smile.nlp.NGram
Constructor.
Node(K) - Constructor for class smile.nlp.Trie.Node
 
normalize(String) - Method in interface smile.nlp.normalizer.Normalizer
Normalize the given string.
normalize(String) - Method in class smile.nlp.normalizer.SimpleNormalizer
 
Normalizer - Interface in smile.nlp.normalizer
Normalization transforms text into a canonical form by removing unwanted variations.
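
The NGram entries above describe an n-gram as a contiguous sequence of n words. A minimal sketch of that definition, using only the JDK, is shown here. NGramSketch and ngrams are hypothetical names for this example and do not reproduce the constructors or the of(...) phrase-extraction method listed in this index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of smile.nlp: enumerates every contiguous
// run of n words in a tokenized sentence.
final class NGramSketch {
    static List<String[]> ngrams(String[] words, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        List<String[]> result = new ArrayList<>();
        // A window starting at i is valid while it ends inside the array.
        for (int i = 0; i + n <= words.length; i++) {
            result.add(Arrays.copyOfRange(words, i, i + n));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"the", "quick", "brown", "fox"};
        for (String[] gram : ngrams(words, 2)) {
            System.out.println(String.join(" ", gram));
        }
    }
}
```

With n = 2 the windows are exactly the bigrams of the sentence. A sentence shorter than n yields an empty list.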

O

of(Corpus, int, int) - Static method in class smile.nlp.collocation.Bigram
Finds top k bigram collocations in the given corpus.
of(Corpus, double, int) - Static method in class smile.nlp.collocation.Bigram
Finds bigram collocations in the given corpus whose p-value is less than the given threshold.
of(Collection<String[]>, int, int) - Static method in class smile.nlp.collocation.NGram
Extracts n-gram phrases by an Apriori-like algorithm.
of(Path) - Static method in class smile.nlp.embedding.GloVe
Loads a GloVe model.
of(Path) - Static method in class smile.nlp.embedding.Word2Vec
Loads a pre-trained word2vec model from binary file of ByteOrder.LITTLE_ENDIAN.
of(Path, ByteOrder) - Static method in class smile.nlp.embedding.Word2Vec
Loads a pre-trained word2vec model from binary file.
of(String) - Static method in interface smile.nlp.keyword.CooccurrenceKeywords
Returns the top 10 keywords.
of(String, int) - Static method in interface smile.nlp.keyword.CooccurrenceKeywords
Returns a given number of top keywords.
open - Variable in enum smile.nlp.pos.PennTreebankPOS
True if the POS is an open class.

P

ParagraphSplitter - Interface in smile.nlp.tokenizer
A paragraph splitter segments text into paragraphs.
PennTreebankPOS - Enum in smile.nlp.pos
The Penn Treebank Tag set.
PennTreebankTokenizer - Class in smile.nlp.tokenizer
A word tokenizer that tokenizes English sentences using the conventions used by the Penn Treebank.
PorterStemmer - Class in smile.nlp.stemmer
Porter's stemming algorithm.
PorterStemmer() - Constructor for class smile.nlp.stemmer.PorterStemmer
Constructor.
POSTagger - Interface in smile.nlp.pos
Part-of-speech tagging (POS tagging) is the process of marking up the words in a sentence as corresponding to a particular part of speech.
Punctuations - Interface in smile.nlp.dictionary
Punctuation marks are symbols that indicate the structure and organization of written language, as well as intonation and pauses to be observed when reading aloud.
put(K[], V) - Method in class smile.nlp.Trie
Add a key with associated value to the trie.

R

rank(Corpus, TextTerms, String, int, int) - Method in class smile.nlp.relevance.BM25
 
rank(Corpus, TextTerms, String[], int[], int) - Method in class smile.nlp.relevance.BM25
 
rank(Corpus, TextTerms, String, int, int) - Method in interface smile.nlp.relevance.RelevanceRanker
Returns a relevance score between a term and a document based on a corpus.
rank(Corpus, TextTerms, String[], int[], int) - Method in interface smile.nlp.relevance.RelevanceRanker
Returns a relevance score between a set of terms and a document based on a corpus.
rank(int, int, long, long) - Method in class smile.nlp.relevance.TFIDF
Returns a relevance score between a term and a document based on a corpus.
rank(Corpus, TextTerms, String, int, int) - Method in class smile.nlp.relevance.TFIDF
 
rank(Corpus, TextTerms, String[], int[], int) - Method in class smile.nlp.relevance.TFIDF
 
read(String, List<String[]>, List<PennTreebankPOS[]>) - Static method in class smile.nlp.pos.HMMPOSTagger
Load training data from a corpus.
Relevance - Class in smile.nlp.relevance
In the context of information retrieval, relevance denotes how well a retrieved set of documents meets the information need of the user.
Relevance(Text, double) - Constructor for class smile.nlp.relevance.Relevance
Constructor.
RelevanceRanker - Interface in smile.nlp.relevance
An interface for relevance ranking algorithms.
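
The BM25 entry in this index names three parameters with defaults k1 = 1.2, b = 0.75, delta = 1.0. The sketch below shows how such parameters are commonly combined in the textbook BM25+ formula: a saturating term-frequency component, normalized by document length and shifted by delta, multiplied by an inverse document frequency. It is an assumption that this matches the listed class; Bm25Sketch, its score method, and the argument names are hypothetical and may differ from Smile's actual formula.

```java
// Hypothetical illustration of a textbook BM25+ score, not Smile's code.
final class Bm25Sketch {
    final double k1;    // controls term-frequency saturation
    final double b;     // controls document-length normalization (0 = none)
    final double delta; // lower bound added to the term-frequency component

    Bm25Sketch(double k1, double b, double delta) {
        this.k1 = k1;
        this.b = b;
        this.delta = delta;
    }

    /** Uses the defaults stated in the index: k1 = 1.2, b = 0.75, delta = 1.0. */
    Bm25Sketch() {
        this(1.2, 0.75, 1.0);
    }

    /**
     * @param tf         frequency of the term in the document
     * @param docSize    number of words in the document
     * @param avgDocSize average number of words per document in the corpus
     * @param N          number of documents in the corpus
     * @param n          number of documents containing the term
     */
    double score(double tf, int docSize, double avgDocSize, long N, long n) {
        if (tf <= 0) {
            return 0.0; // a term absent from the document contributes nothing
        }
        double lengthNorm = 1.0 - b + b * docSize / avgDocSize;
        double saturated = tf * (k1 + 1.0) / (tf + k1 * lengthNorm);
        // This idf variant turns negative when the term occurs in more than half the documents.
        double idf = Math.log((N - n + 0.5) / (n + 0.5));
        return (saturated + delta) * idf;
    }

    public static void main(String[] args) {
        Bm25Sketch bm25 = new Bm25Sketch();
        System.out.println(bm25.score(3, 100, 100.0, 1000, 10));
    }
}
```

For a fixed term, the score grows with tf but flattens out as tf increases, and documents longer than average are penalized in proportion to b.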

S

score - Variable in class smile.nlp.collocation.Bigram
The chi-square statistical score of the collocation.
score(int, int, double, int, int, double, int, int, double, long, long) - Method in class smile.nlp.relevance.BM25
Returns a relevance score between a term and a document based on a corpus.
score(double, long, long) - Method in class smile.nlp.relevance.BM25
Returns a relevance score between a term and a document based on a corpus.
score(double, int, double, long, long) - Method in class smile.nlp.relevance.BM25
Returns a relevance score between a term and a document based on a corpus.
score - Variable in class smile.nlp.relevance.Relevance
The relevance score.
search(String) - Method in interface smile.nlp.Corpus
Returns an iterator over the set of documents containing the given term.
search(RelevanceRanker, String) - Method in interface smile.nlp.Corpus
Returns an iterator over the set of documents containing the given term in descending order of relevance.
search(RelevanceRanker, String[]) - Method in interface smile.nlp.Corpus
Returns an iterator over the set of documents containing (at least one of) the given terms in descending order of relevance.
search(String) - Method in class smile.nlp.SimpleCorpus
 
search(RelevanceRanker, String) - Method in class smile.nlp.SimpleCorpus
 
search(RelevanceRanker, String[]) - Method in class smile.nlp.SimpleCorpus
 
SentenceSplitter - Interface in smile.nlp.tokenizer
A sentence splitter segments text into sentences (a string of words satisfying the grammatical rules of a language).
setAnchor(String) - Method in interface smile.nlp.AnchorText
Sets the anchor text.
setAnchor(String) - Method in class smile.nlp.SimpleText
Sets the anchor text.
SimpleCorpus - Class in smile.nlp
An in-memory text corpus.
SimpleCorpus() - Constructor for class smile.nlp.SimpleCorpus
Constructor.
SimpleCorpus(SentenceSplitter, Tokenizer, StopWords, Punctuations) - Constructor for class smile.nlp.SimpleCorpus
Constructor.
SimpleDictionary - Class in smile.nlp.dictionary
A simple implementation of dictionary interface.
SimpleDictionary(String) - Constructor for class smile.nlp.dictionary.SimpleDictionary
Constructor.
SimpleNormalizer - Class in smile.nlp.normalizer
A baseline normalizer for processing Unicode text.
SimpleParagraphSplitter - Class in smile.nlp.tokenizer
This is a simple paragraph splitter.
SimpleSentenceSplitter - Class in smile.nlp.tokenizer
This is a simple sentence splitter for English.
SimpleText - Class in smile.nlp
A list-of-words representation of documents.
SimpleText(String, String, String, String[]) - Constructor for class smile.nlp.SimpleText
Constructor.
SimpleTokenizer - Class in smile.nlp.tokenizer
A word tokenizer that tokenizes English sentences with some differences from TreebankWordTokenizer, notably on handling not-contractions.
SimpleTokenizer() - Constructor for class smile.nlp.tokenizer.SimpleTokenizer
Constructor.
SimpleTokenizer(boolean) - Constructor for class smile.nlp.tokenizer.SimpleTokenizer
Constructor.
size() - Method in interface smile.nlp.Corpus
Returns the number of words in the corpus.
size() - Method in interface smile.nlp.dictionary.Dictionary
Returns the number of elements in this dictionary.
size() - Method in enum smile.nlp.dictionary.EnglishDictionary
 
size() - Method in class smile.nlp.dictionary.EnglishPunctuations
 
size() - Method in enum smile.nlp.dictionary.EnglishStopWords
 
size() - Method in class smile.nlp.dictionary.SimpleDictionary
 
size() - Method in class smile.nlp.SimpleCorpus
 
size() - Method in class smile.nlp.SimpleText
 
size() - Method in interface smile.nlp.TextTerms
Returns the number of words.
size() - Method in class smile.nlp.Trie
Returns the number of entries.
smile.nlp - package smile.nlp
Natural language processing.
smile.nlp.collocation - package smile.nlp.collocation
Collocation finding algorithms.
smile.nlp.dictionary - package smile.nlp.dictionary
Common dictionaries such as stop words, punctuation, common English words, etc.
smile.nlp.embedding - package smile.nlp.embedding
Word embedding.
smile.nlp.keyword - package smile.nlp.keyword
Keyword extraction.
smile.nlp.normalizer - package smile.nlp.normalizer
Text normalization.
smile.nlp.pos - package smile.nlp.pos
Part-of-speech taggers.
smile.nlp.relevance - package smile.nlp.relevance
Term-document relevance ranking algorithms.
smile.nlp.stemmer - package smile.nlp.stemmer
English word stemmer algorithms.
smile.nlp.tokenizer - package smile.nlp.tokenizer
Sentence splitter and word tokenizer.
split(String) - Method in class smile.nlp.tokenizer.BreakIteratorSentenceSplitter
 
split(String) - Method in class smile.nlp.tokenizer.BreakIteratorTokenizer
 
split(String) - Method in interface smile.nlp.tokenizer.ParagraphSplitter
Splits the text into paragraphs.
split(String) - Method in class smile.nlp.tokenizer.PennTreebankTokenizer
 
split(String) - Method in interface smile.nlp.tokenizer.SentenceSplitter
Splits the text into sentences.
split(String) - Method in class smile.nlp.tokenizer.SimpleParagraphSplitter
 
split(String) - Method in class smile.nlp.tokenizer.SimpleSentenceSplitter
 
split(String) - Method in class smile.nlp.tokenizer.SimpleTokenizer
 
split(String) - Method in interface smile.nlp.tokenizer.Tokenizer
Splits the string into a list of tokens.
stem(String) - Method in class smile.nlp.stemmer.LancasterStemmer
 
stem(String) - Method in class smile.nlp.stemmer.PorterStemmer
 
stem(String) - Method in interface smile.nlp.stemmer.Stemmer
Transforms a word into its root form.
Stemmer - Interface in smile.nlp.stemmer
A Stemmer transforms a word into its root form.
StopWords - Interface in smile.nlp.dictionary
A set of stop words in some language.
stripPluralParticiple(String) - Method in class smile.nlp.stemmer.PorterStemmer
Remove plurals and participles.

T

tag(String[]) - Method in class smile.nlp.pos.HMMPOSTagger
 
tag(String[]) - Method in interface smile.nlp.pos.POSTagger
Tags the sentence in the form of a sequence of words.
text - Variable in class smile.nlp.relevance.Relevance
The document to rank.
Text - Class in smile.nlp
A minimal interface of text in the corpus.
Text(String) - Constructor for class smile.nlp.Text
Constructor.
Text(String, String) - Constructor for class smile.nlp.Text
Constructor.
Text(String, String, String) - Constructor for class smile.nlp.Text
Constructor.
TextTerms - Interface in smile.nlp
The terms in a text.
tf(String) - Method in class smile.nlp.SimpleText
 
tf(String) - Method in interface smile.nlp.TextTerms
Returns the term frequency.
TFIDF - Class in smile.nlp.relevance
The tf-idf weight (term frequency-inverse document frequency) is a weight often used in information retrieval and text mining.
TFIDF() - Constructor for class smile.nlp.relevance.TFIDF
Constructor.
TFIDF(double) - Constructor for class smile.nlp.relevance.TFIDF
Constructor.
title - Variable in class smile.nlp.Text
The title of document.
Tokenizer - Interface in smile.nlp.tokenizer
A token is a string of characters, categorized according to the rules as a symbol.
toString() - Method in class smile.nlp.Bigram
 
toString() - Method in class smile.nlp.collocation.Bigram
 
toString() - Method in class smile.nlp.collocation.NGram
 
toString() - Method in class smile.nlp.NGram
 
toString() - Method in class smile.nlp.SimpleText
 
Trie<K,V> - Class in smile.nlp
A trie, also called digital tree or prefix tree, is an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings.
Trie() - Constructor for class smile.nlp.Trie
Constructor.
Trie(int) - Constructor for class smile.nlp.Trie
Constructor.
Trie.Node - Class in smile.nlp
The nodes in the trie.
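
The Trie entries above describe a prefix tree whose keys are sequences (the index lists put(K[], V) and get(K[])). The following is a minimal sketch of that data structure, assuming hash-map children and non-null values. TrieSketch is a hypothetical class written for this example and is not Smile's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal prefix tree keyed by arrays of K, not Smile's Trie.
final class TrieSketch<K, V> {
    private static final class Node<K, V> {
        final Map<K, Node<K, V>> children = new HashMap<>();
        V value; // null means no key ends at this node
    }

    private final Node<K, V> root = new Node<>();
    private int size;

    /** Associates a non-null value with the key, creating nodes along the path. */
    void put(K[] key, V value) {
        Node<K, V> node = root;
        for (K k : key) {
            node = node.children.computeIfAbsent(k, x -> new Node<>());
        }
        if (node.value == null) {
            size++; // first value stored at this node
        }
        node.value = value;
    }

    /** Returns the value stored for the exact key, or null if there is none. */
    V get(K[] key) {
        Node<K, V> node = root;
        for (K k : key) {
            node = node.children.get(k);
            if (node == null) {
                return null; // the path for this key does not exist
            }
        }
        return node.value;
    }

    /** Returns the number of keys that have a value. */
    int size() {
        return size;
    }

    public static void main(String[] args) {
        TrieSketch<String, Integer> trie = new TrieSketch<>();
        trie.put(new String[]{"new", "york"}, 1);
        trie.put(new String[]{"new", "york", "city"}, 2);
        System.out.println(trie.get(new String[]{"new", "york", "city"}));
    }
}
```

Keys that share a prefix, such as the two phrases in main, share the nodes for that prefix, which is what makes a trie convenient for storing multi-word phrases.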

U

unique() - Method in class smile.nlp.SimpleText
 
unique() - Method in interface smile.nlp.TextTerms
Returns the iterator of unique words.

V

valueOf(String) - Static method in enum smile.nlp.dictionary.EnglishDictionary
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum smile.nlp.dictionary.EnglishStopWords
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum smile.nlp.pos.PennTreebankPOS
Returns the enum constant of this type with the specified name.
values() - Static method in enum smile.nlp.dictionary.EnglishDictionary
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum smile.nlp.dictionary.EnglishStopWords
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum smile.nlp.pos.PennTreebankPOS
Returns an array containing the constants of this enum type, in the order they are declared.
vectors - Variable in class smile.nlp.embedding.Word2Vec
The vector space.

W

w1 - Variable in class smile.nlp.Bigram
Immutable first word of bigram.
w2 - Variable in class smile.nlp.Bigram
Immutable second word of bigram.
walkin(File, List<File>) - Static method in class smile.nlp.pos.HMMPOSTagger
Recursive function to descend into the directory tree and find all the files that end with ".POS".
Word2Vec - Class in smile.nlp.embedding
Word2vec is a group of related models that are used to produce word embeddings.
Word2Vec(String[], float[][]) - Constructor for class smile.nlp.embedding.Word2Vec
Constructor.
words - Variable in class smile.nlp.embedding.Word2Vec
The vocabulary.
words - Variable in class smile.nlp.NGram
Immutable word sequences.
words() - Method in class smile.nlp.SimpleText
 
words() - Method in interface smile.nlp.TextTerms
Returns the iterator of the words of the document.