Index

A B C D E F G H I L M N O P R S T U V W

A

Abbreviations - Interface in smile.nlp.dictionary: A dictionary interface for abbreviations.
add(Text) - Method in class smile.nlp.SimpleCorpus: Add a document to the corpus.
addAnchor(String) - Method in interface smile.nlp.AnchorText: Add a link label to the anchor text.
addAnchor(String) - Method in class smile.nlp.SimpleText
addChild(K[], V, int) - Method in class smile.nlp.Trie.Node
AnchorText - Interface in smile.nlp: The anchor text is the visible, clickable text in a hyperlink.
apply(String) - Method in class smile.nlp.embedding.Word2Vec: Returns the vector embedding of a word.

B

Bigram - Class in smile.nlp: Bigrams or digrams are groups of two words, and are very commonly used as the basis for simple statistical analysis of text.
Bigram(String, String) - Constructor for class smile.nlp.Bigram: Constructor.
Bigram - Class in smile.nlp.collocation: Collocations are expressions of multiple words which commonly co-occur.
Bigram(String, String, int, double) - Constructor for class smile.nlp.collocation.Bigram: Constructor.
BM25 - Class in smile.nlp.relevance: The BM25 weighting scheme, often called Okapi weighting, after the system in which it was first implemented, was developed as a way of building a probabilistic model sensitive to term frequency and document length while not introducing too many additional parameters into the model.
BM25() - Constructor for class smile.nlp.relevance.BM25: Default constructor with k1 = 1.2, b = 0.75, delta = 1.0.
BM25(double, double, double) - Constructor for class smile.nlp.relevance.BM25: Constructor.
body - Variable in class smile.nlp.Text: The text body.
BreakIteratorSentenceSplitter - Class in smile.nlp.tokenizer: A sentence splitter based on the java.text.BreakIterator, which supports multiple natural languages (selected by locale setting).
BreakIteratorSentenceSplitter() - Constructor for class smile.nlp.tokenizer.BreakIteratorSentenceSplitter: Constructor for the default locale.
BreakIteratorSentenceSplitter(Locale) - Constructor for class smile.nlp.tokenizer.BreakIteratorSentenceSplitter: Constructor for the given locale.
BreakIteratorTokenizer - Class in smile.nlp.tokenizer: A word tokenizer based on the java.text.BreakIterator, which supports multiple natural languages (selected by locale setting).
BreakIteratorTokenizer() - Constructor for class smile.nlp.tokenizer.BreakIteratorTokenizer: Constructor for the default locale.
BreakIteratorTokenizer(Locale) - Constructor for class smile.nlp.tokenizer.BreakIteratorTokenizer: Constructor for the given locale.
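
The BreakIteratorSentenceSplitter and BreakIteratorTokenizer entries above both describe wrappers around java.text.BreakIterator. As a rough illustration of that underlying JDK mechanism, the sketch below splits text into sentences using only the standard library. The class name BreakIteratorSentenceSketch and its split method are hypothetical and are not part of smile.nlp; how the library's own classes handle whitespace or edge cases is not shown in this index.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not a smile.nlp class): sentence splitting with the JDK only.
class BreakIteratorSentenceSketch {
    static List<String> split(String text, Locale locale) {
        // The locale selects the language-specific boundary rules.
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) span is one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }
}
```

Calling BreakIterator.getWordInstance(locale) instead of getSentenceInstance(locale) would give word-level boundaries, which is the same idea the tokenizer entry describes.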

C

compareTo(Bigram) - Method in class smile.nlp.collocation.Bigram
compareTo(NGram) - Method in class smile.nlp.collocation.NGram
compareTo(Relevance) - Method in class smile.nlp.relevance.Relevance
contains(String) - Method in interface smile.nlp.dictionary.Dictionary: Returns true if this dictionary contains the specified word.
contains(String) - Method in enum smile.nlp.dictionary.EnglishDictionary
contains(String) - Method in class smile.nlp.dictionary.EnglishPunctuations
contains(String) - Method in enum smile.nlp.dictionary.EnglishStopWords
contains(String) - Method in class smile.nlp.dictionary.SimpleDictionary
CooccurrenceKeywords - Interface in smile.nlp.keyword: Keyword extraction from a single document using word co-occurrence statistical information.
Corpus - Interface in smile.nlp: A corpus is a collection of documents.
count - Variable in class smile.nlp.collocation.Bigram: The frequency of bigram in the corpus.
count - Variable in class smile.nlp.collocation.NGram: The frequency of n-gram in the corpus.

D

Dictionary - Interface in smile.nlp.dictionary: A dictionary is a set of words in some natural language.
dimension() - Method in class smile.nlp.embedding.Word2Vec: Returns the dimension of vector space.

E

EnglishDictionary - Enum in smile.nlp.dictionary: A concise dictionary of common terms in English.
EnglishPOSLexicon - Class in smile.nlp.pos: An English lexicon with part-of-speech tags.
EnglishPunctuations - Class in smile.nlp.dictionary: Punctuation marks in English.
EnglishStopWords - Enum in smile.nlp.dictionary: Several sets of English stop words.
equals(Object) - Method in class smile.nlp.Bigram
equals(Object) - Method in class smile.nlp.NGram
equals(Object) - Method in class smile.nlp.SimpleText

F

fit(String[][], PennTreebankPOS[][]) - Static method in class smile.nlp.pos.HMMPOSTagger: Fits an HMM POS tagger by maximum likelihood estimation.
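
The fit entry above mentions maximum likelihood estimation for an HMM tagger. For a fully labeled corpus, that generally reduces to counting and normalizing. The sketch below estimates only the tag-to-tag transition probabilities and uses plain strings for tags so that it stays self-contained; the class HmmMleSketch is hypothetical, and the library's actual fitting code (including any smoothing) is not shown in this index.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a smile.nlp class): MLE of HMM transition probabilities.
class HmmMleSketch {
    // Returns P(next | prev) = count(prev followed by next) / count(prev followed by anything).
    static Map<String, Map<String, Double>> transitions(String[][] tags) {
        Map<String, Map<String, Double>> table = new HashMap<>();
        // Count adjacent tag pairs within each sentence.
        for (String[] sentence : tags) {
            for (int i = 1; i < sentence.length; i++) {
                table.computeIfAbsent(sentence[i - 1], k -> new HashMap<String, Double>())
                     .merge(sentence[i], 1.0, Double::sum);
            }
        }
        // Normalize each row of counts so it sums to one.
        for (Map<String, Double> row : table.values()) {
            double total = 0.0;
            for (double c : row.values()) {
                total += c;
            }
            final double sum = total;
            row.replaceAll((tag, c) -> c / sum);
        }
        return table;
    }
}
```

Emission probabilities P(word | tag) can be estimated the same way by counting (tag, word) pairs instead of (tag, tag) pairs.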

G

get(String) - Method in class smile.nlp.embedding.Word2Vec: Returns the vector embedding of a word.
get(String) - Static method in class smile.nlp.pos.EnglishPOSLexicon: Returns part-of-speech tags for given word, or null if the word does not exist in the dictionary.
get(K[]) - Method in class smile.nlp.Trie: Returns the associated value of a given key.
get(K) - Method in class smile.nlp.Trie: Returns the node of a given key.
getAbbreviation(String) - Method in interface smile.nlp.dictionary.Abbreviations: Returns the abbreviation for a word.
getAnchor() - Method in interface smile.nlp.AnchorText: Returns the anchor text if any.
getAnchor() - Method in class smile.nlp.SimpleText: Returns the anchor text if any.
getAverageDocumentSize() - Method in interface smile.nlp.Corpus: Returns the average size of documents in the corpus.
getAverageDocumentSize() - Method in class smile.nlp.SimpleCorpus
getBigramFrequency(Bigram) - Method in interface smile.nlp.Corpus: Returns the total frequency of the bigram in the corpus.
getBigramFrequency(Bigram) - Method in class smile.nlp.SimpleCorpus
getBigrams() - Method in interface smile.nlp.Corpus: Returns an iterator over the bigrams in the corpus.
getBigrams() - Method in class smile.nlp.SimpleCorpus
getChild(K[], int) - Method in class smile.nlp.Trie.Node
getChild(K) - Method in class smile.nlp.Trie.Node
getDefault() - Static method in class smile.nlp.pos.HMMPOSTagger: Returns the default English POS tagger.
getFull(String) - Method in interface smile.nlp.dictionary.Abbreviations: Returns the full word for a given abbreviation.
getInstance() - Static method in class smile.nlp.dictionary.EnglishPunctuations: Returns the singleton instance.
getInstance() - Static method in class smile.nlp.normalizer.SimpleNormalizer: Returns the singleton instance.
getInstance() - Static method in class smile.nlp.tokenizer.PennTreebankTokenizer: Returns the singleton instance.
getInstance() - Static method in class smile.nlp.tokenizer.SimpleParagraphSplitter: Returns the singleton instance.
getInstance() - Static method in class smile.nlp.tokenizer.SimpleSentenceSplitter: Returns the singleton instance.
getKey() - Method in class smile.nlp.Trie.Node
getNumBigrams() - Method in interface smile.nlp.Corpus: Returns the number of bigrams in the corpus.
getNumBigrams() - Method in class smile.nlp.SimpleCorpus
getNumDocuments() - Method in interface smile.nlp.Corpus: Returns the number of documents in the corpus.
getNumDocuments() - Method in class smile.nlp.SimpleCorpus
getNumTerms() - Method in interface smile.nlp.Corpus: Returns the number of unique terms in the corpus.
getNumTerms() - Method in class smile.nlp.SimpleCorpus
getTermFrequency(String) - Method in interface smile.nlp.Corpus: Returns the total frequency of the term in the corpus.
getTermFrequency(String) - Method in class smile.nlp.SimpleCorpus
getTerms() - Method in interface smile.nlp.Corpus: Returns an iterator over the terms in the corpus.
getTerms() - Method in class smile.nlp.SimpleCorpus
getValue(String) - Static method in enum smile.nlp.pos.PennTreebankPOS: Returns an enum value from a string.
getValue() - Method in class smile.nlp.Trie.Node
GloVe - Class in smile.nlp.embedding: Global Vectors for Word Representation.
GloVe() - Constructor for class smile.nlp.embedding.GloVe

H

hashCode() - Method in class smile.nlp.Bigram
hashCode() - Method in class smile.nlp.NGram
hashCode() - Method in class smile.nlp.SimpleText
HMMPOSTagger - Class in smile.nlp.pos: Part-of-speech tagging with hidden Markov model.
HMMPOSTagger() - Constructor for class smile.nlp.pos.HMMPOSTagger: Constructor.

I

id - Variable in class smile.nlp.Text: The id of document in the corpus.
iterator() - Method in interface smile.nlp.dictionary.Dictionary: Returns an iterator over the elements in this dictionary.
iterator() - Method in enum smile.nlp.dictionary.EnglishDictionary
iterator() - Method in class smile.nlp.dictionary.EnglishPunctuations
iterator() - Method in enum smile.nlp.dictionary.EnglishStopWords
iterator() - Method in class smile.nlp.dictionary.SimpleDictionary

L

LancasterStemmer - Class in smile.nlp.stemmer: The Paice/Husk Lancaster stemming algorithm.
LancasterStemmer() - Constructor for class smile.nlp.stemmer.LancasterStemmer: Constructor with default rules.
LancasterStemmer(boolean) - Constructor for class smile.nlp.stemmer.LancasterStemmer: Constructor with default rules.
LancasterStemmer(InputStream) - Constructor for class smile.nlp.stemmer.LancasterStemmer: Constructor with customized rules.
LancasterStemmer(InputStream, boolean) - Constructor for class smile.nlp.stemmer.LancasterStemmer: Constructor with customized rules.

M

main(String[]) - Static method in class smile.nlp.pos.HMMPOSTagger: Train the default model on WSJ and BROWN datasets.
maxtf() - Method in class smile.nlp.SimpleText
maxtf() - Method in interface smile.nlp.TextTerms: Returns the maximum term frequency over all terms in the document.

N

NGram - Class in smile.nlp.collocation: An n-gram is a contiguous sequence of n words from a given sequence of text.
NGram(String[], int) - Constructor for class smile.nlp.collocation.NGram: Constructor.
NGram - Class in smile.nlp: An n-gram is a contiguous sequence of n words from a given sequence of text.
NGram(String[]) - Constructor for class smile.nlp.NGram: Constructor.
Node(K) - Constructor for class smile.nlp.Trie.Node
normalize(String) - Method in interface smile.nlp.normalizer.Normalizer: Normalize the given string.
normalize(String) - Method in class smile.nlp.normalizer.SimpleNormalizer
Normalizer - Interface in smile.nlp.normalizer: Normalization transforms text into a canonical form by removing unwanted variations.
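
The NGram entries above define an n-gram as a contiguous sequence of n words. A minimal sketch of that definition is a sliding window over a token array. NGramSketch and its ngrams method are hypothetical names, not part of smile.nlp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a smile.nlp class): enumerates all n-grams of a token array.
class NGramSketch {
    static List<String[]> ngrams(String[] words, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        List<String[]> result = new ArrayList<>();
        // Slide a window of width n; there are (length - n + 1) windows when length >= n.
        for (int i = 0; i + n <= words.length; i++) {
            result.add(Arrays.copyOfRange(words, i, i + n));
        }
        return result;
    }
}
```

With n = 2 this yields the bigrams described under the Bigram entries.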

O

of(Corpus, int, int) - Static method in class smile.nlp.collocation.Bigram: Finds top k bigram collocations in the given corpus.
of(Corpus, double, int) - Static method in class smile.nlp.collocation.Bigram: Finds bigram collocations in the given corpus whose p-value is less than the given threshold.
of(Collection<String[]>, int, int) - Static method in class smile.nlp.collocation.NGram: Extracts n-gram phrases by an Apriori-like algorithm.
of(Path) - Static method in class smile.nlp.embedding.GloVe: Loads a ...
of(Path) - Static method in class smile.nlp.embedding.Word2Vec: Loads a pre-trained word2vec model from binary file of ByteOrder.LITTLE_ENDIAN.
of(Path, ByteOrder) - Static method in class smile.nlp.embedding.Word2Vec: Loads a pre-trained word2vec model from binary file.
of(String) - Static method in interface smile.nlp.keyword.CooccurrenceKeywords: Returns the top 10 keywords.
of(String, int) - Static method in interface smile.nlp.keyword.CooccurrenceKeywords: Returns a given number of top keywords.
open - Variable in enum smile.nlp.pos.PennTreebankPOS: True if the POS is an open class.
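
The collocation Bigram entries refer to a chi-square statistical score and a p-value threshold. As background, the sketch below computes the textbook Pearson chi-square statistic for a 2x2 contingency table of bigram counts. The class ChiSquareSketch and the argument layout are assumptions made for illustration; this index does not state exactly how the library computes its score.

```java
// Hypothetical helper (not a smile.nlp class): Pearson chi-square for a 2x2 table.
class ChiSquareSketch {
    // o11: count of (w1, w2); o12: count of (w1, not w2);
    // o21: count of (not w1, w2); o22: count of (not w1, not w2).
    static double chiSquare(long o11, long o12, long o21, long o22) {
        // Work in double to avoid long overflow on corpus-sized counts.
        double n = (double) o11 + o12 + o21 + o22;
        double diff = (double) o11 * o22 - (double) o12 * o21;
        double denom = ((double) o11 + o12) * ((double) o21 + o22)   // row totals
                     * ((double) o11 + o21) * ((double) o12 + o22);  // column totals
        // A zero row or column total makes the statistic undefined (NaN here).
        return n * diff * diff / denom;
    }
}
```

A larger statistic indicates that the two words co-occur more (or less) often than independence would predict; with one degree of freedom it can be converted to a p-value.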

P

ParagraphSplitter - Interface in smile.nlp.tokenizer: A paragraph splitter segments text into paragraphs.
PennTreebankPOS - Enum in smile.nlp.pos: The Penn Treebank Tag set.
PennTreebankTokenizer - Class in smile.nlp.tokenizer: A word tokenizer that tokenizes English sentences using the conventions used by the Penn Treebank.
PorterStemmer - Class in smile.nlp.stemmer: Porter's stemming algorithm.
PorterStemmer() - Constructor for class smile.nlp.stemmer.PorterStemmer: Constructor.
POSTagger - Interface in smile.nlp.pos: Part-of-speech tagging (POS tagging) is the process of marking up the words in a sentence as corresponding to a particular part of speech.
Punctuations - Interface in smile.nlp.dictionary: Punctuation marks are symbols that indicate the structure and organization of written language, as well as intonation and pauses to be observed when reading aloud.
put(K[], V) - Method in class smile.nlp.Trie: Add a key with associated value to the trie.

R

rank(Corpus, TextTerms, String, int, int) - Method in class smile.nlp.relevance.BM25
rank(Corpus, TextTerms, String[], int[], int) - Method in class smile.nlp.relevance.BM25
rank(Corpus, TextTerms, String, int, int) - Method in interface smile.nlp.relevance.RelevanceRanker: Returns a relevance score between a term and a document based on a corpus.
rank(Corpus, TextTerms, String[], int[], int) - Method in interface smile.nlp.relevance.RelevanceRanker: Returns a relevance score between a set of terms and a document based on a corpus.
rank(int, int, long, long) - Method in class smile.nlp.relevance.TFIDF: Returns a relevance score between a term and a document based on a corpus.
rank(Corpus, TextTerms, String, int, int) - Method in class smile.nlp.relevance.TFIDF
rank(Corpus, TextTerms, String[], int[], int) - Method in class smile.nlp.relevance.TFIDF
read(String, List<String[]>, List<PennTreebankPOS[]>) - Static method in class smile.nlp.pos.HMMPOSTagger: Load training data from a corpus.
Relevance - Class in smile.nlp.relevance: In the context of information retrieval, relevance denotes how well a retrieved set of documents meets the information need of the user.
Relevance(Text, double) - Constructor for class smile.nlp.relevance.Relevance: Constructor.
RelevanceRanker - Interface in smile.nlp.relevance: An interface to provide relevance ranking algorithm.
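
The rank entries above score a term against a document from counts such as the term frequency and the number of documents containing the term. One common tf-idf variant, sketched below, normalizes the term frequency by the document's maximum term frequency and blends in a smoothing constant. The class TfIdfSketch, the parameter names, and the exact formula are assumptions for illustration and may differ from what smile.nlp.relevance.TFIDF computes.

```java
// Hypothetical helper (not a smile.nlp class): a max-tf-normalized tf-idf variant.
class TfIdfSketch {
    // tf: frequency of the term in the document; maxtf: largest term frequency in the document;
    // numDocs: documents in the corpus; docsWithTerm: documents containing the term;
    // smoothing: value in [0, 1] that lifts the normalized tf away from zero.
    static double tfidf(int tf, int maxtf, long numDocs, long docsWithTerm, double smoothing) {
        if (tf <= 0 || docsWithTerm <= 0) {
            return 0.0; // the term does not occur, so it contributes nothing
        }
        double normalizedTf = smoothing + (1.0 - smoothing) * tf / maxtf;
        double idf = Math.log((double) numDocs / docsWithTerm);
        return normalizedTf * idf;
    }
}
```

A term that appears in every document gets an idf of zero under this variant, so it cannot distinguish documents.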

S

score - Variable in class smile.nlp.collocation.Bigram: The chi-square statistical score of the collocation.
score(int, int, double, int, int, double, int, int, double, long, long) - Method in class smile.nlp.relevance.BM25: Returns a relevance score between a term and a document based on a corpus.
score(double, long, long) - Method in class smile.nlp.relevance.BM25: Returns a relevance score between a term and a document based on a corpus.
score(double, int, double, long, long) - Method in class smile.nlp.relevance.BM25: Returns a relevance score between a term and a document based on a corpus.
score - Variable in class smile.nlp.relevance.Relevance: The relevance score.
search(String) - Method in interface smile.nlp.Corpus: Returns an iterator over the set of documents containing the given term.
search(RelevanceRanker, String) - Method in interface smile.nlp.Corpus: Returns an iterator over the set of documents containing the given term in descending order of relevance.
search(RelevanceRanker, String[]) - Method in interface smile.nlp.Corpus: Returns an iterator over the set of documents containing (at least one of) the given terms in descending order of relevance.
search(String) - Method in class smile.nlp.SimpleCorpus
search(RelevanceRanker, String) - Method in class smile.nlp.SimpleCorpus
search(RelevanceRanker, String[]) - Method in class smile.nlp.SimpleCorpus
SentenceSplitter - Interface in smile.nlp.tokenizer: A sentence splitter segments text into sentences (a string of words satisfying the grammatical rules of a language).
setAnchor(String) - Method in interface smile.nlp.AnchorText: Sets the anchor text.
setAnchor(String) - Method in class smile.nlp.SimpleText: Sets the anchor text.
SimpleCorpus - Class in smile.nlp: An in-memory text corpus.
SimpleCorpus() - Constructor for class smile.nlp.SimpleCorpus: Constructor.
SimpleCorpus(SentenceSplitter, Tokenizer, StopWords, Punctuations) - Constructor for class smile.nlp.SimpleCorpus: Constructor.
SimpleDictionary - Class in smile.nlp.dictionary: A simple implementation of dictionary interface.
SimpleDictionary(String) - Constructor for class smile.nlp.dictionary.SimpleDictionary: Constructor.
SimpleNormalizer - Class in smile.nlp.normalizer: A baseline normalizer for processing Unicode text.
SimpleParagraphSplitter - Class in smile.nlp.tokenizer: This is a simple paragraph splitter.
SimpleSentenceSplitter - Class in smile.nlp.tokenizer: This is a simple sentence splitter for English.
SimpleText - Class in smile.nlp: A list-of-words representation of documents.
SimpleText(String, String, String, String[]) - Constructor for class smile.nlp.SimpleText: Constructor.
SimpleTokenizer - Class in smile.nlp.tokenizer: A word tokenizer that tokenizes English sentences with some differences from TreebankWordTokenizer, notably on handling not-contractions.
SimpleTokenizer() - Constructor for class smile.nlp.tokenizer.SimpleTokenizer: Constructor.
SimpleTokenizer(boolean) - Constructor for class smile.nlp.tokenizer.SimpleTokenizer: Constructor.
size() - Method in interface smile.nlp.Corpus: Returns the number of words in the corpus.
size() - Method in interface smile.nlp.dictionary.Dictionary: Returns the number of elements in this dictionary.
size() - Method in enum smile.nlp.dictionary.EnglishDictionary
size() - Method in class smile.nlp.dictionary.EnglishPunctuations
size() - Method in enum smile.nlp.dictionary.EnglishStopWords
size() - Method in class smile.nlp.dictionary.SimpleDictionary
size() - Method in class smile.nlp.SimpleCorpus
size() - Method in class smile.nlp.SimpleText
size() - Method in interface smile.nlp.TextTerms: Returns the number of words.
size() - Method in class smile.nlp.Trie: Returns the number of entries.
smile.nlp - package smile.nlp: Natural language processing.
smile.nlp.collocation - package smile.nlp.collocation: Collocation finding algorithms.
smile.nlp.dictionary - package smile.nlp.dictionary: Common dictionaries such as stop words, punctuation, common English words, etc.
smile.nlp.embedding - package smile.nlp.embedding: Word embedding.
smile.nlp.keyword - package smile.nlp.keyword: Keyword extraction.
smile.nlp.normalizer - package smile.nlp.normalizer: Text normalization.
smile.nlp.pos - package smile.nlp.pos: Part-of-speech taggers.
smile.nlp.relevance - package smile.nlp.relevance: Term-document relevance ranking algorithms.
smile.nlp.stemmer - package smile.nlp.stemmer: English word stemmer algorithms.
smile.nlp.tokenizer - package smile.nlp.tokenizer: Sentence splitter and word tokenizer.
split(String) - Method in class smile.nlp.tokenizer.BreakIteratorSentenceSplitter
split(String) - Method in class smile.nlp.tokenizer.BreakIteratorTokenizer
split(String) - Method in interface smile.nlp.tokenizer.ParagraphSplitter: Splits the text into paragraphs.
split(String) - Method in class smile.nlp.tokenizer.PennTreebankTokenizer
split(String) - Method in interface smile.nlp.tokenizer.SentenceSplitter: Splits the text into sentences.
split(String) - Method in class smile.nlp.tokenizer.SimpleParagraphSplitter
split(String) - Method in class smile.nlp.tokenizer.SimpleSentenceSplitter
split(String) - Method in class smile.nlp.tokenizer.SimpleTokenizer
split(String) - Method in interface smile.nlp.tokenizer.Tokenizer: Splits the string into a list of tokens.
stem(String) - Method in class smile.nlp.stemmer.LancasterStemmer
stem(String) - Method in class smile.nlp.stemmer.PorterStemmer
stem(String) - Method in interface smile.nlp.stemmer.Stemmer: Transforms a word into its root form.
Stemmer - Interface in smile.nlp.stemmer: A Stemmer transforms a word into its root form.
StopWords - Interface in smile.nlp.dictionary: A set of stop words in some language.
stripPluralParticiple(String) - Method in class smile.nlp.stemmer.PorterStemmer: Remove plurals and participles.
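
The BM25 score entries above take a term frequency, a document size, an average document size, and corpus-level counts. The sketch below shows a textbook BM25 formula with an additive delta (in the style of BM25+), using the default values k1 = 1.2, b = 0.75, delta = 1.0 listed for the BM25() constructor. The class Bm25Sketch and the exact form of the idf term are assumptions; this index does not confirm that the library uses this precise formula.

```java
// Hypothetical helper (not a smile.nlp class): textbook BM25 with an additive delta.
class Bm25Sketch {
    static double score(double tf, int docSize, double avgDocSize,
                        long numDocs, long docsWithTerm,
                        double k1, double b, double delta) {
        if (tf <= 0) {
            return 0.0; // the term does not occur in the document
        }
        // Longer-than-average documents are penalized in proportion to b.
        double lengthNorm = 1.0 - b + b * docSize / avgDocSize;
        // Term frequency saturates: repeated occurrences add less and less.
        double saturatedTf = (k1 + 1.0) * tf / (tf + k1 * lengthNorm);
        // Rare terms get a larger weight; this form goes negative for very common terms.
        double idf = Math.log((numDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
        return (saturatedTf + delta) * idf;
    }
}
```

For example, Bm25Sketch.score(2, 100, 100.0, 1000, 10, 1.2, 0.75, 1.0) scores a term that occurs twice in an average-length document and appears in 10 of 1000 documents.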

T

tag(String[]) - Method in class smile.nlp.pos.HMMPOSTagger
tag(String[]) - Method in interface smile.nlp.pos.POSTagger: Tags the sentence in the form of a sequence of words.
text - Variable in class smile.nlp.relevance.Relevance: The document to rank.
Text - Class in smile.nlp: A minimal interface of text in the corpus.
Text(String) - Constructor for class smile.nlp.Text: Constructor.
Text(String, String) - Constructor for class smile.nlp.Text: Constructor.
Text(String, String, String) - Constructor for class smile.nlp.Text: Constructor.
TextTerms - Interface in smile.nlp: The terms in a text.
tf(String) - Method in class smile.nlp.SimpleText
tf(String) - Method in interface smile.nlp.TextTerms: Returns the term frequency.
TFIDF - Class in smile.nlp.relevance: The tf-idf weight (term frequency-inverse document frequency) is a weight often used in information retrieval and text mining.
TFIDF() - Constructor for class smile.nlp.relevance.TFIDF: Constructor.
TFIDF(double) - Constructor for class smile.nlp.relevance.TFIDF: Constructor.
title - Variable in class smile.nlp.Text: The title of document.
Tokenizer - Interface in smile.nlp.tokenizer: A token is a string of characters, categorized according to the rules as a symbol.
toString() - Method in class smile.nlp.Bigram
toString() - Method in class smile.nlp.collocation.Bigram
toString() - Method in class smile.nlp.collocation.NGram
toString() - Method in class smile.nlp.NGram
toString() - Method in class smile.nlp.SimpleText
Trie<K,V> - Class in smile.nlp: A trie, also called digital tree or prefix tree, is an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings.
Trie() - Constructor for class smile.nlp.Trie: Constructor.
Trie(int) - Constructor for class smile.nlp.Trie: Constructor.
Trie.Node - Class in smile.nlp: The nodes in the trie.
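
The Trie entries above describe a prefix tree whose keys are arrays (put(K[], V) and get(K[])). The sketch below is a minimal map-based version of that idea, where each array element selects one child edge. TrieSketch is a hypothetical class written for illustration; it assumes non-null values and does not reproduce the library's Trie.Node API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not the smile.nlp.Trie class): a minimal array-keyed prefix tree.
class TrieSketch<K, V> {
    private static final class Node<K, V> {
        final Map<K, Node<K, V>> children = new HashMap<>();
        V value; // null means no entry ends at this node
    }

    private final Node<K, V> root = new Node<>();
    private int size;

    // Adds or overwrites the value stored under the given key sequence.
    void put(K[] key, V value) {
        Node<K, V> node = root;
        for (K k : key) {
            node = node.children.computeIfAbsent(k, x -> new Node<K, V>());
        }
        if (node.value == null) {
            size++; // first time an entry ends here
        }
        node.value = value;
    }

    // Returns the value for the exact key sequence, or null if there is none.
    V get(K[] key) {
        Node<K, V> node = root;
        for (K k : key) {
            node = node.children.get(k);
            if (node == null) {
                return null;
            }
        }
        return node.value;
    }

    // Returns the number of entries.
    int size() {
        return size;
    }
}
```

Keys that share a prefix, such as {"new", "york"} and {"new", "york", "city"}, share the nodes for that prefix, which is what makes the structure useful for phrase lookup.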

U

unique() - Method in class smile.nlp.SimpleText
unique() - Method in interface smile.nlp.TextTerms: Returns the iterator of unique words.

V

valueOf(String) - Static method in enum smile.nlp.dictionary.EnglishDictionary: Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum smile.nlp.dictionary.EnglishStopWords: Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum smile.nlp.pos.PennTreebankPOS: Returns the enum constant of this type with the specified name.
values() - Static method in enum smile.nlp.dictionary.EnglishDictionary: Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum smile.nlp.dictionary.EnglishStopWords: Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum smile.nlp.pos.PennTreebankPOS: Returns an array containing the constants of this enum type, in the order they are declared.
vectors - Variable in class smile.nlp.embedding.Word2Vec: The vector space.

W

w1 - Variable in class smile.nlp.Bigram: Immutable first word of bigram.
w2 - Variable in class smile.nlp.Bigram: Immutable second word of bigram.
walkin(File, List<File>) - Static method in class smile.nlp.pos.HMMPOSTagger: Recursive function to descend into the directory tree and find all the files that end with ".POS".
Word2Vec - Class in smile.nlp.embedding: Word2vec is a group of related models that are used to produce word embeddings.
Word2Vec(String[], float[][]) - Constructor for class smile.nlp.embedding.Word2Vec: Constructor.
words - Variable in class smile.nlp.embedding.Word2Vec: The vocabulary.
words - Variable in class smile.nlp.NGram: Immutable word sequences.
words() - Method in class smile.nlp.SimpleText
words() - Method in interface smile.nlp.TextTerms: Returns the iterator of the words of the document.
