All Classes (predict4all 1.0.0 API)

Class	Description
AbstractLanguageModel
AbstractNGramDictionary<T extends AbstractNGramTrieNode<T>>	Represents an ngram dictionary in an abstract way: a dictionary can be static or dynamic. Each type of dictionary may or may not support operations such as saving the dictionary or updating probabilities. The dictionary has an `AbstractNGramDictionary.maxOrder` that represents the highest ngram order that can be found in the dictionary.
AbstractNGramTrieNode<T extends AbstractNGramTrieNode<?>>	Represents a node in a trie structure used to store ngrams.
AbstractPredictionToCompute
AbstractRecursiveMatcher
AbstractTokenTrainingDocument
AbstractTrainingDocument
AbstractWord
AcronymMatcher
ApostropheMatcher
BaseWordDictionary	A language-specific dictionary: contains lowercase words and their unigram frequencies.
BiIntegerKey
CoOccurrenceKey
CorrectionRule
CorrectionRuleBuilder
CorrectionRuleNode
CorrectionRuleNode.CorrectionRuleNodeType
DaemonThreadFactory
DataTrainer	Class to create prediction data to be used with a word predictor.
DataTrainerResult
DataTrainerResult.Builder	Builder to build `DataTrainerResult`.
DateDayMonthMatcher
DateFullDigitMatcher
DateFullTextMatcher
DateMonthYearMatcher
DateWeekDayMatcher
DefaultCorrectionRuleGenerator
DefaultCorrectionRuleGenerator.CorrectionRuleType
DefaultCorrectionRuleGenerator.TranslationProvider
DoublePredictionToCompute	Represents the prediction for two words in a row. This could have been generic (more than two words), but for computing performance, combinations are limited to two words only.
DynamicNGramDictionary	Represents a `TrainingNGramDictionary` that can also be opened to be trained again. This type of dictionary is useful when using a dynamic user model: the dynamic user dictionary is loaded and trained during each session, then saved to be used in later sessions.
DynamicNGramTrieNode	Represents a dynamic trie node structure: this trie node is useful when the ngram count has to be retrieved. Dynamic trie node children are always fully loaded (they are not loaded on demand) and their frequencies can change. Because dynamic trie nodes can be saved and then loaded as either `StaticNGramTrieNode` or `DynamicNGramTrieNode`, they contain two write methods: `DynamicNGramTrieNode.writeStaticNode(FileChannel, int)` when they are saved to be loaded as `StaticNGramTrieNode`, and `DynamicNGramTrieNode.writeDynamicNode(FileChannel, int)` when they are saved to be loaded as `DynamicNGramTrieNode`. The former saves static information about the node (frequency, bow); the latter saves only dynamic information (count), because frequencies are computed dynamically.
EquivalenceClass	Represents an equivalence class type that can be used when training a language model. Useful for grouping elements of the same kind in a corpus under a single concept instead of their textual data. These are especially used in semantic data.
EquivalenceClassToken
EquivalenceClassWord
FifoSet<T>	A set that holds at most `FifoSet.maxSize` elements and keeps their insertion order, so that the first inserted element is always deleted when the set is full.
FrenchBaseWordDictionary	French dictionary based on Lexique.org
FrenchLanguageModel
FrenchLanguageUtils	Utility methods for the French language.
FrenchStopWordDictionary
GeneratingCorrection
GeneratingCorrectionI
HyphenMatcher	Term matcher that matches a word sequence with a hyphen between each word. The sequence must not start or end with a hyphen. Examples: `a-t` (valid), `a-t-elle` (valid), `a-t-elle-` (not valid), `-test-` (not valid).
LanguageDataModelTrainer
LanguageDataModelTrainerArgs
LanguageModel	Represents a model specific to the input language. This model helps to perform better on NLP tasks by using language-specific parameters. E.G.
LoggingProgressIndicator
NextWord
NGramDebugger	This interface can be used to check an ngram dictionary while training models.
NGramDictionaryGenerator	Use this generator to train an ngram model. It loads texts from a `TrainingCorpus` and generates an ngram file that can later be opened with a `StaticNGramTrieDictionary`.
NGramKey
NGramPruningMethod
NGramTrainingDocument
NGramWordPredictorUtils	Utility class for predicting words with an ngram dictionary.
NoOpProgressIndicator
NumberDecimalMatcher
NumberIntMatcher
Pair<K,T>
ParserTrainingDocument
PatternMatched
PercentMatcher
Predict4AllInfo	Retrieves information about the library (version and build date). This should mostly be used to ensure consistency of saved data (i.e. save and load data with the same version).
Predict4AllUtils	Contains various utility methods that are used in NLP tasks.
PredictionParameter
ProgressIndicator
ProperNameMatcher
SemanticDictionary	Represents a semantic dictionary to be used to predict next words. WARNING : THIS IS A WIP
SemanticDictionaryConfiguration
SemanticDictionaryGenerator	Generates a `SemanticDictionary` from an input corpus. This creates a term x term matrix and then reduces it with SVD (via an optimized R script; "Rscript" should be available in the path).
SemanticTrainingDocument
Separator	Represents the chars between words. This is preferred to a regex pattern because separators are fully controlled. If you add a new separator, watch the last used id.
SeparatorToken
SimpleGeneratingCorrection
SimpleWord
SingleThreadDoubleAdder	Similar to `DoubleAdder` but for single-threaded use. Just a simple double reference without any overhead.
SpecialWordMatcher
StaticNGramTrieDictionary	Represents a static ngram dictionary where trie nodes are loaded "on demand" while browsing through the nodes. This dictionary is read-only and cannot be updated or saved: methods like `StaticNGramTrieDictionary.updateProbabilities(double[])` and `StaticNGramTrieDictionary.putAndIncrementBy(int[], int)` are not supported by this dictionary.
StaticNGramTrieNode	Represents a static ngram trie node: a node used only to retrieve information and compute probabilities, whose children are never updated. This node is particular because its child nodes are loaded on demand from a `FileChannel`. This node is produced in a read-only version: to create it, `DynamicNGramTrieNode` and `TrainingNGramDictionary` should be used.
StopWordDictionary	A language-specific dictionary: contains every stop word for a language.
StringProducer
Tag	Represents a specific value in a corpus. Useful to tag a specific part of the corpus without any semantic information. `START` represents a sentence start; `UNKNOWN` represents a word or expression that is out of vocabulary.
TagToken
TagWord
TermMatcherUtils
Token	Represents the lowest-level unit when parsing a text.
TokenAppender
TokenConverter	Converts an input token list into another token list, using the matched `TokenMatcher` patterns.
TokenConverterTrainingDocument
TokenFileInputStream
TokenFileOutputStream
Tokenizer	Takes a raw text and creates tokens from it.
TokenListAppender
TokenListProvider
TokenMatcher	Represents a matcher that tries to detect whether the given tokens match a specific pattern. If so, the `PatternMatched` contains the normalized representation of the matched tokens and possibly an `EquivalenceClass`.
TokenProvider
TokenRegexMatcher
TokenRegexMatcher.TokenRegexMatcherBuilder
TokenRegexResult
TrainerTask
TrainingConfiguration
TrainingCorpus
TrainingNGramDictionary	Represents a training dictionary: an ngram dictionary used while training an ngram model. This dictionary is useful because it supports dynamic insertion and probability computation...
TrainingStep	Represents the possible training steps. This allows training to be stopped and started again at a specific step: for example, going up to the converted tokens, and then running WORDS_DICTIONARY multiple times.
TrieNodeMap<V>	Custom implementation copied from `TIntObjectHashMap` but with fewer attributes to reduce the heap size of the trie. The source is copied from the class hierarchy (with methods merged manually): `THash`, `TPrimitiveHash`, `TIntHash`, `TIntObjectHashMap`. The implementation is modified to keep the attribute count of this map to a minimum, because a TrieNodeMap is created many times.
TrieNodeMapConstant
Triple<K,T,V>
UniquePredictionToCompute
UserWord
Word	Represents a word stored in a `WordDictionary`. Words are stored with an int ID to optimize memory usage.
WordCorrectionGenerator	Ideas: inversion at a distance of 2 = "renuméré". Handling of inversions.
WordDictionary	Represents a word dictionary. This dictionary identifies each sequence of chars as a unique "word" and keeps information about this word (frequency, etc.).
WordDictionaryGenerator	Generates a word dictionary from a `TrainingCorpus`: it detects the different words in the training corpus and tries to filter them (match lower/upper case words, filter on a `BaseWordDictionary`, exclude low count words, etc.).
WordDictionaryMatchingException	This exception is mainly thrown if a user dictionary is loaded but was saved from a previous dictionary.
WordDictionaryTrainingDocument
WordFileInputStream
WordFileOutputStream
WordPrediction
WordPredictionResult
WordPredictor
WordPrefixDetected	Contains information about a started word (found in dictionary)
WordPrefixDetector	Useful to detect whether an existing word has been started in a token list. It is important to detect whether a word is already started when predicting the next word, because the prediction results should always start like the already started word. Because words are allowed to contain word separators (hyphen, etc.), started-word detection is much more complicated than just checking whether the token list ends with a token separator.
WordToken
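
The `FifoSet<T>` entry in the table describes a bounded set that drops its first inserted element once it is full. A minimal sketch of that behaviour follows, assuming a plain `LinkedHashSet` as backing store. `FifoSetSketch` and its methods are hypothetical names used for illustration only; this is not the library's implementation, and the real `FifoSet` API may differ.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical illustration of a bounded, insertion-ordered set (not predict4all's FifoSet). */
class FifoSetSketch<T> {
    private final int maxSize;
    // LinkedHashSet iterates in insertion order, so the first element is the oldest one.
    private final Set<T> elements = new LinkedHashSet<>();

    FifoSetSketch(int maxSize) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be >= 1");
        }
        this.maxSize = maxSize;
    }

    /** Adds an element, evicting the oldest one first if the set is already full. */
    boolean add(T element) {
        if (elements.contains(element)) {
            return false; // already present: nothing is added or evicted
        }
        if (elements.size() >= maxSize) {
            Iterator<T> oldest = elements.iterator();
            oldest.next();   // move to the first inserted element
            oldest.remove(); // and delete it to make room
        }
        return elements.add(element);
    }

    boolean contains(T element) {
        return elements.contains(element);
    }

    int size() {
        return elements.size();
    }

    @Override
    public String toString() {
        return elements.toString();
    }

    public static void main(String[] args) {
        FifoSetSketch<String> recent = new FifoSetSketch<>(2);
        recent.add("le");
        recent.add("la");
        recent.add("les"); // the set is full, so "le" is evicted
        System.out.println(recent);
    }
}
```

In this sketch, re-adding an element that is already present does not refresh its position; whether the library behaves the same way is not stated in the description above.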