All Classes

AbstractLanguageModel
AbstractNGramDictionary<T extends AbstractNGramTrieNode<T>> - Represents an ngram dictionary in an abstract way: the dictionary can be static or dynamic. Each type of dictionary may or may not support operations such as saving the dictionary or updating probabilities. The dictionary has an AbstractNGramDictionary.maxOrder that represents the max order of gram that can be found in the dictionary.
AbstractNGramTrieNode<T extends AbstractNGramTrieNode<?>> - Represents a node in a trie structure used to represent ngrams.
AbstractPredictionToCompute
AbstractRecursiveMatcher
AbstractTokenTrainingDocument
AbstractTrainingDocument
AbstractWord
AcronymMatcher
ApostropheMatcher
BaseWordDictionary - A language-specific dictionary: contains lower-case words and their unigram frequencies.
BiIntegerKey
CachedPrecomputedCorrectionRule - Cached version of a CorrectionRule: this rule is meant to be used directly in WordCorrectionGenerator. It only contains information and should not be modified once generated from a CorrectionRule.
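The trie layout described for AbstractNGramDictionary and AbstractNGramTrieNode above can be illustrated with a short sketch. The class and method names below (SketchTrieNode, increment, count) are hypothetical and are not part of the library's API; a minimal sketch, assuming nodes only keep raw occurrence counts:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical ngram trie sketch: each node maps a word ID to a child node,
// and stores how many times the ngram ending at this node was observed.
class SketchTrieNode {
    final Map<Integer, SketchTrieNode> children = new HashMap<>();
    int count;

    // Walks (creating nodes as needed) the path for the given ngram of word IDs
    // and increments the count at the final node.
    void increment(int[] ngram, int index) {
        if (index == ngram.length) {
            count++;
            return;
        }
        children.computeIfAbsent(ngram[index], k -> new SketchTrieNode())
                .increment(ngram, index + 1);
    }

    // Returns the count stored for the given ngram, or 0 if it was never seen.
    int count(int[] ngram, int index) {
        if (index == ngram.length) return count;
        SketchTrieNode child = children.get(ngram[index]);
        return child == null ? 0 : child.count(ngram, index + 1);
    }
}
```

A real node would also have to carry the frequency and backoff information needed for probability computation; the sketch only keeps counts to show the trie shape.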
CoOccurrenceKey
CorrectionRule - This is the most convenient way to create correction rules, as it allows direct modification and has helper methods. The WordCorrectionGenerator will then generate CachedPrecomputedCorrectionRule instances to use this rule. Note that a single builder instance can result in multiple correction rules: the cached rules should never be configured directly by the user, as this rule form is more understandable. Correction rules work as follows: you define errors, which are the parts replaced, and replacements, which are the parts correcting the errors.
CorrectionRuleNode - The way to represent correction rules used in WordPredictor via WordCorrectionGenerator. Correction rules are represented as a tree where you can enable/disable whole parts of it (e.g. disabling a parent node also disables its children). Nodes are typed with CorrectionRuleNode.getType(), so they can be CorrectionRuleNodeType.NODE or CorrectionRuleNodeType.LEAF. Every node can technically contain a CorrectionRuleNode.getCorrectionRule(), but be aware that only CorrectionRuleNodeType.LEAF nodes are taken into account by WordCorrectionGenerator.
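The enable/disable semantics of the correction rule tree described above can be sketched as follows. SketchRuleNode and its methods are hypothetical names, not the library's API; the point is only that disabling a parent implicitly disables its whole subtree:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a correction rule tree: a node is effectively active
// only if the node itself and all of its ancestors are enabled.
class SketchRuleNode {
    private final List<SketchRuleNode> children = new ArrayList<>();
    private SketchRuleNode parent;
    private boolean enabled = true;

    void add(SketchRuleNode child) {
        child.parent = this;
        children.add(child);
    }

    void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }

    // Walks up the tree: disabling a parent disables all of its children.
    boolean isEffectivelyEnabled() {
        return enabled && (parent == null || parent.isEffectivelyEnabled());
    }
}
```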
CorrectionRuleNodeType - Represents the type of a CorrectionRuleNode.
DaemonThreadFactory
DataTrainer - Creates prediction data to be used with a word predictor.
DataTrainerResult
DataTrainerResult.Builder - Builder to build a DataTrainerResult.
DateDayMonthMatcher
DateFullDigitMatcher
DateFullTextMatcher
DateMonthYearMatcher
DateWeekDayMatcher
DoublePredictionToCompute - Represents the prediction for two words in a row. It could have been generic (more than two words), but for computing performance, combinations are limited to two words only.
DynamicNGramDictionary - Represents a TrainingNGramDictionary that can also be opened to be trained again. This type of dictionary is useful when using a dynamic user model: the dynamic user dictionary is loaded and trained during each session, then saved to be used in the next sessions.
DynamicNGramTrieNode - Represents a dynamic trie node structure: this trie node is useful when the ngram count has to be retrieved. Dynamic trie node children are always fully loaded (they are not loaded on demand) and their frequencies can change. Because dynamic trie nodes can be saved and later loaded as either StaticNGramTrieNode or DynamicNGramTrieNode, they contain two write methods: DynamicNGramTrieNode.writeStaticNode(FileChannel, int) if they are saved to be loaded as StaticNGramTrieNode, and DynamicNGramTrieNode.writeDynamicNode(FileChannel, int) if they are saved to be loaded as DynamicNGramTrieNode. One saves static information about the node (frequency, bow); the other saves only dynamic information (count), because frequencies are dynamically computed.
EquivalenceClass - Represents an equivalence class type that can be used when training a language model. Useful to group the same kind of element in a corpus under a single concept instead of textual data. These are especially used in semantic data.
EquivalenceClassToken
EquivalenceClassWord
FifoSet<T> - A set maintaining at most FifoSet.maxSize elements and keeping their insertion order, to always delete the first inserted element when the set is full.
FrenchBaseWordDictionary - French dictionary based on Lexique.org.
FrenchDefaultCorrectionRuleGenerator - Generates base correction rules for the French language.
Keeps every possible rule in FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType with a translated name, description and example.
FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
FrenchDefaultCorrectionRuleGenerator.TranslationProvider
FrenchLanguageModel
FrenchLanguageUtils - Utility methods for the French language.
FrenchStopWordDictionary
GeneratingCorrectionI
HyphenMatcher - Term matcher to match word sequences with a hyphen between each word. The sequence should not start or end with a hyphen. Examples: "a-t": valid; "a-t-elle": valid; "a-t-elle-": not valid; "-test-": not valid.
LanguageModel - Represents a model specific to the input language. This model is useful to perform better on NLP tasks by using language-specific parameters.
LoggingProgressIndicator
NextWord
NGramDebugger - This interface can be used to check an ngram dictionary while training models.
NGramDictionaryGenerator - Use this generator to train an ngram model. It loads texts from a TrainingCorpus and generates an ngram file that can later be opened with a StaticNGramTrieDictionary.
NGramKey
NGramPruningMethod
NGramTrainingDocument
NGramWordPredictorUtils - Utility class useful when predicting words with an ngram dictionary.
NoOpProgressIndicator
NumberDecimalMatcher
NumberIntMatcher
Pair<K,T>
ParserTrainingDocument
PatternMatched
PercentMatcher
Predict4AllInfo - Retrieves information about the library (version and build date).
This should mostly be used to ensure consistency of saved data (i.e. save and load data with the same versions).
Predict4AllUtils - Contains different utility methods used in NLP tasks.
PredictionParameter - Contains parameters to configure how WordPredictor works. Changes made to an instance of PredictionParameter while the predictor is running may not be reflected, as some values are cached internally.
ProgressIndicator
ProperNameMatcher
SemanticDictionary - Represents a semantic dictionary to be used to predict next words. WARNING: THIS IS A WIP.
SemanticDictionaryConfiguration
SemanticDictionaryGenerator - Generates a SemanticDictionary from an input corpus. This creates a term x term matrix and then reduces it with SVD (via an optimized R script; "Rscript" should be available in the path).
SemanticTrainingDocument
Separator - Represents the chars between words. This is preferred to regex patterns because separators are fully controlled. If you add any new separator, watch the last used ID.
SeparatorToken
SimpleGeneratingCorrection
SimpleWord
SingleThreadDoubleAdder - Similar to DoubleAdder but for single-threaded usage. Just a simple double reference without any overhead.
SpecialWordMatcher
StaticNGramTrieDictionary - Represents a static ngram dictionary where trie nodes are loaded "on demand" while browsing through the nodes. This dictionary is read-only and cannot be updated or saved: methods like StaticNGramTrieDictionary.updateProbabilities(double[]) and StaticNGramTrieDictionary.putAndIncrementBy(int[], int) are not supported by this dictionary.
StaticNGramTrieNode - Represents a static ngram trie node: nodes are used only to retrieve information and compute probabilities, and children are never updated.
This node is particular because child nodes are loaded on demand from a FileChannel. This node is produced in a read-only version: to create such nodes, DynamicNGramTrieNode and TrainingNGramDictionary should be used.
StopWordDictionary - A language-specific dictionary: contains every stop word for a language.
StringProducer
Tag - Represents a specific value in a corpus. Useful to tag a specific part of the corpus without any semantic information. START: represents a sentence start; UNKNOWN: represents a word/expression out of vocabulary.
TagToken
TagWord
TermMatcherUtils
Token - Represents the lowest unit when parsing a text.
TokenAppender
TokenConverter - This token converter converts an input token list to another token list, using matched TokenMatcher patterns.
TokenConverterTrainingDocument
TokenFileInputStream
TokenFileOutputStream
Tokenizer - Takes a raw text and creates tokens from it.
TokenListAppender
TokenListProvider
TokenMatcher - Represents a matcher that tries to detect whether a given token matches a specific pattern. If so, the PatternMatched contains the normalized representation of the matched tokens and possibly an EquivalenceClass.
TokenProvider
TokenRegexMatcher
TokenRegexMatcher.TokenRegexMatcherBuilder
TokenRegexResult
TrainerTask
TrainingConfiguration
TrainingCorpus
TrainingNGramDictionary - Represents a training dictionary: an ngram dictionary used while training an ngram model. This dictionary is useful because it supports dynamic insertion and probability computation.
TrainingStep - Represents the possible training steps. This allows training to be stopped and started again at a specific step: e.g. going up to the converted tokens step, and then running WORDS_DICTIONARY multiple times.
TrieNodeMap<V> - Custom implementation copied from TIntObjectHashMap but with fewer attributes, to reduce the heap size in the trie. The source is copied from the class hierarchy (with methods manually merged): THash, TPrimitiveHash, TIntHash, TIntObjectHashMap. The implementation is modified to keep the minimum attribute count on this map, because TrieNodeMap instances are created many times!
TrieNodeMapConstant
Triple<K,T,V>
UniquePredictionToCompute
UserWord
Word - Represents a word stored in a WordDictionary
- words are stored with an int ID to optimize memory usage.
WordCorrectionGenerator - Generates possible corrections from an input text and tokens. Corrections are based on rules (CorrectionRule), and generation is done using a thread pool. The resulting correction can be a unique word or a double word (for example, the error might be a merged word).
WordDictionary - Represents a word dictionary. This dictionary identifies each sequence of chars as a unique "word" and keeps information for this word. Each word is identified by a single int ID to save memory and space. The dictionary itself is identified with a UUID to verify consistency when using a user dictionary. Note that a Word added to a WordDictionary cannot be removed: word IDs should stay consistent, and the words could have been used in an AbstractNGramDictionary. However, you can disable a word with Word.setForceInvalid(boolean, boolean).
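The int-ID storage and the "disable instead of remove" behaviour described for Word and WordDictionary can be sketched as below. SketchWordDictionary and its methods are hypothetical illustrations, not the library's implementation:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical word dictionary sketch: each distinct string gets a stable
// int ID; words are never removed, only flagged invalid.
class SketchWordDictionary {
    private final Map<String, Integer> idsByWord = new HashMap<>();
    private final List<String> wordsById = new ArrayList<>();
    private final List<Boolean> invalid = new ArrayList<>();

    // Returns the existing ID for a word, or assigns the next free ID.
    int getOrPut(String word) {
        Integer id = idsByWord.get(word);
        if (id != null) return id;
        int newId = wordsById.size();
        idsByWord.put(word, newId);
        wordsById.add(word);
        invalid.add(false);
        return newId;
    }

    String getWord(int id) {
        return wordsById.get(id);
    }

    // Words keep their ID forever (ngram data may reference them);
    // they can only be marked invalid.
    void setInvalid(int id, boolean value) {
        invalid.set(id, value);
    }

    boolean isValid(int id) {
        return !invalid.get(id);
    }
}
```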
WordDictionaryGenerator - Generates a word dictionary from a TrainingCorpus: it detects the different words in the training corpus and tries to filter them: matching lower/upper-case words, filtering on a BaseWordDictionary, excluding low-count words, etc.
WordDictionaryMatchingException - This exception is mainly thrown when a user dictionary is loaded but was saved from a previous dictionary.
WordDictionaryTrainingDocument
WordFileInputStream
WordFileOutputStream
WordPrediction - Represents a prediction from WordPredictor.
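One of the filtering steps described for WordDictionaryGenerator (case-insensitive counting, then excluding low-count words) can be sketched as follows; SketchWordFilter, its method and its threshold parameter are hypothetical names, not the library's API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a word-dictionary filtering step: count word
// occurrences in a corpus (lower-cased) and drop words seen fewer than
// minCount times.
class SketchWordFilter {
    static Map<String, Integer> countAndFilter(String[] corpusTokens, int minCount) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : corpusTokens) {
            counts.merge(token.toLowerCase(), 1, Integer::sum);
        }
        // Exclude low-count words, as the generator does with its own threshold.
        counts.values().removeIf(c -> c < minCount);
        return counts;
    }
}
```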
WordPredictionResult - Contains the result from WordPredictor.
WordPredictor - Main entry point of the PREDICT4ALL API. An instance of WordPredictor can predict next words, current word endings, and even current corrections. The predictor mainly relies on two items, the ngram dictionary and the word dictionary, to search for words and existing sequences. Additionally, a dynamic model can be provided to combine static ngrams originating from an already learned generic model with a dynamic model specific to a user, profile, application, etc. The predictor configuration is located in PredictionParameter: the instance provided at WordPredictor creation can be modified later.
WordPrefixDetected - Contains information about a started word (found in the dictionary).
WordPrefixDetector - Useful to detect whether an existing word is started in a token list. It's important to detect if a word is already started when predicting the next word, because the prediction result should always take care of giving results that start like the already started word. Because words are allowed to contain word separators (hyphen, etc.), started word detection is much more complicated than just checking whether the token list ends with a token separator.
WordToken
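The prefix constraint described for WordPrefixDetector (prediction results must start like the already started word) can be sketched with a simple filter. SketchPrefixFilter is a hypothetical illustration; it ignores the in-word separator subtleties mentioned above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of prefix-aware prediction filtering: when a word is
// already started, only predictions starting with that prefix are kept.
class SketchPrefixFilter {
    static List<String> filterByPrefix(List<String> predictions, String startedPrefix) {
        String prefix = startedPrefix.toLowerCase(Locale.ROOT);
        List<String> result = new ArrayList<>();
        for (String candidate : predictions) {
            if (candidate.toLowerCase(Locale.ROOT).startsWith(prefix)) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

A real detector also has to decide where the started word begins, since separators such as hyphens may legitimately occur inside a word; the sketch assumes the prefix is already known.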