All Packages

Package Summary

org.predict4all.nlp

org.predict4all.nlp.exception

org.predict4all.nlp.io
Contains custom InputStream and OutputStream implementations to save and load Predict4All-specific items (Token and Word).
Note that n-grams are saved without these streams, as they are designed to be loaded on demand through a FileChannel.
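
As a rough illustration of what a custom stream for saving and loading items can look like, here is a minimal sketch built on DataOutputStream and DataInputStream. The names SketchWord, SketchWordOutputStream, SketchWordInputStream and StreamSketchDemo are hypothetical, and the field layout (an int ID followed by a UTF string) is an assumption made for the example; none of this is the library's actual Token or Word format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical item to persist (NOT the library's Word class).
final class SketchWord {
    final int id;
    final String text;

    SketchWord(int id, String text) {
        this.id = id;
        this.text = text;
    }
}

// Hypothetical output stream: adds one item-specific method on top of DataOutputStream.
class SketchWordOutputStream extends DataOutputStream {
    SketchWordOutputStream(OutputStream out) {
        super(out);
    }

    void writeWord(SketchWord word) throws IOException {
        writeInt(word.id);   // fixed-size int ID
        writeUTF(word.text); // length-prefixed modified UTF-8 string
    }
}

// Hypothetical input stream: reads the fields back in the same order.
class SketchWordInputStream extends DataInputStream {
    SketchWordInputStream(InputStream in) {
        super(in);
    }

    SketchWord readWord() throws IOException {
        int id = readInt();
        String text = readUTF();
        return new SketchWord(id, text);
    }
}

class StreamSketchDemo {
    // Writes a word to an in-memory buffer and reads it back.
    static SketchWord roundTrip(SketchWord word) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (SketchWordOutputStream out = new SketchWordOutputStream(buffer)) {
                out.writeWord(word);
            }
            try (SketchWordInputStream in =
                     new SketchWordInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readWord();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SketchWord copy = roundTrip(new SketchWord(42, "bonjour"));
        System.out.println(copy.id + " " + copy.text); // prints "42 bonjour"
    }
}
```

The point of the design is that each item is written field by field with the primitive methods the data streams already provide, so no reflection or class metadata ends up in the file.
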
Both the token and word streams extend DataOutputStream or DataInputStream. This was done for performance: writing items this way is much cheaper than going through other serialization methods.

org.predict4all.nlp.language
Represents every language-specific item.
A base AbstractLanguageModel allows simpler LanguageModel implementations.
Sub-packages such as "french" may contain language-specific code.

org.predict4all.nlp.language.french

org.predict4all.nlp.language.french.matcher

org.predict4all.nlp.ngram
Package containing everything about the n-gram model used in Predict4All.
Contains the n-gram training algorithm in NGramDictionaryGenerator.
Also contains AbstractNGramTrieNode: a trie structure that can be implemented in two ways, dynamic or static.
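
To picture how a trie can hold n-grams, here is a minimal in-memory sketch. It assumes words are represented by int IDs and that each node stores an occurrence count. SketchTrieNode and SketchNGramTrie are hypothetical names, and the plain count ratio used for the probability is a simplification with no smoothing; it is not the library's AbstractNGramTrieNode nor its probability computation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical trie node: children are keyed by word ID.
class SketchTrieNode {
    final Map<Integer, SketchTrieNode> children = new HashMap<>();
    int count; // number of added n-grams whose path goes through this node
}

class SketchNGramTrie {
    private final SketchTrieNode root = new SketchTrieNode();

    // Adds one occurrence of the n-gram; every prefix on the path is counted too.
    void add(int... wordIds) {
        SketchTrieNode node = root;
        root.count++;
        for (int id : wordIds) {
            node = node.children.computeIfAbsent(id, k -> new SketchTrieNode());
            node.count++;
        }
    }

    // Returns the count stored for the given sequence, or 0 if it was never added.
    int count(int... wordIds) {
        SketchTrieNode node = root;
        for (int id : wordIds) {
            node = node.children.get(id);
            if (node == null) {
                return 0;
            }
        }
        return node.count;
    }

    // Unsmoothed estimate: count(prefix + last word) / count(prefix).
    double probability(int... wordIds) {
        if (wordIds.length == 0) {
            throw new IllegalArgumentException("empty n-gram");
        }
        int[] prefix = Arrays.copyOf(wordIds, wordIds.length - 1);
        int prefixCount = count(prefix);
        return prefixCount == 0 ? 0.0 : (double) count(wordIds) / prefixCount;
    }

    public static void main(String[] args) {
        SketchNGramTrie trie = new SketchNGramTrie();
        trie.add(1, 2, 3);
        trie.add(1, 2, 4);
        trie.add(1, 2, 3);
        System.out.println(trie.probability(1, 2, 3));
    }
}
```

This sketch keeps every node in memory, so it only resembles a "dynamic" flavor; reading nodes from a file on demand is not shown here.
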
The trie structure in this package makes a huge number of n-grams available for probability computation without loading them all into memory.

org.predict4all.nlp.ngram.debug

org.predict4all.nlp.ngram.dictionary

org.predict4all.nlp.ngram.trie

org.predict4all.nlp.ngram.trie.map

org.predict4all.nlp.parser

org.predict4all.nlp.parser.matcher

org.predict4all.nlp.parser.token

org.predict4all.nlp.prediction
Main PREDICT4ALL entry point: this package contains the "glue" between all prediction components.
PREDICT4ALL core features are located in WordPredictor.

org.predict4all.nlp.prediction.model

org.predict4all.nlp.semantic
Semantic-related prediction model (WIP), not used by the current WordPredictor.

org.predict4all.nlp.trainer
Represents the whole data training process, managed by the main DataTrainer.
Training is done in several steps:
- Tokenizer
- TokenConverter
- WordDictionaryGenerator
- NGramDictionaryGenerator
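
The listed steps can be pictured as a chain in which each stage consumes the output of the previous one. The sketch below is only an illustration under stated assumptions: the steps are assumed to run in the order listed, and what each stage does here (whitespace splitting, lower-casing, ID assignment, bigram counting) is a guess chosen to keep the example small. TrainingStepsSketch and its methods are hypothetical and do not reflect the library's classes or signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class TrainingStepsSketch {
    // Step 1 (tokenizing, assumed): split raw text on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Step 2 (token conversion, assumed): normalise tokens, here by lower-casing.
    static List<String> convert(List<String> tokens) {
        List<String> converted = new ArrayList<>();
        for (String token : tokens) {
            converted.add(token.toLowerCase(Locale.ROOT));
        }
        return converted;
    }

    // Step 3 (word dictionary, assumed): give each distinct word an int ID,
    // in order of first appearance.
    static Map<String, Integer> buildWordDictionary(List<String> tokens) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (String token : tokens) {
            ids.putIfAbsent(token, ids.size());
        }
        return ids;
    }

    // Step 4 (n-gram dictionary, assumed): count bigrams of word IDs.
    static Map<List<Integer>, Integer> countBigrams(List<String> tokens, Map<String, Integer> ids) {
        Map<List<Integer>, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            List<Integer> bigram = Arrays.asList(ids.get(tokens.get(i)), ids.get(tokens.get(i + 1)));
            counts.merge(bigram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = convert(tokenize("The cat sees the dog"));
        Map<String, Integer> ids = buildWordDictionary(tokens);
        Map<List<Integer>, Integer> bigrams = countBigrams(tokens, ids);
        System.out.println(ids);
        System.out.println(bigrams);
    }
}
```

Keeping each stage as a separate function with a plain input and output is what makes it possible to rerun a later stage on stored intermediate results.
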
Note that the DataTrainer uses TrainingCorpus and AbstractTrainingDocument: this abstraction level makes it possible to train the model on the same corpus without having to go through every training step, which is really useful when developing new training algorithms.

org.predict4all.nlp.trainer.configuration

org.predict4all.nlp.trainer.corpus

org.predict4all.nlp.trainer.step

org.predict4all.nlp.utils
Contains some simple data structures, lambda interfaces and classic static "utils" classes.

org.predict4all.nlp.utils.progressindicator

org.predict4all.nlp.words
Contains classes related to Word and WordDictionary.
Mainly used to identify each Word as a unique instance, identified by an int ID.
This package mainly focuses on managing the vocabulary.

org.predict4all.nlp.words.correction
Contains all classes and algorithms related to word correction.
The main component is WordCorrectionGenerator, as it contains most of the correction logic.
CorrectionRule is also important, as it is the main entry point to configure word correction.

org.predict4all.nlp.words.model
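
The description of org.predict4all.nlp.words mentions that each Word is a unique instance identified by an int ID. A minimal sketch of that idea follows, assuming IDs are handed out sequentially and lookups are needed in both directions. SketchVocabWord and SketchWordDictionary are hypothetical names and do not mirror the library's Word or WordDictionary API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical vocabulary entry: one instance per distinct text.
final class SketchVocabWord {
    final int id;
    final String text;

    SketchVocabWord(int id, String text) {
        this.id = id;
        this.text = text;
    }
}

class SketchWordDictionary {
    private final Map<String, SketchVocabWord> byText = new HashMap<>();
    private final List<SketchVocabWord> byId = new ArrayList<>();

    // Returns the existing instance for this text, or registers a new one
    // whose ID is the next free index.
    SketchVocabWord getOrCreate(String text) {
        SketchVocabWord existing = byText.get(text);
        if (existing != null) {
            return existing;
        }
        SketchVocabWord created = new SketchVocabWord(byId.size(), text);
        byText.put(text, created);
        byId.add(created);
        return created;
    }

    // Looks a word up by its int ID (throws IndexOutOfBoundsException if unknown).
    SketchVocabWord getById(int id) {
        return byId.get(id);
    }

    int size() {
        return byId.size();
    }

    public static void main(String[] args) {
        SketchWordDictionary dictionary = new SketchWordDictionary();
        SketchVocabWord first = dictionary.getOrCreate("chat");
        SketchVocabWord second = dictionary.getOrCreate("chien");
        System.out.println(first.id + " " + second.id + " " + dictionary.getById(1).text);
    }
}
```

Working with int IDs rather than strings keeps other structures, such as n-gram storage, compact, since they only need to hold integers.
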