Package | Description |
---|---|
org.predict4all.nlp | |
org.predict4all.nlp.exception | |
org.predict4all.nlp.io | Contains custom InputStream and OutputStream implementations to save and load Predict4All-specific items (Token and Word). Note that NGram are saved without these streams, as they are designed to be loaded on demand with a FileChannel. Both the token and word streams extend DataOutputStream or DataInputStream: this was done for performance, as this approach is much more efficient than other serialization methods. |
org.predict4all.nlp.language | Represents all language-specific items. A base AbstractLanguageModel allows simpler LanguageModel implementations. Sub-packages such as "french" may contain language-specific code. |
org.predict4all.nlp.language.french | |
org.predict4all.nlp.language.french.matcher | |
org.predict4all.nlp.ngram | Package containing everything about the NGram model used in Predict4All. Contains the ngram training algorithm in NGramDictionaryGenerator. Also contains AbstractNGramTrieNode: a trie structure that can be implemented in two ways, dynamic or static. This trie structure makes a huge number of ngrams available for probability computation without loading them all into memory. |
org.predict4all.nlp.ngram.debug | |
org.predict4all.nlp.ngram.dictionary | |
org.predict4all.nlp.ngram.trie | |
org.predict4all.nlp.ngram.trie.map | |
org.predict4all.nlp.parser | |
org.predict4all.nlp.parser.matcher | |
org.predict4all.nlp.parser.token | |
org.predict4all.nlp.prediction | Main PREDICT4ALL entry point: this package contains "the glue" between all prediction components. PREDICT4ALL core features are located in WordPredictor. |
org.predict4all.nlp.prediction.model | |
org.predict4all.nlp.semantic | Semantic-related prediction model (WIP), not used by the current WordPredictor. |
org.predict4all.nlp.trainer | Represents the whole data training process, managed by the main DataTrainer. Training is done in several steps: Tokenizer, TokenConverter, WordDictionaryGenerator, NGramDictionaryGenerator. Note that the DataTrainer uses TrainingCorpus and AbstractTrainingDocument: this abstraction level makes it possible to train the model on the same corpus without having to go through every training step again, which is really useful when developing new training algorithms. |
org.predict4all.nlp.trainer.configuration | |
org.predict4all.nlp.trainer.corpus | |
org.predict4all.nlp.trainer.step | |
org.predict4all.nlp.utils | Contains some simple data structures, lambda interfaces, and classic "utils" static classes. |
org.predict4all.nlp.utils.progressindicator | |
org.predict4all.nlp.words | Contains classes related to Word and WordDictionary. Mainly used to identify each Word as a unique instance identified by an int ID. This package mainly focuses on managing vocabulary. |
org.predict4all.nlp.words.correction | Contains all classes and algorithms related to word correction. The main component is WordCorrectionGenerator, as it contains most of the correction logic. CorrectionRule is also important, as it is the main entry point to configure word correction. |
org.predict4all.nlp.words.model |