Packages 
Package Description
org.predict4all.nlp  
org.predict4all.nlp.exception  
org.predict4all.nlp.io
Contains custom InputStream and OutputStream implementations to save/load Predict4All-specific items (Token and Word).
Note that NGrams are saved without these streams, as they are designed to be loaded on demand with a FileChannel.
Both token and word streams extend DataOutputStream or DataInputStream: this was done for performance, as this approach is much faster than general-purpose serialization methods.
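To illustrate why extending DataOutputStream/DataInputStream is cheap, here is a minimal sketch, assuming a hypothetical SimpleWord type holding only an int ID and a text (records require Java 16+). The real Token and Word streams store more fields and their method names may differ. Each word is written as raw primitives with writeInt and writeUTF, with none of the class descriptors that standard Java object serialization adds.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical word type for illustration: an int ID and its text.
record SimpleWord(int id, String text) {}

// Hypothetical output stream: writes a word as raw primitives (no class metadata, no reflection).
class SimpleWordOutputStream extends DataOutputStream {
    SimpleWordOutputStream(OutputStream out) {
        super(out);
    }

    void writeWord(SimpleWord word) throws IOException {
        writeInt(word.id());
        writeUTF(word.text());
    }
}

// Matching input stream: fields must be read back in exactly the order they were written.
class SimpleWordInputStream extends DataInputStream {
    SimpleWordInputStream(InputStream in) {
        super(in);
    }

    SimpleWord readWord() throws IOException {
        int id = readInt();
        String text = readUTF();
        return new SimpleWord(id, text);
    }
}

class WordStreamDemo {
    // Writes a word to an in-memory buffer, then reads it back.
    static SimpleWord roundTrip(SimpleWord word) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (SimpleWordOutputStream out = new SimpleWordOutputStream(buffer)) {
                out.writeWord(word);
            }
            try (SimpleWordInputStream in = new SimpleWordInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readWord();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With no per-object header, a word here costs 4 bytes for the ID plus a 2-byte length and the modified UTF-8 bytes of its text. The trade-off is that reader and writer must agree exactly on the field order.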
org.predict4all.nlp.language
Represents every language-specific item.
The base class AbstractLanguageModel allows simpler LanguageModel implementations.
Sub-packages such as "french" may contain language-specific code.
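The role of a base class like AbstractLanguageModel can be sketched as follows. This assumes a hypothetical, much simpler interface than the real LanguageModel: shared behaviour (here, a stop-word check) is written once in the abstract class, and a language sub-package only supplies its own data. The stop-word concept and every name below are illustrative only.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified contract; the real LanguageModel interface differs.
interface SimpleLanguageModel {
    String getId();

    Set<String> getStopWords();

    boolean isStopWord(String word);
}

// Base class: implements shared behaviour once, so subclasses only provide language data.
abstract class SimpleAbstractLanguageModel implements SimpleLanguageModel {
    @Override
    public boolean isStopWord(String word) {
        return getStopWords().contains(word.toLowerCase(Locale.ROOT));
    }
}

// Example language-specific subclass, as a "french" sub-package might provide.
class SimpleFrenchLanguageModel extends SimpleAbstractLanguageModel {
    private static final Set<String> STOP_WORDS = Set.of("le", "la", "les", "de", "et");

    @Override
    public String getId() {
        return "fr";
    }

    @Override
    public Set<String> getStopWords() {
        return STOP_WORDS;
    }
}
```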
org.predict4all.nlp.language.french  
org.predict4all.nlp.language.french.matcher  
org.predict4all.nlp.ngram
Package containing everything related to the NGram model used in Predict4All.
Contains the n-gram training algorithm in NGramDictionaryGenerator.
Also contains AbstractNGramTrieNode: a trie structure that can be implemented in two ways, dynamic or static.
This trie structure makes a huge number of n-grams available for probability computation without loading them all into memory.
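The dynamic/static distinction can be illustrated with a minimal sketch, assuming hypothetical node classes keyed by int word IDs. The dynamic node keeps all of its children in memory, while the lazy node only materializes its children on first access through a loader. In this sketch the loader is a plain Supplier standing in for reading a block of the dictionary file (for instance through a FileChannel); it is not how AbstractNGramTrieNode is actually implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Simplified node: children are keyed by word ID and each node stores an n-gram count.
abstract class SketchTrieNode {
    private final int count;

    SketchTrieNode(int count) {
        this.count = count;
    }

    int getCount() {
        return count;
    }

    // Returns the child for this word ID, or null if the n-gram does not exist.
    abstract SketchTrieNode getChild(int wordId);

    // Follows a sequence of word IDs from this node; returns null as soon as one is missing.
    SketchTrieNode find(int... wordIds) {
        SketchTrieNode node = this;
        for (int wordId : wordIds) {
            node = node.getChild(wordId);
            if (node == null) {
                return null;
            }
        }
        return node;
    }
}

// "Dynamic" flavour: everything lives in memory and children can be added, e.g. while training.
class DynamicSketchNode extends SketchTrieNode {
    private final Map<Integer, SketchTrieNode> children = new HashMap<>();

    DynamicSketchNode(int count) {
        super(count);
    }

    void putChild(int wordId, SketchTrieNode child) {
        children.put(wordId, child);
    }

    @Override
    SketchTrieNode getChild(int wordId) {
        return children.get(wordId);
    }
}

// "Static" flavour: children are loaded only when first needed (loader stands in for a file read).
class LazySketchNode extends SketchTrieNode {
    private final Supplier<Map<Integer, SketchTrieNode>> loader;
    private Map<Integer, SketchTrieNode> children; // null until first access

    LazySketchNode(int count, Supplier<Map<Integer, SketchTrieNode>> loader) {
        super(count);
        this.loader = loader;
    }

    @Override
    SketchTrieNode getChild(int wordId) {
        if (children == null) {
            children = loader.get();
        }
        return children.get(wordId);
    }
}
```

Only the branches actually visited during a lookup are ever loaded, which is what keeps memory usage low when the dictionary is large.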
org.predict4all.nlp.ngram.debug  
org.predict4all.nlp.ngram.dictionary  
org.predict4all.nlp.ngram.trie  
org.predict4all.nlp.ngram.trie.map  
org.predict4all.nlp.parser
This package mainly focuses on classes that convert raw input text (as a String) into Tokens that can be used by Predict4All.
This package is used by both the training algorithms and the predictor: this ensures that training data and user input are parsed consistently.
Both word and token stream
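The consistency argument can be made concrete with a minimal tokenizer sketch, assuming hypothetical SketchToken/SketchTokenType types and a deliberately naive rule (letters and digits form words, every other character is its own separator token). The real Tokenizer and its matchers are much richer; the point is only that training and prediction both call the same function, so they cannot disagree on how text is split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token model: just words and single-character separators.
enum SketchTokenType { WORD, SEPARATOR }

record SketchToken(SketchTokenType type, String text) {}

final class SketchTokenizer {
    private SketchTokenizer() {}

    // Splits raw text: runs of letters/digits become WORD tokens, any other char a SEPARATOR.
    static List<SketchToken> tokenize(String raw) {
        List<SketchToken> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : raw.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else {
                flushWord(current, tokens);
                tokens.add(new SketchToken(SketchTokenType.SEPARATOR, String.valueOf(c)));
            }
        }
        flushWord(current, tokens);
        return tokens;
    }

    // Emits the pending word (if any) and resets the buffer.
    private static void flushWord(StringBuilder current, List<SketchToken> tokens) {
        if (current.length() > 0) {
            tokens.add(new SketchToken(SketchTokenType.WORD, current.toString()));
            current.setLength(0);
        }
    }
}
```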
org.predict4all.nlp.parser.matcher  
org.predict4all.nlp.parser.token  
org.predict4all.nlp.prediction
Main PREDICT4ALL entry point: this package contains "the glue" between all prediction components.
PREDICT4ALL core features are located in WordPredictor.
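As a rough picture of what "glue" means here, the sketch below combines two components: a vocabulary (standing in for the word dictionary) and a scoring function (standing in for the n-gram model). SketchWordPredictor and its predict method are hypothetical; the real WordPredictor also takes the preceding words, corrections and configuration into account, whereas this scoring ignores context entirely.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical, heavily simplified predictor wiring a vocabulary to a probability source.
final class SketchWordPredictor {
    private final List<String> vocabulary;
    private final ToDoubleFunction<String> probability;

    SketchWordPredictor(List<String> vocabulary, ToDoubleFunction<String> probability) {
        this.vocabulary = List.copyOf(vocabulary);
        this.probability = probability;
    }

    // Returns at most maxCount words starting with the typed prefix, most probable first.
    List<String> predict(String prefix, int maxCount) {
        return vocabulary.stream()
                .filter(word -> word.startsWith(prefix))
                .sorted(Comparator.comparingDouble(probability).reversed())
                .limit(maxCount)
                .toList();
    }
}
```

For example, with the vocabulary maison, main, mais, chat and scores 0.2, 0.1, 0.5, 0.9, predicting from the prefix "mai" with maxCount 2 returns [mais, maison].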
org.predict4all.nlp.prediction.model  
org.predict4all.nlp.semantic
Semantic-related prediction model (WIP), not used by the current WordPredictor.
org.predict4all.nlp.trainer
Represents the whole data training process managed by the main DataTrainer.
Training is done in the following steps: Tokenizer, TokenConverter, WordDictionaryGenerator, NGramDictionaryGenerator.
Note that the DataTrainer uses TrainingCorpus and AbstractTrainingDocument: this abstraction level makes it possible to train the model on the same corpus without going through every training step again, which is really useful when developing new training algorithms.
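The benefit of keeping intermediate results per step can be sketched as below. This assumes a hypothetical pipeline in which each step is a UnaryOperator over a list of strings and an in-memory map stands in for the saved training documents; the real steps, their data types and the DataTrainer API differ. Starting from a later step simply reuses the saved output of the step just before it.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical step list mirroring the four training steps, in execution order.
enum SketchStep { TOKENIZE, CONVERT_TOKENS, WORD_DICTIONARY, NGRAM_DICTIONARY }

final class SketchTrainer {
    private final Map<SketchStep, UnaryOperator<List<String>>> stepImpls = new EnumMap<>(SketchStep.class);
    // Output of each completed step, standing in for intermediate documents saved to disk.
    private final Map<SketchStep, List<String>> saved = new EnumMap<>(SketchStep.class);
    private final List<SketchStep> executed = new ArrayList<>();

    void register(SketchStep step, UnaryOperator<List<String>> impl) {
        stepImpls.put(step, impl);
    }

    List<SketchStep> executedSteps() {
        return List.copyOf(executed);
    }

    // Runs every step from 'from' onwards; unregistered steps pass data through unchanged.
    List<String> train(List<String> corpus, SketchStep from) {
        List<String> data = corpus;
        if (from.ordinal() > 0) {
            SketchStep previous = SketchStep.values()[from.ordinal() - 1];
            data = saved.get(previous);
            if (data == null) {
                throw new IllegalStateException("No saved output for " + previous);
            }
        }
        for (SketchStep step : SketchStep.values()) {
            if (step.ordinal() < from.ordinal()) {
                continue;
            }
            data = stepImpls.getOrDefault(step, UnaryOperator.identity()).apply(data);
            saved.put(step, data);
            executed.add(step);
        }
        return data;
    }
}
```

After one full run, calling train with WORD_DICTIONARY as the starting step re-executes only the last two steps, on the saved output of CONVERT_TOKENS.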
org.predict4all.nlp.trainer.configuration  
org.predict4all.nlp.trainer.corpus  
org.predict4all.nlp.trainer.step  
org.predict4all.nlp.utils
Contains some simple data structures, lambda interfaces and classic "utils" static classes.
org.predict4all.nlp.utils.progressindicator  
org.predict4all.nlp.words
Contains classes related to Word and WordDictionary.
Mainly used to represent each Word as a unique instance identified by an int ID.
This package mainly focuses on managing the vocabulary.
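The idea of identifying each word by a unique int ID can be sketched with a hypothetical two-way mapping: text to ID through a HashMap, and ID to text through a list index. SketchWordDictionary and its methods are illustrative only; the real WordDictionary stores richer Word instances. Int IDs are convenient because other structures, such as n-gram tries, can then be keyed by compact integers instead of strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical vocabulary: each distinct word text gets exactly one int ID.
final class SketchWordDictionary {
    private final Map<String, Integer> idsByText = new HashMap<>();
    private final List<String> textsById = new ArrayList<>();

    // Returns the existing ID for this word, or assigns the next free one.
    int putOrGetId(String text) {
        Integer id = idsByText.get(text);
        if (id == null) {
            id = textsById.size();
            idsByText.put(text, id);
            textsById.add(text);
        }
        return id;
    }

    // Returns -1 when the word is not part of the vocabulary.
    int getId(String text) {
        return idsByText.getOrDefault(text, -1);
    }

    String getText(int id) {
        return textsById.get(id);
    }

    int size() {
        return textsById.size();
    }
}
```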
org.predict4all.nlp.words.correction
Contains all classes and algorithms related to word correction.
The main component is WordCorrectionGenerator, as it contains most of the correction logic.
CorrectionRule is also important, as it is the main entry point for configuring word correction.
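To show how rules and a generator can fit together, here is a minimal sketch assuming a hypothetical rule format (an "error" substring that may have been typed instead of a "replacement"). The generator applies each rule at every position where its error occurs and keeps only candidates found in the vocabulary. SketchCorrectionRule and SketchCorrectionGenerator are illustrative names; the real CorrectionRule and WordCorrectionGenerator support more options, and this sketch ignores the costs or weighting a real generator would need.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical rule: wherever 'error' appears in a typed word, 'replacement' may be the intended text.
record SketchCorrectionRule(String error, String replacement) {
    SketchCorrectionRule {
        // An empty error would match at every index and make the generator loop forever.
        if (error.isEmpty()) {
            throw new IllegalArgumentException("error must not be empty");
        }
    }
}

final class SketchCorrectionGenerator {
    private final List<SketchCorrectionRule> rules;
    private final Set<String> vocabulary;

    SketchCorrectionGenerator(List<SketchCorrectionRule> rules, Set<String> vocabulary) {
        this.rules = List.copyOf(rules);
        this.vocabulary = Set.copyOf(vocabulary);
    }

    // Applies each rule once at each matching position; returns the candidates that are known words.
    Set<String> corrections(String input) {
        Set<String> result = new LinkedHashSet<>();
        for (SketchCorrectionRule rule : rules) {
            int index = input.indexOf(rule.error());
            while (index >= 0) {
                String candidate = input.substring(0, index) + rule.replacement()
                        + input.substring(index + rule.error().length());
                if (vocabulary.contains(candidate)) {
                    result.add(candidate);
                }
                index = input.indexOf(rule.error(), index + 1);
            }
        }
        return result;
    }
}
```

With the rules "o" → "au" and "f" → "ph" and a vocabulary of faute and photo, the input "fote" yields faute and the input "foto" yields photo.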
org.predict4all.nlp.words.model