A B C D E F G H I K L M N O P Q R S T U V W _
All Classes All Packages
A
- ABBREVIATIONS - Static variable in class org.predict4all.nlp.language.french.FrenchLanguageUtils
- AbstractLanguageModel - Class in org.predict4all.nlp.language
- AbstractLanguageModel() - Constructor for class org.predict4all.nlp.language.AbstractLanguageModel
- AbstractNGramDictionary<T extends AbstractNGramTrieNode<T>> - Class in org.predict4all.nlp.ngram.dictionary
-
Represents an ngram dictionary in an abstract way: the dictionary can be static or dynamic.
Each type of dictionary may or may not support a given operation, such as saving the dictionary or updating probabilities.
The dictionary has an AbstractNGramDictionary.maxOrder that represents the maximum ngram order that can be found in the dictionary.
- AbstractNGramDictionary(T, int) - Constructor for class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
-
Construct a dictionary with a given root node and a max possible order.
- AbstractNGramTrieNode<T extends AbstractNGramTrieNode<?>> - Class in org.predict4all.nlp.ngram.trie
-
Represents a node in a trie structure used to store ngrams.
- AbstractNGramTrieNode() - Constructor for class org.predict4all.nlp.ngram.trie.AbstractNGramTrieNode
- AbstractPredictionToCompute - Class in org.predict4all.nlp.prediction.model
- AbstractPredictionToCompute() - Constructor for class org.predict4all.nlp.prediction.model.AbstractPredictionToCompute
- AbstractRecursiveMatcher - Class in org.predict4all.nlp.parser.matcher
- AbstractRecursiveMatcher(boolean, Separator, String) - Constructor for class org.predict4all.nlp.parser.matcher.AbstractRecursiveMatcher
- AbstractRecursiveMatcher(boolean, Separator, String, EquivalenceClass) - Constructor for class org.predict4all.nlp.parser.matcher.AbstractRecursiveMatcher
- AbstractTokenTrainingDocument - Class in org.predict4all.nlp.trainer.corpus
- AbstractTokenTrainingDocument(TrainingStep, File, File) - Constructor for class org.predict4all.nlp.trainer.corpus.AbstractTokenTrainingDocument
- AbstractTrainingDocument - Class in org.predict4all.nlp.trainer.corpus
- AbstractTrainingDocument(TrainingStep, String, File, File) - Constructor for class org.predict4all.nlp.trainer.corpus.AbstractTrainingDocument
- AbstractWord - Class in org.predict4all.nlp.words.model
- AbstractWord(int) - Constructor for class org.predict4all.nlp.words.model.AbstractWord
- ACCENTS - org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
- ACRONYM - org.predict4all.nlp.EquivalenceClass
- AcronymMatcher - Class in org.predict4all.nlp.language.french.matcher
- AcronymMatcher() - Constructor for class org.predict4all.nlp.language.french.matcher.AcronymMatcher
- add(T) - Method in class org.predict4all.nlp.utils.FifoSet
- ADD_LETTER - org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
- addAndReturnAdded(double) - Method in class org.predict4all.nlp.utils.SingleThreadDoubleAdder
- addChild(CorrectionRuleNode) - Method in class org.predict4all.nlp.words.correction.CorrectionRuleNode
-
Convenience method to add a child to this node.
This method is a no-op if CorrectionRuleNode.getType() is CorrectionRuleNodeType.LEAF.
- addCorrectionsFor(String, Map<BiIntegerKey, NextWord>, Set<Integer>) - Method in class org.predict4all.nlp.words.correction.WordCorrectionGenerator
- addTo(Collection<CorrectionRule>) - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
Helper to add this correction rule to a collection (useful for chaining calls)
- APOSTROPHE - org.predict4all.nlp.Separator
- ApostropheMatcher - Class in org.predict4all.nlp.language.french.matcher
- ApostropheMatcher() - Constructor for class org.predict4all.nlp.language.french.matcher.ApostropheMatcher
- append(Token) - Method in class org.predict4all.nlp.io.TokenFileOutputStream
- append(Token) - Method in interface org.predict4all.nlp.parser.TokenAppender
- append(Token) - Method in class org.predict4all.nlp.parser.TokenListAppender
- appendDebugInformationForCurrentPart(StringBuilder, Pair<StringBuilder, StringBuilder>, CachedPrecomputedCorrectionRule) - Method in interface org.predict4all.nlp.words.correction.GeneratingCorrectionI
- appendDebugInformationForCurrentPart(StringBuilder, Pair<StringBuilder, StringBuilder>, CachedPrecomputedCorrectionRule) - Method in class org.predict4all.nlp.words.correction.SimpleGeneratingCorrection
- appendToCurrentPart(CharSequence) - Method in interface org.predict4all.nlp.words.correction.GeneratingCorrectionI
- appendToCurrentPart(CharSequence) - Method in class org.predict4all.nlp.words.correction.SimpleGeneratingCorrection
- AZERTY_KEYBOARD - org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
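The addChild(CorrectionRuleNode) entry above describes a method that does nothing when the node is a leaf. A minimal sketch of that behaviour, assuming a simplified node class: SketchRuleNode, its nested Type enum and the chaining return value are hypothetical stand-ins for illustration, not the Predict4All classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a typed tree node; not the library's CorrectionRuleNode. */
final class SketchRuleNode {
    /** Hypothetical stand-in for CorrectionRuleNodeType. */
    enum Type { NODE, LEAF }

    private final Type type;
    private final List<SketchRuleNode> children = new ArrayList<>();

    SketchRuleNode(Type type) {
        this.type = type;
    }

    Type getType() {
        return type;
    }

    /** Adds a child only when this node is a NODE; a LEAF silently ignores the call. */
    SketchRuleNode addChild(SketchRuleNode child) {
        if (type == Type.NODE) {
            children.add(child);
        }
        return this; // returning this for chaining is an assumption of the sketch
    }

    List<SketchRuleNode> getChildren() {
        return Collections.unmodifiableList(children);
    }

    public static void main(String[] args) {
        SketchRuleNode root = new SketchRuleNode(Type.NODE);
        SketchRuleNode leaf = new SketchRuleNode(Type.LEAF);
        root.addChild(leaf);
        leaf.addChild(new SketchRuleNode(Type.LEAF)); // ignored: leaf nodes hold no children
        System.out.println("root children: " + root.getChildren().size());
        System.out.println("leaf children: " + leaf.getChildren().size());
    }
}
```

Making the call a silent no-op rather than throwing keeps tree-building code free of type checks, at the price of hiding mistakes; the sketch only mirrors the behaviour stated in the entry.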
B
- BACKSLASH - org.predict4all.nlp.Separator
- BaseWordDictionary - Interface in org.predict4all.nlp.language
-
A language-specific dictionary: contains lowercase words and their unigram frequencies.
- BiIntegerKey - Class in org.predict4all.nlp.utils
- build() - Method in class org.predict4all.nlp.parser.matcher.TokenRegexMatcher.TokenRegexMatcherBuilder
- build() - Method in class org.predict4all.nlp.trainer.DataTrainerResult.Builder
- BUILD_DATE - Static variable in class org.predict4all.nlp.Predict4AllInfo
- builder() - Static method in class org.predict4all.nlp.trainer.DataTrainerResult
-
Creates a builder to build a DataTrainerResult.
C
- CachedPrecomputedCorrectionRule - Class in org.predict4all.nlp.words.correction
-
Cached version of a CorrectionRule: this rule is meant to be used directly in WordCorrectionGenerator.
It only contains information and should not be modified once generated from a CorrectionRule.
- calculateGrownCapacity() - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- call() - Method in class org.predict4all.nlp.trainer.TrainerTask
- capacity() - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
Returns the capacity of the hash table.
- capitalize(String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- capture(String) - Method in class org.predict4all.nlp.parser.matcher.TokenRegexMatcher.TokenRegexMatcherBuilder
- changeCurrentPartTo(StringBuilder) - Method in interface org.predict4all.nlp.words.correction.GeneratingCorrectionI
- changeCurrentPartTo(StringBuilder) - Method in class org.predict4all.nlp.words.correction.SimpleGeneratingCorrection
- checkChildrenLoading(FileChannel) - Method in class org.predict4all.nlp.ngram.trie.StaticNGramTrieNode
-
Checks that the children of this node are loaded.
If not, tries to load the children from the given fileChannel.
- checkChildrenLoading(DynamicNGramTrieNode) - Method in class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
- checkChildrenLoading(StaticNGramTrieNode) - Method in class org.predict4all.nlp.ngram.dictionary.StaticNGramTrieDictionary
- checkChildrenLoading(T) - Method in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
-
Checks that the children of a given node are loaded into memory (and can be used).
- checkNull(T, String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
-
Throws an IllegalArgumentException if a given object is null.
- children - Variable in class org.predict4all.nlp.ngram.trie.AbstractNGramTrieNode
-
Represents the child nodes of this node.
Each child is stored by its value (= word id) and represents a possible next value.
To save memory, the map is created on demand, so even if this node has children, the map can be null if the children are not loaded yet.
- childrenBackoffWeight - Variable in class org.predict4all.nlp.ngram.trie.AbstractNGramTrieNode
-
Backoff weight for the frequencies of this node's children.
- childrenPosition - Variable in class org.predict4all.nlp.ngram.trie.AbstractNGramTrieNode
-
Contains the position of the children nodes in the file.
A position in a FileChannel is a long, but to save memory the value is stored as an int (a trie file never contains more than Integer.MAX_VALUE bytes).
- clear() - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
Empties the collection.
- clear() - Method in class org.predict4all.nlp.utils.FifoSet
- clearNextCache() - Method in interface org.predict4all.nlp.parser.token.Token
- clone() - Method in interface org.predict4all.nlp.words.correction.GeneratingCorrectionI
- clone() - Method in class org.predict4all.nlp.words.correction.SimpleGeneratingCorrection
- clone(int) - Method in class org.predict4all.nlp.words.model.EquivalenceClassWord
- clone(int) - Method in class org.predict4all.nlp.words.model.SimpleWord
- clone(int) - Method in class org.predict4all.nlp.words.model.TagWord
- clone(int) - Method in class org.predict4all.nlp.words.model.UserWord
- clone(int) - Method in interface org.predict4all.nlp.words.model.Word
-
Creates a clone of this word.
This allows duplicating an existing word; a new id should be provided.
- close() - Method in class org.predict4all.nlp.ngram.dictionary.StaticNGramTrieDictionary
- close() - Method in class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
- close() - Method in class org.predict4all.nlp.parser.TokenListAppender
- close() - Method in class org.predict4all.nlp.parser.TokenListProvider
- CLOSE_HOOK - org.predict4all.nlp.Separator
- COMMA - org.predict4all.nlp.Separator
- compact() - Method in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
-
Compacts the nodes in this dictionary (this will call AbstractNGramTrieNode.compact() on the root).
- compact() - Method in class org.predict4all.nlp.ngram.trie.AbstractNGramTrieNode
-
Compacts the children of this node (if this node has children).
- compact() - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
Compresses the hashtable to the minimum prime size (as defined by PrimeFinder) that will hold all of the elements currently in the table.
- compareTo(AbstractPredictionToCompute) - Method in class org.predict4all.nlp.prediction.model.AbstractPredictionToCompute
- computeD(TrainingConfiguration) - Method in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
-
Computes the optimal value for d (absolute discounting parameter).
Usually d is computed with the formula:
D = C1 / (C1 + 2 * C2)
where C1 = number of ngrams with count == 1, and C2 = number of ngrams with count == 2.
- computeD(TrainingConfiguration) - Method in class org.predict4all.nlp.ngram.dictionary.DynamicNGramDictionary
- computeD(TrainingConfiguration) - Method in class org.predict4all.nlp.ngram.dictionary.StaticNGramTrieDictionary
- computeD(TrainingConfiguration) - Method in class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
- computeMaxSize(int) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
Computes the values of maxSize.
- computePrediction(WordDictionary) - Method in class org.predict4all.nlp.prediction.model.AbstractPredictionToCompute
- computePrediction(WordDictionary) - Method in class org.predict4all.nlp.prediction.model.DoublePredictionToCompute
- computePrediction(WordDictionary) - Method in class org.predict4all.nlp.prediction.model.UniquePredictionToCompute
- computeProbabilityForChildren(int, double[], boolean) - Method in class org.predict4all.nlp.ngram.trie.DynamicNGramTrieNode
-
Computes the frequency for this node using absolute discounting.
Computes this node's frequency and backoff weight, then computes the frequency for the node's children.
- consumeFreeSlot - Variable in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- contains(int) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
Searches the set for val.
- contains(T) - Method in class org.predict4all.nlp.utils.FifoSet
- containsKey(int) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- containsUpperCase(String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- containsValue(Object) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- containsWord(int) - Method in class org.predict4all.nlp.language.french.FrenchStopWordDictionary
- containsWord(int) - Method in interface org.predict4all.nlp.language.StopWordDictionary
- containsWord(String) - Method in interface org.predict4all.nlp.language.BaseWordDictionary
- containsWord(String) - Method in class org.predict4all.nlp.language.french.FrenchBaseWordDictionary
- convertWrittenYearToExactYear(int) - Static method in class org.predict4all.nlp.language.french.FrenchLanguageUtils
- CoOccurrenceKey - Class in org.predict4all.nlp.semantic
- CoOccurrenceKey(int, int) - Constructor for class org.predict4all.nlp.semantic.CoOccurrenceKey
- correction - Variable in class org.predict4all.nlp.prediction.model.AbstractPredictionToCompute
- CorrectionRule - Class in org.predict4all.nlp.words.correction
-
This correction rule is the most convenient way to create correction rules, as it allows direct modification and has helper methods.
The WordCorrectionGenerator will then generate a CachedPrecomputedCorrectionRule to use this rule.
Note that a single builder instance can result in multiple correction rules: the generated rules should never be configured directly by the user, as this correction rule is easier to understand.
Correction rules work as follows: you define errors, which are the parts to be replaced, and replacements, which are the parts that correct the errors.
- CorrectionRuleNode - Class in org.predict4all.nlp.words.correction
-
The way to represent the correction rules used in WordPredictor via WordCorrectionGenerator.
Correction rules are represented as a tree in which whole parts can be enabled or disabled (e.g. disabling a parent node also disables its children).
Nodes are typed with CorrectionRuleNode.getType(), so they can be CorrectionRuleNodeType.NODE or CorrectionRuleNodeType.LEAF.
Every node can technically contain a CorrectionRuleNode.getCorrectionRule(), but be aware that only CorrectionRuleNodeType.LEAF nodes are taken into account by WordCorrectionGenerator.
- CorrectionRuleNode(CorrectionRuleNodeType) - Constructor for class org.predict4all.nlp.words.correction.CorrectionRuleNode
- CorrectionRuleNodeType - Enum in org.predict4all.nlp.words.correction
-
Represents the type of a CorrectionRuleNode.
- count - Variable in class org.predict4all.nlp.trainer.corpus.AbstractTrainingDocument
- COUNT_FORMAT - Static variable in class org.predict4all.nlp.words.WordDictionaryGenerator
- countEndUntilNextSeparator(String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- countNGram(int, int, AtomicInteger, AtomicInteger) - Method in class org.predict4all.nlp.ngram.trie.DynamicNGramTrieNode
-
Counts the ngrams of a given order: computes the total count (occurrence count) and the unique count (number of different ngrams).
- countNGrams() - Method in class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
- countOneAndTwoOccurenceNGrams(int, AtomicInteger[], AtomicInteger[]) - Method in class org.predict4all.nlp.ngram.trie.DynamicNGramTrieNode
-
Counts the number of ngrams with a count == 1 or == 2.
This ignores ngrams containing Tag.START.
- countStartUntilNextSeparator(String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- create(int) - Static method in class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
-
Create an empty training ngram trie dictionary
- create(int, String) - Static method in class org.predict4all.nlp.words.model.SimpleWord
- create(int, String) - Static method in class org.predict4all.nlp.words.model.UserWord
- create(int, String, double, boolean, boolean, long, int) - Static method in class org.predict4all.nlp.words.model.UserWord
- create(String) - Static method in class org.predict4all.nlp.parser.token.WordToken
- create(String, EquivalenceClass) - Static method in class org.predict4all.nlp.parser.token.EquivalenceClassToken
- create(Separator) - Static method in class org.predict4all.nlp.parser.token.SeparatorToken
- create(Tag) - Static method in class org.predict4all.nlp.parser.token.TagToken
- createDouble(int, int, Separator, double, boolean, StringBuilder) - Static method in class org.predict4all.nlp.words.NextWord
- createMap(T...) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- createMatchedString(List<String>) - Method in class org.predict4all.nlp.language.french.matcher.AcronymMatcher
- createMatchedString(List<String>) - Method in class org.predict4all.nlp.language.french.matcher.HyphenMatcher
- createMatchedString(List<String>) - Method in class org.predict4all.nlp.parser.matcher.AbstractRecursiveMatcher
- createModified(int, String, boolean, boolean, double, boolean, boolean) - Static method in class org.predict4all.nlp.words.model.SimpleWord
- createPrefixFor(List<Token>, WordPrefixDetected, int, boolean) - Method in class org.predict4all.nlp.ngram.NGramWordPredictorUtils
-
Creates the prefix for a given raw context (token list): the context is meant to be used for exploring the ngram trie.
The context takes care of using only the last sentence, detecting the word currently being written, and retrieving a context of the wanted order.
- createUnique(int, double, boolean, StringBuilder) - Static method in class org.predict4all.nlp.words.NextWord
- createWordDictionary(TrainingCorpus, Consumer<List<TrainerTask>>, File) - Method in class org.predict4all.nlp.words.WordDictionaryGenerator
- CURRENCY_EURO_SYMBOL - org.predict4all.nlp.Separator
- currentPartFinishedAndNewPartStarted(Separator, StringBuilder) - Method in interface org.predict4all.nlp.words.correction.GeneratingCorrectionI
- currentPartFinishedAndNewPartStarted(Separator, StringBuilder) - Method in class org.predict4all.nlp.words.correction.SimpleGeneratingCorrection
- CUSTOM - org.predict4all.nlp.EquivalenceClass
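The computeD(TrainingConfiguration) entry above gives the usual formula for the absolute discounting parameter, D = C1 / (C1 + 2 * C2). A minimal sketch of that arithmetic follows. The class name, the method signature and the fallback used when no ngram has a count of 1 or 2 are assumptions made for illustration; how the library itself handles that case is not stated in this index.

```java
/** Hypothetical helper illustrating the formula only; not the library's computeD. */
final class AbsoluteDiscountSketch {

    /**
     * Returns D = C1 / (C1 + 2 * C2).
     *
     * @param c1       number of ngrams seen exactly once
     * @param c2       number of ngrams seen exactly twice
     * @param fallback value returned when c1 and c2 are both zero (assumption of this sketch)
     */
    static double computeD(long c1, long c2, double fallback) {
        long denominator = c1 + 2 * c2;
        if (denominator == 0) {
            return fallback; // avoids 0/0 when the corpus has no such ngrams
        }
        return (double) c1 / denominator;
    }

    public static void main(String[] args) {
        // 1000 singletons and 400 doubletons give D = 1000 / 1800
        System.out.println(computeD(1000, 400, 0.5));
    }
}
```

Because C1 and C2 are non-negative counts, the result always falls between 0 and 1, which is what a discount subtracted from each observed count requires.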
D
- DaemonThreadFactory - Class in org.predict4all.nlp.utils
- DaemonThreadFactory() - Constructor for class org.predict4all.nlp.utils.DaemonThreadFactory
- DataTrainer - Class in org.predict4all.nlp.trainer
-
Class to create prediction data to be used with a word predictor.
- DataTrainer(File, File, File, File, LanguageModel, TrainingConfiguration) - Constructor for class org.predict4all.nlp.trainer.DataTrainer
- DataTrainerResult - Class in org.predict4all.nlp.trainer
- DataTrainerResult.Builder - Class in org.predict4all.nlp.trainer
-
Builder to build a DataTrainerResult.
- DATE_DAY_MONTH - org.predict4all.nlp.EquivalenceClass
- DATE_FULL_DIGIT - org.predict4all.nlp.EquivalenceClass
- DATE_FULL_TEXT - org.predict4all.nlp.EquivalenceClass
- DATE_HOUR - org.predict4all.nlp.EquivalenceClass
- DATE_MONTH - org.predict4all.nlp.EquivalenceClass
- DATE_MONTH_YEAR - org.predict4all.nlp.EquivalenceClass
- DATE_WEEK_DAY - org.predict4all.nlp.EquivalenceClass
- DateDayMonthMatcher - Class in org.predict4all.nlp.language.french.matcher
- DateDayMonthMatcher() - Constructor for class org.predict4all.nlp.language.french.matcher.DateDayMonthMatcher
- DateFullDigitMatcher - Class in org.predict4all.nlp.language.french.matcher
- DateFullDigitMatcher() - Constructor for class org.predict4all.nlp.language.french.matcher.DateFullDigitMatcher
- DateFullTextMatcher - Class in org.predict4all.nlp.language.french.matcher
- DateFullTextMatcher() - Constructor for class org.predict4all.nlp.language.french.matcher.DateFullTextMatcher
- DateMonthYearMatcher - Class in org.predict4all.nlp.language.french.matcher
- DateMonthYearMatcher() - Constructor for class org.predict4all.nlp.language.french.matcher.DateMonthYearMatcher
- DateWeekDayMatcher - Class in org.predict4all.nlp.language.french.matcher
- DateWeekDayMatcher() - Constructor for class org.predict4all.nlp.language.french.matcher.DateWeekDayMatcher
- debug(WordDictionary, AbstractNGramTrieNode<?>) - Method in interface org.predict4all.nlp.ngram.debug.NGramDebugger
- debugInformation - Variable in class org.predict4all.nlp.prediction.model.AbstractPredictionToCompute
- DECIMAL - org.predict4all.nlp.EquivalenceClass
- defaultConfiguration() - Static method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- defaultConfiguration(File) - Static method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- DICTIONARY_INFORMATION_BYTE_COUNT - Static variable in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
-
Byte count needed to save general information about this dictionary.
- dispose() - Method in class org.predict4all.nlp.prediction.WordPredictor
-
Tries to close/dispose of the word predictor resources.
This will close the ngram dictionary and the dynamic dictionary (equivalent to calling AutoCloseable.close() on both), and also the WordCorrectionGenerator if it was enabled.
- dispose() - Method in class org.predict4all.nlp.words.correction.WordCorrectionGenerator
- document - Variable in class org.predict4all.nlp.trainer.TrainerTask
- DOUBLE_LETTER - org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
- DoublePredictionToCompute - Class in org.predict4all.nlp.prediction.model
-
Represents the prediction for two words in a row.
Could have been generic (more than two words), but for computing performance the combination is limited to two words only.
- DoublePredictionToCompute(int, int, boolean, int[], int[], double, boolean, StringBuilder) - Constructor for class org.predict4all.nlp.prediction.model.DoublePredictionToCompute
- DYNAMIC_TRIE_NODE_SIZE_BYTE - Static variable in class org.predict4all.nlp.ngram.trie.AbstractNGramTrieNode
-
Dynamic node size in bytes (4 integers): word id, children size, children position, count.
- DynamicNGramDictionary - Class in org.predict4all.nlp.ngram.dictionary
-
Represents a TrainingNGramDictionary that can also be opened to be trained again.
This type of dictionary is useful when using a dynamic user model: the dynamic user dictionary is loaded and trained during each session, and then saved to be used in the next sessions.
- DynamicNGramDictionary(int) - Constructor for class org.predict4all.nlp.ngram.dictionary.DynamicNGramDictionary
- DynamicNGramTrieNode - Class in org.predict4all.nlp.ngram.trie
-
Represents a dynamic trie node structure: this trie node is useful when the ngram count has to be retrieved.
Dynamic trie node children are always fully loaded (they are not loaded on demand) and their frequencies can change.
Because dynamic trie nodes can be saved and then loaded as either StaticNGramTrieNode or DynamicNGramTrieNode, they have two write methods: DynamicNGramTrieNode.writeStaticNode(FileChannel, int) if they are saved to be loaded as StaticNGramTrieNode, and DynamicNGramTrieNode.writeDynamicNode(FileChannel, int) if they are saved to be loaded as DynamicNGramTrieNode.
The first saves static information about the node (frequency, backoff weight); the second saves only dynamic information (count), because frequencies are computed dynamically.
- DynamicNGramTrieNode() - Constructor for class org.predict4all.nlp.ngram.trie.DynamicNGramTrieNode
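The DYNAMIC_TRIE_NODE_SIZE_BYTE entry above describes a dynamic node record made of four integers: word id, children size, children position and count. A rough sketch of such a fixed-size record follows, using java.nio.ByteBuffer. The class name, the field order on disk and the default big-endian byte order are assumptions for illustration; the actual file layout used by the library is not given in this index.

```java
import java.nio.ByteBuffer;

/** Hypothetical 16-byte record codec; not the library's serialization code. */
final class DynamicNodeRecordSketch {
    /** Four 4-byte integers: word id, children size, children position, count. */
    static final int NODE_SIZE_BYTES = 4 * Integer.BYTES;

    /** Packs the four fields into a buffer that is ready to be read or written to a channel. */
    static ByteBuffer encode(int wordId, int childrenSize, int childrenPosition, int count) {
        ByteBuffer buffer = ByteBuffer.allocate(NODE_SIZE_BYTES);
        buffer.putInt(wordId);
        buffer.putInt(childrenSize);
        buffer.putInt(childrenPosition);
        buffer.putInt(count);
        buffer.flip(); // switch from writing to reading
        return buffer;
    }

    /** Reads the four fields back in the same order they were written. */
    static int[] decode(ByteBuffer buffer) {
        int wordId = buffer.getInt();
        int childrenSize = buffer.getInt();
        int childrenPosition = buffer.getInt();
        int count = buffer.getInt();
        return new int[] { wordId, childrenSize, childrenPosition, count };
    }

    public static void main(String[] args) {
        ByteBuffer record = encode(7, 2, 128, 42);
        System.out.println("record size: " + record.remaining());
        System.out.println("count: " + decode(record)[3]);
    }
}
```

Storing the children position as an int rather than a long is what keeps the record at four integers, consistent with the childrenPosition entry's note that a trie file never exceeds Integer.MAX_VALUE bytes.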
E
- encoding - Variable in class org.predict4all.nlp.trainer.corpus.AbstractTrainingDocument
- endCorrection(double) - Method in interface org.predict4all.nlp.words.correction.GeneratingCorrectionI
- endCorrection(double) - Method in class org.predict4all.nlp.words.correction.SimpleGeneratingCorrection
- endsWith(String, String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- EQUAL - org.predict4all.nlp.Separator
- equals(Object) - Method in class org.predict4all.nlp.ngram.NGramKey
- equals(Object) - Method in class org.predict4all.nlp.parser.token.SeparatorToken
- equals(Object) - Method in class org.predict4all.nlp.parser.token.WordToken
- equals(Object) - Method in class org.predict4all.nlp.semantic.CoOccurrenceKey
- equals(Object) - Method in class org.predict4all.nlp.utils.BiIntegerKey
- equals(Object) - Method in class org.predict4all.nlp.words.correction.CachedPrecomputedCorrectionRule
- EquivalenceClass - Enum in org.predict4all.nlp
-
Represents an equivalence class type that can be used when training a language model.
Useful to group elements of the same kind in a corpus under a single concept instead of textual data.
These are especially used in semantic data.
- EquivalenceClassToken - Class in org.predict4all.nlp.parser.token
- EquivalenceClassWord - Class in org.predict4all.nlp.words.model
- EquivalenceClassWord(EquivalenceClass) - Constructor for class org.predict4all.nlp.words.model.EquivalenceClassWord
- EXCLAMATION - org.predict4all.nlp.Separator
- executeLSATrainingForR(TrainingCorpus, File, Consumer<List<? extends TrainerTask>>) - Method in class org.predict4all.nlp.semantic.SemanticDictionaryGenerator
- executeNGramTraining(TrainingCorpus, File, Consumer<List<TrainerTask>>) - Method in class org.predict4all.nlp.ngram.NGramDictionaryGenerator
- executeTermDetection(List<Token>) - Method in class org.predict4all.nlp.parser.matcher.TokenConverter
- executeTokenPatternMatching(TrainingCorpus) - Method in class org.predict4all.nlp.parser.matcher.TokenConverter
- executeWriteLevelOnRoot(FileChannel, int) - Method in class org.predict4all.nlp.ngram.dictionary.DynamicNGramDictionary
- executeWriteLevelOnRoot(FileChannel, int) - Method in class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
-
Call the correct node method to save a trie level to file.
- exploreChildren(int, int, BiConsumer<Integer, DynamicNGramTrieNode>) - Method in class org.predict4all.nlp.ngram.trie.DynamicNGramTrieNode
F
- factor - Variable in class org.predict4all.nlp.prediction.model.AbstractPredictionToCompute
- FifoSet<T> - Class in org.predict4all.nlp.utils
-
A set holding at most FifoSet.maxSize elements while keeping their insertion order, so that the first inserted element is always deleted when the set is full.
- FifoSet(int) - Constructor for class org.predict4all.nlp.utils.FifoSet
- FILENAME_LSA_DICTIONARY - Static variable in class org.predict4all.nlp.trainer.DataTrainer
- FILENAME_NGRAM_DICTIONARY - Static variable in class org.predict4all.nlp.trainer.DataTrainer
- FILENAME_WORD_DICTIONARY - Static variable in class org.predict4all.nlp.trainer.DataTrainer
- forceInvalid - Variable in class org.predict4all.nlp.words.model.SimpleWord
- forceValid - Variable in class org.predict4all.nlp.words.model.SimpleWord
- forEach(TIntProcedure) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
Executes procedure for each element in the set.
- forEachEntry(TIntObjectProcedure<? super V>) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- forEachKey(TIntProcedure) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- forEachValue(TObjectProcedure<? super V>) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- forEachValue(Consumer<? super V>) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- FOUR_DIGIT_FORMAT_ALWAYS - Static variable in class org.predict4all.nlp.language.french.FrenchLanguageUtils
- FREE - Static variable in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
flag indicating that a slot in the hashtable is available
- FrenchBaseWordDictionary - Class in org.predict4all.nlp.language.french
-
French dictionary based on Lexique.org
- FrenchBaseWordDictionary(String) - Constructor for class org.predict4all.nlp.language.french.FrenchBaseWordDictionary
- FrenchDefaultCorrectionRuleGenerator - Class in org.predict4all.nlp.language.french
-
Generates the base correction rules for the French language.
Every possible rule is kept in FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType with a translated name, description and example.
- FrenchDefaultCorrectionRuleGenerator() - Constructor for class org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator
- FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType - Enum in org.predict4all.nlp.language.french
- FrenchDefaultCorrectionRuleGenerator.TranslationProvider - Interface in org.predict4all.nlp.language.french
- FrenchLanguageModel - Class in org.predict4all.nlp.language.french
- FrenchLanguageModel() - Constructor for class org.predict4all.nlp.language.french.FrenchLanguageModel
- FrenchLanguageUtils - Class in org.predict4all.nlp.language.french
-
Utility methods for the French language.
- FrenchStopWordDictionary - Class in org.predict4all.nlp.language.french
- FrenchStopWordDictionary(String) - Constructor for class org.predict4all.nlp.language.french.FrenchStopWordDictionary
- frequency - Variable in class org.predict4all.nlp.ngram.trie.AbstractNGramTrieNode
-
Computed frequency for this node
- from(TrainingConfiguration) - Static method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- FULL - Static variable in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
flag indicating that a slot in the hashtable is occupied
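The FifoSet entry above describes a bounded set that evicts its oldest element once it is full. One way to sketch that idea is on top of java.util.LinkedHashSet, which iterates in insertion order. BoundedFifoSetSketch and its methods are hypothetical; the real FifoSet may be implemented differently (its getCountMap() method suggests it also tracks counts, which this sketch does not).

```java
import java.util.Iterator;
import java.util.LinkedHashSet;

/** Hypothetical bounded set with first-in, first-out eviction; not the library's FifoSet. */
final class BoundedFifoSetSketch<T> {
    private final int maxSize;
    private final LinkedHashSet<T> elements = new LinkedHashSet<>();

    BoundedFifoSetSketch(int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        this.maxSize = maxSize;
    }

    /** Adds an element, evicting the oldest one first if the set is already full. */
    boolean add(T element) {
        if (elements.contains(element)) {
            return false; // already present: insertion order is left untouched
        }
        if (elements.size() >= maxSize) {
            Iterator<T> oldest = elements.iterator();
            oldest.next();   // LinkedHashSet iterates in insertion order
            oldest.remove(); // drop the first inserted element
        }
        return elements.add(element);
    }

    boolean contains(T element) {
        return elements.contains(element);
    }

    int size() {
        return elements.size();
    }

    void clear() {
        elements.clear();
    }

    public static void main(String[] args) {
        BoundedFifoSetSketch<String> set = new BoundedFifoSetSketch<>(2);
        set.add("a");
        set.add("b");
        set.add("c"); // evicts "a"
        System.out.println(set.contains("a") + " " + set.contains("b") + " " + set.contains("c"));
    }
}
```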
G
- GE_GU_SOUND - org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
- generateNodeFor(PredictionParameter) - Method in enum org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
- GeneratingCorrectionI - Interface in org.predict4all.nlp.words.correction
- get(int) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- getAbbreviationOrRegex() - Static method in class org.predict4all.nlp.language.french.FrenchLanguageUtils
- getAllWords() - Method in class org.predict4all.nlp.words.WordDictionary
-
All the existing words in this dictionary.
Words can be special words such as TagWord, EquivalenceClassWord, etc.
They can also be SimpleWord from a trained model, and UserWord if they are words "learned" while using the predictor.
Note that if you only want the possible words for the final user, you should use Word.isValidToBePredicted(PredictionParameter) to filter out invalid words.
- getAverageVocabularySize() - Method in class org.predict4all.nlp.language.french.FrenchLanguageModel
- getAverageVocabularySize() - Method in interface org.predict4all.nlp.language.LanguageModel
-
Average total vocabulary size (different existing words)
- getAverageWordLength() - Method in class org.predict4all.nlp.language.french.FrenchLanguageModel
- getAverageWordLength() - Method in interface org.predict4all.nlp.language.LanguageModel
- getBaseWordDictionary(TrainingConfiguration) - Method in class org.predict4all.nlp.language.french.FrenchLanguageModel
- getBaseWordDictionary(TrainingConfiguration) - Method in interface org.predict4all.nlp.language.LanguageModel
- getBaseWordDictionaryPath() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getById(byte) - Static method in enum org.predict4all.nlp.Tag
- getChildren() - Method in class org.predict4all.nlp.ngram.trie.AbstractNGramTrieNode
- getChildren() - Method in class org.predict4all.nlp.words.correction.CorrectionRuleNode
-
This node's children; only useful if CorrectionRuleNode.getType() is CorrectionRuleNodeType.NODE.
- getChildrenBackoffWeight() - Method in class org.predict4all.nlp.ngram.trie.AbstractNGramTrieNode
- getChildrenCountSum() - Method in class org.predict4all.nlp.ngram.trie.DynamicNGramTrieNode
- getChildrenSize() - Method in class org.predict4all.nlp.ngram.trie.AbstractNGramTrieNode
- getChildrenSize() - Method in class org.predict4all.nlp.ngram.trie.DynamicNGramTrieNode
- getChildrenSize() - Method in class org.predict4all.nlp.ngram.trie.StaticNGramTrieNode
- getConcurrencyLevel() - Method in class org.predict4all.nlp.trainer.corpus.TrainingCorpus
- getConvertCaseFromDictionaryModelThreshold() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getCorpus() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getCorrectionDefaultCost() - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
The default cost applied to a correction rule if CorrectionRule.getCost() is null.
- getCorrectionDefaultFactor() - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
The default factor applied to a correction rule if CorrectionRule.getFactor() is null.
- getCorrectionMaxCost() - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
Determines how many corrections can be applied to the same input.
The higher this value, the more corrections will be applied.
- getCorrectionRule() - Method in class org.predict4all.nlp.words.correction.CorrectionRuleNode
-
The correction rule associated with this node.
It is taken into account only if CorrectionRuleNode.getType() is CorrectionRuleNodeType.LEAF.
- getCorrectionRulesRoot() - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
The correction rules to apply if PredictionParameter.isEnableWordCorrection() is enabled.
Correction rules are organised as a tree so that a whole part of the tree can be enabled or disabled.
Correction rules can be created programmatically, or FrenchDefaultCorrectionRuleGenerator can be used.
- getCost() - Method in class org.predict4all.nlp.words.correction.CachedPrecomputedCorrectionRule
- getCost() - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
The correction cost influences how many corrections can be cumulated for the same input.
Typically, correction costs are summed and the total is checked to be below PredictionParameter.getCorrectionMaxCost().
- getCount() - Method in class org.predict4all.nlp.ngram.trie.DynamicNGramTrieNode
- getCount() - Method in class org.predict4all.nlp.trainer.corpus.AbstractTrainingDocument
- getCount() - Method in class org.predict4all.nlp.utils.progressindicator.LoggingProgressIndicator
- getCount() - Method in class org.predict4all.nlp.utils.progressindicator.NoOpProgressIndicator
- getCount() - Method in interface org.predict4all.nlp.utils.progressindicator.ProgressIndicator
- getCountMap() - Method in class org.predict4all.nlp.utils.FifoSet
- getCurrentPart() - Method in interface org.predict4all.nlp.words.correction.GeneratingCorrectionI
- getCurrentPart() - Method in class org.predict4all.nlp.words.correction.SimpleGeneratingCorrection
- getCurrentPartLength() - Method in interface org.predict4all.nlp.words.correction.GeneratingCorrectionI
- getCurrentPartLength() - Method in class org.predict4all.nlp.words.correction.SimpleGeneratingCorrection
- getCustomParameters() - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
A free-to-use String-to-String map to add your custom parameters.
This is just a helper, as PredictionParameter can be saved and loaded with PredictionParameter.saveTo(File) and PredictionParameter.loadFrom(LanguageModel, File).
It allows you to add your custom prediction-related configuration parameters without having to save them in a different file.
- getDebugInformation() - Method in class org.predict4all.nlp.prediction.model.AbstractPredictionToCompute
- getDebugInformation() - Method in class org.predict4all.nlp.prediction.WordPrediction
- getDebugInformation() - Method in class org.predict4all.nlp.prediction.WordPredictionResult
- getDebugInformation() - Method in interface org.predict4all.nlp.words.correction.GeneratingCorrectionI
- getDebugInformation() - Method in class org.predict4all.nlp.words.correction.SimpleGeneratingCorrection
- getDebugInformation() - Method in class org.predict4all.nlp.words.NextWord
- getDebugPrefix() - Method in class org.predict4all.nlp.ngram.NGramDictionaryGenerator
- getDebugPrefix() - Method in class org.predict4all.nlp.trainer.DataTrainer
- getDensitiesMap() - Method in class org.predict4all.nlp.semantic.SemanticDictionary
- getDescriptionId() - Method in enum org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
- getDirectlyValidWordCountThreshold() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getDocuments(TrainingStep) - Method in class org.predict4all.nlp.trainer.corpus.TrainingCorpus
- getDynamicModelMinimumWeight() - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
Useful to set the minimum weight for the dynamic model when interpolating the static and dynamic models.
The value should range between 0.0 and 0.5.
- getDynamicNGramDictionary() - Method in class org.predict4all.nlp.prediction.WordPredictor
- getECById(byte) - Static method in enum org.predict4all.nlp.EquivalenceClass
- getEncoding() - Method in class org.predict4all.nlp.trainer.corpus.TrainingCorpus
- getEndFactor() - Method in interface org.predict4all.nlp.words.correction.GeneratingCorrectionI
- getEndFactor() - Method in class org.predict4all.nlp.words.correction.SimpleGeneratingCorrection
- getEndPart(int) - Method in interface org.predict4all.nlp.words.correction.GeneratingCorrectionI
- getEndPart(int) - Method in class org.predict4all.nlp.words.correction.SimpleGeneratingCorrection
- getEndSeparator(int) - Method in interface org.predict4all.nlp.words.correction.GeneratingCorrectionI
- getEndSeparator(int) - Method in class org.predict4all.nlp.words.correction.SimpleGeneratingCorrection
- getEndUntilNextSeparator(String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- getEquivalenceClass() - Method in class org.predict4all.nlp.parser.token.EquivalenceClassToken
- getEquivalenceClass() - Method in interface org.predict4all.nlp.parser.token.Token
- getEquivalenceClass() - Method in class org.predict4all.nlp.words.model.AbstractWord
- getEquivalenceClass() - Method in interface org.predict4all.nlp.words.model.Word
- getEquivalenceClassId() - Method in class org.predict4all.nlp.words.model.AbstractWord
- getEquivalenceClassId() - Method in class org.predict4all.nlp.words.model.EquivalenceClassWord
- getEquivalenceClassId() - Method in interface org.predict4all.nlp.words.model.Word
- getError() - Method in class org.predict4all.nlp.words.correction.CachedPrecomputedCorrectionRule
- getErrors() - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
Determines the errors for this correction rule.
Errors define the parts of the text that could be replaced.
For example, if the error is "a", every "a" char in the user input could potentially be replaced with CorrectionRule.withReplacement(String...).
Errors should not contain any word separator (space, etc.).
- getExactWordsWithPrefixExist(String) - Method in class org.predict4all.nlp.words.WordDictionary
- getExampleId() - Method in enum org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
- getExtractedTokenValues() - Method in class org.predict4all.nlp.parser.matcher.TokenRegexResult
- getExtractedValue(int) - Method in class org.predict4all.nlp.parser.matcher.TokenRegexResult
- getFactor() - Method in class org.predict4all.nlp.prediction.model.AbstractPredictionToCompute
- getFactor() - Method in class org.predict4all.nlp.words.correction.CachedPrecomputedCorrectionRule
- getFactor() - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
The correction factor influences how much the correction "counts" relative to a correct word.
- getFactor() - Method in class org.predict4all.nlp.words.NextWord
- getFirstPrefix() - Method in class org.predict4all.nlp.prediction.model.DoublePredictionToCompute
- getFirstWordId() - Method in class org.predict4all.nlp.prediction.model.DoublePredictionToCompute
- getFormattedText() - Method in class org.predict4all.nlp.parser.matcher.PatternMatched
- getFrequency() - Method in class org.predict4all.nlp.ngram.trie.AbstractNGramTrieNode
- getFrequency(String) - Method in interface org.predict4all.nlp.language.BaseWordDictionary
- getFrequency(String) - Method in class org.predict4all.nlp.language.french.FrenchBaseWordDictionary
- getId() - Method in enum org.predict4all.nlp.EquivalenceClass
- getId() - Method in class org.predict4all.nlp.language.french.FrenchLanguageModel
- getId() - Method in interface org.predict4all.nlp.language.LanguageModel
- getId() - Method in enum org.predict4all.nlp.Separator
- getId() - Method in enum org.predict4all.nlp.Tag
- getID() - Method in class org.predict4all.nlp.words.model.AbstractWord
- getID() - Method in interface org.predict4all.nlp.words.model.Word
- getIdByte() - Method in enum org.predict4all.nlp.EquivalenceClass
- getIdByte() - Method in enum org.predict4all.nlp.Separator
- getIdByte() - Method in enum org.predict4all.nlp.Tag
- getInputFile() - Method in class org.predict4all.nlp.trainer.corpus.AbstractTrainingDocument
- getKey() - Method in interface org.predict4all.nlp.words.correction.GeneratingCorrectionI
- getKey() - Method in class org.predict4all.nlp.words.correction.SimpleGeneratingCorrection
- getLanguageModel() - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
The language model to be used to predict words.
Typically, implementations are provided by the framework (see FrenchLanguageModel).
It should always be set.
- getLastMatchedToken() - Method in class org.predict4all.nlp.parser.matcher.TokenRegexResult
- getLastUseDate() - Method in class org.predict4all.nlp.words.model.AbstractWord
- getLastUseDate() - Method in class org.predict4all.nlp.words.model.UserWord
- getLastUseDate() - Method in interface org.predict4all.nlp.words.model.Word
- getLeft() - Method in class org.predict4all.nlp.utils.Pair
- getLeft() - Method in class org.predict4all.nlp.utils.Triple
- getLongestMatchingWords(List<Token>, int, Set<Integer>) - Method in class org.predict4all.nlp.words.WordPrefixDetector
-
Tries to detect whether the given sentence ends with a word that has already been started.
This is much more precise than just checking if the last token is a separator, because a word could contain a separator (e.g. "New York" has a space, "là-bas" has a hyphen).
- getLongestWordPrefix() - Method in class org.predict4all.nlp.words.WordPrefixDetected
- getLsaDensitySize() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getLsaFrequentWordSize() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getLsaTargetSvdSize() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getLsaVocabularySize() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getLsaWindowSize() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getMax() - Method in class org.predict4all.nlp.utils.progressindicator.LoggingProgressIndicator
- getMax() - Method in class org.predict4all.nlp.utils.progressindicator.NoOpProgressIndicator
- getMax() - Method in interface org.predict4all.nlp.utils.progressindicator.ProgressIndicator
- getMaxIdValue() - Static method in enum org.predict4all.nlp.EquivalenceClass
- getMaxIndexFromEnd() - Method in class org.predict4all.nlp.words.correction.CachedPrecomputedCorrectionRule
- getMaxIndexFromEnd() - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
Max index (from word end), exclusive (e.g. maxIndexFromEnd = 2: never apply the rule to the last two chars).
Useful to ignore word ends.
- getMaxIndexFromStart() - Method in class org.predict4all.nlp.words.correction.CachedPrecomputedCorrectionRule
- getMaxIndexFromStart() - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
Max index (from word start), exclusive (e.g. maxIndexFromStart = 1: only the first char).
Useful to restrict correction to the word start.
- getMaxOrder() - Method in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
- getMiddle() - Method in class org.predict4all.nlp.utils.Triple
- getMinCountToProvideCorrection() - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
The minimum number of chars typed before correction results are integrated.
Note that this has no effect if PredictionParameter.isEnableWordCorrection() is disabled.
Typically, setting this value to 3 will allow the predictor to check for corrections only once the user has typed 3 chars.
- getMinCountToProvidePrediction() - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
The minimum number of chars typed before prediction results are returned.
This built-in feature allows predictions to be displayed only after a certain amount of user input.
Typically, setting this value to 1 will disable next word prediction and will only predict the word currently being typed.
- getMinIndexFromEnd() - Method in class org.predict4all.nlp.words.correction.CachedPrecomputedCorrectionRule
- getMinIndexFromEnd() - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
Min index, from end, inclusive (inclusive from word end, e.g. if = 1, only correct the last char).
Useful to correct only the last part of a word.
- getMinIndexFromStart() - Method in class org.predict4all.nlp.words.correction.CachedPrecomputedCorrectionRule
- getMinIndexFromStart() - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
Min index, from start, inclusive (inclusive from word start, e.g. if = 1, never correct the first char).
Useful to correct only the "middle" area of a word.
- getMinUseCountToValidateNewWord() - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
Minimum use count for a new word to be displayed in the predictions.
This avoids having typing errors displayed as prediction results.
- getMonthOrRegex() - Static method in class org.predict4all.nlp.language.french.FrenchLanguageUtils
- getName() - Method in class org.predict4all.nlp.words.correction.CorrectionRuleNode
-
The name for this node (just informative)
- getNameId() - Method in enum org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
- getNext() - Method in class org.predict4all.nlp.io.TokenFileInputStream
- getNext() - Method in class org.predict4all.nlp.parser.matcher.TokenRegexMatcher
- getNext() - Method in class org.predict4all.nlp.parser.TokenListProvider
- getNext() - Method in interface org.predict4all.nlp.parser.TokenProvider
- getNext(TokenProvider) - Method in interface org.predict4all.nlp.parser.token.Token
- getNextCharCountToRemove() - Method in class org.predict4all.nlp.prediction.WordPredictionResult
- getNextWord(int[]) - Method in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
-
Return the immediate next words for a given prefix (without any filter)
- getNgramCounts() - Method in class org.predict4all.nlp.trainer.DataTrainerResult
- getNgramDebugAfterPruning() - Method in class org.predict4all.nlp.ngram.NGramDictionaryGenerator
- getNgramDebugBeforePruning() - Method in class org.predict4all.nlp.ngram.NGramDictionaryGenerator
- getNgramOrder() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getNgramPruningCountThreshold() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getNgramPruningOrderCountThresholds() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getNgramPruningWeightedDifferenceThreshold() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getNGramTag() - Method in class org.predict4all.nlp.words.model.AbstractWord
- getNGramTag() - Method in interface org.predict4all.nlp.words.model.Word
- getNGramTagId() - Method in class org.predict4all.nlp.words.model.AbstractWord
- getNGramTagId() - Method in class org.predict4all.nlp.words.model.TagWord
- getNGramTagId() - Method in interface org.predict4all.nlp.words.model.Word
- getNodeFor(int[], int, int) - Method in class org.predict4all.nlp.ngram.trie.DynamicNGramTrieNode
-
Retrieve a node for a given prefix (if exists)
- getNodeFor(FileChannel, int[], int) - Method in class org.predict4all.nlp.ngram.trie.StaticNGramTrieNode
-
Will try to retrieve a node for a given prefix.
Loads the needed nodes on demand while browsing the trie.
Children of the returned node are not loaded yet.
- getNodeForPrefix(int[], int) - Method in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
-
Used to retrieve a node for a given prefix.
For example, prefix = [1,2] will return the trie node corresponding to {2}.
The children of the returned node may not have been loaded.
- getNodeForPrefix(int[], int) - Method in class org.predict4all.nlp.ngram.dictionary.StaticNGramTrieDictionary
- getNodeForPrefix(int[], int) - Method in class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
- getOfficialChar() - Method in enum org.predict4all.nlp.Separator
- getOfficialCharString() - Method in enum org.predict4all.nlp.Separator
- getOrDefault(T, T) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- getOutputDirectory() - Method in class org.predict4all.nlp.trainer.corpus.TrainingCorpus
- getOutputDirectoryName() - Method in enum org.predict4all.nlp.trainer.step.TrainingStep
- getOutputFile() - Method in class org.predict4all.nlp.trainer.corpus.AbstractTrainingDocument
- getPartCount() - Method in interface org.predict4all.nlp.words.correction.GeneratingCorrectionI
- getPartCount() - Method in class org.predict4all.nlp.words.correction.SimpleGeneratingCorrection
- getPattern(String, int) - Static method in class org.predict4all.nlp.parser.matcher.TermMatcherUtils
- getPrediction() - Method in class org.predict4all.nlp.prediction.model.AbstractPredictionToCompute
- getPredictionParameter() - Method in class org.predict4all.nlp.prediction.WordPredictor
- getPredictions() - Method in class org.predict4all.nlp.prediction.WordPredictionResult
- getPredictionToDisplay() - Method in class org.predict4all.nlp.prediction.WordPrediction
- getPredictionToInsert() - Method in class org.predict4all.nlp.prediction.WordPrediction
- getPreviousCharCountToRemove() - Method in class org.predict4all.nlp.prediction.WordPrediction
- getPreviousEndToken() - Method in class org.predict4all.nlp.parser.matcher.PatternMatched
- getPreviousStep() - Method in enum org.predict4all.nlp.trainer.step.TrainingStep
- getProbability(int[], int, int, int) - Method in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
-
Returns the probability of a word for a given prefix.
Calling with index = 0 and length = prefix.length will return the maximum order probability (e.g. with prefix.length = 3, it will return the probability for order 3).
- getProbFactor() - Method in class org.predict4all.nlp.words.model.AbstractWord
- getProbFactor() - Method in class org.predict4all.nlp.words.model.SimpleWord
- getProbFactor() - Method in interface org.predict4all.nlp.words.model.Word
-
This factor can be used to modify final probabilities of the predictions.
It will be applied once probabilities are computed to influence result list.
It is mainly used in a multiplication with the original probability (and then the result list is normalized).
To rely only on probabilities, the value should be 1.0.
- getPruningMethod() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getRawProbability(int[], int, int, int) - Method in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
- getRegex() - Method in class org.predict4all.nlp.parser.matcher.TokenRegexMatcher
- getReplacement() - Method in class org.predict4all.nlp.words.correction.CachedPrecomputedCorrectionRule
- getReplacementLeftPart() - Method in class org.predict4all.nlp.words.correction.CachedPrecomputedCorrectionRule
- getReplacementRightPart() - Method in class org.predict4all.nlp.words.correction.CachedPrecomputedCorrectionRule
- getReplacements() - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
Determines the replacements that could be used to correct the CorrectionRule.getErrors().
Replacements can contain at most one word separator (space, etc.): this allows correcting merged words.
- getReplacementSeparator() - Method in class org.predict4all.nlp.words.correction.CachedPrecomputedCorrectionRule
- getReplacementSeparatorIndex() - Method in class org.predict4all.nlp.words.correction.CachedPrecomputedCorrectionRule
- getRight() - Method in class org.predict4all.nlp.utils.Pair
- getRight() - Method in class org.predict4all.nlp.utils.Triple
- getRoot() - Method in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
- getRootBlockSize() - Method in class org.predict4all.nlp.ngram.dictionary.DynamicNGramDictionary
- getRootBlockSize() - Method in class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
- getScore() - Method in class org.predict4all.nlp.prediction.model.AbstractPredictionToCompute
- getScore() - Method in class org.predict4all.nlp.prediction.WordPrediction
- getSecondPrefix() - Method in class org.predict4all.nlp.prediction.model.DoublePredictionToCompute
- getSecondWordId() - Method in class org.predict4all.nlp.prediction.model.DoublePredictionToCompute
- getSemanticContrastFactor() - Method in interface org.predict4all.nlp.semantic.SemanticDictionaryConfiguration
- getSemanticDensityMaxBound() - Method in interface org.predict4all.nlp.semantic.SemanticDictionaryConfiguration
- getSemanticDensityMinBound() - Method in interface org.predict4all.nlp.semantic.SemanticDictionaryConfiguration
- getSeparator() - Method in class org.predict4all.nlp.parser.token.SeparatorToken
- getSeparator() - Method in interface org.predict4all.nlp.parser.token.Token
- getSeparator() - Method in class org.predict4all.nlp.words.NextWord
- getSeparatorById(byte) - Static method in enum org.predict4all.nlp.Separator
- getSeparatorFor(char) - Static method in enum org.predict4all.nlp.Separator
- getSet() - Method in class org.predict4all.nlp.utils.FifoSet
- getSimilarityCosineFor(Collection<Integer>, List<AbstractPredictionToCompute>, double) - Method in class org.predict4all.nlp.semantic.SemanticDictionary
- getSmoothingDiscountValue() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getSmoothingDiscountValueLowerBound() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getSmoothingDiscountValueUpperBound() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getSrcBuilder() - Method in class org.predict4all.nlp.words.correction.CachedPrecomputedCorrectionRule
- getStartUntilNextSeparator(String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- getStaticNgramDictionary() - Method in class org.predict4all.nlp.prediction.WordPredictor
- getStep() - Method in class org.predict4all.nlp.trainer.corpus.AbstractTrainingDocument
- getStep(File, TrainingCorpus) - Method in enum org.predict4all.nlp.trainer.step.TrainingStep
- getStopWordDictionary(TrainingConfiguration) - Method in class org.predict4all.nlp.language.french.FrenchLanguageModel
- getStopWordDictionary(TrainingConfiguration) - Method in interface org.predict4all.nlp.language.LanguageModel
- getStopWordDictionaryPath() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getTag() - Method in class org.predict4all.nlp.parser.token.TagToken
- getTag() - Method in interface org.predict4all.nlp.parser.token.Token
- getText() - Method in class org.predict4all.nlp.parser.token.SeparatorToken
- getText() - Method in interface org.predict4all.nlp.parser.token.Token
- getText() - Method in class org.predict4all.nlp.parser.token.WordToken
- getTextForType() - Method in interface org.predict4all.nlp.parser.token.Token
- getTokenCount() - Method in class org.predict4all.nlp.words.WordPrefixDetected
- getTokenMatchersForNGram() - Method in class org.predict4all.nlp.language.AbstractLanguageModel
- getTokenMatchersForNGram() - Method in class org.predict4all.nlp.language.french.FrenchLanguageModel
- getTokenMatchersForNGram() - Method in interface org.predict4all.nlp.language.LanguageModel
- getTokenMatchersForSemanticAnalysis() - Method in class org.predict4all.nlp.language.AbstractLanguageModel
- getTokenMatchersForSemanticAnalysis() - Method in class org.predict4all.nlp.language.french.FrenchLanguageModel
- getTokenMatchersForSemanticAnalysis() - Method in interface org.predict4all.nlp.language.LanguageModel
- getTotalCountFor(TrainingStep) - Method in class org.predict4all.nlp.trainer.corpus.TrainingCorpus
- getType() - Method in class org.predict4all.nlp.parser.matcher.PatternMatched
- getType() - Method in class org.predict4all.nlp.words.correction.CorrectionRuleNode
-
Type of this node
- getType() - Method in class org.predict4all.nlp.words.model.EquivalenceClassWord
- getType() - Method in class org.predict4all.nlp.words.model.SimpleWord
- getType() - Method in class org.predict4all.nlp.words.model.TagWord
- getType() - Method in class org.predict4all.nlp.words.model.UserWord
- getType() - Method in interface org.predict4all.nlp.words.model.Word
- getUnknownWordCountThreshold() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getUpperCaseReplacementThreshold() - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- getUsageCount() - Method in class org.predict4all.nlp.words.model.AbstractWord
- getUsageCount() - Method in class org.predict4all.nlp.words.model.UserWord
- getUsageCount() - Method in interface org.predict4all.nlp.words.model.Word
- getValidOneCharWords() - Method in class org.predict4all.nlp.language.AbstractLanguageModel
- getValidOneCharWords() - Method in class org.predict4all.nlp.language.french.FrenchLanguageModel
- getValidOneCharWords() - Method in interface org.predict4all.nlp.language.LanguageModel
- getValidWordForPredictionByPrefix(String, PredictionParameter, int, Set<Integer>) - Method in class org.predict4all.nlp.words.WordDictionary
-
Returns all the words that start with a given prefix.
The returned list is not sorted.
- getWeekDaysOrRegex() - Static method in class org.predict4all.nlp.language.french.FrenchLanguageUtils
- getWord() - Method in class org.predict4all.nlp.words.model.EquivalenceClassWord
- getWord() - Method in class org.predict4all.nlp.words.model.SimpleWord
- getWord() - Method in class org.predict4all.nlp.words.model.TagWord
- getWord() - Method in interface org.predict4all.nlp.words.model.Word
- getWord(int) - Method in class org.predict4all.nlp.words.WordDictionary
-
To get a word entity from its ID.
Contrary to the other WordDictionary.getWord(String) method, this returns null if there is no word for the given ID.
- getWord(String) - Method in class org.predict4all.nlp.words.WordDictionary
-
To get the word entity from text.
Note that this method will never return null: it can however return the Tag.UNKNOWN id if there is no word in the dictionary for the given text.
- getWordDictionary() - Method in class org.predict4all.nlp.prediction.WordPredictor
- getWordId() - Method in class org.predict4all.nlp.prediction.model.AbstractPredictionToCompute
- getWordId() - Method in class org.predict4all.nlp.prediction.model.DoublePredictionToCompute
- getWordId() - Method in class org.predict4all.nlp.prediction.model.UniquePredictionToCompute
- getWordId() - Method in class org.predict4all.nlp.prediction.WordPrediction
- getWordId(String) - Method in class org.predict4all.nlp.words.WordDictionary
-
To get a word ID.
Note that this method will never return null: it can however return the Tag.UNKNOWN id if there is no word in the dictionary for the given text.
- getWordId(WordDictionary) - Method in interface org.predict4all.nlp.parser.token.Token
- getWordId1() - Method in class org.predict4all.nlp.words.NextWord
- getWordId2() - Method in class org.predict4all.nlp.words.NextWord
- getWords() - Method in class org.predict4all.nlp.words.WordPrefixDetected
- getWordUsed() - Method in class org.predict4all.nlp.ngram.dictionary.DynamicNGramDictionary
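The getProbFactor() entry in this section says the factor is multiplied with the original probability and that the result list is then normalized. The following is a minimal, self-contained sketch of that arithmetic only. The class and method names (ProbFactorSketch, applyFactorsAndNormalize) are hypothetical and are not part of the library, and the library's actual computation may differ.

```java
import java.util.Arrays;

// Hypothetical illustration, not the library's implementation.
class ProbFactorSketch {

    // Multiplies each probability by its factor, then rescales so the results sum to 1.
    static double[] applyFactorsAndNormalize(double[] probabilities, double[] factors) {
        if (probabilities.length != factors.length) {
            throw new IllegalArgumentException("probabilities and factors must have the same length");
        }
        double[] weighted = new double[probabilities.length];
        double sum = 0.0;
        for (int i = 0; i < probabilities.length; i++) {
            weighted[i] = probabilities[i] * factors[i];
            sum += weighted[i];
        }
        // Skip the rescaling when every weighted value is zero.
        if (sum > 0.0) {
            for (int i = 0; i < weighted.length; i++) {
                weighted[i] /= sum;
            }
        }
        return weighted;
    }

    public static void main(String[] args) {
        // The second word gets a factor of 2.0; the others keep the neutral factor 1.0.
        double[] result = applyFactorsAndNormalize(
                new double[] {0.5, 0.3, 0.2},
                new double[] {1.0, 2.0, 1.0});
        System.out.println(Arrays.toString(result));
    }
}
```

With every factor set to 1.0, an input list that already sums to 1 comes back unchanged, which matches the note that 1.0 means relying only on probabilities.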
H
- HASH - org.predict4all.nlp.Separator
- hashCode() - Method in class org.predict4all.nlp.ngram.NGramKey
- hashCode() - Method in class org.predict4all.nlp.parser.token.SeparatorToken
- hashCode() - Method in class org.predict4all.nlp.parser.token.WordToken
- hashCode() - Method in class org.predict4all.nlp.semantic.CoOccurrenceKey
- hashCode() - Method in class org.predict4all.nlp.utils.BiIntegerKey
- hashCode() - Method in class org.predict4all.nlp.words.correction.CachedPrecomputedCorrectionRule
- hasNext() - Method in interface org.predict4all.nlp.parser.StringProducer
- HEARING_CONFUSION - org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
- HOMOPHONE - org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
- HYPHEN - org.predict4all.nlp.Separator
- HyphenMatcher - Class in org.predict4all.nlp.language.french.matcher
-
Term matcher to match a word sequence with a hyphen between each word.
The sequence should not start or end with a hyphen. Examples:
a-t : valid
a-t-elle : valid
a-t-elle- : not valid
-test- : not valid
- HyphenMatcher() - Constructor for class org.predict4all.nlp.language.french.matcher.HyphenMatcher
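The four HyphenMatcher examples listed above can be reproduced with a plain regular expression. This is only a sketch of the validity rule those examples suggest (words joined by single hyphens, with no leading or trailing hyphen). The class name HyphenSequenceSketch and its pattern are assumptions for illustration; HyphenMatcher itself works on tokens and may be implemented differently.

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not the library's HyphenMatcher.
class HyphenSequenceSketch {

    // One or more letters, followed by at least one "-letters" group.
    private static final Pattern HYPHEN_SEQUENCE = Pattern.compile("\\p{L}+(?:-\\p{L}+)+");

    // Returns true when the whole text is a hyphen-joined word sequence.
    static boolean isHyphenSequence(String text) {
        return HYPHEN_SEQUENCE.matcher(text).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a-t", "a-t-elle", "a-t-elle-", "-test-"}) {
            System.out.println(s + " : " + (isHyphenSequence(s) ? "valid" : "not valid"));
        }
    }
}
```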
I
- increment() - Method in class org.predict4all.nlp.utils.progressindicator.LoggingProgressIndicator
- increment() - Method in class org.predict4all.nlp.utils.progressindicator.NoOpProgressIndicator
- increment() - Method in interface org.predict4all.nlp.utils.progressindicator.ProgressIndicator
- incrementUsageCount() - Method in class org.predict4all.nlp.words.model.AbstractWord
- incrementUsageCount() - Method in class org.predict4all.nlp.words.model.UserWord
- incrementUsageCount() - Method in interface org.predict4all.nlp.words.model.Word
-
To increase the "usage" count of this word
- incrementUserWord(int) - Method in class org.predict4all.nlp.words.WordDictionary
- index(int) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
Locates the index of val.
- indexOfInCurrentPart(String, int) - Method in interface org.predict4all.nlp.words.correction.GeneratingCorrectionI
- indexOfInCurrentPart(String, int) - Method in class org.predict4all.nlp.words.correction.SimpleGeneratingCorrection
- INF_SUP - org.predict4all.nlp.Separator
- INFO_EXTENSION - Static variable in class org.predict4all.nlp.trainer.corpus.AbstractTrainingDocument
- initialize() - Method in interface org.predict4all.nlp.language.BaseWordDictionary
- initialize() - Method in class org.predict4all.nlp.language.french.FrenchBaseWordDictionary
- initialize(WordDictionary) - Method in class org.predict4all.nlp.language.french.FrenchStopWordDictionary
- initialize(WordDictionary) - Method in interface org.predict4all.nlp.language.StopWordDictionary
- initializeInformation() - Method in class org.predict4all.nlp.trainer.corpus.AbstractTokenTrainingDocument
- initializeInformation() - Method in class org.predict4all.nlp.trainer.corpus.AbstractTrainingDocument
- initializeInformation() - Method in class org.predict4all.nlp.trainer.step.ParserTrainingDocument
- initStep(TrainingStep) - Method in class org.predict4all.nlp.trainer.corpus.TrainingCorpus
- inputFile - Variable in class org.predict4all.nlp.trainer.corpus.AbstractTrainingDocument
- insertKey(int) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
Locates the index at which val can be inserted. If there is already a value equal() to val in the set, returns that value as a negative integer.
- INSTANCE - Static variable in class org.predict4all.nlp.utils.progressindicator.NoOpProgressIndicator
- INTEGER - org.predict4all.nlp.EquivalenceClass
- isAddNewWordsEnabled() - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
Enables/disables adding new words to the dictionary while using the word predictor or while training the dynamic model.
Note that added words can appear in predictions only if they have been used enough times, as defined by PredictionParameter.getMinUseCountToValidateNewWord().
- isBidirectional() - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
Bidirectional indicates that the resulting rule acts in both directions: errors will be replaced with replacements, and replacements will be replaced with errors.
This is almost the same as CorrectionRule.withConfusionSet(String...), with one difference: in a confusion set, errors can be replaced by errors and replacements can be replaced with replacements, which is not the case in bidirectional rules.
If you have only one error and one replacement, confusion set and bidirectional will result in the same rule.
- isBlank(String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- isCapitalized(String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- isCapitalizedWord() - Method in class org.predict4all.nlp.words.WordPrefixDetected
- isCaptureValue() - Method in class org.predict4all.nlp.parser.matcher.TokenRegexMatcher
- isCorrection() - Method in class org.predict4all.nlp.prediction.model.AbstractPredictionToCompute
- isCorrection() - Method in class org.predict4all.nlp.prediction.WordPrediction
- isCorrection() - Method in class org.predict4all.nlp.words.NextWord
- isDouble() - Method in class org.predict4all.nlp.prediction.model.AbstractPredictionToCompute
- isDouble() - Method in class org.predict4all.nlp.prediction.model.DoublePredictionToCompute
- isDouble() - Method in class org.predict4all.nlp.prediction.model.UniquePredictionToCompute
- isDouble() - Method in class org.predict4all.nlp.words.NextWord
- isDynamicModelEnabled() - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
Enables/disables the use of the dynamic model when predicting next words.
If enabled, it should always be combined with an instance of DynamicNGramDictionary given to WordPredictor.
The model can be trained via WordPredictor.trainDynamicModel(String, boolean)
- isEmpty() - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
Tells whether this set is currently holding any elements.
- isEmpty(String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- isEnabled() - Method in class org.predict4all.nlp.words.correction.CorrectionRuleNode
-
To know if this node should be taken into account in WordCorrectionGenerator.
Note that a node can be individually enabled but still be ignored.
- isEnableDebugInformation() - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
Enables debug information while the predictor is working.
This should be enabled carefully: debug is expensive in memory and computing cost as it creates a lot of String instances.
Typically this should never be enabled when using the predictor in production.
Enabling debug will fill WordPredictionResult.getDebugInformation() and WordPrediction.getDebugInformation()
- isEnableWordCorrection() - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
Enables/disables word correction.
Enabling this parameter will use PredictionParameter.getCorrectionRulesRoot() to determine the rules to be used by the correction algorithms.
- isEquivalenceClass() - Method in class org.predict4all.nlp.parser.token.EquivalenceClassToken
- isEquivalenceClass() - Method in interface org.predict4all.nlp.parser.token.Token
- isEquivalenceClass() - Method in class org.predict4all.nlp.words.model.AbstractWord
- isEquivalenceClass() - Method in interface org.predict4all.nlp.words.model.Word
- isForceInvalid() - Method in class org.predict4all.nlp.words.model.AbstractWord
- isForceInvalid() - Method in class org.predict4all.nlp.words.model.SimpleWord
- isForceInvalid() - Method in interface org.predict4all.nlp.words.model.Word
-
Forces this word to be invalid.
In effect, this method allows removal of a word from prediction results: words can't be removed from the dictionary as they can be used in ngrams, but setting forceInvalid to true has the same effect as removing a word.
- isForceValid() - Method in class org.predict4all.nlp.words.model.AbstractWord
- isForceValid() - Method in class org.predict4all.nlp.words.model.SimpleWord
- isForceValid() - Method in interface org.predict4all.nlp.words.model.Word
-
Forces this word to become valid; mostly used on UserWord to ignore validation.
- isFullUpperCase(String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- isInitialized() - Method in interface org.predict4all.nlp.language.BaseWordDictionary
- isInitialized() - Method in class org.predict4all.nlp.language.french.FrenchBaseWordDictionary
- isInitialized() - Method in class org.predict4all.nlp.language.french.FrenchStopWordDictionary
- isInitialized() - Method in interface org.predict4all.nlp.language.StopWordDictionary
- isInsertSpacePossible() - Method in class org.predict4all.nlp.prediction.WordPrediction
- isModifiedBySystem() - Method in class org.predict4all.nlp.words.model.AbstractWord
- isModifiedBySystem() - Method in class org.predict4all.nlp.words.model.SimpleWord
- isModifiedBySystem() - Method in interface org.predict4all.nlp.words.model.Word
-
Indicates that this word was modified by the system (e.g. by calling a modification method with the modificationByUser parameter set to false).
- isModifiedByUser() - Method in class org.predict4all.nlp.words.model.AbstractWord
- isModifiedByUser() - Method in class org.predict4all.nlp.words.model.SimpleWord
- isModifiedByUser() - Method in interface org.predict4all.nlp.words.model.Word
-
Indicates that this word was modified by the user (e.g. by calling a modification method with the modificationByUser parameter set to true).
- isModifiedByUserOrSystem() - Method in class org.predict4all.nlp.words.model.AbstractWord
- isModifiedByUserOrSystem() - Method in class org.predict4all.nlp.words.model.SimpleWord
- isModifiedByUserOrSystem() - Method in interface org.predict4all.nlp.words.model.Word
- isNextWordsCapitalized(List<Token>, String, int) - Method in class org.predict4all.nlp.words.WordPrefixDetector
- isNGramTag() - Method in class org.predict4all.nlp.words.model.AbstractWord
- isNGramTag() - Method in class org.predict4all.nlp.words.model.TagWord
- isNGramTag() - Method in interface org.predict4all.nlp.words.model.Word
- isNotBlank(String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- isNotEmpty(Collection<?>) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- isOptional() - Method in class org.predict4all.nlp.parser.matcher.TokenRegexMatcher
- isPredictionInitialized() - Method in class org.predict4all.nlp.prediction.model.AbstractPredictionToCompute
- isSentenceSeparator() - Method in enum org.predict4all.nlp.Separator
- isSeparator() - Method in class org.predict4all.nlp.parser.token.SeparatorToken
- isSeparator() - Method in interface org.predict4all.nlp.parser.token.Token
- isSpaceBetween() - Method in class org.predict4all.nlp.prediction.model.DoublePredictionToCompute
- isTag() - Method in class org.predict4all.nlp.parser.token.TagToken
- isTag() - Method in interface org.predict4all.nlp.parser.token.Token
- isTokenValidToCreateUserWord(Token) - Method in class org.predict4all.nlp.words.WordDictionary
- isUnique() - Method in class org.predict4all.nlp.words.NextWord
- isUserWord() - Method in class org.predict4all.nlp.words.model.AbstractWord
- isUserWord() - Method in class org.predict4all.nlp.words.model.UserWord
- isUserWord() - Method in interface org.predict4all.nlp.words.model.Word
- isValidForSaving() - Method in class org.predict4all.nlp.words.model.AbstractWord
- isValidForSaving() - Method in class org.predict4all.nlp.words.model.EquivalenceClassWord
- isValidForSaving() - Method in class org.predict4all.nlp.words.model.TagWord
- isValidForSaving() - Method in interface org.predict4all.nlp.words.model.Word
- isValidToBePredicted(PredictionParameter) - Method in class org.predict4all.nlp.words.model.AbstractWord
- isValidToBePredicted(PredictionParameter) - Method in class org.predict4all.nlp.words.model.EquivalenceClassWord
- isValidToBePredicted(PredictionParameter) - Method in class org.predict4all.nlp.words.model.SimpleWord
- isValidToBePredicted(PredictionParameter) - Method in class org.predict4all.nlp.words.model.TagWord
- isValidToBePredicted(PredictionParameter) - Method in class org.predict4all.nlp.words.model.UserWord
- isValidToBePredicted(PredictionParameter) - Method in interface org.predict4all.nlp.words.model.Word
-
To check if this word can be displayed as a prediction result.
This typically returns true for original words, but can be subject to computation for user words.
This can also return true/false depending on Word.isForceInvalid() or Word.isForceValid().
Also, user words are valid for prediction depending on PredictionParameter.getMinUseCountToValidateNewWord()
- isWord() - Method in class org.predict4all.nlp.parser.token.EquivalenceClassToken
- isWord() - Method in interface org.predict4all.nlp.parser.token.Token
- isWord() - Method in class org.predict4all.nlp.parser.token.WordToken
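The isBidirectional() entry above contrasts bidirectional rules with confusion sets. The following is a minimal, library-free sketch of that difference; the class RuleExpansionSketch and its methods are hypothetical illustrations and are not part of Predict4All, whose actual rule expansion is not shown in this index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RuleExpansionSketch {

    // Bidirectional: errors become replacements and replacements become errors,
    // but two errors (or two replacements) are never swapped with each other.
    static List<String> bidirectional(List<String> errors, List<String> replacements) {
        List<String> pairs = new ArrayList<>();
        for (String error : errors) {
            for (String replacement : replacements) {
                pairs.add(error + "->" + replacement);
                pairs.add(replacement + "->" + error);
            }
        }
        return pairs;
    }

    // Confusion set: every member can be substituted by every other member.
    static List<String> confusionSet(List<String> members) {
        List<String> pairs = new ArrayList<>();
        for (String from : members) {
            for (String to : members) {
                if (!from.equals(to)) {
                    pairs.add(from + "->" + to);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Two errors ("a", "b") and one replacement ("c").
        System.out.println(bidirectional(Arrays.asList("a", "b"), Arrays.asList("c")));
        // The same three strings treated as one confusion set.
        System.out.println(confusionSet(Arrays.asList("a", "b", "c")));
    }
}
```

With two errors and one replacement, only the confusion set contains the error-to-error pair a->b; with a single error and a single replacement, both methods produce the same two pairs, which matches the last sentence of the entry.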
K
- keys() - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- keys(int[]) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
L
- LanguageModel - Interface in org.predict4all.nlp.language
-
Represents a model specific to the input language.
This model is useful to perform better on NLP tasks by using language-specific parameters.
E.G.
- launchLSATraining(TrainingStep) - Method in class org.predict4all.nlp.trainer.DataTrainer
- launchNGramTraining(TrainingStep) - Method in class org.predict4all.nlp.trainer.DataTrainer
- LBRACKET - org.predict4all.nlp.Separator
- LEAF - org.predict4all.nlp.words.correction.CorrectionRuleNodeType
- length(String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- listNextWords(int[], WordDictionary, PredictionParameter, Set<Integer>, Map<BiIntegerKey, NextWord>, int, boolean) - Method in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
-
Goes through each ngram dictionary order to find the next possible words for a given prefix.
It first goes through the highest order for the given prefix (e.g. prefix length == 3 means order 4) and, if wantedCount is not reached, goes to the lower order to find new next possible words.
- listTrieLeaves(int[], int, int, int, BiConsumer<int[], Integer>) - Method in class org.predict4all.nlp.ngram.trie.DynamicNGramTrieNode
-
Detects each unique trie leaf for a wanted order, then calls the given callback with the found prefix and word id
- load(File) - Static method in class org.predict4all.nlp.ngram.dictionary.DynamicNGramDictionary
-
Creates and opens an existing dynamic ngram dictionary.
- loadDictionary(File, SemanticDictionaryConfiguration) - Static method in class org.predict4all.nlp.semantic.SemanticDictionary
- loadDictionary(LanguageModel, File) - Static method in class org.predict4all.nlp.words.WordDictionary
-
Creates a word dictionary from a word dictionary data file previously created with the training algorithm.
This method should not be called on a user dictionary file; use WordDictionary.loadUserDictionary(File) instead.
- loadFrom(File, File) - Static method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- loadFrom(LanguageModel, File) - Static method in class org.predict4all.nlp.prediction.PredictionParameter
-
Loads prediction parameters from a JSON config file (see PredictionParameter.saveTo(File)).
- loadUserDictionary(File) - Method in class org.predict4all.nlp.words.WordDictionary
-
Loads a user dictionary on top of an existing trained dictionary.
This will supplement this dictionary with custom words from the user, or existing words with modified parameters.
This should be called on a dictionary previously saved with WordDictionary.saveUserDictionary(File)
- LoggingProgressIndicator - Class in org.predict4all.nlp.utils.progressindicator
- LoggingProgressIndicator(String, long) - Constructor for class org.predict4all.nlp.utils.progressindicator.LoggingProgressIndicator
- lowerCase(String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
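The listNextWords(...) entry above describes a highest-order-first lookup that falls back to lower orders until wantedCount is reached. The sketch below illustrates that idea with a plain in-memory map. BackoffListingSketch, its add method and its simplified listNextWords signature are hypothetical and do not reproduce the trie-based implementation in AbstractNGramDictionary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BackoffListingSketch {

    // Hypothetical in-memory table: context word ids -> next word ids, best first.
    private final Map<List<Integer>, List<Integer>> nextByContext = new HashMap<>();

    void add(int[] context, int... nextWordIds) {
        List<Integer> next = new ArrayList<>();
        for (int id : nextWordIds) {
            next.add(id);
        }
        nextByContext.put(toList(context, 0), next);
    }

    List<Integer> listNextWords(int[] prefix, int wantedCount) {
        // LinkedHashSet keeps insertion order and drops words already found at a higher order.
        Set<Integer> found = new LinkedHashSet<>();
        // start == 0 uses the full prefix (highest order); each round drops the oldest word.
        for (int start = 0; start <= prefix.length && found.size() < wantedCount; start++) {
            List<Integer> candidates =
                    nextByContext.getOrDefault(toList(prefix, start), Collections.<Integer>emptyList());
            for (int wordId : candidates) {
                if (found.size() >= wantedCount) {
                    break;
                }
                found.add(wordId);
            }
        }
        return new ArrayList<>(found);
    }

    private static List<Integer> toList(int[] ids, int from) {
        List<Integer> list = new ArrayList<>();
        for (int i = from; i < ids.length; i++) {
            list.add(ids[i]);
        }
        return list;
    }

    public static void main(String[] args) {
        BackoffListingSketch sketch = new BackoffListingSketch();
        sketch.add(new int[] {1, 2, 3}, 10);   // 4-gram level
        sketch.add(new int[] {2, 3}, 10, 11);  // 3-gram level
        sketch.add(new int[] {3}, 12);         // 2-gram level
        sketch.add(new int[] {}, 13, 14);      // unigram level
        System.out.println(sketch.listNextWords(new int[] {1, 2, 3}, 3));
    }
}
```

Lower orders are only consulted while fewer than wantedCount words have been collected, so a small wantedCount stops the lookup early.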
M
- M_FRONT_MBP - org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
- match(Token, TokenProvider) - Method in class org.predict4all.nlp.language.french.matcher.ApostropheMatcher
- match(Token, TokenProvider) - Method in class org.predict4all.nlp.language.french.matcher.DateDayMonthMatcher
- match(Token, TokenProvider) - Method in class org.predict4all.nlp.language.french.matcher.DateFullDigitMatcher
- match(Token, TokenProvider) - Method in class org.predict4all.nlp.language.french.matcher.DateFullTextMatcher
- match(Token, TokenProvider) - Method in class org.predict4all.nlp.language.french.matcher.DateMonthYearMatcher
- match(Token, TokenProvider) - Method in class org.predict4all.nlp.language.french.matcher.DateWeekDayMatcher
- match(Token, TokenProvider) - Method in class org.predict4all.nlp.language.french.matcher.NumberDecimalMatcher
- match(Token, TokenProvider) - Method in class org.predict4all.nlp.language.french.matcher.NumberIntMatcher
- match(Token, TokenProvider) - Method in class org.predict4all.nlp.language.french.matcher.PercentMatcher
- match(Token, TokenProvider) - Method in class org.predict4all.nlp.language.french.matcher.ProperNameMatcher
- match(Token, TokenProvider) - Method in class org.predict4all.nlp.language.french.matcher.SpecialWordMatcher
- match(Token, TokenProvider) - Method in class org.predict4all.nlp.parser.matcher.AbstractRecursiveMatcher
- match(Token, TokenProvider) - Method in interface org.predict4all.nlp.parser.matcher.TokenMatcher
- MATCHERS_NGRAM_FR - Static variable in class org.predict4all.nlp.language.french.FrenchLanguageModel
- MATCHERS_SEMANTIC_ANALYSIS_FR - Static variable in class org.predict4all.nlp.language.french.FrenchLanguageModel
- matchRegexPattern(Token, TokenRegexMatcher, TokenProvider, int) - Static method in class org.predict4all.nlp.parser.matcher.TermMatcherUtils
- maxOrder - Variable in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
-
Max order possible to store in this dictionary.
Could be retrieved by opening the dictionary, or set by user as a limit.
- MISC - org.predict4all.nlp.EquivalenceClass
- modifiedBySystem - Variable in class org.predict4all.nlp.words.model.SimpleWord
- modifiedByUser - Variable in class org.predict4all.nlp.words.model.SimpleWord
- MONEY_AMOUNT - org.predict4all.nlp.EquivalenceClass
- MONTHS - Static variable in class org.predict4all.nlp.language.french.FrenchLanguageUtils
N
- NEWLINE - org.predict4all.nlp.Separator
- newThread(Runnable) - Method in class org.predict4all.nlp.utils.DaemonThreadFactory
- next() - Method in interface org.predict4all.nlp.parser.StringProducer
- NextWord - Class in org.predict4all.nlp.words
- NGRAM_COUNT_FORMAT - Static variable in class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
- NGRAM_DICTIONARY - org.predict4all.nlp.trainer.step.TrainingStep
- NGramDebugger - Interface in org.predict4all.nlp.ngram.debug
-
This interface can be used to check an ngram dictionary while training models.
- NGramDictionaryGenerator - Class in org.predict4all.nlp.ngram
-
Use this generator to train an ngram model.
It will load texts from a TrainingCorpus and generate an ngram file that can later be opened with a StaticNGramTrieDictionary
- NGramDictionaryGenerator(LanguageModel, TrainingConfiguration, WordDictionary) - Constructor for class org.predict4all.nlp.ngram.NGramDictionaryGenerator
- NGramKey - Class in org.predict4all.nlp.ngram
- NGramPruningMethod - Enum in org.predict4all.nlp.trainer.configuration
- NGramTrainingDocument - Class in org.predict4all.nlp.trainer.step
- NGramTrainingDocument(File, File) - Constructor for class org.predict4all.nlp.trainer.step.NGramTrainingDocument
- NGramWordPredictorUtils - Class in org.predict4all.nlp.ngram
-
Utils class useful when predicting words with ngram dictionaries.
- NGramWordPredictorUtils(WordDictionary, PredictionParameter) - Constructor for class org.predict4all.nlp.ngram.NGramWordPredictorUtils
- NODE - org.predict4all.nlp.words.correction.CorrectionRuleNodeType
- NONE - org.predict4all.nlp.trainer.configuration.NGramPruningMethod
- NoOpProgressIndicator - Class in org.predict4all.nlp.utils.progressindicator
- NoOpProgressIndicator() - Constructor for class org.predict4all.nlp.utils.progressindicator.NoOpProgressIndicator
- normalizeRow(double[]) - Static method in class org.predict4all.nlp.semantic.SemanticDictionary
- NumberDecimalMatcher - Class in org.predict4all.nlp.language.french.matcher
- NumberDecimalMatcher() - Constructor for class org.predict4all.nlp.language.french.matcher.NumberDecimalMatcher
- NumberIntMatcher - Class in org.predict4all.nlp.language.french.matcher
- NumberIntMatcher() - Constructor for class org.predict4all.nlp.language.french.matcher.NumberIntMatcher
O
- of(int) - Static method in class org.predict4all.nlp.utils.BiIntegerKey
- of(int, int) - Static method in class org.predict4all.nlp.utils.BiIntegerKey
- of(K, T) - Static method in class org.predict4all.nlp.utils.Pair
- of(K, T, V) - Static method in class org.predict4all.nlp.utils.Triple
- oneInstanceCount - Static variable in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- open(File) - Static method in class org.predict4all.nlp.ngram.dictionary.StaticNGramTrieDictionary
-
Create a static ngram dictionary from a given file.
- OPEN_HOOK - org.predict4all.nlp.Separator
- openDictionary(File) - Method in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
-
Open a dictionary from a file.
To use the dictionary, the same WordDictionary used to save it should be used.
- openDictionary(File) - Method in class org.predict4all.nlp.ngram.dictionary.DynamicNGramDictionary
- openDictionary(File) - Method in class org.predict4all.nlp.ngram.dictionary.StaticNGramTrieDictionary
- openDictionary(File) - Method in class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
- opposite() - Method in class org.predict4all.nlp.words.correction.CachedPrecomputedCorrectionRule
- optional(String) - Method in class org.predict4all.nlp.parser.matcher.TokenRegexMatcher.TokenRegexMatcherBuilder
- optional(Separator) - Method in class org.predict4all.nlp.parser.matcher.TokenRegexMatcher.TokenRegexMatcherBuilder
- or(String...) - Method in class org.predict4all.nlp.parser.matcher.TokenRegexMatcher.TokenRegexMatcherBuilder
- or(Separator...) - Method in class org.predict4all.nlp.parser.matcher.TokenRegexMatcher.TokenRegexMatcherBuilder
- or(Separator, String) - Method in class org.predict4all.nlp.parser.matcher.TokenRegexMatcher.TokenRegexMatcherBuilder
- ORDER_COUNT - org.predict4all.nlp.trainer.configuration.NGramPruningMethod
- org.predict4all.nlp - package org.predict4all.nlp
- org.predict4all.nlp.exception - package org.predict4all.nlp.exception
- org.predict4all.nlp.io - package org.predict4all.nlp.io
-
Contains custom InputStream and OutputStream to save/load Predict4All specific items (Token and Word).
Note that NGram are saved without these streams as they are designed to be loaded on demand with a FileChannel.
Both token and word streams extend DataOutputStream or DataInputStream: this was done for optimization, as this approach is much more optimized than using any other serialization method.
- org.predict4all.nlp.language - package org.predict4all.nlp.language
-
Represents every language-specific item.
A base AbstractLanguageModel allows simpler LanguageModel implementations.
Sub-packages such as "french" may contain language-specific code.
- org.predict4all.nlp.language.french - package org.predict4all.nlp.language.french
- org.predict4all.nlp.language.french.matcher - package org.predict4all.nlp.language.french.matcher
- org.predict4all.nlp.ngram - package org.predict4all.nlp.ngram
-
Package containing everything about the NGram model used in Predict4All.
Contains the ngram training algorithm in NGramDictionaryGenerator.
Also contains AbstractNGramTrieNode: a trie structure that can be implemented in both ways, dynamic or static.
This trie structure allows having a huge number of ngrams available for probability computation without loading them into memory.
- org.predict4all.nlp.ngram.debug - package org.predict4all.nlp.ngram.debug
- org.predict4all.nlp.ngram.dictionary - package org.predict4all.nlp.ngram.dictionary
- org.predict4all.nlp.ngram.trie - package org.predict4all.nlp.ngram.trie
- org.predict4all.nlp.ngram.trie.map - package org.predict4all.nlp.ngram.trie.map
- org.predict4all.nlp.parser - package org.predict4all.nlp.parser
- org.predict4all.nlp.parser.matcher - package org.predict4all.nlp.parser.matcher
- org.predict4all.nlp.parser.token - package org.predict4all.nlp.parser.token
- org.predict4all.nlp.prediction - package org.predict4all.nlp.prediction
-
Main PREDICT4ALL entry point: this package contains "the glue" between all prediction components.
PREDICT4ALL core features are located in WordPredictor
- org.predict4all.nlp.prediction.model - package org.predict4all.nlp.prediction.model
- org.predict4all.nlp.semantic - package org.predict4all.nlp.semantic
-
Semantic related prediction model (WIP) - not used by current WordPredictor
- org.predict4all.nlp.trainer - package org.predict4all.nlp.trainer
-
Represents the whole data training process managed by the main DataTrainer.
Training is done in different steps: Tokenizer, TokenConverter, WordDictionaryGenerator, NGramDictionaryGenerator.
Note that the DataTrainer uses TrainingCorpus and AbstractTrainingDocument: this abstraction level makes it possible to train the model on the same corpus without having to go through every training step, which is really useful when developing new training algorithms.
- org.predict4all.nlp.trainer.configuration - package org.predict4all.nlp.trainer.configuration
- org.predict4all.nlp.trainer.corpus - package org.predict4all.nlp.trainer.corpus
- org.predict4all.nlp.trainer.step - package org.predict4all.nlp.trainer.step
- org.predict4all.nlp.utils - package org.predict4all.nlp.utils
-
Contains some simple data structure and lambda interfaces and classic "utils" static classes.
- org.predict4all.nlp.utils.progressindicator - package org.predict4all.nlp.utils.progressindicator
- org.predict4all.nlp.words - package org.predict4all.nlp.words
-
Contains classes related to Word and WordDictionary.
Mainly used to identify each Word as a unique instance identified with an int ID.
This package mainly focuses on managing vocabulary.
- org.predict4all.nlp.words.correction - package org.predict4all.nlp.words.correction
-
Contains all classes and algorithms related to word correction.
The main component is WordCorrectionGenerator as it contains most of the correction logic.
CorrectionRule is also important as it is the main entry point to configure word correction.
- org.predict4all.nlp.words.model - package org.predict4all.nlp.words.model
- OTHER - org.predict4all.nlp.Separator
- OUTPUT_EXTENSION - Static variable in class org.predict4all.nlp.trainer.corpus.AbstractTrainingDocument
- outputFile - Variable in class org.predict4all.nlp.trainer.corpus.AbstractTrainingDocument
P
- Pair<K,T> - Class in org.predict4all.nlp.utils
- PARSER - org.predict4all.nlp.trainer.step.TrainingStep
- ParserTrainingDocument - Class in org.predict4all.nlp.trainer.step
- ParserTrainingDocument(String, File, File) - Constructor for class org.predict4all.nlp.trainer.step.ParserTrainingDocument
- PatternMatched - Class in org.predict4all.nlp.parser.matcher
- PatternMatched(String, Token) - Constructor for class org.predict4all.nlp.parser.matcher.PatternMatched
- PatternMatched(EquivalenceClass, String, Token) - Constructor for class org.predict4all.nlp.parser.matcher.PatternMatched
- PERCENT - org.predict4all.nlp.EquivalenceClass
- PERCENT - org.predict4all.nlp.Separator
- PERCENT_FORMAT - Static variable in class org.predict4all.nlp.trainer.DataTrainer
- PERCENT_FORMAT - Static variable in class org.predict4all.nlp.words.WordDictionaryGenerator
- PercentMatcher - Class in org.predict4all.nlp.language.french.matcher
- PercentMatcher() - Constructor for class org.predict4all.nlp.language.french.matcher.PercentMatcher
- PHONEM_CONFUSION_SET - org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
- PIPE - org.predict4all.nlp.Separator
- POINT - org.predict4all.nlp.Separator
- postInsertHook(boolean) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
After an insert, this hook is called to adjust the size/free values of the set and to perform rehashing if necessary.
- predict(String) - Method in class org.predict4all.nlp.prediction.WordPredictor
-
See WordPredictor.predict(String, String, int, Set), called with the default wanted size (5).
- predict(String, int) - Method in class org.predict4all.nlp.prediction.WordPredictor
- predict(String, String, int) - Method in class org.predict4all.nlp.prediction.WordPredictor
- predict(String, String, int, Set<Integer>) - Method in class org.predict4all.nlp.prediction.WordPredictor
-
Tries to predict the best wantedCount words for a given textBeforeCaret.
Will go through the ngram dictionary (and possibly through the dynamic dictionary) to find the best matching words.
This may predict the next word if the predictor detects that there is no current word (e.g. the given textBeforeCaret ends with a space), or it may try to complete the current word.
If a word is already started and PredictionParameter.isEnableWordCorrection() is true, the returned WordPrediction could be a correction of the already started word.
textAfterCaret is not useful to predict next words, but it is used to determine WordPredictionResult.getNextCharCountToRemove().
wantedCount should be given according to real needs: prediction computing will take longer if wantedCount is too high.
wordIdsToExclude can be useful to exclude already seen words if the current word is the same as in the last prediction call.
- predict(String, Set<Integer>) - Method in class org.predict4all.nlp.prediction.WordPredictor
-
See WordPredictor.predict(String, String, int, Set), called with the default wanted size (5).
- Predict4AllInfo - Class in org.predict4all.nlp
-
This retrieves information about the library (version and build date).
This should mostly be used to ensure consistency of saved data (i.e. save and load data with the same version).
- Predict4AllInfo() - Constructor for class org.predict4all.nlp.Predict4AllInfo
- Predict4AllUtils - Class in org.predict4all.nlp.utils
-
Contains different utils methods that are used in NLP tasks.
- prediction - Variable in class org.predict4all.nlp.prediction.model.AbstractPredictionToCompute
- PredictionParameter - Class in org.predict4all.nlp.prediction
-
Contains parameters to configure how WordPredictor works.
Changes to an instance of PredictionParameter while the predictor is running may not be reflected, as some values are cached internally.
- PredictionParameter(LanguageModel) - Constructor for class org.predict4all.nlp.prediction.PredictionParameter
- probFactor - Variable in class org.predict4all.nlp.words.model.SimpleWord
- progressIndicator - Variable in class org.predict4all.nlp.trainer.TrainerTask
- ProgressIndicator - Interface in org.predict4all.nlp.utils.progressindicator
- PROPER_NAME - org.predict4all.nlp.EquivalenceClass
- ProperNameMatcher - Class in org.predict4all.nlp.language.french.matcher
- ProperNameMatcher() - Constructor for class org.predict4all.nlp.language.french.matcher.ProperNameMatcher
- pruneNGramsCount(int, TrainingConfiguration) - Method in class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
- pruneNGramsOrderCount(int[], TrainingConfiguration) - Method in class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
- pruneNGramsWeightedDifference(double, TrainingConfiguration, NGramPruningMethod) - Method in class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
-
Executes a pruning on the dictionary.
Pruning is implemented with a "weighted difference" algorithm: the difference is computed between a high order model and a lower order model (e.g. difference between 4-gram and 3-gram, then between 3-gram and 2-gram) and if the difference is below a certain level (threshold), the high order model is deleted.
Difference pruning is executed from max order down to bigram level; probabilities are computed again after the pruning.
- pruningCountingNGram(int, int, int) - Method in class org.predict4all.nlp.ngram.trie.DynamicNGramTrieNode
- put(int, V) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- putAll(Map<? extends Integer, ? extends V>) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- putAndIncrementBy(int[], int) - Method in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
-
Adds a given ngram to the dictionary and increments its count.
If the ngram is already in the dictionary, will just increment its count.
This will call AbstractNGramDictionary.putAndIncrementBy(int[], int, int) with index = 0.
- putAndIncrementBy(int[], int) - Method in class org.predict4all.nlp.ngram.dictionary.StaticNGramTrieDictionary
- putAndIncrementBy(int[], int) - Method in class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
- putAndIncrementBy(int[], int, int) - Method in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
-
Adds a given ngram to the dictionary and increments its count.
If the ngram is already in the dictionary, will just increment its count.
- putAndIncrementBy(int[], int, int) - Method in class org.predict4all.nlp.ngram.dictionary.StaticNGramTrieDictionary
- putAndIncrementBy(int[], int, int) - Method in class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
- putAndIncrementBy(int[], int, int) - Method in class org.predict4all.nlp.ngram.trie.DynamicNGramTrieNode
-
Puts an ngram into this trie structure, creating nodes if needed and incrementing the existing ones.
- putIfAbsent(int, V) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- putUserWord(String) - Method in class org.predict4all.nlp.words.WordDictionary
-
To manually add a user word to this dictionary.
This will create the associated word entity.
This doesn't check whether a word with the same text was already in the dictionary: you should check it before calling this method (use WordDictionary.getWord(String)).
- putUserWord(Token) - Method in class org.predict4all.nlp.words.WordDictionary
- putWordTraining(String) - Method in class org.predict4all.nlp.words.WordDictionary
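The pruneNGramsWeightedDifference(...) entry above describes dropping a high order ngram when its "weighted difference" from the lower order model falls below a threshold. The index does not state the exact formula, so the sketch below assumes one common formulation: the ngram count multiplied by the difference of log probabilities. The class WeightedDifferencePruningSketch and its methods are hypothetical and not the library's implementation.

```java
public class WeightedDifferencePruningSketch {

    // Assumed formulation: count * (log P_high - log P_low), where P_high is the
    // probability given by the high order model and P_low the one given by the
    // lower order (backoff) model for the same word.
    static double weightedDifference(int count, double pHigh, double pLow) {
        return count * (Math.log(pHigh) - Math.log(pLow));
    }

    // The high order ngram is pruned when it adds too little over the lower order estimate.
    static boolean shouldPrune(int count, double pHigh, double pLow, double threshold) {
        return weightedDifference(count, pHigh, pLow) < threshold;
    }

    public static void main(String[] args) {
        double threshold = 0.5; // illustrative value only
        // Rare ngram whose probability is almost the same as the backoff estimate.
        System.out.println(shouldPrune(1, 0.30, 0.29, threshold));
        // Frequent ngram that is much more likely than the backoff estimate.
        System.out.println(shouldPrune(10, 0.50, 0.25, threshold));
    }
}
```

Under this assumption, frequent ngrams that differ strongly from the lower order estimate are kept, while rare or redundant ones are removed; per the entry, probabilities would then be recomputed after pruning.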
Q
R
- RAW_COUNT - org.predict4all.nlp.trainer.configuration.NGramPruningMethod
- RBRACKET - org.predict4all.nlp.Separator
- readAllChildren(FileChannel, int) - Method in class org.predict4all.nlp.ngram.trie.DynamicNGramTrieNode
-
Read all children from a given file channel, then load recursively all the children.
- readDictionaryInformation(ByteBuffer) - Method in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
-
Read the general information for this dictionary from a given buffer (doesn't do any check)
- readNodeInformation(ByteBuffer) - Method in class org.predict4all.nlp.ngram.trie.DynamicNGramTrieNode
-
Read this node information from a given buffer.
- readNodeInformation(ByteBuffer) - Method in class org.predict4all.nlp.ngram.trie.StaticNGramTrieNode
-
Read the node information contained into the given buffer to this node (without any check).
- readToken() - Method in class org.predict4all.nlp.io.TokenFileInputStream
- readWord() - Method in class org.predict4all.nlp.io.WordFileInputStream
- rehash(int) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- remove(int) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- REMOVE_LETTER - org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
- removeAt(int) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
Delete the record at index.
- REMOVED - Static variable in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
flag indicating that the value of a slot in the hashtable was deleted
- retainEntries(TIntObjectProcedure<? super V>) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- rootNode - Variable in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
-
Root node of this dictionary (this node contains as children the whole vocabulary)
- ruleBuilder() - Static method in class org.predict4all.nlp.words.correction.CorrectionRule
-
Initialize empty correction rule builder
- run() - Method in class org.predict4all.nlp.trainer.TrainerTask
S
- saveDictionary(File) - Method in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
-
Save this dictionary to a file.
Will save the dictionary with word ids only: this means that the same word dictionary should be loaded if this dictionary is opened later.
- saveDictionary(File) - Method in class org.predict4all.nlp.ngram.dictionary.StaticNGramTrieDictionary
- saveDictionary(File) - Method in class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
- saveTo(File) - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
Save these prediction parameters to a config file that can later be loaded with
PredictionParameter.loadFrom(LanguageModel, File)
This implementation uses JSON to save values. - saveTo(File) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- saveUserDictionary(File) - Method in class org.predict4all.nlp.words.WordDictionary
-
Save the modified words of this dictionary.
The given file will contain the UserWord
instances added to the dictionary, but also every Word
that was modified (e.g. if Word.setProbFactor(double, boolean)
, Word.setForceInvalid(boolean, boolean)
, etc. was called).
This file can later be loaded with WordDictionary.loadUserDictionary(File)
- SEMANTIC_DICTIONARY - org.predict4all.nlp.trainer.step.TrainingStep
- SemanticDictionary - Class in org.predict4all.nlp.semantic
-
Represents a semantic dictionary to be used to predict next words.
WARNING : THIS IS A WIP - SemanticDictionaryConfiguration - Interface in org.predict4all.nlp.semantic
- SemanticDictionaryGenerator - Class in org.predict4all.nlp.semantic
-
To generate a
SemanticDictionary
from an input corpus.
This creates a term x term matrix and then reduces it with SVD (via an optimized R script, "Rscript" should be available in path). - SemanticDictionaryGenerator(LanguageModel, WordDictionary, TrainingConfiguration) - Constructor for class org.predict4all.nlp.semantic.SemanticDictionaryGenerator
- SemanticTrainingDocument - Class in org.predict4all.nlp.trainer.step
- SemanticTrainingDocument(File, File) - Constructor for class org.predict4all.nlp.trainer.step.SemanticTrainingDocument
- SEMICOLON - org.predict4all.nlp.Separator
- Separator - Enum in org.predict4all.nlp
-
Represent chars between words.
This is preferred to regex patterns because separators are fully controlled.
If you add any new separator, watch the last used id - SeparatorToken - Class in org.predict4all.nlp.parser.token
- SeparatorToken(Separator) - Constructor for class org.predict4all.nlp.parser.token.SeparatorToken
- SEQUENCES - org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
- setAddNewWordsEnabled(boolean) - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
Enable/disable adding new words to the dictionary while using the word predictor or while training the dynamic model.
Note that an added word can appear in predictions only if it was used enough times, as determined by PredictionParameter.getMinUseCountToValidateNewWord()
- setBaseWordDictionaryPath(String) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- setConvertCaseFromDictionaryModelThreshold(double) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- setCorrectionDefaultCost(double) - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
The default cost applied to a correction rule if
CorrectionRule.getCost()
is null. - setCorrectionDefaultFactor(double) - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
The default factor applied to a correction rule if
CorrectionRule.getFactor()
is null. - setCorrectionMaxCost(double) - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
Determine how many corrections can be applied to the same input.
The higher this value, the more corrections will be applied. - setCorrectionRule(CorrectionRule) - Method in class org.predict4all.nlp.words.correction.CorrectionRuleNode
-
The correction rule associated with this node.
Will be taken into account only if CorrectionRuleNode.getType()
is CorrectionRuleNodeType.LEAF
- setCorrectionRulesRoot(CorrectionRuleNode) - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
The correction rules to apply if
PredictionParameter.isEnableWordCorrection()
is enabled.
Correction rules are organised as a tree to allow enabling/disabling a whole part of the tree.
Correction rules can be created programmatically, or FrenchDefaultCorrectionRuleGenerator
can be used. - setDebugPrefix(String) - Method in class org.predict4all.nlp.ngram.NGramDictionaryGenerator
- setDebugPrefix(String) - Method in class org.predict4all.nlp.trainer.DataTrainer
- setDirectlyValidWordCountThreshold(int) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- setDynamicModelEnabled(boolean) - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
Enable/disable the use of the dynamic model when predicting next words.
If enabled, it should always be combined with an instance of DynamicNGramDictionary
given to WordPredictor
.
The model can be trained via WordPredictor.trainDynamicModel(String, boolean)
- setDynamicModelMinimumWeight(double) - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
Set the minimum weight for the dynamic model when interpolating the static and dynamic models.
The value should range between 0.0 and 0.5. - setEnabled(boolean) - Method in class org.predict4all.nlp.words.correction.CorrectionRuleNode
-
To know if this node should be taken into account in
WordCorrectionGenerator
Note that a node can be individually enabled and still be ignored. - setEnableDebugInformation(boolean) - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
To enable debug information while the predictor is working.
This should be enabled carefully : debug is expensive in memory and computing cost as it creates a lot of String instances.
Typically this should never be enabled when using the predictor in production.
Enabling debug will fill WordPredictionResult.getDebugInformation()
and WordPrediction.getDebugInformation()
- setEnableWordCorrection(boolean) - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
To enable/disable word correction.
Enabling this parameter will use PredictionParameter.getCorrectionRulesRoot()
to determine the rules to be used by the correction algorithms. - setForceInvalid(boolean, boolean) - Method in class org.predict4all.nlp.words.model.AbstractWord
- setForceInvalid(boolean, boolean) - Method in class org.predict4all.nlp.words.model.SimpleWord
- setForceInvalid(boolean, boolean) - Method in interface org.predict4all.nlp.words.model.Word
-
To force this word to be invalid.
In practice, this method allows removing a word from prediction results : words can't be removed from the dictionary as they can be used in ngrams, but setting forceInvalid to true has the same effect as removing the word. - setForceValid(boolean, boolean) - Method in class org.predict4all.nlp.words.model.AbstractWord
- setForceValid(boolean, boolean) - Method in class org.predict4all.nlp.words.model.SimpleWord
- setForceValid(boolean, boolean) - Method in interface org.predict4all.nlp.words.model.Word
-
To force this word to become valid, mostly used on
UserWord
to ignore validation. - setLanguageModel(LanguageModel) - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
The language model to be used to predict words.
Typically implementations are provided by the framework (see FrenchLanguageModel
)
It should always be filled - setLsaDensitySize(int) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- setLsaFrequentWordSize(int) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- setLsaTargetSvdSize(int) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- setLsaVocabularySize(int) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- setLsaWindowSize(int) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- setMinCountToProvideCorrection(int) - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
The minimum number of chars typed before correction results are integrated.
Note that this has no effect if PredictionParameter.isEnableWordCorrection()
is disabled.
Typically, setting this value to 3 will allow the predictor to check for corrections only once the user has typed 3 chars. - setMinCountToProvidePrediction(int) - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
The minimum number of chars typed before prediction results are provided.
This built-in feature allows predictions to be displayed only after a certain amount of user input.
Typically, setting this value to 1 will disable next word prediction and will only predict the word currently being typed. - setMinUseCountToValidateNewWord(int) - Method in class org.predict4all.nlp.prediction.PredictionParameter
-
Minimum use count for a new word to be displayed in the predictions.
This avoids having typing errors displayed as prediction results. - setModifiedBySystem(boolean) - Method in class org.predict4all.nlp.words.model.AbstractWord
- setModifiedBySystem(boolean) - Method in interface org.predict4all.nlp.words.model.Word
-
To manually set modification by system flag
- setModifiedByUser(boolean) - Method in class org.predict4all.nlp.words.model.AbstractWord
- setModifiedByUser(boolean) - Method in class org.predict4all.nlp.words.model.SimpleWord
- setModifiedByUser(boolean) - Method in interface org.predict4all.nlp.words.model.Word
-
To manually set modification by user flag
- setName(String) - Method in class org.predict4all.nlp.words.correction.CorrectionRuleNode
-
The name for this node (just informative)
- setNgramDebugAfterPruning(NGramDebugger) - Method in class org.predict4all.nlp.ngram.NGramDictionaryGenerator
- setNgramDebugAfterPruning(NGramDebugger) - Method in class org.predict4all.nlp.trainer.DataTrainer
- setNgramDebugBeforePruning(NGramDebugger) - Method in class org.predict4all.nlp.ngram.NGramDictionaryGenerator
- setNgramDebugBeforePruning(NGramDebugger) - Method in class org.predict4all.nlp.trainer.DataTrainer
- setNgramOrder(int) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- setNgramPruningCountThreshold(int) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- setNgramPruningOrderCountThresholds(int[]) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- setNgramPruningWeightedDifferenceThreshold(double) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- setProbFactor(double, boolean) - Method in class org.predict4all.nlp.words.model.AbstractWord
- setProbFactor(double, boolean) - Method in class org.predict4all.nlp.words.model.SimpleWord
- setProbFactor(double, boolean) - Method in interface org.predict4all.nlp.words.model.Word
-
This factor can be used to modify final probabilities of the predictions.
It will be applied once probabilities are computed to influence result list.
It is mainly used in a multiplication with the original probability (and then the result list is normalized).
To only rely on probabilities, the value should be 1.0 - setPruningMethod(NGramPruningMethod) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
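The effect described for setProbFactor(double, boolean) above (multiply the original probability by the factor, then normalize the result list) can be illustrated without the library. This is only a sketch under assumptions: ProbFactorSketch and applyFactors are hypothetical names, words are plain strings, and the library's actual computation may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not part of Predict4All.
final class ProbFactorSketch {

    /**
     * Multiplies each probability by its factor (1.0 when no factor is given)
     * and rescales the results so that they sum to 1.
     */
    static Map<String, Double> applyFactors(Map<String, Double> probabilities,
                                            Map<String, Double> factors) {
        Map<String, Double> result = new LinkedHashMap<>();
        double total = 0.0;
        for (Map.Entry<String, Double> entry : probabilities.entrySet()) {
            double factor = factors.getOrDefault(entry.getKey(), 1.0);
            double weighted = entry.getValue() * factor;
            result.put(entry.getKey(), weighted);
            total += weighted;
        }
        if (total > 0.0) {
            final double sum = total;
            // Normalize so the adjusted values form a distribution again
            result.replaceAll((word, value) -> value / sum);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> probabilities = new LinkedHashMap<>();
        probabilities.put("bonjour", 0.5);
        probabilities.put("bonsoir", 0.5);
        // A factor of 3.0 on "bonjour" favours it; "bonsoir" keeps the neutral 1.0
        System.out.println(applyFactors(probabilities, Map.of("bonjour", 3.0)));
    }
}
```

With a factor of 1.0 on every word, the normalized list is identical to the input, which matches the note that 1.0 relies on probabilities only.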
- setScore(double) - Method in class org.predict4all.nlp.prediction.model.AbstractPredictionToCompute
- setSmoothingDiscountValue(double) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- setSmoothingDiscountValueLowerBound(double) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- setSmoothingDiscountValueUpperBound(double) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- setStopWordDictionaryPath(String) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- setTranslationProvider(FrenchDefaultCorrectionRuleGenerator.TranslationProvider) - Static method in class org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator
- setUnknownWordCountThreshold(int) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- setUp(int) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
initializes the hashtable to a prime capacity which is at least
initialCapacity + 1
. - setUpperCaseReplacementThreshold(double) - Method in class org.predict4all.nlp.trainer.configuration.TrainingConfiguration
- SimpleGeneratingCorrection - Class in org.predict4all.nlp.words.correction
- SimpleGeneratingCorrection(String, boolean) - Constructor for class org.predict4all.nlp.words.correction.SimpleGeneratingCorrection
- SimpleWord - Class in org.predict4all.nlp.words.model
- SimpleWord(int, String) - Constructor for class org.predict4all.nlp.words.model.SimpleWord
- SimpleWord(int, String, boolean, boolean, double, boolean, boolean) - Constructor for class org.predict4all.nlp.words.model.SimpleWord
- SingleThreadDoubleAdder - Class in org.predict4all.nlp.utils
-
Similar to
DoubleAdder
but for a single threaded usage.
Just a simple double reference without any overhead. - SingleThreadDoubleAdder(double) - Constructor for class org.predict4all.nlp.utils.SingleThreadDoubleAdder
- size() - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
Returns the number of distinct elements in this collection.
- size() - Method in interface org.predict4all.nlp.parser.StringProducer
- size() - Method in class org.predict4all.nlp.utils.FifoSet
- size() - Method in class org.predict4all.nlp.words.WordDictionary
-
The word count stored in this dictionary.
- SLASH - org.predict4all.nlp.Separator
- SPACE - org.predict4all.nlp.Separator
- SpecialWordMatcher - Class in org.predict4all.nlp.language.french.matcher
- SpecialWordMatcher() - Constructor for class org.predict4all.nlp.language.french.matcher.SpecialWordMatcher
- start() - Static method in class org.predict4all.nlp.parser.matcher.TokenRegexMatcher.TokenRegexMatcherBuilder
- START - org.predict4all.nlp.Tag
- STATIC_TRIE_NODE_SIZE_BYTE - Static variable in class org.predict4all.nlp.ngram.trie.AbstractNGramTrieNode
-
Static node byte size (3 integer, 2 double).
Integer : word id, children size, children position.
Double : frequency, backoff weight. - StaticNGramTrieDictionary - Class in org.predict4all.nlp.ngram.dictionary
-
Represent a static ngram dictionary where trie nodes are loaded "on demand" while browsing through the nodes.
This dictionary is read only and cannot be updated or saved : methods like StaticNGramTrieDictionary.updateProbabilities(double[])
, StaticNGramTrieDictionary.putAndIncrementBy(int[], int)
are not supported by this dictionary. - StaticNGramTrieDictionary() - Constructor for class org.predict4all.nlp.ngram.dictionary.StaticNGramTrieDictionary
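The static node size given for STATIC_TRIE_NODE_SIZE_BYTE above (3 integers and 2 doubles) works out to 28 bytes. The sketch below packs such a record into a ByteBuffer to make the arithmetic concrete. StaticNodeLayoutSketch is a hypothetical class, and the field order used here is an assumption: this index does not state the order actually used by the library.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration, not part of Predict4All.
final class StaticNodeLayoutSketch {

    // 3 integers (word id, children size, children position) + 2 doubles (frequency, backoff weight)
    static final int NODE_SIZE_BYTES = 3 * Integer.BYTES + 2 * Double.BYTES;

    /** Writes one node record into a new buffer, ready to be read from position 0. */
    static ByteBuffer write(int wordId, int childrenSize, int childrenPosition,
                            double frequency, double backoffWeight) {
        ByteBuffer buffer = ByteBuffer.allocate(NODE_SIZE_BYTES);
        buffer.putInt(wordId).putInt(childrenSize).putInt(childrenPosition);
        buffer.putDouble(frequency).putDouble(backoffWeight);
        buffer.flip(); // switch from writing to reading
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = write(42, 3, 1024, 0.25, 0.5);
        System.out.println("node size in bytes: " + buffer.remaining());
        System.out.println("word id: " + buffer.getInt());
    }
}
```

A fixed record size is what makes on-demand loading practical: the n-th child of a node can be located by offset arithmetic instead of scanning the file.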
- StaticNGramTrieNode - Class in org.predict4all.nlp.ngram.trie
-
Represent a static ngram trie node : used when nodes are only read to retrieve information and compute probabilities, and children are never updated.
This node is particular because children nodes are loaded on demand from a FileChannel
.
This node is produced in a read only version : to create this node, DynamicNGramTrieNode
and TrainingNGramDictionary
should be used. - StaticNGramTrieNode() - Constructor for class org.predict4all.nlp.ngram.trie.StaticNGramTrieNode
- StopWordDictionary - Interface in org.predict4all.nlp.language
-
A language specific dictionary : contains every stop word for a language
- strEquals(String, String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- strEqualsIgnoreCase(String, String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- StringProducer - Interface in org.predict4all.nlp.parser
- strSplit(String, String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- substringInCurrentPart(int, int) - Method in interface org.predict4all.nlp.words.correction.GeneratingCorrectionI
- substringInCurrentPart(int, int) - Method in class org.predict4all.nlp.words.correction.SimpleGeneratingCorrection
- sum() - Method in class org.predict4all.nlp.utils.SingleThreadDoubleAdder
T
- TAB - org.predict4all.nlp.Separator
- Tag - Enum in org.predict4all.nlp
-
Represent a specific value in a corpus.
Useful to tag specific parts of the corpus without any semantic information.
START : represents a sentence start ; UNKNOWN : represents a word/expression out of vocabulary - TagToken - Class in org.predict4all.nlp.parser.token
- TagWord - Class in org.predict4all.nlp.words.model
- TagWord(Tag) - Constructor for class org.predict4all.nlp.words.model.TagWord
- TermMatcherUtils - Class in org.predict4all.nlp.parser.matcher
- testAll() - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- then(String) - Method in class org.predict4all.nlp.parser.matcher.TokenRegexMatcher.TokenRegexMatcherBuilder
- then(Separator) - Method in class org.predict4all.nlp.parser.matcher.TokenRegexMatcher.TokenRegexMatcherBuilder
- Token - Interface in org.predict4all.nlp.parser.token
-
Represent the lowest unit when parsing a text.
- TOKEN_CONVERT - org.predict4all.nlp.trainer.step.TrainingStep
- TOKEN_COUNT_FORMAT - Static variable in class org.predict4all.nlp.parser.Tokenizer
- TokenAppender - Interface in org.predict4all.nlp.parser
- TokenConverter - Class in org.predict4all.nlp.parser.matcher
-
This token converter will convert input token list to another token list, with matched
TokenMatcher
pattern. - TokenConverter(TokenMatcher[]) - Constructor for class org.predict4all.nlp.parser.matcher.TokenConverter
- TokenConverterTrainingDocument - Class in org.predict4all.nlp.trainer.step
- TokenConverterTrainingDocument(File, File) - Constructor for class org.predict4all.nlp.trainer.step.TokenConverterTrainingDocument
- TokenFileInputStream - Class in org.predict4all.nlp.io
- TokenFileInputStream(File) - Constructor for class org.predict4all.nlp.io.TokenFileInputStream
- TokenFileOutputStream - Class in org.predict4all.nlp.io
- TokenFileOutputStream(File) - Constructor for class org.predict4all.nlp.io.TokenFileOutputStream
- tokenize(String) - Method in class org.predict4all.nlp.parser.Tokenizer
- tokenize(TrainingCorpus) - Method in class org.predict4all.nlp.parser.Tokenizer
- Tokenizer - Class in org.predict4all.nlp.parser
-
This takes raw text and creates tokens from it.
- Tokenizer(LanguageModel) - Constructor for class org.predict4all.nlp.parser.Tokenizer
- TokenListAppender - Class in org.predict4all.nlp.parser
- TokenListAppender(List<Token>) - Constructor for class org.predict4all.nlp.parser.TokenListAppender
- TokenListProvider - Class in org.predict4all.nlp.parser
- TokenListProvider(Collection<Token>) - Constructor for class org.predict4all.nlp.parser.TokenListProvider
- TokenMatcher - Interface in org.predict4all.nlp.parser.matcher
-
Represent a matcher that will try to detect whether a given token matches a specific pattern.
If so, the PatternMatched
contains the normalized representation of the matched tokens and possibly an EquivalenceClass
. - TokenProvider - Interface in org.predict4all.nlp.parser
- TokenRegexMatcher - Class in org.predict4all.nlp.parser.matcher
- TokenRegexMatcher.TokenRegexMatcherBuilder - Class in org.predict4all.nlp.parser.matcher
- TokenRegexMatcherBuilder() - Constructor for class org.predict4all.nlp.parser.matcher.TokenRegexMatcher.TokenRegexMatcherBuilder
- TokenRegexResult - Class in org.predict4all.nlp.parser.matcher
- TokenRegexResult(Token, List<String>) - Constructor for class org.predict4all.nlp.parser.matcher.TokenRegexResult
- toPrimitive(Integer[]) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- toString() - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- toString() - Method in class org.predict4all.nlp.prediction.WordPrediction
- toString() - Method in class org.predict4all.nlp.utils.Pair
- toString() - Method in class org.predict4all.nlp.utils.Triple
- toString() - Method in class org.predict4all.nlp.words.correction.CachedPrecomputedCorrectionRule
- toString() - Method in class org.predict4all.nlp.words.correction.CorrectionRule
- toString() - Method in class org.predict4all.nlp.words.correction.CorrectionRuleNode
- toString() - Method in class org.predict4all.nlp.words.correction.SimpleGeneratingCorrection
- toString() - Method in class org.predict4all.nlp.words.model.AbstractWord
- toString() - Method in class org.predict4all.nlp.words.model.UserWord
- toString() - Method in class org.predict4all.nlp.words.NextWord
- toString() - Method in class org.predict4all.nlp.words.WordPrefixDetected
- trainDynamicModel(String) - Method in class org.predict4all.nlp.prediction.WordPredictor
-
See
WordPredictor.trainDynamicModel(String, boolean)
called with ignoreLastSentence
set to false - trainDynamicModel(String, boolean) - Method in class org.predict4all.nlp.prediction.WordPredictor
-
Train the current dynamic model (if present and enabled).
This allows the dynamic model to integrate the given text as validated input for next predictions.
This will also update the "seen" words (Word.getUsageCount()
) and add non-existing words to the dictionary if PredictionParameter.isAddNewWordsEnabled()
is true.
Always train your dynamic model with correct input.
This method doesn't save the dynamic model : it should then be saved with TrainingNGramDictionary.saveDictionary(File)
- TrainerTask - Class in org.predict4all.nlp.trainer
- TrainerTask(ProgressIndicator, AbstractTrainingDocument) - Constructor for class org.predict4all.nlp.trainer.TrainerTask
- TrainingConfiguration - Class in org.predict4all.nlp.trainer.configuration
- TrainingCorpus - Class in org.predict4all.nlp.trainer.corpus
- TrainingCorpus(int, File, File, String) - Constructor for class org.predict4all.nlp.trainer.corpus.TrainingCorpus
- TrainingNGramDictionary - Class in org.predict4all.nlp.ngram.dictionary
-
Represent a training dictionary : a ngram dictionary used while training an ngram model.
This dictionary is useful because it supports dynamic insertion and probabilities computing... - TrainingNGramDictionary(int) - Constructor for class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
- TrainingNGramDictionary(DynamicNGramTrieNode, int) - Constructor for class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
- TrainingStep - Enum in org.predict4all.nlp.trainer.step
-
Represent the possible training steps.
This allows training to be stopped and started again at a specific step : e.g. running up to the converted tokens, and then running WORDS_DICTIONARY multiple times. - transformValues(TObjectFunction<V, V>) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- transitive(CachedPrecomputedCorrectionRule, CachedPrecomputedCorrectionRule) - Static method in class org.predict4all.nlp.words.correction.CachedPrecomputedCorrectionRule
- transitivePossible(CachedPrecomputedCorrectionRule) - Method in class org.predict4all.nlp.words.correction.CachedPrecomputedCorrectionRule
- translate(String, Object...) - Method in interface org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.TranslationProvider
- TrieNodeMap<V> - Class in org.predict4all.nlp.ngram.trie.map
-
Custom implementation copied from
TIntObjectHashMap
but with fewer attributes to reduce the heap size in the trie.
Source is copied from the class hierarchy (with methods merged manually): THash
, TPrimitiveHash
, TIntHash
, TIntObjectHashMap
.
The implementation is modified to keep the attribute count on this Map to a minimum, because a TrieNodeMap is created a very large number of times. - TrieNodeMap() - Constructor for class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
Creates a new
THash
instance with a prime capacity at or near the minimum needed to hold initialCapacity
elements with load factor loadFactor
without triggering a rehash. - TrieNodeMapConstant - Class in org.predict4all.nlp.ngram.trie.map
- TrieNodeMapConstant() - Constructor for class org.predict4all.nlp.ngram.trie.map.TrieNodeMapConstant
- trimToSize() - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
This simply calls
compact
. - Triple<K,T,V> - Class in org.predict4all.nlp.utils
- TWO_DIGIT_FORMAT_ALWAYS - Static variable in class org.predict4all.nlp.language.french.FrenchLanguageUtils
- TWO_DIGIT_FORMAT_SOMETIMES - Static variable in class org.predict4all.nlp.language.french.FrenchLanguageUtils
- TWOPOINT - org.predict4all.nlp.Separator
- TYPE_EQUIVALENCE_CLASS - Static variable in interface org.predict4all.nlp.parser.token.Token
- TYPE_EQUIVALENCE_CLASS - Static variable in interface org.predict4all.nlp.words.model.Word
- TYPE_NGRAM_TAG - Static variable in interface org.predict4all.nlp.words.model.Word
- TYPE_SEPARATOR - Static variable in interface org.predict4all.nlp.parser.token.Token
- TYPE_SIMPLE - Static variable in interface org.predict4all.nlp.words.model.Word
- TYPE_TAG - Static variable in interface org.predict4all.nlp.parser.token.Token
- TYPE_USER_WORD - Static variable in interface org.predict4all.nlp.words.model.Word
- TYPE_WORD - Static variable in interface org.predict4all.nlp.parser.token.Token
U
- uncapitalize(String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- UniquePredictionToCompute - Class in org.predict4all.nlp.prediction.model
- UniquePredictionToCompute(int, double, boolean, StringBuilder) - Constructor for class org.predict4all.nlp.prediction.model.UniquePredictionToCompute
- UNKNOWN - org.predict4all.nlp.Tag
- updateProbabilities(double[]) - Method in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
-
Update all the probabilities in this dictionary.
Can take a while if there are a lot of nodes in the dictionary. - updateProbabilities(double[]) - Method in class org.predict4all.nlp.ngram.dictionary.StaticNGramTrieDictionary
- updateProbabilities(double[]) - Method in class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
- updateProbabilities(int[], int, double[]) - Method in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
-
Update probabilities in this dictionary for a specific ngram prefix : this will update the probabilities of the prefix children, and update the backoff weight of the parent node.
This is much more optimized than AbstractNGramDictionary.updateProbabilities(double[])
- updateProbabilities(int[], int, double[]) - Method in class org.predict4all.nlp.ngram.dictionary.StaticNGramTrieDictionary
- updateProbabilities(int[], int, double[]) - Method in class org.predict4all.nlp.ngram.dictionary.TrainingNGramDictionary
- upperCase(String) - Static method in class org.predict4all.nlp.utils.Predict4AllUtils
- UserWord - Class in org.predict4all.nlp.words.model
- UserWord(int, String) - Constructor for class org.predict4all.nlp.words.model.UserWord
- UserWord(int, String, double, boolean, boolean, long, int) - Constructor for class org.predict4all.nlp.words.model.UserWord
V
- valueOf(String) - Static method in enum org.predict4all.nlp.EquivalenceClass
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.predict4all.nlp.Separator
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.predict4all.nlp.Tag
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.predict4all.nlp.trainer.configuration.NGramPruningMethod
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.predict4all.nlp.trainer.step.TrainingStep
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.predict4all.nlp.words.correction.CorrectionRuleNodeType
-
Returns the enum constant of this type with the specified name.
- values() - Static method in enum org.predict4all.nlp.EquivalenceClass
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- values() - Static method in enum org.predict4all.nlp.Separator
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum org.predict4all.nlp.Tag
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum org.predict4all.nlp.trainer.configuration.NGramPruningMethod
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum org.predict4all.nlp.trainer.step.TrainingStep
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum org.predict4all.nlp.words.correction.CorrectionRuleNodeType
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values(V[]) - Method in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
- VERSION - Static variable in class org.predict4all.nlp.Predict4AllInfo
- VISUAL_CONFUSION - org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
W
- WEEK_DAYS - Static variable in class org.predict4all.nlp.language.french.FrenchLanguageUtils
- WEIGHTED_DIFFERENCE_FULL_PROB - org.predict4all.nlp.trainer.configuration.NGramPruningMethod
- WEIGHTED_DIFFERENCE_RAW_PROB - org.predict4all.nlp.trainer.configuration.NGramPruningMethod
- withBidirectional(boolean) - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
Bidirectional indicates that the resulting rule acts both ways : errors will be replaced with replacements and replacements will be replaced with errors.
This is almost the same as CorrectionRule.withConfusionSet(String...)
, the difference being that in a confusion set, errors can be replaced by errors and replacements can be replaced with replacements, which is not the case in bidirectional rules.
If you have only one error and one replacement, confusion set and bidirectional will result in the same rule. - withConfusionSet(String...) - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
Helper method to allow having the same errors and replacements.
This is useful if you want to create a confusion set, e.g. if the user always swaps two letters or groups of letters interchangeably.
After creating a confusion set, CorrectionRule.getErrors()
and CorrectionRule.getReplacements()
will be the same. - withCost(double) - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
The correction cost will influence how many corrections are accumulated for the same input.
Typically, correction costs are added up to check that they stay below PredictionParameter.getCorrectionMaxCost()
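The difference between withBidirectional(boolean) and withConfusionSet(String...) described above can be made concrete by listing the substitutions each one allows. The sketch below is library-independent: RuleDirectionSketch and its methods are hypothetical names, and substitutions are modelled as plain "from->to" strings rather than real CorrectionRule instances.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration, not part of Predict4All.
final class RuleDirectionSketch {

    /** Bidirectional rule: errors become replacements and replacements become errors. */
    static Set<String> bidirectional(Set<String> errors, Set<String> replacements) {
        Set<String> pairs = new TreeSet<>();
        for (String error : errors) {
            for (String replacement : replacements) {
                pairs.add(error + "->" + replacement);
                pairs.add(replacement + "->" + error);
            }
        }
        return pairs;
    }

    /** Confusion set: every member can be replaced by every other member. */
    static Set<String> confusionSet(Set<String> members) {
        Set<String> pairs = new TreeSet<>();
        for (String from : members) {
            for (String to : members) {
                if (!from.equals(to)) {
                    pairs.add(from + "->" + to);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Two errors, one replacement: "a" and "b" are never swapped with each other
        System.out.println(bidirectional(Set.of("a", "b"), Set.of("c")));
        // Same three strings as a confusion set: "a" and "b" can also replace each other
        System.out.println(confusionSet(Set.of("a", "b", "c")));
    }
}
```

With a single error and a single replacement, both methods return the same two substitutions, as the description of withBidirectional states.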
- withError(String...) - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
Determine the errors for this correction rule.
Errors define the part of the text that could be replaced.
For example, if the error is "a", every "a" char in user input could potentially be replaced with CorrectionRule.withReplacement(String...)
Errors should not contain any word separator (space, etc...) - withFactor(double) - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
The correction factor will influence how much the correction "counts" relatively to a correct word.
- withMaxIndexFromEnd(int) - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
Max index (from word end), exclusive (e.g. maxIndexFromEnd = 2, never apply the rule on the last two chars)
Useful to ignore word ends. - withMaxIndexFromStart(int) - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
Max index (from word start), exclusive (e.g. maxIndexFromStart = 1, only the first char)
Useful to restrain correction to the word start. - withMinIndexFromEnd(int) - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
Min index, from end, inclusive (inclusive from word end, e.g. if = 1, only correct the last char)
Useful to correct only the last part of a word. - withMinIndexFromStart(int) - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
Min index, from start, inclusive (inclusive from word start, e.g. if = 1, never correct the first char)
Useful to correct only the "middle" area of a word - withNgramCounts(Map<Integer, Pair<Integer, Integer>>) - Method in class org.predict4all.nlp.trainer.DataTrainerResult.Builder
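The four index bounds of CorrectionRule (withMinIndexFromStart, withMaxIndexFromStart, withMinIndexFromEnd, withMaxIndexFromEnd) can be read as a window of character positions in which a rule may apply. The sketch below is one interpretation that is consistent with the four examples given in the entries above; the class `IndexWindowSketch`, its fields and the use of a negative value for "bound not set" are hypothetical, and how the library treats multi-character errors is not covered.

```java
/** Hypothetical sketch, not part of the PREDICT4ALL API. */
final class IndexWindowSketch {
    // Assumption for this sketch: a negative value means "bound not set"
    int minIndexFromStart = -1;
    int maxIndexFromStart = -1;
    int minIndexFromEnd = -1;
    int maxIndexFromEnd = -1;

    /** True if the char at the given 0-based index of a word of the given length may be corrected. */
    boolean allows(int index, int wordLength) {
        // Inclusive lower bound counted from the word start
        if (minIndexFromStart >= 0 && index < minIndexFromStart) return false;
        // Exclusive upper bound counted from the word start
        if (maxIndexFromStart >= 0 && index >= maxIndexFromStart) return false;
        // Inclusive bound counted from the word end: only the last N chars
        if (minIndexFromEnd >= 0 && index < wordLength - minIndexFromEnd) return false;
        // Exclusive bound counted from the word end: never the last N chars
        if (maxIndexFromEnd >= 0 && index >= wordLength - maxIndexFromEnd) return false;
        return true;
    }

    /** Mark the allowed positions of a word with '^' and the other positions with '.'. */
    String mask(String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            sb.append(allows(i, word.length()) ? '^' : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        IndexWindowSketch middleOnly = new IndexWindowSketch();
        middleOnly.minIndexFromStart = 1; // never the first char
        middleOnly.maxIndexFromEnd = 2;   // never the last two chars
        System.out.println("maison");
        System.out.println(middleOnly.mask("maison"));
    }
}
```

Combining a minimum from the start with a maximum from the end, as in `main`, restricts the rule to the "middle" area of the word.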
- withReplacement(String...) - Method in class org.predict4all.nlp.words.correction.CorrectionRule
-
Determine the replacements that could be used to correct the CorrectionRule.getErrors().
A replacement can contain at most one word separator (space, etc.): this allows correcting merged words.
- word - Variable in class org.predict4all.nlp.words.model.SimpleWord
- Word - Interface in org.predict4all.nlp.words.model
-
Represent a word stored in a WordDictionary: words are stored with an int ID to optimize memory usage.
- WORD_ENDINGS - org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
- WORD_SPACE_APOSTROPHE - org.predict4all.nlp.language.french.FrenchDefaultCorrectionRuleGenerator.CorrectionRuleType
- WordCorrectionGenerator - Class in org.predict4all.nlp.words.correction
-
Generate possible corrections from an input text and tokens.
Corrections are based on rules (CorrectionRule) and generation is done using a thread pool.
A resulting correction can be a single word or a double word (for example, the error might be a merged word).
- WordCorrectionGenerator(WordDictionary, AbstractNGramDictionary<? extends AbstractNGramTrieNode<?>>, PredictionParameter) - Constructor for class org.predict4all.nlp.words.correction.WordCorrectionGenerator
- WordDictionary - Class in org.predict4all.nlp.words
-
Represent a word dictionary.
This dictionary identifies each sequence of chars as a unique "word" and keeps information for this word.
Each word is identified by a single int ID to save memory and space.
The dictionary itself is identified with a UUID to verify consistency when using a user dictionary.
Note that a Word added to a WordDictionary cannot be removed: its ID should stay consistent and it could have been used in an AbstractNGramDictionary. However, you can disable a word with Word.setForceInvalid(boolean, boolean)
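The idea of stable int IDs with "disable instead of remove" can be sketched as follows. This is a minimal, self-contained illustration and not the WordDictionary implementation: the class `WordIdSketch` and its methods `putWord`, `getWord`, `disable` and `isValid` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not part of the PREDICT4ALL API. */
final class WordIdSketch {
    private final Map<String, Integer> idsByWord = new HashMap<>();
    private final List<String> wordsById = new ArrayList<>();
    // One bit per ID: set when the word has been disabled
    private final BitSet disabled = new BitSet();

    /** Return the ID of the word, assigning the next free ID if the word is new. */
    int putWord(String word) {
        Integer existing = idsByWord.get(word);
        if (existing != null) {
            return existing;
        }
        int id = wordsById.size();
        idsByWord.put(word, id);
        wordsById.add(word);
        return id;
    }

    String getWord(int id) {
        return wordsById.get(id);
    }

    /** Words are never removed, so IDs stored elsewhere (e.g. in ngrams) stay valid. */
    void disable(int id) {
        disabled.set(id);
    }

    boolean isValid(int id) {
        return !disabled.get(id);
    }

    public static void main(String[] args) {
        WordIdSketch dictionary = new WordIdSketch();
        int chat = dictionary.putWord("chat");
        int chien = dictionary.putWord("chien");
        dictionary.disable(chat);
        // The disabled word keeps its ID and can still be resolved
        System.out.println(chat + " -> " + dictionary.getWord(chat) + ", valid: " + dictionary.isValid(chat));
        System.out.println(chien + " -> " + dictionary.getWord(chien) + ", valid: " + dictionary.isValid(chien));
    }
}
```

Because other structures only store the int ID, removing an entry would either leave dangling IDs or force renumbering; a validity flag avoids both.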
- WordDictionary(LanguageModel, String) - Constructor for class org.predict4all.nlp.words.WordDictionary
- WordDictionaryGenerator - Class in org.predict4all.nlp.words
-
This will generate a word dictionary from a TrainingCorpus: it detects the different words in the training corpus and tries to filter them (match lower/upper case words, filter on a BaseWordDictionary, exclude low count words, etc.).
- WordDictionaryGenerator(LanguageModel, TrainingConfiguration) - Constructor for class org.predict4all.nlp.words.WordDictionaryGenerator
- WordDictionaryMatchingException - Exception in org.predict4all.nlp.exception
-
This exception is mainly thrown if a user dictionary is loaded but was saved from a previous dictionary.
- WordDictionaryMatchingException() - Constructor for exception org.predict4all.nlp.exception.WordDictionaryMatchingException
- WordDictionaryTrainingDocument - Class in org.predict4all.nlp.trainer.step
- WordDictionaryTrainingDocument(File, File) - Constructor for class org.predict4all.nlp.trainer.step.WordDictionaryTrainingDocument
- WordFileInputStream - Class in org.predict4all.nlp.io
- WordFileInputStream(File) - Constructor for class org.predict4all.nlp.io.WordFileInputStream
- WordFileOutputStream - Class in org.predict4all.nlp.io
- WordFileOutputStream(File) - Constructor for class org.predict4all.nlp.io.WordFileOutputStream
- WordPrediction - Class in org.predict4all.nlp.prediction
-
Represent a prediction from WordPredictor
- WordPredictionResult - Class in org.predict4all.nlp.prediction
-
Contains the result from WordPredictor.
- WordPredictor - Class in org.predict4all.nlp.prediction
-
Main entry point of the PREDICT4ALL API.
An instance of WordPredictor can predict next words, current word ends and even current corrections.
The predictor mainly relies on two items, an ngram dictionary and a word dictionary, to search for words and existing sequences.
Additionally, a dynamic model can be provided to combine static ngrams originating from an already learned generic model with a dynamic model specific to a user, profile, application...
The predictor configuration is located in PredictionParameter: the instance provided on WordPredictor creation can be modified later.
- WordPredictor(PredictionParameter, WordDictionary, AbstractNGramDictionary<? extends AbstractNGramTrieNode<?>>) - Constructor for class org.predict4all.nlp.prediction.WordPredictor
-
Create a predictor without dynamic model
- WordPredictor(PredictionParameter, WordDictionary, AbstractNGramDictionary<? extends AbstractNGramTrieNode<?>>, AbstractNGramDictionary<? extends DynamicNGramTrieNode>) - Constructor for class org.predict4all.nlp.prediction.WordPredictor
-
Create a predictor with dynamic model
- WordPrefixDetected - Class in org.predict4all.nlp.words
-
Contains information about a started word (found in dictionary)
- WordPrefixDetected(String, int, Map<BiIntegerKey, NextWord>, boolean) - Constructor for class org.predict4all.nlp.words.WordPrefixDetected
- WordPrefixDetector - Class in org.predict4all.nlp.words
-
Useful to detect if an existing word is started in a token list.
It's important to detect if a word is already started when predicting the next word, because the prediction result should always contain predictions that start like the already started word.
Because words are allowed to have word separators inside (hyphen, etc.), started word detection is much more complicated than just checking if the token list ends with a token separator.
- WordPrefixDetector(WordDictionary, WordCorrectionGenerator, PredictionParameter) - Constructor for class org.predict4all.nlp.words.WordPrefixDetector
- WORDS_DICTIONARY - org.predict4all.nlp.trainer.step.TrainingStep
- WordToken - Class in org.predict4all.nlp.parser.token
- WordToken(String) - Constructor for class org.predict4all.nlp.parser.token.WordToken
- writeDictionaryInfo(ByteBuffer) - Method in class org.predict4all.nlp.ngram.dictionary.AbstractNGramDictionary
-
Write the general information for this dictionary to a given buffer
- writeInformations(int) - Method in class org.predict4all.nlp.trainer.corpus.AbstractTrainingDocument
- writeLevelForDynamicUse(FileChannel, int, int, int) - Method in class org.predict4all.nlp.ngram.trie.DynamicNGramTrieNode
-
Write a trie level with DynamicNGramTrieNode.writeLevel(FileChannel, int, int, int, BiConsumer), using DynamicNGramTrieNode.writeDynamicNode(FileChannel, int) as the save method.
The resulting trie file should be read as DynamicNGramTrieNode
- writeLevelForStaticUse(FileChannel, int, int, int) - Method in class org.predict4all.nlp.ngram.trie.DynamicNGramTrieNode
-
Write a trie level with DynamicNGramTrieNode.writeLevel(FileChannel, int, int, int, BiConsumer), using DynamicNGramTrieNode.writeStaticNode(FileChannel, int) as the save method.
The resulting trie file should be read as StaticNGramTrieNode
- writeToken(Token) - Method in class org.predict4all.nlp.io.TokenFileOutputStream
- writeWord(Word) - Method in class org.predict4all.nlp.io.WordFileOutputStream
_
- _free - Variable in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
the current number of free slots in the hash.
- _maxSize - Variable in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
The maximum number of elements allowed without allocating more space.
- _set - Variable in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
the set of ints
- _size - Variable in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
the current number of occupied slots in the hash.
- _states - Variable in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
flags indicating whether each position in the hash is FREE, FULL, or REMOVED
- _values - Variable in class org.predict4all.nlp.ngram.trie.map.TrieNodeMap
-
the values of the map
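The TrieNodeMap fields listed above (_set, _values, _states, _size, _free) match the usual layout of an open-addressing hash map for int keys. The sketch below shows how FREE, FULL and REMOVED slot states typically interact in such a map. It is a simplified illustration under stated assumptions, not the TrieNodeMap code: the class `OpenAddressingSketch` is hypothetical, it uses linear probing, and it omits the resizing that _maxSize would trigger.

```java
/** Hypothetical sketch, not part of the PREDICT4ALL API. */
final class OpenAddressingSketch {
    static final byte FREE = 0;    // slot never used
    static final byte FULL = 1;    // slot holds a live entry
    static final byte REMOVED = 2; // slot held an entry that was removed

    private final int[] set;       // keys
    private final Object[] values; // values, parallel to the keys
    private final byte[] states;   // state of each slot
    private int size;              // number of FULL slots
    private int free;              // number of FREE slots

    OpenAddressingSketch(int capacity) {
        set = new int[capacity];
        values = new Object[capacity];
        states = new byte[capacity];
        free = capacity;
    }

    /** Slot index of the key, or -1 if the key is absent. */
    private int indexOf(int key) {
        int start = Math.floorMod(key, set.length);
        for (int probe = 0; probe < set.length; probe++) {
            int i = (start + probe) % set.length;
            if (states[i] == FREE) return -1; // a never-used slot ends the probe chain
            if (states[i] == FULL && set[i] == key) return i;
            // REMOVED slot or another key: keep probing
        }
        return -1;
    }

    void put(int key, Object value) {
        int existing = indexOf(key);
        if (existing >= 0) {
            values[existing] = value;
            return;
        }
        int start = Math.floorMod(key, set.length);
        for (int probe = 0; probe < set.length; probe++) {
            int i = (start + probe) % set.length;
            if (states[i] != FULL) {
                if (states[i] == FREE) free--; // reusing a REMOVED slot does not consume a FREE one
                set[i] = key;
                values[i] = value;
                states[i] = FULL;
                size++;
                return;
            }
        }
        throw new IllegalStateException("table is full (this sketch does not resize)");
    }

    Object get(int key) {
        int i = indexOf(key);
        return i < 0 ? null : values[i];
    }

    boolean remove(int key) {
        int i = indexOf(key);
        if (i < 0) return false;
        states[i] = REMOVED; // keep the probe chain intact for keys stored after this slot
        values[i] = null;
        size--;
        return true;
    }

    int size() { return size; }

    int free() { return free; }
}
```

The REMOVED state is what keeps lookups correct after a deletion: if the slot were reset to FREE, a key that collided with the removed one and was stored further along the probe chain would no longer be found.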