Package org.predict4all.nlp.words
Class WordDictionaryGenerator
java.lang.Object
    org.predict4all.nlp.words.WordDictionaryGenerator
public class WordDictionaryGenerator extends java.lang.Object
Generates a word dictionary from a TrainingCorpus: it detects the distinct words in the training corpus and filters them, for example by matching lower/upper case variants of the same word, filtering against a BaseWordDictionary, and excluding low-count words.
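The filtering steps described above can be illustrated with a minimal, standalone sketch. It does not use the Predict4All classes and is not the library's implementation: the class `WordFilterSketch`, the method `buildDictionary`, the `minCount` parameter, and the specific rules (fold a word to lowercase when the lowercase form is known, always keep base-dictionary words) are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: NOT part of org.predict4all; names and rules are assumptions.
public class WordFilterSketch {

    /**
     * Counts tokens, folds a word into its lowercase form when that form is
     * known (in the base words or seen in the corpus), then drops words that
     * are neither base words nor frequent enough.
     */
    public static Map<String, Integer> buildDictionary(
            List<String> tokens, Set<String> baseWords, int minCount) {
        // Step 1: raw counts, case preserved
        Map<String, Integer> raw = new HashMap<>();
        for (String token : tokens) {
            raw.merge(token, 1, Integer::sum);
        }

        // Step 2: match lower/upper case variants of the same word
        Map<String, Integer> merged = new HashMap<>();
        for (Map.Entry<String, Integer> entry : raw.entrySet()) {
            String word = entry.getKey();
            String lower = word.toLowerCase(Locale.ROOT);
            boolean lowerKnown = baseWords.contains(lower) || raw.containsKey(lower);
            merged.merge(lowerKnown ? lower : word, entry.getValue(), Integer::sum);
        }

        // Step 3: keep base-dictionary words, exclude other low-count words
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : merged.entrySet()) {
            if (baseWords.contains(entry.getKey()) || entry.getValue() >= minCount) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens =
                List.of("The", "cat", "sat", "the", "Cat", "Paris", "Paris", "xyzzy");
        Set<String> baseWords = Set.of("the", "cat", "sat");
        System.out.println(buildDictionary(tokens, baseWords, 2));
    }
}
```

In this sketch "The" and "the" are counted together, "Paris" keeps its capital because no lowercase form is known, and "xyzzy" is dropped because it occurs once and is not a base word. The real generator may apply different or additional rules.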
Field Summary
Fields
Modifier and Type | Field | Description
static java.text.DecimalFormat | COUNT_FORMAT |
static java.text.DecimalFormat | PERCENT_FORMAT |
Constructor Summary
Constructors
Constructor | Description
WordDictionaryGenerator(LanguageModel languageModel, TrainingConfiguration trainingConfiguration) |
Method Summary
All Methods
Modifier and Type | Method | Description
void | createWordDictionary(TrainingCorpus corpus, java.util.function.Consumer<java.util.List<TrainerTask>> taskExecutor, java.io.File dictionaryOuputFile) |
Constructor Detail
WordDictionaryGenerator
public WordDictionaryGenerator(LanguageModel languageModel, TrainingConfiguration trainingConfiguration) throws java.io.IOException
- Throws:
java.io.IOException
Method Detail
createWordDictionary
public void createWordDictionary(TrainingCorpus corpus, java.util.function.Consumer<java.util.List<TrainerTask>> taskExecutor, java.io.File dictionaryOuputFile) throws java.io.FileNotFoundException, java.io.IOException
- Throws:
java.io.FileNotFoundException
java.io.IOException
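The `taskExecutor` parameter is a `Consumer` that receives a list of tasks, so the caller decides how those tasks are run. The sketch below shows one possible shape for such a consumer, assuming the generator expects every task to have finished when `accept` returns (the signature alone does not confirm this). It uses plain `Runnable` in place of `TrainerTask`, whose API is not described here; `TaskExecutorSketch`, `blockingExecutor`, and `runDemo` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch: Runnable stands in for TrainerTask (an assumption).
public class TaskExecutorSketch {

    /** Builds a consumer that runs every task on a fixed pool and blocks until all are done. */
    public static Consumer<List<Runnable>> blockingExecutor(int threads) {
        return tasks -> {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            try {
                for (Runnable task : tasks) {
                    pool.execute(task);
                }
            } finally {
                // No new tasks are accepted; already queued tasks still run
                pool.shutdown();
            }
            try {
                if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                    throw new IllegalStateException("Tasks did not finish in time");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for tasks", e);
            }
        };
    }

    /** Runs three counting tasks through the executor and returns how many completed. */
    public static int runDemo() {
        AtomicInteger processed = new AtomicInteger();
        List<Runnable> tasks = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            tasks.add(() -> processed.incrementAndGet());
        }
        blockingExecutor(2).accept(tasks);
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println("Completed tasks: " + runDemo());
    }
}
```

Passing the executor in as a `Consumer` keeps threading policy out of the generator: a caller could equally supply a single-threaded consumer that simply runs each task in order.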