Package org.predict4all.nlp.words
Class WordDictionaryGenerator
- java.lang.Object
-
- org.predict4all.nlp.words.WordDictionaryGenerator
-
public class WordDictionaryGenerator extends Object
Generates a word dictionary from a TrainingCorpus: it detects the distinct words in the training corpus and then filters them, for example by matching lowercase/uppercase variants of the same word, filtering on a BaseWordDictionary, and excluding low-count words.
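The filtering steps named above can be illustrated with a small, self-contained sketch. It is not the library's implementation: the class WordFilterSketch, the method filter, the minCount threshold, and the exact merge and exclusion rules are all hypothetical, chosen only to show one plausible reading of "match lower/upper case words, filter on a base dictionary, exclude low count words". A plain Set<String> stands in for the BaseWordDictionary.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; not part of org.predict4all.
public class WordFilterSketch {

    /**
     * Counts tokens, merges case variants, and keeps a word only if it is
     * in the base dictionary or occurs at least minCount times.
     * All three rules are assumptions made for this sketch.
     */
    public static Map<String, Integer> filter(List<String> tokens,
                                              Set<String> baseDictionary,
                                              int minCount) {
        // Step 1: count each surface form exactly as written in the corpus.
        Map<String, Integer> rawCounts = new HashMap<>();
        for (String token : tokens) {
            rawCounts.merge(token, 1, Integer::sum);
        }

        // Step 2: fold a form into its lowercase variant when that variant
        // is known (seen in the corpus or listed in the base dictionary).
        Map<String, Integer> merged = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : rawCounts.entrySet()) {
            String word = entry.getKey();
            String lower = word.toLowerCase(Locale.ROOT);
            boolean lowerKnown = rawCounts.containsKey(lower) || baseDictionary.contains(lower);
            merged.merge(lowerKnown ? lower : word, entry.getValue(), Integer::sum);
        }

        // Step 3: drop rare words unless the base dictionary contains them.
        merged.entrySet().removeIf(
                e -> e.getValue() < minCount && !baseDictionary.contains(e.getKey()));
        return merged;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("The", "cat", "sat", "the", "Cat", "Paris", "Paris", "xzq");
        Set<String> base = Set.of("the", "cat", "sat");
        System.out.println(filter(tokens, base, 2));
    }
}
```

In this sketch "The" and "Cat" are folded into "the" and "cat", "Paris" survives because it reaches the count threshold, "sat" survives because the base dictionary lists it, and "xzq" is dropped as a rare unknown word. The real generator takes its thresholds and rules from the TrainingConfiguration and LanguageModel passed to its constructor.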
-
-
Field Summary
Fields
Modifier and Type        Field             Description
static DecimalFormat     COUNT_FORMAT
static DecimalFormat     PERCENT_FORMAT
-
Constructor Summary
Constructors
Constructor        Description
WordDictionaryGenerator(LanguageModel languageModel, TrainingConfiguration trainingConfiguration)
-
Method Summary
Modifier and Type        Method        Description
void                     createWordDictionary(TrainingCorpus corpus, Consumer<List<TrainerTask>> taskExecutor, File dictionaryOuputFile)
-
-
-
Field Detail
-
PERCENT_FORMAT
public static final DecimalFormat PERCENT_FORMAT
-
COUNT_FORMAT
public static final DecimalFormat COUNT_FORMAT
-
-
Constructor Detail
-
WordDictionaryGenerator
public WordDictionaryGenerator(LanguageModel languageModel, TrainingConfiguration trainingConfiguration) throws IOException
- Throws:
IOException
-
-
Method Detail
-
createWordDictionary
public void createWordDictionary(TrainingCorpus corpus, Consumer<List<TrainerTask>> taskExecutor, File dictionaryOuputFile) throws FileNotFoundException, IOException
- Throws:
FileNotFoundException
IOException
-
-