Class WordDictionaryGenerator


  • public class WordDictionaryGenerator
    extends java.lang.Object
    Generates a word dictionary from a TrainingCorpus. The generator detects the distinct words in the training corpus and filters them: it matches lower-case and upper-case variants of the same word, filters against a BaseWordDictionary, excludes words with a low count, etc.
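    The filtering steps named above can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: the class WordFilterSketch, the method filter, the minCount parameter, and the plain Set<String> standing in for a BaseWordDictionary are all hypothetical. The sketch also assumes that words found in the base dictionary are kept regardless of their count; the real generator may treat them differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names belong to the documented API.
public class WordFilterSketch {

    /**
     * Counts tokens, merges lower/upper case variants, and keeps a word only if
     * it is in baseWords (assumed lower-case) or occurs at least minCount times.
     * Returns the most frequent surface form of each kept word with its total count.
     */
    public static Map<String, Integer> filter(List<String> tokens,
                                              Set<String> baseWords,
                                              int minCount) {
        // Step 1: count every distinct surface form found in the corpus.
        Map<String, Integer> surfaceCounts = new HashMap<>();
        for (String token : tokens) {
            surfaceCounts.merge(token, 1, Integer::sum);
        }

        // Step 2: group case variants under a lower-case key, summing their
        // counts and remembering the most frequent surface form.
        Map<String, Integer> totalByKey = new HashMap<>();
        Map<String, String> bestFormByKey = new HashMap<>();
        for (Map.Entry<String, Integer> entry : surfaceCounts.entrySet()) {
            String form = entry.getKey();
            int count = entry.getValue();
            String key = form.toLowerCase(Locale.ROOT);
            totalByKey.merge(key, count, Integer::sum);

            String best = bestFormByKey.get(key);
            if (best == null) {
                bestFormByKey.put(key, form);
            } else {
                int bestCount = surfaceCounts.get(best);
                // On a tie, String ordering prefers the lower-case form.
                if (count > bestCount || (count == bestCount && form.compareTo(best) > 0)) {
                    bestFormByKey.put(key, form);
                }
            }
        }

        // Step 3: keep base-dictionary words and words that are frequent enough.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : totalByKey.entrySet()) {
            String key = entry.getKey();
            int total = entry.getValue();
            if (baseWords.contains(key) || total >= minCount) {
                result.put(bestFormByKey.get(key), total);
            }
        }
        return result;
    }
}
```

    With the tokens "The", "the", "the", "Paris", "Paris", "cat", "xyzzy", a base set containing "cat", and minCount = 2, the sketch keeps "the" (count 3), "Paris" (count 2), and "cat" (count 1, kept through the base set) and drops "xyzzy".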
    • Field Detail

      • PERCENT_FORMAT

        public static final java.text.DecimalFormat PERCENT_FORMAT
      • COUNT_FORMAT

        public static final java.text.DecimalFormat COUNT_FORMAT
    • Constructor Detail

      • WordDictionaryGenerator

        public WordDictionaryGenerator(LanguageModel languageModel,
                                       TrainingConfiguration trainingConfiguration)
                                throws java.io.IOException
        Throws:
        java.io.IOException
    • Method Detail

      • createWordDictionary

        public void createWordDictionary(TrainingCorpus corpus,
                                         java.util.function.Consumer<java.util.List<TrainerTask>> taskExecutor,
                                         java.io.File dictionaryOuputFile)
                                  throws java.io.FileNotFoundException,
                                         java.io.IOException
        Throws:
        java.io.FileNotFoundException
        java.io.IOException