Class TextConfig


  • public final class TextConfig
    extends Object
    Captures a set of key metrics for a given language.
    • Constructor Summary

      Constructors 
      Constructor Description
      TextConfig(int longWord, double averageLow, double averageHigh, int alphaSpacePercentage, int simplePercentage, int maxLength, String sentenceBreak, String wordBreak, String punctuation, String[] starts)
    • Method Summary

      Modifier and Type Method Description
      int getAlphaSpacePercentage()
      An estimate of the percentage of 'alpha' or space (isWhiteSpace()) characters that we expect to be present.
      double getAverageHigh()
      A reasonable upper bound for the average word length in the language.
      double getAverageLow()
      A reasonable lower bound for the average word length in the language.
      int getLongWord()
      The maximum length we expect any likely word to be in the target language.
      int getMaxLength()
      The maximum number of characters to analyze in the input.
      String getPunctuation()
      The Punctuation characters.
      String getSentenceBreak()
      The Sentence Break characters.
      int getSimplePercentage()
      An estimate of the percentage of 'reasonable' characters that we expect to be present.
      Set<String> getStarts()
      The 'likely' set of two-character initial stems.
      String getWordBreak()
      The Word Break characters.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • TextConfig

        public TextConfig(int longWord,
                          double averageLow,
                          double averageHigh,
                          int alphaSpacePercentage,
                          int simplePercentage,
                          int maxLength,
                          String sentenceBreak,
                          String wordBreak,
                          String punctuation,
                          String[] starts)
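
A minimal sketch of calling the constructor, with one argument per documented parameter. So that it compiles on its own, the sketch declares a hypothetical stand-in class with the same constructor signature and a few of the documented getters; the stand-in's bodies, the copy of String[] starts into a Set, and every value passed for English are assumptions for illustration, not values taken from the library.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public final class TextConfigSketch {

    // Hypothetical stand-in so the sketch is self-contained. Only the
    // constructor signature and the getter names come from the documentation;
    // the bodies are assumptions.
    public static final class TextConfig {
        private final int longWord;
        private final double averageLow;
        private final double averageHigh;
        private final int maxLength;
        private final Set<String> starts;

        public TextConfig(int longWord, double averageLow, double averageHigh,
                          int alphaSpacePercentage, int simplePercentage, int maxLength,
                          String sentenceBreak, String wordBreak, String punctuation,
                          String[] starts) {
            this.longWord = longWord;
            this.averageLow = averageLow;
            this.averageHigh = averageHigh;
            this.maxLength = maxLength;
            // Assumption: the array is copied into the Set returned by getStarts().
            this.starts = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(starts)));
        }

        public int getLongWord() { return longWord; }
        public double getAverageLow() { return averageLow; }
        public double getAverageHigh() { return averageHigh; }
        public int getMaxLength() { return maxLength; }
        public Set<String> getStarts() { return starts; }
    }

    // Illustrative values for English; none are taken from the library.
    public static TextConfig english() {
        return new TextConfig(
                20,                               // longWord
                3.0,                              // averageLow
                9.0,                              // averageHigh
                85,                               // alphaSpacePercentage
                95,                               // simplePercentage
                1024,                             // maxLength
                ".?!",                            // sentenceBreak
                " -/",                            // wordBreak
                ",;:'\"()",                       // punctuation
                new String[] {"fo", "th", "wh"}); // starts
    }

    public static void main(String[] args) {
        TextConfig config = english();
        System.out.println(config.getLongWord());
        System.out.println(config.getStarts().contains("fo"));
    }
}
```

With the real class on the classpath, the stand-in would be dropped and only the constructor call kept.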
    • Method Detail

      • getLongWord

        public int getLongWord()
        The maximum length we expect any likely word to be in the target language.
        Returns:
        The maximum length we expect any likely word to be in the target language.
      • getAverageLow

        public double getAverageLow()
        A reasonable lower bound for the average word length in the language.
        Returns:
        A reasonable lower bound for the average word length in the language.
      • getAverageHigh

        public double getAverageHigh()
        A reasonable upper bound for the average word length in the language.
        Returns:
        A reasonable upper bound for the average word length in the language.
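
One way the two bounds might be used is to compare them with the observed average word length of a sample. The sketch below is an assumption about usage rather than the library's own logic: the helper names averageWordLength and withinBounds are hypothetical, words are split on whitespace only, and the bounds are treated as inclusive.

```java
public final class AverageLengthSketch {

    /** Mean length of whitespace-separated words; 0.0 for blank input. */
    public static double averageWordLength(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return 0.0;
        }
        String[] words = trimmed.split("\\s+");
        int total = 0;
        for (String word : words) {
            total += word.length();
        }
        return (double) total / words.length;
    }

    /** True if the average lies between the two bounds (inclusive by assumption). */
    public static boolean withinBounds(double average, double low, double high) {
        return average >= low && average <= high;
    }

    public static void main(String[] args) {
        double average = averageWordLength("the quick brown fox");
        // low and high would come from getAverageLow() and getAverageHigh().
        System.out.println(withinBounds(average, 3.0, 9.0));
    }
}
```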
      • getAlphaSpacePercentage

        public int getAlphaSpacePercentage()
        An estimate of the percentage of 'alpha' or space (isWhiteSpace()) characters that we expect to be present.
        Returns:
        An estimate of the percentage of 'alpha' or space (isWhiteSpace()) characters that we expect to be present.
      • getSimplePercentage

        public int getSimplePercentage()
        An estimate of the percentage of 'reasonable' characters that we expect to be present. Note: the reasonable characters are alphas, digits (in digit-only words), word breaks, spaces, and punctuation.
        Returns:
        An estimate of the percentage of 'reasonable' characters that we expect to be present.
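
The definition of 'reasonable' characters can be turned into a small counting routine. The following is a sketch under stated assumptions, not the library's implementation: the helper name simplePercentage is hypothetical, a word is taken to be a run of characters between spaces, word breaks, and punctuation, and the result is rounded down to a whole percentage.

```java
public final class SimplePercentageSketch {

    // Spaces, word-break characters and punctuation all count as reasonable
    // and also delimit words.
    private static boolean isSeparator(char ch, String wordBreak, String punctuation) {
        return Character.isWhitespace(ch)
                || wordBreak.indexOf(ch) >= 0
                || punctuation.indexOf(ch) >= 0;
    }

    /** Percentage (0 to 100) of 'reasonable' characters in the input. */
    public static int simplePercentage(String input, String wordBreak, String punctuation) {
        if (input.isEmpty()) {
            return 0;
        }
        int reasonable = 0;
        int i = 0;
        while (i < input.length()) {
            if (isSeparator(input.charAt(i), wordBreak, punctuation)) {
                reasonable++;
                i++;
                continue;
            }
            // Scan one word and classify its characters.
            int start = i;
            int alphas = 0;
            boolean allDigits = true;
            while (i < input.length() && !isSeparator(input.charAt(i), wordBreak, punctuation)) {
                char ch = input.charAt(i);
                if (Character.isLetter(ch)) {
                    alphas++;
                }
                if (!Character.isDigit(ch)) {
                    allDigits = false;
                }
                i++;
            }
            // Digits count only when the whole word is digits; otherwise only letters count.
            reasonable += allDigits ? (i - start) : alphas;
        }
        return reasonable * 100 / input.length();
    }

    public static void main(String[] args) {
        System.out.println(simplePercentage("Hello, world 42", "-", ",."));
    }
}
```

The observed value could then be compared with getSimplePercentage() to decide whether the input looks like ordinary text.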
      • getMaxLength

        public int getMaxLength()
        The maximum number of characters to analyze in the input.
        Returns:
        The maximum number of characters to analyze in the input.
      • getSentenceBreak

        public String getSentenceBreak()
        The Sentence Break characters.
        Returns:
        The set of characters used to break paragraphs into sentences.
      • getWordBreak

        public String getWordBreak()
        The Word Break characters.
        Returns:
        The set of characters used to break sentences into words.
      • getPunctuation

        public String getPunctuation()
        The Punctuation characters.
        Returns:
        The set of characters recognized as punctuation.
      • getStarts

        public Set<String> getStarts()
        The 'likely' set of two-character initial stems. For example, in English 'fo' is reasonable (for, form, foot, ...) whereas 'xz' is not, as no words start with 'xz'.
        Returns:
        The 'likely' set of two-character initial stems.
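
A minimal sketch of testing a word against the set of stems. The helper name hasLikelyStart is hypothetical, and the sketch assumes the stems are stored in lower case, which the documentation does not state.

```java
import java.util.Locale;
import java.util.Set;

public final class StartsSketch {

    /** True if the first two characters of the word form a likely initial stem. */
    public static boolean hasLikelyStart(String word, Set<String> starts) {
        if (word.length() < 2) {
            return false; // too short to have a two-character stem
        }
        // Assumption: stems are stored in lower case.
        String stem = word.substring(0, 2).toLowerCase(Locale.ROOT);
        return starts.contains(stem);
    }

    public static void main(String[] args) {
        // In practice the set would come from getStarts().
        Set<String> starts = Set.of("fo", "th", "wh");
        System.out.println(hasLikelyStart("Foot", starts));
        System.out.println(hasLikelyStart("xzy", starts));
    }
}
```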