Package com.cobber.fta.text
Class TextConfig
Object
com.cobber.fta.text.TextConfig
Capture a set of key metrics for any given language.
-
Constructor Summary
Constructor
TextConfig(int longWord, double averageLow, double averageHigh, int alphaSpacePercentage, int simplePercentage, int maxLength, String sentenceBreak, String wordBreak, String punctuation, String[] starts)
Method Summary
Modifier and Type, Method: Description
int getAlphaSpacePercentage(): An estimate of the percentage of 'alpha' or space (isWhiteSpace()) characters that we expect to be present.
double getAverageHigh(): A reasonable upper bound for the average word length in the language.
double getAverageLow(): A reasonable lower bound for the average word length in the language.
int getLongWord(): The maximum length we expect any likely word to be in the target language.
int getMaxLength(): The maximum number of characters to analyze in the input.
getPunctuation(): The Punctuation characters.
getSentenceBreak(): The Sentence Break characters.
int getSimplePercentage(): An estimate of the percentage of 'reasonable' characters that we expect to be present.
getStarts(): The 'likely' set of two character initial stems.
getWordBreak(): The Word Break characters.
-
Constructor Details
-
TextConfig
-
Method Details
-
getLongWord
public int getLongWord()
The maximum length we expect any likely word to be in the target language.
Returns:
- The maximum length we expect any likely word to be in the target language.
-
getAverageLow
public double getAverageLow()
A reasonable lower bound for the average word length in the language.
Returns:
- A reasonable lower bound for the average word length in the language.
-
getAverageHigh
public double getAverageHigh()
A reasonable upper bound for the average word length in the language.
Returns:
- A reasonable upper bound for the average word length in the language.
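To make the two average-word-length bounds concrete, the following is a minimal sketch of how a caller might test an input against them. It assumes words are separated by whitespace only. The class AverageWordLengthSketch and its methods are hypothetical helpers written for illustration and are not part of com.cobber.fta.

```java
// Hypothetical helper for illustration only; not part of com.cobber.fta.
final class AverageWordLengthSketch {
    /** Average length of the whitespace-separated words in the input (0.0 if there are none). */
    static double averageWordLength(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return 0.0;
        }
        String[] words = trimmed.split("\\s+");
        int total = 0;
        for (String word : words) {
            total += word.length();
        }
        return (double) total / words.length;
    }

    /** True when the average word length lies within [averageLow, averageHigh]. */
    static boolean withinBounds(String input, double averageLow, double averageHigh) {
        double average = averageWordLength(input);
        return average >= averageLow && average <= averageHigh;
    }
}
```

In this sketch the bounds would come from getAverageLow() and getAverageHigh(); the values a real configuration uses are not stated here.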
-
getAlphaSpacePercentage
public int getAlphaSpacePercentage()
An estimate of the percentage of 'alpha' or space (isWhiteSpace()) characters that we expect to be present.
Returns:
- An estimate of the percentage of 'alpha' or space (isWhiteSpace()) characters that we expect to be present.
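As a rough illustration of what this percentage measures, the sketch below counts letters and whitespace in an input and compares the result with an expected threshold. It assumes that 'alpha' corresponds to Character.isLetter and 'space' to Character.isWhitespace; the class AlphaSpaceSketch and its methods are hypothetical and are not part of com.cobber.fta.

```java
// Hypothetical helper for illustration only; not part of com.cobber.fta.
final class AlphaSpaceSketch {
    /** Percentage (0-100, rounded down) of characters that are letters or whitespace. */
    static int alphaSpacePercentage(String input) {
        if (input.isEmpty()) {
            return 0;
        }
        int matches = 0;
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (Character.isLetter(ch) || Character.isWhitespace(ch)) {
                matches++;
            }
        }
        return matches * 100 / input.length();
    }

    /** True when the measured percentage reaches the expected percentage. */
    static boolean looksLikeText(String input, int expectedPercentage) {
        return alphaSpacePercentage(input) >= expectedPercentage;
    }
}
```

A caller could pass the value returned by getAlphaSpacePercentage() as expectedPercentage.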
-
getSimplePercentage
public int getSimplePercentage()
An estimate of the percentage of 'reasonable' characters that we expect to be present.
Note: The reasonable characters are defined as the sum of alphas, digits (in digit-only words), wordBreaks, spaces, and punctuation.
Returns:
- An estimate of the percentage of 'reasonable' characters that we expect to be present.
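The definition of 'reasonable' characters can be sketched as a counting routine. This is one possible reading of the note above, not the library's implementation: it assumes that a word is a run of characters between whitespace, word-break, or punctuation characters, that digits count only when the whole word is digits, and that letters in any word count. The class SimplePercentageSketch is hypothetical.

```java
// Hypothetical helper for illustration only; not part of com.cobber.fta.
final class SimplePercentageSketch {
    private static boolean isSeparator(char ch, String wordBreak, String punctuation) {
        return Character.isWhitespace(ch)
                || wordBreak.indexOf(ch) >= 0
                || punctuation.indexOf(ch) >= 0;
    }

    /** Reasonable characters inside a single (non-empty) word. */
    private static int reasonableInWord(String word) {
        int letters = 0;
        int digits = 0;
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            if (Character.isLetter(ch)) {
                letters++;
            } else if (Character.isDigit(ch)) {
                digits++;
            }
        }
        // Assumption: digits are counted only when the word consists solely of digits.
        return digits == word.length() ? digits : letters;
    }

    /** Percentage (0-100, rounded down) of 'reasonable' characters in the input. */
    static int simplePercentage(String input, String wordBreak, String punctuation) {
        if (input.isEmpty()) {
            return 0;
        }
        int reasonable = 0;
        int i = 0;
        while (i < input.length()) {
            if (isSeparator(input.charAt(i), wordBreak, punctuation)) {
                reasonable++; // spaces, word breaks, and punctuation all count
                i++;
                continue;
            }
            int start = i;
            while (i < input.length() && !isSeparator(input.charAt(i), wordBreak, punctuation)) {
                i++;
            }
            reasonable += reasonableInWord(input.substring(start, i));
        }
        return reasonable * 100 / input.length();
    }
}
```

Under these assumptions, the digit in "abc1" and the '#' in "x#" are not counted, while every character of "Call 911 now, ok?" is.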
-
getMaxLength
public int getMaxLength()
The maximum number of characters to analyze in the input.
Returns:
- The maximum number of characters to analyze in the input.
-
getSentenceBreak
The Sentence Break characters.
Returns:
- The set of characters used to break paragraphs into sentences.
-
getWordBreak
The Word Break characters.
Returns:
- The set of characters used to break sentences into words.
-
getPunctuation
The Punctuation characters.
Returns:
- The set of characters recognized as punctuation.
-
getStarts
The 'likely' set of two character initial stems. For example, in English 'fo' is reasonable (for, form, foot, ...) whereas 'xz' is not, as no words start with 'xz'.
Returns:
- The 'likely' set of two character initial stems.
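The stem check described above might look like the following sketch. It assumes the stems are held in a java.util.Set of lower-case two-character strings and that matching ignores case; neither detail is stated in this page. The class StartsSketch is hypothetical and not part of com.cobber.fta.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of com.cobber.fta.
final class StartsSketch {
    /** True when the first two characters of the word form one of the likely stems. */
    static boolean hasLikelyStart(String word, Set<String> starts) {
        if (word.length() < 2) {
            return false; // too short to have a two-character stem
        }
        String stem = word.substring(0, 2).toLowerCase(Locale.ROOT);
        return starts.contains(stem);
    }
}
```

With a stem set containing "fo", a word such as "Foot" passes the check, while a word beginning with "xz" does not.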
-