Package com.cobber.fta.text
Class TextConfig
- java.lang.Object
  - com.cobber.fta.text.TextConfig
public final class TextConfig extends Object
Capture a set of key metrics for any given language.
-
-
Constructor Summary
Constructors
TextConfig(int longWord, double averageLow, double averageHigh, int alphaSpacePercentage, int simplePercentage, int maxLength, String sentenceBreak, String wordBreak, String punctuation, String[] starts)
-
Method Summary
Modifier and Type, Method, Description
int getAlphaSpacePercentage()
An estimate of the percentage of 'alpha' or space (isWhiteSpace()) characters that we expect to be present.
double getAverageHigh()
A reasonable upper bound for the average word length in the language.
double getAverageLow()
A reasonable lower bound for the average word length in the language.
int getLongWord()
The maximum length we expect any likely word to be in the target language.
int getMaxLength()
The maximum number of characters to analyze in the input.
String getPunctuation()
The Punctuation characters.
String getSentenceBreak()
The Sentence Break characters.
int getSimplePercentage()
An estimate of the percentage of 'reasonable' characters that we expect to be present.
Set<String> getStarts()
The 'likely' set of two-character initial stems.
String getWordBreak()
The Word Break characters.
-
-
-
Method Detail
-
getLongWord
public int getLongWord()
The maximum length we expect any likely word to be in the target language.
Returns:
- The maximum length we expect any likely word to be in the target language.
-
getAverageLow
public double getAverageLow()
A reasonable lower bound for the average word length in the language.
Returns:
- A reasonable lower bound for the average word length in the language.
-
getAverageHigh
public double getAverageHigh()
A reasonable upper bound for the average word length in the language.
Returns:
- A reasonable upper bound for the average word length in the language.
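As an illustration of how the two bounds might be used together, the sketch below computes the average word length of an input and checks that it falls between a lower and an upper bound. The class `AverageWordLengthSketch`, its methods, the whitespace-based word splitting, and the bounds 3.0 and 9.0 are all assumptions made for this example; they are not taken from the library.

```java
public class AverageWordLengthSketch {
    // Hypothetical helper: average length of whitespace-separated words.
    static double averageWordLength(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return 0.0;
        }
        String[] words = trimmed.split("\\s+");
        int total = 0;
        for (String word : words) {
            total += word.length();
        }
        return (double) total / words.length;
    }

    // True if the average word length lies within [averageLow, averageHigh].
    static boolean withinBounds(String input, double averageLow, double averageHigh) {
        double average = averageWordLength(input);
        return average >= averageLow && average <= averageHigh;
    }

    public static void main(String[] args) {
        // Illustrative bounds only, not values from the library.
        System.out.println(withinBounds("the quick brown fox", 3.0, 9.0));
        System.out.println(withinBounds("a b c d", 3.0, 9.0));
    }
}
```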
-
getAlphaSpacePercentage
public int getAlphaSpacePercentage()
An estimate of the percentage of 'alpha' or space (isWhiteSpace()) characters that we expect to be present.
Returns:
- An estimate of the percentage of 'alpha' or space (isWhiteSpace()) characters that we expect to be present.
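To make the metric concrete, here is a minimal sketch of how such a percentage could be measured for a given input. It assumes that 'alpha' corresponds to `Character.isLetter` and that the space test corresponds to the standard `Character.isWhitespace`; the class and method names are hypothetical, and the library's own calculation may differ.

```java
public class AlphaSpaceSketch {
    // Hypothetical helper: integer percentage of characters that are letters or whitespace.
    static int alphaSpacePercentage(String input) {
        if (input.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            // Assumption: 'alpha' means Character.isLetter, space means Character.isWhitespace.
            if (Character.isLetter(ch) || Character.isWhitespace(ch)) {
                count++;
            }
        }
        return count * 100 / input.length();
    }

    public static void main(String[] args) {
        System.out.println(alphaSpacePercentage("hello world")); // 100
        System.out.println(alphaSpacePercentage("abc 123!"));    // 50
    }
}
```

A measured value could then be compared against the configured estimate to judge whether the input looks like free text.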
-
getSimplePercentage
public int getSimplePercentage()
An estimate of the percentage of 'reasonable' characters that we expect to be present.
Note: The reasonable characters are defined as the sum of alphas, digits (in digit-only words), wordBreaks, spaces, and punctuation.
Returns:
- An estimate of the percentage of 'reasonable' characters that we expect to be present.
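The definition of 'reasonable' characters can be sketched as follows. This is one possible reading, not the library's implementation: a word is assumed to be a maximal run of letters and digits, digits count only when the whole run is digits, and the `wordBreak` and `punctuation` arguments are treated as plain sets of characters. The class and method names are hypothetical.

```java
public class SimplePercentageSketch {
    // Hypothetical helper: integer percentage of 'reasonable' characters in the input.
    static int simplePercentage(String input, String wordBreak, String punctuation) {
        if (input.isEmpty()) {
            return 0;
        }
        int reasonable = 0;
        int i = 0;
        while (i < input.length()) {
            char ch = input.charAt(i);
            if (Character.isLetterOrDigit(ch)) {
                // Consume one run of letters and digits and tally each kind.
                int letters = 0;
                int digits = 0;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) {
                    if (Character.isLetter(input.charAt(i))) {
                        letters++;
                    } else {
                        digits++;
                    }
                    i++;
                }
                reasonable += letters;
                // Digits count only in digit-only words.
                if (letters == 0) {
                    reasonable += digits;
                }
            } else {
                if (Character.isWhitespace(ch)
                        || wordBreak.indexOf(ch) >= 0
                        || punctuation.indexOf(ch) >= 0) {
                    reasonable++;
                }
                i++;
            }
        }
        return reasonable * 100 / input.length();
    }

    public static void main(String[] args) {
        // Illustrative word-break and punctuation sets only.
        System.out.println(simplePercentage("call 911 now.", "-", ".,")); // 100
        System.out.println(simplePercentage("abc1 #", "-", ".,"));        // 66
    }
}
```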
-
getMaxLength
public int getMaxLength()
The maximum number of characters to analyze in the input.
Returns:
- The maximum number of characters to analyze in the input.
-
getSentenceBreak
public String getSentenceBreak()
The Sentence Break characters.
Returns:
- The set of characters used to break paragraphs into sentences.
-
getWordBreak
public String getWordBreak()
The Word Break characters.
Returns:
- The set of characters used to break sentences into words.
-
getPunctuation
public String getPunctuation()
The Punctuation characters.
Returns:
- The set of characters recognized as punctuation.
-
getStarts
public Set<String> getStarts()
The 'likely' set of two-character initial stems. For example, in English 'fo' is reasonable (for, form, foot, ...) whereas 'xz' is not, as no words start with 'xz'.
Returns:
- The 'likely' set of two-character initial stems.
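A set like this lends itself to a simple membership test on the first two characters of a word. The sketch below shows that idea under stated assumptions: the class `StartsSketch`, the method `hasLikelyStart`, the sample stems, and the choice to lower-case the prefix before lookup are all hypothetical and not taken from the library.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class StartsSketch {
    // Hypothetical helper: true if the first two characters of the word are a likely stem.
    static boolean hasLikelyStart(String word, Set<String> starts) {
        if (word.length() < 2) {
            return false;
        }
        // Assumption: stems are stored in lower case.
        String prefix = word.substring(0, 2).toLowerCase(Locale.ROOT);
        return starts.contains(prefix);
    }

    public static void main(String[] args) {
        // Illustrative stems only; a real configuration would carry many more.
        Set<String> starts = new HashSet<>(Arrays.asList("fo", "th", "st"));
        System.out.println(hasLikelyStart("Foot", starts));   // true
        System.out.println(hasLikelyStart("xzibit", starts)); // false
    }
}
```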
-
-