public class HyphenNormalizer extends java.lang.Object implements TextProcessor
Constructor and Description |
---|
HyphenNormalizer() |
Modifier and Type | Method and Description |
---|---|
static boolean | isHyphenLike(java.lang.Integer codePoint): Returns whether the given code point is hyphen-like. |
static java.lang.String | normalizeHyphens(java.lang.String s): Replaces hyphen-like code points with ASCII "-" and removes soft hyphens. |
java.util.List<java.lang.String> | preprocess(java.util.List<java.lang.String> tokens): Applies this processor's preprocessing to the given input tokens. |
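
The two static methods in the summary above can be illustrated with a small stand-alone sketch. It is not the library's implementation: the class name `HyphenSketch`, the `HYPHEN_LIKE` set, and the choice of which code points count as hyphen-like are all assumptions made for illustration. It iterates over code points rather than `char` values so that supplementary characters pass through intact.

```java
import java.util.Set;

// Hypothetical stand-in for illustration; not the library's HyphenNormalizer.
final class HyphenSketch {
    // ASSUMPTION: this list of hyphen-like code points is illustrative only.
    private static final Set<Integer> HYPHEN_LIKE = Set.of(
        0x002D, // HYPHEN-MINUS
        0x2010, // HYPHEN
        0x2011, // NON-BREAKING HYPHEN
        0x2012, // FIGURE DASH
        0x2013, // EN DASH
        0x2014, // EM DASH
        0x2212  // MINUS SIGN
    );

    private static final int SOFT_HYPHEN = 0x00AD;

    // Takes a full Unicode code point (an int), not a 16-bit char.
    static boolean isHyphenLike(int codePoint) {
        return HYPHEN_LIKE.contains(codePoint);
    }

    // Replaces hyphen-like code points with ASCII '-' and drops soft hyphens.
    static String normalizeHyphens(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (cp == SOFT_HYPHEN) {
                return; // soft hyphens are removed entirely
            }
            if (isHyphenLike(cp)) {
                out.append('-');
            } else {
                out.appendCodePoint(cp); // keeps surrogate pairs together
            }
        });
        return out.toString();
    }
}
```

With this sketch, `HyphenSketch.normalizeHyphens("well\u2010known")` yields `"well-known"`, and a soft hyphen inside a word is dropped without leaving a replacement character.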
public static boolean isHyphenLike(java.lang.Integer codePoint)
Parameters: codePoint - a Unicode code point (not a char!)

public static java.lang.String normalizeHyphens(java.lang.String s)
Parameters: s - input string in which to replace hyphens

public java.util.List<java.lang.String> preprocess(java.util.List<java.lang.String> tokens)
Specified by: preprocess in interface TextProcessor
Parameters: tokens - the tokens created after the input text is tokenized
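
As a rough illustration of how a token-level `preprocess` could apply hyphen normalization, the sketch below maps each token to a normalized copy. Everything in it is hypothetical: `TokenProcessorSketch` only mimics the shape of `TextProcessor`, `HyphenTokenSketch` is not the documented class, and treating the Unicode category "dash punctuation" (Pd) as the definition of hyphen-like is an assumption that may differ from what the library does. Returning a new list rather than mutating the input is likewise a design choice of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface mirroring the shape of TextProcessor (assumed single method).
interface TokenProcessorSketch {
    List<String> preprocess(List<String> tokens);
}

// Hypothetical implementation for illustration; not the library's HyphenNormalizer.
final class HyphenTokenSketch implements TokenProcessorSketch {
    private static final int SOFT_HYPHEN = 0x00AD;

    @Override
    public List<String> preprocess(List<String> tokens) {
        // Build a new list so the caller's input is left unchanged.
        List<String> result = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            result.add(normalize(token));
        }
        return result;
    }

    private static String normalize(String token) {
        StringBuilder out = new StringBuilder(token.length());
        // Step by code point, not by char, so surrogate pairs are read as one unit.
        for (int i = 0; i < token.length(); ) {
            int cp = token.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == SOFT_HYPHEN) {
                continue; // remove soft hyphens
            }
            // ASSUMPTION: "hyphen-like" is taken to mean general category Pd.
            if (Character.getType(cp) == Character.DASH_PUNCTUATION) {
                out.append('-');
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

Under these assumptions, a token list such as `["state\u2011of", "a\u00ADb", "plain"]` comes back as `["state-of", "ab", "plain"]`. Note that the Pd category excludes U+2212 MINUS SIGN, which is one reason an explicit code point list may be preferable in practice.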