public class HyphenNormalizer extends java.lang.Object implements TextProcessor
| Constructor and Description |
|---|
| HyphenNormalizer() |
| Modifier and Type | Method and Description |
|---|---|
| static boolean | isHyphenLike(java.lang.Integer codePoint) Returns whether the given code point is a hyphen-like code point. |
| static java.lang.String | normalizeHyphens(java.lang.String s) Replaces hyphen-like code points with ASCII "-" and removes soft hyphens. |
| java.util.List<java.lang.String> | preprocess(java.util.List<java.lang.String> tokens) Applies the defined preprocessing to the given input tokens. |
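
The behavior described for isHyphenLike and normalizeHyphens can be illustrated with a minimal sketch. This is not the library's implementation: the class name HyphenNormalizerSketch and the set of code points treated as hyphen-like are assumptions, and the sketch takes a primitive int where the documented method takes java.lang.Integer.

```java
import java.util.Set;

public class HyphenNormalizerSketch {
    // ASSUMPTION: the real class may recognize a different set of code points.
    // U+002D hyphen-minus, U+2010 hyphen, U+2011 non-breaking hyphen,
    // U+2012 figure dash, U+2013 en dash, U+2212 minus sign.
    private static final Set<Integer> HYPHEN_LIKE =
            Set.of(0x002D, 0x2010, 0x2011, 0x2012, 0x2013, 0x2212);

    // U+00AD soft hyphen: removed rather than replaced.
    private static final int SOFT_HYPHEN = 0x00AD;

    static boolean isHyphenLike(int codePoint) {
        return HYPHEN_LIKE.contains(codePoint);
    }

    static String normalizeHyphens(String s) {
        StringBuilder out = new StringBuilder(s.length());
        // Iterate over code points, not chars, so supplementary characters stay intact.
        s.codePoints().forEach(cp -> {
            if (cp == SOFT_HYPHEN) {
                return; // drop soft hyphens entirely
            }
            if (isHyphenLike(cp)) {
                out.append('-');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "cooperate well-known 3-1"
        System.out.println(normalizeHyphens("co\u00ADoperate well\u2010known 3\u22121"));
    }
}
```

Working on code points rather than char values matches the "not a char!" note on the codePoint parameter: a char is a single UTF-16 code unit and cannot represent characters outside the Basic Multilingual Plane on its own.
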
public static boolean isHyphenLike(java.lang.Integer codePoint)
codePoint - A Unicode code point (not a char!)

public static java.lang.String normalizeHyphens(java.lang.String s)
s - input string to replace hyphens in

public java.util.List<java.lang.String> preprocess(java.util.List<java.lang.String> tokens)
Specified by: preprocess in interface TextProcessor
tokens - the tokens created after the input text is tokenized
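
As a rough illustration of how a token-level preprocess step might apply hyphen normalization, the sketch below maps each token through a normalizing function. Everything in it is hypothetical: TokenProcessor stands in for the real TextProcessor interface (whose full contract is not shown here), HyphenStep is an invented name, the character class is an assumed set of hyphen-like code points, and the sketch keeps tokens that become empty, which the real class may or may not do.

```java
import java.util.List;
import java.util.stream.Collectors;

public class PreprocessSketch {
    // Hypothetical stand-in for the TextProcessor interface.
    interface TokenProcessor {
        List<String> preprocess(List<String> tokens);
    }

    // Hypothetical step that normalizes hyphens in every token.
    static class HyphenStep implements TokenProcessor {
        @Override
        public List<String> preprocess(List<String> tokens) {
            return tokens.stream()
                    // Remove soft hyphens (U+00AD), then map the assumed
                    // hyphen-like characters U+2010..U+2013 and U+2212 to "-".
                    .map(t -> t.replace("\u00AD", "")
                               .replaceAll("[\u2010-\u2013\u2212]", "-"))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("state\u2011of\u2011the\u2011art", "hy\u00ADphen", "plain");
        // prints "[state-of-the-art, hyphen, plain]"
        System.out.println(new HyphenStep().preprocess(tokens));
    }
}
```

The sketch returns a new list and leaves the input list unmodified; whether the real preprocess method mutates its argument is not stated in this page.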