public class UnicodeNormalizer extends java.lang.Object implements TextProcessor
Modifier and Type | Field and Description |
---|---|
static java.text.Normalizer.Form | DEFAULT_FORM |
Constructor and Description |
---|
UnicodeNormalizer() Default version of the Unicode normalizer, using the NFKC normal form. |
UnicodeNormalizer(java.text.Normalizer.Form normalForm) Unicode normalizer with a configurable normal form. |
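The choice of normal form changes what the normalizer produces. The following is a minimal sketch using only the JDK's `java.text.Normalizer`, not the `UnicodeNormalizer` class itself; the class name `NormalFormDemo` and its `normalize` helper are hypothetical and exist only for illustration. It contrasts NFC, which only composes canonically equivalent sequences, with NFKC, which also replaces compatibility characters such as ligatures.

```java
import java.text.Normalizer;

// Hypothetical demo class; it calls the JDK directly and is not part of the documented API.
final class NormalFormDemo {

    /** Normalizes the input with the given form via java.text.Normalizer. */
    static String normalize(String s, Normalizer.Form form) {
        return Normalizer.normalize(s, form);
    }

    public static void main(String[] args) {
        String ligature = "\uFB01";     // LATIN SMALL LIGATURE FI
        String decomposed = "e\u0301";  // 'e' followed by COMBINING ACUTE ACCENT

        // NFC leaves the compatibility ligature untouched.
        System.out.println(normalize(ligature, Normalizer.Form.NFC).equals(ligature));
        // NFKC replaces the ligature with the two letters 'f' and 'i'.
        System.out.println(normalize(ligature, Normalizer.Form.NFKC));
        // Both forms compose 'e' + combining accent into the single code point U+00E9.
        System.out.println(normalize(decomposed, Normalizer.Form.NFKC).equals("\u00E9"));
    }
}
```

NFKC is often preferred for text preprocessing because visually similar variants (ligatures, full-width letters) collapse to one representation, at the cost of losing the original distinction.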
Modifier and Type | Method and Description |
---|---|
static java.lang.String | normalizeDefault(java.lang.String s) Normalizes a String using a sensible default normal form. |
java.util.List<java.lang.String> | preprocess(java.util.List<java.lang.String> tokens) Applies the preprocessing defined to the given input tokens. |
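As an illustration of what a static default normalization helper could look like, here is a hedged sketch built on `java.text.Normalizer`. It assumes the "sensible default" is NFKC, matching the documented default constructor; this page does not confirm that `normalizeDefault` uses that form. The names `DefaultNormalizationSketch`, `ASSUMED_DEFAULT_FORM`, and `normalizeWithAssumedDefault` are hypothetical.

```java
import java.text.Normalizer;
import java.util.Objects;

// Hypothetical sketch; not the library's implementation of normalizeDefault.
final class DefaultNormalizationSketch {

    // Assumption: NFKC, mirroring the documented default of the no-arg constructor.
    static final Normalizer.Form ASSUMED_DEFAULT_FORM = Normalizer.Form.NFKC;

    /** Normalizes a non-null string with the assumed default form. */
    static String normalizeWithAssumedDefault(String s) {
        // Fail early with a clear message, since the input is documented as non-null.
        Objects.requireNonNull(s, "s must not be null");
        return Normalizer.normalize(s, ASSUMED_DEFAULT_FORM);
    }

    public static void main(String[] args) {
        // Full-width Latin capitals A, B, C (U+FF21..U+FF23) map to plain ASCII under NFKC.
        System.out.println(normalizeWithAssumedDefault("\uFF21\uFF22\uFF23"));
    }
}
```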
public UnicodeNormalizer(java.text.Normalizer.Form normalForm)
Parameters: normalForm - The normal form to use.
public UnicodeNormalizer()
public static java.lang.String normalizeDefault(java.lang.String s)
Parameters: s - Any non-null string.
public java.util.List<java.lang.String> preprocess(java.util.List<java.lang.String> tokens)
Specified by: preprocess in interface TextProcessor
Parameters: tokens - The tokens created after the input text is tokenized.
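To make the token-level preprocessing concrete, the sketch below applies Unicode normalization to each token in a list using `java.text.Normalizer`. It is an assumption about the general technique, not the class's actual implementation: the page does not say whether `preprocess` returns a new list or modifies its input, and the names `TokenNormalizationSketch` and `normalizeTokens` are hypothetical.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of per-token normalization; not the documented class.
final class TokenNormalizationSketch {

    /** Returns a new list in which every token is normalized with the given form. */
    static List<String> normalizeTokens(List<String> tokens, Normalizer.Form form) {
        List<String> normalized = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            normalized.add(Normalizer.normalize(token, form));
        }
        return normalized;
    }

    public static void main(String[] args) {
        // "\uFB01le" starts with the 'fi' ligature; "cafe\u0301" ends with a combining accent.
        List<String> tokens = Arrays.asList("\uFB01le", "cafe\u0301");
        System.out.println(normalizeTokens(tokens, Normalizer.Form.NFKC));
    }
}
```

Returning a fresh list keeps the caller's input untouched, which is a design choice of this sketch rather than something the documentation specifies.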