Package opennlp.tools.tokenize
Class SimpleTokenizer
java.lang.Object
    opennlp.tools.tokenize.SimpleTokenizer

All Implemented Interfaces:
    Tokenizer

public class SimpleTokenizer extends java.lang.Object implements Tokenizer
Performs tokenization using character classes.
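The idea of tokenizing by character class can be illustrated with a minimal, standalone sketch. It is not the OpenNLP implementation: the class name CharClassTokenizerSketch, the CharClass enum, and the rule "start a new token whenever the class changes" are assumptions made for illustration, and the library's exact handling of punctuation may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and rules here are assumptions, not OpenNLP code.
class CharClassTokenizerSketch {

    // Hypothetical character classes used by this sketch.
    enum CharClass { WHITESPACE, ALPHABETIC, NUMERIC, OTHER }

    static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.ALPHABETIC;
        if (Character.isDigit(c)) return CharClass.NUMERIC;
        return CharClass.OTHER;
    }

    // A new token starts whenever the character class changes;
    // whitespace separates tokens and is never part of one.
    static String[] tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int start = -1;                          // start of the open token, or -1 if none
        CharClass current = CharClass.WHITESPACE;
        for (int i = 0; i < s.length(); i++) {
            CharClass cls = classify(s.charAt(i));
            if (cls != current) {
                if (start >= 0) {
                    tokens.add(s.substring(start, i)); // close the open token
                }
                start = (cls == CharClass.WHITESPACE) ? -1 : i;
                current = cls;
            }
        }
        if (start >= 0) {
            tokens.add(s.substring(start));      // close a token that runs to the end
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String token : tokenize("Room 42, please!")) {
            System.out.println(token);
        }
    }
}
```

Under these assumptions, "Room 42, please!" splits into Room, 42, a comma, please, and an exclamation mark: letters, digits, and punctuation each form their own tokens even when no whitespace separates them.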

Field Summary
Modifier and Type         Field       Description
static SimpleTokenizer    INSTANCE

Constructor Summary
Constructor          Description
SimpleTokenizer()    Deprecated. Use the INSTANCE field to obtain an instance instead; the constructor will be made private in the future.

Method Summary
Modifier and Type     Method                             Description
static void           main(java.lang.String[] args)      Deprecated. This method will be removed; use the new command line interface instead.
java.lang.String[]    tokenize(java.lang.String s)       Splits a string into its atomic parts.
Span[]                tokenizePos(java.lang.String s)    Finds the boundaries of atomic parts in a string.

Field Detail

INSTANCE
public static final SimpleTokenizer INSTANCE

Method Detail

tokenizePos
public Span[] tokenizePos(java.lang.String s)
Description copied from interface: Tokenizer
Finds the boundaries of atomic parts in a string.
Parameters:
    s - The string to be tokenized.
Returns:
    The Span[] with the spans (offsets into s) for each token as the individual array elements.
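
To make "offsets into s" concrete, the following standalone sketch returns token boundaries as offset pairs and recovers each token with substring. The Offsets class is a hypothetical stand-in for Span, and the half-open convention (start inclusive, end exclusive) is an assumption of this sketch. For brevity it splits on whitespace only, which is simpler than character-class tokenization.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; Offsets is a hypothetical stand-in, not the OpenNLP Span class.
class TokenOffsetsSketch {

    // Assumed convention: start is inclusive, end is exclusive.
    static final class Offsets {
        final int start;
        final int end;

        Offsets(int start, int end) {
            this.start = start;
            this.end = end;
        }

        // Recovers the token text from the original string.
        String coveredText(String s) {
            return s.substring(start, end);
        }
    }

    // Simplification: tokens are maximal runs of non-whitespace characters.
    static List<Offsets> findOffsets(String s) {
        List<Offsets> out = new ArrayList<>();
        int start = -1;                              // start of the open token, or -1 if none
        for (int i = 0; i <= s.length(); i++) {
            boolean boundary = (i == s.length()) || Character.isWhitespace(s.charAt(i));
            if (boundary) {
                if (start >= 0) {
                    out.add(new Offsets(start, i));  // close the open token
                    start = -1;
                }
            } else if (start < 0) {
                start = i;                           // open a new token
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "to be  or";
        for (Offsets o : findOffsets(s)) {
            System.out.println(o.start + ".." + o.end + " " + o.coveredText(s));
        }
    }
}
```

Returning offsets rather than strings keeps each token tied to its position in the input, so callers can map tokens back to the original text even when whitespace is irregular.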

main
@Deprecated public static void main(java.lang.String[] args) throws java.io.IOException
Deprecated. This method will be removed; use the new command line interface instead.
Parameters:
    args - the command line arguments
Throws:
    java.io.IOException - if reading from stdin or writing to stdout fails in any way

tokenize
public java.lang.String[] tokenize(java.lang.String s)
Description copied from interface: Tokenizer
Splits a string into its atomic parts.