Class WhitespaceTokenizer

java.lang.Object
opennlp.tools.tokenize.WhitespaceTokenizer
All Implemented Interfaces:
Tokenizer

public class WhitespaceTokenizer extends Object
This tokenizer splits the input text on whitespace. To obtain an instance of this tokenizer, use the static final INSTANCE field.
  • Field Details

  • Method Details

    • tokenizePos

      public Span[] tokenizePos(String d)
      Description copied from interface: Tokenizer
      Finds the boundaries of atomic parts in a string.
      Specified by:
      tokenizePos in interface Tokenizer
      Parameters:
      d - The string to be tokenized.
      Returns:
      The Span[] with the spans (offsets into d) for each token as the individual array elements.
    • tokenize

      public String[] tokenize(String s)
      Description copied from interface: Tokenizer
      Splits a string into its atomic parts.
      Specified by:
      tokenize in interface Tokenizer
      Parameters:
      s - The string to be tokenized.
      Returns:
      The String[] with the individual tokens as the array elements.
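
Example (illustrative sketch, not the OpenNLP source): with the library on the classpath, the two methods above would be called as WhitespaceTokenizer.INSTANCE.tokenizePos(d) and WhitespaceTokenizer.INSTANCE.tokenize(s). The library-free code below only mimics the documented contract so the relationship between spans and tokens is easy to see. The class name WhitespaceSketch is hypothetical, spans are modelled as {start, end} int pairs instead of Span objects, the end offset is assumed to be exclusive, and Character.isWhitespace is an assumed stand-in for whatever whitespace test the library actually applies.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in that mimics the documented contract; NOT the OpenNLP implementation. */
public class WhitespaceSketch {

    /** Returns {start, end} offsets into d (end exclusive, an assumption) for each non-whitespace run. */
    public static int[][] tokenizePos(String d) {
        List<int[]> spans = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i < d.length(); i++) {
            boolean ws = Character.isWhitespace(d.charAt(i));
            if (!ws && start < 0) {
                start = i;                        // a token begins here
            } else if (ws && start >= 0) {
                spans.add(new int[] {start, i});  // a token ended just before i
                start = -1;
            }
        }
        if (start >= 0) {
            spans.add(new int[] {start, d.length()}); // token running to end of input
        }
        return spans.toArray(new int[0][]);
    }

    /** Returns the token strings, obtained by cutting s at the spans from tokenizePos. */
    public static String[] tokenize(String s) {
        int[][] spans = tokenizePos(s);
        String[] tokens = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            tokens[i] = s.substring(spans[i][0], spans[i][1]);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "  Hello,   world !";
        for (int[] span : tokenizePos(text)) {
            System.out.println(span[0] + ".." + span[1] + " -> "
                    + text.substring(span[0], span[1]));
        }
    }
}
```

In this sketch, leading and repeated whitespace produces no empty tokens, and punctuation stays attached to the neighbouring word unless whitespace separates it, which is the usual consequence of splitting on whitespace alone.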