Class SimpleTokenizer

  • All Implemented Interfaces:
    Tokenizer

    public class SimpleTokenizer
    extends java.lang.Object
    Performs tokenization using character classes.
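As a rough illustration of what tokenizing by character classes can look like, the sketch below starts a new token whenever the class of the current character differs from the previous one, and drops whitespace. It is a standalone approximation, not this class's actual implementation: the class name `CharClassTokenizerSketch`, the four classes in `CharClass`, and the rule that every `OTHER` character forms its own token are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class CharClassTokenizerSketch {

    // Character classes assumed for this sketch only.
    enum CharClass { WHITESPACE, LETTER, DIGIT, OTHER }

    static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    /** Splits s at whitespace and at every change of character class. */
    static String[] tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int start = -1;                        // start of the token in progress, -1 if none
        CharClass prev = CharClass.WHITESPACE;
        for (int i = 0; i < s.length(); i++) {
            CharClass cur = classify(s.charAt(i));
            if (cur == CharClass.WHITESPACE) {
                // Whitespace ends the current token and is never emitted.
                if (start != -1) {
                    tokens.add(s.substring(start, i));
                    start = -1;
                }
            } else if (start == -1) {
                start = i;                     // first character of a new token
            } else if (cur != prev || cur == CharClass.OTHER) {
                // Class changed (or sketch rule: each OTHER character stands alone).
                tokens.add(s.substring(start, i));
                start = i;
            }
            prev = cur;
        }
        if (start != -1) {
            tokens.add(s.substring(start));    // flush the trailing token
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // prints Hello|,|world|42|!
        System.out.println(String.join("|", tokenize("Hello, world 42!")));
    }
}
```

Punctuation attached to a word ("Hello,") is separated without any language-specific rules, which is the main appeal of a character-class approach.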
    • Constructor Summary

      Constructors 
      Constructor Description
      SimpleTokenizer()
      Deprecated.
      Use the INSTANCE field to obtain an instance instead; the constructor will be made private in the future.
    • Method Summary

      Modifier and Type Method Description
      static void main(java.lang.String[] args)
      Deprecated.
      This method will be removed; use the new command line interface instead.
      java.lang.String[] tokenize(java.lang.String s)
      Splits a string into its atomic parts.
      Span[] tokenizePos(java.lang.String s)
      Finds the boundaries of atomic parts in a string.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • SimpleTokenizer

        @Deprecated
        public SimpleTokenizer()
        Deprecated.
        Use the INSTANCE field to obtain an instance instead; the constructor will be made private in the future.
    • Method Detail

      • tokenizePos

        public Span[] tokenizePos(java.lang.String s)
        Description copied from interface: Tokenizer
        Finds the boundaries of atomic parts in a string.
        Parameters:
        s - The string to be tokenized.
        Returns:
        The Span[] with the spans (offsets into s) for each token as the individual array elements.
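To make the relationship between offsets and tokens concrete, the sketch below recovers token strings from start and end offsets with `String.substring`. It assumes each span is a half-open range `[start, end)` into `s`. The class `SpanOffsetsSketch`, the helper `coveredText`, and the plain `int[][]` standing in for `Span[]` are hypothetical, and the offsets are written by hand rather than produced by a tokenizer.

```java
public class SpanOffsetsSketch {

    /** Cuts one token out of text for each {start, end} pair; end is exclusive. */
    static String[] coveredText(String text, int[][] spans) {
        String[] tokens = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            tokens[i] = text.substring(spans[i][0], spans[i][1]);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "Hello, world";
        // Hand-written offsets standing in for the result of tokenizePos(s).
        int[][] spans = { {0, 5}, {5, 6}, {7, 12} };
        // prints Hello|,|world
        System.out.println(String.join("|", coveredText(s, spans)));
    }
}
```

Keeping offsets rather than strings preserves each token's position in the original text, which is useful when later processing has to map results back onto the input.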
      • main

        @Deprecated
        public static void main(java.lang.String[] args)
                         throws java.io.IOException
        Deprecated.
        This method will be removed; use the new command line interface instead.
        Parameters:
        args - the command line arguments
        Throws:
        java.io.IOException - if reading from stdin or writing to stdout fails in any way
      • tokenize

        public java.lang.String[] tokenize(java.lang.String s)
        Description copied from interface: Tokenizer
        Splits a string into its atomic parts.
        Specified by:
        tokenize in interface Tokenizer
        Parameters:
        s - The string to be tokenized.
        Returns:
        The String[] with the individual tokens as the array elements.