Class WhitespaceTokenizer

All Implemented Interfaces:
Closeable, AutoCloseable

public final class WhitespaceTokenizer extends CharTokenizer
A WhitespaceTokenizer is a tokenizer that divides text at whitespace. Adjacent sequences of non-whitespace characters form tokens.

You must specify the required Version compatibility when creating WhitespaceTokenizer:

  • As of 3.1, CharTokenizer uses an int based API to normalize and detect token characters. See CharTokenizer.isTokenChar(int) and CharTokenizer.normalize(int) for details.
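
The int-based rule described above can be illustrated with a minimal plain-Java sketch. It does not use Lucene: the class WhitespaceSplitSketch and its methods are hypothetical names chosen for illustration, and the rule that a code point is a token character exactly when Character.isWhitespace(int) returns false is an assumption about how this tokenizer behaves. The sketch also leaves out offsets, attributes, and any token-length limit that a real tokenizer would handle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of Lucene: imitates splitting at whitespace
// by walking the input one Unicode code point (int) at a time.
final class WhitespaceSplitSketch {

    // Assumed token rule: a code point belongs to a token unless it is whitespace.
    static boolean isTokenChar(int codePoint) {
        return !Character.isWhitespace(codePoint);
    }

    // Collects maximal runs of token characters as tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            // Read a full code point so surrogate pairs are never split.
            int cp = text.codePointAt(i);
            if (isTokenChar(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // Whitespace ends the current token, if there is one.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        // Flush the last token when the input does not end in whitespace.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

Under these assumptions, WhitespaceSplitSketch.tokenize("the  quick\tbrown") yields the three tokens the, quick, and brown. Repeated whitespace produces no empty tokens.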

Constructor Details

    • WhitespaceTokenizer

      public WhitespaceTokenizer(Version matchVersion, Reader in)
      Construct a new WhitespaceTokenizer.
      Parameters:
      matchVersion - Lucene version to match. See the version compatibility note above.
      in - the input to split up into tokens
    • WhitespaceTokenizer

      public WhitespaceTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in)
      Construct a new WhitespaceTokenizer using a given AttributeSource.AttributeFactory.
      Parameters:
      matchVersion - Lucene version to match. See the version compatibility note above.
      factory - the attribute factory to use for this Tokenizer
      in - the input to split up into tokens