org.apache.lucene.analysis.core.WhitespaceTokenizer

public final class WhitespaceTokenizer extends CharTokenizer

A WhitespaceTokenizer is a tokenizer that divides text at whitespace. Each maximal run of adjacent non-whitespace characters forms a token.
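
The splitting rule described above can be illustrated without any Lucene dependency. The following is a minimal plain-Java sketch, not the Lucene implementation: the class WhitespaceSplitSketch and its tokenize helper are hypothetical names, and the use of Character.isWhitespace(int) as the whitespace test is an assumption about how the rule is applied.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; this is NOT Lucene code.
class WhitespaceSplitSketch {

    // Hypothetical helper: returns every maximal run of non-whitespace
    // characters in the input, in order of appearance.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);          // read one full code point
            i += Character.charCount(cp);          // advance by 1 or 2 chars
            if (Character.isWhitespace(cp)) {      // assumed whitespace test
                if (current.length() > 0) {        // whitespace closes a token
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);       // extend the current token
            }
        }
        if (current.length() > 0) {                // flush the trailing token
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [The, quick, brown, fox]
        System.out.println(tokenize("  The quick\tbrown\n fox "));
    }
}
```

Leading, trailing, and repeated whitespace produce no empty tokens in this sketch. Note that Character.isWhitespace does not classify non-breaking spaces such as U+00A0 as whitespace, so under the assumption above they would remain inside a token.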

As of Lucene 3.1, CharTokenizer uses an int-based API to normalize and detect token characters, so each character is handled as a full Unicode code point rather than as a single char. See CharTokenizer.isTokenChar(int) and CharTokenizer.normalize(int) for details.
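
To show why an int-based API matters, here is a hedged plain-Java sketch of the two hooks named above. It mirrors only the shape of isTokenChar(int) and normalize(int); the classes CharTokenizerSketch and LowerCaseWhitespaceSketch are hypothetical and are not part of Lucene. Passing an int lets a hook receive a supplementary character (one outside the Basic Multilingual Plane) as one value instead of two surrogate chars.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class imitating the two int-based hooks; NOT Lucene code.
abstract class CharTokenizerSketch {

    // Decides whether the code point belongs inside a token.
    protected abstract boolean isTokenChar(int c);

    // Optionally rewrites a code point before it is added to a token.
    protected int normalize(int c) {
        return c;
    }

    // Walks the text one code point at a time and applies both hooks.
    final List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isTokenChar(cp)) {
                current.appendCodePoint(normalize(cp));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}

// Hypothetical subclass: splits at whitespace and lowercases each code point.
class LowerCaseWhitespaceSketch extends CharTokenizerSketch {

    @Override
    protected boolean isTokenChar(int c) {
        return !Character.isWhitespace(c);
    }

    @Override
    protected int normalize(int c) {
        return Character.toLowerCase(c);
    }
}
```

For example, new LowerCaseWhitespaceSketch().tokenize("HELLO \uD801\uDC00x") yields two tokens. The second begins with U+10428, the lowercase form of the supplementary character U+10400, which a char-based hook would have seen only as two separate surrogates.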

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State

Constructor Summary

Constructors

Constructor: WhitespaceTokenizer(Version matchVersion, Reader in)
Description: Construct a new WhitespaceTokenizer.

Constructor: WhitespaceTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in)
Description: Construct a new WhitespaceTokenizer using a given AttributeSource.AttributeFactory.

Method Summary

Methods inherited from class org.apache.lucene.analysis.util.CharTokenizer
end, incrementToken, reset

Methods inherited from class org.apache.lucene.analysis.Tokenizer
close, setReader

Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString

Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait
