Package org.apache.lucene.analysis.core
Class WhitespaceTokenizer
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.util.CharTokenizer
org.apache.lucene.analysis.core.WhitespaceTokenizer
- All Implemented Interfaces:
Closeable, AutoCloseable
A WhitespaceTokenizer is a tokenizer that divides text at whitespace.
Adjacent sequences of non-Whitespace characters form tokens.
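The splitting rule described above can be illustrated with a minimal, library-free sketch. The class name WhitespaceSplitSketch and its tokenize method are hypothetical and are not part of Lucene. The sketch assumes that "whitespace" means whatever Character.isWhitespace(int) reports, and it ignores everything else a real Tokenizer handles (attributes, offsets, any token length limits, Reader input).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Lucene implementation.
class WhitespaceSplitSketch {

    // Collects each maximal run of non-whitespace code points as one token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) {
                // Whitespace ends the current token, if one is in progress.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
        }
        // Flush a token that runs to the end of the input.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [The, quick, brown, fox]
        System.out.println(tokenize("  The quick\tbrown\n fox  "));
    }
}
```

Leading, trailing, and repeated whitespace produce no empty tokens, because a token is only emitted when at least one non-whitespace character has been collected.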
You must specify the required Version compatibility when creating WhitespaceTokenizer:
- As of 3.1, CharTokenizer uses an int-based API to normalize and detect token characters. See CharTokenizer.isTokenChar(int) and CharTokenizer.normalize(int) for details.
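Why an int-based API matters can be shown with a small sketch using only the JDK. The class CodePointApiSketch is hypothetical; its isTokenChar and normalize methods merely borrow the names of the CharTokenizer methods mentioned above and are not the Lucene code. The lowercasing normalize is an assumption chosen for illustration; this page does not state that WhitespaceTokenizer alters characters.

```java
// Hypothetical illustration only; not the Lucene implementation.
class CodePointApiSketch {

    // Stand-in predicate: a code point belongs to a token if it is not whitespace.
    static boolean isTokenChar(int c) {
        return !Character.isWhitespace(c);
    }

    // Stand-in normalization (assumed for illustration): lowercase the code point.
    static int normalize(int c) {
        return Character.toLowerCase(c);
    }

    // Applies both stand-ins code point by code point, so characters outside
    // the Basic Multilingual Plane are handled as single units.
    static String normalizeToken(String token) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < token.length(); ) {
            int cp = token.codePointAt(i);
            i += Character.charCount(cp);
            if (isTokenChar(cp)) {
                out.appendCodePoint(normalize(cp));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+10400 (DESERET CAPITAL LETTER LONG I) needs two Java chars.
        String deseret = new String(Character.toChars(0x10400));
        String lowered = normalizeToken("A" + deseret);

        // The int-based path maps U+10400 to its lowercase form U+10428.
        System.out.println(lowered.codePointAt(1) == 0x10428); // prints true

        // A char-based call sees only a surrogate and leaves it unchanged.
        char high = deseret.charAt(0);
        System.out.println(Character.toLowerCase(high) == high); // prints true
    }
}
```

A char-based method receives one half of a surrogate pair at a time and cannot classify or convert the character it belongs to; passing the full code point as an int avoids that limitation.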
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
-
Constructor Summary
Constructors
WhitespaceTokenizer(Version matchVersion, Reader in)
Construct a new WhitespaceTokenizer.
WhitespaceTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in)
Construct a new WhitespaceTokenizer using a given AttributeSource.AttributeFactory.
-
Method Summary
Methods inherited from class org.apache.lucene.analysis.util.CharTokenizer
end, incrementToken, reset
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
-
Constructor Details
-
WhitespaceTokenizer
public WhitespaceTokenizer(Version matchVersion, Reader in)
Construct a new WhitespaceTokenizer.
- Parameters:
matchVersion
- Lucene version to match; see the Version compatibility note above
in
- the input to split up into tokens
-
WhitespaceTokenizer
public WhitespaceTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in)
Construct a new WhitespaceTokenizer using a given AttributeSource.AttributeFactory.
- Parameters:
matchVersion
- Lucene version to match; see the Version compatibility note above
factory
- the attribute factory to use for this Tokenizer
in
- the input to split up into tokens
-