org.apache.lucene.analysis.core.LowerCaseTokenizer

All Implemented Interfaces: Closeable, AutoCloseable

public final class LowerCaseTokenizer extends LetterTokenizer

LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. It divides text at non-letters and converts the resulting tokens to lower case. Although it is functionally equivalent to the combination of LetterTokenizer and LowerCaseFilter, there is a performance advantage to doing the two tasks at once, hence this (redundant) implementation.
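
The behavior described above can be illustrated without Lucene. The following is a minimal plain-Java sketch, not the Lucene implementation: the class name LowerCaseSplitSketch and the method tokenize are hypothetical, and it assumes that "letter" means whatever Character.isLetter(int) accepts. It splits at non-letters and lower-cases each letter as it is collected, so both tasks happen in the same loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; it is not part of Lucene.
final class LowerCaseSplitSketch {

    /** Splits text at non-letters and lower-cases each letter as it is collected. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);      // read a full code point, not a single char
            i += Character.charCount(cp);
            if (Character.isLetter(cp)) {
                // Letter: lower-case it and append it to the token being built.
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                // Non-letter: it ends the current token, if there is one.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());    // flush the final token
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [hello, world, it, s]
        System.out.println(tokenize("Hello, World! It's 2024."));
    }
}
```

Note that the apostrophe and the digits are non-letters, so "It's" yields two tokens and "2024" yields none.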

Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.

You must specify the required Version compatibility when creating LowerCaseTokenizer:

- As of 3.1, CharTokenizer uses an int-based API to normalize and detect token characters. See CharTokenizer.isTokenChar(int) and CharTokenizer.normalize(int) for details.
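
To show why an int-based API matters, here is a small plain-Java sketch that does not depend on Lucene. The class CodePointApiSketch and its methods are hypothetical stand-ins for CharTokenizer.isTokenChar(int) and CharTokenizer.normalize(int). The sketch assumes they map to Character.isLetter(int) and Character.toLowerCase(int), which matches the description of this class but is not confirmed by this page. A character outside the Basic Multilingual Plane occupies two char values, so only a code point (int) view can classify and lower-case it.

```java
// Hypothetical illustration class; it is not part of Lucene.
final class CodePointApiSketch {

    // Stand-in for CharTokenizer.isTokenChar(int): assumed to accept letters only.
    static boolean isTokenChar(int c) {
        return Character.isLetter(c);
    }

    // Stand-in for CharTokenizer.normalize(int): assumed to lower-case the code point.
    static int normalize(int c) {
        return Character.toLowerCase(c);
    }

    /** Keeps only token characters, normalizing each one, code point by code point. */
    static String normalizeLetters(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isTokenChar(cp)) {
                out.appendCodePoint(normalize(cp));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+10400 DESERET CAPITAL LETTER LONG I is stored as a surrogate pair.
        String deseret = new String(Character.toChars(0x10400));
        System.out.println(deseret.length());                          // 2
        System.out.println(Character.isLetter(deseret.charAt(0)));     // false: a lone surrogate
        System.out.println(isTokenChar(deseret.codePointAt(0)));       // true: the full code point
        System.out.println(Integer.toHexString(normalize(0x10400)));   // 10428
    }
}
```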

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State

Constructor Summary

Constructors

- LowerCaseTokenizer(Version matchVersion, Reader in)
  Construct a new LowerCaseTokenizer.
- LowerCaseTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in)
  Construct a new LowerCaseTokenizer using a given AttributeSource.AttributeFactory.

Method Summary

Methods inherited from class org.apache.lucene.analysis.util.CharTokenizer
end, incrementToken, reset

Methods inherited from class org.apache.lucene.analysis.Tokenizer
close, setReader

Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString

Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait

Constructor Details
- LowerCaseTokenizer
  
  public LowerCaseTokenizer(Version matchVersion, Reader in)
  
  Construct a new LowerCaseTokenizer.
  Parameters:
  
  matchVersion - Lucene version to match. See the Version compatibility note above.
  
  in - the input to split up into tokens
- LowerCaseTokenizer
  
  public LowerCaseTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in)
  
  Construct a new LowerCaseTokenizer using a given AttributeSource.AttributeFactory.
  Parameters:
  
  matchVersion - Lucene version to match. See the Version compatibility note above.
  
  factory - the attribute factory to use for this Tokenizer
  
  in - the input to split up into tokens
