LowerCaseTokenizer (The Adobe Experience Manager SDK 2020.6.3717.20200611T200904Z-200604)

java.lang.Object
- org.apache.lucene.util.AttributeSource
  - org.apache.lucene.analysis.TokenStream
    - org.apache.lucene.analysis.Tokenizer
      - org.apache.lucene.analysis.util.CharTokenizer
        - org.apache.lucene.analysis.core.LetterTokenizer
          - org.apache.lucene.analysis.core.LowerCaseTokenizer

All Implemented Interfaces:

Closeable, AutoCloseable
```
public final class LowerCaseTokenizer
extends LetterTokenizer
```
LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. It divides text at non-letters and converts the resulting tokens to lower case. Although it is functionally equivalent to the combination of LetterTokenizer and LowerCaseFilter, performing both tasks in a single pass is faster, hence this (redundant) implementation.
Note: this works reasonably well for most European languages, but poorly for some Asian languages, where words are not separated by spaces.
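
The behavior described above can be illustrated without Lucene on the classpath. The following is a minimal plain-Java sketch, not the Lucene implementation: the class `LowerCaseSplitSketch` and its `tokenize` method are hypothetical names introduced here, and the sketch assumes that "letter" means `Character.isLetter(int)` and "lower case" means `Character.toLowerCase(int)`. It ignores Lucene details such as offsets, buffering, and attribute handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this class is not part of Lucene.
final class LowerCaseSplitSketch {

    // Splits text at non-letters and lower-cases each letter in the same pass.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);      // read a full code point
            i += Character.charCount(cp);      // advance by 1 or 2 chars
            if (Character.isLetter(cp)) {
                // Letter: normalize to lower case and extend the current token.
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                // Non-letter: close the current token, if any.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());    // flush the trailing token
        }
        return tokens;
    }
}
```

For example, `LowerCaseSplitSketch.tokenize("Hello, World! It's 2020")` yields `[hello, world, it, s]`: the apostrophe splits the word and the digits are dropped, because neither is a letter.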

You must specify the required Version compatibility when creating LowerCaseTokenizer:
- As of 3.1, CharTokenizer uses an int based API to normalize and detect token characters. See CharTokenizer.isTokenChar(int) and CharTokenizer.normalize(int) for details.
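
To see why an int based (code point) API matters, consider characters outside the Basic Multilingual Plane, which Java strings store as two `char` values. The sketch below uses only `java.lang.Character`; the class `CodePointSketch` and its methods are hypothetical names for illustration, and the char-by-char loop is an assumption about what a `char` based check would look like, not a copy of any Lucene code.

```java
// Hypothetical illustration only; this class is not part of Lucene.
final class CodePointSketch {

    // Counts letters by testing each 16-bit char on its own.
    // Surrogate halves are not letters, so supplementary letters are missed.
    static int countLettersByChar(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    // Counts letters by testing whole code points (int values).
    static int countLettersByCodePoint(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) {
                n++;
            }
            i += Character.charCount(cp);
        }
        return n;
    }
}
```

With the single supplementary letter U+10400 (DESERET CAPITAL LETTER LONG I) as input, the char based count is 0 while the code point count is 1, and `Character.toLowerCase(0x10400)` returns `0x10428`.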

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
  AttributeSource.AttributeFactory, AttributeSource.State

Constructor Summary

Constructors
Constructor and Description
`LowerCaseTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in)` Construct a new LowerCaseTokenizer using a given `AttributeSource.AttributeFactory`.
`LowerCaseTokenizer(Version matchVersion, Reader in)` Construct a new LowerCaseTokenizer.

Method Summary
- Methods inherited from class org.apache.lucene.analysis.util.CharTokenizer
  end, incrementToken, reset
- Methods inherited from class org.apache.lucene.analysis.Tokenizer
  close, setReader
- Methods inherited from class org.apache.lucene.util.AttributeSource
  addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
- Methods inherited from class java.lang.Object
  getClass, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - LowerCaseTokenizer
```
public LowerCaseTokenizer(Version matchVersion,
                          Reader in)
```
    Construct a new LowerCaseTokenizer.
    
    Parameters:
    
    matchVersion - Lucene version to match; see above
    
    in - the input to split up into tokens
  - LowerCaseTokenizer
```
public LowerCaseTokenizer(Version matchVersion,
                          AttributeSource.AttributeFactory factory,
                          Reader in)
```
    Construct a new LowerCaseTokenizer using a given AttributeSource.AttributeFactory.
    
    Parameters:
    
    matchVersion - Lucene version to match; see above
    
    factory - the attribute factory to use for this Tokenizer
    
    in - the input to split up into tokens

Copyright © 2010 - 2020 Adobe. All Rights Reserved