Package org.apache.lucene.analysis.ngram
Class EdgeNGramTokenizer
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.ngram.NGramTokenizer
org.apache.lucene.analysis.ngram.EdgeNGramTokenizer
All Implemented Interfaces:
Closeable, AutoCloseable
Tokenizes the input from an edge into n-grams of given size(s).
This Tokenizer creates n-grams from the beginning edge or ending edge of an input token.
As of Lucene 4.4, this tokenizer
- can handle maxGram larger than 1024 chars, but beware that this will result in increased memory usage,
- doesn't trim the input,
- sets position increments equal to 1 instead of 1 for the first token and 0 for all other ones,
- doesn't support backward n-grams anymore,
- supports pre-tokenization,
- correctly handles supplementary characters.
Although highly discouraged, it is still possible to use the old behavior through Lucene43EdgeNGramTokenizer.
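For illustration, a minimal usage sketch, assuming Lucene 4.4 is on the classpath; the match version Version.LUCENE_44, the sample text "apple", and the gram sizes 1..3 are illustrative choices, not part of this class's contract:

import java.io.StringReader;

import org.apache.lucene.analysis.ngram.EdgeNGramTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class EdgeNGramDemo {
  public static void main(String[] args) throws Exception {
    // Front-edge n-grams of sizes 1 to 3 over the word "apple".
    EdgeNGramTokenizer tokenizer =
        new EdgeNGramTokenizer(Version.LUCENE_44, new StringReader("apple"), 1, 3);
    CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
    tokenizer.reset();
    while (tokenizer.incrementToken()) {
      System.out.println(term.toString()); // expected output: a, ap, app
    }
    tokenizer.end();
    tokenizer.close();
  }
}

Each n-gram is emitted with a position increment of 1, per the behavior change listed above.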
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
Field Summary
Fields
Modifier and Type    Field
static final int     DEFAULT_MAX_GRAM_SIZE
static final int     DEFAULT_MIN_GRAM_SIZE
Fields inherited from class org.apache.lucene.analysis.ngram.NGramTokenizer
DEFAULT_MAX_NGRAM_SIZE, DEFAULT_MIN_NGRAM_SIZE
Constructor Summary
Constructors
EdgeNGramTokenizer(Version version, Reader input, int minGram, int maxGram)
    Creates EdgeNGramTokenizer that can generate n-grams in the sizes of the given range
EdgeNGramTokenizer(Version version, AttributeSource.AttributeFactory factory, Reader input, int minGram, int maxGram)
    Creates EdgeNGramTokenizer that can generate n-grams in the sizes of the given range
Method Summary
Methods inherited from class org.apache.lucene.analysis.ngram.NGramTokenizer
end, incrementToken, reset
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
Field Details
DEFAULT_MAX_GRAM_SIZE
public static final int DEFAULT_MAX_GRAM_SIZE
See Also:
Constant Field Values
DEFAULT_MIN_GRAM_SIZE
public static final int DEFAULT_MIN_GRAM_SIZE
See Also:
Constant Field Values
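A short sketch of constructing the tokenizer with these default gram sizes; the Version.LUCENE_44 match version and the Reader argument are assumptions for illustration:

import java.io.Reader;

import org.apache.lucene.analysis.ngram.EdgeNGramTokenizer;
import org.apache.lucene.util.Version;

public class DefaultSizesExample {
  public static EdgeNGramTokenizer withDefaults(Reader input) {
    // Use the class's own default gram sizes instead of hard-coded numbers.
    return new EdgeNGramTokenizer(
        Version.LUCENE_44,
        input,
        EdgeNGramTokenizer.DEFAULT_MIN_GRAM_SIZE,
        EdgeNGramTokenizer.DEFAULT_MAX_GRAM_SIZE);
  }
}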
Constructor Details
EdgeNGramTokenizer
public EdgeNGramTokenizer(Version version, Reader input, int minGram, int maxGram)
Creates EdgeNGramTokenizer that can generate n-grams in the sizes of the given range
Parameters:
version - the Lucene match version
input - Reader holding the input to be tokenized
minGram - the smallest n-gram to generate
maxGram - the largest n-gram to generate
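Because the tokenizer implements AutoCloseable (see the interface list above), it can be managed with try-with-resources; a sketch with illustrative arguments (the sample text "tokenize" and sizes 2..4 are assumptions):

import java.io.StringReader;

import org.apache.lucene.analysis.ngram.EdgeNGramTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ConstructorExample {
  public static void main(String[] args) throws Exception {
    // Front-edge n-grams of sizes 2 to 4; the tokenizer is closed automatically.
    try (EdgeNGramTokenizer tokenizer =
             new EdgeNGramTokenizer(Version.LUCENE_44, new StringReader("tokenize"), 2, 4)) {
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      tokenizer.reset();
      while (tokenizer.incrementToken()) {
        System.out.println(term.toString()); // expected output: to, tok, toke
      }
      tokenizer.end();
    }
  }
}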
EdgeNGramTokenizer
public EdgeNGramTokenizer(Version version, AttributeSource.AttributeFactory factory, Reader input, int minGram, int maxGram)
Creates EdgeNGramTokenizer that can generate n-grams in the sizes of the given range
Parameters:
version - the Lucene match version
factory - AttributeSource.AttributeFactory to use
input - Reader holding the input to be tokenized
minGram - the smallest n-gram to generate
maxGram - the largest n-gram to generate
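A sketch of the factory-taking variant, here simply passing the default attribute factory; the sample text and gram sizes are illustrative assumptions:

import java.io.StringReader;

import org.apache.lucene.analysis.ngram.EdgeNGramTokenizer;
import org.apache.lucene.util.AttributeSource;
import org.apache.lucene.util.Version;

public class FactoryConstructorExample {
  public static EdgeNGramTokenizer create() {
    // The AttributeSource.AttributeFactory controls how attribute instances
    // (e.g. CharTermAttribute) are instantiated for this token stream.
    return new EdgeNGramTokenizer(
        Version.LUCENE_44,
        AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY,
        new StringReader("lucene"),
        1, 2);
  }
}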