Analyzer (Lucene 4.0.0-ALPHA API)

org.apache.lucene.analysis
Class Analyzer

java.lang.Object
  org.apache.lucene.analysis.Analyzer

Direct Known Subclasses: AnalyzerWrapper

public abstract class Analyzer
extends Object

An Analyzer builds TokenStreams, which analyze text. It thus represents a policy for extracting index terms from text.

In order to define what analysis is done, subclasses must define their Analyzer.TokenStreamComponents in createComponents(String, Reader). The components are then reused in each call to tokenStream(String, Reader).

Nested Class Summary
`static class`	`Analyzer.GlobalReuseStrategy` Implementation of `Analyzer.ReuseStrategy` that reuses the same components for every field.
`static class`	`Analyzer.PerFieldReuseStrategy` Implementation of `Analyzer.ReuseStrategy` that reuses components per field by maintaining a Map of TokenStreamComponents keyed by field name.
`static class`	`Analyzer.ReuseStrategy` Strategy defining how TokenStreamComponents are reused per call to `tokenStream(String, java.io.Reader)`.
`static class`	`Analyzer.TokenStreamComponents` This class encapsulates the outer components of a token stream.

Constructor Summary
`Analyzer()`
`Analyzer(Analyzer.ReuseStrategy reuseStrategy)`

Method Summary
`void`	`close()` Frees persistent resources used by this Analyzer.
`protected abstract Analyzer.TokenStreamComponents`	`createComponents(String fieldName, Reader reader)` Creates a new `Analyzer.TokenStreamComponents` instance for this analyzer.
`int`	`getOffsetGap(IndexableField field)` Just like `getPositionIncrementGap(java.lang.String)`, except for Token offsets instead.
`int`	`getPositionIncrementGap(String fieldName)` Invoked before indexing an IndexableField instance if terms have already been added to that field.
`protected Reader`	`initReader(String fieldName, Reader reader)` Override this if you want to add a CharFilter chain.
`TokenStream`	`tokenStream(String fieldName, Reader reader)` Creates a TokenStream that may be reused from the previous time the same thread called this method.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

Analyzer

public Analyzer()

Analyzer

public Analyzer(Analyzer.ReuseStrategy reuseStrategy)

Method Detail

createComponents

protected abstract Analyzer.TokenStreamComponents createComponents(String fieldName,
                                                                   Reader reader)

Creates a new Analyzer.TokenStreamComponents instance for this analyzer.

Parameters:
  fieldName - the name of the field whose content is passed to the Analyzer.TokenStreamComponents sink as a reader
  reader - the reader passed to the Tokenizer constructor
Returns:
  the Analyzer.TokenStreamComponents for this analyzer

tokenStream

public final TokenStream tokenStream(String fieldName,
                                     Reader reader)
                              throws IOException

Creates a TokenStream that may be reused from the previous time the same thread called this method. Callers that do not need more than one TokenStream from this analyzer at the same time should use this method for better performance.

This method uses createComponents(String, Reader) to obtain an instance of Analyzer.TokenStreamComponents. It returns the sink of the components and stores the components internally. Subsequent calls to this method will reuse the previously stored components after resetting them through Analyzer.TokenStreamComponents.reset(Reader).

Parameters:
  fieldName - the name of the field the created TokenStream is used for
  reader - the reader the stream's source reads from
Throws:
  IOException
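
The create-once, reset-and-reuse contract described for tokenStream can be modeled in plain Java. The sketch below does not use Lucene classes: MiniAnalyzer, MiniComponents, and the created counter are hypothetical names invented for illustration, and keeping the stored components in a ThreadLocal is an assumption about how per-thread reuse could be arranged, not a statement about Lucene's internals.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Analyzer.TokenStreamComponents: holds a reader and can be reset. */
final class MiniComponents {
    private Reader reader;

    MiniComponents(Reader reader) {
        this.reader = reader;
    }

    /** Points the existing components at new input instead of rebuilding them. */
    void reset(Reader reader) {
        this.reader = reader;
    }

    /** Reads the current input fully and splits it on whitespace (a toy token stream). */
    List<String> tokens() {
        StringBuilder text = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> out = new ArrayList<>();
        for (String token : text.toString().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }
}

/** Hypothetical model of the contract: subclasses build components, tokenStream reuses them. */
abstract class MiniAnalyzer {
    private final ThreadLocal<MiniComponents> stored = new ThreadLocal<>();
    int created = 0; // counts createComponents calls; for demonstration only, not thread-safe

    protected abstract MiniComponents createComponents(String fieldName, Reader reader);

    final MiniComponents tokenStream(String fieldName, Reader reader) {
        MiniComponents components = stored.get();
        if (components == null) {
            // First call on this thread: build and remember the components.
            components = createComponents(fieldName, reader);
            created++;
            stored.set(components);
        } else {
            // Later calls on this thread: reuse the stored components with new input.
            components.reset(reader);
        }
        return components;
    }
}

class MiniAnalyzerDemo {
    static MiniAnalyzer newAnalyzer() {
        return new MiniAnalyzer() {
            @Override
            protected MiniComponents createComponents(String fieldName, Reader reader) {
                return new MiniComponents(reader);
            }
        };
    }

    public static void main(String[] args) {
        MiniAnalyzer analyzer = newAnalyzer();
        System.out.println(analyzer.tokenStream("body", new StringReader("hello world")).tokens());
        System.out.println(analyzer.tokenStream("body", new StringReader("reused components")).tokens());
        System.out.println("createComponents calls: " + analyzer.created);
    }
}
```

In this model the second call returns the same MiniComponents object as the first, which is why a caller that needs two live streams at once from one analyzer on one thread cannot rely on this method.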

initReader

protected Reader initReader(String fieldName,
                            Reader reader)

Override this if you want to add a CharFilter chain.

getPositionIncrementGap

public int getPositionIncrementGap(String fieldName)

Invoked before indexing an IndexableField instance if terms have already been added to that field. This allows custom analyzers to place an automatic position increment gap between IndexableField instances using the same field name. The default position increment gap is 0. With a position increment gap of 0 and the typical default token position increment of 1, all terms in a field, including those across IndexableField instances, occupy successive positions. This allows, for instance, exact PhraseQuery matches across IndexableField instance boundaries.

Parameters:
  fieldName - IndexableField name being indexed
Returns:
  position increment gap, added to the next token emitted from tokenStream(String,Reader)
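
The effect of the gap on term positions can be worked through with a little arithmetic. The following plain-Java sketch is a simplified model, not Lucene code: PositionGapSketch and positions are hypothetical names, every token is assumed to have a position increment of 1, and the first token of the field is assumed to land on position 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PositionGapSketch {
    /**
     * Returns the position of every token when several instances of the same
     * field are indexed in order. The gap is added before an instance only if
     * terms have already been added to the field.
     */
    static List<Integer> positions(List<List<String>> instances, int gap) {
        List<Integer> out = new ArrayList<>();
        int position = -1;          // so that the first increment of 1 yields position 0
        boolean termsAdded = false; // becomes true once any token has been emitted
        for (List<String> tokens : instances) {
            if (termsAdded) {
                position += gap;    // the gap is applied to the next token emitted
            }
            for (int i = 0; i < tokens.size(); i++) {
                position += 1;      // assumed default position increment
                out.add(position);
                termsAdded = true;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> instances = Arrays.asList(
                Arrays.asList("quick", "brown"),
                Arrays.asList("fox"));
        System.out.println(positions(instances, 0));   // [0, 1, 2]
        System.out.println(positions(instances, 100)); // [0, 1, 102]
    }
}
```

With a gap of 0, "brown" and "fox" sit at adjacent positions, so a phrase spanning the two instances can match exactly. With a gap of 100 they are far apart, so an exact phrase no longer matches across the boundary.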

getOffsetGap

public int getOffsetGap(IndexableField field)

Just like getPositionIncrementGap(java.lang.String), except for Token offsets instead. By default this returns 1 for tokenized fields, as if the fields were joined with an extra space character, and 0 for un-tokenized fields. This method is only called if the field produced at least one token for indexing.

Parameters:
  field - the field just indexed
Returns:
  offset gap, added to the next token emitted from tokenStream(String,Reader)
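
The "joined with an extra space character" behaviour can be illustrated the same way for offsets. This is a plain-Java model under stated assumptions, not Lucene code: OffsetGapSketch and offsets are hypothetical names, tokens are split on whitespace, and each value is assumed to advance the running offset by its full length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OffsetGapSketch {
    /** Returns {start, end} offsets of each whitespace-separated token across all values. */
    static List<int[]> offsets(List<String> values, int offsetGap) {
        List<int[]> out = new ArrayList<>();
        int base = 0; // offset at which the current value starts
        for (String value : values) {
            int tokensBefore = out.size();
            int i = 0;
            while (i < value.length()) {
                while (i < value.length() && Character.isWhitespace(value.charAt(i))) {
                    i++; // skip whitespace between tokens
                }
                int start = i;
                while (i < value.length() && !Character.isWhitespace(value.charAt(i))) {
                    i++; // consume one token
                }
                if (i > start) {
                    out.add(new int[] {base + start, base + i});
                }
            }
            base += value.length();
            if (out.size() > tokensBefore) {
                base += offsetGap; // gap applies only if this value produced a token
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (int[] span : offsets(Arrays.asList("ab cd", "ef"), 1)) {
            System.out.println(Arrays.toString(span)); // [0, 2] then [3, 5] then [6, 8]
        }
    }
}
```

With a gap of 1 the token "ef" starts at offset 6, exactly where it would start in the joined string "ab cd ef". With a gap of 0 it would start at offset 5.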

close

public void close()

Frees persistent resources used by this Analyzer.
