java.lang.Object
  org.apache.lucene.analysis.Analyzer
public abstract class Analyzer
An Analyzer builds TokenStreams, which analyze text. It thus represents a policy for extracting index terms from text.
In order to define what analysis is done, subclasses must define their
Analyzer.TokenStreamComponents in createComponents(String, Reader).
The components are then reused in each call to tokenStream(String, Reader).
| Nested Class Summary | |
|---|---|
| static class | Analyzer.GlobalReuseStrategy: Implementation of Analyzer.ReuseStrategy that reuses the same components for every field. |
| static class | Analyzer.PerFieldReuseStrategy: Implementation of Analyzer.ReuseStrategy that reuses components per field by maintaining a Map of TokenStreamComponents per field name. |
| static class | Analyzer.ReuseStrategy: Strategy defining how TokenStreamComponents are reused per call to tokenStream(String, java.io.Reader). |
| static class | Analyzer.TokenStreamComponents: This class encapsulates the outer components of a token stream. |
| Constructor Summary | |
|---|---|
| Analyzer() | |
| Analyzer(Analyzer.ReuseStrategy reuseStrategy) | |
| Method Summary | |
|---|---|
| void | close(): Frees persistent resources used by this Analyzer. |
| protected abstract Analyzer.TokenStreamComponents | createComponents(String fieldName, Reader reader): Creates a new Analyzer.TokenStreamComponents instance for this analyzer. |
| int | getOffsetGap(IndexableField field): Just like getPositionIncrementGap(java.lang.String), except for Token offsets instead. |
| int | getPositionIncrementGap(String fieldName): Invoked before indexing an IndexableField instance if terms have already been added to that field. |
| protected Reader | initReader(String fieldName, Reader reader): Override this if you want to add a CharFilter chain. |
| TokenStream | tokenStream(String fieldName, Reader reader): Creates a TokenStream that is allowed to be reused from the previous time that the same thread called this method. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public Analyzer()
public Analyzer(Analyzer.ReuseStrategy reuseStrategy)
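The difference between the two reuse strategies listed in the nested class summary can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `ComponentsModel`, `ReuseModel`, `GlobalReuseModel`, `PerFieldReuseModel` and `countCreated` are hypothetical names invented for illustration and are not Lucene classes. The model keeps either one slot for all fields or one slot per field name, and counts how many components would have to be created.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Analyzer.TokenStreamComponents; not a Lucene class.
class ComponentsModel {
    final String createdFor;
    ComponentsModel(String createdFor) { this.createdFor = createdFor; }
}

// Hypothetical model of a reuse strategy: look up stored components, or store new ones.
interface ReuseModel {
    ComponentsModel get(String fieldName);
    void put(String fieldName, ComponentsModel components);
}

// Models "reuses the same components for every field": a single shared slot.
class GlobalReuseModel implements ReuseModel {
    private ComponentsModel stored;
    public ComponentsModel get(String fieldName) { return stored; }
    public void put(String fieldName, ComponentsModel components) { stored = components; }
}

// Models "maintaining a Map of TokenStreamComponents per field name".
class PerFieldReuseModel implements ReuseModel {
    private final Map<String, ComponentsModel> byField = new HashMap<>();
    public ComponentsModel get(String fieldName) { return byField.get(fieldName); }
    public void put(String fieldName, ComponentsModel components) { byField.put(fieldName, components); }
}

class ReuseStrategySketch {
    // Returns how many components are created when the fields are analyzed in order.
    static int countCreated(ReuseModel strategy, String... fieldNames) {
        int created = 0;
        for (String field : fieldNames) {
            if (strategy.get(field) == null) {
                strategy.put(field, new ComponentsModel(field));
                created++;
            }
        }
        return created;
    }

    public static void main(String[] args) {
        System.out.println(countCreated(new GlobalReuseModel(), "title", "body", "title"));   // prints 1
        System.out.println(countCreated(new PerFieldReuseModel(), "title", "body", "title")); // prints 2
    }
}
```

In this model a per-field strategy only pays off when different fields need differently configured components; when every field shares the same analysis chain, a single global slot creates fewer objects.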
| Method Detail |
|---|
protected abstract Analyzer.TokenStreamComponents createComponents(String fieldName,
Reader reader)
Creates a new Analyzer.TokenStreamComponents instance for this analyzer.
Parameters:
fieldName - the name of the field whose content is passed to the Analyzer.TokenStreamComponents sink as a reader
reader - the reader passed to the Tokenizer constructor
Returns:
the Analyzer.TokenStreamComponents for this analyzer.
public final TokenStream tokenStream(String fieldName,
Reader reader)
throws IOException
This method uses createComponents(String, Reader) to obtain an
instance of Analyzer.TokenStreamComponents. It returns the sink of the
components and stores the components internally. Subsequent calls to this
method will reuse the previously stored components after resetting them
through Analyzer.TokenStreamComponents.reset(Reader).
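The create-once, reset-afterwards behaviour described above can be modelled without Lucene on the classpath. The following is a minimal sketch, not Lucene's implementation: `WhitespaceComponentsModel` and `AnalyzerModel` are hypothetical stand-ins, the whitespace splitting is a placeholder for a real Tokenizer, and storing the components in a `ThreadLocal` is an assumption made to mirror the "same thread" wording in the method summary.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Analyzer.TokenStreamComponents: holds a reader, splits on whitespace.
class WhitespaceComponentsModel {
    private Reader reader;

    WhitespaceComponentsModel(Reader reader) { this.reader = reader; }

    // Plays the role of TokenStreamComponents.reset(Reader): same object, new input.
    void reset(Reader reader) { this.reader = reader; }

    // Consumes the current reader and returns its whitespace-separated tokens.
    List<String> tokens() {
        try {
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            List<String> out = new ArrayList<>();
            for (String token : text.toString().trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    out.add(token);
                }
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical model of the reuse logic; intended for single-threaded demonstration only.
class AnalyzerModel {
    int created = 0; // number of times createComponents was invoked
    private final ThreadLocal<WhitespaceComponentsModel> stored = new ThreadLocal<>();

    WhitespaceComponentsModel createComponents(String fieldName, Reader reader) {
        created++;
        return new WhitespaceComponentsModel(reader);
    }

    WhitespaceComponentsModel tokenStream(String fieldName, Reader reader) {
        WhitespaceComponentsModel components = stored.get();
        if (components == null) {
            // First call on this thread: build the components and remember them.
            components = createComponents(fieldName, reader);
            stored.set(components);
        } else {
            // Later calls: reuse the stored components after resetting them.
            components.reset(reader);
        }
        return components;
    }
}

class TokenStreamReuseDemo {
    public static void main(String[] args) {
        AnalyzerModel analyzer = new AnalyzerModel();
        System.out.println(analyzer.tokenStream("body", new StringReader("first call")).tokens());
        System.out.println(analyzer.tokenStream("body", new StringReader("second call here")).tokens());
        System.out.println(analyzer.created); // prints 1
    }
}
```

One consequence visible in the model: the object returned by the second call is the same one returned by the first, so a caller should finish consuming one stream before requesting the next on the same thread.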
Parameters:
fieldName - the name of the field the created TokenStream is used for
reader - the reader the stream's source reads from
Throws:
IOException
protected Reader initReader(String fieldName,
Reader reader)
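The method summary describes initReader as the hook for adding a CharFilter chain, that is, for wrapping the incoming reader before tokenization. The idea can be sketched with `java.io.FilterReader` alone. This is an illustrative model under stated assumptions: `HyphenToSpaceReader`, `InitReaderSketch` and `readAll` are hypothetical names, and unlike Lucene's CharFilter this model does not map offsets back to the original text.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a CharFilter: rewrites characters as they are read.
class HyphenToSpaceReader extends FilterReader {
    HyphenToSpaceReader(Reader in) { super(in); }

    @Override
    public int read() throws IOException {
        int c = super.read();
        return c == '-' ? ' ' : c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // When n is -1 (end of input) the loop body never runs.
        for (int i = off; i < off + n; i++) {
            if (buf[i] == '-') {
                buf[i] = ' ';
            }
        }
        return n;
    }
}

class InitReaderSketch {
    // Plays the role of an overridden initReader: wrap the reader before tokenization.
    static Reader initReader(String fieldName, Reader reader) {
        return new HyphenToSpaceReader(reader);
    }

    // Helper that drains a reader into a String so the effect is visible.
    static String readAll(Reader reader) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[64];
            int n;
            while ((n = reader.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Reader wrapped = initReader("body", new StringReader("well-known term"));
        System.out.println(readAll(wrapped)); // prints "well known term"
    }
}
```

Because the wrapper is itself a Reader, several such filters can be nested to form a chain, each one seeing the output of the one it wraps.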
public int getPositionIncrementGap(String fieldName)
Parameters:
fieldName - IndexableField name being indexed.
Returns:
position increment gap, added to the next token emitted from tokenStream(String,Reader)
public int getOffsetGap(IndexableField field)
Just like getPositionIncrementGap(java.lang.String), except for
Token offsets instead. By default this returns 1 for
tokenized fields, as if the fields were joined
with an extra space character, and 0 for un-tokenized
fields. This method is only called if the field
produced at least one token for indexing.
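The "joined with an extra space character" wording can be checked with a little arithmetic. The sketch below is a stand-alone model, not Lucene code: `valueStartOffsets` is a hypothetical helper, and it assumes each value's end offset equals its character length. It computes where each value of a multi-valued field would start if consecutive values were separated by the offset gap.

```java
class OffsetGapSketch {
    // Start offset of each value when the values of one field are laid end to end,
    // separated by offsetGap characters (hypothetical helper, not a Lucene method).
    static int[] valueStartOffsets(String[] values, int offsetGap) {
        int[] starts = new int[values.length];
        int base = 0;
        for (int i = 0; i < values.length; i++) {
            starts[i] = base;
            base += values[i].length() + offsetGap;
        }
        return starts;
    }

    public static void main(String[] args) {
        String[] values = {"ab cd", "ef"};
        int[] starts = valueStartOffsets(values, 1);
        String joined = String.join(" ", values); // "ab cd ef"
        // With a gap of 1 the second value starts where it would in the joined string.
        System.out.println(starts[1] + " " + joined.indexOf("ef")); // prints "6 6"
    }
}
```

With a gap of 0 the second value would start at offset 5 in this model, directly after the last character of the first value.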
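Parameters:
field - the field just indexed
Returns:
offset gap, added to the next token emitted from tokenStream(String,Reader)
public void close()
Frees persistent resources used by this Analyzer.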