public class AtomicTokenIndex extends AtomicIndex
An AtomicIndex implementation for indexing tokens.

Nested classes/interfaces inherited from class AtomicIndex: AtomicIndex.MG4JIndex, AtomicIndex.PostingsList
Modifier and Type | Field and Description |
---|---|
protected DocumentMetadataHelper[] | docMetadataHelpers: An array of helpers for creating document metadata. |
protected List<String> | documentNonTokens: Stores the document non-tokens for writing to the zip collection. |
protected List<String> | documentTokens: Stores the document tokens for writing to the zip collection. |
protected GATEDocumentFactory | factory: GATE document factory used by the zip builder, and also to translate field indexes to field names. |
protected String | featureName: The feature name corresponding to the field. |
protected CharsetDecoder | UTF8_CHARSET_DECODER |
protected CharsetEncoder | UTF8_CHARSET_ENCODER |
protected boolean | zipCollectionEnabled: Is this token index responsible for writing the zip collection? |
Fields inherited from class AtomicIndex: additionalDirectProperties, additionalProperties, batches, batchWriteTask, compactIndexTask, currentTerm, DIRECT_INDEX_NAME_SUFFIX, DIRECT_TERMS_FILENAME, directIndex, directTermIds, directTerms, DOCUMENTS_QUEUE_FILE_NAME, documentsInRAM, documentSizesInRAM, hasDirectIndex, HEAD_FILE_NAME, HEAD_NEW_EXT, HEAD_OLD_EXT, indexDirectory, indexingThread, inputQueue, invertedIndex, maxDocSizeInRAM, name, occurrencesInRAM, outputQueue, parent, TAIL_FILE_NAME_PREFIX, TAILS_FILENAME_FILTER, termMap, termProcessor, tokenPosition
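The documentTokens and documentNonTokens fields hold a document's tokens and the text between them for the zip collection. The sketch below assumes, without confirmation from this page, that the two lists are parallel, so that documentNonTokens.get(i) is the text that follows documentTokens.get(i) and the original text can be rebuilt by interleaving them. TokenInterleaveSketch and interleave are hypothetical names, not Mimir API.

```java
import java.util.List;

// Hypothetical illustration, not part of the Mimir API.
// Assumes nonTokens.get(i) is the text that follows tokens.get(i).
final class TokenInterleaveSketch {

    /** Rebuilds document text by alternating token, non-token, token, non-token, ... */
    static String interleave(List<String> tokens, List<String> nonTokens) {
        if (tokens.size() != nonTokens.size()) {
            throw new IllegalArgumentException("token and non-token lists must have equal length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            sb.append(tokens.get(i)).append(nonTokens.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Hello", "world");
        List<String> nonTokens = List.of(", ", "!");
        System.out.println(interleave(tokens, nonTokens)); // prints "Hello, world!"
    }
}
```

Keeping the separators in their own list lets the token list be indexed directly while the full text remains recoverable for display.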
Constructor and Description |
---|
AtomicTokenIndex(MimirIndex parent, String name, boolean hasDirectIndex, BlockingQueue<GATEDocument> inputQueue, BlockingQueue<GATEDocument> outputQueue, IndexConfig.TokenIndexerConfig config, boolean zipCollection): Creates a new atomic index for indexing tokens. |
Modifier and Type | Method and Description |
---|---|
protected void | calculateStartPositionForAnnotation(gate.Annotation ann, GATEDocument gateDocument): This indexer always adds one posting per token, so the start position for the next annotation is always one more than the previous one. |
protected String[] | calculateTermStringForAnnotation(gate.Annotation ann, GATEDocument gateDocument): For a token annotation, the "string" we index is the value of the token feature whose name matches the field being indexed (see featureName). |
protected void | documentEnding(GATEDocument gateDocument): If zipping, inform the collection builder that the current document has finished. |
protected void | documentStarting(GATEDocument gateDocument): If zipping, inform the collection builder that a new document is about to start. |
protected void | flush(): Overridden to close the zip collection builder. |
protected gate.Annotation[] | getAnnotsToProcess(GATEDocument gateDocument): Get the token annotations from this document, in increasing order of offset. |
Methods inherited from class AtomicIndex: close, combineDirectIndexes, compactIndex, generateTermMap, getBatchCount, getDirectIndex, getDirectTerm, getDirectTermOccurenceCount, getDirectTerms, getIndex, getIndexDirectory, getInputQueue, getName, getOutputQueue, getParent, hasDirectIndex, indexCurrentTerm, initIndex, longToTerm, newBatch, openDirectIndexCluster, openInvertedIndexCluster, openSubIndex, processAnnotation, processDocument, requestCompactIndex, requestSyncToDisk, run, writeCurrentBatch, writeDirectIndex
protected final CharsetEncoder UTF8_CHARSET_ENCODER
protected final CharsetDecoder UTF8_CHARSET_DECODER
protected boolean zipCollectionEnabled
protected List<String> documentTokens
protected List<String> documentNonTokens
protected DocumentMetadataHelper[] docMetadataHelpers
protected GATEDocumentFactory factory
protected String featureName
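This page does not say how UTF8_CHARSET_ENCODER and UTF8_CHARSET_DECODER are used. The sketch below assumes they are ordinary java.nio coders obtained from StandardCharsets.UTF_8 and shows a term being round-tripped through bytes. Utf8RoundTripSketch and its methods are hypothetical. Note that, per the Java SE documentation, CharsetEncoder and CharsetDecoder instances are not safe for use by multiple concurrent threads.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper mirroring the two protected final fields; not Mimir code.
final class Utf8RoundTripSketch {
    // Fresh coders REPORT malformed/unmappable input by default, so bad data
    // raises CharacterCodingException instead of being silently replaced.
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();

    /** Encodes a term string to UTF-8 bytes. */
    byte[] encode(String term) throws CharacterCodingException {
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(term)); // position 0, limit = end of data
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    /** Decodes UTF-8 bytes back into a term string. */
    String decode(byte[] bytes) throws CharacterCodingException {
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        Utf8RoundTripSketch s = new Utf8RoundTripSketch();
        byte[] b = s.encode("M\u00edmir");
        System.out.println(b.length);     // 6: U+00ED takes two bytes in UTF-8
        System.out.println(s.decode(b));  // prints the original term
    }
}
```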
public AtomicTokenIndex(MimirIndex parent, String name, boolean hasDirectIndex, BlockingQueue<GATEDocument> inputQueue, BlockingQueue<GATEDocument> outputQueue, IndexConfig.TokenIndexerConfig config, boolean zipCollection) throws IOException, IndexException
Parameters:
parent - the top-level MimirIndex to which this new atomic index belongs.
name - the name for the new atomic index. This is used as the name of the top-level directory for this atomic index (a sub-directory of the parent's directory) and as the base name for all the files of this atomic index.
hasDirectIndex - whether a direct index should be created as well.
inputQueue - the queue where documents are submitted for indexing.
outputQueue - the queue to which indexed documents are returned.
Throws:
IndexException
IOException
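The constructor places the index between an input queue and an output queue. As a minimal sketch of that hand-off pattern only, the code below uses String in place of GATEDocument and a hypothetical worker thread in place of the index's own processing; the END sentinel used to stop the worker is an assumption of this sketch, not something this page documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the input/output queue hand-off; Strings replace GATEDocuments.
final class QueuePipelineSketch {
    // End-of-input marker (an assumption of this sketch).
    static final String END = "<END>";

    /** Takes documents from input, "indexes" them, and passes each one on to output. */
    static Thread startWorker(BlockingQueue<String> input, BlockingQueue<String> output) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String doc = input.take(); // blocks until a document is available
                    // ... indexing work would happen here ...
                    output.put(doc);           // return the document, including the END marker
                    if (END.equals(doc)) break;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        return worker;
    }

    /** Submits docs, waits for the worker, and collects what came out, in order. */
    static List<String> runPipeline(List<String> docs) throws InterruptedException {
        BlockingQueue<String> in = new LinkedBlockingQueue<>();
        BlockingQueue<String> out = new LinkedBlockingQueue<>();
        Thread worker = startWorker(in, out);
        for (String d : docs) in.put(d);
        in.put(END);
        worker.join();
        List<String> result = new ArrayList<>();
        while (true) {
            String d = out.take();
            if (END.equals(d)) break;
            result.add(d);
        }
        return result;
    }
}
```

Because both queues are FIFO and there is a single worker, documents come out in the order they were submitted.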
protected void documentStarting(GATEDocument gateDocument) throws IndexException
documentStarting in class AtomicIndex
Throws: IndexException
protected void documentEnding(GATEDocument gateDocument) throws IndexException
documentEnding in class AtomicIndex
Throws: IndexException
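documentStarting and documentEnding only notify the collection builder "if zipping". A minimal sketch of that guard, assuming a hypothetical CollectionBuilder interface with startDocument and endDocument methods; the real builder class and its methods are not named on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "only if zipping" hooks; not the Mimir implementation.
final class ZipHookSketch {
    /** Hypothetical stand-in for the zip collection builder. */
    interface CollectionBuilder {
        void startDocument(String docId);
        void endDocument(String docId);
    }

    private final boolean zipCollectionEnabled;
    private final CollectionBuilder builder;

    ZipHookSketch(boolean zipCollectionEnabled, CollectionBuilder builder) {
        this.zipCollectionEnabled = zipCollectionEnabled;
        this.builder = builder;
    }

    void documentStarting(String docId) {
        if (zipCollectionEnabled) builder.startDocument(docId); // no-op when not zipping
    }

    void documentEnding(String docId) {
        if (zipCollectionEnabled) builder.endDocument(docId);
    }

    /** Runs one document through the hooks and records which builder calls were made. */
    static List<String> trace(boolean zipping) {
        List<String> calls = new ArrayList<>();
        CollectionBuilder recorder = new CollectionBuilder() {
            public void startDocument(String id) { calls.add("start:" + id); }
            public void endDocument(String id) { calls.add("end:" + id); }
        };
        ZipHookSketch hooks = new ZipHookSketch(zipping, recorder);
        hooks.documentStarting("doc1");
        hooks.documentEnding("doc1");
        return calls;
    }
}
```

With zipping enabled, trace(true) records a start and an end call; with it disabled, trace(false) records nothing.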
protected gate.Annotation[] getAnnotsToProcess(GATEDocument gateDocument)
getAnnotsToProcess in class AtomicIndex
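getAnnotsToProcess returns the token annotations in increasing order of offset. The sketch below shows that filtering and ordering on a hypothetical Ann class standing in for gate.Annotation; passing the token type name as a parameter, and breaking ties on the end offset, are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; Ann is a stand-in for gate.Annotation with only a type and offsets.
final class TokenOrderSketch {
    static final class Ann {
        final String type;
        final long start;
        final long end;

        Ann(String type, long start, long end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /** Keeps annotations of tokenType, sorted by start offset, then by end offset. */
    static Ann[] tokensInOffsetOrder(List<Ann> all, String tokenType) {
        List<Ann> tokens = new ArrayList<>();
        for (Ann a : all) {
            if (a.type.equals(tokenType)) tokens.add(a);
        }
        tokens.sort(Comparator.comparingLong((Ann a) -> a.start).thenComparingLong(a -> a.end));
        return tokens.toArray(new Ann[0]);
    }
}
```

Offset order matters here because each token's position in the index is assigned in the order the annotations are processed.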
protected void calculateStartPositionForAnnotation(gate.Annotation ann, GATEDocument gateDocument)
calculateStartPositionForAnnotation in class AtomicIndex
Parameters: ann, gateDocument
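Because this indexer adds exactly one posting per token, each token's start position is one more than the previous token's. A minimal sketch of that counter follows; the field mirrors the inherited tokenPosition, but starting at -1 (so the first token gets position 0) and the helper names are assumptions of this sketch.

```java
// Hypothetical sketch: with one posting per token, positions simply count tokens.
final class TokenPositionSketch {
    // Mirrors the inherited tokenPosition field; the -1 starting value is an assumption
    // chosen so that the first token of a document gets position 0.
    private int tokenPosition = -1;

    /** Called once per token annotation, in offset order. */
    int nextStartPosition() {
        tokenPosition++;
        return tokenPosition;
    }

    /** Positions assigned to a document with tokenCount tokens. */
    static int[] positionsFor(int tokenCount) {
        TokenPositionSketch s = new TokenPositionSketch();
        int[] out = new int[tokenCount];
        for (int i = 0; i < tokenCount; i++) {
            out[i] = s.nextStartPosition();
        }
        return out;
    }
}
```

For example, positionsFor(3) yields {0, 1, 2}.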
protected String[] calculateTermStringForAnnotation(gate.Annotation ann, GATEDocument gateDocument) throws IndexException
calculateTermStringForAnnotation in class AtomicIndex
Parameters: ann, gateDocument
Throws: IndexException
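For a token, the indexed term is the value of the feature named by featureName. The sketch below models a token's features as a Map; returning a one-element array, and mapping a missing feature to the empty string, are assumptions of this sketch, since this page does not say what the real method does in that case. The feature names "string" and "root" in the example are illustrative.

```java
import java.util.Map;

// Hypothetical sketch: a token's term is the value of the configured feature.
final class TokenTermSketch {
    /**
     * Returns a one-element term array holding the value of featureName.
     * Treating a missing feature as "" is an assumption of this sketch.
     */
    static String[] termFor(Map<String, ?> tokenFeatures, String featureName) {
        Object value = tokenFeatures.get(featureName);
        return new String[] { value == null ? "" : value.toString() };
    }
}
```

For instance, termFor(Map.of("string", "Dogs", "root", "dog"), "root") yields {"dog"}, so a field configured on a "root" feature would index the root form rather than the surface string.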
protected void flush() throws IOException
flush in class AtomicIndex
Throws: IOException
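flush() is overridden to close the zip collection builder. A minimal sketch of such an override follows, with BaseIndexSketch standing in for AtomicIndex and a plain Closeable standing in for the builder. Calling super.flush() first, closing in a finally block, and guarding against a second close are all assumptions of this sketch rather than documented behaviour.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical base class standing in for AtomicIndex.
class BaseIndexSketch {
    protected void flush() throws IOException {
        // base-class flushing work would go here
    }
}

// Hypothetical sketch of an overridden flush() that also closes the zip builder.
final class ZipFlushSketch extends BaseIndexSketch {
    private final boolean zipCollectionEnabled;
    private Closeable zipBuilder; // stand-in for the zip collection builder

    ZipFlushSketch(boolean zipCollectionEnabled, Closeable zipBuilder) {
        this.zipCollectionEnabled = zipCollectionEnabled;
        this.zipBuilder = zipBuilder;
    }

    @Override
    protected void flush() throws IOException {
        try {
            super.flush();
        } finally {
            // Close the builder even if the base flush fails, and only once.
            if (zipCollectionEnabled && zipBuilder != null) {
                Closeable b = zipBuilder;
                zipBuilder = null;
                b.close();
            }
        }
    }
}
```

Clearing the reference before calling close() means a repeated flush() does not try to close the builder a second time.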
Copyright © 2021 GATE. All rights reserved.