public class AtomicAnnotationIndex extends AtomicIndex
Nested classes inherited from class AtomicIndex: AtomicIndex.MG4JIndex, AtomicIndex.PostingsList
Modifier and Type | Field and Description |
---|---|
protected Map<String,SemanticAnnotationHelper> | annotationHelpers: Helpers for each semantic annotation type. |
protected List<SemanticAnnotationHelper> | documentHelpers |
protected IndexConfig | indexConfig: The IndexConfig used by the MimirIndex that contains this mentions index. |
protected gate.util.OffsetComparator | offsetComparator: An OffsetComparator used to sort the annotations by offset before indexing. |
protected IndexConfig.SemanticIndexerConfig | semIdxConfid |
Fields inherited from class AtomicIndex: additionalDirectProperties, additionalProperties, batches, batchWriteTask, compactIndexTask, currentTerm, DIRECT_INDEX_NAME_SUFFIX, DIRECT_TERMS_FILENAME, directIndex, directTermIds, directTerms, DOCUMENTS_QUEUE_FILE_NAME, documentsInRAM, documentSizesInRAM, hasDirectIndex, HEAD_FILE_NAME, HEAD_NEW_EXT, HEAD_OLD_EXT, indexDirectory, indexingThread, inputQueue, invertedIndex, maxDocSizeInRAM, name, occurrencesInRAM, outputQueue, parent, TAIL_FILE_NAME_PREFIX, TAILS_FILENAME_FILTER, termMap, termProcessor, tokenPosition
Constructor and Description |
---|
AtomicAnnotationIndex(MimirIndex parent, String name, boolean hasDirectIndex, BlockingQueue<GATEDocument> inputQueue, BlockingQueue<GATEDocument> outputQueue, IndexConfig.SemanticIndexerConfig siConfig): Creates a new atomic index for indexing annotations. |
Modifier and Type | Method and Description |
---|---|
protected void | calculateStartPositionForAnnotation(gate.Annotation ann, GATEDocument gateDocument): Calculate the starting position for the given annotation, storing it in AtomicIndex.tokenPosition. |
protected String[] | calculateTermStringForAnnotation(gate.Annotation ann, GATEDocument gateDocument): Determine the string (or strings, if there are alternatives) that should be stored in the index for the given annotation. |
protected void | documentEnding(GATEDocument gateDocument): Hook for subclasses, called after annotations for this document have been processed. |
protected void | documentStarting(GATEDocument gateDocument): Hook for subclasses, called before processing the annotations for this document. |
protected void | flush(): Closes all file-based resources. |
protected gate.Annotation[] | getAnnotsToProcess(GATEDocument gateDocument): Get the annotations that are to be processed for a document, in increasing order of offset. |
Methods inherited from class AtomicIndex: close, combineDirectIndexes, compactIndex, generateTermMap, getBatchCount, getDirectIndex, getDirectTerm, getDirectTermOccurenceCount, getDirectTerms, getIndex, getIndexDirectory, getInputQueue, getName, getOutputQueue, getParent, hasDirectIndex, indexCurrentTerm, initIndex, longToTerm, newBatch, openDirectIndexCluster, openInvertedIndexCluster, openSubIndex, processAnnotation, processDocument, requestCompactIndex, requestSyncToDisk, run, writeCurrentBatch, writeDirectIndex
protected IndexConfig indexConfig
The IndexConfig used by the MimirIndex that contains this mentions index.
protected IndexConfig.SemanticIndexerConfig semIdxConfid
protected Map<String,SemanticAnnotationHelper> annotationHelpers
Helpers for each semantic annotation type.
protected List<SemanticAnnotationHelper> documentHelpers
protected gate.util.OffsetComparator offsetComparator
An OffsetComparator used to sort the annotations by offset before indexing.
public AtomicAnnotationIndex(MimirIndex parent, String name, boolean hasDirectIndex, BlockingQueue<GATEDocument> inputQueue, BlockingQueue<GATEDocument> outputQueue, IndexConfig.SemanticIndexerConfig siConfig) throws IOException, IndexException
Creates a new atomic index for indexing annotations.
Parameters:
parent - the top level MimirIndex to which this new atomic index belongs.
name - the name for the new atomic index. This will be used as the name of the top level directory for this atomic index (which is a sub-directory of the parent) and as a base name for all the files of this atomic index.
hasDirectIndex - whether a direct index should be created as well.
inputQueue - the queue where documents are submitted for indexing.
outputQueue - the queue to which indexed documents are returned.
Throws:
IndexException
IOException
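
The two queue parameters suggest a hand-off arrangement: documents arrive on one queue, are indexed, and are passed along on the other. The following is a minimal sketch of that pattern only, not Mimir's implementation. The class `QueueHandoffSketch`, the `END` marker, and the use of `String` in place of `GATEDocument` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical worker: takes documents from an input queue and returns them on an output queue. */
final class QueueHandoffSketch implements Runnable {
    /** Hypothetical end-of-stream marker; not part of the documented API. */
    static final String END = "<end-of-queue>";

    private final BlockingQueue<String> inputQueue;
    private final BlockingQueue<String> outputQueue;

    QueueHandoffSketch(BlockingQueue<String> inputQueue, BlockingQueue<String> outputQueue) {
        this.inputQueue = inputQueue;
        this.outputQueue = outputQueue;
    }

    @Override
    public void run() {
        try {
            while (true) {
                String doc = inputQueue.take();      // blocks until a document is submitted
                if (END.equals(doc)) {
                    outputQueue.put(END);            // pass the marker on, then stop
                    return;
                }
                // Indexing work on doc would happen here.
                outputQueue.put(doc);                // return the processed document
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();      // keep the interrupt status and stop
        }
    }

    /** Pushes the given documents through a worker thread and collects what comes back. */
    static List<String> passThrough(List<String> docs) {
        BlockingQueue<String> in = new LinkedBlockingQueue<>();
        BlockingQueue<String> out = new LinkedBlockingQueue<>();
        Thread worker = new Thread(new QueueHandoffSketch(in, out));
        worker.start();
        List<String> returned = new ArrayList<>();
        try {
            for (String doc : docs) {
                in.put(doc);
            }
            in.put(END);
            for (String doc = out.take(); !END.equals(doc); doc = out.take()) {
                returned.add(doc);
            }
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return returned;
    }

    public static void main(String[] args) {
        System.out.println(passThrough(Arrays.asList("doc-1", "doc-2")));
    }
}
```

Using a single worker thread with FIFO queues means documents come back in the order they were submitted, which keeps the sketch easy to reason about.
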
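protected void documentStarting(GATEDocument gateDocument) throws IndexException
Description copied from class: AtomicIndex
Hook for subclasses, called before processing the annotations for this document.
documentStarting in class AtomicIndex
Throws:
IndexException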
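protected void documentEnding(GATEDocument gateDocument) throws IndexException
Description copied from class: AtomicIndex
Hook for subclasses, called after annotations for this document have been processed.
documentEnding in class AtomicIndex
Throws:
IndexException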
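
The documentStarting and documentEnding hooks bracket the per-document annotation processing. A minimal sketch of that call order follows. The class `HookOrderSketch`, its whitespace-splitting `annotsToProcess`, and the exact sequencing inside `processDocument` are assumptions for illustration, not a description of AtomicIndex internals.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of start/end hooks around annotation processing. */
class HookOrderSketch {
    /** Records the order in which the steps ran. */
    final List<String> log = new ArrayList<>();

    /** Hook called before the annotations of a document are processed. */
    protected void documentStarting(String doc) {
        log.add("start");
    }

    /** Hook called after the annotations of a document have been processed. */
    protected void documentEnding(String doc) {
        log.add("end");
    }

    /** Stand-in for annotation lookup: each whitespace-separated word counts as one annotation. */
    protected String[] annotsToProcess(String doc) {
        String trimmed = doc.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Assumed driver: start hook, then every annotation, then end hook. */
    final void processDocument(String doc) {
        documentStarting(doc);
        for (String annot : annotsToProcess(doc)) {
            log.add("annot:" + annot);
        }
        documentEnding(doc);
    }

    public static void main(String[] args) {
        HookOrderSketch sketch = new HookOrderSketch();
        sketch.processDocument("Person Date");
        System.out.println(sketch.log);
    }
}
```

A subclass would override the two hooks to set up and tear down per-document state, leaving the driver untouched.
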
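protected gate.Annotation[] getAnnotsToProcess(GATEDocument gateDocument) throws IndexException
Description copied from class: AtomicIndex
Get the annotations that are to be processed for a document, in increasing order of offset.
getAnnotsToProcess in class AtomicIndex
Throws:
IndexException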
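
The ordering requirement on getAnnotsToProcess can be illustrated with a small sketch. `OffsetOrderSketch` and `Span` are hypothetical stand-ins, not GATE or Mimir types. The sketch orders by start offset only, which is an assumption: how gate.util.OffsetComparator breaks ties is not described here.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical illustration of returning annotations in increasing order of offset. */
final class OffsetOrderSketch {
    /** Stand-in for an annotation: a type name plus start and end character offsets. */
    static final class Span {
        final String type;
        final long start;
        final long end;

        Span(String type, long start, long end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /** Returns a sorted copy ordered by start offset; the input array is left untouched. */
    static Span[] inOffsetOrder(Span[] spans) {
        Span[] sorted = Arrays.copyOf(spans, spans.length);
        // Arrays.sort on object arrays is stable, so spans with equal starts keep their input order.
        Arrays.sort(sorted, Comparator.comparingLong((Span s) -> s.start));
        return sorted;
    }

    public static void main(String[] args) {
        Span[] spans = {
            new Span("Person", 12, 20),
            new Span("Date", 3, 9)
        };
        for (Span s : inOffsetOrder(spans)) {
            System.out.println(s.type + " " + s.start + "-" + s.end);
        }
    }
}
```
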
protected void calculateStartPositionForAnnotation(gate.Annotation ann, GATEDocument gateDocument) throws IndexException
Description copied from class: AtomicIndex
Calculate the starting position for the given annotation, storing it in AtomicIndex.tokenPosition. The starting position is the index of the token within the document where the annotation starts, and must be >= the previous value of tokenPosition.
calculateStartPositionForAnnotation in class AtomicIndex
Throws:
IndexException
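
The constraint that the starting position never decreases can be sketched as a cursor over token start offsets. This is an illustration under stated assumptions, not Mimir's algorithm: `TokenPositionSketch`, the `tokenStarts` array, and the rule "last token starting at or before the annotation start" are all hypothetical.

```java
/** Hypothetical cursor that maps annotation start offsets to token indexes, never moving backwards. */
final class TokenPositionSketch {
    /** Start offset of each token, in ascending order. */
    private final long[] tokenStarts;

    /** Index of the current token; plays the role of tokenPosition. */
    private int tokenPosition = 0;

    TokenPositionSketch(long[] tokenStarts) {
        this.tokenStarts = tokenStarts.clone();
    }

    /**
     * Advances to the last token that starts at or before annStart and returns its index.
     * If annStart lies before the current token, the position is left unchanged.
     */
    int advanceTo(long annStart) {
        while (tokenPosition + 1 < tokenStarts.length
                && tokenStarts[tokenPosition + 1] <= annStart) {
            tokenPosition++;
        }
        return tokenPosition;
    }

    public static void main(String[] args) {
        TokenPositionSketch cursor = new TokenPositionSketch(new long[] {0, 5, 11, 17});
        System.out.println(cursor.advanceTo(5));
        System.out.println(cursor.advanceTo(11));
    }
}
```

Because the cursor only moves forward, feeding it annotations in increasing order of offset lets each lookup resume where the previous one stopped instead of rescanning the document.
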
protected String[] calculateTermStringForAnnotation(gate.Annotation ann, GATEDocument gateDocument) throws IndexException
Description copied from class: AtomicIndex
Determine the string (or strings, if there are alternatives) that should be stored in the index for the given annotation. The term may instead be stored directly in AtomicIndex.currentTerm, in which case null should be returned. If the current term should not be indexed (e.g. it's a stop word), then the implementation should return an empty String array.
calculateTermStringForAnnotation in class AtomicIndex
Throws:
IndexException
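
The return-value convention of calculateTermStringForAnnotation has three cases: null when the term has been placed in currentTerm, an empty array when nothing should be indexed, and otherwise one or more alternative strings. The sketch below shows how a caller might interpret them. `TermContractSketch` and `termsToIndex` are hypothetical names, and this reading of the contract is an assumption based on the description, not code taken from AtomicIndex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical interpretation of the three possible return values. */
final class TermContractSketch {
    /**
     * Resolves the terms a caller would index.
     *
     * @param returned    the array returned for the annotation; may be null
     * @param currentTerm the term stored separately, used only when returned is null
     */
    static List<String> termsToIndex(String[] returned, String currentTerm) {
        if (returned == null) {
            // null signals that the single term was stored in currentTerm.
            return Collections.singletonList(currentTerm);
        }
        // An empty array yields an empty list (nothing is indexed);
        // otherwise every alternative string is kept.
        return new ArrayList<>(Arrays.asList(returned));
    }

    public static void main(String[] args) {
        System.out.println(termsToIndex(null, "Person"));
        System.out.println(termsToIndex(new String[0], "ignored"));
        System.out.println(termsToIndex(new String[] {"UK", "United Kingdom"}, "ignored"));
    }
}
```
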
protected void flush() throws IOException
Description copied from class: AtomicIndex
Closes all file-based resources.
flush in class AtomicIndex
Throws:
IOException
Copyright © 2021 GATE. All rights reserved.