AtomicIndex (mimir-core 6.2 API)

java.lang.Object
- gate.mimir.index.AtomicIndex

All Implemented Interfaces:

Runnable

Direct Known Subclasses:

AtomicAnnotationIndex, AtomicTokenIndex
```
public abstract class AtomicIndex
extends Object
implements Runnable
```
An inverted index associating terms with documents. Terms can be either token feature values, or annotations. Optionally, a direct index may also be present.

An atomic index manages a head index (the principal data) and a set of tail indexes (batches containing updates). Additionally, the data representing all the new documents that have been queued for indexing since the last tail was written are stored in RAM.

When direct indexing is enabled, the term IDs in the direct index are different from the term IDs in the inverted index. In the inverted index, a term's ID is its position in the lexicographically sorted list of all terms. In the direct index, a term's ID is its position in the list sorted by the time each term was first seen during indexing.
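
The difference between the two numbering schemes can be illustrated with a small, self-contained sketch. It uses plain Java collections rather than any Mimir or MG4J type; the class name `TermIdSketch` and both method names are hypothetical and exist only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Illustrative only: contrasts the two term-ID numbering schemes described above. */
final class TermIdSketch {

  /** Inverted-index style: ID = position in the lexicographically sorted term list. */
  static Map<String, Long> invertedIds(List<String> termsInArrivalOrder) {
    Map<String, Long> ids = new LinkedHashMap<>();
    long next = 0;
    // TreeSet removes duplicates and iterates in natural (lexicographic) String order.
    for (String term : new TreeSet<>(termsInArrivalOrder)) {
      ids.put(term, next++);
    }
    return ids;
  }

  /** Direct-index style: ID = position in the order the terms were first seen. */
  static Map<String, Long> directIds(List<String> termsInArrivalOrder) {
    Map<String, Long> ids = new LinkedHashMap<>();
    for (String term : termsInArrivalOrder) {
      // Only the first sighting of a term assigns an ID.
      ids.putIfAbsent(term, (long) ids.size());
    }
    return ids;
  }

  public static void main(String[] args) {
    List<String> seen = List.of("pear", "apple", "pear", "fig");
    System.out.println("inverted: " + invertedIds(seen));
    System.out.println("direct:   " + directIds(seen));
  }
}
```

For the same input, "pear" receives the highest ID under the sorted scheme and the lowest under the first-seen scheme, which is why an ID obtained from one index cannot be used with the other.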

The head and tail batches can be combined into a new head by a compact operation.

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`protected static class`	`AtomicIndex.MG4JIndex` Class representing an MG4J index batch, such as the head or any of the tails.
`protected static class`	`AtomicIndex.PostingsList` An in-RAM representation of a postings list

Field Summary

Fields
Modifier and Type	Field and Description
`protected it.unimi.dsi.util.Properties`	`additionalDirectProperties` A set of properties added to the ones obtained from the direct index writer when writing out batches.
`protected it.unimi.dsi.util.Properties`	`additionalProperties` A set of properties added to the ones obtained from the index writer when writing out batches.
`protected List<AtomicIndex.MG4JIndex>`	`batches` A list containing the head and tails of this index.
`protected RunnableFuture<Long>`	`batchWriteTask` If a request was made to write the in-RAM index data to disk this value will be non-null.
`protected RunnableFuture<Void>`	`compactIndexTask` If a request was made to compress the index (combine all sub-indexes into a new head) this value will be non-null.
`protected it.unimi.dsi.lang.MutableString`	`currentTerm` A mutable string used to create instances of MutableString on the cheap.
`static String`	`DIRECT_INDEX_NAME_SUFFIX` Files belonging to the direct index get this suffix added to their basename.
`static String`	`DIRECT_TERMS_FILENAME`
`protected it.unimi.di.big.mg4j.index.Index`	`directIndex` The direct index for this atomic index.
`protected it.unimi.dsi.fastutil.objects.Object2LongMap<String>`	`directTermIds` This map associates direct index terms with their IDs.
`protected it.unimi.dsi.fastutil.objects.ObjectBigList<String>`	`directTerms` The terms in the direct index, in the order they were first seen during indexing.
`static String`	`DOCUMENTS_QUEUE_FILE_NAME` The file name (under the current directory for this atomic index) for the directory containing the documents that have been queued for indexing, but not yet indexed.
`protected int`	`documentsInRAM` The number of documents currently stored in RAM.
`protected it.unimi.dsi.fastutil.ints.IntArrayList`	`documentSizesInRAM` The sizes (numbers of terms) for all the documents indexed in RAM.
`protected boolean`	`hasDirectIndex` Is the direct indexing enabled? Direct indexes are used to find terms occurring in given documents.
`static String`	`HEAD_FILE_NAME` The file name (under the current directory for this atomic index) which stores the principal index.
`static String`	`HEAD_NEW_EXT` The file extension used for the temporary directory where the updated head is being built.
`static String`	`HEAD_OLD_EXT` The file extension used for the temporary directory where the old head index is being stored while the newly updated one is being installed.
`protected File`	`indexDirectory` The directory where this atomic index stores its files.
`protected Thread`	`indexingThread` The single thread used to index documents.
`protected BlockingQueue<GATEDocument>`	`inputQueue` Documents to be indexed are queued in this queue.
`protected it.unimi.di.big.mg4j.index.Index`	`invertedIndex` The cluster-view of all the MG4J indexes that are part of this index (i.e. the head and all the tails).
`protected int`	`maxDocSizeInRAM` The size (number of terms) for the longest document indexed but not yet saved.
`protected String`	`name` The name of this atomic index.
`protected long`	`occurrencesInRAM` The number of occurrences represented in RAM and not yet written to disk.
`protected BlockingQueue<GATEDocument>`	`outputQueue` Documents that have been indexed are passed on to this queue.
`protected MimirIndex`	`parent` The `MimirIndex` that this atomic index is a member of.
`static String`	`TAIL_FILE_NAME_PREFIX` The prefix used for file names (under the current directory for this atomic index) for updates to the head index.
`protected static com.google.common.io.PatternFilenameFilter`	`TAILS_FILENAME_FILTER`
`protected it.unimi.dsi.fastutil.objects.Object2ReferenceOpenHashMap<it.unimi.dsi.lang.MutableString,AtomicIndex.PostingsList>`	`termMap` An in-memory inverted index that gets dumped to files for each batch.
`protected it.unimi.di.big.mg4j.index.TermProcessor`	`termProcessor` The term processor used to process the feature values being indexed.
`protected int`	`tokenPosition` The position of the current (or most-recently used) token in the current document.

Constructor Summary

Constructors
Modifier	Constructor and Description
`protected`	`AtomicIndex(MimirIndex parent, String name, boolean hasDirectIndex, it.unimi.di.big.mg4j.index.TermProcessor termProcessor, BlockingQueue<GATEDocument> inputQueue, BlockingQueue<GATEDocument> outputQueue)` Creates a new AtomicIndex

Method Summary

Modifier and Type	Method and Description
`protected abstract void`	`calculateStartPositionForAnnotation(gate.Annotation ann, GATEDocument gateDocument)` Calculate the starting position for the given annotation, storing it in `tokenPosition`.
`protected abstract String[]`	`calculateTermStringForAnnotation(gate.Annotation ann, GATEDocument gateDocument)` Determine the string (or strings, if there are alternatives) that should be stored in the index for the given annotation.
`void`	`close()` Notifies this index to stop its indexing operations, and waits for all data to be written.
`protected static void`	`combineDirectIndexes(List<AtomicIndex.MG4JIndex> inputIndexes, String outputBasename)` Given a set of direct indexes (MG4J indexes, with counts, but no positions, that form a lexical cluster) this method produces one single output index containing the data from all the input indexes.
`protected void`	`compactIndex()` Combines all the currently existing batches, generating a new head index.
`protected void`	`documentEnding(GATEDocument gateDocument)` Hook for subclasses, called after annotations for this document have been processed.
`protected void`	`documentStarting(GATEDocument gateDocument)` Hook for subclasses, called before processing the annotations for this document.
`protected abstract void`	`flush()` Closes all file-based resources.
`static void`	`generateTermMap(File termsFile, File termmapFile, File bloomFilterFile)` Given a terms file (text file with one term per line) this method generates the corresponding termmap file (binary representation of a StringMap).
`protected abstract gate.Annotation[]`	`getAnnotsToProcess(GATEDocument gateDocument)` Get the annotations that are to be processed for a document, in increasing order of offset.
`int`	`getBatchCount()` Returns the number of batches in this atomic index.
`it.unimi.di.big.mg4j.index.Index`	`getDirectIndex()` Gets the direct index for this atomic index.
`CharSequence`	`getDirectTerm(long termId)` Gets the term string for a given direct term ID.
`long`	`getDirectTermOccurenceCount(long directTermId)` Gets the occurrence count in the whole index for a given direct term, specified by a direct term ID (which must have been obtained from the direct index of this index).
`it.unimi.dsi.fastutil.objects.ObjectBigList<? extends CharSequence>`	`getDirectTerms()` Gets the list of direct terms for this index.
`it.unimi.di.big.mg4j.index.Index`	`getIndex()` Gets the inverted index (an `Index` value) that can be used to search this atomic index.
`File`	`getIndexDirectory()` Gets the top level directory for this atomic index.
`BlockingQueue<GATEDocument>`	`getInputQueue()` Gets the input queue used by this atomic index.
`String`	`getName()` Gets the name of this atomic index.
`BlockingQueue<GATEDocument>`	`getOutputQueue()` Gets the output queue used by this atomic index.
`MimirIndex`	`getParent()` Gets the top level `MimirIndex` to which this atomic index belongs.
`boolean`	`hasDirectIndex()` Is a direct index configured for this atomic index?
`protected void`	`indexCurrentTerm()` Adds the value in `currentTerm` to the index.
`protected void`	`initIndex()` Opens the index and prepares it for indexing and searching.
`static String`	`longToTerm(long value)` Converts a long value into a String containing a zero-padded Hex representation of the input value.
`protected void`	`newBatch()` Starts a new MG4J batch.
`protected static it.unimi.di.big.mg4j.index.Index`	`openDirectIndexCluster(List<AtomicIndex.MG4JIndex> batches)` Opens the direct index files from all the batches and combines them into a `LexicalCluster`.
`protected static it.unimi.di.big.mg4j.index.Index`	`openInvertedIndexCluster(List<AtomicIndex.MG4JIndex> batches, it.unimi.di.big.mg4j.index.TermProcessor termProcessor)` Creates a documental cluster from a list of `AtomicIndex.MG4JIndex` values.
`protected AtomicIndex.MG4JIndex`	`openSubIndex(String subIndexDirname)` Opens one sub-index, specified as a directory inside this Atomic Index's index directory.
`protected void`	`processAnnotation(gate.Annotation ann, GATEDocument gateDocument)` Indexes one annotation (either a Token or a semantic annotation).
`protected void`	`processDocument(GATEDocument gateDocument)` Adds the supplied document to the in-RAM index.
`Future<Void>`	`requestCompactIndex()` Requests this atomic index to compact its on-disk batches into a single batch.
`Future<Long>`	`requestSyncToDisk()` Instructs this index to dump to disk all the in-RAM index data at the first opportunity.
`void`	`run()` Runnable implementation: the logic of this run method is simply indexing documents queued to the input queue.
`protected long`	`writeCurrentBatch()` Writes all the data currently stored in RAM to a new index batch.
`protected void`	`writeDirectIndex(File batchDir)` Writes the in-RAM data to a new direct index batch.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - HEAD_FILE_NAME
```
public static final String HEAD_FILE_NAME
```
    The file name (under the current directory for this atomic index) which stores the principal index.
    
    See Also:
    
    Constant Field Values
  - HEAD_NEW_EXT
```
public static final String HEAD_NEW_EXT
```
    The file extension used for the temporary directory where the updated head is being built.
    
    See Also:
    
    Constant Field Values
  - HEAD_OLD_EXT
```
public static final String HEAD_OLD_EXT
```
    The file extension used for the temporary directory where the old head index is being stored while the newly updated one is being installed.
    
    See Also:
    
    Constant Field Values
  - TAIL_FILE_NAME_PREFIX
```
public static final String TAIL_FILE_NAME_PREFIX
```
    The prefix used for file names (under the current directory for this atomic index) for updates to the head index.
    
    See Also:
    
    Constant Field Values
  - DIRECT_TERMS_FILENAME
```
public static final String DIRECT_TERMS_FILENAME
```
    See Also:
    
    Constant Field Values
  - DIRECT_INDEX_NAME_SUFFIX
```
public static final String DIRECT_INDEX_NAME_SUFFIX
```
    Files belonging to the direct index get this suffix added to their basename.
    
    See Also:
    
    Constant Field Values
  - DOCUMENTS_QUEUE_FILE_NAME
```
public static final String DOCUMENTS_QUEUE_FILE_NAME
```
    The file name (under the current directory for this atomic index) for the directory containing the documents that have been queued for indexing, but not yet indexed.
    
    See Also:
    
    Constant Field Values
  - TAILS_FILENAME_FILTER
```
protected static final com.google.common.io.PatternFilenameFilter TAILS_FILENAME_FILTER
```
  - name
```
protected String name
```
    The name of this atomic index.
  - indexDirectory
```
protected File indexDirectory
```
    The directory where this atomic index stores its files.
  - termProcessor
```
protected it.unimi.di.big.mg4j.index.TermProcessor termProcessor
```
    The term processor used to process the feature values being indexed.
  - maxDocSizeInRAM
```
protected int maxDocSizeInRAM
```
    The size (number of terms) for the longest document indexed but not yet saved.
  - occurrencesInRAM
```
protected long occurrencesInRAM
```
    The number of occurrences represented in RAM and not yet written to disk.
  - parent
```
protected MimirIndex parent
```
    The MimirIndex that this atomic index is a member of.
  - batches
```
protected List<AtomicIndex.MG4JIndex> batches
```
    A list containing the head and tails of this index.
  - invertedIndex
```
protected it.unimi.di.big.mg4j.index.Index invertedIndex
```
    The cluster-view of all the MG4J indexes that are part of this index (i.e. the head and all the tails).
  - directIndex
```
protected it.unimi.di.big.mg4j.index.Index directIndex
```
    The direct index for this atomic index. If hasDirectIndex() is false, then this index will be null.
  - additionalProperties
```
protected it.unimi.dsi.util.Properties additionalProperties
```
    A set of properties added to the ones obtained from the index writer when writing out batches.
  - additionalDirectProperties
```
protected it.unimi.dsi.util.Properties additionalDirectProperties
```
    A set of properties added to the ones obtained from the direct index writer when writing out batches.
  - hasDirectIndex
```
protected boolean hasDirectIndex
```
    Is direct indexing enabled? Direct indexes are used to find the terms occurring in given documents. This is the reverse of the typical search operation, which finds the documents containing a given set of terms.
  - directTermIds
```
protected it.unimi.dsi.fastutil.objects.Object2LongMap<String> directTermIds
```
    This map associates direct index terms with their IDs. See the note at the top-level javadocs for this class for a discussion on direct and inverted term IDs.
  - directTerms
```
protected it.unimi.dsi.fastutil.objects.ObjectBigList<String> directTerms
```
    The terms in the direct index, in the order they were first seen during indexing.
  - indexingThread
```
protected Thread indexingThread
```
    The single thread used to index documents. All writes to the index files are done from this thread.
  - inputQueue
```
protected BlockingQueue<GATEDocument> inputQueue
```
    Documents to be indexed are queued in this queue.
  - outputQueue
```
protected BlockingQueue<GATEDocument> outputQueue
```
    Documents that have been indexed are passed on to this queue.
  - tokenPosition
```
protected int tokenPosition
```
    The position of the current (or most-recently used) token in the current document.
  - currentTerm
```
protected it.unimi.dsi.lang.MutableString currentTerm
```
    A mutable string used to create instances of MutableString on the cheap.
  - documentsInRAM
```
protected int documentsInRAM
```
    The number of documents currently stored in RAM.
  - termMap
```
protected it.unimi.dsi.fastutil.objects.Object2ReferenceOpenHashMap<it.unimi.dsi.lang.MutableString,AtomicIndex.PostingsList> termMap
```
    An in-memory inverted index that gets dumped to files for each batch.
  - documentSizesInRAM
```
protected it.unimi.dsi.fastutil.ints.IntArrayList documentSizesInRAM
```
    The sizes (numbers of terms) for all the documents indexed in RAM.
  - compactIndexTask
```
protected RunnableFuture<Void> compactIndexTask
```
    If a request was made to compress the index (combine all sub-indexes into a new head) this value will be non-null. The operation will be performed on the indexing thread at the first opportunity. At that point this future will complete, and the value will be set back to null.
  - batchWriteTask
```
protected RunnableFuture<Long> batchWriteTask
```
    If a request was made to write the in-RAM index data to disk this value will be non-null. The operation will be performed on the indexing thread at the first opportunity. At that point the Future will complete, and the value will be set back to null.
- Constructor Detail
  - AtomicIndex
```
protected AtomicIndex(MimirIndex parent,
                      String name,
                      boolean hasDirectIndex,
                      it.unimi.di.big.mg4j.index.TermProcessor termProcessor,
                      BlockingQueue<GATEDocument> inputQueue,
                      BlockingQueue<GATEDocument> outputQueue)
               throws IOException,
                      IndexException
```
    Creates a new AtomicIndex
    
    Parameters:
    
    parent - the MimirIndex containing this atomic index.
    
    name - the name of the sub-index, e.g. token-i or mentions-j
    
    indexDirectory - the directory where this index should store all its files.
    
    hasDirectIndex - should a direct index be used?
    
    inputQueue - the input queue for documents to be indexed.
    
    outputQueue - the output queue for documents that have been indexed.
    
    Throws:
    
    IndexException
    
    IOException
- Method Detail
  - generateTermMap
```
public static void generateTermMap(File termsFile,
                                   File termmapFile,
                                   File bloomFilterFile)
                            throws IOException
```
    Given a terms file (text file with one term per line) this method generates the corresponding termmap file (binary representation of a StringMap). Optionally, a BloomFilter can also be generated, if the suitable target file is provided.
    
    Parameters:
    
    termsFile - the input file
    
    termmapFile - the output termmap file, or null if a termmap is not required.
    
    bloomFilterFile - the file to be used for writing the BloomFilter for the index, or null if a Bloom filter is not required.
    
    Throws:
    
    IOException
  - openInvertedIndexCluster
```
protected static final it.unimi.di.big.mg4j.index.Index openInvertedIndexCluster(List<AtomicIndex.MG4JIndex> batches,
                                                                                 it.unimi.di.big.mg4j.index.TermProcessor termProcessor)
```
    Creates a documental cluster from a list of AtomicIndex.MG4JIndex values.
    
    Parameters:
    
    batches - the indexes to be combined into a cluster
    
    termProcessor - the term processor to be used (can be null)
    
    Returns:
    
    a documental cluster view of the list of indexes provided.
  - openDirectIndexCluster
```
protected static final it.unimi.di.big.mg4j.index.Index openDirectIndexCluster(List<AtomicIndex.MG4JIndex> batches)
```
    Opens the direct index files from all the batches and combines them into a LexicalCluster.
    
    Parameters:
    
    batches - the batches to be opened.
    
    Returns:
  - longToTerm
```
public static final String longToTerm(long value)
```
    Converts a long value into a String containing a zero-padded Hex representation of the input value. The lexicographic ordering of the generated strings is the same as the natural order of the corresponding long values.
    
    Parameters:
    
    value - the value to convert.
    
    Returns:
    
    the string representation.
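    
    The ordering property can be demonstrated with a minimal sketch. This is not the library's implementation: the class name `HexTermSketch`, the method name, and the 16-digit width are assumptions made for illustration, and the ordering guarantee shown here holds only for non-negative values.

```java
/** Illustrative only: one way to produce order-preserving, zero-padded hex strings. */
final class HexTermSketch {

  /** Assumed width: 16 hex digits, enough to hold any 64-bit value. */
  static String toPaddedHex(long value) {
    // %x formats as lower-case hex; the 0 flag and width 16 left-pad with zeros.
    return String.format("%016x", value);
  }

  public static void main(String[] args) {
    // Without padding, "a" (10) would sort after "9" but "10" (16) would sort before "9".
    System.out.println(toPaddedHex(9));
    System.out.println(toPaddedHex(10));
    System.out.println(toPaddedHex(16));
  }
}
```

    Fixed-width padding is what makes string comparison agree with numeric comparison: every string has the same length, so the first differing digit decides the order.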
  - initIndex
```
protected void initIndex()
                  throws IOException,
                         IndexException
```
    Opens the index and prepares it for indexing and searching.
    
    Throws:
    
    IndexException
    
    IOException
  - getName
```
public String getName()
```
    Gets the name of this atomic index. This is used as the file name for the directory storing the index files.
    
    Returns:
  - hasDirectIndex
```
public boolean hasDirectIndex()
```
    Is a direct index configured for this atomic index?
    
    Returns:
  - newBatch
```
protected void newBatch()
```
    Starts a new MG4J batch. First time around this will be the head, subsequent calls will start a new tail.
  - writeCurrentBatch
```
protected long writeCurrentBatch()
                          throws IOException,
                                 IndexException
```
    Writes all the data currently stored in RAM to a new index batch. The first batch is the head index, all other batches are tail indexes.
    
    Returns:
    
    the number of occurrences written to disk
    
    Throws:
    
    IOException
    
    IndexException
  - writeDirectIndex
```
protected void writeDirectIndex(File batchDir)
                         throws IOException,
                                IndexException
```
    Writes the in-RAM data to a new direct index batch.
    
    Parameters:
    
    batchDir -
    
    Throws:
    
    IOException
    
    IndexException
  - compactIndex
```
protected void compactIndex()
                     throws IndexException,
                            IOException,
                            org.apache.commons.configuration.ConfigurationException
```
    Combines all the currently existing batches, generating a new head index.
    
    Throws:
    
    IndexException
    
    IOException
    
    org.apache.commons.configuration.ConfigurationException
  - combineDirectIndexes
```
protected static void combineDirectIndexes(List<AtomicIndex.MG4JIndex> inputIndexes,
                                           String outputBasename)
                                    throws IOException,
                                           org.apache.commons.configuration.ConfigurationException
```
    Given a set of direct indexes (MG4J indexes, with counts, but no positions, that form a lexical cluster) this method produces one single output index containing the data from all the input indexes.
    
    Parameters:
    
    inputIndexes -
    
    outputBasename -
    
    Throws:
    
    IOException
    
    org.apache.commons.configuration.ConfigurationException
  - requestSyncToDisk
```
public Future<Long> requestSyncToDisk()
                               throws InterruptedException
```
    Instructs this index to dump to disk all the in-RAM index data at the first opportunity.
    
    Returns:
    
    a Future value that, upon completion, will return the number of occurrences written to disk.
    
    Throws:
    
    InterruptedException - if this thread is interrupted while trying to queue the dump request.
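    
    The request-and-wait pattern described here (the caller receives a Future, while the work itself runs later on the indexing thread) can be sketched with standard `java.util.concurrent` classes. Everything in this example is hypothetical: `DeferredSyncSketch`, its request queue, and the `occurrencesInRam` counter are stand-ins, and the real class may hand the task to its indexing thread differently.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.LinkedBlockingQueue;

/** Illustrative only: a request is queued by the caller and run later on a worker thread. */
final class DeferredSyncSketch {
  // Hypothetical stand-ins for the real index state.
  private final BlockingQueue<Runnable> requests = new LinkedBlockingQueue<>();
  private long occurrencesInRam;

  DeferredSyncSketch(long occurrencesInRam) {
    this.occurrencesInRam = occurrencesInRam;
  }

  /** Caller side: queue the request and hand back a Future for its result. */
  Future<Long> requestSync() throws InterruptedException {
    FutureTask<Long> task = new FutureTask<Long>(() -> {
      long written = occurrencesInRam; // pretend everything held in RAM was written out
      occurrencesInRam = 0;
      return written;
    });
    requests.put(task);
    return task;
  }

  /** Worker side: take one queued request and run it on the calling thread. */
  void serveOneRequest() throws InterruptedException {
    requests.take().run();
  }

  /** Runs the whole round trip using a separate worker thread. */
  static long demo(long occurrencesInRam) {
    DeferredSyncSketch index = new DeferredSyncSketch(occurrencesInRam);
    Thread worker = new Thread(() -> {
      try {
        index.serveOneRequest();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    worker.start();
    try {
      return index.requestSync().get(); // blocks until the worker has run the task
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("occurrences written: " + demo(1234));
  }
}
```

    Because only the worker thread touches the in-RAM state, the caller never needs a lock; it simply waits on the returned Future.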
  - requestCompactIndex
```
public Future<Void> requestCompactIndex()
                                 throws InterruptedException
```
    Requests this atomic index to compact its on-disk batches into a single batch.
    
    Returns:
    
    a Future which can be used to find out when the compaction operation has completed.
    
    Throws:
    
    InterruptedException - if this thread is interrupted while trying to queue the compaction request.
  - openSubIndex
```
protected AtomicIndex.MG4JIndex openSubIndex(String subIndexDirname)
                                      throws IOException,
                                             IndexException
```
    Opens one sub-index, specified as a directory inside this Atomic Index's index directory.
    
    Parameters:
    
    subIndexDirname -
    
    Returns:
    
    Throws:
    
    IOException
    
    IndexException
  - run
```
public void run()
```
    Runnable implementation: the logic of this run method is simply indexing documents queued to the input queue. To stop it, send a GATEDocument.END_OF_QUEUE value to the input queue.
    
    Specified by:
    
    run in interface Runnable
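    
    A queue-driven loop that stops on a sentinel value can be sketched as follows. Plain strings stand in for documents, and the `END_OF_QUEUE` constant, the identity comparison, and the class name `SentinelLoopSketch` are assumptions made for this illustration rather than details taken from the Mimir source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Illustrative only: consume an input queue until a sentinel value arrives. */
final class SentinelLoopSketch {
  /** Hypothetical end-of-queue marker; a distinct object so it can be compared by identity. */
  static final String END_OF_QUEUE = new String("END_OF_QUEUE");

  /** Takes items from input, passes each one on to output, and returns at the sentinel. */
  static void indexUntilSentinel(BlockingQueue<String> input, BlockingQueue<String> output)
      throws InterruptedException {
    while (true) {
      String document = input.take(); // blocks while the queue is empty
      if (document == END_OF_QUEUE) {
        return; // anything queued after the sentinel is left untouched
      }
      output.put(document); // "indexing" is a no-op in this sketch
    }
  }

  /** Queues two documents, the sentinel, and one late document, then runs the loop. */
  static List<String> demo() {
    BlockingQueue<String> input = new LinkedBlockingQueue<>();
    BlockingQueue<String> output = new LinkedBlockingQueue<>();
    input.add("doc-1");
    input.add("doc-2");
    input.add(END_OF_QUEUE);
    input.add("doc-3");
    try {
      indexUntilSentinel(input, output);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return new ArrayList<>(output);
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```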
  - flush
```
protected abstract void flush()
                       throws IOException
```
    Closes all file-based resources.
    
    Throws:
    
    IOException
  - close
```
public void close()
           throws InterruptedException
```
    Notifies this index to stop its indexing operations, and waits for all data to be written.
    
    Throws:
    
    InterruptedException - if the waiting thread is interrupted before the indexing thread has finished writing all the data.
  - documentStarting
```
protected void documentStarting(GATEDocument gateDocument)
                         throws IndexException
```
    Hook for subclasses, called before processing the annotations for this document. The default implementation is a no-op.
    
    Throws:
    
    IndexException
  - documentEnding
```
protected void documentEnding(GATEDocument gateDocument)
                       throws IndexException
```
    Hook for subclasses, called after annotations for this document have been processed. The default implementation is a no-op.
    
    Throws:
    
    IndexException
  - getAnnotsToProcess
```
protected abstract gate.Annotation[] getAnnotsToProcess(GATEDocument gateDocument)
                                                 throws IndexException
```
    Get the annotations that are to be processed for a document, in increasing order of offset.
    
    Throws:
    
    IndexException
  - calculateStartPositionForAnnotation
```
protected abstract void calculateStartPositionForAnnotation(gate.Annotation ann,
                                                            GATEDocument gateDocument)
                                                     throws IndexException
```
    Calculate the starting position for the given annotation, storing it in tokenPosition. The starting position is the index of the token within the document where the annotation starts, and must be >= the previous value of tokenPosition.
    
    Parameters:
    
    ann -
    
    gateDocument -
    
    Throws:
    
    IndexException
  - calculateTermStringForAnnotation
```
protected abstract String[] calculateTermStringForAnnotation(gate.Annotation ann,
                                                             GATEDocument gateDocument)
                                                      throws IndexException
```
    Determine the string (or strings, if there are alternatives) that should be stored in the index for the given annotation. If a single string value should be returned, it is more efficient to store the value in currentTerm, in which case null should be returned instead. If the current term should not be indexed (e.g. it's a stop word), then the implementation should return an empty String array.
    
    Parameters:
    
    ann -
    
    gateDocument -
    
    Throws:
    
    IndexException
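    
    The three possible outcomes of this contract (null, an empty array, or one or more strings) can be sketched as below. The example is a simplified model, not a real subclass: a `StringBuilder` stands in for the `currentTerm` buffer, plain strings stand in for annotations, and the stop-word and alternative-spelling rules are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: models the null / empty-array / strings return contract. */
final class TermResultSketch {
  /** Stand-in for the shared, reusable currentTerm buffer. */
  private final StringBuilder currentTerm = new StringBuilder();
  private final List<String> indexed = new ArrayList<>();

  /** Hypothetical hook: decides which term string(s) to index for a token. */
  String[] termsFor(String token) {
    if (token.equals("the")) {
      return new String[0]; // stop word: nothing should be indexed
    }
    if (token.contains("|")) {
      return token.split("\\|"); // several alternative strings
    }
    currentTerm.setLength(0);
    currentTerm.append(token);
    return null; // single value: it has been placed in currentTerm instead
  }

  /** Caller side: interprets the three possible results. */
  void process(String token) {
    String[] terms = termsFor(token);
    if (terms == null) {
      indexed.add(currentTerm.toString());
    } else {
      for (String term : terms) {
        indexed.add(term);
      }
    }
  }

  static List<String> demo() {
    TermResultSketch sketch = new TermResultSketch();
    for (String token : List.of("the", "cat", "colour|color")) {
      sketch.process(token);
    }
    return sketch.indexed;
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

    Returning null for the single-value case avoids allocating a one-element array per annotation, which is the efficiency gain the description refers to.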
  - processDocument
```
protected void processDocument(GATEDocument gateDocument)
                        throws IndexException
```
    Adds the supplied document to the in-RAM index.
    
    Parameters:
    
    gateDocument - the document to index
    
    Throws:
    
    IndexException
  - processAnnotation
```
protected void processAnnotation(gate.Annotation ann,
                                 GATEDocument gateDocument)
                          throws IndexException
```
    Indexes one annotation (either a Token or a semantic annotation).
    
    Parameters:
    
    ann - the annotation to be indexed
    
    gateDocument - the GATEDocument containing the annotation
    
    Throws:
    
    IndexException
    
    IOException
  - indexCurrentTerm
```
protected void indexCurrentTerm()
```
    Adds the value in currentTerm to the index.
    
    Throws:
    
    IOException
  - getIndexDirectory
```
public File getIndexDirectory()
```
    Gets the top level directory for this atomic index. This will be a directory contained in the top level directory of the MimirIndex which includes this atomic index.
    
    Returns:
  - getParent
```
public MimirIndex getParent()
```
    Gets the top level MimirIndex to which this atomic index belongs.
    
    Returns:
  - getInputQueue
```
public BlockingQueue<GATEDocument> getInputQueue()
```
    Gets the input queue used by this atomic index. This queue is used to submit documents for indexing.
    
    Returns:
  - getOutputQueue
```
public BlockingQueue<GATEDocument> getOutputQueue()
```
    Gets the output queue used by this atomic index. This is used to "return" documents that have finished indexing. Notably, values in this queue will have their occurrences value (see GATEDocument.getOccurrences()) increased by the number of occurrences generated by indexing the document in this atomic index.
    
    Returns:
  - getIndex
```
public it.unimi.di.big.mg4j.index.Index getIndex()
```
    Gets the inverted index (an Index value) that can be used to search this atomic index. This will normally be a DocumentalCluster view over all the batches contained.
    
    Returns:
  - getDirectIndex
```
public it.unimi.di.big.mg4j.index.Index getDirectIndex()
```
    Gets the direct index for this atomic index. The returned value is non-null only if the atomic index was configured to have a direct index upon its construction (see AtomicIndex(MimirIndex, String, boolean, TermProcessor, BlockingQueue, BlockingQueue)). You can check if a direct index has been configured by calling hasDirectIndex().
    
    Returns:
    
    an Index in which terms and documents are reversed. When querying the returned index, the "terms" provided should be String representations of document IDs (as produced by longToTerm(long)). The search result is a set of "document IDs", which are actually term IDs. The actual term string corresponding to each returned term ID can be obtained by calling getDirectTerm(long).
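    
    The lookup chain described above (document ID to hex key, hex key to term IDs, term IDs to term strings) can be modelled with plain collections. This sketch does not use the MG4J `Index` API: the map of postings, the list of direct terms, the 16-digit key format, and the class name `DirectLookupSketch` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: an in-memory model of querying a direct index. */
final class DirectLookupSketch {

  /** Assumed key format: the document ID as a zero-padded hex string. */
  static String docKey(long documentId) {
    return String.format("%016x", documentId);
  }

  /**
   * Looks up the "postings" stored under the document's key (these are direct term IDs)
   * and resolves each one to its term string.
   */
  static List<String> termsInDocument(
      Map<String, long[]> directPostings, List<String> directTerms, long documentId) {
    List<String> result = new ArrayList<>();
    long[] termIds = directPostings.getOrDefault(docKey(documentId), new long[0]);
    for (long termId : termIds) {
      result.add(directTerms.get((int) termId));
    }
    return result;
  }

  static List<String> demo() {
    // Direct terms in first-seen order, so "pear" has direct term ID 0.
    List<String> directTerms = List.of("pear", "apple", "fig");
    Map<String, long[]> postings = Map.of(
        docKey(0), new long[] {0, 1},
        docKey(1), new long[] {0, 2});
    return termsInDocument(postings, directTerms, 1);
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```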
  - getDirectTerm
```
public CharSequence getDirectTerm(long termId)
```
    Gets the term string for a given direct term ID. The term ID must have been obtained from the direct index of this index.
    
    Parameters:
    
    termId - the ID for the term being sought.
    
    Returns:
    
    the string for the given term.
  - getDirectTerms
```
public it.unimi.dsi.fastutil.objects.ObjectBigList<? extends CharSequence> getDirectTerms()
```
    Gets the list of direct terms for this index. The terms are sorted by the time they were first seen, and not lexicographically.
    
    Returns:
  - getDirectTermOccurenceCount
```
public long getDirectTermOccurenceCount(long directTermId)
                                 throws IOException
```
    Gets the occurrence count in the whole index for a given direct term, specified by a direct term ID (which must have been obtained from the direct index of this index).
    
    Parameters:
    
    directTermId -
    
    Returns:
    
    Throws:
    
    IOException
  - getBatchCount
```
public int getBatchCount()
```
    Returns the number of batches in this atomic index.
    
    Returns:
