public class QueryEngine extends Object

Modifier and Type | Class | Description |
---|---|---|
static class | QueryEngine.IndexType | Represents the type of index that should be searched. |

Modifier and Type | Field | Description |
---|---|---|
static int | DEFAULT_DOCUMENT_BLOCK_SIZE | The default value for the document block size. |
protected Executor | executor | The executor used to run tasks for query execution. |
protected MimirIndex | index | The index being searched. |
protected IndexConfig | indexConfig | The index configuration this index was built from. |
protected static org.slf4j.Logger | logger | |
static long | MAX_IN_MEMORY_INDEX | The maximum size of an index that can be loaded in memory (by default 64 MB). |
protected gate.LanguageAnalyser | queryTokeniser | The tokeniser (technically any GATE LA) used to split the text segments found in queries into individual tokens. |
protected Callable<MimirScorer> | scorerSource | A callable that produces new MimirScorer instances on request. |
protected boolean | subBindingsEnabled | Should sub-bindings be generated when searching? |

Constructor | Description |
---|---|
QueryEngine(MimirIndex index) | Constructs a new query engine for a MimirIndex. |

Modifier and Type | Method | Description |
---|---|---|
void | close() | Closes this QueryEngine and releases all resources. |
SemanticAnnotationHelper | getAnnotationHelper(AnnotationQuery query) | Gets the SemanticAnnotationHelper corresponding to a query's annotation type. |
SemanticAnnotationHelper | getAnnotationHelper(String annotationType) | |
AtomicAnnotationIndex | getAnnotationIndex(String annotationType) | Returns the index that stores the data for a particular semantic annotation type. |
int | getDocumentBlockSize() | Gets the configuration parameter specifying the number of documents that get processed as a block. |
Serializable | getDocumentMetadataField(long docID, String fieldName) | Obtains an arbitrary document metadata field from the stored document data. |
String | getDocumentTitle(long docID) | |
String | getDocumentURI(long docID) | |
Executor | getExecutor() | Gets the executor used by this query engine. |
String[][] | getHitText(Binding hit) | Gets the text covered by a given binding. |
String[][] | getHitText(Binding hit, int leftContext, int rightContext) | Obtains the document text for a given search hit. |
MimirIndex | getIndex() | Gets the index this query engine is searching. |
IndexConfig | getIndexConfig() | |
String[][] | getLeftContext(Binding hit, int numTokens) | Gets the text to the left of the given binding. |
QueryRunner | getQueryRunner(QueryNode query) | Obtains a query executor for a given QueryNode. |
QueryRunner | getQueryRunner(String query) | Obtains a query executor for a given query, expressed as a String. |
String[][] | getRightContext(Binding hit, int numTokens) | Gets the text to the right of the given binding. |
Callable<MimirScorer> | getScorerSource() | Gets the current source of scorers. |
int | getSubIndexPosition(QueryEngine.IndexType indexType, String indexName) | Finds the location for a given sub-index in the arrays returned by #getIndexes() and #getDirectIndexes(). |
String[][] | getText(long documentID, int termPosition, int length) | Obtains the text for a specified region of a document. |
AtomicTokenIndex | getTokenIndex(String featureName) | Returns the index that stores the data for a particular feature of token annotations. |
boolean | isSubBindingsEnabled() | Are sub-bindings used in this query engine? |
void | releaseQueryRunner(QueryRunner qRunner) | Notifies the QueryEngine that the given QueryRunner has been closed. |
void | renderDocument(long docID, List<Binding> hits, Appendable output) | Renders a document and a list of hits. |
void | setDocumentBlockSize(int documentBlockSize) | Sets the configuration parameter specifying the number of documents that get processed in one go. |
void | setExecutor(Executor executor) | Sets the Executor used for executing tasks required for running queries. |
void | setQueryTokeniser(gate.LanguageAnalyser queryTokeniser) | Sets the tokeniser (technically any GATE analyser) used to split the text segments found in queries into individual tokens. |
void | setScorerSource(Callable<MimirScorer> scorerSource) | Provides a Callable that the Query Engine can use for obtaining new instances of MimirScorer to be used for ranking new queries. |
void | setSubBindingsEnabled(boolean subBindingsEnabled) | |

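
public static final long MAX_IN_MEMORY_INDEX
The maximum size of an index that can be loaded in memory (by default 64 MB).

public static final int DEFAULT_DOCUMENT_BLOCK_SIZE
The default value for the document block size.
See Also: setDocumentBlockSize(int), Constant Field Values

protected final MimirIndex index
The index being searched.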
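
protected IndexConfig indexConfig
The index configuration this index was built from.

protected boolean subBindingsEnabled
Should sub-bindings be generated when searching?

protected Callable<MimirScorer> scorerSource
A callable that produces new MimirScorer instances on request.

protected static final org.slf4j.Logger logger

protected gate.LanguageAnalyser queryTokeniser
The tokeniser (technically any GATE LA) used to split the text segments found in queries into individual tokens.

protected Executor executor
The executor used to run tasks for query execution.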
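
public QueryEngine(MimirIndex index)
Constructs a new query engine for a MimirIndex.
Parameters:
index - the index to be searched.

public boolean isSubBindingsEnabled()
Are sub-bindings used in this query engine?

public void setSubBindingsEnabled(boolean subBindingsEnabled)
Parameters:
subBindingsEnabled - whether sub-bindings should be generated when searching.

public int getDocumentBlockSize()
Gets the configuration parameter specifying the number of documents that get processed as a block.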
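
public void setDocumentBlockSize(int documentBlockSize)
Sets the configuration parameter specifying the number of documents that get processed in one go. The default value is DEFAULT_DOCUMENT_BLOCK_SIZE.
Parameters:
documentBlockSize - the number of documents to process as a block.

public Callable<MimirScorer> getScorerSource()
Gets the current source of scorers.
See Also: setScorerSource(Callable)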
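
Because the scorer source is a plain java.util.concurrent.Callable, any factory that returns a fresh scorer on each call() fits this contract. The sketch below illustrates the pattern only: CountingScorer is a hypothetical stand-in for a MimirScorer implementation (concrete scorer classes are not described on this page), and ScorerSourceSketch is likewise an illustrative name.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for a MimirScorer implementation; the real
// scorer classes and their constructors are not shown in this document.
class CountingScorer {
    private int hits = 0;
    void recordHit() { hits++; }
    int score() { return hits; }
}

class ScorerSourceSketch {
    // The source holds no scorer of its own: every call() builds a new
    // instance, so state accumulated for one query never leaks into another.
    static Callable<CountingScorer> newScorerSource() {
        return CountingScorer::new;
    }

    public static void main(String[] args) throws Exception {
        Callable<CountingScorer> source = newScorerSource();
        CountingScorer first = source.call();
        CountingScorer second = source.call();
        first.recordHit();
        System.out.println(first != second);  // prints true
        System.out.println(second.score());   // prints 0
    }
}
```

With the real library, the equivalent object would be a Callable<MimirScorer> handed to setScorerSource(Callable).
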
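
public void setScorerSource(Callable<MimirScorer> scorerSource)
Provides a Callable that the Query Engine can use for obtaining new instances of MimirScorer to be used for ranking new queries.
Parameters:
scorerSource - the Callable used to obtain new MimirScorer instances.

public Executor getExecutor()
Gets the executor used by this query engine.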
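
The executor is a standard java.util.concurrent.Executor, so the choice is between a shared thread pool and starting a new thread per task. As a rough illustration, the sketch below runs the same tasks on both kinds of executor. ExecutorChoiceSketch and its methods are hypothetical helpers, not part of Mimir; only JDK classes are used.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ExecutorChoiceSketch {
    // An Executor that starts a fresh thread for every task. This only
    // illustrates "new threads are created as required"; it is not
    // Mimir's own fallback code.
    static Executor threadPerTask() {
        return task -> new Thread(task).start();
    }

    // Submits `count` trivial tasks to the executor and waits for them.
    static int runTasks(Executor executor, int count) throws InterruptedException {
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(count);
        for (int i = 0; i < count; i++) {
            executor.execute(() -> {
                completed.incrementAndGet();
                done.countDown();
            });
        }
        if (!done.await(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // A bounded pool reuses its four threads across all tasks.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(runTasks(pool, 10));            // prints 10
            System.out.println(runTasks(threadPerTask(), 10)); // prints 10
        } finally {
            pool.shutdown();
        }
    }
}
```

A pooled ExecutorService such as the one created in main is the kind of object that could be passed to setExecutor(Executor); the caller remains responsible for shutting it down.
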

public void setExecutor(Executor executor)
Sets the Executor used for executing tasks required for running queries. This allows the use of some type of thread pooling, if needed. If this value is not set, then new threads are created as required.
Parameters:
executor - the Executor to use for running query tasks.

public void setQueryTokeniser(gate.LanguageAnalyser queryTokeniser)
Sets the tokeniser (technically any GATE analyser) used to split the text segments found in queries into individual tokens.
Parameters:
queryTokeniser - the new tokeniser to be used for parsing queries.

public int getSubIndexPosition(QueryEngine.IndexType indexType, String indexName)
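Finds the location for a given sub-index in the arrays returned by #getIndexes() and #getDirectIndexes().
Parameters:
indexType - the IndexType of the requested sub-index (tokens or annotations).
indexName - the "name" of the requested sub-index (the indexed feature name for QueryEngine.IndexType.TOKENS indexes, or the annotation type in the case of QueryEngine.IndexType.ANNOTATIONS indexes).

public AtomicTokenIndex getTokenIndex(String featureName)
Returns the index that stores the data for a particular feature of token annotations.
Parameters:
featureName - the name of the token feature.

public AtomicAnnotationIndex getAnnotationIndex(String annotationType)
Returns the index that stores the data for a particular semantic annotation type.
Parameters:
annotationType - the semantic annotation type.

public SemanticAnnotationHelper getAnnotationHelper(String annotationType)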

public MimirIndex getIndex()
Gets the index this query engine is searching.

public IndexConfig getIndexConfig()

public SemanticAnnotationHelper getAnnotationHelper(AnnotationQuery query)
Gets the SemanticAnnotationHelper corresponding to a query's annotation type.
Throws:
IllegalArgumentException - if the annotation helper for this type cannot be found.

public QueryRunner getQueryRunner(QueryNode query) throws IOException
Obtains a query executor for a given QueryNode.
Parameters:
query - the query to be executed.
Returns:
a QueryExecutor for the provided query, running over the indexes in this query engine.
Throws:
IOException - if the index files cannot be accessed.

public void releaseQueryRunner(QueryRunner qRunner)
Notifies the QueryEngine that the given QueryRunner has been closed.
Parameters:
qRunner - the QueryRunner that has been closed.

public QueryRunner getQueryRunner(String query) throws IOException, ParseException
Obtains a query executor for a given query, expressed as a String.
Parameters:
query - the query to be executed.
Returns:
a QueryExecutor for the provided query, running over the indexes in this query engine.
Throws:
IOException - if the index files cannot be accessed.
ParseException - if the string provided for the query cannot be parsed.

public String[][] getHitText(Binding hit, int leftContext, int rightContext) throws IndexException
Obtains the document text for a given search hit.
Parameters:
hit - the search hit for which the text is sought.
leftContext - the number of tokens to the left of the hit to be included in the result.
rightContext - the number of tokens to the right of the hit to be included in the result.
Returns:
Arrays of Strings, representing the tokens and spaces at the location of the search hit. The first element of the array is an array of tokens, the second element contains the spaces. The first element of each array corresponds to the first token of the left context.
Throws:
IOException
IndexException
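
The returned value is a pair of parallel arrays, so turning it back into display text means walking both arrays together. A minimal sketch, assuming that each entry in the spaces array holds the whitespace following the token at the same index and that null entries can simply be skipped (neither detail is stated above). HitTextSketch and join are hypothetical names, and the sample arrays are made up for illustration.

```java
class HitTextSketch {
    // Joins the two parallel arrays (element 0: tokens, element 1: spaces)
    // back into one string.
    // Assumption: spaces[i] is the whitespace that follows tokens[i].
    // Assumption: null entries mark missing positions and are skipped.
    static String join(String[][] text) {
        String[] tokens = text[0];
        String[] spaces = text[1];
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i] != null) {
                out.append(tokens[i]);
            }
            if (i < spaces.length && spaces[i] != null) {
                out.append(spaces[i]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up data in the shape described for getHitText.
        String[][] hitText = {
            {"Hello", ",", "world"},
            {"", " ", ""}
        };
        System.out.println(join(hitText)); // prints Hello, world
    }
}
```
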

public String[][] getHitText(Binding hit) throws IndexException
Gets the text covered by a given binding.
Parameters:
hit - the binding.
Throws:
IOException
IndexException

public String[][] getLeftContext(Binding hit, int numTokens) throws IndexException
Gets the text to the left of the given binding.
Parameters:
hit - the binding.
numTokens - the maximum number of tokens of context to return. The actual number of tokens returned may be smaller than this if the hit starts within numTokens tokens of the start of the document.
Throws:
IOException
IndexException

public String[][] getRightContext(Binding hit, int numTokens) throws IndexException
Gets the text to the right of the given binding.
Parameters:
hit - the binding.
numTokens - the maximum number of tokens of context to return. The actual number of tokens returned may be smaller than this if the hit ends within numTokens tokens of the end of the document.
Throws:
IOException
IndexException
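
The clipping rule for the two context methods comes down to simple arithmetic. A sketch, assuming 0-based token positions and an exclusive hit end position (both assumptions made for illustration). ContextWindowSketch is a hypothetical helper, not Mimir code.

```java
class ContextWindowSketch {
    // Tokens of left context actually available when numTokens are requested
    // for a hit starting at token position hitStart (assumed 0-based).
    static int leftContextSize(int hitStart, int numTokens) {
        return Math.min(numTokens, hitStart);
    }

    // Tokens of right context actually available for a hit whose last token
    // is at position hitEnd - 1 (hitEnd assumed exclusive) in a document of
    // docLength tokens.
    static int rightContextSize(int hitEnd, int docLength, int numTokens) {
        return Math.min(numTokens, docLength - hitEnd);
    }

    public static void main(String[] args) {
        System.out.println(leftContextSize(3, 5));        // prints 3
        System.out.println(leftContextSize(10, 5));       // prints 5
        System.out.println(rightContextSize(98, 100, 5)); // prints 2
    }
}
```
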

public String[][] getText(long documentID, int termPosition, int length) throws IndexException
Obtains the text for a specified region of a document. If length >= 0, the two parallel arrays will always be exactly length items long, but any token positions that do not exist in the document (i.e. before the start or beyond the end of the text) will be null. If length < 0, the arrays will be of sufficient length to hold all the tokens from termPosition to the end of the document, with no trailing nulls (there may be leading nulls if termPosition < 0).
Parameters:
documentID - the document ID.
termPosition - the position of the first term required.
length - the number of terms to return. May be negative, in which case all terms from termPosition to the end of the document will be returned.
Throws:
IndexException
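
The padding rules can be checked against a small in-memory model. The sketch below reproduces them for the token array only, over a plain String[] standing in for a document. TextRegionSketch and region are hypothetical names, and this is an illustration of the documented behaviour rather than Mimir's implementation.

```java
import java.util.Arrays;

class TextRegionSketch {
    // Returns the tokens for the requested region, following the padding
    // rules described for getText. Only the token array is modelled; the
    // parallel spaces array would be sized and padded the same way.
    static String[] region(String[] docTokens, int termPosition, int length) {
        // A negative length means "everything up to the end of the document".
        int size = (length >= 0)
            ? length
            : Math.max(0, docTokens.length - termPosition);
        String[] result = new String[size];
        for (int i = 0; i < size; i++) {
            int pos = termPosition + i;
            // Positions outside the document are left as null.
            if (pos >= 0 && pos < docTokens.length) {
                result[i] = docTokens[pos];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] doc = {"a", "b", "c"};
        // Fixed length: always 4 items, padded with trailing nulls.
        System.out.println(Arrays.toString(region(doc, 1, 4)));   // prints [b, c, null, null]
        // Negative length: runs to the document end, leading null only.
        System.out.println(Arrays.toString(region(doc, -1, -1))); // prints [null, a, b, c]
    }
}
```
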

public void renderDocument(long docID, List<Binding> hits, Appendable output) throws IOException, IndexException
Renders a document and a list of hits.
Parameters:
docID - the document to be rendered.
hits - the list of hits to be rendered.
output - the Appendable used to write the output.
Throws:
IOException - if the output cannot be written to.
IndexException - if no document renderer is available.

public String getDocumentTitle(long docID) throws IndexException
Throws:
IndexException

public String getDocumentURI(long docID) throws IndexException
Throws:
IndexException

public Serializable getDocumentMetadataField(long docID, String fieldName) throws IndexException
Obtains an arbitrary document metadata field from the stored document data. DocumentMetadataHelpers used at indexing time can add arbitrary Serializable values as metadata fields for the documents being indexed. This method is used at search time to retrieve those values.
Parameters:
docID - the ID of the document for which the metadata is sought.
fieldName - the name of the metadata field to be obtained.
Throws:
IndexException
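
Since the value comes back typed only as Serializable, the caller has to know what was stored at indexing time and cast accordingly. A minimal sketch of a checked cast, using a hypothetical in-memory map in place of the stored document data. MetadataFieldSketch, its methods, and the field name pageCount are all illustrative and not part of Mimir.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class MetadataFieldSketch {
    // Hypothetical stand-in for the metadata stored with one document.
    private final Map<String, Serializable> fields = new HashMap<>();

    void put(String name, Serializable value) {
        fields.put(name, value);
    }

    // Returns the field cast to the expected type, or throws if the field
    // is missing or holds a value of a different type.
    <T extends Serializable> T get(String name, Class<T> type) {
        Serializable value = fields.get(name);
        if (!type.isInstance(value)) {
            throw new IllegalArgumentException(
                "Field " + name + " is not a " + type.getSimpleName());
        }
        return type.cast(value);
    }

    public static void main(String[] args) {
        MetadataFieldSketch store = new MetadataFieldSketch();
        store.put("pageCount", Integer.valueOf(12));
        int pages = store.get("pageCount", Integer.class);
        System.out.println(pages); // prints 12
    }
}
```
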

public void close()
Closes this QueryEngine and releases all resources.

Copyright © 2021 GATE. All rights reserved.