public class RankingQueryRunnerImpl extends Object implements QueryRunner
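A QueryRunner implementation that can rank result documents. It works in ranking or non-ranking mode, depending on whether a MimirScorer is provided during construction or not.
All documents are referred to using their rank (i.e. their position in the list of results). When working in non-ranking mode, ranking order is the same as document ID order.

Modifier and Type | Class and Description |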
---|---|
protected class | RankingQueryRunnerImpl.BackgroundRunner: The background thread implementation, which simply takes Runnables from the backgroundTasks queue and runs them. |
protected class | RankingQueryRunnerImpl.DocIdsCollector: The first action started when a new RankingQueryRunnerImpl is created. |
protected class | RankingQueryRunnerImpl.HitsCollector: Collects the document hits for a range of result documents. |
Modifier and Type | Field and Description |
---|---|
protected boolean | allDocIdsCollected: Flag used to mark that all result documents have been counted. |
protected BlockingQueue<Runnable> | backgroundTasks: A queue of tasks to be executed by the background thread. |
protected boolean | closed: Internal flag used to mark when this query runner has been closed. |
protected int | docBlockSize: The number of documents to be ranked (or have their hits collected) as a block. |
protected FutureTask<Object> | docIdCollectorFuture: The task that collects all the document IDs. |
protected it.unimi.dsi.fastutil.objects.ObjectBigList<List<Binding>> | documentHits: The sets of hits for each returned document. |
protected it.unimi.dsi.fastutil.longs.LongBigList | documentIds: The document IDs for the documents found to contain hits. |
protected it.unimi.dsi.fastutil.doubles.DoubleBigArrayBigList | documentScores: If scoring is enabled (scorer is not null), this list contains the scores for the documents found to contain hits. |
protected it.unimi.dsi.fastutil.longs.LongBigList | documentsOrder: The order in which the documents should be returned (elements in this list are indexes in documentIds). |
protected SortedMap<long[],Future<?>> | hitCollectors: Data structure holding references to Futures that are currently working (or have worked) on collecting hits for a range of document indexes. |
protected static org.slf4j.Logger | logger: Shared logger instance. |
protected QueryEngine | queryEngine: The QueryEngine we run inside. |
protected QueryExecutor | queryExecutor: The QueryExecutor for the query being run. |
protected Thread | runningThread: The background thread used for collecting hits. |
protected MimirScorer | scorer: The MimirScorer to be used for ranking documents. |
Fields inherited from interface QueryRunner: DEFAULT_SCORE
Constructor and Description |
---|
RankingQueryRunnerImpl(QueryExecutor executor, MimirScorer scorer): Creates a query runner in ranking mode. |
Modifier and Type | Method and Description |
---|---|
void | close(): Closes this QueryExecutor and releases all resources used. |
protected Future<?> | collectHits(long[] interval): Makes sure all the documents in the specified range are queued for hit collection. |
protected long | findRank(double documentScore, long start, long end): Given a document score, finds the correct insertion point into the documentsOrder list, within a given range of ranks. |
List<Binding> | getDocumentHits(long rank): Retrieves the hits within a given result document. |
long | getDocumentID(long rank): Gets the ID of a result document. |
protected long | getDocumentIndex(long rank): Given a document rank, returns its index in the documentIds list. |
Serializable | getDocumentMetadataField(long rank, String fieldName): Obtains an arbitrary document metadata field from the stored document data. |
Map<String,Serializable> | getDocumentMetadataFields(long rank, Set<String> fieldNames): Obtains a set of arbitrary document metadata fields from the stored document data. |
double | getDocumentScore(long rank): Gets the score for a given result document. |
long | getDocumentsCount(): Gets the number of result documents. |
long | getDocumentsCountSync(): Synchronous version of getDocumentsCount() that waits if necessary before returning the correct result (instead of returning -1 if the value is not yet known). |
long | getDocumentsCurrentCount(): Gets the number of result documents found so far. |
String[][] | getDocumentText(long rank, int termPosition, int length): Gets a segment of the document text for a given document. |
String | getDocumentTitle(long rank): Obtains the title for a given document. |
String | getDocumentURI(long rank): Obtains the URI for a given document. |
protected long | nextNotDeleted(): Finds the next document ID for the current query executor which is not marked as deleted in the index. |
protected void | rankDocuments(long rank): Ranks some more documents, adding entries to the documentsOrder list. |
void | renderDocument(long rank, Appendable out): Renders the content of the given document, with the hits for this query highlighted. |

protected static org.slf4j.Logger logger

protected QueryExecutor queryExecutor
The QueryExecutor for the query being run.

protected QueryEngine queryEngine

protected MimirScorer scorer
The MimirScorer to be used for ranking documents.

protected int docBlockSize

protected it.unimi.dsi.fastutil.longs.LongBigList documentIds

protected it.unimi.dsi.fastutil.doubles.DoubleBigArrayBigList documentScores
If scoring is enabled (scorer is not null), this list contains the scores for the documents found to contain hits. This list is aligned to documentIds.

protected it.unimi.dsi.fastutil.objects.ObjectBigList<List<Binding>> documentHits
The sets of hits for each returned document, aligned to documentIds.

protected it.unimi.dsi.fastutil.longs.LongBigList documentsOrder
The order in which the documents should be returned (elements in this list are indexes in documentIds).

protected SortedMap<long[],Future<?>> hitCollectors
Data structure holding references to Futures that are currently working (or have worked) on collecting hits for a range of document indexes.

protected Thread runningThread

protected BlockingQueue<Runnable> backgroundTasks

protected volatile boolean allDocIdsCollected

protected volatile FutureTask<Object> docIdCollectorFuture

protected volatile boolean closed
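
The backgroundTasks, runningThread and closed fields, together with the BackgroundRunner summary ("collects Runnables from the backgroundTasks queue and runs them"), describe a simple queue-driven worker. The sketch below illustrates that pattern with java.util.concurrent only; the class BackgroundWorker and its submit(), close() and isAlive() methods are hypothetical names chosen for illustration, not part of this class's API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical, simplified model of the background machinery: a task queue
// (cf. backgroundTasks), a worker thread (cf. runningThread) and a closed flag.
class BackgroundWorker {
  private final BlockingQueue<Runnable> backgroundTasks = new LinkedBlockingQueue<>();
  // volatile so that the worker thread sees the update made by close()
  private volatile boolean closed = false;
  private final Thread runningThread;

  BackgroundWorker() {
    runningThread = new Thread(() -> {
      try {
        // Take tasks from the queue and run them until closed.
        while (!closed) {
          backgroundTasks.take().run();
        }
      } catch (InterruptedException e) {
        // close() interrupts the thread while it waits in take(): exit quietly.
      }
    }, "background-runner");
    runningThread.setDaemon(true);
    runningThread.start();
  }

  void submit(Runnable task) {
    if (closed) {
      throw new IllegalStateException("worker has been closed");
    }
    backgroundTasks.add(task);
  }

  void close() {
    closed = true;
    runningThread.interrupt(); // wake the thread if it is blocked in take()
    try {
      runningThread.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
    }
  }

  boolean isAlive() {
    return runningThread.isAlive();
  }
}
```

Tasks still waiting in the queue when close() is called are simply dropped in this sketch; a real implementation may choose differently.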

public RankingQueryRunnerImpl(QueryExecutor executor, MimirScorer scorer) throws IOException
Creates a query runner in ranking mode.
Parameters:
executor - the QueryExecutor for the query being run.
scorer - the MimirScorer to use for ranking.
Throws:
IOException

public long getDocumentsCount()
Specified by: getDocumentsCount in interface QueryRunner
Returns:
-1 if the search has not yet completed, the total number of result documents otherwise.

public long getDocumentsCountSync()
Synchronous version of getDocumentsCount() that waits if necessary before returning the correct result (instead of returning -1 if the value is not yet known).
Specified by: getDocumentsCountSync in interface QueryRunner
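
The contract of getDocumentsCount() (returns -1 until all document IDs are known) and getDocumentsCountSync() (waits instead) can be illustrated with a FutureTask, in the spirit of the docIdCollectorFuture and allDocIdsCollected fields. This is a sketch under stated assumptions: DocCounter is a hypothetical name, a plain java.util list stands in for the fastutil big lists, and the source of document IDs is modelled as a LongSupplier that returns -1 when exhausted.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.function.LongSupplier;

// Hypothetical sketch of the getDocumentsCount()/getDocumentsCountSync() contract.
class DocCounter {
  // Thread-safe list standing in for the fastutil documentIds big list.
  private final List<Long> documentIds = new CopyOnWriteArrayList<>();
  private volatile boolean allDocIdsCollected = false;
  private final FutureTask<Object> docIdCollectorFuture;

  // nextDocId is assumed to return -1 once there are no more documents.
  DocCounter(LongSupplier nextDocId) {
    docIdCollectorFuture = new FutureTask<>(() -> {
      for (long id = nextDocId.getAsLong(); id >= 0; id = nextDocId.getAsLong()) {
        documentIds.add(id);
      }
      allDocIdsCollected = true;
      return null;
    });
    Thread collector = new Thread(docIdCollectorFuture, "doc-id-collector");
    collector.setDaemon(true);
    collector.start();
  }

  // Returns -1 while IDs are still being collected, the final count afterwards.
  long getDocumentsCount() {
    return allDocIdsCollected ? documentIds.size() : -1;
  }

  // Number of documents found so far; may grow until collection completes.
  long getDocumentsCurrentCount() {
    return documentIds.size();
  }

  // Blocks until the collector task has finished, then returns the final count.
  long getDocumentsCountSync() {
    try {
      docIdCollectorFuture.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for the count", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("document ID collection failed", e.getCause());
    }
    return documentIds.size();
  }
}
```

Because allDocIdsCollected is volatile and written after the last ID is added, a caller that sees it as true also sees the complete list.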

public long getDocumentsCurrentCount()
Gets the number of result documents found so far (see also QueryRunner.getDocumentsCount()).
Specified by: getDocumentsCurrentCount in interface QueryRunner

public long getDocumentID(long rank) throws IndexOutOfBoundsException, IOException
Specified by: getDocumentID in interface QueryRunner
Parameters:
rank - the index of the desired document in the list of documents. This should be a value between 0 and QueryRunner.getDocumentsCount() - 1.
If the requested document position has not yet been ranked (i.e. we know there is a document at that position, but we don't yet know which one), then the necessary ranking is performed before this method returns.
Throws:
IndexOutOfBoundsException - if the index provided is less than zero, or greater than QueryRunner.getDocumentsCount() - 1.
IOException

public double getDocumentScore(long rank) throws IndexOutOfBoundsException, IOException
Gets the score for a given result document. The score is produced by the scorer configured on the QueryEngine (see QueryEngine.setScorerSource(java.util.concurrent.Callable)).
Specified by: getDocumentScore in interface QueryRunner
Parameters:
rank - the index of the desired document in the list of documents. This should be a value between 0 and QueryRunner.getDocumentsCount() - 1.
Throws:
IndexOutOfBoundsException
IOException

public List<Binding> getDocumentHits(long rank) throws IndexOutOfBoundsException, IOException
Specified by: getDocumentHits in interface QueryRunner
Parameters:
rank - the index of the desired document in the list of documents. This should be a value between 0 and QueryRunner.getDocumentsCount() - 1.
This method call waits until the requested data is available before returning (document hits are being collected by a background thread).
Throws:
IndexOutOfBoundsException
IOException
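
Because getDocumentHits() waits for a background collector, it needs a way to find the Future responsible for a given rank. One possible approach, assuming the long[] keys of hitCollectors are non-overlapping [start, end) rank intervals ordered by their start, is a TreeMap lookup through headMap(). HitCollectorIndex, register(), findCollector() and awaitHits() are hypothetical names, and this is not necessarily how the class implements the lookup.

```java
import java.util.Comparator;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical index of hit-collection tasks keyed by [start, end) rank
// intervals, assumed non-overlapping (cf. the hitCollectors field).
class HitCollectorIndex {
  private final SortedMap<long[], Future<?>> hitCollectors =
      new TreeMap<>(Comparator.comparingLong((long[] interval) -> interval[0]));

  void register(long[] interval, Future<?> collector) {
    hitCollectors.put(interval, collector);
  }

  // Returns the task whose interval contains rank, or null if there is none.
  Future<?> findCollector(long rank) {
    // headMap() keeps the intervals that start strictly before rank + 1,
    // i.e. at or before rank; only the last of these can contain rank.
    SortedMap<long[], Future<?>> head = hitCollectors.headMap(new long[] {rank + 1, rank + 1});
    if (head.isEmpty()) {
      return null;
    }
    long[] candidate = head.lastKey();
    return rank < candidate[1] ? head.get(candidate) : null;
  }

  // Waits until the hits for rank have been collected (no-op if not queued).
  void awaitHits(long rank) {
    Future<?> collector = findCollector(rank);
    if (collector == null) {
      return;
    }
    try {
      collector.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for hits", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("hit collection failed", e.getCause());
    }
  }
}
```

The comparator only looks at interval[0], so the probe key passed to headMap() needs only a meaningful start value.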

protected long getDocumentIndex(long rank) throws IOException, IndexOutOfBoundsException
Given a document rank, returns its index in the documentIds list.
If ranking is not being performed, then the rank is interpreted as an index into the documentIds list and is simply returned.
Parameters:
rank - the document rank.
Throws:
IOException
IndexOutOfBoundsException
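
The rank-to-ID mapping described by getDocumentIndex() and the documentsOrder field can be sketched as follows. Assumptions: all documents have already been ranked (the real method triggers ranking when needed), java.util lists replace the fastutil big lists, and the hypothetical RankLookup class uses a null documentsOrder to model non-ranking mode.

```java
import java.util.List;

// Hypothetical sketch of mapping a rank to an index in documentIds and then
// to a document ID. documentsOrder == null models non-ranking mode.
class RankLookup {
  private final List<Long> documentIds;    // document IDs, in collection order
  private final List<Long> documentsOrder; // rank -> index into documentIds

  RankLookup(List<Long> documentIds, List<Long> documentsOrder) {
    this.documentIds = documentIds;
    this.documentsOrder = documentsOrder;
  }

  long getDocumentIndex(long rank) {
    if (rank < 0 || rank >= documentIds.size()) {
      throw new IndexOutOfBoundsException("rank " + rank + " out of range");
    }
    // Non-ranking mode: the rank is used directly as an index into documentIds.
    if (documentsOrder == null) {
      return rank;
    }
    return documentsOrder.get((int) rank);
  }

  long getDocumentID(long rank) {
    return documentIds.get((int) getDocumentIndex(rank));
  }
}
```

With IDs [4, 8, 15] and documentsOrder [2, 0, 1], rank 0 maps to index 2 and therefore to document ID 15.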

protected void rankDocuments(long rank) throws IOException
Ranks some more documents, adding entries to the documentsOrder list and making sure that the document at the provided rank is included (if such a document exists). If the provided rank is larger than the number of result documents, then all documents will be ranked before this method returns.
This is the only method that writes to the documentsOrder list.
This method is executed synchronously in the client thread.
Parameters:
rank - the rank that must be included in the ranked set.
Throws:
IOException

protected long findRank(double documentScore, long start, long end)
Given a document score, finds the correct insertion point into the documentsOrder list, within a given range of ranks.
This method performs a binary search followed by a linear scan so that the returned insertion point is the largest correct one (i.e. later documents with the same score get sorted after earlier ones, thus keeping the sorting stable).
Parameters:
documentScore - the score for the new document.
start - the start of the search range within documentsOrder.
end - the end of the search range within documentsOrder.
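
A sketch of the binary search plus linear scan that findRank() describes, assuming documentsOrder is kept sorted by descending score and holds indexes into a score array. StableRanker and rankAll() are hypothetical; rankAll() inserts documents in index order so that ties keep their original order, which is the stability property described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: documentsOrder holds indexes into documentScores and is
// kept sorted by descending score; findRank() returns the largest valid
// insertion point so that ties keep their insertion order (stable sort).
class StableRanker {
  private final double[] documentScores;
  private final List<Long> documentsOrder = new ArrayList<>();

  StableRanker(double[] documentScores) {
    this.documentScores = documentScores;
  }

  private double scoreAtRank(long rank) {
    return documentScores[documentsOrder.get((int) rank).intValue()];
  }

  long findRank(double documentScore, long start, long end) {
    long low = start;
    long high = end; // search within [low, high)
    // Binary search: either land on some document with an equal score or
    // narrow down to the point where scores drop below documentScore.
    while (low < high) {
      long mid = (low + high) >>> 1;
      double midScore = scoreAtRank(mid);
      if (midScore > documentScore) {
        low = mid + 1;
      } else if (midScore < documentScore) {
        high = mid;
      } else {
        low = mid;
        break;
      }
    }
    // Linear scan: move past any remaining equal scores so the new document
    // is placed after the documents that were ranked before it.
    while (low < end && scoreAtRank(low) >= documentScore) {
      low++;
    }
    return low;
  }

  // Ranks all documents, inserting them in index order.
  List<Long> rankAll() {
    for (int index = 0; index < documentScores.length; index++) {
      long rank = findRank(documentScores[index], 0, documentsOrder.size());
      documentsOrder.add((int) rank, Long.valueOf(index));
    }
    return documentsOrder;
  }
}
```

For scores {0.5, 0.9, 0.5, 0.7}, rankAll() yields [1, 3, 0, 2]: document 0 stays ahead of document 2 because both score 0.5 and document 0 was inserted first.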

protected Future<?> collectHits(long[] interval)
Parameters:
interval - the interval specified by 2 document ranks. The interval is defined as the elements in documentsOrder between ranks interval[0] and (interval[1]-1) inclusive.

public String[][] getDocumentText(long rank, int termPosition, int length) throws IndexException, IndexOutOfBoundsException, IOException
Specified by: getDocumentText in interface QueryRunner
Parameters:
rank - the rank of the requested document. This should be a value between 0 and QueryRunner.getDocumentsCount() - 1.
termPosition - the first term requested.
length - the number of terms requested.
Throws:
IndexException
IndexOutOfBoundsException
IOException

public String getDocumentURI(long rank) throws IndexException, IndexOutOfBoundsException, IOException
Specified by: getDocumentURI in interface QueryRunner
Parameters:
rank - the rank of the requested document. This should be a value between 0 and QueryRunner.getDocumentsCount() - 1.
Throws:
IndexException
IndexOutOfBoundsException
IOException

public String getDocumentTitle(long rank) throws IndexException, IndexOutOfBoundsException, IOException
Specified by: getDocumentTitle in interface QueryRunner
Parameters:
rank - the rank of the requested document. This should be a value between 0 and QueryRunner.getDocumentsCount() - 1.
Throws:
IndexException
IndexOutOfBoundsException
IOException

public Serializable getDocumentMetadataField(long rank, String fieldName) throws IndexException, IndexOutOfBoundsException, IOException
The DocumentMetadataHelpers used at indexing time can add arbitrary Serializable values as metadata fields for the documents being indexed. This method is used at search time to retrieve those values.
Specified by: getDocumentMetadataField in interface QueryRunner
Parameters:
rank - the rank of the requested document. This should be a value between 0 and QueryRunner.getDocumentsCount() - 1.
fieldName - the field name for which the value is sought.
Throws:
IndexException
IndexOutOfBoundsException
IOException

public Map<String,Serializable> getDocumentMetadataFields(long rank, Set<String> fieldNames) throws IndexException, IndexOutOfBoundsException, IOException
The DocumentMetadataHelpers used at indexing time can add arbitrary Serializable values as metadata fields for the documents being indexed. This method is used at search time to retrieve those values.
Specified by: getDocumentMetadataFields in interface QueryRunner
Parameters:
rank - the rank of the requested document. This should be a value between 0 and QueryRunner.getDocumentsCount() - 1.
fieldNames - the names of the metadata fields for which the values are requested.
Returns:
a Map linking field names with their values.
Throws:
IndexException
IndexOutOfBoundsException
IOException

public void renderDocument(long rank, Appendable out) throws IOException, IndexException
Specified by: renderDocument in interface QueryRunner
Parameters:
rank - the rank of the requested document. This should be a value between 0 and QueryRunner.getDocumentsCount() - 1.
out - an Appendable to which the output is written.
Throws:
IOException
IndexException
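
renderDocument() writes a document with the query hits highlighted. A simplified sketch of that idea follows, assuming each hit covers length consecutive terms starting at termPosition (as the getDocumentText() parameters suggest), hits are sorted and non-overlapping, and <b> and </b> are acceptable highlighting markup. HitHighlighter and Hit are hypothetical stand-ins; the real method works from the stored document and its Binding objects.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical sketch of hit highlighting; not the actual renderDocument() code.
class HitHighlighter {
  // A hit covering `length` consecutive terms starting at `termPosition`.
  static final class Hit {
    final int termPosition;
    final int length;

    Hit(int termPosition, int length) {
      this.termPosition = termPosition;
      this.length = length;
    }
  }

  // Writes the terms separated by single spaces, wrapping each hit in <b></b>.
  // Hits must be sorted by position, non-overlapping and at least one term long.
  static void render(List<String> terms, List<Hit> hits, Appendable out) throws IOException {
    int next = 0; // index of the next hit to open or close
    for (int i = 0; i < terms.size(); i++) {
      if (i > 0) {
        out.append(' ');
      }
      Hit hit = next < hits.size() ? hits.get(next) : null;
      if (hit != null && i == hit.termPosition) {
        out.append("<b>");
      }
      out.append(terms.get(i));
      if (hit != null && i == hit.termPosition + hit.length - 1) {
        out.append("</b>");
        next++;
      }
    }
  }
}
```

For the terms "the quick brown fox" and a single hit at position 1 of length 2, the output is "the <b>quick brown</b> fox".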

public void close() throws IOException
Closes this QueryExecutor and releases all resources used.
Specified by: close in interface QueryRunner
Throws:
IOException

protected long nextNotDeleted() throws IOException
Throws:
IOException
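
nextNotDeleted() skips documents that are marked as deleted in the index. A minimal sketch, assuming the executor's document stream is modelled as a LongSupplier that returns -1 when exhausted and the deleted documents as a Set<Long>; both are hypothetical stand-ins for the real QueryExecutor and index APIs, and DeletedFilter is an illustrative name.

```java
import java.util.Set;
import java.util.function.LongSupplier;

// Hypothetical sketch: the executor's document stream is a LongSupplier that
// returns -1 when exhausted, and deleted documents are a Set of IDs.
class DeletedFilter {
  static long nextNotDeleted(LongSupplier nextDocument, Set<Long> deletedIds) {
    long id = nextDocument.getAsLong();
    // Skip over documents that have been marked as deleted.
    while (id >= 0 && deletedIds.contains(id)) {
      id = nextDocument.getAsLong();
    }
    return id; // -1 if no non-deleted document remains
  }
}
```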
Copyright © 2021 GATE. All rights reserved.