public class DocTermOrds extends Object
To get a term's bytes, obtain a TermsEnum from the
getOrdTermsEnum(org.apache.lucene.index.AtomicReader)
method and then seek by ord.
Although term ords are normally of type long, in this API
they are int, because the internal representation here
cannot address more than Integer.MAX_VALUE unique terms.
This class is also typically used on fields with relatively
few unique terms compared to the number of documents. In
addition, there is an internal limit (16 MB) on how many
bytes each chunk of documents may consume. Exceeding this
limit throws an IllegalStateException.
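Term ords are normally long elsewhere, so code that passes them into this int-based API needs a narrowing conversion. The following is a minimal sketch of that conversion. The helper name is hypothetical. It uses the JDK's Math.toIntExact, which fails loudly instead of silently wrapping.

```java
/** Hypothetical helper: narrow a long term ord to the int ords used by this API. */
public class OrdNarrowingSketch {
    static int toIntOrd(long ord) {
        // Throws ArithmeticException if ord does not fit in an int.
        return Math.toIntExact(ord);
    }

    public static void main(String[] args) {
        System.out.println(toIntOrd(42L)); // prints 42
        try {
            toIntOrd(Integer.MAX_VALUE + 1L);
        } catch (ArithmeticException e) {
            System.out.println("too many terms for int ords");
        }
    }
}
```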
Deleted documents are skipped during uninversion, and if
you look them up you'll get 0 ords.
The returned per-document ords do not retain their
original order in the document. Instead, they are returned
in sorted order (by ord, i.e. by the term's BytesRef
comparator). They are also de-duplicated: if a document
contains the same term more than once in this field, that
ord is returned only once.
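The per-document behavior described above can be modeled in plain Java. This is a simplified sketch, not Lucene's packed representation, and the class and method names are hypothetical. The model does the following:

- It builds a sorted term dictionary and uses each term's position as its ord.
- It gives deleted documents 0 ords.
- It returns each live document's ords sorted and de-duplicated.

Java's String ordering stands in for the BytesRef comparator. The two orderings agree for ASCII terms but can differ for some non-ASCII terms.

```java
import java.util.*;

/** Hypothetical model of per-document ord uninversion; not Lucene's implementation. */
public class UninvertSketch {
    /**
     * @param termsByDoc terms of each document in field order (repeats allowed)
     * @param live       live[doc] is false for deleted documents
     * @return per-document ords, sorted ascending and de-duplicated
     */
    static int[][] uninvert(List<List<String>> termsByDoc, boolean[] live) {
        // Sorted term dictionary: a term's ord is its position in sorted order.
        TreeSet<String> dict = new TreeSet<>();
        for (List<String> terms : termsByDoc) dict.addAll(terms);
        Map<String, Integer> ordOf = new HashMap<>();
        int ord = 0;
        for (String t : dict) ordOf.put(t, ord++);

        int[][] ords = new int[termsByDoc.size()][];
        for (int doc = 0; doc < termsByDoc.size(); doc++) {
            if (!live[doc]) { // deleted documents are skipped: 0 ords
                ords[doc] = new int[0];
                continue;
            }
            // A TreeSet both sorts by ord and drops repeated terms.
            TreeSet<Integer> docOrds = new TreeSet<>();
            for (String t : termsByDoc.get(doc)) docOrds.add(ordOf.get(t));
            ords[doc] = docOrds.stream().mapToInt(Integer::intValue).toArray();
        }
        return ords;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("red", "blue", "red"),
            List.of("green"),
            List.of("blue", "green"));
        boolean[] live = {true, false, true};
        for (int[] o : uninvert(docs, live)) System.out.println(Arrays.toString(o));
        // prints [0, 2], then [], then [0, 1]
    }
}
```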
This class tests whether the provided reader can
retrieve terms by ord (i.e. it is a single segment and
uses an ord-capable terms index). If not, this class
creates its own term index internally, which lets it
create a wrapped TermsEnum that can handle ord. The
getOrdTermsEnum(org.apache.lucene.index.AtomicReader)
method then provides this wrapped enum, if necessary.
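One way to picture such an internal term index is as a sparse sample of the sorted terms. Keep every (1 << indexIntervalBits)-th term. To find the term for an ord, seek to the nearest sampled term at or before it and step forward. The code below is a hypothetical JDK-only model of that idea, not Lucene's code. It assumes the interval is 2^indexIntervalBits, so 7 bits samples every 128th term.

```java
import java.util.*;

/** Hypothetical model of seek-by-ord over a sparse term index; not Lucene's implementation. */
public class OrdIndexSketch {
    private final List<String> terms; // stands in for the field's sorted term dictionary
    private final int intervalBits;
    private final List<String> indexed = new ArrayList<>(); // every (1 << intervalBits)-th term

    OrdIndexSketch(List<String> sortedTerms, int intervalBits) {
        this.terms = sortedTerms;
        this.intervalBits = intervalBits;
        int interval = 1 << intervalBits;
        for (int ord = 0; ord < sortedTerms.size(); ord += interval) {
            indexed.add(sortedTerms.get(ord));
        }
    }

    /** Jump to the indexed term at or before ord, then step forward (ord & mask) times. */
    String lookupTerm(int ord) {
        String anchor = indexed.get(ord >>> intervalBits);
        int pos = Collections.binarySearch(terms, anchor); // models "seek to anchor term"
        int steps = ord & ((1 << intervalBits) - 1);
        ListIterator<String> it = terms.listIterator(pos);
        String term = it.next(); // the anchor itself
        for (int i = 0; i < steps; i++) term = it.next(); // like TermsEnum.next()
        return term;
    }

    int indexSize() { return indexed.size(); }

    public static void main(String[] args) {
        List<String> sorted = new ArrayList<>();
        for (int i = 0; i < 1000; i++) sorted.add(String.format("term%04d", i));
        OrdIndexSketch idx = new OrdIndexSketch(sorted, 7); // 1 << 7 = 128
        System.out.println(idx.indexSize());     // 8 sampled terms: ords 0, 128, ..., 896
        System.out.println(idx.lookupTerm(300)); // term0300
    }
}
```

The trade-off: a larger interval keeps fewer terms in RAM, but a lookup may step forward through more terms.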
The RAM consumption of this class can be high!

Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_INDEX_INTERVAL_BITS: Every 128th term is indexed, by default. |
Constructor and Description |
---|
DocTermOrds(AtomicReader reader, Bits liveDocs, String field): Inverts all terms. |
DocTermOrds(AtomicReader reader, Bits liveDocs, String field, BytesRef termPrefix): Inverts only terms starting with prefix. |
DocTermOrds(AtomicReader reader, Bits liveDocs, String field, BytesRef termPrefix, int maxTermDocFreq): Inverts only terms starting with prefix, and only terms whose docFreq (not taking deletions into account) is <= maxTermDocFreq. |
DocTermOrds(AtomicReader reader, Bits liveDocs, String field, BytesRef termPrefix, int maxTermDocFreq, int indexIntervalBits): Inverts only terms starting with prefix, and only terms whose docFreq (not taking deletions into account) is <= maxTermDocFreq, with a custom indexing interval (default is every 128th term). |
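The constructor variants above differ in which terms they invert. The sketch below models that selection rule in plain Java, using hypothetical names. A term is kept if it starts with termPrefix (when one is given) and its docFreq, counted without regard to deletions, is at most maxTermDocFreq. The sketch models the rule only. It does not claim how Lucene numbers ords for skipped terms.

```java
import java.util.*;

/** Hypothetical model of the prefix / maxTermDocFreq term selection; not Lucene's code. */
public class TermFilterSketch {
    /**
     * @param docFreqs       term -> docFreq, counted without regard to deletions
     * @param termPrefix     keep only terms starting with this prefix (null keeps all)
     * @param maxTermDocFreq skip terms whose docFreq exceeds this value
     */
    static SortedSet<String> invertedTerms(SortedMap<String, Integer> docFreqs,
                                           String termPrefix, int maxTermDocFreq) {
        SortedSet<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            boolean prefixOk = termPrefix == null || e.getKey().startsWith(termPrefix);
            if (prefixOk && e.getValue() <= maxTermDocFreq) kept.add(e.getKey());
        }
        return kept;
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> df = new TreeMap<>(Map.of(
            "color:blue", 3, "color:red", 900, "size:xl", 2));
        System.out.println(invertedTerms(df, "color:", 100));          // [color:blue]
        System.out.println(invertedTerms(df, null, Integer.MAX_VALUE)); // [color:blue, color:red, size:xl]
    }
}
```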
Modifier and Type | Method and Description |
---|---|
TermsEnum | getOrdTermsEnum(AtomicReader reader): Returns a TermsEnum that implements ord. |
boolean | isEmpty(): Returns true if no terms were indexed. |
SortedSetDocValues | iterator(AtomicReader reader): Returns a SortedSetDocValues view of this instance. |
BytesRef | lookupTerm(TermsEnum termsEnum, int ord): Returns the term (BytesRef) corresponding to the provided ordinal. |
int | numTerms(): Returns the number of terms in this field. |
long | ramUsedInBytes(): Returns total bytes used. |
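The SortedSetDocValues view returned by iterator(AtomicReader) is typically consumed one document at a time. This example assumes the Lucene 4.x SortedSetDocValues contract: call setDocument(docID), then call nextOrd() until it returns the NO_MORE_ORDS sentinel (-1). Check that contract against the Lucene version you use.

OrdsViewSketch is a hypothetical in-memory stand-in, so the example runs without Lucene. In it, terms.get(ord) plays the role of lookupTerm.

```java
import java.util.*;

/** Hypothetical stand-in mimicking the assumed setDocument / nextOrd iteration style. */
public class OrdsViewSketch {
    static final long NO_MORE_ORDS = -1; // sentinel, mirroring the assumed contract

    private final int[][] ordsByDoc; // per-document ords, already sorted and de-duplicated
    private int[] current = new int[0];
    private int upto;

    OrdsViewSketch(int[][] ordsByDoc) { this.ordsByDoc = ordsByDoc; }

    void setDocument(int docID) {
        current = ordsByDoc[docID];
        upto = 0;
    }

    long nextOrd() { return upto < current.length ? current[upto++] : NO_MORE_ORDS; }

    public static void main(String[] args) {
        List<String> terms = List.of("blue", "green", "red"); // ord -> term
        OrdsViewSketch view = new OrdsViewSketch(new int[][] {{0, 2}, {}, {0, 1}});
        for (int doc = 0; doc < 3; doc++) {
            view.setDocument(doc);
            StringBuilder sb = new StringBuilder("doc " + doc + ":");
            for (long ord = view.nextOrd(); ord != NO_MORE_ORDS; ord = view.nextOrd()) {
                sb.append(' ').append(terms.get((int) ord)); // stands in for lookupTerm
            }
            System.out.println(sb); // doc 0: blue red / doc 1: / doc 2: blue green
        }
    }
}
```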
public static final int DEFAULT_INDEX_INTERVAL_BITS
public DocTermOrds(AtomicReader reader, Bits liveDocs, String field) throws IOException
Throws: IOException
public DocTermOrds(AtomicReader reader, Bits liveDocs, String field, BytesRef termPrefix) throws IOException
Throws: IOException
public DocTermOrds(AtomicReader reader, Bits liveDocs, String field, BytesRef termPrefix, int maxTermDocFreq) throws IOException
Throws: IOException
public DocTermOrds(AtomicReader reader, Bits liveDocs, String field, BytesRef termPrefix, int maxTermDocFreq, int indexIntervalBits) throws IOException
Throws: IOException
public long ramUsedInBytes()
public TermsEnum getOrdTermsEnum(AtomicReader reader) throws IOException
Returns a TermsEnum that implements ord.
NOTE: you must pass the same reader that was used when creating this class.
Throws: IOException
public int numTerms()
public boolean isEmpty()
Returns true if no terms were indexed.
public BytesRef lookupTerm(TermsEnum termsEnum, int ord) throws IOException
Returns the term (BytesRef) corresponding to the provided ordinal.
Throws: IOException
public SortedSetDocValues iterator(AtomicReader reader) throws IOException
Throws: IOException
Copyright © 2010 - 2020 Adobe. All Rights Reserved