public abstract class SimilarityBase extends Similarity
A subclass of Similarity that provides a simplified API for its descendants. Subclasses are only required to implement the score(org.apache.lucene.search.similarities.BasicStats, float, float) and toString() methods. Implementing explain(Explanation, BasicStats, int, float, float) is optional, since SimilarityBase already provides a basic explanation of the score and the term frequency. However, implementers of a subclass are encouraged to include as much detail about the scoring method as possible.
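The subclass contract can be illustrated without Lucene on the classpath. The sketch below is hypothetical: MiniStats, MiniSimilarityBase and LogTfIdfSimilarity are stand-ins invented for illustration (they mirror, but are not, BasicStats and SimilarityBase), and the log-TF times log-IDF formula is an arbitrary toy model, not one that Lucene ships.

```java
import java.util.Locale;

// Hypothetical stand-in for BasicStats: only the fields this sketch needs.
final class MiniStats {
    final long numberOfDocuments; // documents in the collection
    final long docFreq;           // documents containing the term

    MiniStats(long numberOfDocuments, long docFreq) {
        this.numberOfDocuments = numberOfDocuments;
        this.docFreq = docFreq;
    }
}

// Hypothetical stand-in for SimilarityBase: subclasses must implement
// score() and toString(); explain() has a basic default.
abstract class MiniSimilarityBase {
    protected abstract float score(MiniStats stats, float freq, float docLen);

    @Override
    public abstract String toString();

    // Default explanation: reports the score and the term frequency only.
    public String explain(MiniStats stats, float freq, float docLen) {
        return String.format(Locale.ROOT, "score(%s, freq=%.1f) = %.4f",
                this, freq, score(stats, freq, docLen));
    }

    // Base-two logarithm helper, analogous to SimilarityBase.log2(double).
    public static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }
}

// Toy model (hypothetical): log-scaled TF times log-scaled IDF; docLen is ignored.
final class LogTfIdfSimilarity extends MiniSimilarityBase {
    @Override
    protected float score(MiniStats stats, float freq, float docLen) {
        double tf = log2(1 + freq);
        double idf = log2(1 + (double) stats.numberOfDocuments / stats.docFreq);
        return (float) (tf * idf);
    }

    @Override
    public String toString() {
        return "LogTfIdf";
    }
}
```

Only score() and toString() are abstract in this sketch; explain() works unchanged but can be overridden to expose more detail about the formula.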
Note: multi-word queries such as phrase queries are scored differently from Lucene's default ranking algorithm. Whereas the default algorithm "fakes" an IDF value for the phrase as a whole (since it does not know the phrase's actual IDF), this class scores a phrase as the sum of the individual term scores.
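A minimal sketch of the summation strategy: each term of the phrase is scored with its own statistics, all at the same phrase frequency, and the results are added; no phrase-level IDF is ever computed. The per-term formula (phrase frequency times a log2-based IDF) and the names PhraseSumScoring, termScore and phraseScore are hypothetical choices for illustration.

```java
// Sketch: score a phrase as the sum of per-term scores, each evaluated at the
// phrase frequency, instead of inventing one IDF for the phrase as a whole.
final class PhraseSumScoring {
    // Hypothetical per-term model: phrase frequency times a log2-based IDF.
    static double termScore(long numDocs, long docFreq, float phraseFreq) {
        double idf = Math.log(1 + (double) numDocs / docFreq) / Math.log(2);
        return phraseFreq * idf;
    }

    // Each term contributes its own score; the contributions are simply added.
    static double phraseScore(long numDocs, long[] termDocFreqs, float phraseFreq) {
        double sum = 0;
        for (long docFreq : termDocFreqs) {
            sum += termScore(numDocs, docFreq, phraseFreq);
        }
        return sum;
    }
}
```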
Nested classes inherited from class Similarity: Similarity.SimScorer, Similarity.SimWeight
Constructor and Description |
---|
SimilarityBase(): Sole constructor. |
Modifier and Type | Method and Description |
---|---|
long | computeNorm(FieldInvertState state): Encodes the document length in the same way as TFIDFSimilarity. |
Similarity.SimWeight | computeWeight(float queryBoost, CollectionStatistics collectionStats, TermStatistics... termStats): Computes any collection-level weight (e.g. IDF, average document length) needed for scoring a query. |
boolean | getDiscountOverlaps(): Returns true if overlap tokens are discounted from the document's length. |
static double | log2(double x): Returns the base two logarithm of x. |
void | setDiscountOverlaps(boolean v): Determines whether overlap tokens (tokens with a position increment of 0) are ignored when computing the norm. |
Similarity.SimScorer | simScorer(Similarity.SimWeight stats, AtomicReaderContext context): Creates a new Similarity.SimScorer to score matching documents from a segment of the inverted index. |
abstract String | toString(): Subclasses must override this method to return the name of the Similarity and, preferably, the values of its parameters (if any). |
Methods inherited from class Similarity: coord, queryNorm
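public SimilarityBase()
Sole constructor.

public void setDiscountOverlaps(boolean v)
Determines whether overlap tokens (tokens with a position increment of 0) are ignored when computing the norm. By default this is true, meaning overlap tokens do not count toward the document length.

public boolean getDiscountOverlaps()
Returns true if overlap tokens are discounted from the document's length.
See Also: setDiscountOverlaps(boolean)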
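How the discount flag changes the length fed into the norm can be sketched with plain arrays. Here the token stream is modelled as a hypothetical array of position increments; a token with increment 0 (for example, a synonym stacked on the previous token) counts as an overlap. In Lucene the corresponding counts come from FieldInvertState (getLength() and getNumOverlap()); the class and method names below are illustrative only.

```java
// Sketch: the field length used for the norm, optionally discounting
// overlap tokens (tokens whose position increment is 0).
final class OverlapLength {
    static int fieldLength(int[] positionIncrements, boolean discountOverlaps) {
        int length = 0;     // every token in the field
        int numOverlap = 0; // tokens stacked on the previous position
        for (int inc : positionIncrements) {
            length++;
            if (inc == 0) {
                numOverlap++;
            }
        }
        return discountOverlaps ? length - numOverlap : length;
    }
}
```

For increments {1, 1, 0, 1} the field has four tokens, one of them an overlap, so the length is 3 with discounting enabled and 4 without it.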
public final Similarity.SimWeight computeWeight(float queryBoost, CollectionStatistics collectionStats, TermStatistics... termStats)
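Description copied from class: Similarity
Computes any collection-level weight (e.g. IDF, average document length) needed for scoring a query.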
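Collection-level weights are derived from collection and term counts. The sketch below builds one hypothetical statistics record per query term, which also matches how a multi-term query ends up with per-term scores. TermWeightStats and its fields are invented for illustration; deriving the average field length as total field tokens divided by the number of documents, and falling back to an average length of 1 when the token total is unavailable (reported as -1 here), are assumptions of this sketch.

```java
// Hypothetical per-term statistics record for a collection-level weight.
final class TermWeightStats {
    final long numberOfDocuments;
    final long docFreq;
    final long numberOfFieldTokens;
    final double avgFieldLength;

    private TermWeightStats(long numberOfDocuments, long docFreq,
                            long numberOfFieldTokens, double avgFieldLength) {
        this.numberOfDocuments = numberOfDocuments;
        this.docFreq = docFreq;
        this.numberOfFieldTokens = numberOfFieldTokens;
        this.avgFieldLength = avgFieldLength;
    }

    // sumTotalTermFreq: tokens in the field across the collection, or -1 if unknown.
    static TermWeightStats of(long maxDoc, long sumTotalTermFreq, long docFreq) {
        if (sumTotalTermFreq <= 0) {
            // Assumed fallback: no token totals, so use an average length of 1.
            return new TermWeightStats(maxDoc, docFreq, docFreq, 1.0);
        }
        double avg = (double) sumTotalTermFreq / maxDoc;
        return new TermWeightStats(maxDoc, docFreq, sumTotalTermFreq, avg);
    }

    // Multi-term queries: one statistics record per term.
    static TermWeightStats[] forTerms(long maxDoc, long sumTotalTermFreq, long... docFreqs) {
        TermWeightStats[] out = new TermWeightStats[docFreqs.length];
        for (int i = 0; i < docFreqs.length; i++) {
            out[i] = of(maxDoc, sumTotalTermFreq, docFreqs[i]);
        }
        return out;
    }
}
```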
Specified by: computeWeight in class Similarity
Parameters:
queryBoost - the query-time boost.
collectionStats - collection-level statistics, such as the number of tokens in the collection.
termStats - term-level statistics, such as the document frequency of a term across the collection.

public Similarity.SimScorer simScorer(Similarity.SimWeight stats, AtomicReaderContext context) throws IOException
Description copied from class: Similarity
Creates a new Similarity.SimScorer to score matching documents from a segment of the inverted index.
Specified by: simScorer in class Similarity
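The per-segment design can be sketched as follows: a scorer is built for one segment, holds that segment's document lengths (standing in for decoded norm values) indexed by segment-local document ID, and applies a model whose collection statistics were already fixed when the weight was computed. SegmentScorer and the toy length-normalized model in the usage note are hypothetical.

```java
import java.util.function.DoubleBinaryOperator;

// Sketch of a per-segment scorer: created once per segment, it looks up
// document lengths by segment-local document ID.
final class SegmentScorer {
    private final float[] docLengths;         // one entry per document in this segment
    private final DoubleBinaryOperator model; // (freq, docLen) -> score; stats already bound

    SegmentScorer(float[] docLengths, DoubleBinaryOperator model) {
        this.docLengths = docLengths;
        this.model = model;
    }

    double score(int segmentDoc, float freq) {
        return model.applyAsDouble(freq, docLengths[segmentDoc]);
    }
}
```

For example, new SegmentScorer(new float[] {2f, 4f}, (freq, len) -> freq / len) scores document 1 with frequency 2 as 0.5.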
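Parameters:
stats - collection information from Similarity.computeWeight(float, CollectionStatistics, TermStatistics...)
context - segment of the inverted index to be scored.
Throws:
IOException - if there is a low-level I/O error

public abstract String toString()
Subclasses must override this method to return the name of the Similarity and, preferably, the values of its parameters (if any).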
public long computeNorm(FieldInvertState state)
Encodes the document length in the same way as TFIDFSimilarity.
Specified by: computeNorm in class Similarity
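A sketch of the length arithmetic, assuming the encoded quantity is boost / sqrt(length) as in DefaultSimilarity's length norm: with a boost of 1, a subclass can recover the length as 1 / norm^2. The real encoding also compresses the value into a single byte, which loses precision and is omitted here. NormCodec and its methods are hypothetical names.

```java
// Sketch of TFIDF-style length normalization and its inverse
// (single-byte quantization deliberately omitted).
final class NormCodec {
    // Encode: boost / sqrt(length); longer fields get smaller norms.
    static float encode(float boost, float length) {
        return (float) (boost / Math.sqrt(length));
    }

    // Decode: assuming a boost of 1, 1 / norm^2 recovers the length.
    static float decodeLength(float norm) {
        return 1.0f / (norm * norm);
    }
}
```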
Parameters:
state - current processing state for this field

public static double log2(double x)
Returns the base two logarithm of x.

Copyright © 2010 - 2020 Adobe. All Rights Reserved