gate.creole.annic.apache.lucene.search
Class DefaultSimilarity

java.lang.Object
  extended by gate.creole.annic.apache.lucene.search.Similarity
      extended by gate.creole.annic.apache.lucene.search.DefaultSimilarity

public class DefaultSimilarity
extends Similarity

Expert: Default scoring implementation.


Constructor Summary
DefaultSimilarity()
           
 
Method Summary
 float coord(int overlap, int maxOverlap)
          Implemented as overlap / maxOverlap.
 float idf(int docFreq, int numDocs)
          Implemented as log(numDocs/(docFreq+1)) + 1.
 float lengthNorm(String fieldName, int numTerms)
          Implemented as 1/sqrt(numTerms).
 float queryNorm(float sumOfSquaredWeights)
          Implemented as 1/sqrt(sumOfSquaredWeights).
 float sloppyFreq(int distance)
          Implemented as 1 / (distance + 1).
 float tf(float freq)
          Implemented as sqrt(freq).
 
Methods inherited from class gate.creole.annic.apache.lucene.search.Similarity
decodeNorm, encodeNorm, getDefault, idf, idf, setDefault, tf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DefaultSimilarity

public DefaultSimilarity()

Method Detail

lengthNorm

public float lengthNorm(String fieldName,
                        int numTerms)
Implemented as 1/sqrt(numTerms).

Specified by:
lengthNorm in class Similarity
Parameters:
fieldName - the name of the field
numTerms - the total number of tokens contained in all fields named fieldName in the document
Returns:
a normalization factor for hits on this field of this document
See Also:
Field.setBoost(float)
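
The formula above can be illustrated with a minimal standalone sketch. It does not use the GATE classes: the class name LengthNormSketch is hypothetical, and ignoring fieldName is an assumption based on the documented formula depending only on numTerms.

```java
// Hypothetical standalone sketch; not the GATE/Lucene source.
final class LengthNormSketch {
    // Mirrors the documented formula 1/sqrt(numTerms).
    // fieldName is accepted but unused (assumption: the formula ignores it).
    static float lengthNorm(String fieldName, int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        System.out.println(lengthNorm("title", 4));  // prints 0.5
        System.out.println(lengthNorm("body", 100)); // roughly 0.1
    }
}
```

Under this formula a hit in a 4-token field receives a larger factor than a hit in a 100-token field, all else being equal.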

queryNorm

public float queryNorm(float sumOfSquaredWeights)
Implemented as 1/sqrt(sumOfSquaredWeights).

Specified by:
queryNorm in class Similarity
Parameters:
sumOfSquaredWeights - the sum of the squares of query term weights
Returns:
a normalization factor for query weights
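
As a rough illustration, the sketch below computes the factor for two made-up query term weights and then scales each weight by it. The class name QueryNormSketch, the weights, and the scaling step are assumptions chosen for the example; only the formula 1/sqrt(sumOfSquaredWeights) comes from the description above.

```java
// Hypothetical standalone sketch; not the GATE/Lucene source.
final class QueryNormSketch {
    // Mirrors the documented formula 1/sqrt(sumOfSquaredWeights).
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    public static void main(String[] args) {
        float[] weights = {3.0f, 4.0f}; // made-up query term weights
        float sumOfSquares = 0.0f;
        for (float w : weights) {
            sumOfSquares += w * w;      // 9 + 16 = 25
        }
        float norm = queryNorm(sumOfSquares); // 1/sqrt(25), about 0.2

        // Assumed usage: multiply each weight by the norm.
        float normalizedSumOfSquares = 0.0f;
        for (float w : weights) {
            float scaled = w * norm;
            normalizedSumOfSquares += scaled * scaled;
        }
        System.out.println(norm);
        System.out.println(normalizedSumOfSquares); // close to 1.0
    }
}
```

After scaling, the squared weights sum to approximately 1, which is what makes scores from queries with different weights comparable.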

tf

public float tf(float freq)
Implemented as sqrt(freq).

Specified by:
tf in class Similarity
Parameters:
freq - the frequency of a term within a document
Returns:
a score factor based on a term's within-document frequency
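
A minimal standalone sketch of the square-root damping follows. The class name TfSketch is hypothetical, and the code only mirrors the documented formula rather than reproducing the library source.

```java
// Hypothetical standalone sketch; not the GATE/Lucene source.
final class TfSketch {
    // Mirrors the documented formula sqrt(freq).
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        System.out.println(tf(1.0f)); // prints 1.0
        System.out.println(tf(4.0f)); // prints 2.0
        System.out.println(tf(9.0f)); // prints 3.0
    }
}
```

The factor grows sublinearly: quadrupling the frequency only doubles the result, so repeated occurrences of a term add progressively less to the score.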

sloppyFreq

public float sloppyFreq(int distance)
Implemented as 1 / (distance + 1).

Specified by:
sloppyFreq in class Similarity
Parameters:
distance - the edit distance of this sloppy phrase match
Returns:
the frequency increment for this match
See Also:
PhraseQuery.setSlop(int)
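
The sketch below shows how the increment falls off with distance. The class name SloppyFreqSketch is hypothetical; the float literal is a choice made here so that the division is not performed on integers, which would truncate every result for distance >= 1 to 0.

```java
// Hypothetical standalone sketch; not the GATE/Lucene source.
final class SloppyFreqSketch {
    // Mirrors the documented formula 1 / (distance + 1).
    // 1.0f forces floating-point division.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        System.out.println(sloppyFreq(0)); // prints 1.0 (exact phrase match)
        System.out.println(sloppyFreq(1)); // prints 0.5
        System.out.println(sloppyFreq(3)); // prints 0.25
    }
}
```

An exact match contributes a full increment of 1, and each additional unit of distance reduces the contribution.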

idf

public float idf(int docFreq,
                 int numDocs)
Implemented as log(numDocs/(docFreq+1)) + 1.

Specified by:
idf in class Similarity
Parameters:
docFreq - the number of documents which contain the term
numDocs - the total number of documents in the collection
Returns:
a score factor based on the term's document frequency
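
A standalone sketch of this formula is given below under two stated assumptions: "log" is read as the natural logarithm (Java's Math.log), and the division is carried out in floating point. The class name IdfSketch and the sample counts are hypothetical.

```java
// Hypothetical standalone sketch; not the GATE/Lucene source.
final class IdfSketch {
    // Mirrors the documented formula log(numDocs/(docFreq+1)) + 1.
    // Assumptions: natural logarithm, floating-point division.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        int numDocs = 1000; // made-up collection size
        System.out.println(idf(9, numDocs));   // ln(100) + 1, about 5.605
        System.out.println(idf(999, numDocs)); // ln(1) + 1 = 1.0
    }
}
```

Rare terms receive a larger factor than common ones, and the +1 in the denominator avoids division by zero when docFreq is 0.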

coord

public float coord(int overlap,
                   int maxOverlap)
Implemented as overlap / maxOverlap.

Specified by:
coord in class Similarity
Parameters:
overlap - the number of query terms matched in the document
maxOverlap - the total number of terms in the query
Returns:
a score factor based on term overlap with the query
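
The following standalone sketch illustrates the ratio. The class name CoordSketch is hypothetical; the cast to float is a choice made here so that the division of two int arguments is not truncated to 0 or 1.

```java
// Hypothetical standalone sketch; not the GATE/Lucene source.
final class CoordSketch {
    // Mirrors the documented formula overlap / maxOverlap.
    // The cast forces floating-point division.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        System.out.println(coord(3, 3)); // prints 1.0 (all query terms matched)
        System.out.println(coord(2, 4)); // prints 0.5
        System.out.println(coord(2, 3)); // about 0.667
    }
}
```

A document matching every query term gets a factor of 1, while partial matches are scaled down in proportion to the share of terms matched.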