org.apache.lucene.search.postingshighlight.PostingsHighlighter

public class PostingsHighlighter extends Object

Simple highlighter that neither analyzes fields nor uses term vectors. Instead, it requires fields to be indexed with FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.

PostingsHighlighter treats the single original document as the whole corpus, and then scores individual passages as if they were documents in this corpus. It uses a BreakIterator to find passages in the text; by default it breaks using getSentenceInstance(Locale.ROOT). It then iterates in parallel (merge sorting by offset) through the positions of all terms from the query, coalescing those hits that occur in a single passage into a Passage, and then scores each Passage using a separate PassageScorer. Passages are finally formatted into highlighted snippets with a PassageFormatter.
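
The passage-building step described above can be pictured with the JDK's BreakIterator alone. The following is a minimal sketch, not the highlighter's actual implementation: the class PassageSketch, its Span type, and the coalesce method are hypothetical names, and the hit offsets are assumed to be already merged and sorted. Scoring and formatting are left out.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; it is not part of Lucene.
class PassageSketch {
    /** A passage: a [start, end) span of the text plus the number of hits inside it. */
    static final class Span {
        final int start;
        final int end;
        int hits;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Groups sorted hit start offsets into the sentences that contain them.
     * Sentences without any hit produce no span.
     */
    static List<Span> coalesce(String text, int[] sortedHitOffsets) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
        bi.setText(text);
        List<Span> spans = new ArrayList<>();
        Span current = null;
        for (int offset : sortedHitOffsets) {
            if (offset < 0 || offset >= text.length()) {
                continue; // ignore hits outside the text
            }
            if (current == null || offset >= current.end) {
                // The hit falls outside the current passage: find the sentence around it.
                int end = bi.following(offset);
                int start = bi.previous();
                current = new Span(start, end);
                spans.add(current);
            }
            current.hits++;
        }
        return spans;
    }
}
```

Calling PassageSketch.coalesce(text, offsets) returns one Span per sentence that contains at least one hit, in document order. A real scorer would then rank these spans instead of only counting hits.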

You can customize the behavior by subclassing this highlighter. Some important hooks:

getBreakIterator(String): Customize how the text is divided into passages.
getScorer(String): Customize how passages are ranked.
getFormatter(String): Customize how snippets are formatted.
getIndexAnalyzer(String): Enable highlighting of MultiTermQuerys such as WildcardQuery.
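
The override pattern behind these hooks can be sketched without depending on Lucene. The example below uses a hypothetical stand-in class (HighlighterStandIn) rather than the real PostingsHighlighter; only the hook name getBreakIterator(String) comes from the list above, while the "title" field, the subclass name, and the countPassages helper are assumptions made for illustration.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical stand-in for the highlighter; only the hook pattern is illustrated.
class HighlighterStandIn {
    /** Hook: chooses how the text of a field is divided into passages. */
    protected BreakIterator getBreakIterator(String field) {
        return BreakIterator.getSentenceInstance(Locale.ROOT);
    }

    /** Counts the passages the configured BreakIterator finds in the text. */
    int countPassages(String field, String text) {
        BreakIterator bi = getBreakIterator(field);
        bi.setText(text);
        int count = 0;
        bi.first();
        while (bi.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }
}

// Subclass that splits the hypothetical "title" field on word boundaries instead of sentences.
class TitleAwareStandIn extends HighlighterStandIn {
    @Override
    protected BreakIterator getBreakIterator(String field) {
        if ("title".equals(field)) {
            return BreakIterator.getWordInstance(Locale.ROOT);
        }
        return super.getBreakIterator(field);
    }
}
```

Because the hook receives the field name, a subclass can pick a different strategy per field and fall back to the default for everything else.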

WARNING: The code is very new and probably still has some exciting bugs!

Example usage:

   // configure field with offsets at index time
   FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
   offsetsType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
   Field body = new Field("body", "foobar", offsetsType);

   // retrieve highlights at query time
   PostingsHighlighter highlighter = new PostingsHighlighter();
   Query query = new TermQuery(new Term("body", "highlighting"));
   TopDocs topDocs = searcher.search(query, n); // n: number of top hits to retrieve
   String[] highlights = highlighter.highlight("body", query, searcher, topDocs);

This class is thread-safe and can be used across different readers.

Field Summary

Fields

static final int DEFAULT_MAX_LENGTH - Default maximum content size to process.

Constructor Summary

Constructors

PostingsHighlighter() - Creates a new highlighter with DEFAULT_MAX_LENGTH.
PostingsHighlighter(int maxLength) - Creates a new highlighter, specifying maximum content length.

Method Summary

String[] highlight(String field, Query query, IndexSearcher searcher, TopDocs topDocs) - Highlights the top passages from a single field.
String[] highlight(String field, Query query, IndexSearcher searcher, TopDocs topDocs, int maxPassages) - Highlights the top-N passages from a single field.
Map<String,String[]> highlightFields(String[] fieldsIn, Query query, IndexSearcher searcher, int[] docidsIn, int[] maxPassagesIn) - Highlights the top-N passages from multiple fields, for the provided int[] docids.
Map<String,String[]> highlightFields(String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs) - Highlights the top passages from multiple fields.
Map<String,String[]> highlightFields(String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs, int[] maxPassages) - Highlights the top-N passages from multiple fields.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- DEFAULT_MAX_LENGTH
  
  public static final int DEFAULT_MAX_LENGTH
  
  Default maximum content size to process. Typically, snippets closer to the beginning of the document better summarize its content.
  See Also:
  
  Constant Field Values
Constructor Details
- PostingsHighlighter
  
  public PostingsHighlighter()
  
  Creates a new highlighter with DEFAULT_MAX_LENGTH.
- PostingsHighlighter
  
  public PostingsHighlighter(int maxLength)
  
  Creates a new highlighter, specifying maximum content length.
  
  Parameters:
  
  maxLength - maximum content size to process.
  
  Throws:
  
  IllegalArgumentException - if maxLength is negative or Integer.MAX_VALUE
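
  The constructor contract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: MaxLengthSketch is a hypothetical stand-in class, and the limit method assumes that "maximum content size to process" means only the leading maxLength characters of a field value are considered.

```java
// Hypothetical stand-in class; it is not part of Lucene.
class MaxLengthSketch {
    private final int maxLength;

    MaxLengthSketch(int maxLength) {
        // Mirrors the documented contract: negative values and Integer.MAX_VALUE are rejected.
        if (maxLength < 0 || maxLength == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("maxLength must be >= 0 and < Integer.MAX_VALUE");
        }
        this.maxLength = maxLength;
    }

    /** Returns at most maxLength leading characters of the content (assumed truncation policy). */
    String limit(String content) {
        return content.length() <= maxLength ? content : content.substring(0, maxLength);
    }
}
```

  Validating in the constructor means an invalid limit fails immediately rather than at the first highlight call.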
Method Details
- highlight
  
  public String[] highlight(String field, Query query, IndexSearcher searcher, TopDocs topDocs) throws IOException
  
  Highlights the top passages from a single field.
  
  Parameters:
  
  field - field name to highlight. Must have a stored string value and also be indexed with offsets.
  
  query - query to highlight.
  
  searcher - searcher that was previously used to execute the query.
  
  topDocs - TopDocs containing the summary result documents to highlight.
  
  Returns:
  
  Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first sentence for the field will be returned.
  
  Throws:
  
  IOException - if an I/O error occurred during processing
  
  IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
- highlight
  
  public String[] highlight(String field, Query query, IndexSearcher searcher, TopDocs topDocs, int maxPassages) throws IOException
  
  Highlights the top-N passages from a single field.
  
  Parameters:
  
  field - field name to highlight. Must have a stored string value and also be indexed with offsets.
  
  query - query to highlight.
  
  searcher - searcher that was previously used to execute the query.
  
  topDocs - TopDocs containing the summary result documents to highlight.
  
  maxPassages - The maximum number of top-N ranked passages used to form the highlighted snippets.
  
  Returns:
  
  Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
  
  Throws:
  
  IOException - if an I/O error occurred during processing
  
  IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
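
  The fallback described under Returns (the first maxPassages sentences when no highlights are found) can be approximated with the JDK's BreakIterator. This is a rough sketch under assumptions: FallbackSketch and firstSentences are hypothetical names, the default sentence BreakIterator is assumed, and the real highlighter would still pass the result through its PassageFormatter.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical illustration class; it is not part of Lucene.
class FallbackSketch {
    /** Returns the leading text covering at most maxPassages sentences. */
    static String firstSentences(String text, int maxPassages) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
        bi.setText(text);
        int end = bi.first();
        for (int i = 0; i < maxPassages; i++) {
            int next = bi.next();
            if (next == BreakIterator.DONE) {
                break; // fewer sentences than requested
            }
            end = next;
        }
        return text.substring(0, end);
    }
}
```

  With maxPassages set to 1 this corresponds to the single-sentence fallback of the four-argument highlight method.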
- highlightFields
  
  public Map<String,String[]> highlightFields(String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs) throws IOException

  Highlights the top passages from multiple fields.

  Conceptually, this behaves as a more efficient form of:

     Map<String,String[]> m = new HashMap<>();
     for (String field : fields) {
       m.put(field, highlight(field, query, searcher, topDocs));
     }
     return m;

  Parameters:
  
  fields - field names to highlight. Must have a stored string value and also be indexed with offsets.
  
  query - query to highlight.
  
  searcher - searcher that was previously used to execute the query.
  
  topDocs - TopDocs containing the summary result documents to highlight.
  
  Returns:
  
  Map keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first sentence from the field will be returned.
  
  Throws:
  
  IOException - if an I/O error occurred during processing
  
  IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
- highlightFields
  
  public Map<String,String[]> highlightFields(String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs, int[] maxPassages) throws IOException

  Highlights the top-N passages from multiple fields.

  Conceptually, this behaves as a more efficient form of:

     Map<String,String[]> m = new HashMap<>();
     for (int i = 0; i < fields.length; i++) {
       // maxPassages holds one limit per field, parallel to fields
       m.put(fields[i], highlight(fields[i], query, searcher, topDocs, maxPassages[i]));
     }
     return m;

  Parameters:
  
  fields - field names to highlight. Must have a stored string value and also be indexed with offsets.
  
  query - query to highlight.
  
  searcher - searcher that was previously used to execute the query.
  
  topDocs - TopDocs containing the summary result documents to highlight.
  
  maxPassages - The maximum number of top-N ranked passages per-field used to form the highlighted snippets.
  
  Returns:
  
  Map keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
  
  Throws:
  
  IOException - if an I/O error occurred during processing
  
  IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
- highlightFields
  
  public Map<String,String[]> highlightFields(String[] fieldsIn, Query query, IndexSearcher searcher, int[] docidsIn, int[] maxPassagesIn) throws IOException
  
  Highlights the top-N passages from multiple fields, for the provided int[] docids.
  
  Parameters:
  
  fieldsIn - field names to highlight. Must have a stored string value and also be indexed with offsets.
  
  query - query to highlight.
  
  searcher - searcher that was previously used to execute the query.
  
  docidsIn - the document IDs to highlight.
  
  maxPassagesIn - The maximum number of top-N ranked passages per-field used to form the highlighted snippets.
  
  Returns:
  
  Map keyed on field name, containing the array of formatted snippets corresponding to the documents in docidsIn. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
  
  Throws:
  
  IOException - if an I/O error occurred during processing
  
  IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
