Class BoundedBreakIteratorScanner
java.lang.Object
java.text.BreakIterator
org.apache.lucene.search.uhighlight.BoundedBreakIteratorScanner
All Implemented Interfaces:
java.lang.Cloneable
public class BoundedBreakIteratorScanner
extends java.text.BreakIterator
A custom break iterator that is used to find break-delimited passages bounded by a provided maximum length in the UnifiedHighlighter context.
This class uses a BreakIterator to find the last break after the provided offset that would create a passage smaller than maxLen. If the BreakIterator cannot find a passage smaller than the maximum length, a secondary break iterator is used to re-split the passage at the first boundary after the maximum length.
This is useful to split passages created by BreakIterators like `sentence` that can create big outliers on semi-structured text.
WARNING: This break iterator is designed to work with the UnifiedHighlighter.
TODO: We should be able to create passages incrementally, starting from the offset of the first match and expanding or not depending on the offsets of subsequent matches. This is currently impossible because FieldHighlighter uses only the first matching offset to derive the start and end of each passage.
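The bounded splitting described in the class description can be sketched with the JDK's stock iterators. This is a simplified illustration, not the source of BoundedBreakIteratorScanner: the class name `BoundedPassageSketch`, the method `boundedEnd`, and the sample text are hypothetical, and the sketch assumes the passage begins at `start` rather than being derived from a match offset.

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Illustrative sketch only; names and behaviour here are assumptions, not the real class. */
class BoundedPassageSketch {

    /**
     * Returns the end of a passage beginning at start. Whole sentences are appended
     * while the passage stays within maxLen. If even the first sentence is too long,
     * the passage is cut at the first word boundary after start + maxLen.
     * Assumes 0 <= start <= text.length() and maxLen > 0.
     */
    static int boundedEnd(String text, int start, int maxLen) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        sentences.setText(text);
        words.setText(text);

        int end = sentences.following(start);
        if (end == BreakIterator.DONE) {
            return text.length(); // start is already at the end of the text
        }
        if (end - start > maxLen) {
            // Primary iterator produced an outlier: re-split with the secondary one.
            int cut = words.following(start + maxLen);
            return cut == BreakIterator.DONE ? end : Math.min(cut, end);
        }
        // Keep the last sentence break that still fits within maxLen.
        int next = sentences.following(end);
        while (next != BreakIterator.DONE && next - start <= maxLen) {
            end = next;
            next = sentences.following(end);
        }
        return end;
    }

    public static void main(String[] args) {
        String text = "Short one. Short two. Alpha beta gamma delta epsilon zeta eta theta";
        // Two short sentences fit in 30 characters; the long third one does not.
        System.out.println(text.substring(0, boundedEnd(text, 0, 30)));
        // The third sentence alone exceeds 12 characters, so it is cut at a word boundary.
        System.out.println(text.substring(22, boundedEnd(text, 22, 12)));
    }
}
```

The word-level cut lands slightly past `maxLen` by design, matching the "first boundary after the maximum length" wording above.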
Field Summary
-
Method Summary
Modifier and Type               Method                                            Description
int                             current()
int                             first()
int                             following(int offset)                             Can be invoked only after a call to preceding(offset+1).
static java.text.BreakIterator  getSentence(java.util.Locale locale, int maxLen)  Returns a BreakIterator.getSentenceInstance(Locale) bounded to maxLen.
java.text.CharacterIterator     getText()
int                             last()
int                             next()
int                             next(int n)
int                             preceding(int offset)                             Must be called with increasing offset.
int                             previous()
void                            setText(java.lang.String newText)
void                            setText(java.text.CharacterIterator newText)
-
Method Details
-
getText
public java.text.CharacterIterator getText()
Specified by: getText in class java.text.BreakIterator
-
setText
public void setText(java.text.CharacterIterator newText)
Specified by: setText in class java.text.BreakIterator
-
setText
public void setText(java.lang.String newText)
Overrides: setText in class java.text.BreakIterator
-
preceding
public int preceding(int offset)
Must be called with increasing offset. See FieldHighlighter for usage.
Overrides: preceding in class java.text.BreakIterator
-
following
public int following(int offset)
Can be invoked only after a call to preceding(offset+1). See FieldHighlighter for usage.
Specified by: following in class java.text.BreakIterator
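The call order required by preceding and following can be illustrated as follows. This is a minimal sketch that uses a stock JDK sentence iterator in place of BoundedBreakIteratorScanner; `PassageBoundsSketch`, `passageAround`, and the sample offsets are hypothetical names and values chosen for illustration, and the exact calls made by FieldHighlighter are not reproduced here.

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Illustrative sketch of the documented call order; not FieldHighlighter code. */
class PassageBoundsSketch {

    /**
     * Returns {start, end} of the passage around matchOffset by calling
     * preceding(matchOffset + 1) first and following(matchOffset) second.
     * Assumes 0 <= matchOffset < text length.
     */
    static int[] passageAround(BreakIterator breaks, int matchOffset) {
        // The +1 lets a match sitting exactly on a boundary select that boundary as its start.
        int start = breaks.preceding(matchOffset + 1);
        int end = breaks.following(matchOffset);
        int textEnd = breaks.getText().getEndIndex();
        return new int[] {Math.max(start, 0), end == BreakIterator.DONE ? textEnd : end};
    }

    public static void main(String[] args) {
        String text = "First sentence. Second sentence here. Third one.";
        BreakIterator breaks = BreakIterator.getSentenceInstance(Locale.ROOT);
        breaks.setText(text);
        // Match offsets are visited in increasing order, as preceding requires.
        for (int match : new int[] {6, 23, 40}) {
            int[] bounds = passageAround(breaks, match);
            System.out.println(text.substring(bounds[0], bounds[1]));
        }
    }
}
```

A stock iterator tolerates any call order, so the sketch only shows the pattern. The ordering constraints themselves are specific to this class.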
-
getSentence
public static java.text.BreakIterator getSentence(java.util.Locale locale, int maxLen)
Returns a BreakIterator.getSentenceInstance(Locale) bounded to maxLen. Secondary boundaries are found using a BreakIterator.getWordInstance(Locale).
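To see why getSentence pairs a sentence iterator with a word iterator, the sketch below compares the longest segment each stock JDK iterator produces on text without sentence punctuation. The class `LongestSegmentSketch`, its method, and the sample input are hypothetical and serve only as an illustration.

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Illustrative comparison of primary (sentence) and secondary (word) boundaries. */
class LongestSegmentSketch {

    /** Returns the length of the longest segment the iterator produces over text. */
    static int longestSegment(BreakIterator breaks, String text) {
        breaks.setText(text);
        int longest = 0;
        int start = breaks.first();
        for (int end = breaks.next(); end != BreakIterator.DONE; start = end, end = breaks.next()) {
            longest = Math.max(longest, end - start);
        }
        return longest;
    }

    public static void main(String[] args) {
        // Hypothetical semi-structured input with no sentence-ending punctuation.
        String text = "id=42 status=open owner=alice priority=high";
        int bySentence = longestSegment(BreakIterator.getSentenceInstance(Locale.ROOT), text);
        int byWord = longestSegment(BreakIterator.getWordInstance(Locale.ROOT), text);
        System.out.println("longest sentence segment: " + bySentence);
        System.out.println("longest word segment: " + byWord);
    }
}
```

On input like this the sentence iterator returns the whole text as one segment, while the word iterator offers many closer boundaries at which an oversized passage can be cut.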
current
public int current()
Specified by: current in class java.text.BreakIterator
-
first
public int first()
Specified by: first in class java.text.BreakIterator
-
next
public int next()
Specified by: next in class java.text.BreakIterator
-
last
public int last()
Specified by: last in class java.text.BreakIterator
-
next
public int next(int n)
Specified by: next in class java.text.BreakIterator
-
previous
public int previous()
Specified by: previous in class java.text.BreakIterator
-