com.ibm.icu.text
Class SearchIterator

java.lang.Object
  extended by com.ibm.icu.text.SearchIterator
Direct Known Subclasses:
StringSearch

public abstract class SearchIterator
extends Object

SearchIterator is an abstract base class that defines a protocol for text searching. Subclasses provide concrete implementations of various search algorithms. A concrete subclass, StringSearch, is provided that implements language-sensitive pattern matching based on the comparison rules defined in a RuleBasedCollator object. Instances of SearchIterator maintain a current position and scan over the target text, returning the indices where a match is found and the length of each match.

Generally, the sequence of forward matches is equivalent to the sequence of backward matches. One case where this may not hold is when overlapping mode is off and the text contains continuously repeating patterns. Consider searching for the pattern "aba" in the text "ababababa": with overlapping mode off, a forward search produces matches at offsets 0 and 4, whereas a backward search produces matches at offsets 6 and 2.
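The forward/backward asymmetry can be reproduced with a small model that uses plain String matching instead of a collator. This is only a sketch: the class DirectionDemo and its methods are hypothetical names, not part of ICU, and exact character comparison stands in for StringSearch's language-sensitive matching.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-String model of non-overlapping forward and backward scans (not the ICU API). */
class DirectionDemo {
    /** Start offsets found scanning left to right, resuming after each match. */
    public static List<Integer> forward(String text, String pattern) {
        if (pattern.isEmpty()) throw new IllegalArgumentException("empty pattern");
        List<Integer> hits = new ArrayList<>();
        int from = 0;
        int pos;
        while ((pos = text.indexOf(pattern, from)) >= 0) {
            hits.add(pos);
            from = pos + pattern.length(); // the next match may not overlap this one
        }
        return hits;
    }

    /** Start offsets found scanning right to left, resuming before each match. */
    public static List<Integer> backward(String text, String pattern) {
        if (pattern.isEmpty()) throw new IllegalArgumentException("empty pattern");
        List<Integer> hits = new ArrayList<>();
        int from = text.length() - pattern.length();
        int pos;
        while (from >= 0 && (pos = text.lastIndexOf(pattern, from)) >= 0) {
            hits.add(pos);
            from = pos - pattern.length(); // the next match must end at or before pos
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(forward("ababababa", "aba"));  // [0, 4]
        System.out.println(backward("ababababa", "aba")); // [6, 2]
    }
}
```

Each direction consumes the text greedily from its own end, which is why the two scans settle on different non-overlapping sets.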

If the matches searched for have boundary restrictions, BreakIterators can be used to define the valid boundaries of such a match. Once a BreakIterator is set, potential matches are tested against it to determine whether their boundaries are valid and whether all characters in the potential match are equivalent to the pattern searched for. For example, looking for the pattern "fox" in the text "foxy fox" produces matches at offsets 0 and 5 with length 3 if no BreakIterator is set. However, if a word BreakIterator is set, the only match found is at offset 5.

Because the SearchIterator guarantees that, when a BreakIterator is set, all its matches match the given pattern exactly, a potential match that passes the BreakIterator test might still not produce a valid match. For instance, the pattern "e" will not be found in the string "\u00e9" (latin small letter e with acute) if a character BreakIterator is used. Even though "e" is a part of the character "\u00e9" and the potential match at offset 0 with length 1 passes the character BreakIterator test, "\u00e9" is not equivalent to "e", hence the SearchIterator rejects the potential match. By default, the SearchIterator does not impose any boundary restriction on matches; it returns all results that match the pattern. Continuing the above example, "e" will be found in the string "\u00e9" if no BreakIterator is specified.
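The boundary test in the "foxy fox" example can be sketched as follows. This model makes two assumptions: it uses the JDK's java.text.BreakIterator as a stand-in for com.ibm.icu.text.BreakIterator, and it compares characters exactly rather than by collation. BoundaryDemo and find are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Model of boundary filtering: a candidate is kept only if both ends are boundaries. */
class BoundaryDemo {
    /** Returns start offsets of pattern in text; a null breaker means no restriction. */
    public static List<Integer> find(String text, String pattern, BreakIterator breaker) {
        if (pattern.isEmpty()) throw new IllegalArgumentException("empty pattern");
        if (breaker != null) {
            breaker.setText(text);
        }
        List<Integer> hits = new ArrayList<>();
        for (int pos = text.indexOf(pattern); pos >= 0; pos = text.indexOf(pattern, pos + 1)) {
            int end = pos + pattern.length();
            // Accept the candidate when unrestricted, or when it starts and ends on a boundary.
            if (breaker == null || (breaker.isBoundary(pos) && breaker.isBoundary(end))) {
                hits.add(pos);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(find("foxy fox", "fox", null)); // [0, 5]
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        System.out.println(find("foxy fox", "fox", words)); // [5]
    }
}
```

The candidate at offset 0 is rejected because offset 3 falls inside the word "foxy" and so is not a word boundary.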

SearchIterator also provides a means to handle overlapping matches via the API setOverlapping(boolean). For example, if overlapping mode is set, searching for the pattern "abab" in the text "ababab" will match at positions 0 and 2, whereas if overlapping is not set, SearchIterator will only match at position 0. By default, overlapping mode is not set.
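One way to picture the overlapping flag is as the distance the scan advances after each match: one position when overlaps are allowed, the full match length otherwise. The sketch below models that with plain String matching; OverlapDemo and findAll are hypothetical names, and this is not how StringSearch is implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Model of the overlapping property as a step size (not the ICU implementation). */
class OverlapDemo {
    /** Returns forward match offsets, allowing or forbidding overlapping matches. */
    public static List<Integer> findAll(String text, String pattern, boolean overlapping) {
        if (pattern.isEmpty()) throw new IllegalArgumentException("empty pattern");
        // Overlapping: resume one position after the match start.
        // Non-overlapping: resume at the first position after the match end.
        int step = overlapping ? 1 : pattern.length();
        List<Integer> hits = new ArrayList<>();
        for (int pos = text.indexOf(pattern); pos >= 0; pos = text.indexOf(pattern, pos + step)) {
            hits.add(pos);
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("ababab", "abab", true));  // [0, 2]
        System.out.println(findAll("ababab", "abab", false)); // [0]
    }
}
```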

The APIs in SearchIterator are similar to those of other text iteration classes such as BreakIterator. Using this class, it is easy to scan through text looking for all occurrences of a match.

Example of use:

 // Imports needed: com.ibm.icu.text.SearchIterator,
 // com.ibm.icu.text.StringSearch, java.text.StringCharacterIterator
 String target = "The quick brown fox jumped over the lazy fox";
 String pattern = "fox";
 StringSearch iter = new StringSearch(pattern, target);
 for (int pos = iter.first(); pos != SearchIterator.DONE; pos = iter.next()) {
     // prints matches at offsets 16 and 41 with length 3
     System.out.println("Found match at " + pos + ", length is "
                        + iter.getMatchLength());
 }
 target = "ababababa";
 pattern = "aba";
 // reuse the iterator: search the new target for the new pattern
 iter.setTarget(new StringCharacterIterator(target));
 iter.setPattern(pattern);
 iter.setOverlapping(false);
 System.out.println("Overlapping mode set to false");
 System.out.println("Forward matches of pattern " + pattern + " in text "
                    + target + ": ");
 for (int pos = iter.first(); pos != SearchIterator.DONE; pos = iter.next()) {
     // prints matches at offsets 0 and 4 with length 3
     System.out.println("offset " + pos + ", length "
                        + iter.getMatchLength());
 }
 System.out.println("Backward matches of pattern " + pattern + " in text "
                    + target + ": ");
 for (int pos = iter.last(); pos != SearchIterator.DONE; pos = iter.previous()) {
     // prints matches at offsets 6 and 2 with length 3
     System.out.println("offset " + pos + ", length "
                        + iter.getMatchLength());
 }
 System.out.println("Overlapping mode set to true");
 System.out.println("Index set to 2");
 iter.setIndex(2);
 iter.setOverlapping(true);
 System.out.println("Forward matches of pattern " + pattern + " in text "
                    + target + ": ");
 // start with next(), not first(): first() would restart from the
 // beginning of the target and ignore the index set above
 for (int pos = iter.next(); pos != SearchIterator.DONE; pos = iter.next()) {
     // prints matches at offsets 2, 4 and 6 with length 3
     System.out.println("offset " + pos + ", length "
                        + iter.getMatchLength());
 }
 System.out.println("Index set to 2");
 iter.setIndex(2);
 System.out.println("Backward matches of pattern " + pattern + " in text "
                    + target + ": ");
 // start with previous(), not last(): last() would restart from the
 // end of the target and ignore the index set above
 for (int pos = iter.previous(); pos != SearchIterator.DONE; pos = iter.previous()) {
     // println matches at offset 0 with length 3
     System.out.println("offset " + pos + ", length "
                        + iter.getMatchLength());
 }
 

Author:
Laura Werner, synwee
See Also:
BreakIterator
Status:
Stable ICU 2.0.

Field Summary
protected  BreakIterator breakIterator
          The BreakIterator to define the boundaries of a logical match.
static int DONE
          DONE is returned by previous() and next() after all valid matches have been returned, and by first() and last() if there are no matches at all.
protected  int matchLength
          Length of the most recent match in target text.
protected  CharacterIterator targetText
          Target text for searching.
 
Constructor Summary
protected SearchIterator(CharacterIterator target, BreakIterator breaker)
          Protected constructor for use by subclasses.
 
Method Summary
 int first()
          Return the index of the first forward match in the target text.
 int following(int position)
          Return the index of the first forward match in target text that is at or after argument position.
 BreakIterator getBreakIterator()
          Returns the BreakIterator that is used to restrict the indexes at which matches are detected.
abstract  int getIndex()
          Return the index in the target text at which the iterator is currently positioned.
 String getMatchedText()
          Returns the text that was matched by the most recent call to first(), next(), previous(), or last().
 int getMatchLength()
           Returns the length of the most recent match in the target text.
 int getMatchStart()
           Returns the index of the most recent match in the target text.
 CharacterIterator getTarget()
          Return the target text that is being searched.
protected abstract  int handleNext(int start)
           Abstract method that subclasses override to provide the mechanism for finding the next forwards match in the target text.
protected abstract  int handlePrevious(int startAt)
           Abstract method which subclasses override to provide the mechanism for finding the next backwards match in the target text.
 boolean isOverlapping()
          Return true if the overlapping property has been set.
 int last()
          Return the index of the first backward match in target text.
 int next()
          Search forwards in the target text for the next valid match, starting the search from the current iterator position.
 int preceding(int position)
          Return the index of the first backwards match in target text that ends at or before argument position.
 int previous()
          Search backwards in the target text for the next valid match, starting the search from the current iterator position.
 void reset()
           Resets the search iteration.
 void setBreakIterator(BreakIterator breakiter)
          Set the BreakIterator that is used to restrict the points at which matches are detected.
 void setIndex(int position)
           Sets the position in the target text at which the next search will start.
protected  void setMatchLength(int length)
          Sets the length of the most recent match in the target text.
 void setOverlapping(boolean allowOverlap)
           Determines whether overlapping matches are returned.
 void setTarget(CharacterIterator text)
          Set the target text to be searched.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DONE

public static final int DONE
DONE is returned by previous() and next() after all valid matches have been returned, and by first() and last() if there are no matches at all.

See Also:
previous(), next(), Constant Field Values
Status:
Stable ICU 2.0.

breakIterator

protected BreakIterator breakIterator
The BreakIterator to define the boundaries of a logical match. This value can be null. See class documentation for more information.

See Also:
setBreakIterator(BreakIterator), getBreakIterator(), BreakIterator
Status:
Stable ICU 2.0.

targetText

protected CharacterIterator targetText
Target text for searching.

See Also:
setTarget(CharacterIterator), getTarget()
Status:
Stable ICU 2.0.

matchLength

protected int matchLength
Length of the most recent match in target text. The default value is 0.

See Also:
setMatchLength(int), getMatchLength()
Status:
Stable ICU 2.0.
Constructor Detail

SearchIterator

protected SearchIterator(CharacterIterator target,
                         BreakIterator breaker)
Protected constructor for use by subclasses. Initializes the iterator with the argument target text for searching and sets the BreakIterator. See class documentation for more details on the use of the target text and BreakIterator.

Parameters:
target - The target text to be searched.
breaker - A BreakIterator that is used to determine the boundaries of a logical match. This argument can be null.
Throws:
IllegalArgumentException - thrown when argument target is null, or of length 0
See Also:
BreakIterator
Status:
Stable ICU 2.0.
Method Detail

setIndex

public void setIndex(int position)

Sets the position in the target text at which the next search will start. This method clears any previous match.

Parameters:
position - position from which to start the next search
Throws:
IndexOutOfBoundsException - thrown if argument position is out of the target text range.
See Also:
getIndex()
Status:
Stable ICU 2.8.

setOverlapping

public void setOverlapping(boolean allowOverlap)

Determines whether overlapping matches are returned. See the class documentation for more information about overlapping matches.

The default setting of this property is false.

Parameters:
allowOverlap - flag indicating whether overlapping matches are allowed
See Also:
isOverlapping()
Status:
Stable ICU 2.8.

setBreakIterator

public void setBreakIterator(BreakIterator breakiter)
Set the BreakIterator that is used to restrict the points at which matches are detected. Using null as the parameter is legal; it means that break detection should not be attempted. See class documentation for more information.

Parameters:
breakiter - A BreakIterator that will be used to restrict the points at which matches are detected.
See Also:
getBreakIterator(), BreakIterator
Status:
Stable ICU 2.0.

setTarget

public void setTarget(CharacterIterator text)
Set the target text to be searched. Text iteration will then begin at the start of the text string. This method is useful if you want to reuse an iterator to search within a different body of text.

Parameters:
text - the new text iterator in which to look for matches
Throws:
IllegalArgumentException - thrown when text is null or has 0 length
See Also:
getTarget()
Status:
Stable ICU 2.4.

getMatchStart

public int getMatchStart()

Returns the index of the most recent match in the target text. This call returns a valid result only after a successful call to first(), next(), previous(), or last(). Just after construction, or after a searching method returns DONE, this method will return DONE.

Use getMatchLength to get the length of the matched text. getMatchedText will return the subtext in the searched target text from index getMatchStart() with length getMatchLength().
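The relationship between the three accessors can be written out as a small model. MatchTextDemo and matchedText are hypothetical names, the target is a plain String rather than a CharacterIterator, and DONE is assumed to be -1.

```java
/** Model of how getMatchedText() relates to getMatchStart() and getMatchLength(). */
class MatchTextDemo {
    /** Assumed sentinel for "no match", mirroring SearchIterator.DONE. */
    public static final int DONE = -1;

    /** Returns the matched substring, or an empty String when there is no current match. */
    public static String matchedText(String target, int matchStart, int matchLength) {
        if (matchStart == DONE || matchLength == 0) {
            return "";
        }
        return target.substring(matchStart, matchStart + matchLength);
    }

    public static void main(String[] args) {
        String target = "The quick brown fox jumped over the lazy fox";
        System.out.println(matchedText(target, 16, 3));  // fox
        System.out.println(matchedText(target, DONE, 0).isEmpty()); // true
    }
}
```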

Returns:
index to a substring within the text string that is being searched.
See Also:
getMatchLength(), getMatchedText(), first(), next(), previous(), last(), DONE
Status:
Stable ICU 2.8.

getIndex

public abstract int getIndex()
Return the index in the target text at which the iterator is currently positioned. If the iteration has gone past the end of the target text, or past the beginning for a backwards search, DONE is returned.

Returns:
index in the target text at which the iterator is currently positioned.
See Also:
first(), next(), previous(), last(), DONE
Status:
Stable ICU 2.8.

getMatchLength

public int getMatchLength()

Returns the length of the most recent match in the target text. This call returns a valid result only after a successful call to first(), next(), previous(), or last(). Just after construction, or after a searching method returns DONE, this method will return 0. See getMatchStart() for more details.

Returns:
The length of the most recent match in the target text, or 0 if there is no match.
See Also:
getMatchStart(), getMatchedText(), first(), next(), previous(), last(), DONE
Status:
Stable ICU 2.0.

getBreakIterator

public BreakIterator getBreakIterator()
Returns the BreakIterator that is used to restrict the indexes at which matches are detected. This will be the same object that was passed to the constructor or to setBreakIterator. If the BreakIterator has not been set, null will be returned. See setBreakIterator for more information.

Returns:
the BreakIterator set to restrict logical matches
See Also:
setBreakIterator(com.ibm.icu.text.BreakIterator), BreakIterator
Status:
Stable ICU 2.0.

getTarget

public CharacterIterator getTarget()
Return the target text that is being searched.

Returns:
target text being searched.
See Also:
setTarget(java.text.CharacterIterator)
Status:
Stable ICU 2.0.

getMatchedText

public String getMatchedText()
Returns the text that was matched by the most recent call to first(), next(), previous(), or last(). If the iterator is not pointing at a valid match, for instance just after construction or after DONE has been returned, an empty String will be returned. See getMatchStart() for more information.

Returns:
the substring in the target text of the most recent match
See Also:
getMatchStart(), getMatchLength(), first(), next(), previous(), last(), DONE
Status:
Stable ICU 2.0.

next

public int next()
Search forwards in the target text for the next valid match, starting the search from the current iterator position. The iterator is adjusted so that its current index, as returned by getIndex(), is the starting position of the match if one was found. If a match is found, the index of the match is returned, otherwise DONE is returned. If overlapping mode is set, the beginning of the found match can be before the end of the current match, if any.

Returns:
The starting index of the next forward match after the current iterator position, or DONE if there are no more matches.
See Also:
getMatchStart(), getMatchLength(), getMatchedText(), following(int), preceding(int), previous(), first(), last(), DONE
Status:
Stable ICU 2.0.

previous

public int previous()
Search backwards in the target text for the next valid match, starting the search from the current iterator position. The iterator is adjusted so that its current index, as returned by getIndex(), is the starting position of the match if one was found. If a match is found, the index is returned, otherwise DONE is returned. If overlapping mode is set, the end of the found match can be after the beginning of the previous match, if any.

Returns:
The starting index of the next backwards match before the current iterator position, or DONE if there are no more matches.
See Also:
getMatchStart(), getMatchLength(), getMatchedText(), following(int), preceding(int), next(), first(), last(), DONE
Status:
Stable ICU 2.0.

isOverlapping

public boolean isOverlapping()
Return true if the overlapping property has been set. See setOverlapping(boolean) for more information.

Returns:
true if the overlapping property has been set, false otherwise
See Also:
setOverlapping(boolean)
Status:
Stable ICU 2.8.

reset

public void reset()

Resets the search iteration. All properties will be reset to their default values.

If a forward iteration is initiated, the next search will begin at the start of the target text. Otherwise, if a backwards iteration is initiated, the next search will begin at the end of the target text.

Status:
Stable ICU 2.8.

first

public final int first()
Return the index of the first forward match in the target text. This method sets the iteration to begin at the start of the target text and searches forward from there.

Returns:
The index of the first forward match, or DONE if there are no matches.
See Also:
getMatchStart(), getMatchLength(), getMatchedText(), following(int), preceding(int), next(), previous(), last(), DONE
Status:
Stable ICU 2.0.

following

public final int following(int position)
Return the index of the first forward match in target text that is at or after argument position. This method sets the iteration to begin at the specified position in the target text and searches forward from there.

Returns:
The index of the first forward match, or DONE if there are no matches.
See Also:
getMatchStart(), getMatchLength(), getMatchedText(), first(), preceding(int), next(), previous(), last(), DONE
Status:
Stable ICU 2.0.

last

public final int last()
Return the index of the first backward match in target text. This method sets the iteration to begin at the end of the target text and searches backwards from there.

Returns:
The starting index of the first backward match, or DONE if there are no matches.
See Also:
getMatchStart(), getMatchLength(), getMatchedText(), first(), preceding(int), next(), previous(), following(int), DONE
Status:
Stable ICU 2.0.

preceding

public final int preceding(int position)
Return the index of the first backwards match in target text that ends at or before argument position. This method sets the iteration to begin at the argument position index of the target text and searches backwards from there.
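The position rules stated for following(int) and preceding(int) can be modeled with plain String searches. This is a sketch under assumptions: PositionedSearchDemo is a hypothetical class, matching is exact rather than collation-based, and DONE is assumed to be -1.

```java
/** Model of the position rules for following(int) and preceding(int). */
class PositionedSearchDemo {
    /** Assumed sentinel for "no match", mirroring SearchIterator.DONE. */
    public static final int DONE = -1;

    /** First match that starts at or after position, or DONE. */
    public static int following(String text, String pattern, int position) {
        return text.indexOf(pattern, position); // indexOf returns -1 when nothing is found
    }

    /** First backward match that ends at or before position, or DONE. */
    public static int preceding(String text, String pattern, int position) {
        // A match ending at or before position must start at or before position - length.
        return text.lastIndexOf(pattern, position - pattern.length());
    }

    public static void main(String[] args) {
        System.out.println(following("ababababa", "aba", 3)); // 4
        System.out.println(preceding("ababababa", "aba", 5)); // 2
        System.out.println(preceding("ababababa", "aba", 2)); // -1
    }
}
```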

Returns:
The starting index of the first backwards match, or DONE if there are no matches.
See Also:
getMatchStart(), getMatchLength(), getMatchedText(), first(), following(int), next(), previous(), last(), DONE
Status:
Stable ICU 2.0.

setMatchLength

protected void setMatchLength(int length)
Sets the length of the most recent match in the target text. Subclasses' handleNext() and handlePrevious() methods should call this after they find a match in the target text.

Parameters:
length - new length to set
See Also:
handleNext(int), handlePrevious(int)
Status:
Stable ICU 2.0.

handleNext

protected abstract int handleNext(int start)

Abstract method that subclasses override to provide the mechanism for finding the next forwards match in the target text. This allows different subclasses to provide different search algorithms.

If a match is found, this function must call setMatchLength(int) to set the length of the result match. The iterator is adjusted so that its current index, as returned by getIndex(), is the starting position of the match if one was found. If a match is not found, DONE will be returned.
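The contract between the base class and a subclass can be sketched with a small stand-in. MiniSearchIterator and NaiveSearch below are hypothetical classes written for illustration, not ICU types: the target is a plain String, matching is exact, DONE is assumed to be -1, and the base class tracks a private resume position instead of exposing getIndex().

```java
/** Hypothetical stand-in for the base class: next() delegates to handleNext(int). */
abstract class MiniSearchIterator {
    public static final int DONE = -1; // assumed sentinel for "no match"
    protected final String target;
    private int resumeAt = 0;    // where the next forward search starts
    private int matchLength = 0; // length of the most recent match

    protected MiniSearchIterator(String target) {
        this.target = target;
    }

    /** Subclasses search forward from start and call setMatchLength on success. */
    protected abstract int handleNext(int start);

    protected void setMatchLength(int length) {
        matchLength = length;
    }

    public int getMatchLength() {
        return matchLength;
    }

    /** Finds the next non-overlapping match, or returns DONE. */
    public int next() {
        int match = handleNext(resumeAt);
        if (match == DONE) {
            matchLength = 0;            // no current match
            resumeAt = target.length(); // iteration is exhausted
        } else {
            resumeAt = match + Math.max(matchLength, 1); // skip past this match
        }
        return match;
    }
}

/** Hypothetical subclass that matches by exact character comparison. */
class NaiveSearch extends MiniSearchIterator {
    private final String pattern;

    public NaiveSearch(String pattern, String target) {
        super(target);
        this.pattern = pattern;
    }

    @Override
    protected int handleNext(int start) {
        int pos = target.indexOf(pattern, start);
        if (pos < 0) {
            return DONE;
        }
        setMatchLength(pattern.length()); // required by the contract on success
        return pos;
    }

    public static void main(String[] args) {
        MiniSearchIterator it = new NaiveSearch("fox", "foxy fox");
        for (int pos = it.next(); pos != DONE; pos = it.next()) {
            System.out.println(pos + " length " + it.getMatchLength());
        }
    }
}
```

The base class never inspects the pattern; it relies on the subclass to report both the match start (the return value) and the match length (through setMatchLength).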

Parameters:
start - index in the target text at which the forwards search should begin.
Returns:
the starting index of the next forwards match if found, DONE otherwise
See Also:
setMatchLength(int), handlePrevious(int), DONE
Status:
Stable ICU 2.0.

handlePrevious

protected abstract int handlePrevious(int startAt)

Abstract method which subclasses override to provide the mechanism for finding the next backwards match in the target text. This allows different subclasses to provide different search algorithms.

If a match is found, this function must call setMatchLength(int) to set the length of the result match. The iterator is adjusted so that its current index, as returned by getIndex(), is the starting position of the match if one was found. If a match is not found, DONE will be returned.

Parameters:
startAt - index in the target text at which the backwards search should begin.
Returns:
the starting index of the next backwards match if found, DONE otherwise
See Also:
setMatchLength(int), handleNext(int), DONE
Status:
Stable ICU 2.0.


Copyright (c) 2010 IBM Corporation and others.