java.lang.Object
  com.ibm.icu.text.BreakIterator
    com.ibm.icu.text.RuleBasedBreakIterator
public class RuleBasedBreakIterator
Rule Based Break Iterator. This is a port of the C++ class RuleBasedBreakIterator from ICU4C.
Field Summary | |
---|---|
protected static String | fDebugEnv: Deprecated. This API is ICU internal only. |
protected int | fDictionaryCharCount: Deprecated. This API is ICU internal only. |
protected com.ibm.icu.text.RBBIDataWrapper | fRData: Deprecated. This API is ICU internal only. |
static boolean | fTrace: Deprecated. This API is ICU internal only. |
static int | WORD_IDEO: Tag value for words containing ideographic characters, lower limit. |
static int | WORD_IDEO_LIMIT: Tag value for words containing ideographic characters, upper limit. |
static int | WORD_KANA: Tag value for words containing kana characters, lower limit. |
static int | WORD_KANA_LIMIT: Tag value for words containing kana characters, upper limit. |
static int | WORD_LETTER: Tag value for words that contain letters, excluding hiragana, katakana or ideographic characters, lower limit. |
static int | WORD_LETTER_LIMIT: Tag value for words containing letters, upper limit. |
static int | WORD_NONE: Tag value for "words" that do not fit into any other category. |
static int | WORD_NONE_LIMIT: Upper bound for tags for uncategorized words. |
static int | WORD_NUMBER: Tag value for words that appear to be numbers, lower limit. |
static int | WORD_NUMBER_LIMIT: Tag value for words that appear to be numbers, upper limit. |
Fields inherited from class com.ibm.icu.text.BreakIterator |
---|
DONE, KIND_CHARACTER, KIND_LINE, KIND_SENTENCE, KIND_TITLE, KIND_WORD |
Constructor Summary | |
---|---|
| RuleBasedBreakIterator(): Deprecated. This API is ICU internal only. |
| RuleBasedBreakIterator(String rules): Construct a RuleBasedBreakIterator from a set of rules supplied as a string. |
Method Summary | |
---|---|
protected static void | checkOffset(int offset, CharacterIterator text): Throw IllegalArgumentException unless begin <= offset < end. |
Object | clone(): Clones this iterator. |
static void | compileRules(String rules, OutputStream ruleBinary): Compile a set of source break rules into the binary state tables used by the break iterator engine. |
int | current(): Returns the current iteration position. |
void | dump(): Deprecated. This API is ICU internal only. |
boolean | equals(Object that): Returns true if both BreakIterators are of the same class, have the same rules, and iterate over the same text. |
int | first(): Sets the current iteration position to the beginning of the text. |
int | following(int offset): Sets the iterator to refer to the first boundary position following the specified position. |
static RuleBasedBreakIterator | getInstanceFromCompiledRules(InputStream is): Create a break iterator from a precompiled set of break rules. |
int | getRuleStatus(): Return the status tag from the break rule that determined the most recently returned break position. |
int | getRuleStatusVec(int[] fillInArray): Get the status (tag) values from the break rule(s) that determined the most recently returned break position. |
CharacterIterator | getText(): Return a CharacterIterator over the text being analyzed. |
int | hashCode(): Compute a hashcode for this BreakIterator. |
boolean | isBoundary(int offset): Returns true if the specified position is a boundary position. |
int | last(): Sets the current iteration position to the end of the text. |
int | next(): Advances the iterator to the next boundary position. |
int | next(int n): Advances the iterator either forward or backward the specified number of steps. |
int | preceding(int offset): Sets the iterator to refer to the last boundary position before the specified position. |
int | previous(): Moves the iterator backwards, to the last boundary preceding this one. |
void | setText(CharacterIterator newText): Set the iterator to analyze a new piece of text. |
String | toString(): Returns the description (rules) used to create this iterator. |
Methods inherited from class java.lang.Object |
---|
finalize, getClass, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final int WORD_NONE
public static final int WORD_NONE_LIMIT
public static final int WORD_NUMBER
public static final int WORD_NUMBER_LIMIT
public static final int WORD_LETTER
public static final int WORD_LETTER_LIMIT
public static final int WORD_KANA
public static final int WORD_KANA_LIMIT
public static final int WORD_IDEO
public static final int WORD_IDEO_LIMIT
protected com.ibm.icu.text.RBBIDataWrapper fRData
protected int fDictionaryCharCount
public static boolean fTrace
protected static String fDebugEnv
Constructor Detail |
---|
public RuleBasedBreakIterator()
public RuleBasedBreakIterator(String rules)
rules - The break rules to be used.
Method Detail |
---|
public static RuleBasedBreakIterator getInstanceFromCompiledRules(InputStream is) throws IOException
is - an input stream supplying the compiled binary rules.
IOException - if there is an error while reading the rules from the InputStream.
See Also: compileRules(String, OutputStream)
public Object clone()
clone in class BreakIterator
public boolean equals(Object that)
equals in class Object
public String toString()
toString in class Object
public int hashCode()
hashCode in class Object
public void dump()
public static void compileRules(String rules, OutputStream ruleBinary) throws IOException
rules - The source form of the break rules.
ruleBinary - An output stream to receive the compiled rules.
IOException - If there is an error writing the output.
See Also: getInstanceFromCompiledRules(InputStream)
public int first()
first in class BreakIterator
public int last()
last in class BreakIterator
public int next(int n)
next in class BreakIterator
n - The number of steps to move. The sign indicates the direction (negative is backwards, and positive is forwards).
public int next()
next in class BreakIterator
public int previous()
previous in class BreakIterator
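The iteration methods above (first(), next(), next(int n)) are typically combined into a loop that walks every boundary until DONE is returned. The following is a minimal sketch of that pattern, not ICU's own sample code. As an assumption made so that the sketch runs without ICU4J on the classpath, it uses the JDK's java.text.BreakIterator as a stand-in; that class exposes the same first()/next()/DONE contract, and switching the import to com.ibm.icu.text.BreakIterator should be the only change needed. The class and method names (WordBoundaries, boundaries, nthFromStart) are hypothetical.

```java
import java.text.BreakIterator; // stand-in; with ICU4J use com.ibm.icu.text.BreakIterator (assumption)
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class WordBoundaries {
    /** Collects every boundary by walking forward from first() until DONE. */
    static List<Integer> boundaries(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            result.add(pos);
        }
        return result;
    }

    /** Moves n boundaries away from the start of the text; a negative n moves backwards. */
    static int nthFromStart(String text, int n) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);
        it.first();
        return it.next(n);
    }

    public static void main(String[] args) {
        System.out.println(boundaries("Hello world"));
        System.out.println(nthFromStart("Hello world", 2));
    }
}
```

Note that a word iterator reports the run of whitespace between two words as its own segment, so the boundary list contains more entries than there are words.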
public int following(int offset)
following in class BreakIterator
offset - The position from which to begin searching for a break position.
public int preceding(int offset)
preceding in class BreakIterator
offset - The position to begin searching for a break from.
protected static final void checkOffset(int offset, CharacterIterator text)
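The method summary describes checkOffset as throwing IllegalArgumentException unless begin <= offset < end. A minimal sketch of that documented condition follows. It is not ICU's source, and the real method may treat the end index differently; the sketch follows only the summary wording. OffsetCheck and requireInRange are hypothetical names.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

final class OffsetCheck {
    /** Throws unless begin <= offset < end, following the summary wording only. */
    static void requireInRange(int offset, CharacterIterator text) {
        if (offset < text.getBeginIndex() || offset >= text.getEndIndex()) {
            throw new IllegalArgumentException("offset out of bounds: " + offset);
        }
    }

    public static void main(String[] args) {
        CharacterIterator text = new StringCharacterIterator("hello");
        requireInRange(0, text); // accepted: equal to the begin index
        requireInRange(4, text); // accepted: last index before the end
        try {
            requireInRange(5, text); // rejected: equal to the end index
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```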
public boolean isBoundary(int offset)
isBoundary in class BreakIterator
offset - the offset to check.
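following(int), previous(), and isBoundary(int) can be combined to find the segment that contains a given offset. The sketch below illustrates one way to do that. As an assumption, it uses the JDK's java.text.BreakIterator as a stand-in so that it runs without ICU4J; the same calls exist on com.ibm.icu.text.BreakIterator. BoundaryLookup and its methods are hypothetical helper names.

```java
import java.text.BreakIterator; // stand-in; with ICU4J use com.ibm.icu.text.BreakIterator (assumption)
import java.util.Locale;

final class BoundaryLookup {
    private static BreakIterator wordIterator(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);
        return it;
    }

    /** Returns {start, end} of the segment containing offset, for 0 <= offset < text.length(). */
    static int[] segmentAround(String text, int offset) {
        if (offset < 0 || offset >= text.length()) {
            throw new IllegalArgumentException("offset out of bounds: " + offset);
        }
        BreakIterator it = wordIterator(text);
        int end = it.following(offset); // first boundary strictly after offset
        int start = it.previous();      // the boundary immediately before that one
        return new int[] {start, end};
    }

    /** Reports whether offset falls exactly on a boundary. */
    static boolean isWordBoundary(String text, int offset) {
        return wordIterator(text).isBoundary(offset);
    }

    public static void main(String[] args) {
        int[] segment = segmentAround("Hello world", 7);
        System.out.println(segment[0] + ".." + segment[1]);
        System.out.println(isWordBoundary("Hello world", 5));
    }
}
```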
public int current()
current in class BreakIterator
public int getRuleStatus()
Of the standard types of ICU break iterators, only the word break
iterator provides status values. The values are defined in
class RuleBasedBreakIterator and distinguish between words
that contain alphabetic letters, "words" that appear to be numbers,
punctuation and spaces, words containing ideographic characters, and
more. Call getRuleStatus after obtaining a boundary position from
next(), previous(), or any other break iterator function that returns
a boundary position.
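Because each category is described by a lower limit and an upper limit, a status tag is usually classified with a range test rather than an equality test. The following sketch shows that range test. To stay runnable without ICU4J it declares local copies of the WORD_* constants; the numeric values are an assumption taken from ICU's published constant values and are not stated on this page, so real code should reference RuleBasedBreakIterator.WORD_* directly. WordTagCategory and classify are hypothetical names.

```java
final class WordTagCategory {
    // Local copies of the WORD_* constants (assumed values; prefer RuleBasedBreakIterator.WORD_*).
    static final int WORD_NONE = 0,     WORD_NONE_LIMIT = 100;
    static final int WORD_NUMBER = 100, WORD_NUMBER_LIMIT = 200;
    static final int WORD_LETTER = 200, WORD_LETTER_LIMIT = 300;
    static final int WORD_KANA = 300,   WORD_KANA_LIMIT = 400;
    static final int WORD_IDEO = 400,   WORD_IDEO_LIMIT = 500;

    /** Maps a status tag to a category name; the lower limit is inclusive, the upper limit exclusive. */
    static String classify(int tag) {
        if (tag >= WORD_NONE && tag < WORD_NONE_LIMIT) return "none";
        if (tag >= WORD_NUMBER && tag < WORD_NUMBER_LIMIT) return "number";
        if (tag >= WORD_LETTER && tag < WORD_LETTER_LIMIT) return "letter";
        if (tag >= WORD_KANA && tag < WORD_KANA_LIMIT) return "kana";
        if (tag >= WORD_IDEO && tag < WORD_IDEO_LIMIT) return "ideographic";
        return "unknown";
    }

    public static void main(String[] args) {
        // With ICU4J, the argument would come from getRuleStatus() after a call such as next().
        System.out.println(classify(WORD_LETTER));
    }
}
```

A typical caller would skip segments classified as "none" (spaces and punctuation) and keep the rest as words.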
public int getRuleStatusVec(int[] fillInArray)
The status values used by the standard ICU break rules are defined as public constants in class RuleBasedBreakIterator.
If the size of the output array is insufficient to hold the data, the output will be truncated to the available length. No exception will be thrown.
fillInArray - an array to be filled in with the status values.
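Since a short array is silently truncated, a caller that needs every status value has to detect truncation itself. The sketch below shows one way to do so. It assumes, as is not stated on this page, that the int returned by getRuleStatusVec is the total number of status values available even when the array was too small. To stay runnable without ICU4J, it works against a hypothetical one-method interface, StatusSource, plus a fake implementation; with ICU4J a method reference such as `iterator::getRuleStatusVec` could be passed instead.

```java
import java.util.Arrays;

final class StatusVecReader {
    /** Hypothetical stand-in for the single method this sketch needs. */
    interface StatusSource {
        int getRuleStatusVec(int[] fillInArray);
    }

    /** Reads all status values, retrying once with a larger array if the first was too small. */
    static int[] readAll(StatusSource source) {
        int[] buffer = new int[2];
        int available = source.getRuleStatusVec(buffer);
        if (available > buffer.length) {
            // Output was truncated: allocate exactly enough room and ask again.
            buffer = new int[available];
            available = source.getRuleStatusVec(buffer);
        }
        return Arrays.copyOf(buffer, Math.min(available, buffer.length));
    }

    /** Fake source for demonstration: truncates silently and returns the total count. */
    static StatusSource fixed(int... values) {
        return fill -> {
            System.arraycopy(values, 0, fill, 0, Math.min(values.length, fill.length));
            return values.length;
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(readAll(fixed(100, 200, 300))));
    }
}
```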
public CharacterIterator getText()
getText in class BreakIterator
public void setText(CharacterIterator newText)
setText in class BreakIterator
newText - An iterator over the text to analyze.