java.lang.Object com.ibm.icu.text.Collator com.ibm.icu.text.RuleBasedCollator
public final class RuleBasedCollator
RuleBasedCollator is a concrete subclass of Collator. It allows customization of the Collator via user-specified rule sets. RuleBasedCollator is designed to be fully compliant with the Unicode Collation Algorithm (UCA) and conforms to ISO 14651.
Users are strongly encouraged to read the user's guide for more information about the collation service before using this class.
Create a RuleBasedCollator from a locale by calling the getInstance(Locale) factory method in the base class Collator. Collator.getInstance(Locale) creates a RuleBasedCollator object based on the collation rules defined by the argument locale. If a customized collation ordering or customized attributes are required, use the RuleBasedCollator(String) constructor with the appropriate rules. The customized RuleBasedCollator bases its ordering on the UCA, re-adjusting the attributes and orders of the characters specified in the rules accordingly.
RuleBasedCollator provides correct collation orders for most locales supported in ICU. If specific data for a locale is not available, the ordering eventually falls back to the UCA collation order.
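As a minimal sketch of the locale-based route described above, the following obtains a collator through Collator.getInstance(Locale), casts it to RuleBasedCollator as the examples on this page do, and sorts a list with it. It assumes ICU4J (package com.ibm.icu.text) is on the classpath; the class name LocaleSortDemo and the sample words are illustrative only.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class LocaleSortDemo {
    // Returns a sorted copy of the words, ordered by the collation rules of the given locale.
    static List<String> sortFor(Locale locale, List<String> words) {
        // Cast to RuleBasedCollator, as in the examples on this page.
        RuleBasedCollator collator = (RuleBasedCollator) Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(words);
        // Collator implements Comparator<Object>, so it can be passed straight to List.sort.
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("peach", "p\u00E9ch\u00E9", "apple", "Apple");
        System.out.println(sortFor(Locale.US, words));
    }
}
```

Passing the collator directly to List.sort works because Collator is a Comparator; for large lists that are sorted repeatedly, the CollationKey approach described under getCollationKey is usually preferable.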
For information about the collation rule syntax and details about customization, please refer to the Collation customization section of the user's guide.
Note that there are some differences between the collation rule syntax used in Java and in ICU4J:
In Java, the modifier '!' turns on Thai/Lao vowel-consonant swapping. While it is in force, a Thai vowel in the range U+0E40-U+0E44 that precedes a Thai consonant in the range U+0E01-U+0E2E, or a Lao vowel in the range U+0EC0-U+0EC4 that precedes a Lao consonant in the range U+0E81-U+0EAE, is placed after the consonant for collation purposes.
In Java, if a rule lacks the '!' modifier, Thai/Lao vowel-consonant swapping is not turned on.
ICU4J's RuleBasedCollator does not support turning off Thai/Lao vowel-consonant swapping, since the UCA clearly states that it has to be supported to ensure a correct sorting order. If a '!' is encountered, it is ignored.
Examples
Creating Customized RuleBasedCollators:
    String simple = "& a < b < c < d";
    RuleBasedCollator simpleCollator = new RuleBasedCollator(simple);

    String norwegian = "& a , A < b , B < c , C < d , D < e , E "
                     + "< f , F < g , G < h , H < i , I < j , "
                     + "J < k , K < l , L < m , M < n , N < "
                     + "o , O < p , P < q , Q < r , R < s , S < "
                     + "t , T < u , U < v , V < w , W < x , X "
                     + "< y , Y < z , Z < \u00E5 = a\u030A "
                     + ", \u00C5 = A\u030A ; aa , AA < \u00E6 "
                     + ", \u00C6 < \u00F8 , \u00D8";
    RuleBasedCollator norwegianCollator = new RuleBasedCollator(norwegian);

Concatenating rules to combine Collators:
    // Create an en_US Collator object
    RuleBasedCollator en_USCollator = (RuleBasedCollator)
        Collator.getInstance(new Locale("en", "US", ""));
    // Create a da_DK Collator object
    RuleBasedCollator da_DKCollator = (RuleBasedCollator)
        Collator.getInstance(new Locale("da", "DK", ""));
    // Combine the two
    // First, get the collation rules from en_USCollator
    String en_USRules = en_USCollator.getRules();
    // Second, get the collation rules from da_DKCollator
    String da_DKRules = da_DKCollator.getRules();
    RuleBasedCollator newCollator =
        new RuleBasedCollator(en_USRules + da_DKRules);
    // newCollator has the combined rules

Making changes to an existing RuleBasedCollator to create a new Collator object, by appending changes to the existing rule:
    // Create a new Collator object with additional rules
    String addRules = "& C < ch, cH, Ch, CH";
    RuleBasedCollator myCollator =
        new RuleBasedCollator(en_USCollator.getRules() + addRules);
    // myCollator contains the new rules

How to change the order of non-spacing accents:
    // old rule with main accents
    String oldRules = "= \u0301 ; \u0300 ; \u0302 ; \u0308 "
                    + "; \u0327 ; \u0303 ; \u0304 ; \u0305 "
                    + "; \u0306 ; \u0307 ; \u0309 ; \u030A "
                    + "; \u030B ; \u030C ; \u030D ; \u030E "
                    + "; \u030F ; \u0310 ; \u0311 ; \u0312 "
                    + "< a , A ; ae, AE ; \u00e6 , \u00c6 "
                    + "< b , B < c, C < e, E & C < d , D";
    // change the order of accent characters
    String addOn = "& \u0300 ; \u0308 ; \u0302";
    RuleBasedCollator myCollator = new RuleBasedCollator(oldRules + addOn);

Putting in a new primary ordering before the default setting, e.g. sort English characters before or after Japanese characters in the Japanese Collator:
    // get en_US Collator rules
    RuleBasedCollator en_USCollator =
        (RuleBasedCollator) Collator.getInstance(Locale.US);
    // add a few Japanese characters to sort before English characters
    // suppose the last character before the first base letter 'a' in
    // the English collation rule is \u2212
    String jaString = "& \u2212 < \u3041, \u3042 < \u3043, "
                    + "\u3044";
    RuleBasedCollator myJapaneseCollator =
        new RuleBasedCollator(en_USCollator.getRules() + jaString);
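The "appending changes to the existing rule" example can be checked end to end. This sketch, assuming ICU4J on the classpath, builds the en_US-plus-"ch" collator from the example and compares it against the untailored en_US collator; the class and method names (TailoringDemo, baseCollator, tailoredCollator) are hypothetical. Because the RuleBasedCollator(String) constructor is declared to throw Exception, the helper wraps it in an unchecked exception.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;

import java.util.Locale;

class TailoringDemo {
    // Tailoring from the example: "ch" and its case variants sort as one unit after "c".
    static final String ADD_RULES = "& C < ch, cH, Ch, CH";

    static RuleBasedCollator baseCollator() {
        return (RuleBasedCollator) Collator.getInstance(Locale.US);
    }

    static RuleBasedCollator tailoredCollator() {
        String rules = baseCollator().getRules() + ADD_RULES;
        try {
            // The constructor declares a checked Exception (ParseException or IOException).
            return new RuleBasedCollator(rules);
        } catch (Exception e) {
            throw new IllegalStateException("could not build collator from rules", e);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator base = baseCollator();
        RuleBasedCollator tailored = tailoredCollator();
        // Untailored: "ch" is compared letter by letter, and h < z.
        System.out.println("base: ch < cz? " + (base.compare("ch", "cz") < 0));
        // Tailored: "ch" has its own primary weight after "c", so "cz" < "ch" < "d".
        System.out.println("tailored: cz < ch? " + (tailored.compare("cz", "ch") < 0));
        System.out.println("tailored: ch < d? " + (tailored.compare("ch", "d") < 0));
    }
}
```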
This class is not subclassable (it is declared final).
Nested Class Summary |
---|
Nested classes/interfaces inherited from class com.ibm.icu.text.Collator |
---|
Collator.CollatorFactory, Collator.ReorderCodes |
Field Summary |
---|
Fields inherited from class com.ibm.icu.text.Collator |
---|
CANONICAL_DECOMPOSITION, FULL_DECOMPOSITION, IDENTICAL, NO_DECOMPOSITION, PRIMARY, QUATERNARY, SECONDARY, TERTIARY |
Constructor Summary | |
---|---|
RuleBasedCollator(String rules) | Constructor that takes the argument rules for customization. |
Method Summary | |
---|---|
Object | clone(): Clones the RuleBasedCollator. |
int | compare(String source, String target): Compares the source text String to the target text String according to the collation rules, strength and decomposition mode for this RuleBasedCollator. |
boolean | equals(Object obj): Compares the equality of two RuleBasedCollator objects. |
CollationElementIterator | getCollationElementIterator(CharacterIterator source): Returns a CollationElementIterator for the given CharacterIterator. |
CollationElementIterator | getCollationElementIterator(String source): Returns a CollationElementIterator for the given String. |
CollationElementIterator | getCollationElementIterator(UCharacterIterator source): Returns a CollationElementIterator for the given UCharacterIterator. |
CollationKey | getCollationKey(String source): Gets a collation key for the argument String source from this RuleBasedCollator. |
void | getContractionsAndExpansions(UnicodeSet contractions, UnicodeSet expansions, boolean addPrefixes): Gets UnicodeSets containing the contractions and/or expansions of a collator. |
boolean | getNumericCollation(): Retrieves the numeric collation setting. |
RawCollationKey | getRawCollationKey(String source, RawCollationKey key): Gets the simpler form of a CollationKey for the String source following the rules of this Collator and stores the result in the user-provided argument key. |
int[] | getReorderCodes(): Deprecated. This API is ICU internal only. |
static int[] | getReorderingCodesGroup(int reorderCode): Deprecated. This API is ICU internal only. |
String | getRules(): Gets the collation rules for this RuleBasedCollator. |
String | getRules(boolean fullrules): Returns the current rules. |
UnicodeSet | getTailoredSet(): Gets a UnicodeSet that contains all the characters and sequences tailored in this collator. |
VersionInfo | getUCAVersion(): Gets the UCA version of this collator object. |
int | getVariableTop(): Gets the variable top value of a Collator. |
VersionInfo | getVersion(): Gets the version of this collator object. |
int | hashCode(): Generates a hash code for this RuleBasedCollator. |
boolean | isAlternateHandlingShifted(): Returns true if the alternate handling behaviour is the UCA-defined SHIFTED, false if it is NON_IGNORABLE. |
boolean | isCaseLevel(): Checks whether case level is set to true. |
boolean | isFrenchCollation(): Checks whether French collation is set to true. |
boolean | isHiraganaQuaternary(): Checks whether the Hiragana Quaternary mode is on. |
boolean | isLowerCaseFirst(): Returns true if a lowercase character is sorted before the corresponding uppercase character. |
boolean | isUpperCaseFirst(): Returns true if an uppercase character is sorted before the corresponding lowercase character. |
void | setAlternateHandlingDefault(): Sets the alternate handling mode to the initial mode set during construction of the RuleBasedCollator. |
void | setAlternateHandlingShifted(boolean shifted): Sets the alternate handling for QUATERNARY strength to be either shifted or non-ignorable. |
void | setCaseFirstDefault(): Sets the case first mode to the initial mode set during construction of the RuleBasedCollator. |
void | setCaseLevel(boolean flag): When case level is set to true, an additional weight, known as the case level, is formed between the SECONDARY and TERTIARY weights. |
void | setCaseLevelDefault(): Sets the case level mode to the initial mode set during construction of the RuleBasedCollator. |
void | setDecompositionDefault(): Sets the decomposition mode to the initial mode set during construction of the RuleBasedCollator. |
void | setFrenchCollation(boolean flag): Sets the mode for the direction of SECONDARY weights to be used in French collation. |
void | setFrenchCollationDefault(): Sets the French collation mode to the initial mode set during construction of the RuleBasedCollator. |
void | setHiraganaQuaternary(boolean flag): Sets the Hiragana Quaternary mode on or off. |
void | setHiraganaQuaternaryDefault(): Sets the Hiragana Quaternary mode to the initial mode set during construction of the RuleBasedCollator. |
void | setLowerCaseFirst(boolean lowerfirst): Sets lowercase characters to sort before uppercase characters at TERTIARY strength. |
void | setNumericCollation(boolean flag): When numeric collation is turned on, this Collator generates a collation key for the numeric value of substrings of digits. |
void | setNumericCollationDefault(): Sets numeric collation to its default value. |
void | setReorderCodes(int... order): Deprecated. This API is ICU internal only. |
void | setStrength(int newStrength): Sets this Collator's strength property. |
void | setStrengthDefault(): Sets the collation strength to the initial mode set during construction of the RuleBasedCollator. |
void | setUpperCaseFirst(boolean upperfirst): Sets whether uppercase characters sort before lowercase characters or vice versa, at TERTIARY strength. |
void | setVariableTop(int varTop): Sets the variable top to the supplied collation element value. |
int | setVariableTop(String varTop): Variable top is a two-byte primary value that causes all code points with primary values less than or equal to the variable top to be shifted when alternate handling is set to SHIFTED. |
Methods inherited from class com.ibm.icu.text.Collator |
---|
compare, equals, getAvailableLocales, getAvailableULocales, getDecomposition, getDisplayName, getDisplayName, getDisplayName, getDisplayName, getFunctionalEquivalent, getFunctionalEquivalent, getInstance, getInstance, getInstance, getKeywords, getKeywordValues, getKeywordValuesForLocale, getLocale, getStrength, registerFactory, registerInstance, setDecomposition, setStrength2, unregister |
Methods inherited from class java.lang.Object |
---|
finalize, getClass, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public RuleBasedCollator(String rules) throws Exception
Constructor that takes the argument rules for customization. The collator will be based on UCA, with the attributes and re-ordering of the characters specified in the argument rules.
See the user guide's section on Collation Customization for details on the rule syntax.
Parameters: rules - the collation rules to build the collation table from.
Throws: Exception - a ParseException or an IOException. A ParseException is thrown when the argument rules have invalid syntax; an IOException is thrown when an error occurs while reading internal data.
Method Detail |
---|
public Object clone() throws CloneNotSupportedException
clone
in class Collator
Throws: CloneNotSupportedException
public CollationElementIterator getCollationElementIterator(String source)
CollationElementIterator
public CollationElementIterator getCollationElementIterator(CharacterIterator source)
CollationElementIterator
public CollationElementIterator getCollationElementIterator(UCharacterIterator source)
CollationElementIterator
public void setHiraganaQuaternary(boolean flag)
Parameters: flag - true if Hiragana Quaternary mode is to be on, false otherwise.
See Also: setHiraganaQuaternaryDefault(), isHiraganaQuaternary()
public void setHiraganaQuaternaryDefault()
See Also: setHiraganaQuaternary(boolean), isHiraganaQuaternary()
public void setUpperCaseFirst(boolean upperfirst)
Parameters: upperfirst - true to sort uppercase characters before lowercase characters, false to sort lowercase characters before uppercase characters.
See Also: isLowerCaseFirst(), isUpperCaseFirst(), setLowerCaseFirst(boolean), setCaseFirstDefault()
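The effect of setUpperCaseFirst(boolean) is easiest to see on a single letter pair, since "a" and "A" differ only at the TERTIARY level. A minimal sketch, assuming ICU4J on the classpath and an en_US collator at its default TERTIARY strength; CaseFirstDemo is a hypothetical name.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;

import java.util.Locale;

class CaseFirstDemo {
    // Builds an en_US collator with the requested case-first setting.
    static RuleBasedCollator collator(boolean upperFirst) {
        RuleBasedCollator collator = (RuleBasedCollator) Collator.getInstance(Locale.US);
        collator.setUpperCaseFirst(upperFirst);
        return collator;
    }

    public static void main(String[] args) {
        for (boolean upperFirst : new boolean[] {false, true}) {
            RuleBasedCollator c = collator(upperFirst);
            // A negative result means "a" sorts before "A".
            System.out.println("upperFirst=" + upperFirst
                    + " isUpperCaseFirst=" + c.isUpperCaseFirst()
                    + " compare(a, A)=" + c.compare("a", "A"));
        }
    }
}
```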
public void setLowerCaseFirst(boolean lowerfirst)
Parameters: lowerfirst - true to sort lowercase characters before uppercase characters, false to ignore case preferences.
See Also: isLowerCaseFirst(), isUpperCaseFirst(), setUpperCaseFirst(boolean), setCaseFirstDefault()
public final void setCaseFirstDefault()
See Also: isLowerCaseFirst(), isUpperCaseFirst(), setLowerCaseFirst(boolean), setUpperCaseFirst(boolean)
public void setAlternateHandlingDefault()
See Also: setAlternateHandlingShifted(boolean), isAlternateHandlingShifted()
public void setCaseLevelDefault()
See Also: setCaseLevel(boolean), isCaseLevel()
public void setDecompositionDefault()
See Also: Collator.getDecomposition(), Collator.setDecomposition(int)
public void setFrenchCollationDefault()
See Also: isFrenchCollation(), setFrenchCollation(boolean)
public void setStrengthDefault()
See Also: setStrength(int), Collator.getStrength()
public void setNumericCollationDefault()
See Also: getNumericCollation(), setNumericCollation(boolean)
public void setFrenchCollation(boolean flag)
Parameters: flag - true to set French collation on, false to set it off.
See Also: isFrenchCollation(), setFrenchCollationDefault()
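French collation compares SECONDARY (accent) weights from the end of the string backwards. The classic illustration is the word family cote, côte, coté, côté. A minimal sketch, assuming ICU4J on the classpath and an en_US collator; FrenchCollationDemo is a hypothetical name.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class FrenchCollationDemo {
    // côté, cote, coté, côte (escaped to avoid source-encoding issues)
    static final String[] WORDS = {"c\u00F4t\u00E9", "cote", "cot\u00E9", "c\u00F4te"};

    // Sorts WORDS with French (backwards secondary) collation on or off.
    static List<String> sorted(boolean french) {
        RuleBasedCollator collator = (RuleBasedCollator) Collator.getInstance(Locale.US);
        collator.setFrenchCollation(french);
        List<String> list = new ArrayList<>(Arrays.asList(WORDS));
        list.sort(collator);
        return list;
    }

    public static void main(String[] args) {
        // Forward accents: the first accent difference from the left decides.
        System.out.println("forward:  " + sorted(false));
        // Backwards accents: the last accent difference decides.
        System.out.println("backward: " + sorted(true));
    }
}
```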
public void setAlternateHandlingShifted(boolean shifted)
Parameters: shifted - true if SHIFTED behaviour for alternate handling is desired, false for the NON_IGNORABLE behaviour.
See Also: isAlternateHandlingShifted(), setAlternateHandlingDefault()
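With SHIFTED alternate handling, characters at or below the variable top (such as spaces) are ignored at the PRIMARY through TERTIARY levels. A minimal sketch, assuming ICU4J on the classpath; the class name and sample strings are illustrative. The strength is pinned to TERTIARY, so the quaternary level where shifted characters would reappear is not compared.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;

import java.util.Locale;

class AlternateHandlingDemo {
    // Builds an en_US collator at TERTIARY strength with SHIFTED or NON_IGNORABLE handling.
    static RuleBasedCollator collator(boolean shifted) {
        RuleBasedCollator collator = (RuleBasedCollator) Collator.getInstance(Locale.US);
        collator.setStrength(Collator.TERTIARY);
        collator.setAlternateHandlingShifted(shifted);
        return collator;
    }

    public static void main(String[] args) {
        // NON_IGNORABLE: the space has a primary weight, so the strings differ.
        System.out.println(collator(false).compare("di Silva", "diSilva"));
        // SHIFTED: the space is ignored up to TERTIARY strength, so the strings compare equal.
        System.out.println(collator(true).compare("di Silva", "diSilva"));
    }
}
```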
public void setCaseLevel(boolean flag)
When case level is set to true, an additional weight, known as the case level, is formed between the SECONDARY and TERTIARY weights. The case level is used to distinguish large and small Japanese Kana characters, and it can also be used in other situations, for example to distinguish certain Pinyin characters. The default value is false, which means the case level is not generated. The contents of the case level are affected by the case first mode. A simple way to ignore accent differences in a string is to set the strength to PRIMARY and enable case level.
See the section on case level for more information.
Parameters: flag - true if case level sorting is required, false otherwise.
See Also: setCaseLevelDefault(), isCaseLevel()
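The "PRIMARY strength plus case level" recipe mentioned above can be sketched as follows, assuming ICU4J on the classpath; CaseLevelDemo is a hypothetical name. At PRIMARY strength alone, accents and case are both ignored; turning on the case level brings case differences back while accents stay ignored.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;

import java.util.Locale;

class CaseLevelDemo {
    // Builds an en_US collator at PRIMARY strength, with or without the case level.
    static RuleBasedCollator primary(boolean caseLevel) {
        RuleBasedCollator collator = (RuleBasedCollator) Collator.getInstance(Locale.US);
        collator.setStrength(Collator.PRIMARY);
        collator.setCaseLevel(caseLevel);
        return collator;
    }

    public static void main(String[] args) {
        String role = "role", accented = "r\u00F4le", capital = "Role"; // "rôle"
        for (boolean caseLevel : new boolean[] {false, true}) {
            RuleBasedCollator c = primary(caseLevel);
            System.out.println("caseLevel=" + caseLevel
                    + " role/r\u00F4le=" + c.compare(role, accented)
                    + " role/Role=" + c.compare(role, capital));
        }
    }
}
```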
public void setStrength(int newStrength)
Sets this Collator's strength property. The strength property determines the minimum level of difference considered significant during comparison.
See the Collator class description for an example of use.
setStrength
in class Collator
Parameters: newStrength - the new strength value.
Throws: IllegalArgumentException - if the new strength value is not one of PRIMARY, SECONDARY, TERTIARY, QUATERNARY or IDENTICAL.
See Also: Collator.getStrength(), setStrengthDefault(), Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY, Collator.QUATERNARY, Collator.IDENTICAL
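To illustrate how strength sets the minimum significant difference, this sketch compares a base letter with an accented variant (a SECONDARY difference) and with its uppercase form (a TERTIARY difference) at three strengths. It assumes ICU4J on the classpath; StrengthDemo is a hypothetical name.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;

import java.util.Locale;

class StrengthDemo {
    // Compares a and b with an en_US collator set to the given strength.
    static int compareAt(int strength, String a, String b) {
        RuleBasedCollator collator = (RuleBasedCollator) Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        int[] strengths = {Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY};
        String[] names = {"PRIMARY", "SECONDARY", "TERTIARY"};
        for (int i = 0; i < strengths.length; i++) {
            // "\u00E1" is a-acute; 0 means "equal at this strength".
            System.out.println(names[i]
                    + ": a/\u00E1=" + compareAt(strengths[i], "a", "\u00E1")
                    + " a/A=" + compareAt(strengths[i], "a", "A"));
        }
        try {
            compareAt(42, "a", "b"); // not a valid strength constant
        } catch (IllegalArgumentException e) {
            System.out.println("rejected invalid strength: " + e.getMessage());
        }
    }
}
```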
public int setVariableTop(String varTop)
Variable top is a two-byte primary value that causes all code points with primary values less than or equal to the variable top to be shifted when alternate handling is set to SHIFTED.
Sets the variable top to the collation element value of the supplied string.
setVariableTop
in class Collator
Parameters: varTop - one or more characters (more than one if a contraction) to which the variable top should be set.
Throws: IllegalArgumentException - if the varTop argument is not a valid variable top element. A variable top element is invalid when
See Also: getVariableTop(), setAlternateHandlingShifted(boolean)
public void setVariableTop(int varTop)
setVariableTop
in class Collator
Parameters: varTop - collation element value, as returned by setVariableTop or getVariableTop.
See Also: getVariableTop(), setVariableTop(String)
public void setNumericCollation(boolean flag)
Parameters: flag - true to turn numeric collation on and false to turn it off.
See Also: getNumericCollation(), setNumericCollationDefault()
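Numeric collation changes how runs of digits compare: by default digits are compared one character at a time, while with numeric collation on, a digit substring is compared by its numeric value. A minimal sketch, assuming ICU4J on the classpath; NumericCollationDemo and the item names are illustrative.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class NumericCollationDemo {
    // Returns a sorted copy of items with numeric collation on or off.
    static List<String> sort(boolean numeric, List<String> items) {
        RuleBasedCollator collator = (RuleBasedCollator) Collator.getInstance(Locale.US);
        collator.setNumericCollation(numeric);
        List<String> sorted = new ArrayList<>(items);
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("item10", "item9", "item2");
        // Digit by digit: "1" < "2" < "9", so item10 sorts first.
        System.out.println(sort(false, items));
        // Numeric: 2 < 9 < 10.
        System.out.println(sort(true, items));
    }
}
```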
public void setReorderCodes(int... order)
setReorderCodes
in class Collator
Parameters: order - the reordering codes to apply to this collator; if null, the reordering is cleared.
See Also: getReorderCodes()
public String getRules()
See Also: getRules(boolean)
public String getRules(boolean fullrules)
Parameters: fullrules - true if the rules that define the full set of collation orders are required, otherwise false to return only the tailored rules.
See Also: getRules()
public UnicodeSet getTailoredSet()
getTailoredSet
in class Collator
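As a sketch of what getTailoredSet() reports, the following builds a collator from the "ch" tailoring used in the examples at the top of this page and inspects the resulting set. It assumes ICU4J on the classpath; TailoredSetDemo is a hypothetical name, and the exact contents of the set beyond the tailored "ch" sequences are not asserted here.

```java
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.text.UnicodeSet;

class TailoredSetDemo {
    // Builds a collator from tailoring rules and returns the characters/sequences they tailor.
    static UnicodeSet tailoredSet(String rules) {
        try {
            return new RuleBasedCollator(rules).getTailoredSet();
        } catch (Exception e) {
            // The constructor declares a checked Exception (e.g. ParseException for bad syntax).
            throw new IllegalArgumentException("invalid collation rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        UnicodeSet set = tailoredSet("& C < ch, cH, Ch, CH");
        // toPattern(true) renders the set, including multi-character strings, as a pattern.
        System.out.println(set.toPattern(true));
        System.out.println("contains \"ch\": " + set.contains("ch"));
    }
}
```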
public void getContractionsAndExpansions(UnicodeSet contractions, UnicodeSet expansions, boolean addPrefixes) throws Exception
Parameters:
contractions - if not null, set to contain contractions
expansions - if not null, set to contain expansions
addPrefixes - add the prefix contextual elements to contractions
Throws: Exception - if any error occurs.
public CollationKey getCollationKey(String source)
Get a Collation key for the argument String source from this RuleBasedCollator.
General recommendation:
If comparisons are to be done on the same String multiple times, it is more efficient to generate CollationKeys for the Strings and use CollationKey.compareTo(CollationKey) for the comparisons. If each String is compared only once, RuleBasedCollator.compare(String, String) performs better.
See the class documentation for an explanation about CollationKeys.
getCollationKey
in class Collator
Parameters: source - the text String to be transformed into a collation key.
See Also: CollationKey, compare(String, String), getRawCollationKey(java.lang.String, com.ibm.icu.text.RawCollationKey)
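The recommendation above (generate keys once, then compare keys) can be sketched as a key-based sort: each String is converted to a CollationKey exactly once, and the sort compares only keys. It assumes ICU4J on the classpath; CollationKeySortDemo is a hypothetical name.

```java
import com.ibm.icu.text.CollationKey;
import com.ibm.icu.text.Collator;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CollationKeySortDemo {
    // Sorts words by generating one CollationKey per word and comparing the keys.
    static List<String> sortWithKeys(Collator collator, List<String> words) {
        List<CollationKey> keys = new ArrayList<>(words.size());
        for (String word : words) {
            keys.add(collator.getCollationKey(word));
        }
        keys.sort(CollationKey::compareTo);
        List<String> sorted = new ArrayList<>(keys.size());
        for (CollationKey key : keys) {
            // Each key remembers the String it was generated from.
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        System.out.println(sortWithKeys(collator, Arrays.asList("peach", "Apple", "apple", "banana")));
    }
}
```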
public RawCollationKey getRawCollationKey(String source, RawCollationKey key)
getRawCollationKey
in class Collator
Parameters:
source - the text String to be transformed into a RawCollationKey
key - output RawCollationKey to store results
See Also: getCollationKey(java.lang.String), compare(String, String), RawCollationKey
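For the "reuse a RawCollationKey" optimization described under compare, this sketch keeps two output keys as fields and refills them on every comparison instead of allocating new key objects, and shows copying a key's bytes for storage through the public bytes and size fields RawCollationKey inherits from ByteArrayWrapper. It assumes ICU4J on the classpath; RawKeyDemo, compare and snapshot are hypothetical names. Because the buffers are shared fields, one instance should not be used from several threads at once.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RawCollationKey;
import com.ibm.icu.text.RuleBasedCollator;

import java.util.Arrays;
import java.util.Locale;

class RawKeyDemo {
    // Reusable output buffers, refilled by getRawCollationKey on every call.
    private final RawCollationKey keyA = new RawCollationKey();
    private final RawCollationKey keyB = new RawCollationKey();
    private final RuleBasedCollator collator;

    RawKeyDemo(RuleBasedCollator collator) {
        this.collator = collator;
    }

    // Compares two strings through their raw collation keys.
    int compare(String a, String b) {
        RawCollationKey ka = collator.getRawCollationKey(a, keyA);
        RawCollationKey kb = collator.getRawCollationKey(b, keyB);
        return ka.compareTo(kb);
    }

    // Copies the valid portion of a key's bytes so it can be stored for later use.
    static byte[] snapshot(RawCollationKey key) {
        return Arrays.copyOf(key.bytes, key.size);
    }

    public static void main(String[] args) {
        RuleBasedCollator collator = (RuleBasedCollator) Collator.getInstance(Locale.US);
        RawKeyDemo demo = new RawKeyDemo(collator);
        System.out.println(demo.compare("apple", "peach"));
        byte[] stored = snapshot(collator.getRawCollationKey("apple", new RawCollationKey()));
        System.out.println("stored key length: " + stored.length);
    }
}
```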
public boolean isUpperCaseFirst()
See Also: setUpperCaseFirst(boolean), setLowerCaseFirst(boolean), isLowerCaseFirst(), setCaseFirstDefault()
public boolean isLowerCaseFirst()
See Also: setUpperCaseFirst(boolean), setLowerCaseFirst(boolean), isUpperCaseFirst(), setCaseFirstDefault()
public boolean isAlternateHandlingShifted()
See Also: setAlternateHandlingShifted(boolean), setAlternateHandlingDefault()
public boolean isCaseLevel()
See Also: setCaseLevelDefault(), isCaseLevel(), setCaseLevel(boolean)
public boolean isFrenchCollation()
See Also: setFrenchCollation(boolean), setFrenchCollationDefault()
public boolean isHiraganaQuaternary()
See Also: setHiraganaQuaternaryDefault(), setHiraganaQuaternary(boolean)
public int getVariableTop()
getVariableTop
in class Collator
See Also: setVariableTop(java.lang.String)
public boolean getNumericCollation()
See Also: setNumericCollation(boolean), setNumericCollationDefault()
public int[] getReorderCodes()
getReorderCodes
in class Collator
See Also: setReorderCodes(int...)
public static int[] getReorderingCodesGroup(int reorderCode)
Parameters: reorderCode - code for which equivalents are to be retrieved.
See Also: setReorderCodes(int...), getReorderCodes()
public boolean equals(Object obj)
equals
in interface Comparator<Object>
equals
in class Object
Parameters: obj - the RuleBasedCollator to be compared to.
public int hashCode()
hashCode
in class Object
public int compare(String source, String target)
General recommendation:
If comparisons are to be done on the same String multiple times, it is more efficient to generate CollationKeys for the Strings and use CollationKey.compareTo(CollationKey) for the comparisons. If speed is critical and object instantiation is to be reduced, further optimization may be achieved by generating the simpler RawCollationKey form and reusing a RawCollationKey object with the method RuleBasedCollator.getRawCollationKey. The internal byte representation can be accessed directly via RawCollationKey and stored for future use. Like CollationKey, RawCollationKey provides a RawCollationKey.compareTo method for key comparisons. If each String is compared only once, RuleBasedCollator.compare(String, String) performs better.
compare
in class Collator
Parameters:
source - the source text String.
target - the target text String.
See Also: CollationKey, getCollationKey(java.lang.String)
public VersionInfo getVersion()
getVersion
in class Collator
public VersionInfo getUCAVersion()
getUCAVersion
in class Collator