com.ibm.icu.text
Class RuleBasedCollator

java.lang.Object
  extended by com.ibm.icu.text.Collator
      extended by com.ibm.icu.text.RuleBasedCollator
All Implemented Interfaces:
Freezable<Collator>, Cloneable, Comparator<Object>

public final class RuleBasedCollator
extends Collator

RuleBasedCollator is a concrete subclass of Collator. It allows customization of the Collator via user-specified rule sets. RuleBasedCollator is designed to be fully compliant with the Unicode Collation Algorithm (UCA) and conforms to ISO 14651.

Users are strongly encouraged to read the user guide for more information about the collation service before using this class.

Create a RuleBasedCollator from a locale by calling the getInstance(Locale) factory method in the base class Collator. Collator.getInstance(Locale) creates a RuleBasedCollator object based on the collation rules defined by the argument locale. If a customized collation ordering or customized attributes are required, use the RuleBasedCollator(String) constructor with the appropriate rules. The customized RuleBasedCollator bases its ordering on the UCA and adjusts the attributes and the order of the characters specified in the rules accordingly.

RuleBasedCollator provides correct collation orders for most locales supported in ICU. If specific data for a locale is not available, the ordering eventually falls back to the UCA collation order.

For information about the collation rule syntax and details about customization, please refer to the Collation customization section of the user's guide.

Note that there are some differences between the Collation rule syntax used in Java and ICU4J:

Examples

Creating Customized RuleBasedCollators:

 String simple = "& a < b < c < d";
 RuleBasedCollator simpleCollator = new RuleBasedCollator(simple);

 String norwegian = "& a , A < b , B < c , C < d , D < e , E "
                    + "< f , F < g , G < h , H < i , I < j , "
                    + "J < k , K < l , L < m , M < n , N < "
                    + "o , O < p , P < q , Q < r , R < s , S < "
                    + "t , T < u , U < v , V < w , W < x , X "
                    + "< y , Y < z , Z < \u00E5 = a\u030A "
                    + ", \u00C5 = A\u030A ; aa , AA < \u00E6 "
                    + ", \u00C6 < \u00F8 , \u00D8";
 RuleBasedCollator norwegianCollator = new RuleBasedCollator(norwegian);
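
The fragments above are not complete programs, and the constructor declares a checked Exception. The following is a minimal runnable sketch, assuming ICU4J is on the classpath. The class name TailoringDemo and its rule string (which places æ, ø and å after z, using the same "&", "<" and "," syntax as the examples) are illustrative assumptions, not the Norwegian rules above.

```java
import com.ibm.icu.text.RuleBasedCollator;

public class TailoringDemo {
    // Illustrative rules: reset at z, then add ae, o-slash and a-ring as new
    // primary letters after it. "<" marks a primary difference and ","
    // a tertiary (case) difference, as in the examples above.
    static final String RULES =
            "& z < \u00E6 , \u00C6 < \u00F8 , \u00D8 < \u00E5 , \u00C5";

    static RuleBasedCollator build() {
        try {
            return new RuleBasedCollator(RULES);
        } catch (Exception e) {
            // The constructor declares a checked Exception (see Constructor Detail).
            throw new IllegalStateException("Could not build collator", e);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator collator = build();
        System.out.println(collator.compare("\u00E5", "z") > 0);      // true: a-ring now sorts after z
        System.out.println(collator.compare("\u00E6", "\u00F8") < 0); // true: ae sorts before o-slash
        System.out.println(collator.compare("\u00F8", "\u00E5") < 0); // true: o-slash sorts before a-ring
    }
}
```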
 
Concatenating rules to combine Collators:
 // Create an en_US Collator object
 RuleBasedCollator en_USCollator = (RuleBasedCollator)
     Collator.getInstance(new Locale("en", "US", ""));
 // Create a da_DK Collator object
 RuleBasedCollator da_DKCollator = (RuleBasedCollator)
     Collator.getInstance(new Locale("da", "DK", ""));
 // Combine the two
 // First, get the collation rules from en_USCollator
 String en_USRules = en_USCollator.getRules();
 // Second, get the collation rules from da_DKCollator
 String da_DKRules = da_DKCollator.getRules();
 RuleBasedCollator newCollator =
                             new RuleBasedCollator(en_USRules + da_DKRules);
 // newCollator has the combined rules
 
Making changes to an existing RuleBasedCollator to create a new Collator object, by appending changes to the existing rules:
 // Create a new Collator object with additional rules
 String addRules = "& C < ch, cH, Ch, CH";
 RuleBasedCollator myCollator =
     new RuleBasedCollator(en_USCollator.getRules() + addRules);
 // myCollator contains the new rules
 
How to change the order of non-spacing accents:
 // old rule with main accents
 String oldRules = "= \u0301 ; \u0300 ; \u0302 ; \u0308 "
                 + "; \u0327 ; \u0303 ; \u0304 ; \u0305 "
                 + "; \u0306 ; \u0307 ; \u0309 ; \u030A "
                 + "; \u030B ; \u030C ; \u030D ; \u030E "
                 + "; \u030F ; \u0310 ; \u0311 ; \u0312 "
                 + "< a , A ; ae, AE ; \u00e6 , \u00c6 "
                 + "< b , B < c, C < e, E & C < d , D";
 // change the order of accent characters
 String addOn = "& \u0300 ; \u0308 ; \u0302";
 RuleBasedCollator myCollator = new RuleBasedCollator(oldRules + addOn);
 
Putting in a new primary ordering before the default setting, e.g. sorting Japanese characters before English characters in the English Collator:
 // get en_US Collator rules
 RuleBasedCollator en_USCollator
                        = (RuleBasedCollator)Collator.getInstance(Locale.US);
 // add a few Japanese characters to sort before English characters
 // suppose the last character before the first base letter 'a' in
 // the English collation rule is \u2212
 String jaString = "& \u2212 < \u3041, \u3042 < \u3043, "
                   + "\u3044";
 RuleBasedCollator myJapaneseCollator
              = new RuleBasedCollator(en_USCollator.getRules() + jaString);
 

This class is not subclassable.

Author:
Syn Wee Quek
Status:
Stable ICU 2.8.

Nested Class Summary
 
Nested classes/interfaces inherited from class com.ibm.icu.text.Collator
Collator.CollatorFactory, Collator.ReorderCodes
 
Field Summary
 
Fields inherited from class com.ibm.icu.text.Collator
CANONICAL_DECOMPOSITION, FULL_DECOMPOSITION, IDENTICAL, NO_DECOMPOSITION, PRIMARY, QUATERNARY, SECONDARY, TERTIARY
 
Constructor Summary
RuleBasedCollator(String rules)
           Constructor that takes the argument rules for customization.
 
Method Summary
 Object clone()
          Clones the RuleBasedCollator.
 RuleBasedCollator cloneAsThawed()
          Provides for the clone operation.
 int compare(String source, String target)
          Compares the source text String to the target text String according to the collation rules, strength and decomposition mode for this RuleBasedCollator.
 boolean equals(Object obj)
          Compares the equality of two RuleBasedCollator objects.
 Collator freeze()
          Freezes the collator.
 CollationElementIterator getCollationElementIterator(CharacterIterator source)
          Return a CollationElementIterator for the given CharacterIterator.
 CollationElementIterator getCollationElementIterator(String source)
          Return a CollationElementIterator for the given String.
 CollationElementIterator getCollationElementIterator(UCharacterIterator source)
          Return a CollationElementIterator for the given UCharacterIterator.
 CollationKey getCollationKey(String source)
           Get a Collation key for the argument String source from this RuleBasedCollator.
 void getContractionsAndExpansions(UnicodeSet contractions, UnicodeSet expansions, boolean addPrefixes)
          Gets Unicode sets containing the contractions and/or expansions of this collator.
static int[] getEquivalentReorderCodes(int reorderCode)
          Retrieves all the reorder codes that are grouped with the given reorder code.
 boolean getNumericCollation()
          Method to retrieve the numeric collation value.
 RawCollationKey getRawCollationKey(String source, RawCollationKey key)
          Gets the simpler form of a CollationKey for the String source following the rules of this Collator and stores the result into the user provided argument key.
 int[] getReorderCodes()
          Retrieves the reordering codes for this collator.
 String getRules()
          Gets the collation rules for this RuleBasedCollator.
 String getRules(boolean fullrules)
          Returns current rules.
 UnicodeSet getTailoredSet()
          Get a UnicodeSet that contains all the characters and sequences tailored in this collator.
 VersionInfo getUCAVersion()
          Get the UCA version of this collator object.
 int getVariableTop()
          Gets the variable top value of a Collator.
 VersionInfo getVersion()
          Get the version of this collator object.
 int hashCode()
          Generates a hash code for this RuleBasedCollator.
 boolean isAlternateHandlingShifted()
          Checks if the alternate handling behaviour is the UCA defined SHIFTED or NON_IGNORABLE.
 boolean isCaseLevel()
          Checks if case level is set to true.
 boolean isFrenchCollation()
          Checks if French Collation is set to true.
 boolean isFrozen()
          Determines whether the object has been frozen or not.
 boolean isHiraganaQuaternary()
          Checks if the Hiragana Quaternary mode is set on.
 boolean isLowerCaseFirst()
          Return true if a lowercase character is sorted before the corresponding uppercase character.
 boolean isUpperCaseFirst()
          Return true if an uppercase character is sorted before the corresponding lowercase character.
 void setAlternateHandlingDefault()
          Sets the alternate handling mode to the initial mode set during construction of the RuleBasedCollator.
 void setAlternateHandlingShifted(boolean shifted)
          Sets the alternate handling for QUATERNARY strength to be either shifted or non-ignorable.
 void setCaseFirstDefault()
          Sets the case first mode to the initial mode set during construction of the RuleBasedCollator.
 void setCaseLevel(boolean flag)
           When case level is set to true, an additional weight is formed between the SECONDARY and TERTIARY weight, known as the case level.
 void setCaseLevelDefault()
          Sets the case level mode to the initial mode set during construction of the RuleBasedCollator.
 void setDecompositionDefault()
          Sets the decomposition mode to the initial mode set during construction of the RuleBasedCollator.
 void setFrenchCollation(boolean flag)
          Sets the mode for the direction of SECONDARY weights to be used in French collation.
 void setFrenchCollationDefault()
          Sets the French collation mode to the initial mode set during construction of the RuleBasedCollator.
 void setHiraganaQuaternary(boolean flag)
          Sets the Hiragana Quaternary mode to be on or off.
 void setHiraganaQuaternaryDefault()
          Sets the Hiragana Quaternary mode to the initial mode set during construction of the RuleBasedCollator.
 void setLowerCaseFirst(boolean lowerfirst)
          Sets the orders of lower cased characters to sort before upper cased characters, in strength TERTIARY.
 void setNumericCollation(boolean flag)
          When numeric collation is turned on, this Collator generates a collation key for the numeric value of substrings of digits.
 void setNumericCollationDefault()
          Method to set numeric collation to its default value.
 void setReorderCodes(int... order)
          Sets the reordering codes for this collator.
 void setStrength(int newStrength)
           Sets this Collator's strength property.
 void setStrengthDefault()
          Sets the collation strength to the initial mode set during the construction of the RuleBasedCollator.
 void setUpperCaseFirst(boolean upperfirst)
          Sets whether uppercase characters sort before lowercase characters or vice versa, in strength TERTIARY.
 void setVariableTop(int varTop)
          Sets the variable top to a collation element value supplied.
 int setVariableTop(String varTop)
           Variable top is a two byte primary value which causes all the code points with primary values that are less than or equal to the variable top to be shifted when alternate handling is set to SHIFTED.
 
Methods inherited from class com.ibm.icu.text.Collator
compare, equals, getAvailableLocales, getAvailableULocales, getDecomposition, getDisplayName, getDisplayName, getDisplayName, getDisplayName, getFunctionalEquivalent, getFunctionalEquivalent, getInstance, getInstance, getInstance, getKeywords, getKeywordValues, getKeywordValuesForLocale, getLocale, getStrength, internalSetDecomposition, registerFactory, registerInstance, setDecomposition, setStrength2, unregister
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RuleBasedCollator

public RuleBasedCollator(String rules)
                  throws Exception

Constructor that takes the argument rules for customization. The collator will be based on UCA, with the attributes and re-ordering of the characters specified in the argument rules.

See the user guide's section on Collation Customization for details on the rule syntax.

Parameters:
rules - the collation rules to build the collation table from.
Throws:
Exception - ParseException is thrown when the argument rules have invalid syntax; IOException is thrown when an error occurs while reading internal data.
Status:
Stable ICU 2.8.
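
Because the constructor declares a plain checked Exception, callers usually wrap it. A minimal sketch, assuming ICU4J is on the classpath: the hypothetical helper tryCreate separates the documented ParseException (invalid rule syntax) from any other failure and returns null instead of propagating. Returning null is an illustrative design choice; rethrowing an unchecked exception is an equally valid alternative.

```java
import com.ibm.icu.text.RuleBasedCollator;
import java.text.ParseException;

public class SafeCollatorFactory {
    // Hypothetical helper: returns null instead of propagating the checked
    // Exception declared by RuleBasedCollator(String).
    static RuleBasedCollator tryCreate(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            // Documented cause: the rules have invalid syntax.
            System.err.println("Rule syntax error at offset " + e.getErrorOffset()
                    + ": " + e.getMessage());
            return null;
        } catch (Exception e) {
            // Any other failure, e.g. an IOException while reading internal data.
            System.err.println("Could not build collator: " + e);
            return null;
        }
    }

    public static void main(String[] args) {
        // With "& c < b", b is moved to sort after c.
        RuleBasedCollator collator = tryCreate("& c < b");
        System.out.println(collator != null && collator.compare("b", "c") > 0); // true
    }
}
```
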
Method Detail

clone

public Object clone()
             throws CloneNotSupportedException
Clones the RuleBasedCollator.

Overrides:
clone in class Collator
Returns:
a new instance of this RuleBasedCollator object
Throws:
CloneNotSupportedException
Status:
Stable ICU 2.8.

getCollationElementIterator

public CollationElementIterator getCollationElementIterator(String source)
Return a CollationElementIterator for the given String.

See Also:
CollationElementIterator
Status:
Stable ICU 2.8.

getCollationElementIterator

public CollationElementIterator getCollationElementIterator(CharacterIterator source)
Return a CollationElementIterator for the given CharacterIterator. The source iterator's integrity will be preserved since a new copy will be created for use.

See Also:
CollationElementIterator
Status:
Stable ICU 2.8.

getCollationElementIterator

public CollationElementIterator getCollationElementIterator(UCharacterIterator source)
Return a CollationElementIterator for the given UCharacterIterator. The source iterator's integrity will be preserved since a new copy will be created for use.

See Also:
CollationElementIterator
Status:
Stable ICU 2.8.

isFrozen

public boolean isFrozen()
Determines whether the object has been frozen or not.

Specified by:
isFrozen in interface Freezable<Collator>
Overrides:
isFrozen in class Collator
Status:
Draft ICU 4.8.

freeze

public Collator freeze()
Freezes the collator.

Specified by:
freeze in interface Freezable<Collator>
Overrides:
freeze in class Collator
Returns:
the collator itself.
Status:
Draft ICU 4.8.

cloneAsThawed

public RuleBasedCollator cloneAsThawed()
Provides for the clone operation. Any clone is initially unfrozen.

Specified by:
cloneAsThawed in interface Freezable<Collator>
Overrides:
cloneAsThawed in class Collator
Status:
Draft ICU 4.8.
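
The freeze(), isFrozen() and cloneAsThawed() methods above can be combined as follows. This is a minimal sketch, assuming ICU4J is on the classpath and that Collator.getInstance(ULocale.ROOT) returns a RuleBasedCollator (the same cast pattern the class examples use) with the default TERTIARY strength. FreezeDemo and frozenRoot are hypothetical names.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.util.ULocale;

public class FreezeDemo {
    // Builds a root collator and freezes it, e.g. before sharing it between threads.
    static RuleBasedCollator frozenRoot() {
        RuleBasedCollator collator =
                (RuleBasedCollator) Collator.getInstance(ULocale.ROOT);
        collator.freeze();
        return collator;
    }

    public static void main(String[] args) {
        RuleBasedCollator frozen = frozenRoot();
        // cloneAsThawed() returns an unfrozen copy whose attributes can be changed.
        RuleBasedCollator copy = frozen.cloneAsThawed();
        copy.setStrength(Collator.PRIMARY);

        System.out.println(frozen.isFrozen());            // true
        System.out.println(copy.isFrozen());              // false
        System.out.println(copy.compare("a", "A") == 0);  // true: PRIMARY ignores case
        System.out.println(frozen.compare("a", "A") < 0); // true: the original is unchanged
    }
}
```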

setHiraganaQuaternary

public void setHiraganaQuaternary(boolean flag)
Sets the Hiragana Quaternary mode to be on or off. When the Hiragana Quaternary mode is turned on, the collator positions Hiragana characters before all non-ignorable characters in QUATERNARY strength. This is to produce a correct JIS collation order, distinguishing between Katakana and Hiragana characters.

Parameters:
flag - true if Hiragana Quaternary mode is to be on, false otherwise
See Also:
setHiraganaQuaternaryDefault(), isHiraganaQuaternary()
Status:
Stable ICU 2.8.

setHiraganaQuaternaryDefault

public void setHiraganaQuaternaryDefault()
Sets the Hiragana Quaternary mode to the initial mode set during construction of the RuleBasedCollator. See setHiraganaQuaternary(boolean) for more details.

See Also:
setHiraganaQuaternary(boolean), isHiraganaQuaternary()
Status:
Stable ICU 2.8.

setUpperCaseFirst

public void setUpperCaseFirst(boolean upperfirst)
Sets whether uppercase characters sort before lowercase characters or vice versa, in strength TERTIARY. The default mode is false, and so lowercase characters sort before uppercase characters. If true, sort upper case characters first.

Parameters:
upperfirst - true to sort uppercase characters before lowercase characters, false to sort lowercase characters before uppercase characters
See Also:
isLowerCaseFirst(), isUpperCaseFirst(), setLowerCaseFirst(boolean), setCaseFirstDefault()
Status:
Stable ICU 2.8.

setLowerCaseFirst

public void setLowerCaseFirst(boolean lowerfirst)
Sets the orders of lower cased characters to sort before upper cased characters, in strength TERTIARY. The default mode is false. If true is set, the RuleBasedCollator will sort lower cased characters before the upper cased ones. Otherwise, if false is set, the RuleBasedCollator will ignore case preferences.

Parameters:
lowerfirst - true for sorting lower cased characters before upper cased characters, false to ignore case preferences.
See Also:
isLowerCaseFirst(), isUpperCaseFirst(), setUpperCaseFirst(boolean), setCaseFirstDefault()
Status:
Stable ICU 2.8.

setCaseFirstDefault

public final void setCaseFirstDefault()
Sets the case first mode to the initial mode set during construction of the RuleBasedCollator. See setUpperCaseFirst(boolean) and setLowerCaseFirst(boolean) for more details.

See Also:
isLowerCaseFirst(), isUpperCaseFirst(), setLowerCaseFirst(boolean), setUpperCaseFirst(boolean)
Status:
Stable ICU 2.8.
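
The effect of the case-first setters can be checked with a pair of letters that differ only in case. A minimal sketch, assuming ICU4J is on the classpath and that the root collator (obtained via Collator.getInstance(ULocale.ROOT)) sorts lowercase first by default at TERTIARY strength. CaseFirstDemo is a hypothetical name.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.util.ULocale;

public class CaseFirstDemo {
    static int compare(String a, String b, boolean upperFirst) {
        RuleBasedCollator collator =
                (RuleBasedCollator) Collator.getInstance(ULocale.ROOT);
        if (upperFirst) {
            collator.setUpperCaseFirst(true); // uppercase before lowercase at TERTIARY
        } else {
            collator.setCaseFirstDefault();   // back to the construction-time mode
        }
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        System.out.println(compare("a", "A", false) < 0); // true: lowercase first by default
        System.out.println(compare("A", "a", true) < 0);  // true: uppercase first
        System.out.println(compare("a", "b", true) < 0);  // true: primary order is unaffected
    }
}
```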

setAlternateHandlingDefault

public void setAlternateHandlingDefault()
Sets the alternate handling mode to the initial mode set during construction of the RuleBasedCollator. See setAlternateHandlingShifted(boolean) for more details.

See Also:
setAlternateHandlingShifted(boolean), isAlternateHandlingShifted()
Status:
Stable ICU 2.8.

setCaseLevelDefault

public void setCaseLevelDefault()
Sets the case level mode to the initial mode set during construction of the RuleBasedCollator. See setCaseLevel(boolean) for more details.

See Also:
setCaseLevel(boolean), isCaseLevel()
Status:
Stable ICU 2.8.

setDecompositionDefault

public void setDecompositionDefault()
Sets the decomposition mode to the initial mode set during construction of the RuleBasedCollator. See setDecomposition(int) for more details.

See Also:
Collator.getDecomposition(), Collator.setDecomposition(int)
Status:
Stable ICU 2.8.

setFrenchCollationDefault

public void setFrenchCollationDefault()
Sets the French collation mode to the initial mode set during construction of the RuleBasedCollator. See setFrenchCollation(boolean) for more details.

See Also:
isFrenchCollation(), setFrenchCollation(boolean)
Status:
Stable ICU 2.8.

setStrengthDefault

public void setStrengthDefault()
Sets the collation strength to the initial mode set during the construction of the RuleBasedCollator. See setStrength(int) for more details.

See Also:
setStrength(int), Collator.getStrength()
Status:
Stable ICU 2.8.

setNumericCollationDefault

public void setNumericCollationDefault()
Method to set numeric collation to its default value. When numeric collation is turned on, this Collator generates a collation key for the numeric value of substrings of digits. This is a way to get '100' to sort AFTER '2'.

See Also:
getNumericCollation(), setNumericCollation(boolean)
Status:
Stable ICU 2.8.

setFrenchCollation

public void setFrenchCollation(boolean flag)
Sets the mode for the direction of SECONDARY weights to be used in French collation. The default value is false, which treats SECONDARY weights in the order they appear. If set to true, the SECONDARY weights will be sorted backwards. See the section on French collation for more information.

Parameters:
flag - true to set the French collation on, false to set it off
See Also:
isFrenchCollation(), setFrenchCollationDefault()
Status:
Stable ICU 2.8.
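
The classic illustration of backwards SECONDARY weights uses two words that differ only in which letter carries an accent: "coté" (accent on the last letter) and "côte" (accent on the second letter). A minimal sketch, assuming ICU4J is on the classpath and the root collator as the baseline; FrenchDemo and its constants are hypothetical names.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.util.ULocale;

public class FrenchDemo {
    static final String COTE_ACUTE = "cot\u00E9";      // "cote" with e-acute
    static final String COTE_CIRCUMFLEX = "c\u00F4te"; // "cote" with o-circumflex

    static int compare(String a, String b, boolean french) {
        RuleBasedCollator collator =
                (RuleBasedCollator) Collator.getInstance(ULocale.ROOT);
        // true: SECONDARY (accent) weights are compared from the end of the string.
        collator.setFrenchCollation(french);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // Default: the first accent difference from the left decides.
        System.out.println(compare(COTE_ACUTE, COTE_CIRCUMFLEX, false) < 0); // true
        // French: the last accent difference decides, so the order flips.
        System.out.println(compare(COTE_CIRCUMFLEX, COTE_ACUTE, true) < 0);  // true
    }
}
```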

setAlternateHandlingShifted

public void setAlternateHandlingShifted(boolean shifted)
Sets the alternate handling for QUATERNARY strength to be either shifted or non-ignorable. See the UCA definition on Alternate Weighting. The default value for this mode is false, corresponding to the NON_IGNORABLE mode in UCA. In the NON_IGNORABLE mode, the RuleBasedCollator treats all code points with non-ignorable primary weights in the same way. If the mode is set to true, the behaviour corresponds to SHIFTED as defined in UCA: code points with PRIMARY orders that are equal to or below the variable top value are ignored at the PRIMARY, SECONDARY and TERTIARY levels and are moved to the QUATERNARY level, so their differences are only taken into account when QUATERNARY strength is set.

Parameters:
shifted - true if SHIFTED behaviour for alternate handling is desired, false for the NON_IGNORABLE behaviour.
See Also:
isAlternateHandlingShifted(), setAlternateHandlingDefault()
Status:
Stable ICU 2.8.
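
A common use of SHIFTED handling is to ignore spaces and punctuation. A minimal sketch, assuming ICU4J is on the classpath, the root collator with its default TERTIARY strength, and that the space character falls at or below the default variable top (as it does in the UCA). ShiftedDemo is a hypothetical name.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.util.ULocale;

public class ShiftedDemo {
    static int compare(String a, String b, boolean shifted) {
        RuleBasedCollator collator =
                (RuleBasedCollator) Collator.getInstance(ULocale.ROOT);
        // Strength stays at the default (TERTIARY); only alternate handling changes.
        collator.setAlternateHandlingShifted(shifted);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        System.out.println(compare("di Silva", "diSilva", true) == 0);  // true: the space is ignored
        System.out.println(compare("di Silva", "diSilva", false) != 0); // true: the space has a primary weight
    }
}
```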

setCaseLevel

public void setCaseLevel(boolean flag)

When case level is set to true, an additional weight is formed between the SECONDARY and TERTIARY weight, known as the case level. The case level is used to distinguish large and small Japanese Kana characters. It can also be used in other situations, for example to distinguish certain Pinyin characters. The default value is false, which means the case level is not generated. The contents of the case level are affected by the case first mode. A simple way to ignore accent differences in a string while still distinguishing case is to set the strength to PRIMARY and enable case level.

See the section on case level for more information.

Parameters:
flag - true if case level sorting is required, false otherwise
See Also:
setCaseLevelDefault(), isCaseLevel()
Status:
Stable ICU 2.8.
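
The PRIMARY-plus-case-level combination described above can be sketched as follows, assuming ICU4J is on the classpath and the root collator as the baseline. CaseLevelDemo is a hypothetical name; "\u00E1" is a-acute.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.util.ULocale;

public class CaseLevelDemo {
    static int compare(String a, String b, boolean caseLevel) {
        RuleBasedCollator collator =
                (RuleBasedCollator) Collator.getInstance(ULocale.ROOT);
        collator.setStrength(Collator.PRIMARY); // ignore accents (and, by itself, case)
        collator.setCaseLevel(caseLevel);       // add the case level on top of PRIMARY
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        System.out.println(compare("a", "\u00E1", true) == 0); // true: accent ignored
        System.out.println(compare("a", "A", true) != 0);      // true: case still distinguished
        System.out.println(compare("a", "A", false) == 0);     // true: PRIMARY alone ignores case
    }
}
```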

setStrength

public void setStrength(int newStrength)

Sets this Collator's strength property. The strength property determines the minimum level of difference considered significant during comparison.

See the Collator class description for an example of use.

Overrides:
setStrength in class Collator
Parameters:
newStrength - the new strength value.
Throws:
IllegalArgumentException - If the new strength value is not one of PRIMARY, SECONDARY, TERTIARY, QUATERNARY or IDENTICAL.
See Also:
Collator.getStrength(), setStrengthDefault(), Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY, Collator.QUATERNARY, Collator.IDENTICAL
Status:
Stable ICU 2.8.
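
How the strength threshold changes which differences count can be seen by comparing a plain letter with its uppercase and accented forms. A minimal sketch, assuming ICU4J is on the classpath and the root collator as the baseline; StrengthDemo and compareAt are hypothetical names.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.util.ULocale;

public class StrengthDemo {
    static int compareAt(int strength, String a, String b) {
        RuleBasedCollator collator =
                (RuleBasedCollator) Collator.getInstance(ULocale.ROOT);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // "\u00C1" is A-acute, "\u00E1" is a-acute.
        System.out.println(compareAt(Collator.PRIMARY, "a", "\u00C1") == 0);   // true: base letters only
        System.out.println(compareAt(Collator.SECONDARY, "a", "A") == 0);      // true: case ignored
        System.out.println(compareAt(Collator.SECONDARY, "a", "\u00E1") != 0); // true: accents count
        System.out.println(compareAt(Collator.TERTIARY, "a", "A") != 0);       // true: case counts
    }
}
```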

setVariableTop

public int setVariableTop(String varTop)

Variable top is a two byte primary value which causes all the code points with primary values that are less than or equal to the variable top to be shifted when alternate handling is set to SHIFTED.

Sets the variable top to a collation element value of a string supplied.

Specified by:
setVariableTop in class Collator
Parameters:
varTop - one or more (if contraction) characters to which the variable top should be set
Returns:
an int value containing the value of the variable top in the upper 16 bits. The lower 16 bits are undefined.
Throws:
IllegalArgumentException - if the varTop argument is not a valid variable top element. A variable top element is invalid when
  • it is a contraction that does not exist in the collation order
  • its PRIMARY strength collation element has more than two bytes
  • the varTop argument is null or zero in length.
See Also:
getVariableTop(), setAlternateHandlingShifted(boolean)
Status:
Stable ICU 2.6.

setVariableTop

public void setVariableTop(int varTop)
Sets the variable top to a collation element value supplied. Variable top is set to the upper 16 bits. Lower 16 bits are ignored.

Specified by:
setVariableTop in class Collator
Parameters:
varTop - Collation element value, as returned by setVariableTop or getVariableTop
See Also:
getVariableTop(), setVariableTop(String)
Status:
Stable ICU 2.6.

setNumericCollation

public void setNumericCollation(boolean flag)
When numeric collation is turned on, this Collator generates a collation key for the numeric value of substrings of digits. This is a way to get '100' to sort AFTER '2'.

Parameters:
flag - true to turn numeric collation on and false to turn it off
See Also:
getNumericCollation(), setNumericCollationDefault()
Status:
Stable ICU 2.8.
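
The '100' versus '2' case mentioned above can be checked directly. A minimal sketch, assuming ICU4J is on the classpath and the root collator as the baseline; NumericDemo is a hypothetical name.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.util.ULocale;

public class NumericDemo {
    static int compare(String a, String b, boolean numeric) {
        RuleBasedCollator collator =
                (RuleBasedCollator) Collator.getInstance(ULocale.ROOT);
        collator.setNumericCollation(numeric);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // Off: digits compare one by one, and '1' < '2' decides.
        System.out.println(compare("100", "2", false) < 0); // true
        // On: digit substrings compare by numeric value, and 100 > 2.
        System.out.println(compare("100", "2", true) > 0);  // true
    }
}
```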

setReorderCodes

public void setReorderCodes(int... order)
Sets the reordering codes for this collator. Collation reordering allows scripts and some other defined blocks of characters to be moved relative to each other as a block. This reordering is done on top of the DUCET/CLDR standard collation order. Reordering can specify groups to be placed at the start and/or the end of the collation order.

By default, reordering codes specified for the start of the order are placed in the order given after a group of “special” non-script blocks. These special groups of characters are space, punctuation, symbol, currency, and digit. These special groups are represented with Collator.ReorderCodes. Script groups can be intermingled with these special non-script blocks if those special blocks are explicitly specified in the reordering.

The special code OTHERS stands for any script that is not explicitly mentioned in the list of reordering codes given. Anything that is after OTHERS will go at the very end of the reordering in the order given.

The special reorder code DEFAULT will reset the reordering for this collator to the default for this collator. The default reordering may be the DUCET/CLDR order or may be a reordering that was specified when this collator was created from resource data or from rules. The DEFAULT code must be the sole code supplied when it is used; otherwise an IllegalArgumentException is thrown.

The special reorder code NONE will remove any reordering for this collator, so that the DUCET/CLDR order is used. The NONE code must be the sole code supplied when it is used.

Overrides:
setReorderCodes in class Collator
Parameters:
order - the reordering codes to apply to this collator; if this is null or an empty array then this clears any existing reordering
Throws:
IllegalArgumentException - if the reordering codes are malformed in any way (e.g. duplicates, multiple reset codes, overlapping equivalent scripts)
See Also:
getReorderCodes(), getEquivalentReorderCodes(int)
Status:
Draft ICU 4.8.
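
Script reordering can be illustrated by moving Greek ahead of Latin. A minimal sketch, assuming ICU4J is on the classpath, the root collator as the baseline (where Latin sorts before Greek), and the UScript constants from com.ibm.icu.lang.UScript as reorder codes. ReorderDemo is a hypothetical name; "\u03B1" is Greek small alpha.

```java
import com.ibm.icu.lang.UScript;
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.util.ULocale;

public class ReorderDemo {
    static int compare(String a, String b, int... reorderCodes) {
        RuleBasedCollator collator =
                (RuleBasedCollator) Collator.getInstance(ULocale.ROOT);
        if (reorderCodes.length > 0) {
            // The listed scripts are moved, as blocks, in the order given.
            collator.setReorderCodes(reorderCodes);
        }
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        String alpha = "\u03B1";
        System.out.println(compare(alpha, "a") > 0);                               // true: Latin first by default
        System.out.println(compare(alpha, "a", UScript.GREEK, UScript.LATIN) < 0); // true: Greek moved first
    }
}
```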

getRules

public String getRules()
Gets the collation tailoring rules for this RuleBasedCollator. Equivalent to getRules(false).

Returns:
the collation rules
See Also:
getRules(boolean)
Status:
Stable ICU 2.8.

getRules

public String getRules(boolean fullrules)
Returns current rules. The argument defines whether the full rules (UCA + tailoring) or just the tailoring rules are returned.

Parameters:
fullrules - true if the rules that define the full collation order are required, false to return only the tailored rules
Returns:
the current rules that define this Collator.
See Also:
getRules()
Status:
Stable ICU 2.6.

getTailoredSet

public UnicodeSet getTailoredSet()
Get a UnicodeSet that contains all the characters and sequences tailored in this collator.

Overrides:
getTailoredSet in class Collator
Returns:
a UnicodeSet object containing all the code points and sequences that may sort differently than in the UCA.
Status:
Stable ICU 2.4.
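
A quick way to see which characters a rule string affects is to inspect the tailored set. A minimal sketch, assuming ICU4J is on the classpath; TailoredSetDemo, tailored and the rule "& c < b" are illustrative assumptions.

```java
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.text.UnicodeSet;

public class TailoredSetDemo {
    static UnicodeSet tailored(String rules) {
        try {
            return new RuleBasedCollator(rules).getTailoredSet();
        } catch (Exception e) {
            // The constructor declares a checked Exception.
            throw new IllegalStateException("Could not build collator", e);
        }
    }

    public static void main(String[] args) {
        // "& c < b" moves b after c, so b should appear in the tailored set.
        UnicodeSet set = tailored("& c < b");
        System.out.println(set.contains("b")); // true
    }
}
```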

getContractionsAndExpansions

public void getContractionsAndExpansions(UnicodeSet contractions,
                                         UnicodeSet expansions,
                                         boolean addPrefixes)
                                  throws Exception
Gets Unicode sets containing the contractions and/or expansions of this collator.

Parameters:
contractions - if not null, set to contain contractions
expansions - if not null, set to contain expansions
addPrefixes - add the prefix contextual elements to contractions
Throws:
Exception - if any error occurs.
Status:
Stable ICU 3.4.

getCollationKey

public CollationKey getCollationKey(String source)

Get a Collation key for the argument String source from this RuleBasedCollator.

General recommendation:
If the same String is to be compared multiple times, it is more efficient to generate CollationKeys for the Strings and use CollationKey.compareTo(CollationKey) for the comparisons. If each String is compared only once, RuleBasedCollator.compare(String, String) performs better.

See the class documentation for an explanation about CollationKeys.

Specified by:
getCollationKey in class Collator
Parameters:
source - the text String to be transformed into a collation key.
Returns:
the CollationKey for the given String based on this RuleBasedCollator's collation rules. If the source String is null, a null CollationKey is returned.
See Also:
CollationKey, compare(String, String), getRawCollationKey(java.lang.String, com.ibm.icu.text.RawCollationKey)
Status:
Stable ICU 2.8.
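
The recommendation above applies to sorting, where each string takes part in many comparisons. A minimal sketch, assuming ICU4J is on the classpath and the root collator (default TERTIARY strength, lowercase before uppercase). CollationKeySortDemo and sortWithKeys are hypothetical names.

```java
import com.ibm.icu.text.CollationKey;
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.util.ULocale;
import java.util.Arrays;

public class CollationKeySortDemo {
    static String[] sortWithKeys(String... words) {
        RuleBasedCollator collator =
                (RuleBasedCollator) Collator.getInstance(ULocale.ROOT);
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            keys[i] = collator.getCollationKey(words[i]); // computed once per string
        }
        Arrays.sort(keys); // each comparison is now a cheap key comparison
        String[] sorted = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            sorted[i] = keys[i].getSourceString();
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortWithKeys("b", "A", "c", "a"))); // [a, A, b, c]
    }
}
```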

getRawCollationKey

public RawCollationKey getRawCollationKey(String source,
                                          RawCollationKey key)
Gets the simpler form of a CollationKey for the String source following the rules of this Collator and stores the result into the user provided argument key. If key has an internal byte array that is too small for the result, the internal byte array is grown to the exact required size.

Specified by:
getRawCollationKey in class Collator
Parameters:
source - the text String to be transformed into a RawCollationKey
key - output RawCollationKey to store results
Returns:
If key is null, a new instance of RawCollationKey will be created and returned, otherwise the user provided key will be returned.
See Also:
getCollationKey(java.lang.String), compare(String, String), RawCollationKey
Status:
Stable ICU 2.8.
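
Reusing RawCollationKey objects avoids allocating a new key per call. A minimal sketch, assuming ICU4J is on the classpath and the root collator; RawKeyDemo is a hypothetical name. Because the two keys are instance fields that every call overwrites, one RawKeyDemo instance should not be shared between threads.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RawCollationKey;
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.util.ULocale;

public class RawKeyDemo {
    private final RuleBasedCollator collator =
            (RuleBasedCollator) Collator.getInstance(ULocale.ROOT);
    // Reused across calls; getRawCollationKey grows the internal array as needed.
    private final RawCollationKey left = new RawCollationKey();
    private final RawCollationKey right = new RawCollationKey();

    int compare(String a, String b) {
        collator.getRawCollationKey(a, left);
        collator.getRawCollationKey(b, right);
        return left.compareTo(right);
    }

    public static void main(String[] args) {
        RawKeyDemo demo = new RawKeyDemo();
        System.out.println(demo.compare("apple", "Apple") < 0); // true: lowercase first
        System.out.println(demo.compare("pear", "apple") > 0);  // true
    }
}
```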

isUpperCaseFirst

public boolean isUpperCaseFirst()
Return true if an uppercase character is sorted before the corresponding lowercase character. See setUpperCaseFirst(boolean) for details.

Returns:
true if upper cased characters are sorted before lower cased characters, false otherwise
See Also:
setUpperCaseFirst(boolean), setLowerCaseFirst(boolean), isLowerCaseFirst(), setCaseFirstDefault()
Status:
Stable ICU 2.8.

isLowerCaseFirst

public boolean isLowerCaseFirst()
Return true if a lowercase character is sorted before the corresponding uppercase character. See setLowerCaseFirst(boolean) for details.

Returns:
true if lowercase characters are sorted before uppercase characters, false otherwise
See Also:
setUpperCaseFirst(boolean), setLowerCaseFirst(boolean), isUpperCaseFirst(), setCaseFirstDefault()
Status:
Stable ICU 2.8.

isAlternateHandlingShifted

public boolean isAlternateHandlingShifted()
Checks if the alternate handling behaviour is the UCA defined SHIFTED or NON_IGNORABLE. If the return value is true, the alternate handling attribute for the Collator is SHIFTED; if it is false, the attribute is NON_IGNORABLE. See setAlternateHandlingShifted(boolean) for more details.

Returns:
true if alternate handling is SHIFTED, false if it is NON_IGNORABLE
See Also:
setAlternateHandlingShifted(boolean), setAlternateHandlingDefault()
Status:
Stable ICU 2.8.

isCaseLevel

public boolean isCaseLevel()
Checks if case level is set to true. See setCaseLevel(boolean) for details.

Returns:
the case level mode
See Also:
setCaseLevelDefault(), isCaseLevel(), setCaseLevel(boolean)
Status:
Stable ICU 2.8.

isFrenchCollation

public boolean isFrenchCollation()
Checks if French Collation is set to true. See setFrenchCollation(boolean) for details.

Returns:
true if French Collation is set to true, false otherwise
See Also:
setFrenchCollation(boolean), setFrenchCollationDefault()
Status:
Stable ICU 2.8.

isHiraganaQuaternary

public boolean isHiraganaQuaternary()
Checks if the Hiragana Quaternary mode is set on. See setHiraganaQuaternary(boolean) for more details.

Returns:
true if Hiragana Quaternary mode is on, false otherwise
See Also:
setHiraganaQuaternaryDefault(), setHiraganaQuaternary(boolean)
Status:
Stable ICU 2.8.

getVariableTop

public int getVariableTop()
Gets the variable top value of a Collator. Lower 16 bits are undefined and should be ignored.

Specified by:
getVariableTop in class Collator
Returns:
the variable top value of a Collator.
See Also:
setVariableTop(java.lang.String)
Status:
Stable ICU 2.6.

getNumericCollation

public boolean getNumericCollation()
Method to retrieve the numeric collation value. When numeric collation is turned on, this Collator generates a collation key for the numeric value of substrings of digits. This is a way to get '100' to sort AFTER '2'.

Returns:
true if numeric collation is turned on, false otherwise
See Also:
setNumericCollation(boolean), setNumericCollationDefault()
Status:
Stable ICU 2.8.

getReorderCodes

public int[] getReorderCodes()
Retrieves the reordering codes for this collator. These reordering codes are a combination of UScript codes and ReorderCodes.

Overrides:
getReorderCodes in class Collator
Returns:
a copy of the reordering codes for this collator; if none are set then returns an empty array
See Also:
setReorderCodes(int...), getEquivalentReorderCodes(int)
Status:
Draft ICU 4.8.

getEquivalentReorderCodes

public static int[] getEquivalentReorderCodes(int reorderCode)
Retrieves all the reorder codes that are grouped with the given reorder code. Some reorder codes are grouped and must reorder together.

Parameters:
reorderCode - the code for which the equivalent codes are to be retrieved
Returns:
the set of all reorder codes in the same group as the given reorder code.
See Also:
setReorderCodes(int...), getReorderCodes()
Status:
Draft ICU 4.8.

equals

public boolean equals(Object obj)
Compares the equality of two RuleBasedCollator objects. RuleBasedCollator objects are equal if they have the same collation rules and the same attributes.

Specified by:
equals in interface Comparator<Object>
Overrides:
equals in class Object
Parameters:
obj - the RuleBasedCollator to be compared to.
Returns:
true if this RuleBasedCollator has exactly the same collation behaviour as obj, false otherwise.
Status:
Stable ICU 2.8.

hashCode

public int hashCode()
Generates a hash code for this RuleBasedCollator.

Overrides:
hashCode in class Object
Returns:
the hash code for this Collator
Status:
Stable ICU 2.8.
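
Since equality covers both the rules and the attributes, changing a single attribute is enough to make two otherwise identical collators unequal. A minimal sketch, assuming ICU4J is on the classpath and two root collators obtained via Collator.getInstance(ULocale.ROOT); EqualityDemo and its methods are hypothetical names.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.util.ULocale;

public class EqualityDemo {
    static RuleBasedCollator root() {
        return (RuleBasedCollator) Collator.getInstance(ULocale.ROOT);
    }

    // Same rules, same attributes: equal, and therefore equal hash codes.
    static boolean equalWhenSameSettings() {
        RuleBasedCollator a = root();
        RuleBasedCollator b = root();
        return a.equals(b) && a.hashCode() == b.hashCode();
    }

    // Same rules, different strength attribute.
    static boolean equalAfterStrengthChange() {
        RuleBasedCollator a = root();
        RuleBasedCollator b = root();
        b.setStrength(Collator.PRIMARY);
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(equalWhenSameSettings());    // true
        System.out.println(equalAfterStrengthChange()); // false
    }
}
```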

compare

public int compare(String source,
                   String target)
Compares the source text String to the target text String according to the collation rules, strength and decomposition mode for this RuleBasedCollator. Returns an integer less than, equal to or greater than zero depending on whether the source String is less than, equal to or greater than the target String. See the Collator class description for an example of use.

General recommendation:
If the same String is to be compared multiple times, it is more efficient to generate CollationKeys for the Strings and use CollationKey.compareTo(CollationKey) for the comparisons. If speed is critical and object instantiation is to be reduced, further optimization may be achieved by generating a simpler key of the form RawCollationKey and reusing this RawCollationKey object with the method RuleBasedCollator.getRawCollationKey. The internal byte representation can be accessed directly via RawCollationKey and stored for future use. Like CollationKey, RawCollationKey provides a method RawCollationKey.compareTo for key comparisons. If each String is compared only once, RuleBasedCollator.compare(String, String) performs better.

Specified by:
compare in class Collator
Parameters:
source - the source text String.
target - the target text String.
Returns:
Returns an integer value. Value is less than zero if source is less than target, value is zero if source and target are equal, value is greater than zero if source is greater than target.
See Also:
CollationKey, getCollationKey(java.lang.String)
Status:
Stable ICU 2.8.
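
Because the class implements Comparator<Object>, a collator can be passed directly to the standard sorting methods, which call compare once per pair of elements. A minimal sketch, assuming ICU4J on the classpath, Java 8 or later for List.sort, and the root collator as the baseline. ComparatorDemo and sortWords are hypothetical names.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.util.ULocale;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ComparatorDemo {
    static List<String> sortWords(String... words) {
        RuleBasedCollator collator =
                (RuleBasedCollator) Collator.getInstance(ULocale.ROOT);
        List<String> list = new ArrayList<>(Arrays.asList(words));
        list.sort(collator); // Comparator<Object> is accepted as Comparator<? super String>
        return list;
    }

    public static void main(String[] args) {
        System.out.println(sortWords("banana", "Apple", "apple", "cherry")); // [apple, Apple, banana, cherry]
    }
}
```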

getVersion

public VersionInfo getVersion()
Get the version of this collator object.

Specified by:
getVersion in class Collator
Returns:
the version object associated with this collator
Status:
Stable ICU 2.8.

getUCAVersion

public VersionInfo getUCAVersion()
Get the UCA version of this collator object.

Specified by:
getUCAVersion in class Collator
Returns:
the UCA version object associated with this collator
Status:
Stable ICU 2.8.


Copyright (c) 2011 IBM Corporation and others.