Class DefaultTextTokenizer

  • All Implemented Interfaces:
    TextTokenizer

    @API(EXPERIMENTAL)
    public class DefaultTextTokenizer
    extends Object
    implements TextTokenizer
    This is the default tokenizer used by full-text indexes. It splits the text on whitespace, normalizes the input into Unicode Normalization Form KD (NFKD, compatibility decomposition), case-folds the input to lower case, and strips all diacritical marks. This is appropriate for exact matching in the many languages that use whitespace as their word separator (e.g., most European languages, Korean, and Semitic languages), but it does not handle highly synthetic languages particularly well, nor does it handle languages like Chinese, Japanese, or Thai that do not generally use whitespace to indicate word boundaries.
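    The normalization steps described above can be sketched in standard Java using java.text.Normalizer. This is an illustrative approximation, not the class's actual implementation; in particular, String.toLowerCase is used here as a stand-in for true Unicode case folding, and the helper name normalize is hypothetical.

    ```java
    import java.text.Normalizer;

    public class Main {
        // Sketch of the pipeline: NFKD decomposition, then lower-casing,
        // then stripping combining marks (Unicode category M).
        static String normalize(String token) {
            String decomposed = Normalizer.normalize(token, Normalizer.Form.NFKD);
            return decomposed.toLowerCase().replaceAll("\\p{M}", "");
        }

        public static void main(String[] args) {
            // NFKD splits "é" into "e" plus a combining accent, which is then stripped.
            System.out.println(normalize("Café"));  // prints "cafe"
        }
    }
    ```

    Because both documents and queries pass through the same pipeline, "Café", "CAFE", and "cafe" all normalize to the same token and therefore match exactly.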
    • Field Detail

      • NAME

        @Nonnull
        public static final String NAME
        The name of the default tokenizer. This can be used to explicitly require the default tokenizer in a text index.
        See Also:
        Constant Field Values
    • Method Detail

      • instance

        @Nonnull
        public static DefaultTextTokenizer instance()
        Get this class's singleton. This text tokenizer maintains no state, so only one instance is needed.
        Returns:
        this tokenizer's singleton instance
      • tokenize

        @Nonnull
        public Iterator<String> tokenize​(@Nonnull
                                         String text,
                                         int version,
                                         @Nonnull
                                         TextTokenizer.TokenizerMode mode)
        Tokenize the text based on whitespace. This normalizes the input using the NFKD (compatibility decomposition) normal form, case-folds to lower case, and then strips out diacritical marks. It makes no other attempt to stem words into their base forms, nor does it attempt to split words in synthetic languages or in languages that do not use whitespace as a word separator. This tokenizer behaves identically when used to tokenize documents at index time and when used to tokenize query strings.
        Specified by:
        tokenize in interface TextTokenizer
        Parameters:
        text - source text to split
        version - version of the tokenizer to use to split the text
        mode - ignored as this tokenizer operates the same way at index and query time
        Returns:
        an iterator over whitespace-separated tokens
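        A self-contained sketch of the behavior described for tokenize, written against standard Java only (the method here is a hypothetical stand-in, not the library's implementation, and it omits the version and mode parameters since the text notes that mode is ignored):

        ```java
        import java.text.Normalizer;
        import java.util.Arrays;
        import java.util.Iterator;

        public class Main {
            // Stand-in for tokenize: split on runs of whitespace, then
            // NFKD-normalize, lower-case, and strip diacritics from each token.
            static Iterator<String> tokenize(String text) {
                return Arrays.stream(text.trim().split("\\s+"))
                        .filter(t -> !t.isEmpty())
                        .map(t -> Normalizer.normalize(t, Normalizer.Form.NFKD)
                                .toLowerCase()
                                .replaceAll("\\p{M}", ""))
                        .iterator();
            }

            public static void main(String[] args) {
                Iterator<String> tokens = tokenize("Visitez le CAFÉ");
                while (tokens.hasNext()) {
                    System.out.println(tokens.next());  // visitez, le, cafe
                }
            }
        }
        ```

        Note that the whitespace split happens before normalization, so a query string and an indexed document containing the same words always produce the same token stream.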
      • getName

        @Nonnull
        public String getName()
        Get the name for this tokenizer. For default tokenizers, the name is "default".
        Specified by:
        getName in interface TextTokenizer
        Returns:
        the name of the default tokenizer
      • getMaxVersion

        public int getMaxVersion()
        Get the maximum supported version. Currently, there is only one version of this tokenizer, so the maximum version is the same as the minimum version.
        Specified by:
        getMaxVersion in interface TextTokenizer
        Returns:
        the maximum version supported by this tokenizer