Tokenizer (container-search 6.274.30 API)

java.lang.Object
- com.yahoo.prelude.query.parser.Tokenizer

```
public final class Tokenizer
extends Object
```
Query tokenizer. Single-threaded.
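Because an instance is single-threaded, one common way to use such a class from several threads is to give each thread its own instance. The following is a minimal sketch of that pattern only. `StatefulSplitter` and `PerThreadSplitters` are hypothetical stand-ins written for this example; they are not part of container-search and do not reproduce the real tokenization rules.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a stateful, single-threaded tokenizer (not the Vespa class). */
final class StatefulSplitter {
    // Reused between calls, which is what makes an instance unsafe to share across threads.
    private final List<String> tokens = new ArrayList<>();

    List<String> split(String input) {
        tokens.clear();
        for (String part : input.trim().split("\\s+")) {
            if (!part.isEmpty()) tokens.add(part);
        }
        return Collections.unmodifiableList(tokens);
    }
}

/** Hands every thread its own splitter so no instance is ever shared. */
final class PerThreadSplitters {
    private static final ThreadLocal<StatefulSplitter> SPLITTER =
            ThreadLocal.withInitial(StatefulSplitter::new);

    static List<String> split(String input) {
        // Copy the result so it stays valid after the next call on this thread.
        return new ArrayList<>(SPLITTER.get().split(input));
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> System.out.println(split("foo bar")));
        Thread b = new Thread(() -> System.out.println(split("baz")));
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

Creating a new instance per request is an equally valid choice when construction is cheap; the sketch does not measure that cost for the real class.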

Author:

bratseth

Field Summary

Fields
Modifier and Type	Field and Description
`private com.yahoo.language.process.CharacterClasses`	`characterClasses`
`private int`	`indexLastExplicitlyChangedAt`
`private int`	`parensToEat`
`private String`	`source`
`private SpecialTokens`	`specialTokens` Tokens which should be words, regardless of which characters they contain
`private boolean`	`substringSpecialTokens` Whether to recognize tokens also as substrings of other tokens, needed for CJK
`private List<Token>`	`tokens`

Constructor Summary

Constructors
Constructor and Description
`Tokenizer(com.yahoo.language.Linguistics linguistics)` Creates a tokenizer which initializes from a given Linguistics

Method Summary

Modifier and Type	Method and Description
`private boolean`	`acceptApostropheAsWordCharacter(Index currentIndex)`
`private void`	`addToken(Token.Kind kind, String word, int start, int end)`
`private void`	`addToken(Token token)`
`private int`	`consumeExact(int start, Index index)`
`private int`	`consumeHeuristicExact(int start)`
`private int`	`consumeSpecialToken(int start)`
`private int`	`consumeToTerminator(int start, String terminator)`
`private int`	`consumeWordOrNumber(int start, Index currentIndex)` Consumes a word or number and/or possibly a special token starting within this word or number
`private Index`	`determineCurrentIndex(Index defaultIndex, IndexFacts.Session indexFacts)`
`private SpecialTokens.SpecialToken`	`getSpecialToken(int start)`
`private boolean`	`looksLikeExactEnd(int end)`
`void`	`setSpecialTokens(SpecialTokens specialTokens)` Sets a list of tokens (Strings) which should be returned as WORD tokens regardless of their content.
`void`	`setSubstringSpecialTokens(boolean substringSpecialTokens)` Sets whether to recognize tokens also as substrings of other tokens, needed for CJK.
`private boolean`	`terminatorStartsAt(int start, String terminator)`
`List<Token>`	`tokenize(String string)` Resets this tokenizer and creates tokens from the given string, using "default" as the default index, and using no index information.
`List<Token>`	`tokenize(String string, IndexFacts.Session indexFacts)` Resets this tokenizer and creates tokens from the given string, using "default" as the default index.
`List<Token>`	`tokenize(String string, String defaultIndexName, IndexFacts.Session indexFacts)` Resets this tokenizer and creates tokens from the given string.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

tokens
```
private List<Token> tokens
```

source
```
private String source
```

specialTokens
```
private SpecialTokens specialTokens
```
Tokens which should be words, regardless of which characters they contain

substringSpecialTokens
```
private boolean substringSpecialTokens
```
Whether to recognize tokens also as substrings of other tokens, needed for CJK

characterClasses
```
private final com.yahoo.language.process.CharacterClasses characterClasses
```

parensToEat
```
private int parensToEat
```

indexLastExplicitlyChangedAt
```
private int indexLastExplicitlyChangedAt
```

Constructor Detail
Tokenizer
```
public Tokenizer(com.yahoo.language.Linguistics linguistics)
```
Creates a tokenizer which initializes from a given Linguistics

Method Detail

setSpecialTokens
```
public void setSpecialTokens(SpecialTokens specialTokens)
```
Sets a list of tokens (Strings) which should be returned as WORD tokens regardless of their content. This list is used directly by the Tokenizer and must not be changed after this call. The tokenizer will not change it. Special tokens are case-sensitive.
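To illustrate what "returned as WORD tokens regardless of their content" can mean in practice, here is a minimal, self-contained sketch. It assumes special tokens are tried at the start of each token, that the longest match wins, and that matching is case-sensitive as stated above. `SpecialTokenSketch` and its methods are hypothetical names for this example; the real class works on `SpecialTokens` and `Token` objects and its exact matching rules are not documented on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of special-token handling (not the Vespa implementation). */
final class SpecialTokenSketch {

    /** Returns the longest special token starting at position start, or null if none does. */
    static String specialTokenAt(String source, int start, List<String> specialTokens) {
        String best = null;
        for (String candidate : specialTokens) {
            if (!candidate.isEmpty()
                    && source.startsWith(candidate, start) // case-sensitive comparison
                    && (best == null || candidate.length() > best.length())) {
                best = candidate;
            }
        }
        return best;
    }

    /** Splits source into words, keeping each special token whole even if it contains punctuation. */
    static List<String> words(String source, List<String> specialTokens) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            String special = specialTokenAt(source, i, specialTokens);
            if (special != null) {
                words.add(special);
                i += special.length();
            } else if (Character.isLetterOrDigit(source.charAt(i))) {
                int end = i;
                while (end < source.length() && Character.isLetterOrDigit(source.charAt(end))) end++;
                words.add(source.substring(i, end));
                i = end;
            } else {
                i++; // whitespace and punctuation are simply dropped in this sketch
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // "c++" is kept whole; "C++" differs in case, so only "C" survives as a word.
        System.out.println(words("c++ and C++ rock", List.of("c++"))); // prints [c++, and, C, rock]
    }
}
```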

setSubstringSpecialTokens
```
public void setSubstringSpecialTokens(boolean substringSpecialTokens)
```
Sets whether to recognize tokens also as substrings of other tokens, needed for CJK. Default false.
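The effect of this flag can be sketched as follows: when it is off, a special token is only looked for where a token starts; when it is on, a word is cut short at the point where a special token begins inside it. This matters for text written without spaces between words. `SubstringSpecialTokenSketch` is a hypothetical class written for this example, with plain strings in place of the real `SpecialTokens` and `Token` types, and it makes no claim to match the library's actual rules.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of substring special-token recognition (not the Vespa implementation). */
final class SubstringSpecialTokenSketch {

    /** Returns the longest special token starting at position start, or null if none does. */
    static String specialTokenAt(String source, int start, List<String> specialTokens) {
        String best = null;
        for (String candidate : specialTokens) {
            if (!candidate.isEmpty() && source.startsWith(candidate, start)
                    && (best == null || candidate.length() > best.length())) {
                best = candidate;
            }
        }
        return best;
    }

    static List<String> words(String source, List<String> specialTokens, boolean substringSpecialTokens) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            String special = specialTokenAt(source, i, specialTokens);
            if (special != null) {
                words.add(special);
                i += special.length();
                continue;
            }
            if (!Character.isLetterOrDigit(source.charAt(i))) {
                i++; // whitespace and punctuation are dropped in this sketch
                continue;
            }
            int end = i;
            while (end < source.length() && Character.isLetterOrDigit(source.charAt(end))) {
                // With the flag on, end the current word where a special token begins.
                if (substringSpecialTokens && end > i
                        && specialTokenAt(source, end, specialTokens) != null) {
                    break;
                }
                end++;
            }
            words.add(source.substring(i, end));
            i = end;
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> special = List.of("c++");
        System.out.println(words("abc++", special, false)); // prints [abc]
        System.out.println(words("abc++", special, true));  // prints [ab, c++]
    }
}
```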

tokenize
```
public List<Token> tokenize(String string)
```
Resets this tokenizer and creates tokens from the given string, using "default" as the default index, and using no index information.

Returns:

a read-only list of tokens. This list can only be used by this thread
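Since each call resets the tokenizer and the returned list is read-only, a cautious caller copies the tokens if they are needed after the next call. The sketch below shows that pattern with a hypothetical `ResettingTokenizer` that reuses one internal list. Whether the real class reuses its list in the same way is an assumption made here for illustration; this page does not say.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in showing reset-on-call and a read-only result (not the Vespa class). */
final class ResettingTokenizer {
    private final List<String> tokens = new ArrayList<>();
    private final List<String> readOnlyView = Collections.unmodifiableList(tokens);

    /** Clears the previous result, then splits input into words and single punctuation marks. */
    List<String> tokenize(String input) {
        tokens.clear();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                word.append(c);
            } else {
                flush(word);
                if (!Character.isWhitespace(c)) tokens.add(String.valueOf(c));
            }
        }
        flush(word);
        return readOnlyView;
    }

    private void flush(StringBuilder word) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        ResettingTokenizer tokenizer = new ResettingTokenizer();
        List<String> first = tokenizer.tokenize("title:foo bar");
        List<String> kept = new ArrayList<>(first); // defensive copy survives the next call
        tokenizer.tokenize("baz");
        System.out.println(kept);  // prints [title, :, foo, bar]
        System.out.println(first); // prints [baz], because the view tracks the reused list
    }
}
```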

tokenize
```
public List<Token> tokenize(String string,
                            IndexFacts.Session indexFacts)
```
Resets this tokenizer and creates tokens from the given string, using "default" as the default index.

Returns:

a read-only list of tokens. This list can only be used by this thread

tokenize
```
public List<Token> tokenize(String string,
                            String defaultIndexName,
                            IndexFacts.Session indexFacts)
```
Resets this tokenizer and creates tokens from the given string.

Parameters:

string - the string to tokenize

defaultIndexName - the name of the index to use as default

indexFacts - information about the indexes we will search

Returns:

a read-only list of tokens. This list can only be used by this thread

acceptApostropheAsWordCharacter
```
private boolean acceptApostropheAsWordCharacter(Index currentIndex)
```

determineCurrentIndex
```
private Index determineCurrentIndex(Index defaultIndex,
                                    IndexFacts.Session indexFacts)
```

consumeSpecialToken
```
private int consumeSpecialToken(int start)
```

getSpecialToken
```
private SpecialTokens.SpecialToken getSpecialToken(int start)
```

consumeExact
```
private int consumeExact(int start,
                         Index index)
```

looksLikeExactEnd
```
private boolean looksLikeExactEnd(int end)
```

consumeHeuristicExact
```
private int consumeHeuristicExact(int start)
```

consumeToTerminator
```
private int consumeToTerminator(int start,
                                String terminator)
```

terminatorStartsAt
```
private boolean terminatorStartsAt(int start,
                                   String terminator)
```

consumeWordOrNumber
```
private int consumeWordOrNumber(int start,
                                Index currentIndex)
```
Consumes a word or number and/or possibly a special token starting within this word or number

addToken
```
private void addToken(Token.Kind kind,
                      String word,
                      int start,
                      int end)
```

addToken
```
private void addToken(Token token)
```
