Package com.yahoo.prelude.query.parser
Class Tokenizer
java.lang.Object
com.yahoo.prelude.query.parser.Tokenizer
Query tokenizer. Single-threaded.
- Author:
- bratseth
-
Constructor Summary
Tokenizer(com.yahoo.language.Linguistics linguistics)
    Creates a tokenizer which initializes from a given Linguistics.
Method Summary
void setSpecialTokens(com.yahoo.language.process.SpecialTokens specialTokens)
    Sets a list of tokens (Strings) which should be returned as WORD tokens regardless of their content.
void setSubstringSpecialTokens(boolean substringSpecialTokens)
    Sets whether to recognize tokens also as substrings of other tokens, needed for CJK.
tokenize
    Resets this tokenizer and creates tokens from the given string, using "default" as the default index and no index information.
tokenize(String string, IndexFacts.Session indexFacts)
    Resets this tokenizer and creates tokens from the given string, using "default" as the default index.
tokenize(String string, String defaultIndexName, IndexFacts.Session indexFacts)
    Resets this tokenizer and creates tokens from the given string.
-
Constructor Details
-
Tokenizer
public Tokenizer(com.yahoo.language.Linguistics linguistics)
Creates a tokenizer which initializes from a given Linguistics.
-
-
Method Details
-
setSpecialTokens
public void setSpecialTokens(com.yahoo.language.process.SpecialTokens specialTokens)
Sets a list of tokens (Strings) which should be returned as WORD tokens regardless of their content. The tokenizer uses this list directly, so it should not be changed after this call. The tokenizer itself will not change it. Special tokens are case-sensitive.
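To make the special-token behavior concrete, here is a minimal, stdlib-only sketch. It is an illustration of the idea, not the Tokenizer implementation: the class `SpecialTokenSketch`, its methods, and the use of plain strings instead of typed tokens are all assumptions made for the example. A special token found at the current scan position is emitted whole, and matching is case-sensitive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not the Vespa Tokenizer implementation.
class SpecialTokenSketch {

    /** Returns the longest special token starting at position i, or null if none does. */
    static String longestSpecialAt(String text, int i, Set<String> specialTokens) {
        String best = null;
        for (String candidate : specialTokens) {
            // String.startsWith is case-sensitive, so "C++" does not match "c++"
            if (!candidate.isEmpty() && text.startsWith(candidate, i)
                    && (best == null || candidate.length() > best.length())) {
                best = candidate;
            }
        }
        return best;
    }

    /**
     * Splits text into tokens. A special token at the current position is kept whole;
     * otherwise a run of letters/digits forms one word, and any other
     * non-whitespace character becomes a one-character token.
     */
    static List<String> tokenize(String text, Set<String> specialTokens) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            String special = longestSpecialAt(text, i, specialTokens);
            if (special != null) {
                tokens.add(special);          // emitted as one token regardless of content
                i += special.length();
            } else if (Character.isLetterOrDigit(c)) {
                int start = i;
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                    i++;
                }
                tokens.add(text.substring(start, i));
            } else {
                tokens.add(String.valueOf(c)); // punctuation, one character at a time
                i++;
            }
        }
        return tokens;
    }
}
```

With the special token set `{"c++"}`, the input `c++ and C++` yields `c++`, `and`, `C`, `+`, `+` in this sketch: the lowercase form is kept whole, while the uppercase form is split because the match is case-sensitive.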
setSubstringSpecialTokens
public void setSubstringSpecialTokens(boolean substringSpecialTokens)
Sets whether to recognize tokens also as substrings of other tokens, needed for CJK. Default: false.
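The following sketch suggests why substring matching matters for CJK text, which is typically written without spaces between words. It is a simplified, hypothetical model: `SubstringSpecialTokenSketch` and `findSpecialTokens` are names invented for this example, and treating whitespace-separated chunks as the unit of matching is an assumption, not the tokenizer's actual rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not the Vespa Tokenizer implementation.
class SubstringSpecialTokenSketch {

    /**
     * Returns the special tokens found in text.
     * substring == false: a whitespace-separated chunk matches only if it equals a special token.
     * substring == true:  special tokens are also found inside longer chunks.
     */
    static List<String> findSpecialTokens(String text, Set<String> specialTokens, boolean substring) {
        List<String> found = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            if (!substring) {
                if (specialTokens.contains(chunk)) {
                    found.add(chunk);
                }
                continue;
            }
            int i = 0;
            while (i < chunk.length()) {
                // Pick the longest special token that starts at position i, if any
                String match = null;
                for (String candidate : specialTokens) {
                    if (!candidate.isEmpty() && chunk.startsWith(candidate, i)
                            && (match == null || candidate.length() > match.length())) {
                        match = candidate;
                    }
                }
                if (match != null) {
                    found.add(match);
                    i += match.length();
                } else {
                    i++;
                }
            }
        }
        return found;
    }
}
```

In this sketch, with the special token `東京` and the unspaced text `東京都庁`, nothing is found when `substring` is false because the whole chunk differs from the special token, while the token is found when `substring` is true.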
tokenize
Resets this tokenizer and creates tokens from the given string, using "default" as the default index and no index information.
- Returns:
- a read-only list of tokens. This list can only be used by this thread
-
tokenize
Resets this tokenizer and creates tokens from the given string, using "default" as the default index.
- Returns:
- a read-only list of tokens. This list can only be used by this thread
-
tokenize
Resets this tokenizer and creates tokens from the given string.
- Parameters:
- string - the string to tokenize
- defaultIndexName - the name of the index to use as default
- indexFacts - information about the indexes we will search
- Returns:
- a read-only list of tokens. This list can only be used by this thread
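The overloads above share one contract: each call resets the tokenizer, the shorter forms fall back to "default" as the index name, and the result is a read-only list tied to the calling thread. The sketch below shows one way such a contract could arise. It is an assumption-laden illustration: `ResettingTokenizerSketch` is a hypothetical class, index information is left out entirely, and whether the real tokenizer reuses an internal buffer is not stated in this documentation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; not the Vespa Tokenizer implementation.
class ResettingTokenizerSketch {

    // Internal buffer, reused across calls (an assumption made for this sketch)
    private final List<String> tokens = new ArrayList<>();
    // Callers only ever see this unmodifiable view of the buffer
    private final List<String> readOnlyView = Collections.unmodifiableList(tokens);
    private String defaultIndexName = "default";

    /** Tokenizes using "default" as the default index name. */
    List<String> tokenize(String string) {
        return tokenize(string, "default");
    }

    /** Resets this tokenizer, then creates whitespace-separated tokens from the string. */
    List<String> tokenize(String string, String defaultIndexName) {
        tokens.clear();                       // reset: earlier results are discarded
        this.defaultIndexName = defaultIndexName;
        for (String part : string.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return readOnlyView;
    }

    String defaultIndexName() {
        return defaultIndexName;
    }
}
```

Because the returned view is backed by the internal buffer in this sketch, a later `tokenize` call changes what an earlier caller sees, and attempts to modify the list throw `UnsupportedOperationException`. That is one plausible reason to keep both the tokenizer and its result within a single thread.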
-
toToken
-