Class Tokenizer

java.lang.Object
com.yahoo.prelude.query.parser.Tokenizer

public final class Tokenizer extends Object
Query tokenizer. Single-threaded.
Author:
bratseth
  • Constructor Summary

    Constructors
    Constructor
    Description
    Tokenizer(com.yahoo.language.Linguistics linguistics)
    Creates a tokenizer which initializes from a given Linguistics.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    setSpecialTokens(com.yahoo.language.process.SpecialTokens specialTokens)
    Sets a list of tokens (Strings) which should be returned as WORD tokens regardless of their content.
    void
    setSubstringSpecialTokens(boolean substringSpecialTokens)
    Sets whether to recognize tokens also as substrings of other tokens, needed for CJK.
    List<Token>
    tokenize(String string)
    Resets this tokenizer and creates tokens from the given string, using "default" as the default index and no index information.
    List<Token>
    tokenize(String string, IndexFacts.Session indexFacts)
    Resets this tokenizer and creates tokens from the given string, using "default" as the default index.
    List<Token>
    tokenize(String string, String defaultIndexName, IndexFacts.Session indexFacts)
    Resets this tokenizer and creates tokens from the given string.
    Token
    toToken(com.yahoo.language.process.SpecialTokens.Token specialToken, int start, String rawSource)
     

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • Tokenizer

      public Tokenizer(com.yahoo.language.Linguistics linguistics)
      Creates a tokenizer which initializes from a given Linguistics.
  • Method Details

    • setSpecialTokens

      public void setSpecialTokens(com.yahoo.language.process.SpecialTokens specialTokens)
      Sets a list of tokens (Strings) which should be returned as WORD tokens regardless of their content. This list is used directly by the Tokenizer and should not be changed after calling this. The tokenizer will not change it. Special tokens are case sensitive.
    • setSubstringSpecialTokens

      public void setSubstringSpecialTokens(boolean substringSpecialTokens)
      Sets whether to recognize tokens also as substrings of other tokens. This is needed for CJK. Default false.
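
      The effect of special tokens and of the substring setting can be illustrated with a small, self-contained sketch. This is not the Tokenizer implementation and does not use any Vespa classes: the class SpecialTokenSketch, its split method, and the rule of splitting on non-alphanumeric characters are all assumptions made for illustration. It only shows the documented ideas that special tokens are matched case-sensitively, are kept whole regardless of their content, and are matched inside longer tokens only when substring matching is on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Vespa Tokenizer.
public class SpecialTokenSketch {

    /**
     * Splits text into runs of letters/digits, but emits any special token whole.
     * With substring == false a special token must be delimited on both sides;
     * with substring == true it may also occur inside a longer run.
     */
    static List<String> split(String text, List<String> specials, boolean substring) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            String special = specialAt(text, i, specials, substring);
            if (special != null) {
                flush(word, tokens);
                tokens.add(special);
                i += special.length();
                continue;
            }
            char c = text.charAt(i++);
            if (Character.isLetterOrDigit(c)) word.append(c);
            else flush(word, tokens);
        }
        flush(word, tokens);
        return tokens;
    }

    /** Returns the first special token starting at pos, or null. Matching is case sensitive. */
    private static String specialAt(String text, int pos, List<String> specials, boolean substring) {
        for (String s : specials) {
            if (s.isEmpty() || !text.startsWith(s, pos)) continue;
            int end = pos + s.length();
            boolean delimitedLeft = pos == 0 || !Character.isLetterOrDigit(text.charAt(pos - 1));
            boolean delimitedRight = end == text.length() || !Character.isLetterOrDigit(text.charAt(end));
            if (substring || (delimitedLeft && delimitedRight)) return s;
        }
        return null;
    }

    /** Moves the pending word, if any, into the token list. */
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        List<String> specials = List.of("c++");
        System.out.println(split("learn c++ fast", specials, false)); // [learn, c++, fast]
        System.out.println(split("learn C++ fast", specials, false)); // [learn, C, fast]
        System.out.println(split("learnc++fast", specials, false));   // [learnc, fast]
        System.out.println(split("learnc++fast", specials, true));    // [learn, c++, fast]
    }
}
```

      In the sketch, the last two calls differ only in the substring flag. Text written without spaces between words, as is common in CJK, is the situation where the delimiter check would otherwise hide a special token.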
    • tokenize

      public List<Token> tokenize(String string)
      Resets this tokenizer and creates tokens from the given string, using "default" as the default index and no index information.
      Returns:
      a read-only list of tokens. This list can only be used by this thread.
    • tokenize

      public List<Token> tokenize(String string, IndexFacts.Session indexFacts)
      Resets this tokenizer and creates tokens from the given string, using "default" as the default index.
      Returns:
      a read-only list of tokens. This list can only be used by this thread.
    • tokenize

      public List<Token> tokenize(String string, String defaultIndexName, IndexFacts.Session indexFacts)
      Resets this tokenizer and creates tokens from the given string.
      Parameters:
      string - the string to tokenize
      defaultIndexName - the name of the index to use as default
      indexFacts - information about the indexes we will search
      Returns:
      a read-only list of tokens. This list can only be used by this thread.
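
      Because the tokenizer is single-threaded and the returned list is tied to the calling thread, one common way to use such an object from several threads is to give each thread its own instance. The sketch below shows that pattern with java.lang.ThreadLocal. It is an assumption about usage, not something this class prescribes: StandInTokenizer is a hypothetical placeholder that simply splits on whitespace, used here so the example runs without Vespa on the classpath.

```java
import java.util.List;

// Hypothetical illustration of one-instance-per-thread usage.
public class PerThreadTokenizerSketch {

    /** Hypothetical stand-in for a non-thread-safe tokenizer; not the Vespa class. */
    static final class StandInTokenizer {
        List<String> tokenize(String s) {
            // List.of returns an unmodifiable list, mirroring the read-only result
            return List.of(s.trim().split("\\s+"));
        }
    }

    /** Each thread lazily creates and then reuses its own instance. */
    private static final ThreadLocal<StandInTokenizer> TOKENIZER =
            ThreadLocal.withInitial(StandInTokenizer::new);

    static StandInTokenizer current() {
        return TOKENIZER.get();
    }

    public static void main(String[] args) throws InterruptedException {
        StandInTokenizer[] seen = new StandInTokenizer[1];
        Thread worker = new Thread(() -> seen[0] = current());
        worker.start();
        worker.join();
        System.out.println(current() == current()); // same instance within one thread
        System.out.println(current() != seen[0]);   // a different instance on the worker thread
    }
}
```

      With the real class, the initializer passed to ThreadLocal.withInitial would construct a Tokenizer from a Linguistics instance instead of the stand-in.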
    • toToken

      public Token toToken(com.yahoo.language.process.SpecialTokens.Token specialToken, int start, String rawSource)