Class SimpleLinguistics

java.lang.Object
com.yahoo.language.simple.SimpleLinguistics
All Implemented Interfaces:
Linguistics

public class SimpleLinguistics extends Object implements Linguistics
Factory of simple linguistic processor implementations. Useful for testing and English-only use cases.
Author:
bratseth, bjorncs
  • Constructor Details

    • SimpleLinguistics

      @Inject public SimpleLinguistics()
  • Method Details

    • getStemmer

      public Stemmer getStemmer()
      Description copied from interface: Linguistics
      Returns a thread-unsafe stemmer or lemmatizer. This is used at query time to stem search terms for indexes that contain text tokenized with stemming turned on.
      Specified by:
      getStemmer in interface Linguistics
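
Because the returned components are described as thread-unsafe, one generic way to use them from several threads is to keep one instance per thread. The sketch below shows that pattern with plain JDK classes only. `PerThreadSketch`, `UnsafeComponent`, and `perThread` are hypothetical names introduced for illustration; the supplier stands in for a call such as `getStemmer()`, and nothing here is claimed to be how SimpleLinguistics manages instances internally.

```java
import java.util.Locale;
import java.util.function.Supplier;

// Hypothetical illustration: one thread-unsafe component per thread.
final class PerThreadSketch {

    // Stand-in for a component that keeps mutable state between calls,
    // which is what makes sharing one instance across threads unsafe.
    static final class UnsafeComponent {
        private final StringBuilder buffer = new StringBuilder();

        String process(String term) {
            buffer.setLength(0);                          // reuse the shared scratch buffer
            buffer.append(term.toLowerCase(Locale.ROOT));
            return buffer.toString();
        }
    }

    // Wraps a factory so that each thread lazily creates and then reuses its own instance.
    static <T> ThreadLocal<T> perThread(Supplier<T> factory) {
        return ThreadLocal.withInitial(factory);
    }

    public static void main(String[] args) {
        ThreadLocal<UnsafeComponent> local = perThread(UnsafeComponent::new);
        System.out.println(local.get().process("Running"));
    }
}
```

Calling the factory method again whenever a component is needed is a simpler alternative; the per-thread holder only pays off when creating an instance is expensive.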
    • getTokenizer

      public Tokenizer getTokenizer()
      Description copied from interface: Linguistics
      Returns a thread-unsafe tokenizer. This is used at indexing time to produce an optionally stemmed and transformed (accent normalized) stream of indexable tokens.
      Specified by:
      getTokenizer in interface Linguistics
    • getNormalizer

      public Normalizer getNormalizer()
      Description copied from interface: Linguistics
      Returns a thread-unsafe normalizer. This is used at query time to CJK-normalize query text.
      Specified by:
      getNormalizer in interface Linguistics
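
To make "CJK normalization" concrete, the sketch below applies Unicode NFKC with the JDK's `java.text.Normalizer`, which folds compatibility forms such as full-width Latin letters and half-width katakana into their standard forms. That NFKC is what this class applies is an assumption, and `CjkNormalizeSketch` is a hypothetical name, not part of the library.

```java
import java.text.Normalizer;

// Hypothetical illustration of CJK-style normalization; assumes NFKC is the intended form.
final class CjkNormalizeSketch {

    // Folds Unicode compatibility variants (full-width, half-width) into standard forms.
    static String normalize(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Full-width "Vespa" written with escapes to avoid encoding issues.
        System.out.println(normalize("\uFF36\uFF45\uFF53\uFF50\uFF41"));
    }
}
```

With this kind of folding, a query typed in full-width characters can match text stored in the ordinary width.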
    • getTransformer

      public Transformer getTransformer()
      Description copied from interface: Linguistics
      Returns a thread-unsafe transformer. This is used at query time to accent-normalize search terms for indexes that contain text tokenized with accent normalization turned on.
      Specified by:
      getTransformer in interface Linguistics
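
As an illustration of what accent normalization can mean, the sketch below decomposes text with Unicode NFD and removes the combining marks, so that "café" and "cafe" compare equal. This is a common JDK-only technique offered as an assumption; it is not claimed to be the transformer's actual algorithm, and `AccentDropSketch` and `accentDrop` are hypothetical names.

```java
import java.text.Normalizer;

// Hypothetical illustration of accent removal via canonical decomposition.
final class AccentDropSketch {

    // Splits accented letters into base letter plus combining mark (NFD),
    // then deletes every combining mark (Unicode category M).
    static String accentDrop(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(accentDrop("caf\u00E9 na\u00EFve"));
    }
}
```

Letters without a canonical decomposition, such as "ø", pass through this sketch unchanged, so a real implementation may need extra mapping rules.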
    • getSegmenter

      public Segmenter getSegmenter()
      Description copied from interface: Linguistics
      Returns a thread-unsafe segmenter. This is used at query time to find the individual semantic components of search terms for indexes tokenized with segmentation.
      Specified by:
      getSegmenter in interface Linguistics
    • getDetector

      public Detector getDetector()
      Description copied from interface: Linguistics
      Returns a thread-unsafe detector. The language of the text is a parameter to other linguistic operations. This is used to determine the language of a query or document field when not specified explicitly.
      Specified by:
      getDetector in interface Linguistics
    • getGramSplitter

      public GramSplitter getGramSplitter()
      Description copied from interface: Linguistics
      Returns a thread-unsafe gram splitter. This is used to split query or document text into fixed-length grams, which allows matching without segmented tokens.
      Specified by:
      getGramSplitter in interface Linguistics
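
The idea of fixed-length grams can be sketched with a sliding window over the characters of a term. The code below is a minimal JDK-only illustration and not the library's splitter: `GramSketch` and `grams` are hypothetical names, the whole input is treated as a single run, and returning a short input as one gram is an assumption. How the real splitter handles spaces and punctuation is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of splitting a term into overlapping grams of n code points.
final class GramSketch {

    static List<String> grams(String text, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be positive");
        int[] codePoints = text.codePoints().toArray();   // code points, so surrogate pairs stay intact
        List<String> result = new ArrayList<>();
        if (codePoints.length == 0) return result;         // nothing to split
        if (codePoints.length <= n) {                      // assumption: short input is a single gram
            result.add(text);
            return result;
        }
        for (int start = 0; start + n <= codePoints.length; start++) {
            result.add(new String(codePoints, start, n));  // window of exactly n code points
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(grams("vespa", 3));
    }
}
```

If the same splitting is applied to both indexed text and query text, any substring of at least n characters shares grams with the text it occurs in, which is why no segmentation step is needed.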
    • getCharacterClasses

      public CharacterClasses getCharacterClasses()
      Description copied from interface: Linguistics
      Returns a thread-unsafe character classes instance.
      Specified by:
      getCharacterClasses in interface Linguistics
    • equals

      public boolean equals(Linguistics other)
      Description copied from interface: Linguistics
      Checks whether another instance is equivalent to this one.
      Specified by:
      equals in interface Linguistics