Class SimpleLinguistics

  • All Implemented Interfaces:
    Linguistics
    Direct Known Subclasses:
    OpenNlpLinguistics

    public class SimpleLinguistics
    extends java.lang.Object
    implements Linguistics
    A factory of simple linguistic processor implementations. Useful for testing and for English-only use cases.
    Author:
    bratseth, bjorncs
    • Constructor Detail

      • SimpleLinguistics

        @Inject
        public SimpleLinguistics()
    • Method Detail

      • getStemmer

        public Stemmer getStemmer()
        Description copied from interface: Linguistics
        Returns a thread-unsafe stemmer or lemmatizer. This is used at query time to stem search terms that are searched against indexes containing text tokenized with stemming turned on.
        Specified by:
        getStemmer in interface Linguistics
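Query-time stemming only works if search terms are reduced the same way the indexed text was. The standalone sketch below illustrates that contract with a deliberately naive English suffix stripper. The class `NaiveSuffixStemmer` and its rules are hypothetical. It does not implement the `Stemmer` interface, and it is not the algorithm SimpleLinguistics uses.

```java
import java.util.Locale;

/**
 * Hypothetical, deliberately naive English suffix stripper. It only illustrates why
 * query terms must be stemmed the same way as indexed text; it is not the stemming
 * algorithm used by SimpleLinguistics.
 */
final class NaiveSuffixStemmer {

    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() <= 3) return w;                                        // leave short words alone
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2);        // classes -> class
        if (w.endsWith("ies")) return w.substring(0, w.length() - 3) + "y";   // ponies -> pony
        if (w.endsWith("ss")) return w;                                       // class stays class
        if (w.endsWith("ing") && w.length() > 5) return w.substring(0, w.length() - 3); // jumping -> jump
        if (w.endsWith("ed") && w.length() > 4) return w.substring(0, w.length() - 2);  // jumped -> jump
        if (w.endsWith("s")) return w.substring(0, w.length() - 1);           // jumps -> jump
        return w;
    }

    public static void main(String[] args) {
        // An indexed token and a query term reduce to the same stem, so they match.
        String indexed = stem("jumps");
        String query = stem("Jumping");
        System.out.println(indexed + " " + query + " " + indexed.equals(query)); // jump jump true
    }
}
```

A real stemmer or lemmatizer handles far more cases. For example, this sketch turns "running" into "runn". It only demonstrates the property that matters here: indexed forms and query forms converge.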
      • getTokenizer

        public Tokenizer getTokenizer()
        Description copied from interface: Linguistics
        Returns a thread-unsafe tokenizer. This is used at indexing time to produce an optionally stemmed and transformed (accent-normalized) stream of indexable tokens.
        Specified by:
        getTokenizer in interface Linguistics
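Here is a minimal sketch of the indexing pipeline described above. It assumes a tokenizer that splits on non-word characters, lowercases, optionally drops accents, and then applies a stemmer. `TokenizerSketch`, its `tokenize(String, boolean, UnaryOperator<String>)` signature, and the toy plural-stripping stemmer in `main` are hypothetical. They do not mirror the real `Tokenizer` interface.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/**
 * Hypothetical index-time tokenizer sketch (not Vespa's implementation): splits text
 * into word tokens, lowercases them, optionally drops accents, then applies a stemmer.
 */
final class TokenizerSketch {

    static List<String> tokenize(String text, boolean removeAccents, UnaryOperator<String> stemmer) {
        List<String> tokens = new ArrayList<>();
        // Word characters are letters, combining marks and digits; anything else separates tokens.
        for (String raw : text.split("[^\\p{L}\\p{M}\\p{N}]+")) {
            if (raw.isEmpty()) continue; // a leading separator yields one empty element
            String token = raw.toLowerCase(Locale.ROOT);
            if (removeAccents) {
                // Decompose (e.g. e-acute -> e + combining acute), then drop the combining marks.
                token = Normalizer.normalize(token, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
            }
            if (token.isEmpty()) continue; // nothing left after dropping marks
            tokens.add(stemmer.apply(token));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Toy stemmer for the demo only: strip one trailing "s".
        UnaryOperator<String> stripPlural = t -> t.endsWith("s") ? t.substring(0, t.length() - 1) : t;
        System.out.println(tokenize("Caf\u00e9s, cr\u00eapes & more!", true, stripPlural)); // [cafe, crepe, more]
    }
}
```

The stemmer is passed in as a function to show that stemming and accent removal are independent, optional stages applied to each token.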
      • getNormalizer

        public Normalizer getNormalizer()
        Description copied from interface: Linguistics
        Returns a thread-unsafe normalizer. This is used at query time to CJK-normalize query text.
        Specified by:
        getNormalizer in interface Linguistics
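One common way to CJK-normalize text is Unicode compatibility normalization (NFKC). It folds full-width Latin letters and digits, and half-width katakana, into their standard forms. The sketch below uses `java.text.Normalizer` for this. The class `NfkcQueryNormalizer` is hypothetical, and whether SimpleLinguistics performs exactly this normalization is an assumption.

```java
import java.text.Normalizer;

/**
 * Hypothetical query normalizer sketch using Unicode NFKC. Whether SimpleLinguistics
 * normalizes exactly this way is an assumption made for illustration.
 */
final class NfkcQueryNormalizer {

    static String normalize(String query) {
        // NFKC folds compatibility variants common in CJK text, such as full-width Latin
        // letters and digits or half-width katakana, into their canonical forms.
        return Normalizer.normalize(query, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String fullWidthLatin = "\uFF36\uFF25\uFF33\uFF30\uFF21\uFF11\uFF12\uFF13"; // full-width "VESPA123"
        String halfWidthKana = "\uFF76\uFF80\uFF76\uFF85";                          // half-width "katakana"
        System.out.println(normalize(fullWidthLatin));                                   // VESPA123
        System.out.println(normalize(halfWidthKana).equals("\u30AB\u30BF\u30AB\u30CA")); // true
    }
}
```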
      • getTransformer

        public Transformer getTransformer()
        Description copied from interface: Linguistics
        Returns a thread-unsafe transformer. This is used at query time to transform (accent-normalize) search terms that are searched against indexes containing text tokenized with accent normalization turned on.
        Specified by:
        getTransformer in interface Linguistics
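Accent normalization is typically done by decomposing each character into a base letter plus combining marks, then dropping the marks. The hypothetical `AccentDropSketch` below does this with `java.text.Normalizer` (form NFD) and a `\p{M}` regular expression. It is an illustration, not necessarily what the transformer returned here does.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

/**
 * Hypothetical accent-dropping transformer sketch (an assumption, not Vespa's code):
 * canonical decomposition followed by removal of combining marks.
 */
final class AccentDropSketch {

    // Combining marks (Unicode category M) left over after NFD decomposition.
    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");

    static String accentDrop(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // A query term must be transformed exactly like the indexed text to match it.
        System.out.println(accentDrop("cr\u00e8me br\u00fbl\u00e9e")); // creme brulee
        System.out.println(accentDrop("Z\u00fcrich"));                  // Zurich
    }
}
```

Letters with no canonical decomposition, such as "ø" or "ß", pass through this sketch unchanged. A production transformer may need an extra folding table for them.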
      • getSegmenter

        public Segmenter getSegmenter()
        Description copied from interface: Linguistics
        Returns a thread-unsafe segmenter. This is used at query time to find the individual semantic components of search terms that are searched against indexes tokenized with segmentation.
        Specified by:
        getSegmenter in interface Linguistics
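Segmentation is often dictionary-driven. As one illustration, the sketch below uses greedy forward maximum matching. At each position, it takes the longest dictionary word that starts there, or a single character if none does. The class `MaxMatchSegmenter` and its toy dictionary are hypothetical. The technique is a swapped-in example, not the segmenter SimpleLinguistics returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical greedy forward-maximum-matching segmenter (illustration only). */
final class MaxMatchSegmenter {

    private final Set<String> dictionary;
    private final int maxWordLength;

    MaxMatchSegmenter(Set<String> dictionary) {
        this.dictionary = dictionary;
        int max = 1;
        for (String word : dictionary) max = Math.max(max, word.length());
        this.maxWordLength = max;
    }

    List<String> segment(String input) {
        List<String> segments = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int end = Math.min(input.length(), pos + maxWordLength);
            // Shrink the candidate until it is a dictionary word or a single character.
            while (end > pos + 1 && !dictionary.contains(input.substring(pos, end))) {
                end--;
            }
            segments.add(input.substring(pos, end));
            pos = end;
        }
        return segments;
    }

    public static void main(String[] args) {
        MaxMatchSegmenter segmenter = new MaxMatchSegmenter(Set.of("search", "engine", "sea", "arch"));
        System.out.println(segmenter.segment("searchengine")); // [search, engine]
    }
}
```

Greedy matching is simple but can choose a poor split when a long word hides a better reading. The sketch also works on UTF-16 `char`s, so it assumes text in the Basic Multilingual Plane.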
      • getDetector

        public Detector getDetector()
        Description copied from interface: Linguistics
        Returns a thread-unsafe detector. The language of the text is a parameter to other linguistic operations. This is used to determine the language of a query or document field when not specified explicitly.
        Specified by:
        getDetector in interface Linguistics
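A simple way to guess the language, when none is given, is to look at which Unicode scripts the text uses. Under that assumption, the sketch below maps kana to Japanese, Hangul to Korean, Han to Chinese and Thai script to Thai, and falls back to English otherwise. `ScriptLanguageGuesser`, the ISO 639-1 strings it returns, and the English fallback are all hypothetical. The sketch returns plain language-code strings rather than whatever type the real `Detector` produces.

```java
import java.util.EnumSet;
import java.util.Set;

/**
 * Hypothetical script-based language guesser (illustration only, not Vespa's detector):
 * looks at the Unicode scripts of the letters in the text and falls back to English.
 */
final class ScriptLanguageGuesser {

    /** Returns an ISO 639-1 code; the script-to-language mapping and fallback are assumptions. */
    static String detect(String text) {
        Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints()
            .filter(Character::isLetter)
            .forEach(cp -> scripts.add(Character.UnicodeScript.of(cp)));
        // Kana is checked first because Japanese text mixes kana with Han ideographs.
        if (scripts.contains(Character.UnicodeScript.HIRAGANA)
                || scripts.contains(Character.UnicodeScript.KATAKANA)) return "ja";
        if (scripts.contains(Character.UnicodeScript.HANGUL)) return "ko";
        if (scripts.contains(Character.UnicodeScript.HAN)) return "zh";
        if (scripts.contains(Character.UnicodeScript.THAI)) return "th";
        return "en"; // everything else is assumed to be English
    }

    public static void main(String[] args) {
        System.out.println(detect("Hello world"));                    // en
        System.out.println(detect("\u3053\u3093\u306b\u3061\u306f")); // ja
        System.out.println(detect("\ud55c\uad6d\uc5b4"));             // ko
        System.out.println(detect("\u4e2d\u6587"));                   // zh
    }
}
```

Script alone cannot separate languages that share a script (English, French and German all use Latin script). A script-only heuristic is therefore adequate only when Latin-script text can safely be assumed to be English, as in the English-only use cases mentioned in the class description.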
      • getGramSplitter

        public GramSplitter getGramSplitter()
        Description copied from interface: Linguistics
        Returns a thread-unsafe gram splitter. This is used to split query or document text into fixed-length grams, which allows matching without needing or using segmented tokens.
        Specified by:
        getGramSplitter in interface Linguistics
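The sketch below illustrates gram splitting. It assumes overlapping (sliding-window) grams computed within each run of letters or digits, with words no longer than the gram size kept whole. `NGramSplitterSketch` is hypothetical, and its exact gram boundaries may differ from those of the `GramSplitter` returned here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical n-gram splitter sketch (not Vespa's GramSplitter): each run of letters
 * or digits is cut into overlapping grams of gramSize characters.
 */
final class NGramSplitterSketch {

    static List<String> split(String text, int gramSize) {
        if (gramSize < 1) throw new IllegalArgumentException("gramSize must be at least 1, was " + gramSize);
        List<String> grams = new ArrayList<>();
        for (String word : text.split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) continue;   // a leading separator yields one empty element
            if (word.length() <= gramSize) {
                grams.add(word);            // words no longer than a gram are kept whole
                continue;
            }
            // Sliding window: each start position produces one gram.
            for (int i = 0; i + gramSize <= word.length(); i++) {
                grams.add(word.substring(i, i + gramSize));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(split("Vespa AI", 2));                  // [Ve, es, sp, pa, AI]
        // Three Han characters written without spaces, as is usual in CJK text.
        System.out.println(split("\u6771\u4eac\u90fd", 2).size()); // 2
    }
}
```

Query and document text are split the same way. Unsegmented CJK text like the three-character input in `main` can therefore be matched gram by gram without a segmentation dictionary.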