Package com.yahoo.language.simple
Class SimpleLinguistics
- java.lang.Object
  - com.yahoo.language.simple.SimpleLinguistics
- All Implemented Interfaces: Linguistics
- Direct Known Subclasses: OpenNlpLinguistics
public class SimpleLinguistics extends Object implements Linguistics
Factory of simple linguistic processor implementations. Useful for testing and English-only use cases.
- Author: bratseth, bjorncs
Nested Class Summary
Nested classes/interfaces inherited from interface com.yahoo.language.Linguistics
Linguistics.Component
Constructor Summary
Constructor            Description
SimpleLinguistics()
Method Summary
Modifier and Type    Method                       Description
boolean              equals(Linguistics other)    Check if another instance is equivalent to this one.
CharacterClasses     getCharacterClasses()        Returns a thread-unsafe character classes instance.
Detector             getDetector()                Returns a thread-unsafe detector.
GramSplitter         getGramSplitter()            Returns a thread-unsafe gram splitter.
Normalizer           getNormalizer()              Returns a thread-unsafe normalizer.
Segmenter            getSegmenter()               Returns a thread-unsafe segmenter.
Stemmer              getStemmer()                 Returns a thread-unsafe stemmer or lemmatizer.
Tokenizer            getTokenizer()               Returns a thread-unsafe tokenizer.
Transformer          getTransformer()             Returns a thread-unsafe transformer.
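Every getter above is documented as returning a thread-unsafe instance, so a returned processor should not be shared between threads. One possible way to respect that is to keep one instance per thread. The sketch below is standalone and does not use the com.yahoo.language API: `UnsafeProcessor` is a hypothetical stand-in for any stateful processor, and `PerThreadSketch` is an illustrative name.

```java
import java.util.Locale;

// Hypothetical helper, not part of com.yahoo.language.
class PerThreadSketch {

    // Stand-in for a thread-unsafe processor: it reuses a mutable buffer,
    // so two threads calling process() on the same instance could interfere.
    static class UnsafeProcessor {
        private final StringBuilder buffer = new StringBuilder();

        String process(String input) {
            buffer.setLength(0);
            buffer.append(input.toLowerCase(Locale.ROOT));
            return buffer.toString();
        }
    }

    // Each thread lazily gets its own processor instance.
    private static final ThreadLocal<UnsafeProcessor> PROCESSOR =
            ThreadLocal.withInitial(UnsafeProcessor::new);

    static UnsafeProcessor current() {
        return PROCESSOR.get();
    }

    static String process(String input) {
        return current().process(input);
    }

    // Returns true if a second thread observes a different instance than the caller.
    static boolean distinctAcrossThreads() {
        UnsafeProcessor mine = current();
        UnsafeProcessor[] other = new UnsafeProcessor[1];
        Thread worker = new Thread(() -> other[0] = current());
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return other[0] != null && other[0] != mine;
    }

    public static void main(String[] args) {
        System.out.println(process("Hello"));
        System.out.println(distinctAcrossThreads());
    }
}
```

Obtaining a fresh processor from the factory for each unit of work is another option; which is preferable depends on how expensive the processor is to create.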
Method Detail
-
getStemmer
public Stemmer getStemmer()
Description copied from interface: Linguistics
Returns a thread-unsafe stemmer or lemmatizer. This is used at query time to stem search terms sent to indexes that contain text tokenized with stemming turned on.
- Specified by: getStemmer in interface Linguistics
-
getTokenizer
public Tokenizer getTokenizer()
Description copied from interface: Linguistics
Returns a thread-unsafe tokenizer. This is used at indexing time to produce an optionally stemmed and transformed (accent-normalized) stream of indexable tokens.
- Specified by: getTokenizer in interface Linguistics
-
getNormalizer
public Normalizer getNormalizer()
Description copied from interface: Linguistics
Returns a thread-unsafe normalizer. This is used at query time to CJK-normalize query text.
- Specified by: getNormalizer in interface Linguistics
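To illustrate what CJK normalization of query text can look like, here is a standalone sketch that applies Unicode NFKC normalization with the JDK's `java.text.Normalizer`. That NFKC is the relevant form is an assumption of this sketch, and `CjkNormalizeSketch` is a hypothetical class that is not part of com.yahoo.language.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of com.yahoo.language.
class CjkNormalizeSketch {

    // NFKC folds compatibility variants, such as full-width Latin letters
    // and half-width katakana, into their canonical counterparts.
    static String normalize(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Full-width letters A, B, C (U+FF21 to U+FF23) written with escapes
        // so the source file stays ASCII-only.
        System.out.println(normalize("\uFF21\uFF22\uFF23"));
        // Half-width katakana KA (U+FF76).
        System.out.println(normalize("\uFF76"));
    }
}
```

Normalizing query text this way lets a query typed with full-width characters match text that was stored in the ordinary form.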
-
getTransformer
public Transformer getTransformer()
Description copied from interface: Linguistics
Returns a thread-unsafe transformer. This is used at query time to accent-normalize search terms sent to indexes that contain text tokenized with accent normalization turned on.
- Specified by: getTransformer in interface Linguistics
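As an illustration of accent normalization, the standalone sketch below strips combining marks after canonical decomposition, using only the JDK. It is one common technique and not necessarily what the returned Transformer does internally; `AccentDropSketch` and `dropAccents` are hypothetical names.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of com.yahoo.language.
class AccentDropSketch {

    // Decomposes characters into base letter plus combining marks (NFD),
    // then removes the combining marks (Unicode category M).
    static String dropAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the final letter (U+00E9).
        String indexed = dropAccents("caf\u00e9");
        String query = dropAccents("cafe");
        // Both sides reduce to the same form, so the query matches.
        System.out.println(indexed.equals(query));
    }
}
```

The same transformation has to be applied on both the indexing side and the query side, otherwise an unaccented query term would not match an accented indexed token, or the reverse.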
-
getSegmenter
public Segmenter getSegmenter()
Description copied from interface: Linguistics
Returns a thread-unsafe segmenter. This is used at query time to find the individual semantic components of search terms sent to indexes tokenized with segmentation.
- Specified by: getSegmenter in interface Linguistics
-
getDetector
public Detector getDetector()
Description copied from interface: Linguistics
Returns a thread-unsafe detector. The language of the text is a parameter to other linguistic operations. This is used to determine the language of a query or document field when not specified explicitly.
- Specified by: getDetector in interface Linguistics
-
getGramSplitter
public GramSplitter getGramSplitter()
Description copied from interface: Linguistics
Returns a thread-unsafe gram splitter. This is used to split query or document text into fixed-length grams, which allows matching without segmented tokens.
- Specified by: getGramSplitter in interface Linguistics
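To make the idea of fixed-length grams concrete, here is a standalone sketch that splits one word into overlapping grams of `n` characters. It is a simplified illustration and not the GramSplitter implementation: `GramSketch` and `grams` are hypothetical names, and returning short words whole is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of com.yahoo.language.
class GramSketch {

    // Splits a single word into overlapping grams of n characters.
    // Words no longer than n are returned whole (an assumption of this sketch).
    // Operates on UTF-16 code units, so it does not handle surrogate pairs.
    static List<String> grams(String word, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> result = new ArrayList<>();
        if (word.isEmpty()) {
            return result;
        }
        if (word.length() <= n) {
            result.add(word);
            return result;
        }
        for (int i = 0; i + n <= word.length(); i++) {
            result.add(word.substring(i, i + n));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(grams("vespa", 3));
    }
}
```

Because query text and document text are split the same way, a query matches wherever its grams occur in the document, with no need to know where word boundaries fall. This is why gram matching is useful for text that is hard to segment.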
-
getCharacterClasses
public CharacterClasses getCharacterClasses()
Description copied from interface: Linguistics
Returns a thread-unsafe character classes instance.
- Specified by: getCharacterClasses in interface Linguistics
-
equals
public boolean equals(Linguistics other)
Description copied from interface: Linguistics
Check if another instance is equivalent to this one.
- Specified by: equals in interface Linguistics