Package com.yahoo.language.process
Interface Segmenter
- All Known Implementing Classes:
SegmenterImpl
public interface Segmenter
Interface providing segmentation, i.e. splitting of CJK character blocks into separate tokens. This is primarily a
convenience feature for users who don't need full tokenization (or who use a separate tokenizer and only need CJK
processing).
- Author:
- Mathias Mølster Lidal
Method Details
segment
Splits the input string into tokens and returns the list of tokens in unprocessed form (i.e. lowercased, normalized and stemmed if applicable; see StemMode for the list of stemming options). The input is assumed to contain only word characters; any punctuation and spacing tokens will be removed.
- Parameters:
input
- the text to segment.
language
- language of input text.
- Returns:
- the list of segments.
- Throws:
ProcessingException
- if an exception is encountered during processing
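The contract described above (CJK blocks split into separate tokens, punctuation and spacing dropped) can be illustrated with a small self-contained sketch. It does not use the com.yahoo.language.process types: ToySegmenter and UnigramSegmenter are hypothetical names, the language is passed as a plain String, and the one-segment-per-character rule is a naive stand-in chosen for illustration, not the behavior of SegmenterImpl.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

public class SegmenterSketch {

    /** Hypothetical stand-in that mirrors the documented method shape; not the library interface. */
    interface ToySegmenter {
        List<String> segment(String input, String language);
    }

    /** Naive rule (an assumption): each CJK code point is its own segment, other word characters are grouped. */
    static class UnigramSegmenter implements ToySegmenter {
        @Override
        public List<String> segment(String input, String language) {
            // The language argument is ignored in this toy version.
            List<String> segments = new ArrayList<>();
            StringBuilder word = new StringBuilder();
            for (int i = 0; i < input.length(); ) {
                int cp = input.codePointAt(i);
                i += Character.charCount(cp);
                if (isCjk(cp)) {
                    flush(word, segments);                       // close any pending non-CJK word
                    segments.add(new String(Character.toChars(cp)));
                } else if (Character.isLetterOrDigit(cp)) {
                    word.appendCodePoint(cp);                    // keep building the current word
                } else {
                    flush(word, segments);                       // punctuation and spacing are dropped
                }
            }
            flush(word, segments);
            return segments;
        }

        private static boolean isCjk(int cp) {
            UnicodeScript script = UnicodeScript.of(cp);
            return script == UnicodeScript.HAN
                || script == UnicodeScript.HIRAGANA
                || script == UnicodeScript.KATAKANA;
        }

        private static void flush(StringBuilder word, List<String> out) {
            if (word.length() > 0) {
                out.add(word.toString());
                word.setLength(0);
            }
        }
    }

    public static void main(String[] args) {
        ToySegmenter segmenter = new UnigramSegmenter();
        // \u691c\u7d22 is a two-character Japanese word meaning "search".
        System.out.println(segmenter.segment("search \u691c\u7d22, engine!", "ja"));
    }
}
```

In this sketch the Latin words stay whole, the two Han characters become one segment each, and the comma, spaces and exclamation mark disappear from the result. A real implementation would typically use dictionary or statistical word boundaries for CJK text rather than single characters.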