public class SimplePatternTokenizerFactory extends TokenizerFactory
Factory for SimplePatternTokenizer, for matching tokens based on the provided regexp.
This tokenizer uses Lucene RegExp pattern matching to construct distinct tokens
for the input stream. The syntax is more limited than PatternTokenizer, but the
tokenization is quite a bit faster. It takes two arguments, including "pattern",
the regular expression, written in the syntax described at RegExp.

The pattern matches the characters to include in a token (not the split characters), and the matching is greedy: the longest token that matches at a given point is created. Empty tokens are never created.
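The token-matching behavior described above (the pattern names the characters that belong to a token, the longest run is taken, and empty matches are dropped) can be illustrated with a small stand-alone sketch. It is not the Lucene implementation: it uses `java.util.regex` as a stand-in for Lucene's RegExp and automaton matching, and the class name `GreedyTokenSketch` and method `tokenize` are hypothetical names chosen for illustration. The two regex dialects differ in syntax and in how they resolve alternations, so this only approximates the behavior for simple character-class patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: java.util.regex stands in for Lucene's RegExp here.
// GreedyTokenSketch and tokenize are hypothetical names, not Lucene API.
class GreedyTokenSketch {

    // Collects each non-empty run of characters matched by tokenPattern.
    // The pattern describes token characters, not the separators.
    static List<String> tokenize(String tokenPattern, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = Pattern.compile(tokenPattern).matcher(input);
        while (matcher.find()) {
            String token = matcher.group();
            if (!token.isEmpty()) { // empty tokens are never emitted
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same character class as the whitespace-delimited configuration example.
        System.out.println(tokenize("[^ \t\r\n]+", "quick  brown\tfox\n"));
    }
}
```

Because the greedy quantifier consumes the whole run of non-whitespace characters, consecutive separators never produce empty tokens in this sketch.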
For example, to match tokens delimited by simple whitespace characters:
<fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.SimplePatternTokenizerFactory" pattern="[^ \t\r\n]+"/>
  </analyzer>
</fieldType>

See also: SimplePatternTokenizer

| Modifier and Type | Field and Description |
|---|---|
| static String | NAME: SPI name |
| static String | PATTERN |
Fields inherited from class AbstractAnalysisFactory: LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion

| Constructor and Description |
|---|
| SimplePatternTokenizerFactory(Map<String,String> args): Creates a new SimplePatternTokenizerFactory |
| Modifier and Type | Method and Description |
|---|---|
| SimplePatternTokenizer | create(AttributeFactory factory): Creates a TokenStream of the specified input using the given AttributeFactory |

Methods inherited from class TokenizerFactory: availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers

Methods inherited from class AbstractAnalysisFactory: get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames

public static final String NAME

public static final String PATTERN

public SimplePatternTokenizer create(AttributeFactory factory)

Description copied from class: TokenizerFactory
Creates a TokenStream of the specified input using the given AttributeFactory
Specified by: create in class TokenizerFactory

Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.