Class PatternTokenizerFactory
java.lang.Object
org.apache.lucene.analysis.util.AbstractAnalysisFactory
org.apache.lucene.analysis.util.TokenizerFactory
org.apache.lucene.analysis.pattern.PatternTokenizerFactory
Factory for
PatternTokenizer
.
This tokenizer uses regex pattern matching to construct distinct tokens
for the input stream. It takes two arguments: "pattern" and "group".
- "pattern" is the regular expression.
- "group" says which group to extract into tokens.
group=-1 (the default) is equivalent to "split". In this case, the tokens will
be equivalent to the output from (without empty tokens):
String.split(java.lang.String)
Using group >= 0 selects the matching group as the token. For example, if you have:
pattern = \'([^\']+)\' group = 0 input = aaa 'bbb' 'ccc'the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be: bbb and ccc (no ' marks)
NOTE: This Tokenizer does not output tokens that are of zero length.
<fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/> </analyzer> </fieldType>
- Since:
- solr1.2
- See Also:
-
Field Summary
FieldsFields inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM
-
Constructor Summary
ConstructorsConstructorDescriptionPatternTokenizerFactory
(Map<String, String> args) Creates a new PatternTokenizerFactory -
Method Summary
Modifier and TypeMethodDescriptioncreate
(AttributeSource.AttributeFactory factory, Reader in) Split the input using configured patternMethods inherited from class org.apache.lucene.analysis.util.TokenizerFactory
availableTokenizers, create, forName, lookupClass, reloadTokenizers
Methods inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
get, get, get, get, get, getChar, getClassArg, getLuceneMatchVersion, getOriginalArgs, getSet, isExplicitLuceneMatchVersion, require, require, require, requireChar, setExplicitLuceneMatchVersion
-
Field Details
-
PATTERN
- See Also:
-
GROUP
- See Also:
-
-
Constructor Details
-
PatternTokenizerFactory
Creates a new PatternTokenizerFactory
-
-
Method Details
-
create
Split the input using configured pattern- Specified by:
create
in classTokenizerFactory
-