Class PatternTokenizerFactory
- java.lang.Object
  - org.apache.lucene.analysis.util.AbstractAnalysisFactory
    - org.apache.lucene.analysis.util.TokenizerFactory
      - org.apache.lucene.analysis.pattern.PatternTokenizerFactory
public class PatternTokenizerFactory extends TokenizerFactory
Factory for PatternTokenizer. This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: "pattern" and "group".
- "pattern" is the regular expression.
- "group" says which capture group of each match to extract into tokens.
group=-1 (the default) is equivalent to "split". In this case, the tokens will be equivalent to the output from String.split(java.lang.String), without empty tokens.
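The split behaviour described above can be approximated with java.util.regex alone. The following is a minimal sketch, not the tokenizer's actual implementation: the class name SplitModeSketch, the method splitTokens, and the sample pattern and input are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper that emulates group=-1 using only the JDK.
class SplitModeSketch {

    // Splits the input on the pattern and drops zero-length pieces,
    // mirroring "String.split output without empty tokens".
    static List<String> splitTokens(String regex, String input) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(regex).split(input)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The leading comma and the doubled comma would each yield an
        // empty piece; both are filtered out.
        System.out.println(splitTokens(",\\s*", ",aaa, bbb,,ccc"));
        // prints [aaa, bbb, ccc]
    }
}
```

Note that String.split on its own only removes trailing empty strings, which is why the sketch filters every empty piece explicitly.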
Using group >= 0 selects the matching group as the token. For example, if you have:
pattern = \'([^\']+)\'
group = 0
input = aaa 'bbb' 'ccc'
the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be: bbb and ccc (no ' marks).
NOTE: This Tokenizer does not output tokens that are of zero length.
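The effect of the "group" argument on this example can be reproduced with java.util.regex. This is a sketch under stated assumptions rather than the tokenizer's own code: the class GroupModeSketch and the method groupTokens are hypothetical names, and skipping a group that did not participate in a match is a choice made by the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that emulates group >= 0 using only the JDK.
class GroupModeSketch {

    // Emits the selected capture group of every match as a token.
    static List<String> groupTokens(String regex, int group, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            String token = matcher.group(group);
            // Assumption of this sketch: skip non-participating groups
            // (null) as well as zero-length tokens.
            if (token != null && !token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String regex = "\\'([^\\']+)\\'"; // the pattern \'([^\']+)\' as a Java string
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(groupTokens(regex, 0, input)); // prints ['bbb', 'ccc']
        System.out.println(groupTokens(regex, 1, input)); // prints [bbb, ccc]
    }
}
```

Group 0 is the whole match, so the quote marks are kept; group 1 is the parenthesised part, so they are dropped.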
<fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
  </analyzer>
</fieldType>
- Since: solr1.2
- See Also: PatternTokenizer
Field Summary
Fields
Modifier and Type    Field      Description
static String        GROUP
static String        PATTERN
-
Fields inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM
Constructor Summary
Constructors
Constructor                                         Description
PatternTokenizerFactory(Map<String,String> args)    Creates a new PatternTokenizerFactory
-
Method Summary
Methods
Modifier and Type    Method                                                         Description
PatternTokenizer     create(AttributeSource.AttributeFactory factory, Reader in)    Split the input using the configured pattern
Methods inherited from class org.apache.lucene.analysis.util.TokenizerFactory
availableTokenizers, create, forName, lookupClass, reloadTokenizers
-
Methods inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
get, get, get, get, get, getChar, getClassArg, getLuceneMatchVersion, getOriginalArgs, getSet, isExplicitLuceneMatchVersion, require, require, require, requireChar, setExplicitLuceneMatchVersion
Field Detail
-
PATTERN
public static final String PATTERN
- See Also: Constant Field Values
-
GROUP
public static final String GROUP
- See Also: Constant Field Values
Method Detail
-
create
public PatternTokenizer create(AttributeSource.AttributeFactory factory, Reader in)
Split the input using the configured pattern.
- Specified by: create in class TokenizerFactory