SimplePatternTokenizerFactory (Lucene 8.8.2 API)


java.lang.Object
- org.apache.lucene.analysis.util.AbstractAnalysisFactory
  - org.apache.lucene.analysis.util.TokenizerFactory
    - org.apache.lucene.analysis.pattern.SimplePatternTokenizerFactory

```
public class SimplePatternTokenizerFactory
extends TokenizerFactory
```
Factory for SimplePatternTokenizer, for matching tokens based on the provided regexp.
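This tokenizer uses Lucene RegExp pattern matching to construct distinct tokens for the input stream. The syntax is more limited than PatternTokenizer's, but tokenization is considerably faster because the pattern is compiled into a deterministic automaton instead of being evaluated with java.util.regex. It takes two arguments: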
- "pattern" (required) is the regular expression, according to the syntax described at RegExp
- "maxDeterminizedStates" (optional, default 10000) the limit on the total state count of the determinized automaton computed from the regexp
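A minimal sketch of passing both arguments when the factory is constructed directly from Java, assuming the Lucene 8.8.2 analyzers-common jar is on the classpath (the class `FactoryArgsDemo` and method `tryCreate` are hypothetical names for illustration). The factory removes each argument it consumes from the map and rejects anything left over, so the map must be mutable; a missing `pattern` or an unrecognized key surfaces as an `IllegalArgumentException`.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.pattern.SimplePatternTokenizerFactory;

public class FactoryArgsDemo {

  /** Hypothetical helper: builds a factory, or returns null if the arguments are rejected. */
  static SimplePatternTokenizerFactory tryCreate(Map<String, String> args) {
    try {
      // Pass a mutable copy: the factory removes the arguments it consumes.
      return new SimplePatternTokenizerFactory(new HashMap<>(args));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
      return null;
    }
  }

  public static void main(String[] unused) {
    Map<String, String> args = new HashMap<>();
    args.put(SimplePatternTokenizerFactory.PATTERN, "[a-z]+"); // required "pattern"
    args.put("maxDeterminizedStates", "20000");                 // optional, default 10000
    System.out.println(tryCreate(args) != null);                // true

    Map<String, String> missing = new HashMap<>();              // no "pattern" at all
    System.out.println(tryCreate(missing) != null);             // false

    Map<String, String> unknown = new HashMap<>(args);
    unknown.put("patern", "[a-z]+");                            // misspelled key is not consumed
    System.out.println(tryCreate(unknown) != null);             // false
  }
}
```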
The pattern matches the characters to include in a token (not the split characters), and the matching is greedy such that the longest token matching at a given point is created. Empty tokens are never created.
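To illustrate the greedy, longest-match behaviour and the rule that empty tokens are never emitted, here is a small sketch that runs the tokenizer over two inputs. It assumes Lucene 8.8.2 on the classpath; `GreedyMatchDemo` and `tokens` are hypothetical names. Characters that cannot start a match (the space and the `x` below) are simply skipped.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.pattern.SimplePatternTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class GreedyMatchDemo {

  /** Runs a tokenizer built from {@code pattern} over {@code text} and collects the terms. */
  static List<String> tokens(String pattern, String text) throws IOException {
    Map<String, String> args = new HashMap<>();
    args.put("pattern", pattern);
    List<String> out = new ArrayList<>();
    try (Tokenizer tok = new SimplePatternTokenizerFactory(args).create()) {
      CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
      tok.setReader(new StringReader(text));
      tok.reset(); // required before the first incrementToken()
      while (tok.incrementToken()) {
        out.add(term.toString());
      }
      tok.end();
    }
    return out;
  }

  public static void main(String[] unused) throws IOException {
    // Greedy: "foofoo" is emitted as one token rather than two "foo" tokens.
    System.out.println(tokens("(foo)+", "foofoo xfoo")); // [foofoo, foo]
    // "a*" also matches the empty string, but empty tokens are never emitted.
    System.out.println(tokens("a*", "bab"));             // [a]
  }
}
```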
For example, to match tokens delimited by simple whitespace characters:
```
 <fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.SimplePatternTokenizerFactory" pattern="[^ \t\r\n]+"/>
   </analyzer>
 </fieldType>
```
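Outside Solr, a roughly equivalent analysis chain can be assembled with Lucene's `CustomAnalyzer` builder, which resolves the tokenizer by its SPI name. This is a sketch assuming the analyzers-common module of Lucene 8.8.2; the class name `WhitespacePatternAnalyzerDemo` and the field name `text` are hypothetical. In Java source, `\t`, `\r` and `\n` are turned into real tab, carriage-return and line-feed characters by the compiler before the regexp is parsed.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class WhitespacePatternAnalyzerDemo {

  static Analyzer buildAnalyzer() throws IOException {
    return CustomAnalyzer.builder()
        // "simplePattern" is the SPI name; the character class excludes
        // space, tab, CR and LF, so runs of other characters become tokens.
        .withTokenizer("simplePattern", "pattern", "[^ \t\r\n]+")
        .withPositionIncrementGap(100) // mirrors positionIncrementGap="100"
        .build();
  }

  static List<String> analyze(Analyzer analyzer, String text) throws IOException {
    List<String> out = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("text", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        out.add(term.toString());
      }
      ts.end();
    }
    return out;
  }

  public static void main(String[] unused) throws IOException {
    try (Analyzer analyzer = buildAnalyzer()) {
      System.out.println(analyze(analyzer, "quick\tbrown  fox\r\njumps")); // [quick, brown, fox, jumps]
    }
  }
}
```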
Since:

6.5.0

See Also:

SimplePatternTokenizer

WARNING: This API is experimental and might change in incompatible ways in the next release.

SPI Name (note: lookup is case-insensitive; e.g., if the name is 'htmlStrip', 'htmlstrip' can be used when looking up the service):

"simplePattern"

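A short sketch of looking the factory up through the SPI name, assuming the analyzers-common jar (which registers this factory as a service) is on the classpath; `SpiLookupDemo` is a hypothetical class name. `TokenizerFactory.lookupClass` and `TokenizerFactory.forName` are the inherited static methods listed in the method summary.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.util.TokenizerFactory;

public class SpiLookupDemo {
  public static void main(String[] unused) {
    // Lookup is case-insensitive, so the all-lowercase form resolves too.
    Class<? extends TokenizerFactory> cls = TokenizerFactory.lookupClass("simplepattern");
    System.out.println(cls.getSimpleName()); // SimplePatternTokenizerFactory

    // forName instantiates the factory with the given (mutable) argument map.
    Map<String, String> args = new HashMap<>();
    args.put("pattern", "[0-9]+");
    TokenizerFactory factory = TokenizerFactory.forName("simplePattern", args);
    System.out.println(factory.getClass().getSimpleName()); // SimplePatternTokenizerFactory
  }
}
```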
Field Summary

Fields
Modifier and Type	Field and Description
`static String`	`NAME` SPI name
`static String`	`PATTERN`

Fields inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion

Constructor Summary

Constructors
Constructor and Description

SimplePatternTokenizerFactory(Map<String,String> args)
Creates a new SimplePatternTokenizerFactory

Method Summary

Modifier and Type	Method and Description
`SimplePatternTokenizer`	`create(AttributeFactory factory)` Creates a TokenStream of the specified input using the given AttributeFactory

Methods inherited from class org.apache.lucene.analysis.util.TokenizerFactory
availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers

Methods inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - NAME
```
public static final String NAME
```
    SPI name
    
    See Also:
    
    Constant Field Values
  - PATTERN
```
public static final String PATTERN
```
    See Also:
    
    Constant Field Values
- Constructor Detail
  - SimplePatternTokenizerFactory
```
public SimplePatternTokenizerFactory(Map<String,String> args)
```
    Creates a new SimplePatternTokenizerFactory
- Method Detail
  - create
```
public SimplePatternTokenizer create(AttributeFactory factory)
```
    Description copied from class: TokenizerFactory
    
    Creates a TokenStream of the specified input using the given AttributeFactory
    
    Specified by:
    
    create in class TokenizerFactory
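The inherited no-argument `create()` uses the default attribute factory, while this override lets the caller supply one. The returned tokenizer has no input yet, so text is attached afterwards with `setReader`. A sketch assuming Lucene 8.8.2, passing `TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY` explicitly and reading offsets alongside terms (`CreateWithFactoryDemo` and `describe` are hypothetical names):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.pattern.SimplePatternTokenizer;
import org.apache.lucene.analysis.pattern.SimplePatternTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.util.AttributeFactory;

public class CreateWithFactoryDemo {

  /** Returns each digit run with its [start,end) character offsets. */
  static String describe(String text) throws IOException {
    Map<String, String> args = new HashMap<>();
    args.put("pattern", "[0-9]+");
    SimplePatternTokenizerFactory factory = new SimplePatternTokenizerFactory(args);

    AttributeFactory attrs = TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY;
    StringBuilder sb = new StringBuilder();
    try (SimplePatternTokenizer tok = factory.create(attrs)) {
      CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
      OffsetAttribute offset = tok.addAttribute(OffsetAttribute.class);
      tok.setReader(new StringReader(text)); // input is supplied after creation
      tok.reset();
      while (tok.incrementToken()) {
        sb.append(term.toString()).append('[').append(offset.startOffset())
          .append(',').append(offset.endOffset()).append("] ");
      }
      tok.end();
    }
    return sb.toString().trim();
  }

  public static void main(String[] unused) throws IOException {
    System.out.println(describe("order 66, room 101")); // 66[6,8] 101[15,18]
  }
}
```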


Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.