This class allows simple access to custom Lucene text processing pipelines, a.k.a. text analyzers,
which are specified via a JSON schema that hosts named analyzer specifications and mappings from
field name(s) to analyzer(s).
Here's an example schema with descriptions inline as comments:
{
  "defaultLuceneMatchVersion": "7.0.0",   // Optional. Supplied to analysis components
                                          //   that don't explicitly specify "luceneMatchVersion".
  "analyzers": [                // Optional. If not included, all field mappings must be
    {                           //   to fully qualified class names of Lucene Analyzer subclasses.
      "name": "html",           // Required. Mappings in the "fields" array below refer to this name.
      "charFilters": [{         // Optional.
        "type": "htmlstrip"     // Required. "htmlstrip" is the SPI name for HTMLStripCharFilter
      }],
      "tokenizer": {            // Required. Only one allowed.
        "type": "standard"      // Required. "standard" is the SPI name for StandardTokenizer
      },
      "filters": [{             // Optional.
        "type": "stop",         // Required. "stop" is the SPI name for StopFilter
        "ignoreCase": "true",   // Component-specific params
        "format": "snowball",
        "words": "org/apache/lucene/analysis/snowball/english_stop.txt"
      }, {
        "type": "lowercase"     // Required. "lowercase" is the SPI name for LowerCaseFilter
      }]
    },
    { "name": "stdtok", "tokenizer": { "type": "standard" } }
  ],
  "fields": [{                  // Required. To look up an analyzer for a field, first the "name"
                                //   mappings are consulted, and then the "regex" mappings are
                                //   tested, in the order specified.
    "name": "keywords",         // Either "name" or "regex" is required. "name" matches the field name exactly.
    "analyzer": "org.apache.lucene.analysis.core.KeywordAnalyzer"   // FQCN of an Analyzer subclass
  }, {
    "regex": ".*html.*",        // Either "name" or "regex" is required. "regex" must match the whole field name.
    "analyzer": "html"          // Reference to the named analyzer specified in the "analyzers" section.
  }, {
    "regex": ".+",              // Either "name" or "regex" is required. "regex" must match the whole field name.
    "analyzer": "stdtok"        // Reference to the named analyzer specified in the "analyzers" section.
  }]
}
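The lookup order described in the "fields" comments can be illustrated with a small sketch. It is not this class's implementation. It is a minimal Python model that assumes only what the comments state: "name" mappings are checked first with an exact match, then "regex" mappings are tried in the order specified, and a regex must match the whole field name. The function name resolve_analyzer and the FIELDS constant are hypothetical, introduced only for this illustration.

```python
import re
from typing import Optional

# Hypothetical in-memory copy of the "fields" array from the example schema.
FIELDS = [
    {"name": "keywords",
     "analyzer": "org.apache.lucene.analysis.core.KeywordAnalyzer"},
    {"regex": ".*html.*", "analyzer": "html"},
    {"regex": ".+", "analyzer": "stdtok"},
]


def resolve_analyzer(field_name: str, fields: list) -> Optional[str]:
    """Return the analyzer reference mapped to field_name, or None.

    Illustrative assumption: all exact "name" mappings are consulted
    before any "regex" mapping, regardless of their position in the list.
    """
    # Pass 1: exact field-name matches.
    for mapping in fields:
        if mapping.get("name") == field_name:
            return mapping["analyzer"]
    # Pass 2: regex mappings in the order specified; fullmatch requires
    # the pattern to cover the whole field name.
    for mapping in fields:
        pattern = mapping.get("regex")
        if pattern is not None and re.fullmatch(pattern, field_name):
            return mapping["analyzer"]
    return None


if __name__ == "__main__":
    for name in ("keywords", "body_html", "title"):
        print(name, "->", resolve_analyzer(name, FIELDS))
```

Under these assumptions, "keywords" resolves to the KeywordAnalyzer class name, a field such as "body_html" resolves to the named "html" analyzer, and any other non-empty field name falls through to "stdtok" via the catch-all ".+" mapping.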