Package com.yahoo.schema.processing
Classes in this package (processors) implement some search
definition features by reducing them to simpler features.
The processors are run after parsing of the search definition,
before creating the derived model.
For simplicity, features should always be implemented here
rather than in the derived model if possible.
New processors must be added to the list in Processing.
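
The flow described above (parse, run every processor from one ordered list, then derive) can be sketched roughly as follows. This is a minimal illustration, not the actual classes in this package: SketchField, SketchSchema, SketchProcessor, SketchProcessing, and the field-name pattern are all hypothetical names and assumptions introduced here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified field: a name, a type, and a mutable indexing form. */
final class SketchField {
    final String name;
    final String type;
    String indexing;

    SketchField(String name, String type, String indexing) {
        this.name = name;
        this.type = type;
        this.indexing = indexing;
    }
}

/** Hypothetical stand-in for a parsed search definition. */
final class SketchSchema {
    final List<SketchField> fields = new ArrayList<>();
}

/** Hypothetical processor contract: validate or rewrite the parsed schema in place. */
interface SketchProcessor {
    void process(SketchSchema schema);
}

/** Validation-style processor. The allowed-name pattern is an assumption for this sketch. */
final class SketchFieldNameValidator implements SketchProcessor {
    @Override
    public void process(SketchSchema schema) {
        for (SketchField field : schema.fields) {
            if (!field.name.matches("[a-zA-Z_][a-zA-Z0-9_]*")) {
                throw new IllegalArgumentException("Illegal field name '" + field.name + "'");
            }
        }
    }
}

/** Reduction-style processor: rewrites 'index' on an int field to the simpler 'attribute'. */
final class SketchNumericIndexToAttribute implements SketchProcessor {
    @Override
    public void process(SketchSchema schema) {
        for (SketchField field : schema.fields) {
            if (field.type.equals("int") && field.indexing.equals("index")) {
                field.indexing = "attribute";
            }
        }
    }
}

/** Runs all processors in list order; a new processor only takes effect once it is listed here. */
final class SketchProcessing {
    private static final List<SketchProcessor> PROCESSORS = List.of(
            new SketchFieldNameValidator(),
            new SketchNumericIndexToAttribute());

    static void run(SketchSchema schema) {
        for (SketchProcessor processor : PROCESSORS) {
            processor.process(schema);
        }
    }
}
```

Calling SketchProcessing.run on a schema holding an int field with indexing form "index" leaves that field with "attribute", while a string field is untouched. Keeping every processor in a single ordered list makes the execution order explicit, which matters when one processor depends on the result of another.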
Class descriptions

Adds the attribute summary transform (SummaryTransform.ATTRIBUTE) to all SummaryField having an imported field as source.
This processor creates a SDDocumentType for each Schema object which holds all the data that search associates with a document described in a search definition file.
Checks that attribute properties only are set for attributes that have data (are created by an indexing statement).
Fields that derive to attribute(s) and no indices should use the WORD indexing form, in a feeble attempt to match most people's expectations as closely as possible.
Checks that bolding or dynamic summary is turned on only for text fields.
Validates attribute fields using bool type, ensuring the collection type is supported.
Adds field sets for 1) fields defined inside document type 2) fields inside search but outside document.
Adds a "fieldName_zcurve" long attribute and "fieldName.distance" and "FieldName.position" summary fields to all position type fields.
Propagates dictionary settings from field level to attribute level.
Non-primitive key types for map and weighted set forbidden (though OK in document model).
This class contains utils used when handling summary fields with dynamic transforms during processing and deriving.
The implementation of exact matching.
Validates the use of the fast-access property.
Computes the right "index commands" for each fieldset in a search definition.
Takes the fields and indexes that are of type rank filter, and stores those names on all rank profiles.
Makes implicitly defined summaries into explicit summaries.
This processor adds all implicit summary fields to all registered document summaries.
Iterates all imported fields from schema parsing and validates and resolves them into concrete fields from referenced document types.
Because of the way the parser works (allowing any token as identifier), it is not practical to limit the syntax of field names there, so it is done here.
This processor modifies all indexing scripts so that they input the value of the owning field by default.
This processor modifies all indexing scripts so that they output to the owning field by default.
Replaces the 'index' statement of all numerical fields with 'attribute' because we no longer support numerical indexes.
Expresses literal boosts in terms of extra indices with rank boost.
Takes the aliases set on field by parser and sets them on correct Index or Attribute.
All summary fields which are not attributes must currently be present in the default summary class, since the default summary class also defines the docsum.dat format.
Warn on inconsistent match settings for any index.
Iterates all summary fields with 'matched-elements-only' and adjusts transform (if all struct-fields are attributes) and validates that the field type is supported.
Validates the match phase settings for all registered rank profiles.
Ensures that there are no conflicting types or field settings in multifield indices, either by changing settings or by splitting conflicting fields in multiple ones with different settings.
The implementation of "gram" matching - splitting the incoming text and the queries into n-grams for matching.
Processes ONNX ranking features of the form: onnx("files/model.onnx", "path/to/output:1") and generates an "onnx-model" configuration as if it was defined in the profile: onnx-model files_model_onnx { file: "files/model.onnx" }. Inputs and outputs are resolved in OnnxModelTypeResolver, which must be processed after this.
Processes every "onnx-model" element in the schema.
Run ExpressionOptimizer on all scripts, to get rid of expressions that have no effect.
Validates the 'paged' attribute setting and throws if specified on unsupported types.
Validates the predicate fields.
Executor of processors.
Abstract superclass of all search definition processors.
Resolves and assigns types to all functions in a ranking expression, and validates the types of all ranking expressions under a schema instance: Some operators constrain the types of inputs, and first- and second-phase expressions must return scalar values.
Class that processes reference fields and removes attribute aspect of such fields from summary.
Issues a warning if some function has a reserved name.
A search must have a document definition of the same name inside of it, otherwise crashes may occur as late as during feeding.
All rank: filter fields should have rank type empty.
Validate conflicting settings for sorting.
Ensure that summary field transforms for fields having the same name are consistent across summary classes.
Emits a warning for summaries which access disk.
Fail if: An SD field explicitly says summary:dynamic, but the field is non-string array, wset, or struct.
Verifies that the source fields actually refer to a valid field.
Verifies that equally named summary fields in different summary classes don't use different fields for source.
Adds the corresponding summary transform for all "documentid" summary fields.
The implementation of the tag datatype.
Class that processes and validates tensor fields.
This Processor makes sure all fields with the same name have the same DataType.
Check that fields with index settings actually create an index or attribute.
The implementation of word matching - with word matching the field is assumed to contain a single "word" - some contiguous sequence of word and number characters - but without changing the data at the indexing side (as with text matching) to enforce this.
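
As an illustration of the "gram" matching entry above, the following is a minimal sketch of splitting a text into overlapping n-grams. It is not the implementation in this package: the class NGramSketch, the method grams, and the choice to return a text shorter than n as a single gram are assumptions made for this example, and the sketch works on Java chars rather than full Unicode code points.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating n-gram splitting; not part of this package. */
final class NGramSketch {

    /**
     * Splits text into overlapping substrings of length n.
     * Assumption for this sketch: a non-empty text shorter than n is returned as one gram.
     */
    static List<String> grams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> result = new ArrayList<>();
        if (text.isEmpty()) {
            return result;
        }
        if (text.length() <= n) {
            result.add(text);
            return result;
        }
        // Slide a window of width n over the text, one char at a time.
        for (int start = 0; start + n <= text.length(); start++) {
            result.add(text.substring(start, start + n));
        }
        return result;
    }
}
```

With this sketch, NGramSketch.grams("hello", 3) yields the grams "hel", "ell", and "llo". Applying the same split to both the indexed text and the query lets a query match when its grams occur in the field.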