public abstract class NCModelFileAdapter extends NCModelAdapter
One of the use cases this adapter supports is the ability to load model configuration from an external JSON/YAML file and then update it in the code. For example, a model can load its configuration from a JSON file and then add intents or synonyms loaded from a database to a certain model element. To support this usage, all getters return internal mutable sets or maps, i.e. you can modify them in your sub-class constructors and those modifications will alter the model's configuration.
Read full documentation in Data Model section and review examples.
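The mutable-getter pattern described above can be illustrated with a minimal, self-contained sketch. The classes below are hypothetical stand-ins and are not the NLPCraft classes; the intent strings are placeholders. The only point shown is that a getter returning the internal set lets a sub-class constructor extend the configuration loaded by the parent.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a file-backed adapter (NOT the NLPCraft class).
class FileBackedModelSketch {
    // Internal mutable state, filled from the "file" at construction time.
    private final Set<String> intents = new HashSet<>();

    FileBackedModelSketch(Set<String> intentsFromFile) {
        intents.addAll(intentsFromFile);
    }

    // Returns the internal mutable set itself, not a defensive copy.
    Set<String> getIntents() {
        return intents;
    }
}

// Hypothetical sub-class that adds entries obtained elsewhere (e.g. a database).
class DbAugmentedModelSketch extends FileBackedModelSketch {
    DbAugmentedModelSketch(Set<String> intentsFromFile, Set<String> intentsFromDb) {
        super(intentsFromFile);
        // Modifying the returned set alters the model's configuration.
        getIntents().addAll(intentsFromDb);
    }

    public static void main(String[] args) {
        Set<String> fromFile = new HashSet<>(Arrays.asList("file-intent"));
        Set<String> fromDb = new HashSet<>(Arrays.asList("db-intent"));
        System.out.println(new DbAugmentedModelSketch(fromFile, fromDb).getIntents().size());
    }
}
```

With the real adapter the same idea would apply to any of the set or map getters listed below, called from the sub-class constructor after the parent constructor has loaded the file.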
NCModelAdapter
DFLT_ENABLED_BUILTIN_TOKENS, DFLT_IS_DUP_SYNONYMS_ALLOWED, DFLT_IS_NO_NOUNS_ALLOWED, DFLT_IS_NO_USER_TOKENS_ALLOWED, DFLT_IS_NON_ENGLISH_ALLOWED, DFLT_IS_NOT_LATIN_CHARSET_ALLOWED, DFLT_IS_PERMUTATE_SYNONYMS, DFLT_IS_SWEAR_WORDS_ALLOWED, DFLT_JIGGLE_FACTOR, DFLT_MAX_FREE_WORDS, DFLT_MAX_SUSPICIOUS_WORDS, DFLT_MAX_TOKENS, DFLT_MAX_TOTAL_SYNONYMS, DFLT_MAX_UNKNOWN_WORDS, DFLT_MAX_WORDS, DFLT_METADATA, DFLT_MIN_NON_STOPWORDS, DFLT_MIN_TOKENS, DFLT_MIN_WORDS
Constructor and Description |
---|
NCModelFileAdapter(String filePath) Creates new model loading its configuration from given file path. |
NCModelFileAdapter(URI uri) Creates new model loading its configuration from given URI. |
Modifier and Type | Method and Description |
---|---|
Set<String> | getAdditionalStopWords() Gets an optional list of stopwords to add to the built-in ones. |
String | getDescription() Gets optional short model description. |
Set<NCElement> | getElements() Gets a set of model elements or named entities. |
Set<String> | getEnabledBuiltInTokens() Gets a set of IDs for built-in named entities (tokens) that should be enabled and detected for this model. |
Set<String> | getExamples() Gets an optional list of example sentences demonstrating what can be asked with this model. |
Set<String> | getExcludedStopWords() Gets an optional list of stopwords to exclude from the built-in list of stopwords. |
String | getId() Gets unique, immutable ID of this model. |
Set<String> | getIntents() Gets list of intents declared in JSON/YML model definition, if any. |
int | getJiggleFactor() Measure of how much sparsity is allowed when user input words are reordered in attempt to match the multi-word synonyms. |
Map<String,String> | getMacros() Gets an optional map of macros to be used in this model. |
int | getMaxFreeWords() Gets maximum number of free words until automatic rejection. |
int | getMaxSuspiciousWords() Gets maximum number of suspicious words until automatic rejection. |
int | getMaxTokens() Gets maximum number of all tokens (system and user defined) above which user input will be automatically rejected as too long. |
int | getMaxTotalSynonyms() Total number of synonyms allowed per model. |
int | getMaxUnknownWords() Gets maximum number of unknown words until automatic rejection. |
int | getMaxWords() Gets maximum word count (including stopwords) above which user input will be automatically rejected as too long. |
Map<String,Object> | getMetadata() Gets metadata. |
int | getMinNonStopwords() Gets minimum word count (excluding stopwords) below which user input will be automatically rejected as ambiguous sentence. |
int | getMinTokens() Gets minimum number of all tokens (system and user defined) below which user input will be automatically rejected as too short. |
int | getMinWords() Gets minimum word count (including stopwords) below which user input will be automatically rejected as too short. |
String | getName() Gets descriptive name of this model. |
String | getOrigin() Gets this file model adapter origin (file path or URI). |
List<NCCustomParser> | getParsers() Gets optional user-defined model element parsers for custom NER implementations. |
Set<String> | getSuspiciousWords() Gets an optional list of suspicious words. |
String | getVersion() Gets the version of this model using semantic versioning. |
boolean | isDupSynonymsAllowed() Whether or not duplicate synonyms are allowed. |
boolean | isNonEnglishAllowed() Whether or not to allow non-English language in user input. |
boolean | isNoNounsAllowed() Whether or not to allow user input without a single noun. |
boolean | isNotLatinCharsetAllowed() Whether or not to allow non-Latin charset in user input. |
boolean | isNoUserTokensAllowed() Whether or not to allow the user input with no user token detected. |
boolean | isPermutateSynonyms() Whether or not to permutate multi-word synonyms. |
boolean | isSwearWordsAllowed() Whether or not to allow known English swear words in user input. |
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
onContext, onError, onMatchedIntent, onParsedVariant, onRejection, onResult
meta, meta, metaOpt, metax
onDiscard, onInit
public NCModelFileAdapter(String filePath)
Creates new model loading its configuration from given file path. Files with .js, .json, .yml and .yaml extensions are supported. File path can be classpath relative or absolute.
filePath - Classpath relative or absolute file path to load model configuration from.
NCException - Thrown in case of any errors loading model configuration.
public NCModelFileAdapter(URI uri)
Creates new model loading its configuration from given URI. Resources with .js, .json, .yml and .yaml extensions are supported.
uri - URI to load model configuration from.
NCException - Thrown in case of any errors loading model configuration.
public String getOrigin()
public Set<String> getIntents()
public String getId()
NCModelView
Note that model IDs are immutable while name and version can be changed freely. Changing model ID is equal to creating a completely new model. Model IDs (unlike name and version) are not exposed to the end user and only serve a technical purpose. ID's max length is 32 characters.
JSON
If using JSON/YAML model presentation this is set by id
property:
{ "id": "my.model.id" }
getId
in interface NCModelView
getId
in class NCModelAdapter
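The 32-character limit on model IDs mentioned above can be checked before deployment with a small helper. This is a minimal sketch: the class and method names are hypothetical, and rejecting null or empty IDs is an assumption that the text above does not state.

```java
// Hypothetical helper, not part of NLPCraft.
class ModelIdCheck {
    // Maximum ID length as stated in the getId() description.
    static final int MAX_ID_LENGTH = 32;

    // Returns true when the ID is non-empty (assumption) and within the length limit.
    static boolean isValidLength(String id) {
        return id != null && !id.isEmpty() && id.length() <= MAX_ID_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println(isValidLength("my.model.id"));
    }
}
```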
public String getName()
NCModelView
JSON
If using JSON/YAML model presentation this is set by name
property:
{ "name": "My Model" }
getName
in interface NCModelView
getName
in class NCModelAdapter
public String getVersion()
NCModelView
JSON
If using JSON/YAML model presentation this is set by version
property:
{ "version": "1.0.0" }
getVersion
in interface NCModelView
getVersion
in class NCModelAdapter
public String getDescription()
NCModelView
JSON
If using JSON/YAML model presentation this is set by description
property:
{ "description": "Model description..." }
public int getMaxUnknownWords()
NCModelView
Default
If not provided by the model the default value NCModelView.DFLT_MAX_UNKNOWN_WORDS
will be used.
JSON
If using JSON/YAML model presentation this is set by maxUnknownWords
property:
{ "maxUnknownWords": 2 }
public int getMaxFreeWords()
NCModelView
Default
If not provided by the model the default value NCModelView.DFLT_MAX_FREE_WORDS
will be used.
JSON
If using JSON/YAML model presentation this is set by maxFreeWords
property:
{ "maxFreeWords": 2 }
public int getMaxSuspiciousWords()
NCModelView
Default
If not provided by the model the default value NCModelView.DFLT_MAX_SUSPICIOUS_WORDS
will be used.
JSON
If using JSON/YAML model presentation this is set by maxSuspiciousWords
property:
{ "maxSuspiciousWords": 2 }
public int getMinWords()
NCModelView
Default
If not provided by the model the default value NCModelView.DFLT_MIN_WORDS
will be used.
JSON
If using JSON/YAML model presentation this is set by minWords
property:
{ "minWords": 2 }
public int getMaxWords()
NCModelView
Default
If not provided by the model the default value NCModelView.DFLT_MAX_WORDS
will be used.
JSON
If using JSON/YAML model presentation this is set by maxWords
property:
{ "maxWords": 50 }
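The minWords and maxWords limits above describe a simple range check on the word count of the user input. The sketch below illustrates that reading under two assumptions: words are split on whitespace (the actual tokenization is not described here), and the limits are inclusive. The class and method names are hypothetical.

```java
// Hypothetical illustration of the min/max word count limits; not NLPCraft code.
class WordCountCheck {
    // Counts whitespace-separated words (assumed tokenization), stopwords included.
    static int countWords(String input) {
        String trimmed = input.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Input is accepted only when minWords <= count <= maxWords.
    static boolean withinLimits(String input, int minWords, int maxWords) {
        int count = countWords(input);
        return count >= minWords && count <= maxWords;
    }

    public static void main(String[] args) {
        System.out.println(withinLimits("what is the weather", 2, 50));
    }
}
```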
public int getMinTokens()
NCModelView
Default
If not provided by the model the default value NCModelView.DFLT_MIN_TOKENS
will be used.
JSON
If using JSON/YAML model presentation this is set by minTokens
property:
{ "minTokens": 1 }
public int getMaxTokens()
NCModelView
Default
If not provided by the model the default value NCModelView.DFLT_MAX_TOKENS
will be used.
JSON
If using JSON/YAML model presentation this is set by maxTokens
property:
{ "maxTokens": 100 }
public int getMinNonStopwords()
NCModelView
Default
If not provided by the model the default value NCModelView.DFLT_MIN_NON_STOPWORDS
will be used.
JSON
If using JSON/YAML model presentation this is set by minNonStopwords
property:
{ "minNonStopwords": 2 }
public boolean isNonEnglishAllowed()
NCModelView
Default
If not provided by the model the default value NCModelView.DFLT_IS_NON_ENGLISH_ALLOWED
will be used.
JSON
If using JSON/YAML model presentation this is set by nonEnglishAllowed
property:
{ "nonEnglishAllowed": false }
public boolean isNotLatinCharsetAllowed()
NCModelView
Whether or not to allow non-Latin charset in user input. If false such user input will be automatically rejected.
Default
If not provided by the model the default value NCModelView.DFLT_IS_NOT_LATIN_CHARSET_ALLOWED
will be used.
JSON
If using JSON/YAML model presentation this is set by nonLatinCharsetAllowed
property:
{ "nonLatinCharsetAllowed": false }
public boolean isSwearWordsAllowed()
NCModelView
Whether or not to allow known English swear words in user input. If false, user input with detected known English swear words will be automatically rejected.
Default
If not provided by the model the default value NCModelView.DFLT_IS_SWEAR_WORDS_ALLOWED
will be used.
JSON
If using JSON/YAML model presentation this is set by swearWordsAllowed
property:
{ "swearWordsAllowed": false }
public boolean isNoNounsAllowed()
NCModelView
Whether or not to allow user input without a single noun. If false such user input will be automatically rejected. Typically for command or query-oriented models this should be set to false as any command or query should have at least one noun subject. However, for conversational models this can be set to true to allow for smalltalk and one-liners.
Default
If not provided by the model the default value NCModelView.DFLT_IS_NO_NOUNS_ALLOWED
will be used.
JSON
If using JSON/YAML model presentation this is set by noNounsAllowed
property:
{ "noNounsAllowed": false }
public boolean isPermutateSynonyms()
NCModelView
Default
If not provided by the model the default value NCModelView.DFLT_IS_PERMUTATE_SYNONYMS
will be used.
JSON
If using JSON/YAML model presentation this is set by permutateSynonyms
property:
{ "permutateSynonyms": true }
public boolean isDupSynonymsAllowed()
NCModelView
Whether or not duplicate synonyms are allowed. If true, the model will pick a random model element when multiple elements are found due to duplicate synonyms. If false, the model will print an error message and will not deploy.
Default
If not provided by the model the default value NCModelView.DFLT_IS_DUP_SYNONYMS_ALLOWED
will be used.
JSON
If using JSON/YAML model presentation this is set by dupSynonymsAllowed
property:
{ "dupSynonymsAllowed": true }
public int getMaxTotalSynonyms()
NCModelView
Default
If not provided by the model the default value NCModelView.DFLT_MAX_TOTAL_SYNONYMS
will be used.
JSON
If using JSON/YAML model presentation this is set by maxTotalSynonyms
property:
{ "maxTotalSynonyms": 10000 }
public boolean isNoUserTokensAllowed()
NCModelView
Whether or not to allow the user input with no user token detected. If false such user input will be automatically rejected. Note that this property only applies to user-defined tokens (i.e. model elements). Even if there are no user-defined tokens, the user input may still contain system tokens like nlpcraft:city or nlpcraft:date. In many cases models should be built to allow user input without user tokens. However, set it to false if presence of at least one user token is mandatory.
Default
If not provided by the model the default value NCModelView.DFLT_IS_NO_USER_TOKENS_ALLOWED
will be used.
JSON
If using JSON/YAML model presentation this is set by noUserTokensAllowed
property:
{ "noUserTokensAllowed": false }
public int getJiggleFactor()
NCModelView
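Measure of how much sparsity is allowed when user input words are reordered in attempt to match the multi-word synonyms. A value of 2 has proved to be a good default in most cases. Note that larger values mean that synonym words can be almost in any random place in the user input which makes synonym matching practically meaningless. Maximum value is 4.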
Default
If not provided by the model the default value NCModelView.DFLT_JIGGLE_FACTOR
will be used.
JSON
If using JSON/YAML model presentation this is set by jiggleFactor
property:
{ "jiggleFactor": 2 }
public Map<String,Object> getMetadata()
NCMetadata
NCMetadata.meta(String), NCMetadata.metaOpt(String), NCMetadata.meta(String, Object)
public Set<String> getAdditionalStopWords()
NCModelView
A stopword is an individual word (i.e. sequence of characters excluding whitespaces) that contributes no semantic meaning to the sentence. For example, 'the', 'wow', or 'hm' provide no semantic meaning to the sentence and can be safely excluded from semantic analysis.
NLPCraft comes with a carefully selected list of English stopwords which should be sufficient for a majority of use cases. However, you can add additional stopwords to this list. The typical use for user-defined stopwords is jargon parasite words that are specific to the model's domain.
JSON
If using JSON/YAML model presentation this is set by additionalStopwords
property:
{ "additionalStopwords": [ "stopword1", "stopword2" ] }
public Set<String> getExcludedStopWords()
NCModelView
Just like you can add additional stopwords via NCModelView.getAdditionalStopWords(), you can exclude certain words from the list of stopwords. This can be useful in rare cases when a default built-in stopword has a specific meaning in your model. In order to process such words you need to exclude them from the list of stopwords.
JSON
If using JSON/YAML model presentation this is set by excludedStopwords
property:
{ "excludedStopwords": [ "excludedStopword1", "excludedStopword2" ] }
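The relationship between the built-in, additional and excluded stopwords described above can be sketched as plain set arithmetic. This is an illustration only: the class name is hypothetical, the sample words are placeholders, and applying exclusions after additions is an assumption, since the order is not specified here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of combining stopword lists; not NLPCraft code.
class StopwordSets {
    // Effective stopwords = (built-in + additional) - excluded (assumed order).
    static Set<String> effective(Set<String> builtIn, Set<String> additional, Set<String> excluded) {
        Set<String> result = new HashSet<>(builtIn);
        result.addAll(additional);
        result.removeAll(excluded);
        return result;
    }

    public static void main(String[] args) {
        Set<String> builtIn = new HashSet<>(Arrays.asList("the", "a"));
        Set<String> additional = new HashSet<>(Arrays.asList("basically"));
        Set<String> excluded = new HashSet<>(Arrays.asList("a"));
        System.out.println(effective(builtIn, additional, excluded));
    }
}
```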
public Set<String> getExamples()
NCModelView
JSON
If using JSON/YAML model presentation this is set by examples
property:
{ "examples": [ "Example questions one", "Another sample sentence" ] }
public Set<String> getSuspiciousWords()
NCModelView
MAX_SUSPICIOUS_WORDS
property set to zero.
Note that by setting model's metadata MAX_SUSPICIOUS_WORDS
property to non-zero value you can
adjust the sensitivity of suspicious words auto-rejection logic.
JSON
If using JSON/YAML model presentation this is set by suspiciousWords
property:
{ "suspiciousWords": [ "sex", "porn" ] }
public Map<String,String> getMacros()
NCModelView
See NCElement for documentation on macros.
JSON
If using JSON/YAML model presentation this is set by macros
property:
{ "macros": [ { "name": "<OF>", "macro": "{of|for|per}" }, { "name": "<CUR>", "macro": "{current|present|moment|now}" } ] }
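To make the role of the macros map more concrete, the sketch below shows the simplest possible reading: each macro name found in a synonym string is textually replaced by its macro value. This is an assumption for illustration and not the NLPCraft macro processor; it performs a single pass, so macros that reference other macros are not resolved, and the option groups in braces are left unexpanded. The class name and the sample synonym are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of macro substitution; not NLPCraft code.
class MacroSubstitution {
    // Replaces every occurrence of each macro name with its value (single pass).
    static String substitute(String synonym, Map<String, String> macros) {
        String out = synonym;
        for (Map.Entry<String, String> e : macros.entrySet())
            out = out.replace(e.getKey(), e.getValue());
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> macros = new LinkedHashMap<>();
        macros.put("<OF>", "{of|for|per}");
        macros.put("<CUR>", "{current|present|moment|now}");
        System.out.println(substitute("<CUR> price <OF> gold", macros));
    }
}
```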
public Set<NCElement> getElements()
NCModelView
An element is the main building block of the semantic model. Data model element defines a named entity
that will be automatically recognized in the user input. See also NCModelView.getParsers()
method on how
to provide programmatic named entity recognizer (NER) implementations.
Note that unless model elements are loaded dynamically it is highly recommended to declare model
elements in the external JSON/YAML model configuration (under elements
property):
{ "elements": [ { "id": "wt:hist", "synonyms": [ "{<WEATHER>|*} <HISTORY>", "<HISTORY> {<OF>|*} <WEATHER>" ], "description": "Past weather conditions." } ] }
NCModelView.getParsers()
public Set<String> getEnabledBuiltInTokens()
NCModelView
Default
The following built-in tokens are enabled by default implementation of this method:
nlpcraft:date
nlpcraft:continent
nlpcraft:subcontinent
nlpcraft:country
nlpcraft:metro
nlpcraft:region
nlpcraft:city
nlpcraft:num
nlpcraft:coordinate
nlpcraft:relation
nlpcraft:sort
nlpcraft:limit
See NCToken for the list of all supported built-in tokens.
JSON
If using JSON/YAML model presentation this is set by enabledBuiltInTokens
property:
{ "enabledBuiltInTokens": [ "google:person", "google:location", "stanford:money" ] }
public List<NCCustomParser> getParsers()
NCModelView
By default the semantic data model detects its elements by their synonyms, regexp or DSL expressions. However, in some cases these methods are not expressive enough. In such cases, a user-defined parser can be defined for the model that would allow the user to define its own NER logic to detect the model elements in the user input programmatically. Note that there can be only one custom parser per model and it can detect any number of model elements (named entities).
JSON
If using JSON/YAML model presentation this is set by parsers
property which is an array
with every element being a fully qualified class name implementing NCCustomParser
interface:
{ "parsers": [ "my.package.Parser1", "my.package.Parser2" ] }
null
if not used (default).
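Since the parsers property lists fully qualified class names, one plausible way to turn such names into parser instances is standard Java reflection. The sketch below shows that general technique only. The interface and classes are hypothetical stand-ins for NCCustomParser and its implementations, and requiring a no-argument constructor is an assumption; how NLPCraft itself instantiates parsers is not described here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the NCCustomParser interface.
interface CustomParserSketch {
    String name();
}

// Hypothetical parser implementation with a no-argument constructor.
class SampleParserSketch implements CustomParserSketch {
    public String name() {
        return "sample";
    }
}

// Hypothetical loader; not NLPCraft code.
class ParserLoaderSketch {
    // Instantiates each class by name via its no-argument constructor (assumption).
    static List<CustomParserSketch> load(List<String> classNames) {
        List<CustomParserSketch> parsers = new ArrayList<>();
        for (String className : classNames) {
            try {
                Class<? extends CustomParserSketch> cls =
                    Class.forName(className).asSubclass(CustomParserSketch.class);
                parsers.add(cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException | ClassCastException e) {
                // Surface any loading problem as a single unchecked error.
                throw new IllegalStateException("Cannot load parser: " + className, e);
            }
        }
        return parsers;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add(SampleParserSketch.class.getName());
        System.out.println(load(names).get(0).name());
    }
}
```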
Copyright © 2020 Apache Software Foundation