Instantiated model of the EntityRulerApproach. For usage and examples see the documentation of the main class.
Fits an Annotator to match exact strings or regex patterns provided in a file against a Document and assigns them a named entity. The definitions can contain any number of named entities.
There are multiple ways and formats to set the extraction resource. It can be set as either a "JSON", "JSONL" or "CSV" file. A path to the file needs to be provided to setPatternsResource. The file format needs to be set as the "format" field in the option parameter map and, depending on the file type, additional parameters might need to be set.
To enable regex extraction, setEnablePatternRegex(true) needs to be called.
If the file is in JSON format, the rule definitions need to be given in a list with the fields "id", "label" and "patterns":
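As a minimal sketch of that layout, the snippet below builds a JSON list with the three fields and writes it to disk using only the Java standard library. The object name JsonPatternsSketch, the ids, labels and patterns are illustrative assumptions, not values taken from the library; the regex entries would only be treated as patterns once regex extraction is enabled.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

object JsonPatternsSketch {
  // Illustrative rule definitions (sample values only): a JSON list whose
  // entries carry the fields "id", "label" and "patterns".
  // Backslashes stay doubled because they must be escaped inside JSON strings.
  val content: String =
    """[
      |  {
      |    "id": "person-regex",
      |    "label": "PERSON",
      |    "patterns": ["\\w+\\s\\w+", "\\w+-\\w+"]
      |  },
      |  {
      |    "id": "locations-words",
      |    "label": "LOCATION",
      |    "patterns": ["Winterfell"]
      |  }
      |]""".stripMargin

  // Writes the definitions to the given (hypothetical) path and returns it.
  def write(path: String): Path =
    Files.write(Paths.get(path), content.getBytes(StandardCharsets.UTF_8))
}
```

Calling JsonPatternsSketch.write("patterns.json") would produce a file whose path can then be handed to setPatternsResource.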
The same fields also apply to a file in the JSONL format:
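A JSONL file of this kind might be produced as sketched below: one JSON object per line, each with the same three fields. The object name JsonlPatternsSketch and all ids, labels and patterns are illustrative assumptions.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

object JsonlPatternsSketch {
  // One rule definition per line (sample values only); unlike the JSON format,
  // the objects are not wrapped in a list.
  val lines: Seq[String] = Seq(
    """{"id": "names-with-j", "label": "PERSON", "patterns": ["Jon", "John", "John Snow"]}""",
    """{"id": "names-with-s", "label": "PERSON", "patterns": ["Stark", "Snow"]}""",
    """{"id": "names-with-e", "label": "PERSON", "patterns": ["Eddard", "Eddard Stark"]}"""
  )

  val content: String = lines.mkString("\n")

  // Writes the definitions to the given (hypothetical) path and returns it.
  def write(path: String): Path =
    Files.write(Paths.get(path), content.getBytes(StandardCharsets.UTF_8))
}
```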
To use a CSV file, the additional parameter "delimiter" needs to be set in the parameter map. For example, the delimiter can be set with
.setPatternsResource("patterns.csv", ReadAs.TEXT, Map("format"->"csv", "delimiter" -> "\\|"))
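In context, that call could sit in an annotator definition like the sketch below. It assumes Spark NLP is on the classpath; the column names "document", "token" and "entities", the object name EntityRulerCsvSketch and the file name patterns.csv are assumptions chosen for illustration.

```scala
import com.johnsnowlabs.nlp.annotators.er.EntityRulerApproach
import com.johnsnowlabs.nlp.util.io.ReadAs

object EntityRulerCsvSketch {
  // Column names are assumptions; they must match the outputs of the
  // upstream annotators in the surrounding pipeline.
  val entityRuler: EntityRulerApproach = new EntityRulerApproach()
    .setInputCols("document", "token")
    .setOutputCol("entities")
    .setPatternsResource(
      "patterns.csv", // hypothetical path to the CSV rule file
      ReadAs.TEXT,
      // "\\|" is the Scala literal for \| , i.e. a regex-escaped pipe
      Map("format" -> "csv", "delimiter" -> "\\|")
    )
    // Only needed when the file contains regex patterns rather than exact strings
    .setEnablePatternRegex(true)
}
```

The delimiter is escaped because a bare "|" would be read as regex alternation rather than a literal pipe character.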
Example
In this example, each line of the entities file represents an entity and its associated string, delimited by "|".
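A file of that shape could be produced as in the sketch below, which joins each entity label and string with "|" and writes one pair per line. The object name CsvPatternsSketch and the sample labels and strings are illustrative assumptions.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

object CsvPatternsSketch {
  // (entity label, string to match) pairs; sample values only.
  val rows: Seq[(String, String)] = Seq(
    "PERSON" -> "Jon",
    "PERSON" -> "John",
    "PERSON" -> "John Snow",
    "LOCATION" -> "Winterfell"
  )

  // One "label|string" entry per line.
  val content: String =
    rows.map { case (label, text) => s"$label|$text" }.mkString("\n")

  // Writes the entries to the given (hypothetical) path and returns it.
  def write(path: String): Path =
    Files.write(Paths.get(path), content.getBytes(StandardCharsets.UTF_8))
}
```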