A map of tries to be matched for each given category label. The order of the matchers is important: it indicates priority during ties (earlier matchers have higher priority).
Set of single-token entity names that can be spelled using lower case, according to the KB(s)
If true, tokens are matched using lemmas; otherwise, using words.
Author: mihais
Created: 5/11/15
Modified: 9/27/17 - Clean up from RuleNER into LexiconNER
The class is serializable, and this method is used during testing to determine whether a reconstituted object is equal to the original, without interfering with the operation of equals and without getting into hash codes.
The class is serializable, and this method is used during testing to determine whether a reconstituted object is equal to the original, without interfering with the operation of equals and without getting into hash codes. It is not necessary for this operation to be efficient or complete.
The object to compare to
Whether this and other are equal, at least as far as serialization is concerned
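The serialization check described above can be sketched roughly as follows. This is a minimal illustration, not the class's actual code: `ToyLexicon`, `RoundTrip`, and the method name `equalsForSerialization` are hypothetical stand-ins, and the comparison only looks at two fields, in keeping with the note that the check need not be complete.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical toy class standing in for the real NER object.
@SerialVersionUID(1000L)
class ToyLexicon(val labels: Seq[String], val useLemmas: Boolean) extends Serializable {
  // Field-by-field comparison kept separate from equals(), so that
  // equals() and hashCode() retain their default (reference) behavior.
  def equalsForSerialization(other: AnyRef): Boolean = other match {
    case that: ToyLexicon => this.labels == that.labels && this.useLemmas == that.useLemmas
    case _ => false
  }
}

object RoundTrip {
  // Serializes the object to an in-memory buffer and reads it back.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

In a test, one would serialize an instance, read it back with `RoundTrip.roundTrip`, and assert `original.equalsForSerialization(copy)`, while `original == copy` continues to compare references.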
Matches the lexicons against this sentence
The input sentence
An array of BIO notations that store the outcome of the matches
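As an illustration of what such an output array might look like, the sketch below converts matched spans into one label per token. The exact label strings are an assumption here (the common `B-`/`I-` prefixes and `O` for unmatched tokens), and `BioLabels.toBio` is a hypothetical helper, not part of the documented class.

```scala
object BioLabels {
  // Assumed label for tokens that are not part of any match.
  val Outside = "O"

  // Each span is (start offset, length in tokens, category label).
  // Spans are assumed to be non-overlapping and within the sentence.
  def toBio(sentenceLength: Int, spans: Seq[(Int, Int, String)]): Array[String] = {
    val labels = Array.fill(sentenceLength)(Outside)
    for ((start, length, label) <- spans if length > 0) {
      labels(start) = "B-" + label          // first token of the entity
      for (i <- start + 1 until start + length)
        labels(i) = "I-" + label            // remaining tokens of the entity
    }
    labels
  }
}
```

For a four-token sentence with a two-token `LOC` match starting at offset 1, this yields `O, B-LOC, I-LOC, O`.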
Finds the longest match across all matchers.
Finds the longest match across all matchers. This means that the longest match is always chosen, even if it comes from a matcher with lower priority. Only ties are disambiguated, according to the order provided in the constructor.
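The selection rule can be sketched as below. This is a simplified illustration under stated assumptions: `ToyMatcher` is a hypothetical set-based stand-in for the trie-backed matchers, and `findLongestMatch` is an invented name. The point is only the comparison: a strictly longer match replaces the current best, so on equal length the earlier matcher keeps its place.

```scala
// Hypothetical stand-in for a trie-based matcher: a label plus its entries.
final case class ToyMatcher(label: String, entries: Set[Seq[String]]) {
  // Length of the longest entry that matches at `start`, or 0 if none does.
  def findAt(tokens: IndexedSeq[String], start: Int): Int = {
    val lengths = entries.toSeq.collect {
      case entry if tokens.startsWith(entry, start) => entry.length
    }
    if (lengths.isEmpty) 0 else lengths.max
  }
}

object LongestMatch {
  // Returns (matcher index, span length) of the longest match at `start`.
  // Matchers are visited in priority order; the strict comparison means
  // a later matcher wins only when its match is strictly longer.
  def findLongestMatch(
    matchers: Seq[ToyMatcher],
    tokens: IndexedSeq[String],
    start: Int
  ): Option[(Int, Int)] = {
    var best: Option[(Int, Int)] = None
    for ((matcher, index) <- matchers.zipWithIndex) {
      val length = matcher.findAt(tokens, start)
      if (length > 0 && best.forall(_._2 < length))
        best = Some((index, length))
    }
    best
  }
}
```

With a first matcher containing "new york" and a second containing both "new york" and "new york city", the three-token input selects the second matcher (longer match), while the two-token input selects the first (tie broken by order).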
Words known to appear both with and without capitalization, which help determine whether a span of text is contentful
A map of tries to be matched for each given category label. The order of the matchers is important: it indicates priority during ties (earlier matchers have higher priority).
If false, use the words of a sentence; if true, the lemmas
Lexicon-based NER, which efficiently recognizes entities from large dictionaries
Note: This is a cleaned-up version of the old RuleNER. It may have been known simply as LexiconNER at one point, but was renamed to emphasize the fact that each KB is stored in a separate matcher (BooleanHashTrie). Other variations get by with fewer matchers.
Whenever possible, create a SeparatedLexiconNER object using either LexiconNER.apply() or SlowLexiconNERBuilder.build() rather than calling the constructor directly. Use it by calling the find() method on a single sentence.