Matches the lexicons against this sentence
The input sentence
An array of BIO notations that stores the outcome of the matches
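As a rough illustration of what such an array of BIO notations can look like, here is a minimal sketch. The object name `BioSketch`, the toy `lexicon`, the method name `find`, and the longest-match policy are all assumptions made for this example; they are not taken from the actual implementation.

```scala
object BioSketch {
  val Outside = "O"

  // Hypothetical lexicon for illustration: label -> phrases, each phrase a sequence of tokens.
  val lexicon: Map[String, Set[Seq[String]]] = Map(
    "LOC" -> Set(Seq("New", "York"), Seq("Paris")),
    "PER" -> Set(Seq("Ada", "Lovelace"))
  )

  // Returns one BIO label per token, taking the longest phrase that starts at each position.
  def find(tokens: Array[String]): Array[String] = {
    val labels = Array.fill(tokens.length)(Outside)
    var start = 0
    while (start < tokens.length) {
      // Collect (label, length) for every lexicon phrase that begins at `start`.
      val candidates = for {
        (label, phrases) <- lexicon.toSeq
        phrase <- phrases
        if tokens.slice(start, start + phrase.length).toSeq == phrase
      } yield (label, phrase.length)

      if (candidates.isEmpty) start += 1
      else {
        val (label, length) = candidates.maxBy(_._2)
        labels(start) = "B-" + label // first token of the entity
        for (i <- start + 1 until start + length)
          labels(i) = "I-" + label   // remaining tokens of the entity
        start += length
      }
    }
    labels
  }
}
```

Tokens outside any match keep the label `O`, so the result always has exactly one entry per input token.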
The class is serializable, and this method is used during testing to determine whether a reconstituted object is equal to the original without interfering with the operation of equals or getting into hash codes. It is not necessary for this operation to be efficient or complete.
The object to compare to
Whether this and other are equal, at least as far as serialization is concerned
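The kind of test this method supports might be sketched as follows. `ToyNER` and `roundTrip` are hypothetical stand-ins invented for this example, and the fields compared are assumptions; the real class compares whatever state it needs to.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object SerializationSketch {
  // Hypothetical stand-in for a serializable NER; not the real class.
  class ToyNER(val labels: Seq[String], val useLemmas: Boolean) extends Serializable {
    // Compares only the state that serialization must preserve.
    // equals and hashCode are deliberately left untouched.
    def equalsForSerialization(other: AnyRef): Boolean = other match {
      case that: ToyNER => labels == that.labels && useLemmas == that.useLemmas
      case _ => false
    }
  }

  // Serializes a value to memory and reads it back with standard Java serialization.
  def roundTrip[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

A test would then assert `copy.equalsForSerialization(original)` on the result of `roundTrip(original)`, while the default `equals` continues to report the two distinct instances as unequal.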
Words known to appear both with and without capitalized letters, which help determine whether a span of text is contentful
If false, use the words of a sentence; if true, the lemmas
The abstract base class for several concrete child classes used for Named Entity Recognition (NER) based on the contents of lexica, which are lists of words and phrases representing named entities
For all of these classes, NER labels are derived by the LexiconNERBuilders from the file names of the lexica or the records in overrideKBs. This class, via the variables USE_FAST and USE_COMPACT, controls which builder is used.
The collection of child classes is small:
- The SeparatedLexiconNER is closest to the original implementation. It has a BooleanHashTrie for each label and in that trie, Boolean values indicate that the sequence of strings leading there is a named entity. Each trie structure must be searched for potential named entities.
- The CombinedLexiconNER stores an Int in an IntHashTrie instead of a Boolean in a BooleanHashTrie. The Int indicates which of the labels to use for the entity just found. In this way, only one trie (or two if there are different case sensitivity settings) needs to be searched no matter how many labels there are (at least up to Integer.MAX_VALUE of them).
- The CompactLexiconNER uses the same strategy to minimize the number of tries, but also converts the tries into CompactTries, which consist of arrays of integers indicating offsets into other arrays. In this way the time it takes to serialize and deserialize the NER is reduced, and some lookup operations are made more efficient.
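The difference between the separated and combined strategies can be illustrated with a minimal sketch. The generic `HashTrie[V]` below is a hypothetical stand-in for BooleanHashTrie and IntHashTrie, and every name in it (`TrieSketch`, `addEntry`, `findSeparated`, `findCombined`) is an assumption made for this example. The array-based CompactTrie layout is not shown.

```scala
import scala.collection.mutable

object TrieSketch {
  // A node holds a value when the token path leading to it is a complete lexicon entry.
  class Node[V] {
    val children = mutable.HashMap.empty[String, Node[V]]
    var value: Option[V] = None
  }

  class HashTrie[V] {
    private val root = new Node[V]

    def add(phrase: Seq[String], value: V): Unit = {
      val node = phrase.foldLeft(root) { (n, token) =>
        n.children.getOrElseUpdate(token, new Node[V])
      }
      // Assumption of this sketch: the first value stored for a phrase wins.
      if (node.value.isEmpty) node.value = Some(value)
    }

    // Longest entry beginning at `start`, returned as (length, value).
    def longestMatch(tokens: IndexedSeq[String], start: Int): Option[(Int, V)] = {
      var node = root
      var best: Option[(Int, V)] = None
      var i = start
      var matching = true
      while (matching && i < tokens.length) {
        node.children.get(tokens(i)) match {
          case Some(child) =>
            node = child
            i += 1
            node.value.foreach { v => best = Some((i - start, v)) }
          case None => matching = false
        }
      }
      best
    }
  }

  val labels = Vector("PER", "LOC")

  // Separated strategy: one Boolean trie per label, each of which must be searched.
  val separated: Vector[HashTrie[Boolean]] = labels.map(_ => new HashTrie[Boolean])
  // Combined strategy: a single Int trie whose values index into `labels`.
  val combined = new HashTrie[Int]

  def addEntry(label: String, phrase: Seq[String]): Unit = {
    val index = labels.indexOf(label)
    separated(index).add(phrase, true)
    combined.add(phrase, index)
  }

  addEntry("PER", Seq("Ada", "Lovelace"))
  addEntry("LOC", Seq("New", "York"))

  // Searches every per-label trie and keeps the longest hit.
  def findSeparated(tokens: IndexedSeq[String], start: Int): Option[(Int, String)] = {
    val hits = separated.zipWithIndex.flatMap { case (trie, index) =>
      trie.longestMatch(tokens, start).map { case (length, _) => (length, labels(index)) }
    }
    if (hits.isEmpty) None else Some(hits.maxBy(_._1))
  }

  // Searches a single trie; the stored Int says which label applies.
  def findCombined(tokens: IndexedSeq[String], start: Int): Option[(Int, String)] =
    combined.longestMatch(tokens, start).map { case (length, index) => (length, labels(index)) }
}
```

In this sketch `findSeparated` performs one trie walk per label, while `findCombined` performs a single walk regardless of how many labels exist, which is the saving the combined strategy aims for.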