
org.clulab.sequences

LexiconNER


abstract class LexiconNER extends Tagger[String] with Serializable

The abstract base class for several concrete child classes used for Named Entity Recognition (NER) based on the contents of lexica, which are lists of words and phrases representing named entities.

For all of these classes, NER labels are derived by the LexiconNERBuilders from the file names of the lexica or from the records in overrideKBs. This class, via the variables USE_FAST and USE_COMPACT, controls which builder is used.

The collection of child classes is small:

- The SeparatedLexiconNER is closest to the original implementation. It has a BooleanHashTrie for each label and in that trie, Boolean values indicate that the sequence of strings leading there is a named entity. Each trie structure must be searched for potential named entities.

- The CombinedLexiconNER replaces the Boolean of the BooleanHashTrie with an Int stored in an IntHashTrie. The Int indicates which of the labels to use for the entity just found. In this way, only one trie (or two if there are different case sensitivity settings) needs to be searched no matter how many labels there are (at least up to Integer.MAX_VALUE labels).

- The CompactLexiconNER uses the same strategy to minimize the number of tries, but also converts the tries into CompactTries, which consist of arrays of integers indicating offsets into other arrays. In this way, the time it takes to serialize and deserialize the NER is reduced, and some lookup operations are made more efficient.
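
The single-trie idea described for CombinedLexiconNER can be illustrated with a minimal sketch. The names below (IntTrieNode, IntTrie, IntTrieDemo) are hypothetical and are not the library's IntHashTrie; the rule that the first label added for an entry wins is likewise an assumption made for illustration.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an Int-valued trie: each node maps a token to a
// child and optionally records the index of the label that ends at this node.
final class IntTrieNode {
  val children: mutable.Map[String, IntTrieNode] = mutable.Map.empty
  var labelIndex: Option[Int] = None
}

final class IntTrie {
  private val root = new IntTrieNode

  // Adds one lexicon entry (a sequence of tokens) under the given label index.
  def add(tokens: Seq[String], labelIndex: Int): Unit = {
    var node = root
    tokens.foreach { token =>
      node = node.children.getOrElseUpdate(token, new IntTrieNode)
    }
    // Assumption: the first label recorded for an entry is kept.
    if (node.labelIndex.isEmpty) node.labelIndex = Some(labelIndex)
  }

  // Returns (length, labelIndex) of the longest entry starting at `start`, if any.
  def longestMatch(tokens: IndexedSeq[String], start: Int): Option[(Int, Int)] = {
    var node = root
    var best: Option[(Int, Int)] = None
    var i = start
    var searching = true
    while (searching && i < tokens.length) {
      node.children.get(tokens(i)) match {
        case Some(child) =>
          node = child
          i += 1
          node.labelIndex.foreach(label => best = Some((i - start, label)))
        case None =>
          searching = false
      }
    }
    best
  }
}

object IntTrieDemo {
  val labels = Vector("PERSON", "LOCATION")

  // One trie holds the entries of every label.
  val trie = new IntTrie
  trie.add(Seq("New", "York"), labels.indexOf("LOCATION"))
  trie.add(Seq("New", "York", "City"), labels.indexOf("LOCATION"))
  trie.add(Seq("Ada", "Lovelace"), labels.indexOf("PERSON"))

  val tokens = Vector("Ada", "Lovelace", "visited", "New", "York", "City")
  val first = trie.longestMatch(tokens, 0)  // Some((2, 0)): two tokens, PERSON
  val second = trie.longestMatch(tokens, 3) // Some((3, 1)): three tokens, LOCATION
}
```

With one trie per label, as in the SeparatedLexiconNER description, the same lookup would have to be repeated once per label at every start position; storing the label index in the node is what removes that repetition.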

Linear Supertypes
Serializable, Serializable, Tagger[String], AnyRef, Any

Instance Constructors

  1. new LexiconNER(knownCaseInsensitives: Set[String], useLemmas: Boolean)


    knownCaseInsensitives

    Words known to appear with and without capitalized letters which help determine whether a span of text is contentful

    useLemmas

    If false, use the words of a sentence; if true, the lemmas
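
As a rough illustration of how useLemmas could select between words and lemmas, here is a minimal sketch. SimpleSentence and TokenSelector are hypothetical stand-ins rather than the library's Sentence or LexiconNER, and the error raised for missing lemmas is an assumption.

```scala
// Hypothetical, simplified stand-in for the real Sentence class.
final case class SimpleSentence(words: Array[String], lemmas: Option[Array[String]])

final class TokenSelector(useLemmas: Boolean) {
  def getWords(sentence: SimpleSentence): Array[String] = sentence.words

  // Assumption: lemmas must already be present when useLemmas is true.
  def getLemmas(sentence: SimpleSentence): Array[String] =
    sentence.lemmas.getOrElse(sys.error("sentence has no lemmas"))

  // The accessor is chosen once at construction, so the flag is not
  // re-tested for every sentence that is matched.
  val getTokens: SimpleSentence => Array[String] =
    if (useLemmas) getLemmas(_) else getWords(_)
}
```

For example, `new TokenSelector(true).getTokens(sentence)` would return the lemmas of `sentence`, and `new TokenSelector(false).getTokens(sentence)` its words.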

Abstract Value Members

  1. abstract def find(sentence: Sentence): Array[String]


    Matches the lexicons against this sentence

    sentence

    The input sentence

    returns

    An array of BIO notations that stores the outcome of the matches

    Definition Classes
    LexiconNER → Tagger
  2. abstract def getLabels: Seq[String]

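
The array of BIO notations returned by find can be pictured with a small sketch. It assumes the conventional B-LABEL / I-LABEL / O scheme and non-overlapping spans; Span and BioDemo are hypothetical names, not part of this class.

```scala
// Hypothetical helper: a matched span of tokens together with its label.
final case class Span(start: Int, length: Int, label: String)

object BioDemo {
  val Outside = "O"

  // Writes B-/I- tags for each span into an array that starts as all "O".
  // Assumes the spans lie within the sentence and do not overlap.
  def toBio(sentenceLength: Int, spans: Seq[Span]): Array[String] = {
    val tags = Array.fill(sentenceLength)(Outside)
    spans.foreach { span =>
      tags(span.start) = "B-" + span.label
      (span.start + 1 until span.start + span.length).foreach { i =>
        tags(i) = "I-" + span.label
      }
    }
    tags
  }
}
```

Under these assumptions, a six-token sentence with a two-token PERSON span at position 0 and a three-token LOCATION span at position 3 yields B-PERSON, I-PERSON, O, B-LOCATION, I-LOCATION, I-LOCATION.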

Concrete Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. val contentQualifiers: Array[(IndexedSeqView[String, Array[String]]) ⇒ Boolean]

  7. def contentfulSpan(sentence: Sentence, start: Int, length: Int): Boolean

    Attributes
    protected
  8. def countCharacters(wordsView: IndexedSeqView[String, Array[String]]): Int

  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. def equalsForSerialization(other: AnyRef): Boolean


    The class is serializable, and this method is used during testing to determine whether a reconstituted object is equal to the original without interfering with the operation of equals and getting into hash codes. It is not necessary for this operation to be efficient or complete.

    other

    The object to compare to

    returns

    Whether this and other are equal, at least as far as serialization is concerned

  12. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  14. def getLemmas(sentence: Sentence): Array[String]

    Attributes
    protected
  15. val getTokens: (Sentence) ⇒ Array[String]

    Attributes
    protected
  16. def getWords(sentence: Sentence): Array[String]

    Attributes
    protected
  17. def hasCondition(wordsView: IndexedSeqView[String, Array[String]], condition: (Char) ⇒ Boolean): Boolean

  18. def hasDigit(wordsView: IndexedSeqView[String, Array[String]]): Boolean

  19. def hasLetter(wordsView: IndexedSeqView[String, Array[String]]): Boolean

  20. def hasSpace(wordsView: IndexedSeqView[String, Array[String]]): Boolean

  21. def hasUpperCaseLetters(wordsView: IndexedSeqView[String, Array[String]]): Boolean

  22. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  23. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  24. val knownCaseInsensitives: Set[String]


    Words known to appear with and without capitalized letters which help determine whether a span of text is contentful

  25. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  26. final def notify(): Unit

    Definition Classes
    AnyRef
  27. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  28. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  29. def toString(): String

    Definition Classes
    AnyRef → Any
  30. val useLemmas: Boolean


    If false, use the words of a sentence; if true, the lemmas

  31. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  32. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  33. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
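
The kind of test that equalsForSerialization is described as supporting can be sketched as a Java serialization round trip. TinyNer and RoundTrip are hypothetical stand-ins, not classes from this package; only the java.io stream classes are real APIs.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a serializable NER; only the pattern matters here.
final class TinyNer(val labels: Array[String]) extends Serializable {
  // equals is left untouched (reference equality), so a separate method
  // compares the contents that should survive serialization.
  def equalsForSerialization(other: AnyRef): Boolean = other match {
    case that: TinyNer => labels.sameElements(that.labels)
    case _ => false
  }
}

object RoundTrip {
  // Serializes a value to bytes in memory and reads it back as a new object.
  def roundTrip[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

In this sketch a reconstituted `TinyNer` is a different object from the original, so the default `equals` reports false, while `equalsForSerialization` reports true because the label arrays have the same contents.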
