package org.clulab.sequences

Type Members

  1. abstract class BiMEMMSequenceTagger[L, F] extends SequenceTagger[L, F]

    Bidirectional MEMM sequence tagger. User: mihais. Date: 8/27/17.

  2. trait BuildState extends AnyRef

  3. class ColumnsToDocument extends AnyRef

  4. class CombinedLexiconNER extends LexiconNER

    Lexicon-based NER which efficiently recognizes entities from large dictionaries by combining like matchers

    Case-insensitive matching is performed by one matcher and case-sensitive matching by the other. Each matcher can account for multiple KBs. Each IntHashTrie stores Ints that indicate which of the KBs an entry comes from. The KBs, whether from the kbs or the overrideKBs in LexiconNER.apply, have priorities, and the one with the highest priority is recorded.

    Annotations
    @SerialVersionUID()
  5. class CompactLexiconNER extends LexiconNER

    Lexicon-based NER, similar to CombinedLexiconNER, that additionally provides efficient serialization, deserialization, and storage by using the CompactTrie.

    Annotations
    @SerialVersionUID()
  6. class CompactTrie extends Serializable

    Annotations
    @SerialVersionUID()
  7. class FastBuildState extends BuildState

  8. class FastLexiconNERBuilder extends LexiconNERBuilder

    A class that builds either a CombinedLexiconNER or a CompactLexiconNER, depending on the value of useCompact.

    The building performed here works on a text file. Both kinds of NERs are also Serializable and can be loaded as objects without the text parsing.

  9. class FeatureExtractor extends AnyRef

    Implements common features used in sequence tagging. Created by mihais on 6/8/17.

  10. trait FileKbSource extends AnyRef

  11. class FileOverrideKbSource extends OverrideKbSource with FileKbSource

  12. class FileStandardKbSource extends StandardKbSource with FileKbSource

  13. trait KbSource extends AnyRef

  14. trait LexicalVariations extends Serializable

    Generates all accepted lexical variations for an entity. User: mihais. Date: 10/3/17.

    Annotations
    @SerialVersionUID()
  15. abstract class LexiconNER extends Tagger[String] with Serializable

    The abstract base class for several concrete child classes used for Named Entity Recognition (NER) based on the contents of lexica, which are lists of words and phrases representing named entities

    For all of these classes, NER labels are derived by the LexiconNERBuilders from the file names of the lexica or from the records in overrideKBs. This class, via the variables USE_FAST and USE_COMPACT, controls which builder is used.

    The collection of child classes is small:

    - The SeparatedLexiconNER is closest to the original implementation. It has a BooleanHashTrie for each label and in that trie, Boolean values indicate that the sequence of strings leading there is a named entity. Each trie structure must be searched for potential named entities.

    - Instead of the Boolean in a BooleanHashTrie, the CombinedLexiconNER stores an Int in an IntHashTrie. The Int indicates which of the labels to use for the entity just found. In this way, only one trie (or two, if there are different case-sensitivity settings) needs to be searched no matter how many labels there are (up to Integer.MAX_VALUE).

    - The CompactLexiconNER uses the same strategy to minimize the number of tries, but also converts the tries into CompactTries, which consist of arrays of integers indicating offsets into other arrays. This reduces the time it takes to serialize and deserialize the NER and makes some lookup operations more efficient.

  16. abstract class LexiconNERBuilder extends AnyRef

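    Concrete subclasses are responsible for building various NERs. The mapping is as follows:

    - The SlowLexiconNERBuilder builds a SeparatedLexiconNER.

    - The FastLexiconNERBuilder builds a CombinedLexiconNER, or a CompactLexiconNER if useCompact is true.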

    For an explanation of how the NERs differ from each other, see their superclass, LexiconNER.

  17. class LexiconNERShell extends Shell

  18. abstract class MEMMSequenceTagger[L, F] extends SequenceTagger[L, F]

    Sequence tagger using a maximum entropy Markov model (MEMM). User: mihais. Date: 8/26/17.

  19. class MemoryOverrideKbSource extends OverrideKbSource

  20. class MemoryStandardKbSource extends StandardKbSource

  21. class NoLexicalVariations extends LexicalVariations

    Annotations
    @SerialVersionUID()
  22. abstract class OverrideKbSource extends KbSource

  23. trait ResourceKbSource extends AnyRef

  24. class ResourceOverrideKbSource extends OverrideKbSource with ResourceKbSource

  25. class ResourceStandardKbSource extends StandardKbSource with ResourceKbSource

  26. case class Row(tokens: Array[String]) extends Product with Serializable

    Stores training data for sequence modeling. Mandatory columns: 0 - word, 1 - label. Optional columns: 2 - POS tag, 3+ - SRL arguments.

  27. class SeparatedLexiconNER extends LexiconNER

    Lexicon-based NER, which efficiently recognizes entities from large dictionaries

    Note: This is a cleaned-up version of the old RuleNER. It may have been known simply as LexiconNER at one point, but was renamed to emphasize the fact that each KB is stored in a separate matcher (BooleanHashTrie). Other variations get by with fewer matchers.

    Create a SeparatedLexiconNER object using either LexiconNER.apply() or SlowLexiconNERBuilder.build() rather than the constructor whenever possible. Use it by calling the find() method on a single sentence.

    Annotations
    @SerialVersionUID()
  28. class SeqScorer extends AnyRef

    Computes precision (P), recall (R), and F1 scores for the complete mentions produced by a sequence tagger in BIO notation. User: mihais. Date: 2/27/15.

  29. trait SequenceTagger[L, F] extends Tagger[L]

    Trait for all sequence taggers. User: mihais. Date: 8/25/17.

  30. class SequenceTaggerEvaluator[L, F] extends AnyRef

    Implements evaluation of a sequence tagger. Created by mihais on 6/8/17.

  31. class SequenceTaggerLogger extends AnyRef

    Logger holder. User: mihais. Date: 8/26/17.

  32. class SlowBuildState extends BuildState

  33. class SlowLexiconNERBuilder extends LexiconNERBuilder

    A class that builds a SeparatedLexiconNER

    The building performed here works on a text file. The SeparatedLexiconNER is also Serializable and can be loaded as an object without the text parsing.

  34. class SpanAndIndex extends AnyRef

  35. abstract class StandardKbSource extends KbSource

  36. trait Tagger[L] extends AnyRef

    High-level trait for a sequence tagger. User: mihais. Date: 10/12/17.
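
The LexiconNER notes above contrast SeparatedLexiconNER (one Boolean trie per label) with CombinedLexiconNER (one trie whose entries carry an Int identifying the KB). The following is a minimal sketch of that idea, not the library's IntHashTrie. The names IntTrieSketch and CombinedTrieDemo are hypothetical, and so are two policies: greedy longest-match tagging, and the rule that a lower KB index means a higher priority.

```scala
import scala.collection.mutable

// Hypothetical sketch, not the library's IntHashTrie: one token trie whose
// terminal nodes store an Int KB index instead of a Boolean.
class IntTrieSketch {
  private class Node {
    val children = mutable.HashMap.empty[String, Node]
    var kbIndex: Int = -1 // -1 means no entity ends at this node
  }

  private val root = new Node

  // Assumption: a lower index means a higher-priority KB, so a duplicate
  // entry keeps the smallest index seen so far.
  def add(tokens: Seq[String], kbIndex: Int): Unit = {
    var node = root
    for (t <- tokens) node = node.children.getOrElseUpdate(t, new Node)
    if (node.kbIndex < 0 || kbIndex < node.kbIndex) node.kbIndex = kbIndex
  }

  // Returns (length, kbIndex) of the longest entry starting at `start`, if any.
  def longestMatch(tokens: IndexedSeq[String], start: Int): Option[(Int, Int)] = {
    var node = root
    var best: Option[(Int, Int)] = None
    var i = start
    var searching = true
    while (searching && i < tokens.length) {
      node.children.get(tokens(i)) match {
        case Some(next) =>
          node = next
          i += 1
          if (node.kbIndex >= 0) best = Some((i - start, node.kbIndex))
        case None =>
          searching = false
      }
    }
    best
  }
}

object CombinedTrieDemo {
  // Hypothetical greedy left-to-right BIO tagging; labels(kbIndex) names the KB.
  def tag(trie: IntTrieSketch, labels: IndexedSeq[String], tokens: IndexedSeq[String]): Vector[String] = {
    val out = Array.fill(tokens.length)("O")
    var i = 0
    while (i < tokens.length) {
      trie.longestMatch(tokens, i) match {
        case Some((len, kb)) =>
          out(i) = "B-" + labels(kb)
          for (j <- i + 1 until i + len) out(j) = "I-" + labels(kb)
          i += len
        case None =>
          i += 1
      }
    }
    out.toVector
  }
}
```

Only one trie is walked at each position, however many labels exist; the label comes from the stored index rather than from which trie happened to match.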
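
CompactLexiconNER is described above as converting its tries into CompactTries, "arrays of integers indicating offsets into other arrays". A minimal sketch of one such layout follows; it is not the library's CompactTrie. The name CompactTrieSketch, the CSR-style layout (each node's edges occupy a contiguous range), and the binary search over sorted token ids are all assumptions chosen for illustration.

```scala
import scala.collection.mutable

// Hypothetical sketch, not the library's CompactTrie: every node of a token
// trie is flattened into parallel Int arrays (a CSR-style layout).
final class CompactTrieSketch(
    val tokenIds: Map[String, Int], // vocabulary: token -> id
    val firstChild: Array[Int],     // edges of node n occupy [firstChild(n), firstChild(n + 1))
    val childToken: Array[Int],     // token id on each edge, sorted within a node
    val childNode: Array[Int],      // target node of each edge
    val kbIndex: Array[Int]         // KB index per node, -1 if no entry ends there
) {
  // Binary search over the node's sorted edge range; -1 if there is no such child.
  private def child(node: Int, tokenId: Int): Int = {
    val i = java.util.Arrays.binarySearch(childToken, firstChild(node), firstChild(node + 1), tokenId)
    if (i >= 0) childNode(i) else -1
  }

  // KB index of an exact token sequence, or -1 if the sequence is not an entry.
  def lookup(tokens: Seq[String]): Int = {
    var node = 0 // the root is node 0
    val it = tokens.iterator
    while (node >= 0 && it.hasNext) {
      node = tokenIds.get(it.next()) match {
        case Some(id) => child(node, id)
        case None     => -1
      }
    }
    if (node >= 0) kbIndex(node) else -1
  }
}

object CompactTrieSketch {
  private final class Node {
    val children = mutable.TreeMap.empty[Int, Node] // iterates in token-id order
    var kb = -1
  }

  def build(entries: Seq[(Seq[String], Int)]): CompactTrieSketch = {
    val vocab = mutable.HashMap.empty[String, Int]
    val root = new Node
    for ((tokens, kb) <- entries) {
      var node = root
      for (t <- tokens) node = node.children.getOrElseUpdate(vocab.getOrElseUpdate(t, vocab.size), new Node)
      node.kb = kb
    }
    // Breadth-first numbering of nodes, starting with the root as 0.
    val order = mutable.ArrayBuffer(root)
    var i = 0
    while (i < order.length) { order ++= order(i).children.values; i += 1 }
    val index = order.zipWithIndex.toMap
    val firstChild = new Array[Int](order.length + 1)
    val childToken = mutable.ArrayBuffer.empty[Int]
    val childNode = mutable.ArrayBuffer.empty[Int]
    for ((node, n) <- order.zipWithIndex) {
      firstChild(n) = childToken.length
      for ((tokenId, c) <- node.children) { childToken += tokenId; childNode += index(c) }
    }
    firstChild(order.length) = childToken.length
    new CompactTrieSketch(vocab.toMap, firstChild, childToken.toArray, childNode.toArray, order.map(_.kb).toArray)
  }
}
```

Because the finished structure is a handful of primitive arrays plus a vocabulary, it can be written and read without rebuilding any pointer-based nodes. That is one plausible reason such a layout speeds up de/serialization.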
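
The Row entry above documents a fixed column layout: 0 is the word, 1 is the label, 2 is an optional POS tag, and 3 onward are optional SRL arguments. The sketch below shows how such columns might be accessed and grouped into sentences. RowSketch and readSentences are hypothetical stand-ins, not the library's Row or ColumnReader. Whitespace-separated columns and blank-line sentence boundaries follow the usual CoNLL convention but are assumptions here.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for Row, using the documented column layout:
// 0 = word, 1 = label, 2 = POS tag (optional), 3+ = SRL arguments (optional).
final case class RowSketch(tokens: Array[String]) {
  def word: String = tokens(0)
  def label: String = tokens(1)
  def pos: Option[String] = if (tokens.length > 2) Some(tokens(2)) else None
  def srlArgs: Seq[String] = tokens.toSeq.drop(3)
}

object RowSketch {
  // Assumption: columns are whitespace-separated and blank lines separate sentences.
  def readSentences(lines: Iterator[String]): Vector[Vector[RowSketch]] = {
    val sentences = ArrayBuffer.empty[Vector[RowSketch]]
    val current = ArrayBuffer.empty[RowSketch]
    for (line <- lines) {
      val trimmed = line.trim
      if (trimmed.isEmpty) {
        if (current.nonEmpty) { sentences += current.toVector; current.clear() }
      } else {
        val cols = trimmed.split("\\s+")
        require(cols.length >= 2, s"expected at least word and label columns: $line")
        current += RowSketch(cols)
      }
    }
    if (current.nonEmpty) sentences += current.toVector
    sentences.toVector
  }
}
```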
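
SeqScorer is described above as computing P, R, and F1 over complete mentions in BIO notation. The sketch below illustrates mention-level scoring; it is not SeqScorer's implementation. BioScorerSketch is a hypothetical name. Three details are assumptions: micro-averaging across sentences, requiring an exact span-and-label match, and treating a stray I- tag as the start of a new mention.

```scala
// Hypothetical sketch, not SeqScorer itself: exact-match P/R/F1 over BIO mentions.
object BioScorerSketch {
  // A mention is (start, endExclusive, label). A stray I- tag (no open mention,
  // or a different label) is treated as starting a new mention.
  def mentions(tags: Seq[String]): Set[(Int, Int, String)] = {
    val out = Set.newBuilder[(Int, Int, String)]
    var start = -1
    var label = ""
    def close(end: Int): Unit =
      if (start >= 0) { out += ((start, end, label)); start = -1 }
    for ((tag, i) <- tags.zipWithIndex) {
      val continues = tag.startsWith("I-") && start >= 0 && tag.substring(2) == label
      if (tag.startsWith("B-") || (tag.startsWith("I-") && !continues)) {
        close(i); start = i; label = tag.substring(2)
      } else if (!continues) close(i) // "O" or any other tag ends the open mention
    }
    close(tags.length)
    out.result()
  }

  // Micro-averaged over all sentences; a mention counts only if span and label both match.
  def score(gold: Seq[Seq[String]], pred: Seq[Seq[String]]): (Double, Double, Double) = {
    require(gold.length == pred.length, "gold and predicted must have the same number of sentences")
    var correct, predicted, expected = 0
    for ((g, p) <- gold.zip(pred)) {
      val gm = mentions(g)
      val pm = mentions(p)
      correct += (gm intersect pm).size
      predicted += pm.size
      expected += gm.size
    }
    val precision = if (predicted == 0) 0.0 else correct.toDouble / predicted
    val recall = if (expected == 0) 0.0 else correct.toDouble / expected
    val f1 = if (precision + recall == 0.0) 0.0 else 2 * precision * recall / (precision + recall)
    (precision, recall, f1)
  }
}
```

Scoring whole mentions rather than tokens means a prediction with the right span but the wrong label (for example, LOC predicted as ORG) earns no credit at all.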

Value Members

  1. object CaseDetector

    Detects the case of a word

  2. object ColumnReader

    Reads the CoNLL-like column format

  3. object ColumnsToDocument

    Converts the CoNLL-X column-based format to our Document by reading only words and POS tags. Created by mihais on 6/8/17. Last modified: fix compiler issue (import scala.io.Source).

  4. object CombinedLexiconNER extends Serializable

  5. object CommentedStandardKbSource

  6. object CompactLexiconNER extends Serializable

  7. object CompactTrie extends Serializable

  8. object FeatureExtractor

  9. object LexiconNER extends Serializable

  10. object LexiconNERShell extends App

  11. object NormalizeParens

    Transforms -LRB-, -LCB-, etc. tokens back into "(", "{", etc. This is necessary because the WSJ POS dataset uses the -LRB- conventions to replace words, whereas all the other datasets we use (NER, syntax) do not. Note that we continue to keep the *POS tags* as -LRB-, -LCB-, etc., because these are standard Penn Treebank tags. We only replace the words.

  12. object SequenceTaggerEvaluator

  13. object SequenceTaggerLoader

  14. object SequenceTaggerLogger

  15. object SequenceTaggerShell

    Simple shell for sequence taggers. Created by mihais on 6/7/17.
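
NormalizeParens is described above as mapping -LRB-, -LCB-, etc. back to the literal brackets in the words while leaving the POS tags untouched. The sketch below illustrates that rule; it is not the object's actual code. NormalizeParensSketch and its normalize signature are hypothetical, and the -LSB-/-RSB- pair for square brackets is included on the assumption that the "etc." covers it.

```scala
// Hypothetical sketch, not NormalizeParens itself.
object NormalizeParensSketch {
  // Penn Treebank bracket escapes and the characters they stand for.
  private val brackets = Map(
    "-LRB-" -> "(", "-RRB-" -> ")",
    "-LCB-" -> "{", "-RCB-" -> "}",
    "-LSB-" -> "[", "-RSB-" -> "]"
  )

  // Words are converted back to brackets; POS tags such as -LRB- are kept as-is.
  def normalize(words: Seq[String], tags: Seq[String]): (Seq[String], Seq[String]) =
    (words.map(w => brackets.getOrElse(w, w)), tags)
}
```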