sequences

Type Members

abstract class BiMEMMSequenceTagger[L, F] extends SequenceTagger[L, F]

Bidirectional MEMM sequence tagger User: mihais Date: 8/27/17
trait BuildState extends AnyRef
class ColumnsToDocument extends AnyRef
class CombinedLexiconNER extends LexiconNER

Lexicon-based NER which efficiently recognizes entities from large dictionaries by combining like matchers
Case insensitive matching is performed by one matcher and case sensitive by the other. Each can account for multiple KBs. Each IntHashTrie stores Ints which indicate which of the KBs an entry comes from. The KBs, either from the kbs or overrideKBs in LexiconNER.apply, have priorities, and the one with highest priority is recorded.
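The priority rule described above can be illustrated with a minimal sketch. It is not the library's implementation: the object `PrioritySketch`, the method `buildIndex`, and the use of a flat `Map` in place of an IntHashTrie are hypothetical, and it assumes that KBs listed earlier have higher priority.

```scala
import scala.collection.mutable

// Hypothetical sketch, not the library's code: a flat Map stands in for the trie.
object PrioritySketch {
  // Assumption: kbs are ordered from highest to lowest priority,
  // so the smallest KB index wins when several KBs contain the same entry.
  def buildIndex(kbs: Seq[(String, Seq[String])], caseInsensitive: Boolean): Map[String, Int] = {
    val index = mutable.Map.empty[String, Int]
    for (((_, entries), kbIndex) <- kbs.zipWithIndex; entry <- entries) {
      // A case-insensitive matcher normalizes entries before storing them.
      val key = if (caseInsensitive) entry.toLowerCase else entry
      // Record only the highest-priority KB for each entry.
      if (!index.contains(key) || kbIndex < index(key)) index(key) = kbIndex
    }
    index.toMap
  }
}
```

Under these assumptions, an entry that appears in two KBs maps to the Int of the KB that comes first in the list.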

Annotations
@SerialVersionUID()
class CompactLexiconNER extends LexiconNER

Lexicon-based NER similar to CombinedLexiconNER but which also adds efficient serialization, deserialization, and storage by using the CompactTrie

Annotations
@SerialVersionUID()
class CompactTrie extends Serializable

Annotations
@SerialVersionUID()
class FastBuildState extends BuildState
class FastLexiconNERBuilder extends LexiconNERBuilder

A class that builds either a CombinedLexiconNER or a CompactLexiconNER, depending on the value of useCompact.
The building performed here works on a text file. Both kinds of NERs are also Serializable and can be loaded as objects without the text parsing.
class FeatureExtractor extends AnyRef

Implements common features used in sequence tagging Created by mihais on 6/8/17.
trait FileKbSource extends AnyRef
class FileOverrideKbSource extends OverrideKbSource with FileKbSource
class FileStandardKbSource extends StandardKbSource with FileKbSource
trait KbSource extends AnyRef
trait LexicalVariations extends Serializable

Generates all accepted lexical variations for an entity User: mihais Date: 10/3/17

Annotations
@SerialVersionUID()
abstract class LexiconNER extends Tagger[String] with Serializable

The abstract base class for several concrete child classes used for Named Entity Recognition (NER) based on the contents of lexica, which are lists of words and phrases representing named entities
For all of these classes, NER labels are derived from the file names of the lexica or the records in overrideKBs by the LexiconNERBuilders. This class, via variables USE_FAST and USE_COMPACT, controls which builder is used.
The collection of child classes is small:
- The SeparatedLexiconNER is closest to the original implementation. It has a BooleanHashTrie for each label and in that trie, Boolean values indicate that the sequence of strings leading there is a named entity. Each trie structure must be searched for potential named entities.
- The CombinedLexiconNER stores an Int in an IntHashTrie instead of a Boolean in a BooleanHashTrie. The Int indicates which of the labels is the one to use for the entity just found. In this way, only one trie (or two if there are different case sensitivity settings) needs to be searched no matter how many labels there are (at least until the number of labels reaches Integer.MAX_VALUE).
- The CompactLexiconNER uses the same strategy to minimize the number of tries, but also converts the tries into CompactTries which consist of arrays of integers indicating offsets into other arrays. In this way the time it takes to de/serialize the NER is reduced, and some lookup operations are made more efficient.
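The difference between one trie per label and a single trie that stores a label index can be sketched as follows. This is only an illustration of the idea in the list above, not the actual BooleanHashTrie or IntHashTrie: the names `LabelTrieSketch`, `LabelTrie`, `add`, and `longestMatch` are hypothetical, and -1 is an assumed marker for "no entity ends here".

```scala
import scala.collection.mutable

// Hypothetical sketch of a single trie whose nodes store a label index.
object LabelTrieSketch {
  // One node per token; label stays -1 unless the path ending here is an entity.
  final class Node {
    val children = mutable.HashMap.empty[String, Node]
    var label: Int = -1
  }

  final class LabelTrie {
    private val root = new Node

    // Adds an entity; the first label recorded for a path is kept.
    def add(tokens: Seq[String], label: Int): Unit = {
      var node = root
      for (token <- tokens) node = node.children.getOrElseUpdate(token, new Node)
      if (node.label < 0) node.label = label
    }

    // Returns (length, label) of the longest entity starting at offset, if any.
    def longestMatch(sentence: IndexedSeq[String], offset: Int): Option[(Int, Int)] = {
      var node = root
      var best: Option[(Int, Int)] = None
      var i = offset
      var searching = true
      while (searching && i < sentence.length) {
        node.children.get(sentence(i)) match {
          case Some(child) =>
            node = child
            i += 1
            if (node.label >= 0) best = Some((i - offset, node.label))
          case None =>
            searching = false
        }
      }
      best
    }
  }
}
```

Because the label travels with the match, a sentence position is searched once rather than once per label, which is the saving the combined and compact variants aim for.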
abstract class LexiconNERBuilder extends AnyRef

Concrete subclasses are responsible for building various NERs. The mapping is as follows:
- The SlowLexiconNERBuilder builds a SeparatedLexiconNER.
- The FastLexiconNERBuilder builds either a CombinedLexiconNER or a CompactLexiconNER, depending on the value of useCompact.
For an explanation of how the NERs differ from each other, see their superclass, LexiconNER.
class LexiconNERShell extends Shell
abstract class MEMMSequenceTagger[L, F] extends SequenceTagger[L, F]

Sequence tagger using a maximum entropy Markov model (MEMM) User: mihais Date: 8/26/17
class MemoryOverrideKbSource extends OverrideKbSource
class MemoryStandardKbSource extends StandardKbSource
class NoLexicalVariations extends LexicalVariations

Annotations
@SerialVersionUID()
abstract class OverrideKbSource extends KbSource
trait ResourceKbSource extends AnyRef
class ResourceOverrideKbSource extends OverrideKbSource with ResourceKbSource
class ResourceStandardKbSource extends StandardKbSource with ResourceKbSource
case class Row(tokens: Array[String]) extends Product with Serializable

Stores training data for sequence modeling. Mandatory columns: 0 - word, 1 - label. Optional columns: 2 - POS tag, 3+ - SRL arguments.
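The column layout can be made concrete with a small sketch. The `Row` below is a local stand-in that mirrors the documented signature; the accessors `word`, `label`, `posTag`, `srlArguments` and the helper `parseLine` are hypothetical and are not claimed to exist in the library. Whitespace-separated columns are also an assumption.

```scala
// Hypothetical sketch: a local stand-in for Row with illustrative accessors.
object RowSketch {
  case class Row(tokens: Array[String]) {
    def word: String = tokens(0)   // mandatory column 0
    def label: String = tokens(1)  // mandatory column 1
    // Optional column 2 holds the POS tag when present.
    def posTag: Option[String] = if (tokens.length > 2) Some(tokens(2)) else None
    // Optional columns 3 and beyond hold SRL arguments.
    def srlArguments: Seq[String] = tokens.drop(3).toSeq
  }

  // Splits one line on whitespace and checks the mandatory columns.
  def parseLine(line: String): Row = {
    val tokens = line.trim.split("\\s+")
    require(tokens.length >= 2, "a row needs at least a word and a label")
    Row(tokens)
  }
}
```

For example, `RowSketch.parseLine("London\tB-LOC\tNNP")` yields a row whose `posTag` is `Some("NNP")` and whose `srlArguments` is empty.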
class SeparatedLexiconNER extends LexiconNER

Lexicon-based NER, which efficiently recognizes entities from large dictionaries
Note: This is a cleaned-up version of the old RuleNER. It may have been known simply as LexiconNER at one point, but was renamed to emphasize the fact that each KB is stored in a separate matcher (BooleanHashTrie). Other variations get by with fewer matchers.
Create a SeparatedLexiconNER object using either LexiconNER.apply() or SlowLexiconNERBuilder.build() rather than by the constructor if at all possible. Use it by calling the find() method on a single sentence.

Annotations
@SerialVersionUID()
class SeqScorer extends AnyRef

Computes P, R, F1 scores for the complete mentions produced by a sequence tagger, in the BIO notation User: mihais Date: 2/27/15
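Scoring complete mentions rather than individual tags can be sketched as below. This is a minimal illustration, not SeqScorer itself: the names `BioScoreSketch`, `mentions`, and `prf` are hypothetical, and the sketch assumes that an I- tag with no matching open mention is treated as outside any mention.

```scala
// Hypothetical sketch of mention-level precision, recall and F1 over BIO tags.
object BioScoreSketch {
  // A mention is (label, start, end) with end exclusive.
  def mentions(tags: IndexedSeq[String]): Set[(String, Int, Int)] = {
    val found = Set.newBuilder[(String, Int, Int)]
    var start = -1
    var label = ""
    // Closes the open mention, if there is one, at the given end offset.
    def close(end: Int): Unit = {
      if (start >= 0) found += ((label, start, end))
      start = -1
    }
    for (i <- tags.indices) {
      val tag = tags(i)
      val continues = tag.startsWith("I-") && start >= 0 && tag.drop(2) == label
      if (tag.startsWith("B-")) {
        close(i)
        start = i
        label = tag.drop(2)
      } else if (!continues) {
        // O tags and stray I- tags end the open mention (assumption).
        close(i)
      }
    }
    close(tags.length)
    found.result()
  }

  // Returns (precision, recall, f1); a mention counts only on an exact match.
  def prf(gold: IndexedSeq[String], predicted: IndexedSeq[String]): (Double, Double, Double) = {
    val g = mentions(gold)
    val p = mentions(predicted)
    val correct = (g & p).size.toDouble
    val precision = if (p.isEmpty) 0.0 else correct / p.size
    val recall = if (g.isEmpty) 0.0 else correct / g.size
    val f1 = if (precision + recall == 0.0) 0.0 else 2 * precision * recall / (precision + recall)
    (precision, recall, f1)
  }
}
```

A predicted mention with the right label but a wrong boundary counts as both a false positive and a false negative, which is why mention-level scores are usually lower than per-token accuracy.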
trait SequenceTagger[L, F] extends Tagger[L]

Trait for all sequence taggers User: mihais Date: 8/25/17
class SequenceTaggerEvaluator[L, F] extends AnyRef

Implements evaluation of a sequence tagger Created by mihais on 6/8/17.
class SequenceTaggerLogger extends AnyRef

Logger holder User: mihais Date: 8/26/17
class SlowBuildState extends BuildState
class SlowLexiconNERBuilder extends LexiconNERBuilder

A class that builds a SeparatedLexiconNER
The building performed here works on a text file. The SeparatedLexiconNER is also Serializable and can be loaded as an object without the text parsing.
class SpanAndIndex extends AnyRef
abstract class StandardKbSource extends KbSource
trait Tagger[L] extends AnyRef

High-level trait for a sequence tagger User: mihais Date: 10/12/17

Value Members

object CaseDetector

Detects the case of a word
object ColumnReader

Reads the CoNLL-like column format
object ColumnsToDocument

Converts the CoNLLX column-based format to our Document by reading only words and POS tags Created by mihais on 6/8/17. Last Modified: Fix compiler issue: import scala.io.Source.
object CombinedLexiconNER extends Serializable
object CommentedStandardKbSource
object CompactLexiconNER extends Serializable
object CompactTrie extends Serializable
object FeatureExtractor
object LexiconNER extends Serializable
object LexiconNERShell extends App
object NormalizeParens

Transforms -LRB-, -LCB-, etc. tokens back into "(", "{", etc. This is necessary because the POS WSJ dataset uses the -LRB- conventions to replace words in the dataset, whereas all the other datasets we use (NER, syntax) do not. Note that we continue to keep the *POS tags* as -LRB-, -LCB-, etc., because these are standard Penn Treebank tags. We just replace the words.
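The word-only replacement can be sketched as follows. The object `NormalizeParensSketch` and its methods are hypothetical and do not reflect the actual NormalizeParens API; the mapping covers the six standard Penn Treebank bracket tokens.

```scala
// Hypothetical sketch: replaces bracket tokens in words and leaves POS tags alone.
object NormalizeParensSketch {
  // Standard Penn Treebank bracket escapes and the characters they stand for.
  private val replacements = Map(
    "-LRB-" -> "(", "-RRB-" -> ")",
    "-LCB-" -> "{", "-RCB-" -> "}",
    "-LSB-" -> "[", "-RSB-" -> "]"
  )

  // Any word that is not a bracket token is returned unchanged.
  def normalizeWord(word: String): String = replacements.getOrElse(word, word)

  // Pairs each normalized word with its original, untouched POS tag.
  def normalizeSentence(words: Seq[String], tags: Seq[String]): Seq[(String, String)] =
    words.map(normalizeWord).zip(tags)
}
```

Keeping the tags untouched means a token such as "(" still carries the tag -LRB-, as the note above requires.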
object SequenceTaggerEvaluator
object SequenceTaggerLoader
object SequenceTaggerLogger
object SequenceTaggerShell

Simple shell for sequence taggers Created by mihais on 6/7/17.
