A Linear Chain Conditional Random Field. Useful for POS tagging, etc.
As usual in Epic, all the heavy lifting is done in the companion object and Marginals.
A CRF can produce an epic.sequences.TaggedSequence from an input sequence of words. It can also produce marginals, etc.
A Gazetteer is a map from IndexedSeq[W] to L. That is, it maps sequences of words to a label that we've seen before. For example, you might use a list of countries. Gazetteers are very useful for named entity recognition.
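The mapping described above can be illustrated with a plain Scala `Map`. This is a minimal sketch, not Epic's own gazetteer API: the object `GazetteerSketch`, the `countries` data, the `matches` helper, and the `maxLength` parameter are all hypothetical names introduced for illustration.

```scala
// Hypothetical sketch: a gazetteer modeled as a plain Map, not Epic's API.
object GazetteerSketch {
  // Word sequences we have seen before, mapped to their label.
  val countries: Map[IndexedSeq[String], String] = Map(
    IndexedSeq("France") -> "LOC",
    IndexedSeq("United", "States") -> "LOC",
    IndexedSeq("New", "Zealand") -> "LOC"
  )

  // Return (begin, end, label) for every span of at most maxLength words
  // that appears in the gazetteer. The end index is exclusive.
  def matches(
      words: IndexedSeq[String],
      gaz: Map[IndexedSeq[String], String],
      maxLength: Int
  ): IndexedSeq[(Int, Int, String)] =
    for {
      begin <- words.indices
      end <- (begin + 1) to math.min(words.length, begin + maxLength)
      span = words.slice(begin, end)
      if gaz.contains(span)
    } yield (begin, end, gaz(span))
}
```

A named entity recognizer could use such matches as features for candidate segments, for example by calling `GazetteerSketch.matches(sentence, GazetteerSketch.countries, 3)` on a tokenized sentence.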
TODO
Factory class for making an epic.sequences.SemiCRFModel based on some data and an optional gazetteer.
An epic.sequences.Segmenter splits up a sentence into labeled segments. For instance, it might find all the people, places and things (Named Entity Recognition) in a document.
the type of tag that is annotated
A Semi-Markov Linear Chain Conditional Random Field. That is, the length of time spent in a state may be longer than one tick. Useful for field segmentation or NER.
As usual in Epic, all the heavy lifting is done in the companion object and Marginals.
A tagged sequence has a sequence of tags and a sequence of words that are in one-to-one correspondence. Think POS tags, etc.
A Tagger assigns a sequence of Tags to a sequence of words.
the type of tag that is annotated
A HiddenMarkovModel, which is the generative special case of an epic.sequences.CRF.
Simple class that reads in a bunch of files and parses them. Output is dumped to standard out.
Object for evaluating epic.sequences.Segmentations. Returned metrics are precision, recall, and F1.
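To make the three metrics concrete, here is a minimal sketch of segment-level scoring. It assumes a guessed segment counts as correct only when its label and both boundaries match a gold segment exactly; Epic's evaluator may count differently. `SegmentEvalSketch`, `Segment`, `Stats`, and `evaluate` are hypothetical names, not Epic's API.

```scala
// Hypothetical sketch of precision, recall and F1 over labeled segments.
object SegmentEvalSketch {
  // A labeled span over [begin, end), end exclusive.
  case class Segment(label: String, begin: Int, end: Int)

  case class Stats(numCorrect: Int, numGuess: Int, numGold: Int) {
    // Fraction of guessed segments that are correct.
    def precision: Double =
      if (numGuess == 0) 0.0 else numCorrect.toDouble / numGuess
    // Fraction of gold segments that were found.
    def recall: Double =
      if (numGold == 0) 0.0 else numCorrect.toDouble / numGold
    // Harmonic mean of precision and recall.
    def f1: Double = {
      val sum = precision + recall
      if (sum == 0.0) 0.0 else 2 * precision * recall / sum
    }
  }

  // Assumption: exact match on label and both boundaries.
  def evaluate(guess: Set[Segment], gold: Set[Segment]): Stats =
    Stats((guess intersect gold).size, guess.size, gold.size)
}
```

Keeping raw counts in `Stats` rather than ratios makes it straightforward to sum counts over many sentences before computing the final metrics.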
Mostly for debugging SemiCRFs. Just uses a SemiCRF as a CRF.
Simple class that reads in a bunch of files and tags them. Output is dumped to standard out.
Object for evaluating epic.sequences.TaggedSequences. Returned metrics are accuracy and exact match.
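The two metrics can be sketched as follows, assuming accuracy is the fraction of individual tags that match the gold tags and exact match is the fraction of sentences whose whole tag sequence is correct. `TagEvalSketch` and its methods are hypothetical names for illustration, not Epic's API, and tags are plain strings here for simplicity.

```scala
// Hypothetical sketch of per-tag accuracy and per-sentence exact match.
object TagEvalSketch {
  // Each pair is (guessed tags, gold tags) for one sentence.
  // Assumption: both sequences in a pair have the same length.
  type Pair = (Seq[String], Seq[String])

  // Fraction of tags, over all sentences, that equal the gold tag.
  def accuracy(pairs: Seq[Pair]): Double = {
    val total = pairs.map(_._2.length).sum
    val correct = pairs.map { case (guess, gold) =>
      guess.zip(gold).count { case (a, b) => a == b }
    }.sum
    if (total == 0) 0.0 else correct.toDouble / total
  }

  // Fraction of sentences whose entire tag sequence is correct.
  def exactMatch(pairs: Seq[Pair]): Double =
    if (pairs.isEmpty) 0.0
    else pairs.count { case (guess, gold) => guess == gold }.toDouble / pairs.length
}
```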
A BIOETag is a tag that allows us to represent epic.sequences.Segmentations as epic.sequences.TaggedSequences. It includes Begin, Inside, Outside, and End tags. Sometimes we just use IO, or BIO.
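The conversion from segments to per-token tags can be sketched as below. This is an illustration under stated assumptions rather than Epic's implementation: the `Tag` hierarchy, `Segment`, and `encode` are hypothetical names, segments are assumed not to overlap, and a single-token segment is arbitrarily tagged Begin.

```scala
// Hypothetical sketch of BIOE encoding, not Epic's BIOETag API.
object BioeSketch {
  sealed trait Tag
  case class Begin(label: String) extends Tag
  case class Inside(label: String) extends Tag
  case class End(label: String) extends Tag
  case object Outside extends Tag

  // A labeled span over [begin, end), end exclusive.
  case class Segment(label: String, begin: Int, end: Int)

  // Turn non-overlapping labeled segments into one tag per token.
  // Tokens not covered by any segment are tagged Outside.
  def encode(segments: Seq[Segment], length: Int): IndexedSeq[Tag] = {
    val tags = Array.fill[Tag](length)(Outside)
    segments.foreach { seg =>
      for (i <- seg.begin until seg.end) tags(i) = Inside(seg.label)
      if (seg.end > seg.begin) {
        tags(seg.end - 1) = End(seg.label)
        // Assumption: a single-token segment ends up tagged Begin.
        tags(seg.begin) = Begin(seg.label)
      }
    }
    tags.toIndexedSeq
  }
}
```

Dropping the `End` case gives the BIO variant, and keeping only `Inside` and `Outside` gives IO.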