A CRF that uses a dense (rather than sparse) internal representation.
This dynamic feature manager redefines the window and ngram functions so that they operate over the original sequence of elements, in contexts where the original sequence has been recoded as a selected sub-sequence with re-mapped labels. Design note: this could also be a trait.
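The following Scala sketch illustrates the idea under stated assumptions: the selected sub-sequence is described by a hypothetical `origIndex` array that maps each sub-sequence position to its position in the original sequence, and window offsets are resolved against the original sequence rather than the sub-sequence. `RecodedSeq`, `origIndex`, `windowObs` and `window` are illustrative names, not the library's API.

```scala
// Hypothetical sketch: a sub-sequence selected from an original sequence, where
// window offsets are resolved against the ORIGINAL sequence, not the sub-sequence.
final case class RecodedSeq[Obs](original: IndexedSeq[Obs], origIndex: IndexedSeq[Int]) {
  // Observation at relative offset `k` from sub-sequence position `pos`,
  // looked up in the original sequence; None if it falls off either end.
  def windowObs(pos: Int, k: Int): Option[Obs] = {
    val j = origIndex(pos) + k
    if (j >= 0 && j < original.length) Some(original(j)) else None
  }

  // Observations for a set of offsets such as (-1 to 1), skipping out-of-range ones.
  def window(pos: Int, offsets: Seq[Int]): Seq[Obs] =
    offsets.flatMap(k => windowObs(pos, k))
}
```

For example, with `original = Vector("the", "big", "dog", "barked")` and `origIndex = Vector(1, 3)`, `windowObs(1, -1)` returns `Some("dog")`, the word just before "barked" in the original sequence, rather than "big", its predecessor in the sub-sequence.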
A feature function that subclasses (Int,SourceSequence[Obs],Int) => FeatureReturn. Essentially, it is a function that takes three arguments and returns a FeatureReturn, and that also has a string name. Named feature functions are useful for tracing feature application and provide a means to compare feature functions.
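A minimal Scala sketch of such a named feature function, assuming simplified stand-ins for the library types: `SourceSequence[Obs]` is modeled as `IndexedSeq[Obs]` and `FeatureReturn` as a list of (feature name, value) pairs. The class name `NamedFeatureFn`, basing equality on the name alone, and reading the third argument as the current position are assumptions made for illustration.

```scala
// Hypothetical stand-ins for the library's types (assumptions, not the real API):
// a source sequence is an indexed sequence of observations, and a FeatureReturn
// is a list of (feature name, value) pairs.
object NamedFeatures {
  type SourceSequence[Obs] = IndexedSeq[Obs]
  type FeatureReturn = List[(String, Double)]

  // A three-argument function that also carries a name. Equality and hashing use
  // the name only, so two feature functions with the same name compare equal.
  abstract class NamedFeatureFn[Obs](val name: String)
      extends ((Int, SourceSequence[Obs], Int) => FeatureReturn) {
    override def equals(other: Any): Boolean = other match {
      case f: NamedFeatureFn[_] => f.name == name
      case _                    => false
    }
    override def hashCode: Int = name.hashCode
    override def toString: String = s"NamedFeatureFn($name)"
  }

  // Example: fires the word at position `pos` with value 1.0. The first Int
  // argument is ignored here; its meaning in the library is not assumed.
  val wdFn: NamedFeatureFn[String] = new NamedFeatureFn[String]("wdFn") {
    def apply(unused: Int, seq: SourceSequence[String], pos: Int): FeatureReturn =
      List(("wd=" + seq(pos), 1.0))
  }
}
```

Because the name doubles as the identity, a trace of fired features can report which function produced each one, and duplicate definitions can be detected by comparing functions.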
A FeatureManager defines a set of common feature function definitions. It also holds a list of actual feature function objects that are applied to a sequence of observations. Sequence labeling applications need to create a concrete subclass of FeatureManager that specifies exactly which feature functions will be applied. This class defines a simple DSL (Domain-Specific Language) that allows the set of feature functions for a particular application to be clearly specified.
There are also higher-order feature functions that take other feature functions as arguments, so that more complicated feature extraction functionality can be specified easily and compactly. The FeatureManager is type-parameterized by Obs, which represents the observation type, and Info, which denotes the type of the auxiliary information (if any) associated with each observation.
An application-specific FeatureManager should subclass this class and specify, within the body of the class definition, a set of feature functions, where each function is described as a single expression that returns an instance of FeatureReturn. Below is an example:
object MyFeatureManager extends FeatureManager[String, Map[String, String]] {
  "wdFn" as wdFn
  "capRegFn" as regexpFn("Capitalized", "[A-Z].*".r)
  "wdNgrm1" as wdFn ngram (-2 to 0)
  "wdNgrm2" as wdFn ngram (-1, 0, 1)
  "cross1" as wdFn ngram (-1, 0) cross (regexpFn("EndIn-ed", ".*ed$".r) over (-2 to 2))
}
Each top-level statement consists of a String followed by the method name "as", which is in turn followed by a feature function. That feature function may be either 1) a simple feature function such as wdFn or 2) a complex feature function created by composing other feature functions. For example, the feature function named "wdNgrm1" creates an n-gram consisting of the concatenation of the features that result from applying the wdFn feature function at positions -2, -1 and 0 relative to the current position. The "cross1" feature function is a more complicated instance: it takes the n-gram computed from the words at -1 and 0 and conjoins that feature with all the features that result from applying, over the relative positions -2, -1, 0, 1, 2, the regular expression function that returns the feature name "EndIn-ed" when its pattern is matched.
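To show how left-associative infix methods make a DSL like this work, here is a self-contained toy model in Scala. It is a sketch under stated assumptions, not the library's implementation: `FeatureFn`, `ToyFeatureManager`, the full-match semantics of `regexpFn`, and the "_" (n-gram), "@offset" (over) and "&" (cross) naming conventions are all hypothetical; feature functions see only the word sequence and position, and the Info type parameter is omitted.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.matching.Regex

// Toy model of the feature DSL. FeatureFn, ToyFeatureManager and the "_", "@"
// and "&" naming conventions are hypothetical, not the library's implementation.
object ToyDsl {
  type FeatureReturn = List[(String, Double)]

  // A feature function over a word sequence at a position. Positions outside the
  // sequence fire nothing, so windows that run off either end are safe.
  final class FeatureFn(f: (IndexedSeq[String], Int) => FeatureReturn) {
    def apply(seq: IndexedSeq[String], pos: Int): FeatureReturn =
      if (pos >= 0 && pos < seq.length) f(seq, pos) else Nil

    // Concatenate the features fired at each relative offset (a cartesian product
    // if a position fires several); nothing fires if any offset fires nothing.
    def ngram(offsets: Int*): FeatureFn = new FeatureFn((seq, pos) =>
      offsets.foldLeft(List(("", 1.0))) { (acc, k) =>
        for ((n1, v1) <- acc; (n2, v2) <- apply(seq, pos + k))
          yield (if (n1.isEmpty) n2 else n1 + "_" + n2, v1 * v2)
      })
    def ngram(r: Range): FeatureFn = ngram(r: _*)

    // Union of the features fired at each offset, each tagged with its offset.
    def over(r: Range): FeatureFn = new FeatureFn((seq, pos) =>
      r.toList.flatMap(k => apply(seq, pos + k).map { case (n, v) => (s"$n@$k", v) }))

    // Conjoin every feature of this function with every feature of `other`.
    def cross(other: FeatureFn): FeatureFn = new FeatureFn((seq, pos) =>
      for ((n1, v1) <- apply(seq, pos); (n2, v2) <- other(seq, pos))
        yield (n1 + "&" + n2, v1 * v2))
  }

  // Fires the current word.
  val wdFn: FeatureFn = new FeatureFn((seq, pos) => List(("wd=" + seq(pos), 1.0)))

  // Fires `name` when the current word matches `re` in full.
  def regexpFn(name: String, re: Regex): FeatureFn =
    new FeatureFn((seq, pos) =>
      if (re.pattern.matcher(seq(pos)).matches()) List((name, 1.0)) else Nil)

  abstract class ToyFeatureManager {
    private val registered = ListBuffer.empty[Named]

    // Alphanumeric infix methods share one precedence and associate to the left,
    // so `"n" as f ngram (...)` parses as `("n" as f).ngram(...)`. Named therefore
    // mirrors the combinators and updates its registered function in place.
    final class Named(val name: String, var fn: FeatureFn) {
      private def update(g: FeatureFn => FeatureFn): Named = { fn = g(fn); this }
      def ngram(offsets: Int*): Named = update(f => f.ngram(offsets: _*))
      def ngram(r: Range): Named = update(f => f.ngram(r))
      def over(r: Range): Named = update(f => f.over(r))
      def cross(other: FeatureFn): Named = update(f => f.cross(other))
    }

    implicit class NameOps(name: String) {
      def as(fn: FeatureFn): Named = { val n = new Named(name, fn); registered += n; n }
    }

    // All features fired by every registered function at position `pos`.
    def extract(seq: IndexedSeq[String], pos: Int): FeatureReturn =
      registered.toList.flatMap(n => n.fn(seq, pos))
  }
}

import ToyDsl._

// The example manager from the text, minus the Info type parameter.
object MyToyManager extends ToyFeatureManager {
  "wdFn" as wdFn
  "capRegFn" as regexpFn("Capitalized", "[A-Z].*".r)
  "wdNgrm1" as wdFn ngram (-2 to 0)
  "wdNgrm2" as wdFn ngram (-1, 0, 1)
  "cross1" as wdFn ngram (-1, 0) cross (regexpFn("EndIn-ed", ".*ed$".r) over (-2 to 2))
}
```

With the words "John walked home", extracting at position 1 fires wd=walked, the n-gram wd=John_wd=walked_wd=home from wdNgrm2, and the conjunction wd=John_wd=walked&EndIn-ed@0 from cross1; wdNgrm1 fires nothing there because offset -2 falls before the start of the sequence.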
A list of (feature name, value) pairs for the features that fired as a result of applying a feature function, FeatureFn.
Specifies whether the features are node or edge features.
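In a first-order linear-chain CRF, the usual distinction (an assumption here, since the source does not describe the parameter layout) is that a node feature is paired with the current label only, while an edge feature is paired with a (previous label, current label) transition. The Scala sketch below makes the consequence for parameter counts concrete; `FeatureKind`, `NodeFeature`, `EdgeFeature` and `weightsPerFeature` are hypothetical names, not the library's API.

```scala
// Hypothetical sketch (assumed first-order linear-chain CRF): a node feature is
// paired with the current label, an edge feature with a (previous, current)
// label pair. Names here are illustrative, not the library's API.
object FeatureKinds {
  sealed trait FeatureKind
  case object NodeFeature extends FeatureKind
  case object EdgeFeature extends FeatureKind

  // Number of weights a single feature contributes when there are numLabels labels.
  def weightsPerFeature(kind: FeatureKind, numLabels: Int): Int = kind match {
    case NodeFeature => numLabels
    case EdgeFeature => numLabels * numLabels
  }
}
```

With 5 labels, a node feature would carry 5 weights and an edge feature 25, which is why edge features are usually kept to a small set.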
Implements methods that facilitate generating sequences from standoff representations of data and annotations using a simple JSON-based encoding. This trait is to be mixed into appropriate subclasses of SeqGen, such as subclasses specialized for training or decoding.
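The core of working with standoff data is projecting character-offset annotations onto tokens. The Scala sketch below shows that step only, under stated assumptions: the JSON has already been decoded into a signal string plus annotations with start/end offsets and a label, tokens are whitespace-delimited, and unannotated tokens get an illustrative "O" label. `Annot`, `Token`, `tokenize` and `labelTokens` are hypothetical names, not the library's API or JSON schema.

```scala
// Hypothetical sketch: project standoff annotations (character offsets into a
// signal string) onto whitespace-delimited tokens. Decoding the JSON itself is
// omitted; Annot and the "O" default label are illustrative assumptions.
object Standoff {
  final case class Annot(start: Int, end: Int, label: String) // end is exclusive
  final case class Token(text: String, start: Int, end: Int)

  // Whitespace tokenizer that keeps each token's character offsets.
  def tokenize(signal: String): Vector[Token] =
    "\\S+".r.findAllMatchIn(signal).map(m => Token(m.matched, m.start, m.end)).toVector

  // A token takes the label of the first annotation that fully contains it.
  def labelTokens(signal: String, annots: Seq[Annot], outside: String = "O"): Vector[(String, String)] =
    tokenize(signal).map { t =>
      val label = annots.find(a => a.start <= t.start && t.end <= a.end).map(_.label).getOrElse(outside)
      (t.text, label)
    }
}
```

For the signal "John Smith lives in Boston" with annotations (0, 10, PER) and (20, 26, LOC), this yields John/PER, Smith/PER, lives/O, in/O, Boston/LOC.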
Extensions of "linear" CRFs that allow (some) features to feed into a set of N hidden neurons. How this will work: only some of the input features will be fed into the gates, and each FeatureType will keep track of whether features of that type go through the gates. After feature extraction has completed, when the Neural CRF is being started up, extra parameters are added: specifically, M extra parameters, where M = numGates * numNeuralFeatures. They are appended so that the parameter associated with feature i going into gate g is at index nfs * (g+1) + i, where nfs is the number of input features and g is zero-indexed.
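The index formula can be checked with a small Scala sketch; the object and function names below are illustrative. As written, each gate block is nfs wide and gate 0 starts at index nfs, which leaves indices 0 through nfs - 1 for the ordinary feature weights; the M extra parameters fill those blocks without gaps only when numNeuralFeatures equals nfs.

```scala
// Sketch of the parameter layout described above. Gate g (zero-indexed)
// occupies a block of nfs indices starting at nfs * (g + 1). Names are illustrative.
object NeuralLayout {
  // Index of the parameter connecting input feature i to gate g.
  def gateParamIndex(nfs: Int, g: Int, i: Int): Int = {
    require(g >= 0 && i >= 0 && i < nfs, "need g >= 0 and 0 <= i < nfs")
    nfs * (g + 1) + i
  }

  // Number of extra parameters, M = numGates * numNeuralFeatures.
  def extraParams(numGates: Int, numNeuralFeatures: Int): Int = numGates * numNeuralFeatures
}
```

With nfs = 10, feature 3 feeding gate 0 sits at index 13 (the gate-0 block spans 10 to 19), and the same feature feeding gate 1 sits at index 23.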
Encapsulates functionality for creating labeled sequences. This includes extracting features over elements in a sequence.
A Crf that uses a sparse internal representation suitable for Stochastic Gradient Descent learning methods.
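One reason a sparse representation suits stochastic gradient descent is that the gradient for a single training example is nonzero only for the features that fired, so each update can touch just those weights. The Scala sketch below shows such a sparse update step; `SparseSgd` and `step` are hypothetical names, and regularization and learning-rate schedules, which practical SGD training typically needs, are deliberately omitted.

```scala
// Hypothetical sketch of why a sparse representation suits SGD: each example's
// gradient is nonzero only for the features that fired, so an update touches
// O(active features) weights instead of the whole parameter vector.
// Regularization and learning-rate schedules are deliberately omitted.
object SparseSgd {
  // Descent step on a loss: w(i) -= learningRate * grad(i), only for indices in sparseGrad.
  def step(weights: Array[Double], sparseGrad: Map[Int, Double], learningRate: Double): Unit =
    sparseGrad.foreach { case (idx, g) => weights(idx) -= learningRate * g }
}
```

For instance, starting from weights (1.0, 2.0, 3.0, 4.0) with gradient entries {1 -> 0.5, 3 -> -1.0} and a learning rate of 0.5, only indices 1 and 3 change, to 1.75 and 4.5.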
Deserialization functionality for handling text input. This uses a customized lexer to identify tokens within a body of text, identifies sentence/zone boundaries, and produces annotations as inline tags. It does not, however, use an XML parser, so the files it produces and consumes do not necessarily conform to XML.
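The Scala sketch below illustrates reading inline tags with a small regex lexer instead of an XML parser, which is why input such as "AT&T" (not well-formed XML) is still accepted. It is a sketch under stated assumptions: the tag syntax, whitespace tokenization and the "O" label for untagged tokens are illustrative, sentence/zone boundary detection is omitted, and `InlineTags` and `read` are hypothetical names, not the library's lexer.

```scala
// Hypothetical sketch: read text with inline tags such as "<PER>John Smith</PER>"
// using a small regex lexer instead of an XML parser, so input that is not
// well-formed XML (e.g. "AT&T") is still accepted. Tag syntax and "O" are assumptions.
object InlineTags {
  // A tag (<X> or </X>), a run of non-space non-'<' characters, or a stray '<'.
  private val Lexeme = """</?([A-Za-z][\w-]*)>|[^\s<]+|<""".r

  def read(text: String, outside: String = "O"): Vector[(String, String)] = {
    var current: Option[String] = None // label of the currently open tag, if any
    val out = Vector.newBuilder[(String, String)]
    for (m <- Lexeme.findAllMatchIn(text)) {
      val tagName = m.group(1) // non-null only when the lexeme is a tag
      if (tagName == null) out += ((m.matched, current.getOrElse(outside)))
      else if (m.matched.startsWith("</")) current = None
      else current = Some(tagName)
    }
    out.result()
  }
}
```

Reading "<PER>John Smith</PER> works at <ORG>AT&T</ORG> ." yields John/PER, Smith/PER, works/O, at/O, AT&T/ORG and ./O.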
Implements methods that facilitate generating sequences from inline XML representations of data and annotations. This trait is to be mixed into appropriate subclasses of SeqGen, such as subclasses specialized for training or decoding.
Creates a Crf.
Object used to initialize a Neural CRF. Extra parameters and book-keeping need to be set up.
Top abstract class representing the core elements and functionality of a sequence-structured Conditional Random Field. A Crf object is created after feature extraction has occurred.