ml - org.allenai.nlpstack.parse.poly.ml

Type Members

case class BrownClusters(clusters: Iterable[(Symbol, Seq[Int])]) extends Product with Serializable
case class BrownClustersTagger(clusters: Seq[BrownClusters]) extends TokenTagger with Product with Serializable

The BrownClustersTagger tags the tokens of a sentence with their Brown clusters.
case class DatastoreGoogleNGram(groupName: String, artifactName: String, version: Int, frequencyCutoff: Int) extends Product with Serializable

A class that parses Google N-Gram data (http://commondatastorage.googleapis.com/books/syntactic-ngrams/index.html) to provide information about a requested n-gram. Takes the datastore location details for a data directory and parses each file, which is expected to be in the following format (from https://docs.google.com/document/d/14PWeoTkrnKk9H8_7CfVbdvuoFZ7jYivNTkBX2Hj7qLw/edit):
head_word<TAB>syntactic-ngram<TAB>total_count<TAB>counts_by_year
The counts_by_year format is a tab-separated list of year<comma>count items. Years are sorted in ascending order, and only years with non-zero counts are included.
The syntactic-ngram format is a space-separated list of tokens, where each token has the format "word/pos-tag/dep-label/head-index". The word field can contain any non-whitespace character. The other fields can contain any non-whitespace character except for '/'. pos-tag is a Penn-Treebank part-of-speech tag. dep-label is a stanford-basic-dependencies label. head-index is an integer pointing to the head of the current token: 1 refers to the first token in the list, 2 to the second, and 0 indicates that the head is the root of the fragment.
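The line format described above can be illustrated with a minimal parsing sketch. It is not the parser used by DatastoreGoogleNGram. The object name SyntacticNgramSketch and the helpers parseToken and parseLine are hypothetical, the two case classes are local stand-ins that mirror the signatures listed on this page, and storing total_count in the frequency field is an assumption.

```scala
import scala.util.Try

object SyntacticNgramSketch {
  // Local stand-ins mirroring the case class signatures listed on this page.
  final case class SyntacticInfo(word: String, posTag: String, depLabel: String, headIndex: Int)
  final case class NgramInfo(syntacticNgram: Seq[SyntacticInfo], frequency: Long)

  // Parses one token of the form word/pos-tag/dep-label/head-index.
  // The word may itself contain '/', so the last three fields are taken
  // from the right and everything before them is treated as the word.
  def parseToken(token: String): Option[SyntacticInfo] = {
    val parts = token.split("/", -1)
    if (parts.length < 4) None
    else {
      val n = parts.length
      val word = parts.take(n - 3).mkString("/")
      Try(parts(n - 1).toInt).toOption.map { headIndex =>
        SyntacticInfo(word, parts(n - 3), parts(n - 2), headIndex)
      }
    }
  }

  // Parses one line: head_word<TAB>syntactic-ngram<TAB>total_count<TAB>counts_by_year.
  // The per-year counts are ignored here. Assumption: total_count becomes the frequency.
  def parseLine(line: String): Option[NgramInfo] = {
    val fields = line.split("\t", -1)
    if (fields.length < 3) None
    else {
      val tokens = fields(1).split(" ").toSeq.map(parseToken)
      val totalCount = Try(fields(2).toLong).toOption
      // Reject the whole line if any token is malformed.
      if (tokens.forall(_.isDefined)) totalCount.map(c => NgramInfo(tokens.flatten, c))
      else None
    }
  }
}
```

Returning Option keeps malformed lines from aborting a pass over a large data directory; a caller can drop the None results or log them.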
case class FeatureEncoding(featureNames: IndexedSeq[FeatureName]) extends Product with Serializable

Maps feature names to integers. Useful for serializing TrainingData instances for consumption by command-line machine learning tools.
featureNames
an indexed sequence of feature names
case class FeatureName(symbols: Seq[Symbol]) extends Product with Serializable

The name of a feature, represented as a list of Symbols.
symbols
the list of symbols comprising the feature name
case class FeatureVector(values: Seq[(FeatureName, Double)]) extends Product with Serializable

A mapping from feature names to values.
Unspecified feature names are assumed to correspond to a value of zero.
values
the map from feature names to values
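The zero default for unspecified feature names can be sketched as follows. The case classes are local stand-ins mirroring the signatures on this page, and the accessor valueOf is a hypothetical name, not necessarily a method of the real FeatureVector.

```scala
object FeatureVectorSketch {
  // Local stand-ins mirroring the case class signatures listed on this page.
  final case class FeatureName(symbols: Seq[Symbol])

  final case class FeatureVector(values: Seq[(FeatureName, Double)]) {
    // Index the pairs once so that lookups do not scan the sequence.
    private val valueMap: Map[FeatureName, Double] = values.toMap

    // Hypothetical accessor: names absent from the vector map to zero.
    def valueOf(name: FeatureName): Double = valueMap.getOrElse(name, 0.0)
  }
}
```

Because FeatureName is a case class, two names built from the same symbols compare equal and can serve as map keys.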
sealed trait GoogleUnigramTagType extends AnyRef
case class GoogleUnigramTagger(googleNgram: DatastoreGoogleNGram, tagType: GoogleUnigramTagType) extends TokenTagger with Product with Serializable

A SentenceTagger that tags tokens with unigram features from Google N-grams.
googleNgram
the Google N-grams resource
tagType
the Google N-grams tag type you want to create features for
case class LinearModel(coefficients: Seq[(FeatureName, Double)]) extends Product with Serializable

A weighted linear combination of features.
coefficients
map from feature names to weight coefficients
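As an illustration of a weighted linear combination, the sketch below scores a feature vector as the sum of coefficient times value over the features present in the vector. The method name score is hypothetical, the case classes are local stand-ins for the ones listed on this page, and treating a feature without a coefficient as having weight zero is an assumption.

```scala
object LinearModelSketch {
  // Local stand-ins mirroring the case class signatures listed on this page.
  final case class FeatureName(symbols: Seq[Symbol])
  final case class FeatureVector(values: Seq[(FeatureName, Double)])

  final case class LinearModel(coefficients: Seq[(FeatureName, Double)]) {
    private val weights: Map[FeatureName, Double] = coefficients.toMap

    // Hypothetical scoring method: sum of weight * value over the vector's features.
    // Assumption: a feature with no coefficient contributes nothing to the score.
    def score(vector: FeatureVector): Double =
      vector.values.map { case (name, value) => weights.getOrElse(name, 0.0) * value }.sum
  }
}
```

Iterating over the vector rather than over the coefficients keeps the cost proportional to the number of non-zero features in the input.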
case class NgramInfo(syntacticNgram: Seq[SyntacticInfo], frequency: Long) extends Product with Serializable
case class SyntacticInfo(word: String, posTag: String, depLabel: String, headIndex: Int) extends Product with Serializable

Utility case classes to represent information associated with an ngram in the Google Ngram corpus.
case class TrainingData(labeledVectors: Iterable[(FeatureVector, Int)]) extends Product with Serializable

Abstraction for a set of labeled feature vectors.
Provides various serialization options for different machine learning tools.
labeledVectors
a sequence of feature vectors labeled with integer outcomes
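One plausible serialization is a sparse "label index:value" text line per labeled vector, in the style of SVMlight and LIBSVM input files. The sketch below is an assumption about what such an export could look like, not the format TrainingData actually emits. The function toSparseLines is hypothetical, and the integer index of each feature is taken from its position in an indexed sequence of names, as in FeatureEncoding.

```scala
object TrainingDataSketch {
  // Local stand-ins mirroring the case class signatures listed on this page.
  final case class FeatureName(symbols: Seq[Symbol])
  final case class FeatureVector(values: Seq[(FeatureName, Double)])

  // Hypothetical export: one "label index:value ..." line per labeled vector.
  // Indices are 1-based and sorted in ascending order; features missing from
  // featureNames are skipped.
  def toSparseLines(
    labeledVectors: Iterable[(FeatureVector, Int)],
    featureNames: IndexedSeq[FeatureName]
  ): Seq[String] = {
    val indexOf: Map[FeatureName, Int] = featureNames.zipWithIndex.toMap
    labeledVectors.toSeq.map { case (vector, label) =>
      val entries = vector.values
        .collect { case (name, value) if indexOf.contains(name) => (indexOf(name) + 1, value) }
        .sortBy(_._1)
        .map { case (index, value) => s"$index:$value" }
      (label.toString +: entries).mkString(" ")
    }
  }
}
```

Sorting by index matters because several command-line tools expect feature indices in increasing order within a line.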
case class UnigramInfo(syntacticUnigram: SyntacticInfo, frequency: Long) extends Product with Serializable

Encapsulates unigram info pertaining to a word. Unlike the general-purpose NgramInfo class, which holds a Seq of SyntacticInfo objects, this class holds a single SyntacticInfo representing the info for a single gram.
case class Verbnet(groupName: String, artifactName: String, version: Int) extends Product with Serializable

A class that uses JVerbnet, a third-party wrapper library for Verbnet data (http://projects.csail.mit.edu/jverbnet/), to quickly look up various Verbnet features for a verb.
case class VerbnetTagger(verbnet: Verbnet, useSecondaryFrames: Boolean = false) extends SentenceTagger with Product with Serializable

A SentenceTagger that tags sentence tokens using Verbnet frames.
verbnet
the associated Verbnet resource
useSecondaryFrames
set to true if you want secondary (rather than primary) frames
case class WrapperClassifier(classifier: ProbabilisticClassifier, featureNameMap: Seq[(Int, FeatureName)]) extends Product with Serializable

A WrapperClassifier wraps a ProbabilisticClassifier (which uses integer-based feature names) in an interface that allows you to use the more natural org.allenai.nlpstack.parse.poly.ml FeatureVector format for classification. This is a trait that specific wrappers can extend.
class WrapperClassifierTrainer extends AnyRef

Trains a WrapperClassifier from training data.

Value Members

object BrownClusters extends Serializable
object DatastoreGoogleNGram extends Serializable

Companion object.
object FeatureEncoding extends Serializable
object FeatureName extends Serializable
object FeatureVector extends Serializable
object GoogleNGram

Object containing utility methods to parse a Google Ngram corpus. This is not specific to the type of corpus, i.e. whether unigram, bigram, etc.
object GoogleUnigram

Object encapsulating some functionality specific to unigrams. Used wherever features need to be constructed based on unigrams (Google Ngram Nodes).
object GoogleUnigramCpos extends GoogleUnigramTagType with Product with Serializable
object GoogleUnigramPos extends GoogleUnigramTagType with Product with Serializable
object GoogleUnigramTagType
object LinearModel extends Serializable
object Verbnet extends Serializable
object WrapperClassifier extends Serializable

Provides serialization and deserialization methods based on the runtime type of WrapperClassifier.
