org.allenai.nlpstack.parse.poly

package ml

Type Members

  1. case class BrownClusters(clusters: Iterable[(Symbol, Seq[Int])]) extends Product with Serializable

  2. case class BrownClustersTagger(clusters: Seq[BrownClusters]) extends TokenTagger with Product with Serializable

    The BrownClustersTagger tags the tokens of a sentence with their Brown clusters.

  3. case class DatastoreGoogleNGram(groupName: String, artifactName: String, version: Int, frequencyCutoff: Int) extends Product with Serializable

    A class that parses Google N-Gram data (http://commondatastorage.googleapis.com/books/syntactic-ngrams/index.html) to provide information about a requested n-gram.

    Takes the datastore location details for a data directory and parses each file, which is expected to be in the following format (from https://docs.google.com/document/d/14PWeoTkrnKk9H8_7CfVbdvuoFZ7jYivNTkBX2Hj7qLw/edit):

    head_word<TAB>syntactic-ngram<TAB>total_count<TAB>counts_by_year

    The counts_by_year format is a tab-separated list of year<comma>count items. Years are sorted in ascending order, and only years with non-zero counts are included.

    The syntactic-ngram format is a space-separated list of tokens; each token has the format “word/pos-tag/dep-label/head-index”. The word field can contain any non-whitespace character. The other fields can contain any non-whitespace character except for ‘/’. pos-tag is a Penn Treebank part-of-speech tag. dep-label is a Stanford basic dependencies label. head-index is an integer pointing to the head of the current token: 1 refers to the first token in the list, 2 to the second, and 0 indicates that the head is the root of the fragment.
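
    The line format described above can be parsed roughly as follows. This is a minimal sketch, not the library's own parser: NgramLineSketch, Token, ParsedLine, parseToken and parseLine are hypothetical names introduced for illustration. Because the word field may itself contain '/', the three trailing fields are split off from the right.

```scala
object NgramLineSketch {
  // Hypothetical stand-ins for illustration; not the library's own classes.
  final case class Token(word: String, posTag: String, depLabel: String, headIndex: Int)
  final case class ParsedLine(
    headWord: String,
    tokens: Seq[Token],
    totalCount: Long,
    countsByYear: Seq[(Int, Long)]
  )

  // Parse one "word/pos-tag/dep-label/head-index" token, peeling the last
  // three fields off from the right so that '/' inside the word survives.
  def parseToken(raw: String): Token = {
    val headSep = raw.lastIndexOf('/')
    val depSep = raw.lastIndexOf('/', headSep - 1)
    val posSep = raw.lastIndexOf('/', depSep - 1)
    require(posSep > 0, s"malformed token: $raw")
    Token(
      word = raw.substring(0, posSep),
      posTag = raw.substring(posSep + 1, depSep),
      depLabel = raw.substring(depSep + 1, headSep),
      headIndex = raw.substring(headSep + 1).toInt
    )
  }

  // Parse head_word<TAB>syntactic-ngram<TAB>total_count<TAB>counts_by_year,
  // where counts_by_year is itself a tab-separated list of year,count items.
  def parseLine(line: String): ParsedLine = {
    val fields = line.split("\t")
    require(fields.length >= 3, s"expected at least 3 tab-separated fields: $line")
    val tokens = fields(1).split(" ").toSeq.map(parseToken)
    val countsByYear = fields.drop(3).toSeq.map { item =>
      val sep = item.indexOf(',')
      require(sep > 0, s"malformed year,count item: $item")
      (item.substring(0, sep).toInt, item.substring(sep + 1).toLong)
    }
    ParsedLine(fields(0), tokens, fields(2).toLong, countsByYear)
  }
}
```

    With this sketch, NgramLineSketch.parseToken("a/b/NN/dobj/0") yields a token whose word is "a/b" and whose headIndex is 0, i.e. a token headed by the root of the fragment.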

  4. case class FeatureEncoding(featureNames: IndexedSeq[FeatureName]) extends Product with Serializable

    Maps feature names to integers. Useful for serializing TrainingData instances for consumption by command-line machine learning tools.

    featureNames

    an indexed sequence of feature names
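
    One way to picture such an encoding is a two-way lookup between a feature name and its position in the indexed sequence. The sketch below assumes exactly that; FeatureEncodingSketch, Name, Encoding, toIndex and toName are hypothetical names, and the library's actual methods may differ.

```scala
object FeatureEncodingSketch {
  // Hypothetical stand-in for a feature name: a list of symbols.
  final case class Name(symbols: Seq[Symbol])

  // Hypothetical encoding: a name's integer code is its position in `names`.
  final case class Encoding(names: IndexedSeq[Name]) {
    // Reverse lookup, built once: feature name -> index.
    private val indexByName: Map[Name, Int] = names.zipWithIndex.toMap

    // Integer code for a name, if the name is known to this encoding.
    def toIndex(name: Name): Option[Int] = indexByName.get(name)

    // Name for an integer code, if the code is in range.
    def toName(index: Int): Option[Name] = names.lift(index)
  }
}
```

    Returning Option in both directions is a design choice of this sketch: it makes unknown names and out-of-range indices explicit instead of throwing.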

  5. case class FeatureName(symbols: Seq[Symbol]) extends Product with Serializable

    The name of a feature, represented as a list of Symbols.

    symbols

    the list of symbols comprising the feature name

  6. case class FeatureVector(values: Seq[(FeatureName, Double)]) extends Product with Serializable

    A mapping from feature names to values.

    Unspecified feature names are assumed to correspond to a value of zero.

    values

    the map from feature names to values
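
    The zero default for unspecified names can be illustrated with a small stand-in. This is a sketch under assumptions: FeatureVectorSketch, Name, SparseVector and valueOf are hypothetical names and not the library's API.

```scala
object FeatureVectorSketch {
  // Hypothetical stand-in for a feature name: a list of symbols.
  final case class Name(symbols: Seq[Symbol])

  // Hypothetical sparse vector holding only the explicitly specified entries.
  final case class SparseVector(values: Seq[(Name, Double)]) {
    // If a name appears more than once, toMap keeps the last occurrence.
    private val lookup: Map[Name, Double] = values.toMap

    // Names absent from `values` read as 0.0.
    def valueOf(name: Name): Double = lookup.getOrElse(name, 0.0)
  }
}
```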

  7. sealed trait GoogleUnigramTagType extends AnyRef

  8. case class GoogleUnigramTagger(googleNgram: DatastoreGoogleNGram, tagType: GoogleUnigramTagType) extends TokenTagger with Product with Serializable

    A TokenTagger that tags tokens with unigram features from Google N-grams.

    googleNgram

    the Google N-grams resource

    tagType

    the Google N-grams tag type you want to create features for

  9. case class LinearModel(coefficients: Seq[(FeatureName, Double)]) extends Product with Serializable

    A weighted linear combination of features.

    coefficients

    map from feature names to weight coefficients
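
    A weighted linear combination amounts to summing coefficient times value over the features of a vector. The sketch below shows that arithmetic only; LinearModelSketch, Name and score are hypothetical names, and the library's own scoring method is not shown on this page.

```scala
object LinearModelSketch {
  // Hypothetical stand-in for a feature name: a list of symbols.
  final case class Name(symbols: Seq[Symbol])

  // score(v) = sum over features f in v of coefficient(f) * value(f).
  // Features without a coefficient contribute zero, as do features
  // that are absent from the vector.
  def score(coefficients: Seq[(Name, Double)], vector: Seq[(Name, Double)]): Double = {
    val weights = coefficients.toMap
    vector.map { case (name, value) => weights.getOrElse(name, 0.0) * value }.sum
  }
}
```

    For example, with coefficients a = 2.0 and b = -1.0, a vector with a = 3.0, b = 0.5 and an unweighted feature c = 10.0 scores 2.0 * 3.0 + (-1.0) * 0.5 + 0 = 5.5.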

  10. case class NgramInfo(syntacticNgram: Seq[SyntacticInfo], frequency: Long) extends Product with Serializable

  11. case class SyntacticInfo(word: String, posTag: String, depLabel: String, headIndex: Int) extends Product with Serializable

    Utility case classes to represent information associated with an ngram in the Google Ngram corpus.

  12. case class TrainingData(labeledVectors: Iterable[(FeatureVector, Int)]) extends Product with Serializable

    Abstraction for a set of labeled feature vectors.

    Provides various serialization options for different machine learning tools.

    labeledVectors

    a sequence of feature vectors labeled with integer outcomes
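
    This page does not list which serialization formats are supported. As an illustration only, the sketch below writes integer-encoded labeled vectors in a sparse "label index:value" text layout of the kind accepted by SVMlight-style command-line tools. TrainingDataSketch and toSparseLines are hypothetical names, and the choice of format is an assumption.

```scala
object TrainingDataSketch {
  // One line per example: "<label> <index>:<value> ...", indices ascending.
  // Assumes feature names were already mapped to integers (for instance by
  // an encoding such as FeatureEncoding); SVMlight-style tools typically
  // expect those indices to start at 1.
  def toSparseLines(labeled: Seq[(Map[Int, Double], Int)]): Seq[String] =
    labeled.map { case (features, label) =>
      val columns = features.toSeq.sortBy(_._1).map { case (index, value) => s"$index:$value" }
      (label.toString +: columns).mkString(" ")
    }
}
```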

  13. case class UnigramInfo(syntacticUnigram: SyntacticInfo, frequency: Long) extends Product with Serializable

    Encapsulates unigram info pertaining to a word. Instead of a seq of SyntacticInfo objects in the general purpose NgramInfo class, here we have a single SyntacticInfo representing the info for a single gram.

  14. case class Verbnet(groupName: String, artifactName: String, version: Int) extends Product with Serializable

    A class that uses JVerbnet, a third-party wrapper library for Verbnet data (http://projects.csail.mit.edu/jverbnet/), to quickly look up various Verbnet features for a verb.

  15. case class VerbnetTagger(verbnet: Verbnet, useSecondaryFrames: Boolean = false) extends SentenceTagger with Product with Serializable

    A SentenceTagger that tags sentence tokens using Verbnet frames.

    verbnet

    the associated Verbnet resource

    useSecondaryFrames

    set to true if you want secondary (rather than primary) frames

  16. case class WrapperClassifier(classifier: ProbabilisticClassifier, featureNameMap: Seq[(Int, FeatureName)]) extends Product with Serializable

    A WrapperClassifier wraps a ProbabilisticClassifier (which uses integer-based feature names) in an interface that allows you to use the more natural org.allenai.nlpstack.parse.poly.ml FeatureVector format for classification. This is a trait that specific wrappers can extend.
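
    The wrapping idea can be sketched as translating a name-keyed vector into an integer-keyed one and then delegating to the inner classifier. Everything below is hypothetical: WrapperSketch, Name, IntClassifier, Wrapper and both classify methods are stand-ins, since the real ProbabilisticClassifier interface is not shown on this page.

```scala
object WrapperSketch {
  // Hypothetical stand-in for a feature name: a list of symbols.
  final case class Name(symbols: Seq[Symbol])

  // Hypothetical integer-keyed classifier interface (an assumption,
  // not the real ProbabilisticClassifier API).
  trait IntClassifier {
    def classify(features: Map[Int, Double]): Int
  }

  // Hypothetical wrapper: accepts name-keyed vectors, delegates integer-keyed ones.
  final case class Wrapper(classifier: IntClassifier, featureNameMap: Seq[(Int, Name)]) {
    // Invert the (index, name) pairs so names can be looked up directly.
    private val indexByName: Map[Name, Int] = featureNameMap.map(_.swap).toMap

    def classify(vector: Seq[(Name, Double)]): Int = {
      // Names with no known index are dropped before delegation.
      val encoded = vector.collect {
        case (name, value) if indexByName.contains(name) => indexByName(name) -> value
      }.toMap
      classifier.classify(encoded)
    }
  }
}
```

    Dropping unknown names is a choice made by this sketch; it matches the convention that unspecified features are treated as zero.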

  17. class WrapperClassifierTrainer extends AnyRef

    Trains a WrapperClassifier from training data.

Value Members

  1. object BrownClusters extends Serializable

  2. object DatastoreGoogleNGram extends Serializable

    Companion object.

  3. object FeatureEncoding extends Serializable

  4. object FeatureName extends Serializable

  5. object FeatureVector extends Serializable

  6. object GoogleNGram

    Object containing utility methods to parse a Google Ngram corpus. This is not specific to the type of corpus, i.e. whether unigram, bigram, etc.

  7. object GoogleUnigram

    Object encapsulating some functionality specific to unigrams. Used wherever features need to be constructed based on unigrams (Google Ngram Nodes).

  8. object GoogleUnigramCpos extends GoogleUnigramTagType with Product with Serializable

  9. object GoogleUnigramPos extends GoogleUnigramTagType with Product with Serializable

  10. object GoogleUnigramTagType

  11. object LinearModel extends Serializable

  12. object Verbnet extends Serializable

  13. object WrapperClassifier extends Serializable

    Provides serialization and deserialization methods based on the runtime type of the WrapperClassifier.
