org.apache.spark.mllib

feature

package feature

Type Members

  1. class HashingTF extends Serializable

    :: Experimental :: Maps a sequence of terms to their term frequencies using the hashing trick.

    Annotations
    @Experimental()
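
    The hashing trick named above can be sketched in plain Scala, without any Spark dependency. This is an illustration under stated assumptions, not the HashingTF source: the object name, the use of `hashCode`, and the sparse `Map` output are all choices made for this sketch.

    ```scala
    object HashingTrickSketch {
      // Non-negative modulus, so negative hash codes still land in [0, mod).
      def nonNegativeMod(x: Int, mod: Int): Int = {
        val raw = x % mod
        if (raw < 0) raw + mod else raw
      }

      // Map each term to a bucket index through its hash code, then count
      // how many terms fall into each bucket (the term frequencies).
      // Assumption: hashCode is the hash function; distinct terms may collide.
      def termFrequencies(terms: Seq[String], numFeatures: Int): Map[Int, Double] =
        terms
          .groupBy(term => nonNegativeMod(term.hashCode, numFeatures))
          .map { case (index, group) => index -> group.size.toDouble }
    }
    ```

    No vocabulary is built or stored: the feature index is computed directly from the term, at the cost of possible collisions when numFeatures is small.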
  2. class IDF extends AnyRef

    :: Experimental :: Inverse document frequency (IDF). The standard formulation is used: idf = log((m + 1) / (d(t) + 1)), where m is the total number of documents and d(t) is the number of documents that contain term t.

    This implementation supports filtering out terms that do not appear in a minimum number of documents (controlled by the variable minDocFreq). For terms that appear in fewer than minDocFreq documents, the IDF is set to 0, so their TF-IDF values are also 0.

    Annotations
    @Experimental()
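
    The formula and the minDocFreq filter described above can be written as a small standalone function. This is a sketch only; the object and parameter names (`numDocs` for m, `docFreq` for d(t)) are chosen for illustration and are not part of the IDF class.

    ```scala
    object IdfSketch {
      // idf = log((m + 1) / (d(t) + 1)), with m = numDocs and d(t) = docFreq.
      // Terms appearing in fewer than minDocFreq documents get an IDF of 0.
      def idf(numDocs: Long, docFreq: Long, minDocFreq: Int = 0): Double =
        if (docFreq < minDocFreq) 0.0
        else math.log((numDocs + 1.0) / (docFreq + 1.0))

      // TF-IDF is the term frequency scaled by the IDF of the term.
      def tfIdf(termFreq: Double, numDocs: Long, docFreq: Long, minDocFreq: Int = 0): Double =
        termFreq * idf(numDocs, docFreq, minDocFreq)
    }
    ```

    Adding 1 to the numerator and the denominator keeps the ratio finite for a term that appears in no document, and a term that appears in every document gets an IDF of log(1) = 0.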
  3. class IDFModel extends Serializable

    :: Experimental :: Represents an IDF model that can transform term frequency vectors.

    Annotations
    @Experimental()
  4. class Normalizer extends VectorTransformer

    :: Experimental :: Normalizes samples individually to unit Lp norm

    For any 1 <= p < Double.PositiveInfinity, normalizes samples using sum(abs(vector)^p)^(1/p) as the norm.

    For p = Double.PositiveInfinity, max(abs(vector)) will be used as norm for normalization.

    Annotations
    @Experimental()
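
    The two norm cases described above can be sketched on plain arrays. This is an illustration, not the Normalizer source; in particular, returning a zero vector unchanged is an assumption made here to avoid dividing by zero.

    ```scala
    object LpNormSketch {
      // Lp norm: sum(abs(v)^p)^(1/p) for finite p, max(abs(v)) for p = infinity.
      def norm(v: Array[Double], p: Double): Double = {
        require(p >= 1.0, "p must be >= 1")
        if (p == Double.PositiveInfinity)
          v.foldLeft(0.0)((acc, x) => math.max(acc, math.abs(x)))
        else
          math.pow(v.map(x => math.pow(math.abs(x), p)).sum, 1.0 / p)
      }

      // Divide every component by the norm so the result has unit Lp norm.
      // Assumption: a vector with zero norm is returned as an unchanged copy.
      def normalize(v: Array[Double], p: Double): Array[Double] = {
        val n = norm(v, p)
        if (n == 0.0) v.clone() else v.map(_ / n)
      }
    }
    ```

    For example, the vector (3, 4) has L2 norm 5, L1 norm 7, and infinity norm 4, so the three normalizations give different results for the same input.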
  5. class StandardScaler extends Logging

    :: Experimental :: Standardizes features by removing the mean and scaling to unit variance using column summary statistics on the samples in the training set.

    Annotations
    @Experimental()
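
    The fit-then-transform idea described above can be sketched on in-memory samples. This is not the StandardScaler source. The sketch assumes the sample (n - 1) standard deviation, always applies both centering and scaling, and maps zero-variance columns to 0; all names are illustrative.

    ```scala
    object StandardScalerSketch {
      // Column statistics learned from the training set.
      final case class Model(mean: Array[Double], std: Array[Double]) {
        // Remove the column mean and divide by the column standard deviation.
        // Assumption: columns with zero variance are mapped to 0.0.
        def transform(v: Array[Double]): Array[Double] =
          v.indices.map { j =>
            if (std(j) == 0.0) 0.0 else (v(j) - mean(j)) / std(j)
          }.toArray
      }

      // Compute per-column mean and sample standard deviation.
      def fit(samples: Seq[Array[Double]]): Model = {
        require(samples.size > 1, "need at least two samples")
        val n = samples.size
        val dim = samples.head.length
        val mean = Array.tabulate(dim)(j => samples.map(_(j)).sum / n)
        val std = Array.tabulate(dim) { j =>
          math.sqrt(samples.map(s => math.pow(s(j) - mean(j), 2)).sum / (n - 1))
        }
        Model(mean, std)
      }
    }
    ```

    The statistics come from the training samples only; the resulting model is then applied unchanged to any other vector.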
  6. class StandardScalerModel extends VectorTransformer

    :: Experimental :: Represents a StandardScaler model that can transform vectors.

    Annotations
    @Experimental()
  7. trait VectorTransformer extends Serializable

    :: DeveloperApi :: Trait for transformation of a vector

    Annotations
    @DeveloperApi()
  8. class Word2Vec extends Serializable with Logging

    :: Experimental :: Word2Vec creates vector representations of words in a text corpus. The algorithm first constructs a vocabulary from the corpus and then learns a vector representation for each word in the vocabulary. These representations can be used as features in natural language processing and machine learning algorithms.

    The implementation uses the skip-gram model and trains it with the hierarchical softmax method. The variable names in the implementation match those in the original C implementation.

    The original C implementation is available at https://code.google.com/p/word2vec/ and the research papers are "Efficient Estimation of Word Representations in Vector Space" and "Distributed Representations of Words and Phrases and their Compositionality".

    Annotations
    @Experimental()
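
    The first two stages mentioned above, building a vocabulary and forming skip-gram training examples, can be sketched as follows. This is a simplified illustration, not the Word2Vec source: `minCount` and `window` are hypothetical parameters, the window size is fixed, and the training step (hierarchical softmax) is omitted.

    ```scala
    object SkipGramSketch {
      // Vocabulary: word -> corpus count, dropping words seen fewer than
      // minCount times (minCount is a hypothetical parameter for this sketch).
      def buildVocab(corpus: Seq[Seq[String]], minCount: Int): Map[String, Int] =
        corpus.flatten
          .groupBy(identity)
          .map { case (word, occurrences) => word -> occurrences.size }
          .filter { case (_, count) => count >= minCount }

      // Skip-gram examples: each word is paired with every other word that
      // lies within `window` positions of it in the same sentence.
      def pairs(sentence: Seq[String], window: Int): Seq[(String, String)] =
        for {
          i <- sentence.indices
          j <- math.max(0, i - window) to math.min(sentence.length - 1, i + window)
          if j != i
        } yield (sentence(i), sentence(j))
    }
    ```

    Each (center, context) pair is one training example in which the model learns to predict the context word from the center word.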
  9. class Word2VecModel extends Serializable

    :: Experimental :: Word2Vec model

    Annotations
    @Experimental()
