Package

org.apache.spark.examples

mllib

package mllib

Type Members

  1. abstract class AbstractParams[T] extends AnyRef

    Abstract class for parameter case classes. This overrides the toString method to print all case class fields by name and value.

    T: Concrete parameter class.
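
    One way such a toString override could work is sketched below using Java reflection. ParamsSketch and DemoParams are hypothetical names introduced only for illustration, and the real AbstractParams may be implemented differently.

```scala
// Hypothetical stand-in for AbstractParams; not the Spark implementation.
abstract class ParamsSketch[T] {
  /** Prints every declared field of the concrete class as "name:<tab>value". */
  override def toString: String = {
    val lines = getClass.getDeclaredFields.toSeq
      .filterNot(_.getName.contains("$")) // skip compiler-generated fields
      .map { field =>
        field.setAccessible(true) // case class fields are private at the JVM level
        s"  ${field.getName}:\t${field.get(this)}"
      }
    lines.mkString("{\n", ",\n", "\n}")
  }
}

// Hypothetical parameter class used only for illustration.
case class DemoParams(input: String = "data.txt", k: Int = 2)
  extends ParamsSketch[DemoParams]
```

    Because the base class already supplies a concrete toString, the compiler does not generate the usual case class toString for DemoParams, so printing an instance lists its fields by name.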

  2. final class JavaALS extends AnyRef

  3. class JavaAssociationRulesExample extends AnyRef

  4. class JavaBinaryClassificationMetricsExample extends AnyRef

  5. class JavaBisectingKMeansExample extends AnyRef

  6. class JavaChiSqSelectorExample extends AnyRef

  7. class JavaCorrelationsExample extends AnyRef

  8. class JavaElementwiseProductExample extends AnyRef

  9. class JavaGaussianMixtureExample extends AnyRef

  10. class JavaGradientBoostingClassificationExample extends AnyRef

  11. class JavaGradientBoostingRegressionExample extends AnyRef

  12. class JavaHypothesisTestingExample extends AnyRef

  13. class JavaHypothesisTestingKolmogorovSmirnovTestExample extends AnyRef

  14. class JavaIsotonicRegressionExample extends AnyRef

  15. class JavaKMeansExample extends AnyRef

  16. class JavaKernelDensityEstimationExample extends AnyRef

  17. class JavaLBFGSExample extends AnyRef

  18. class JavaLatentDirichletAllocationExample extends AnyRef

  19. class JavaLinearRegressionWithSGDExample extends AnyRef

  20. class JavaLogisticRegressionWithLBFGSExample extends AnyRef

  21. class JavaMultiLabelClassificationMetricsExample extends AnyRef

  22. class JavaMulticlassClassificationMetricsExample extends AnyRef

  23. class JavaNaiveBayesExample extends AnyRef

  24. class JavaPCAExample extends AnyRef

  25. class JavaPowerIterationClusteringExample extends AnyRef

  26. class JavaPrefixSpanExample extends AnyRef

  27. class JavaRandomForestClassificationExample extends AnyRef

  28. class JavaRandomForestRegressionExample extends AnyRef

  29. class JavaRankingMetricsExample extends AnyRef

  30. class JavaRecommendationExample extends AnyRef

  31. class JavaRegressionMetricsExample extends AnyRef

  32. class JavaSVDExample extends AnyRef

  33. class JavaSVMWithSGDExample extends AnyRef

  34. class JavaSimpleFPGrowth extends AnyRef

  35. class JavaStratifiedSamplingExample extends AnyRef

  36. class JavaStreamingTestExample extends AnyRef

  37. class JavaSummaryStatisticsExample extends AnyRef

Value Members

  1. object AssociationRulesExample

  2. object BinaryClassification

    An example app for binary classification. Run with

    bin/run-example org.apache.spark.examples.mllib.BinaryClassification

    A synthetic dataset is located at data/mllib/sample_binary_classification_data.txt. If you use it as a template to create your own app, please use spark-submit to submit your app.

  3. object BinaryClassificationMetricsExample

  4. object BisectingKMeansExample

    An example demonstrating bisecting k-means clustering in spark.mllib.

    Run with

    bin/run-example mllib.BisectingKMeansExample

  5. object ChiSqSelectorExample

  6. object Correlations

    An example app for summarizing multivariate data from a file. Run with

    bin/run-example org.apache.spark.examples.mllib.Correlations

    By default, this loads a synthetic dataset from data/mllib/sample_linear_regression_data.txt. If you use it as a template to create your own app, please use spark-submit to submit your app.

  7. object CorrelationsExample

  8. object CosineSimilarity

    Compute the similar columns of a matrix, using cosine similarity.

    The input matrix must be stored in row-oriented dense format, one line per row with its entries separated by space. For example,

    0.5 1.0
    2.0 3.0
    4.0 5.0

    represents a 3-by-2 matrix, whose first row is (0.5, 1.0).

    Example invocation:

    bin/run-example mllib.CosineSimilarity \
      --threshold 0.1 data/mllib/sample_svm_data.txt
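
    To make the input format and the quantity being computed concrete, here is a minimal plain-Scala sketch that parses the row-oriented dense format and computes the cosine similarity between two columns by brute force. ColumnCosineSketch and its methods are hypothetical names; this is not how the Spark example itself computes similarities, and it ignores the --threshold option.

```scala
object ColumnCosineSketch {
  /** Parses one line per row, with entries separated by whitespace. */
  def parseMatrix(text: String): Array[Array[Double]] =
    text.split("\n").map(_.trim).filter(_.nonEmpty)
      .map(_.split("\\s+").map(_.toDouble))

  /** Cosine similarity between columns i and j, computed by brute force. */
  def cosine(rows: Array[Array[Double]], i: Int, j: Int): Double = {
    val dot = rows.map(r => r(i) * r(j)).sum
    val normI = math.sqrt(rows.map(r => r(i) * r(i)).sum)
    val normJ = math.sqrt(rows.map(r => r(j) * r(j)).sum)
    dot / (normI * normJ)
  }
}
```

    For the 3-by-2 matrix shown above, ColumnCosineSketch.cosine(rows, 0, 1) is approximately 0.995, since the two columns point in nearly the same direction.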

  9. object DecisionTreeClassificationExample

  10. object DecisionTreeRegressionExample

  11. object DecisionTreeRunner

    An example runner for decision trees and random forests. Run with

    ./bin/run-example org.apache.spark.examples.mllib.DecisionTreeRunner [options]

    If you use it as a template to create your own app, please use spark-submit to submit your app.

    Note: This script treats all features as real-valued (not categorical). To include categorical features, modify categoricalFeaturesInfo.

  12. object DenseKMeans

    An example k-means app. Run with

    ./bin/run-example org.apache.spark.examples.mllib.DenseKMeans [options] <input>

    If you use it as a template to create your own app, please use spark-submit to submit your app.

  13. object ElementwiseProductExample

  14. object FPGrowthExample

    Example for mining frequent itemsets using FP-growth. Example usage:

    ./bin/run-example mllib.FPGrowthExample \
      --minSupport 0.8 --numPartition 2 ./data/mllib/sample_fpgrowth.txt

  15. object GaussianMixtureExample

  16. object GradientBoostedTreesRunner

    An example runner for Gradient Boosting using decision trees as weak learners. Run with

    ./bin/run-example mllib.GradientBoostedTreesRunner [options]

    If you use it as a template to create your own app, please use spark-submit to submit your app.

    Note: This script treats all features as real-valued (not categorical). To include categorical features, modify categoricalFeaturesInfo.

  17. object GradientBoostingClassificationExample

  18. object GradientBoostingRegressionExample

  19. object HypothesisTestingExample

  20. object HypothesisTestingKolmogorovSmirnovTestExample

  21. object IsotonicRegressionExample

  22. object KMeansExample

  23. object KernelDensityEstimationExample

  24. object LBFGSExample

  25. object LDAExample

    An example Latent Dirichlet Allocation (LDA) app. Run with

    ./bin/run-example mllib.LDAExample [options] <input>

    If you use it as a template to create your own app, please use spark-submit to submit your app.

  26. object LatentDirichletAllocationExample

  27. object LogisticRegressionWithLBFGSExample

  28. object MovieLensALS

    An example app for ALS on MovieLens data (http://grouplens.org/datasets/movielens/). Run with

    bin/run-example org.apache.spark.examples.mllib.MovieLensALS

    A synthetic dataset in MovieLens format can be found at data/mllib/sample_movielens_data.txt. If you use it as a template to create your own app, please use spark-submit to submit your app.

  29. object MultiLabelMetricsExample

  30. object MulticlassMetricsExample

  31. object MultivariateSummarizer

    An example app for summarizing multivariate data from a file. Run with

    bin/run-example org.apache.spark.examples.mllib.MultivariateSummarizer

    By default, this loads a synthetic dataset from data/mllib/sample_linear_regression_data.txt. If you use it as a template to create your own app, please use spark-submit to submit your app.

  32. object NaiveBayesExample

  33. object NormalizerExample

  34. object PCAOnRowMatrixExample

  35. object PCAOnSourceVectorExample

  36. object PMMLModelExportExample

  37. object PowerIterationClusteringExample

    An example Power Iteration Clustering app (http://www.icml2010.org/papers/387.pdf). Takes as input the number of concentric circles, K, and the number of points in the innermost circle. The output should be K clusters, each containing precisely the points of one input circle.

    Run with

    ./bin/run-example mllib.PowerIterationClusteringExample [options]
    
    Where options include:
      k: Number of circles/clusters
      n: Number of sampled points on the innermost circle. There are proportionally more points
         within the outer/larger circles
      maxIterations: Number of Power Iterations

    Here is a sample run and output:

    ./bin/run-example mllib.PowerIterationClusteringExample -k 2 --n 10 --maxIterations 15

    Cluster assignments: 1 -> [0,1,2,3,4,5,6,7,8,9], 0 -> [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29]

    If you use it as a template to create your own app, please use spark-submit to submit your app.
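
    The concentric-circle input described above can be pictured with the following plain-Scala sketch. It assumes that circle c (counting from 1) has radius c and c * n evenly spaced points, which matches the 10 and 20 point clusters in the sample output but is not confirmed by this page. ConcentricCirclesSketch is a hypothetical name.

```scala
object ConcentricCirclesSketch {
  /** Evenly spaced points on a circle of the given radius centred at the origin. */
  def circle(radius: Double, numPoints: Int): Seq[(Double, Double)] =
    (0 until numPoints).map { i =>
      val theta = 2.0 * math.Pi * i / numPoints
      (radius * math.cos(theta), radius * math.sin(theta))
    }

  /** k circles; circle c (1-based) gets radius c and c * n points (assumed scaling). */
  def circles(k: Int, n: Int): Seq[Seq[(Double, Double)]] =
    (1 to k).map(c => circle(c.toDouble, c * n))
}
```

    With k = 2 and n = 10 this yields 10 points on the inner circle and 20 on the outer one, 30 points in total.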

  38. object PrefixSpanExample

  39. object RandomForestClassificationExample

  40. object RandomForestRegressionExample

  41. object RandomRDDGeneration

    An example app for randomly generated RDDs. Run with

    bin/run-example org.apache.spark.examples.mllib.RandomRDDGeneration

    If you use it as a template to create your own app, please use spark-submit to submit your app.

  42. object RankingMetricsExample

  43. object RecommendationExample

  44. object SVDExample

  45. object SVMWithSGDExample

  46. object SampledRDDs

    An example app for randomly generated and sampled RDDs. Run with

    bin/run-example org.apache.spark.examples.mllib.SampledRDDs

    If you use it as a template to create your own app, please use spark-submit to submit your app.

  47. object SimpleFPGrowth

  48. object SparseNaiveBayes

    An example naive Bayes app. Run with

    ./bin/run-example org.apache.spark.examples.mllib.SparseNaiveBayes [options] <input>

    If you use it as a template to create your own app, please use spark-submit to submit your app.

  49. object StandardScalerExample

  50. object StratifiedSamplingExample

  51. object StreamingKMeansExample

    Estimate clusters on one stream of data and make predictions on another stream, where the data streams arrive as text files into two different directories.

    The rows of the training text files must be vector data in the form [x1,x2,x3,...,xn], where n is the number of dimensions.

    The rows of the test text files must be labeled data in the form (y,[x1,x2,x3,...,xn]), where y is some identifier. n must be the same for train and test.

    Usage: StreamingKMeansExample <trainingDir> <testDir> <batchDuration> <numClusters> <numDimensions>

    To run on your local machine using the two directories trainingDir and testDir, with updates every 5 seconds, 2 dimensions per data point, and 3 clusters, call:

    $ bin/run-example mllib.StreamingKMeansExample trainingDir testDir 5 3 2

    As you add text files to trainingDir the clusters will continuously update. Anytime you add text files to testDir, you'll see predicted labels using the current model.
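
    The two row formats above can be illustrated with a small plain-Scala parser. This is a sketch only: StreamRowFormatSketch is a hypothetical name, the identifier y is assumed to be numeric, and the Spark example may parse its input differently.

```scala
object StreamRowFormatSketch {
  /** Parses a training row "[x1,x2,...,xn]" into an array of doubles. */
  def parseVector(s: String): Array[Double] = {
    val t = s.trim
    require(t.startsWith("[") && t.endsWith("]"), s"not a vector row: $s")
    t.substring(1, t.length - 1).split(',').map(_.trim.toDouble)
  }

  /** Parses a test row "(y,[x1,x2,...,xn])" into (y, features); y assumed numeric. */
  def parseLabeled(s: String): (Double, Array[Double]) = {
    val t = s.trim
    require(t.startsWith("(") && t.endsWith(")"), s"not a labeled row: $s")
    val inner = t.substring(1, t.length - 1)
    val comma = inner.indexOf(',') // the first comma separates y from the vector
    require(comma > 0, s"missing label in row: $s")
    (inner.substring(0, comma).trim.toDouble, parseVector(inner.substring(comma + 1)))
  }
}
```

    A training file for 2 dimensions would then contain rows such as [1.0,2.0], and a test file rows such as (1.0,[0.5,2.5]).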

  52. object StreamingLinearRegressionExample

    Train a linear regression model on one stream of data and make predictions on another stream, where the data streams arrive as text files into two different directories.

    The rows of the text files must be labeled data points in the form (y,[x1,x2,x3,...,xn]), where n is the number of features. n must be the same for train and test.

    Usage: StreamingLinearRegressionExample <trainingDir> <testDir>

    To run on your local machine using the two directories trainingDir and testDir, call:

    $ bin/run-example mllib.StreamingLinearRegressionExample trainingDir testDir

    As you add text files to trainingDir the model will continuously update. Anytime you add text files to testDir, you'll see predictions from the current model.

  53. object StreamingLogisticRegression

    Train a logistic regression model on one stream of data and make predictions on another stream, where the data streams arrive as text files into two different directories.

    The rows of the text files must be labeled data points in the form (y,[x1,x2,x3,...,xn]), where n is the number of features, y is a binary label, and n must be the same for train and test.

    Usage: StreamingLogisticRegression <trainingDir> <testDir> <batchDuration> <numFeatures>

    To run on your local machine using the two directories trainingDir and testDir, with updates every 5 seconds, and 2 features per data point, call:

    $ bin/run-example mllib.StreamingLogisticRegression trainingDir testDir 5 2

    As you add text files to trainingDir the model will continuously update. Anytime you add text files to testDir, you'll see predictions from the current model.

  54. object StreamingTestExample

    Perform streaming testing using Welch's 2-sample t-test on a stream of data, where the data stream arrives as text files in a directory. Stops when the difference between the two groups is statistically significant (p-value < 0.05) or after a user-specified timeout, given as a number of batches, is exceeded.

    The rows of the text files must be in the form Boolean, Double. For example:

    false, -3.92
    true, 99.32

    Usage: StreamingTestExample <dataDir> <batchDuration> <numBatchesTimeout>

    To run on your local machine using the directory dataDir with 5 seconds between each batch and a timeout after 100 insignificant batches, call:

    $ bin/run-example mllib.StreamingTestExample dataDir 5 100

    As you add text files to dataDir, the significance test will continually update every batchDuration seconds until the test becomes significant (p-value < 0.05) or the number of batches processed exceeds numBatchesTimeout.
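
    For readers unfamiliar with Welch's test, the sketch below parses rows in the Boolean, Double form and computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom in plain Scala. It assumes that true marks one group and false the other, and it stops short of the p-value, which needs a Student's t distribution that the standard library does not provide. WelchSketch is a hypothetical name and this is not the Spark implementation.

```scala
object WelchSketch {
  /** Parses a row such as "false, -3.92" into (group flag, value). */
  def parseRow(line: String): (Boolean, Double) = {
    val parts = line.split(',').map(_.trim)
    require(parts.length == 2, s"expected 'Boolean, Double' but got: $line")
    (parts(0).toBoolean, parts(1).toDouble)
  }

  private def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  /** Unbiased sample variance (divides by n - 1). */
  private def variance(xs: Seq[Double]): Double = {
    val m = mean(xs)
    xs.map(x => (x - m) * (x - m)).sum / (xs.size - 1)
  }

  /** Returns (t statistic, degrees of freedom) for the true group versus the false group. */
  def welch(rows: Seq[(Boolean, Double)]): (Double, Double) = {
    val (trueRows, falseRows) = rows.partition(_._1)
    val a = trueRows.map(_._2)
    val b = falseRows.map(_._2)
    require(a.size > 1 && b.size > 1, "each group needs at least two values")
    val va = variance(a) / a.size // squared standard error of group a
    val vb = variance(b) / b.size // squared standard error of group b
    val t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    val df = math.pow(va + vb, 2) /
      (va * va / (a.size - 1) + vb * vb / (b.size - 1))
    (t, df)
  }
}
```

    Unlike the pooled two-sample t-test, this form does not assume that the two groups share a variance, which is why the degrees of freedom are generally not an integer.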

  55. object SummaryStatisticsExample

  56. object TFIDFExample

  57. object TallSkinnyPCA

    Compute the principal components of a tall-and-skinny matrix, whose rows are observations.

    The input matrix must be stored in row-oriented dense format, one line per row with its entries separated by space. For example,

    0.5 1.0
    2.0 3.0
    4.0 5.0

    represents a 3-by-2 matrix, whose first row is (0.5, 1.0).

  58. object TallSkinnySVD

    Compute the singular value decomposition (SVD) of a tall-and-skinny matrix.

    The input matrix must be stored in row-oriented dense format, one line per row with its entries separated by space. For example,

    0.5 1.0
    2.0 3.0
    4.0 5.0

    represents a 3-by-2 matrix, whose first row is (0.5, 1.0).

  59. object Word2VecExample

Deprecated Value Members

  1. object LinearRegression

    An example app for linear regression. Run with

    bin/run-example org.apache.spark.examples.mllib.LinearRegression

    A synthetic dataset can be found at data/mllib/sample_linear_regression_data.txt. If you use it as a template to create your own app, please use spark-submit to submit your app.

    Annotations: @deprecated
    Deprecated: (Since version 2.0.0) Use ml.regression.LinearRegression or LBFGS

  2. object LinearRegressionWithSGDExample

    Annotations: @deprecated
    Deprecated: (Since version 2.0.0) Use ml.regression.LinearRegression or LBFGS

  3. object PCAExample

    Annotations: @deprecated
    Deprecated: (Since version 2.0.0) Deprecated since LinearRegressionWithSGD is deprecated. Use ml.feature.PCA

  4. object RegressionMetricsExample

    Annotations: @deprecated
    Deprecated: (Since version 2.0.0) Use ml.regression.LinearRegression and the resulting model summary for metrics
