org.apache.spark.ml.stat

Correlation

object Correlation

API for correlation functions in MLlib, compatible with DataFrames and Datasets.

The functions in this package generalize the functions in org.apache.spark.sql.Dataset#stat to spark.ml's Vector types.

Annotations
@Since( "2.2.0" ) @Experimental()
Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def corr(dataset: Dataset[_], column: String): DataFrame

    Compute the Pearson correlation matrix for the input Dataset of Vectors.

    Annotations
    @Since( "2.2.0" )
  9. def corr(dataset: Dataset[_], column: String, method: String): DataFrame

    :: Experimental :: Compute the correlation matrix for the input Dataset of Vectors using the specified method. Methods currently supported: pearson (default), spearman.

    dataset

    A dataset or a dataframe

    column

    The name of the column of vectors for which the correlation coefficient needs to be computed. This must be a column of the dataset, and it must contain Vector objects.

    method

    String specifying the method to use for computing correlation. Supported: pearson (default), spearman

    returns

    A dataframe that contains the correlation matrix of the column of vectors. This dataframe contains a single row and a single column of name '$METHODNAME($COLUMN)'.

    Annotations
    @Since( "2.2.0" )
    Exceptions thrown
    IllegalArgumentException

    if the column is not a valid column in the dataset, or if the content of this column is not of type Vector.

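    Here is how to access the correlation matrix:

    import org.apache.spark.ml.linalg.{Matrix, Vector}
    import org.apache.spark.ml.stat.Correlation
    import org.apache.spark.sql.{Dataset, Row}
    val data: Dataset[Vector] = ...
    val Row(coeff: Matrix) = Correlation.corr(data, "value").head
    // coeff now contains the Pearson correlation matrix.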
    Note

    Spearman is a rank correlation: computing it requires creating an RDD[Double] for each column, sorting it to retrieve the ranks, and then joining the columns back into an RDD[Vector], which is fairly costly. Cache the input Dataset before calling corr with method = "spearman" to avoid recomputing the common lineage.
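    The following is a minimal sketch of calling both overloads on a cached input, as the note above recommends. The object name CorrelationSketch, the helpers sampleData and matrices, the column name "features", the toy values, and the local master setting are all assumptions made for illustration; they are not part of this API.

```scala
import org.apache.spark.ml.linalg.{Matrix, Vectors}
import org.apache.spark.ml.stat.Correlation
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object CorrelationSketch {

  // Hypothetical helper: a tiny DataFrame with one Vector column named "features".
  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      Vectors.dense(1.0, 10.0, 3.0),
      Vectors.dense(2.0, 20.0, 1.0),
      Vectors.dense(3.0, 30.0, 2.0),
      Vectors.dense(4.0, 40.0, 5.0)
    ).map(Tuple1.apply).toDF("features")
  }

  // Hypothetical helper: returns (Pearson, Spearman) matrices for the "features" column.
  def matrices(spark: SparkSession): (Matrix, Matrix) = {
    // Cache first: the Spearman computation reads the input more than once.
    val df = sampleData(spark).cache()

    // Two-argument overload: Pearson correlation.
    val Row(pearson: Matrix) = Correlation.corr(df, "features").head

    // Three-argument overload with an explicit method name.
    val Row(spearman: Matrix) = Correlation.corr(df, "features", "spearman").head

    df.unpersist()
    (pearson, spearman)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("CorrelationSketch")
      .master("local[*]")
      .getOrCreate()

    val (pearson, spearman) = matrices(spark)
    println(s"Pearson:\n$pearson")
    println(s"Spearman:\n$spearman")

    spark.stop()
  }
}
```

    In this toy data the second vector component is ten times the first, so the (0, 1) entry of both matrices should be close to 1.0. Each result is a 3 x 3 matrix because every input vector has three components.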

  10. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  12. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  14. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  15. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  16. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  17. final def notify(): Unit

    Definition Classes
    AnyRef
  18. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  19. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  20. def toString(): String

    Definition Classes
    AnyRef → Any
  21. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  22. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
