Calculate the covariance of two numerical columns of a DataFrame.
The DataFrame
the column names
the covariance of the two columns.
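As an illustration of the quantity described above, here is a minimal pure-Python sketch. It assumes the sample covariance (division by n - 1), which the text above does not state, and the function name `sample_covariance` is hypothetical rather than part of the documented API.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equally long numeric sequences.

    Assumption: uses the (n - 1) denominator; the documentation above
    does not say which normalization is applied.
    """
    if len(xs) != len(ys):
        raise ValueError("both columns must have the same number of rows")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two rows are required")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of deviations from each column's mean.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1)


if __name__ == "__main__":
    col_a = [1.0, 2.0, 3.0, 4.0]
    col_b = [2.0, 4.0, 6.0, 8.0]
    print(sample_covariance(col_a, col_b))
```

The two lists stand in for the two named columns of the DataFrame.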
Generate a table of frequencies for the elements of two columns.
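A frequency table of this kind can be sketched in plain Python as follows. The function name `crosstab`, the row representation (a list of dicts), and the nested-dict output layout are assumptions made for illustration only.

```python
from collections import Counter


def crosstab(rows, col1, col2):
    """Count how often each (col1 value, col2 value) pair occurs.

    Returns a nested dict: outer keys are the distinct values of col1,
    inner keys are the distinct values of col2, and pairs that never
    occur get a count of 0.
    """
    counts = Counter((row[col1], row[col2]) for row in rows)
    row_keys = sorted({pair[0] for pair in counts})
    col_keys = sorted({pair[1] for pair in counts})
    return {r: {c: counts.get((r, c), 0) for c in col_keys} for r in row_keys}


if __name__ == "__main__":
    data = [
        {"a": 1, "b": "x"},
        {"a": 1, "b": "y"},
        {"a": 2, "b": "x"},
        {"a": 1, "b": "x"},
    ]
    for key, freq in crosstab(data, "a", "b").items():
        print(key, freq)
```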
Calculates the approximate quantiles of multiple numerical columns of a DataFrame in one pass.
The result of this algorithm has the following deterministic bound: if the DataFrame has N elements and we request the quantile at probability p up to error err, then the algorithm will return a sample x from the DataFrame so that the *exact* rank of x is close to (p * N). More precisely,
floor((p - err) * N) <= rank(x) <= ceil((p + err) * N).
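To make the bound concrete, the sketch below evaluates the two sides of the inequality for given p, err, and N. It only computes the stated formula and is not the algorithm itself; the helper names `rank_bounds` and `rank_within_bound` are hypothetical.

```python
import math


def rank_bounds(p, err, n):
    """Return (lower, upper) rank limits from the documented inequality:
    floor((p - err) * N) <= rank(x) <= ceil((p + err) * N).
    """
    lower = math.floor((p - err) * n)
    upper = math.ceil((p + err) * n)
    return lower, upper


def rank_within_bound(rank, p, err, n):
    """Check whether an exact rank satisfies the documented bound."""
    lower, upper = rank_bounds(p, err, n)
    return lower <= rank <= upper


if __name__ == "__main__":
    # Median (p = 0.5) of 1000 elements with err = 0.125.
    print(rank_bounds(0.5, 0.125, 1000))
    print(rank_within_bound(400, 0.5, 0.125, 1000))
```

With p = 0.5, err = 0.125, and N = 1000, any returned sample whose exact rank lies between 375 and 625 satisfies the bound.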
This method implements a variation of the Greenwald-Khanna algorithm (with some speed optimizations). The algorithm was first presented in "Space-efficient Online Computation of Quantile Summaries" by Greenwald and Khanna.
the DataFrame
numerical columns of the DataFrame
a list of quantile probabilities. Each number must belong to [0, 1]. For example, 0 is the minimum, 0.5 is the median, and 1 is the maximum.
The relative target precision to achieve (greater than or equal to 0). If set to zero, the exact quantiles are computed, which could be very expensive. Note that values greater than 1 are accepted but give the same result as 1.
for each column, returns the requested approximations
null and NaN values will be ignored in numerical columns before calculation. For a column only containing null or NaN values, an empty array is returned.
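The handling of missing values can be sketched as below. This is a plain-Python illustration of the exact case (relative error of zero), not the Greenwald-Khanna summary. The names `clean_column` and `exact_quantiles`, and the rank convention ceil(p * n) clamped to at least 1, are assumptions chosen for simplicity.

```python
import math


def clean_column(values):
    """Drop None (null) and NaN entries before any computation."""
    return [v for v in values if v is not None and not math.isnan(v)]


def exact_quantiles(values, probabilities):
    """Exact quantiles of one column, ignoring null and NaN values.

    Returns an empty list when the column holds only null/NaN values.
    Assumption: the quantile at probability p is the element with
    1-based rank max(1, ceil(p * n)) in sorted order.
    """
    data = sorted(clean_column(values))
    if not data:
        return []
    n = len(data)
    result = []
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must belong to [0, 1]")
        rank = max(1, math.ceil(p * n))
        result.append(data[rank - 1])
    return result


if __name__ == "__main__":
    column = [3.0, None, 1.0, float("nan"), 2.0, 4.0]
    print(exact_quantiles(column, [0.0, 0.5, 1.0]))
    print(exact_quantiles([None, float("nan")], [0.5]))
```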
Calculate the Pearson Correlation Coefficient for the given columns.
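For reference, the Pearson Correlation Coefficient is the covariance of the two columns divided by the product of their standard deviations. A minimal pure-Python sketch follows; the function name `pearson_correlation` and the choice to raise an error for a constant column are assumptions for illustration.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys):
        raise ValueError("both columns must have the same number of rows")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two rows are required")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations; the common
    # normalization factor cancels in the ratio below.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        # Assumption: treat a constant column as an error case.
        raise ValueError("correlation is undefined for a constant column")
    return cross / math.sqrt(ss_x * ss_y)


if __name__ == "__main__":
    print(pearson_correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
    print(pearson_correlation([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```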