Assertion level of the check group. If any of the constraints fail, this level is used for the status of the check.
The name describes the check block. It is generally shown in the logs.
The constraints to apply when this check is run. Adding a constraint returns a new Check object.
Returns a new Check object with the given constraint added to the constraints list.
New constraint to be added
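The copy-on-add behaviour described above can be illustrated with a minimal plain-Scala sketch. `SketchCheck`, its `String` constraints, and the `addConstraint` name are hypothetical stand-ins for illustration only, not the library's actual types.

```scala
// Hypothetical stand-in for a check; real constraints are richer objects.
object ImmutableCheckSketch {
  final case class SketchCheck(description: String, constraints: Seq[String] = Seq.empty) {
    // Returns a new SketchCheck; the receiver is left unchanged.
    def addConstraint(constraint: String): SketchCheck =
      copy(constraints = constraints :+ constraint)
  }

  val base: SketchCheck = SketchCheck("orders")
  val extended: SketchCheck = base.addConstraint("size > 0")
}
```

Because every addition yields a new object, a partially built check can be shared and extended in different directions without one caller affecting another.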
Check to run against the compliance of a column against a Credit Card pattern.
Name of the column that should be checked.
Function that receives a double input parameter and returns a boolean
A hint to provide additional context why a constraint could have failed
Check to run against the compliance of a column against an e-mail pattern.
Name of the column that should be checked.
Function that receives a double input parameter and returns a boolean
A hint to provide additional context why a constraint could have failed
Check to run against the compliance of a column against the US Social Security number pattern.
Name of the column that should be checked.
Function that receives a double input parameter and returns a boolean
A hint to provide additional context why a constraint could have failed
Check to run against the compliance of a column against a URL pattern.
Name of the column that should be checked.
Function that receives a double input parameter and returns a boolean
A hint to provide additional context why a constraint could have failed
The name describes the check block. Generally will be used to show in the logs.
Evaluate this check on computed metrics
result of the metrics computation
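As a rough sketch of how the assertion level feeds into the outcome of an evaluation: if every constraint holds the check succeeds, otherwise the check's own level becomes its status. All type and function names below are hypothetical and only model the rule stated in this reference.

```scala
object CheckStatusSketch {
  sealed trait Level
  case object WarningLevel extends Level
  case object ErrorLevel extends Level

  sealed trait Status
  case object Success extends Status
  case object WarningStatus extends Status
  case object ErrorStatus extends Status

  // constraintResults: true means the constraint held on the computed metrics.
  def evaluate(level: Level, constraintResults: Seq[Boolean]): Status =
    if (constraintResults.forall(identity)) Success
    else level match {
      case WarningLevel => WarningStatus
      case ErrorLevel   => ErrorStatus
    }
}
```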
Creates a constraint that asserts on the approximate count distinct of the given column
Column to run the assertion on
Function that receives a double input parameter and returns a boolean
A hint to provide additional context why a constraint could have failed
Creates a constraint that asserts on an approximated quantile
Column to run the assertion on
Which quantile to assert on
Function that receives a double input parameter (the computed quantile) and returns a boolean
A hint to provide additional context why a constraint could have failed
Creates a constraint that asserts on a column's completeness. Uses the given history selection strategy to retrieve historical completeness values on this column from the history provider.
Column to run the assertion on
Function that receives a double input parameter and returns a boolean
A hint to provide additional context why a constraint could have failed
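To make the double passed to the assertion concrete, here is a minimal sketch that assumes completeness means the fraction of non-null values in the column. The `completeness` helper is hypothetical, nulls are modelled as `None`, and the value returned for an empty column is an arbitrary choice made for this sketch.

```scala
object CompletenessSketch {
  // Fraction of defined (non-null) values; an empty column is treated as complete here.
  def completeness[A](values: Seq[Option[A]]): Double =
    if (values.isEmpty) 1.0
    else values.count(_.isDefined) / values.size.toDouble

  val column: Seq[Option[String]] = Seq(Some("a"), None, Some("c"), Some("d"))
  // An assertion of the shape described above: Double => Boolean.
  val atLeastHalf: Double => Boolean = _ >= 0.5
}
```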
Creates a constraint that asserts on the Pearson correlation between two columns.
First column for correlation calculation
Second column for correlation calculation
Function that receives a double input parameter and returns a boolean
A hint to provide additional context why a constraint could have failed
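For reference, the value handed to the assertion can be thought of as the standard Pearson coefficient. The sketch below computes it over two in-memory sequences; the `pearson` helper is hypothetical and is not the library's distributed implementation.

```scala
object PearsonSketch {
  // Pearson correlation: covariance divided by the product of the standard deviations.
  def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.size == ys.size && xs.nonEmpty, "columns must be non-empty and equally long")
    val n = xs.size
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    val cov = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val varX = xs.map(x => math.pow(x - meanX, 2)).sum
    val varY = ys.map(y => math.pow(y - meanY, 2)).sum
    cov / math.sqrt(varX * varY)
  }
}
```

An assertion such as `_ > 0.9` would then require a strong positive linear relationship between the two columns.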
Check to run against the fraction of rows that conform to the given data type.
Name of the column that should be checked.
Data type that the column should be compared against.
Function that receives a double input parameter and returns a boolean
A hint to provide additional context why a constraint could have failed
Creates a constraint on the distinctness in a single or combined set of key columns.
Columns to run the assertion on
Function that receives a double input parameter and returns a boolean. Refers to the fraction of distinct values.
A hint to provide additional context why a constraint could have failed
Creates a constraint that asserts on a column's entropy.
Column to run the assertion on
Function that receives a double input parameter and returns a boolean
A hint to provide additional context why a constraint could have failed
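As an illustration of the quantity being asserted on, the sketch below computes the Shannon entropy of a column's empirical value distribution. The use of the natural logarithm is an assumption of this sketch, and the `entropy` helper is hypothetical.

```scala
object EntropySketch {
  // Shannon entropy: -sum(p * ln p) over the relative frequency p of each distinct value.
  def entropy[A](values: Seq[A]): Double = {
    val n = values.size.toDouble
    values.groupBy(identity).values.toSeq
      .map(_.size / n)
      .map(p => -p * math.log(p))
      .sum
  }
}
```

A column holding a single repeated value has entropy 0, and entropy grows as values become more evenly spread across more distinct values.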
Creates a constraint that asserts on a column's value distribution.
Column to run the assertion on
Function that receives a Distribution input parameter and returns a boolean, e.g. .hasHistogramValues("att2", _.absolutes("f") == 3) or .hasHistogramValues("att2", _.ratios(Histogram.NullFieldReplacement) == 2/6.0)
An optional binning function
Histogram details are only provided for the N column values with the highest counts; maxBins sets N.
A hint to provide additional context why a constraint could have failed
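The assertion examples above read absolute counts and ratios per value. A minimal sketch of such a structure follows; `SketchDistribution`, `NullValue`, and `distribution` are hypothetical names, and binning and the maxBins cut-off are deliberately left out.

```scala
object HistogramSketch {
  // Hypothetical stand-in exposing per-value absolute counts and ratios.
  final case class SketchDistribution(absolutes: Map[String, Long], ratios: Map[String, Double])

  // Hypothetical placeholder key under which null values are counted.
  val NullValue = "NullValue"

  def distribution(values: Seq[Option[String]]): SketchDistribution = {
    val total = values.size.toDouble
    val absolutes = values
      .map(_.getOrElse(NullValue))
      .groupBy(identity)
      .map { case (value, occurrences) => value -> occurrences.size.toLong }
    val ratios = absolutes.map { case (value, count) => value -> count / total }
    SketchDistribution(absolutes, ratios)
  }

  val column: Seq[Option[String]] =
    Seq(Some("f"), Some("f"), Some("f"), Some("d"), None, None)
  // Same shape as the assertion in the parameter description above.
  val assertion: SketchDistribution => Boolean = _.absolutes("f") == 3
}
```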
Creates a constraint that asserts on the maximum of the column
Column to run the assertion on
Function that receives a double input parameter and returns a boolean
A hint to provide additional context why a constraint could have failed
Creates a constraint that asserts on the mean of the column
Column to run the assertion on
Function that receives a double input parameter and returns a boolean
A hint to provide additional context why a constraint could have failed
Creates a constraint that asserts on the minimum of the column
Column to run the assertion on
Function that receives a double input parameter and returns a boolean
A hint to provide additional context why a constraint could have failed
Creates a constraint that asserts on the mutual information between two columns.
First column for mutual information calculation
Second column for mutual information calculation
Function that receives a double input parameter and returns a boolean
A hint to provide additional context why a constraint could have failed
Creates a constraint that asserts on the number of distinct values a column has.
Column to run the assertion on
Function that receives a long input parameter and returns a boolean
An optional binning function
Histogram details are only provided for the N column values with the highest counts; maxBins sets N.
A hint to provide additional context why a constraint could have failed
Checks for pattern compliance. Given a column name and a regular expression, defines a Check on the average compliance of the column's values to the regular expression.
Name of the column that should be checked.
The column's values will be checked for a match against this pattern.
Function that receives a double input parameter and returns a boolean
A hint to provide additional context why a constraint could have failed
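The "average compliance" handed to the assertion can be pictured as the fraction of values that match the pattern. The sketch below assumes exactly that and ignores null handling; the `compliance` helper is hypothetical, and returning 0.0 for an empty column is a choice made for this sketch.

```scala
import scala.util.matching.Regex

object PatternComplianceSketch {
  // Fraction of values that contain a match for the pattern.
  def compliance(values: Seq[String], pattern: Regex): Double =
    if (values.isEmpty) 0.0
    else values.count(v => pattern.findFirstIn(v).isDefined) / values.size.toDouble

  val zipCodes: Seq[String] = Seq("12345", "9876", "54321", "abcde")
  val fiveDigits: Regex = "^[0-9]{5}$".r
  // An assertion of the shape described above: Double => Boolean.
  val atLeastHalf: Double => Boolean = _ >= 0.5
}
```

The predefined credit card, e-mail, Social Security number, and URL checks in this reference follow the same shape, with the pattern fixed in advance.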
Creates a constraint that calculates the data frame size and runs the assertion on it.
Function that receives a long input parameter and returns a boolean. Assertion functions can refer to the data frame size with "_", e.g. .hasSize(_ > 5), meaning the number of rows should be greater than 5. A more elaborate function can also be provided, e.g. .hasSize { aNameForSize => aNameForSize > 0 && aNameForSize < 10 }
A hint to provide additional context why a constraint could have failed
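Both assertion styles mentioned above are ordinary Scala functions of type `Long => Boolean`. The sketch below shows them side by side; `sizeHolds` is a hypothetical helper that simply applies an assertion to a row count.

```scala
object SizeAssertionSketch {
  // Placeholder syntax: "_" stands for the data frame size.
  val moreThanFive: Long => Boolean = _ > 5

  // Named parameter, useful when the size is referenced more than once.
  val betweenOneAndNine: Long => Boolean = { size => size > 0 && size < 10 }

  // Hypothetical helper: applies an assertion to a row count.
  def sizeHolds(rowCount: Long, assertion: Long => Boolean): Boolean =
    assertion(rowCount)
}
```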
Creates a constraint that asserts on the standard deviation of the column
Column to run the assertion on
Function that receives a double input parameter and returns a boolean
A hint to provide additional context why a constraint could have failed
Creates a constraint that asserts on the sum of the column
Column to run the assertion on
Function that receives a double input parameter and returns a boolean
A hint to provide additional context why a constraint could have failed
Creates a constraint on the unique value ratio in a single or combined set of key columns.
Columns to run the assertion on
Function that receives a double input parameter and returns a boolean. Refers to the ratio of unique values to distinct values.
A hint to provide additional context why a constraint could have failed
Creates a constraint that asserts on the uniqueness of a key column.
Key column
Function that receives a double input parameter and returns a boolean. Refers to the fraction of unique values.
A hint to provide additional context why a constraint could have failed
Creates a constraint that asserts on the uniqueness of a key column.
Key column
Function that receives a double input parameter and returns a boolean. Refers to the fraction of unique values.
Creates a constraint that asserts on uniqueness in a single or combined set of key columns.
Key columns
Function that receives a double input parameter and returns a boolean. Refers to the fraction of unique values
A hint to provide additional context why a constraint could have failed
Creates a constraint that asserts on uniqueness in a single or combined set of key columns.
Key columns
Function that receives a double input parameter and returns a boolean. Refers to the fraction of unique values
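The distinctness, unique value ratio, and uniqueness constraints in this reference are easy to confuse. The sketch below shows one common way to define the three metrics, taking "unique" to mean "occurs exactly once". These definitions and the helper names are assumptions of this sketch, and a non-empty input is assumed.

```scala
object UniquenessMetricsSketch {
  // Number of occurrences per distinct value.
  private def counts[A](values: Seq[A]): Map[A, Int] =
    values.groupBy(identity).map { case (value, occurrences) => value -> occurrences.size }

  // Distinct values divided by total rows.
  def distinctness[A](values: Seq[A]): Double =
    counts(values).size / values.size.toDouble

  // Values occurring exactly once divided by total rows.
  def uniqueness[A](values: Seq[A]): Double =
    counts(values).count { case (_, n) => n == 1 } / values.size.toDouble

  // Values occurring exactly once divided by distinct values.
  def uniqueValueRatio[A](values: Seq[A]): Double = {
    val c = counts(values)
    c.count { case (_, n) => n == 1 } / c.size.toDouble
  }

  val keys: Seq[String] = Seq("a", "a", "b", "c")
}
```

For `keys` there are four rows, three distinct values, and two values that occur exactly once, so the three metrics differ even on this small input.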
Creates a constraint that asserts that a column is complete, i.e. contains no missing values.
Column to run the assertion on
A hint to provide additional context why a constraint could have failed
Asserts that the non-null values in a numeric column fall into the predefined interval
Column to run the assertion on
Lower bound of the interval
Upper bound of the interval
Is a value equal to the lower bound allowed?
Is a value equal to the upper bound allowed?
A hint to provide additional context why a constraint could have failed
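The interaction between the two bounds and the two inclusion flags can be sketched as follows. The function and parameter names are hypothetical, and nulls are modelled as `None` and skipped, matching the "non-null values" wording above.

```scala
object IntervalSketch {
  // True if value lies between lower and upper, honouring the inclusion flags.
  def inInterval(value: Double, lower: Double, upper: Double,
                 includeLower: Boolean, includeUpper: Boolean): Boolean = {
    val aboveLower = if (includeLower) value >= lower else value > lower
    val belowUpper = if (includeUpper) value <= upper else value < upper
    aboveLower && belowUpper
  }

  // Null values (None) are skipped; only non-null values must fall in the interval.
  def allInInterval(values: Seq[Option[Double]], lower: Double, upper: Double,
                    includeLower: Boolean, includeUpper: Boolean): Boolean =
    values.flatten.forall(inInterval(_, lower, upper, includeLower, includeUpper))
}
```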
Asserts that every non-null value in a column is contained in a set of predefined values
Column to run the assertion on
allowed values for the column
A hint to provide additional context why a constraint could have failed
Asserts that every non-null value in a column is contained in a set of predefined values
Column to run the assertion on
allowed values for the column
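A minimal sketch of the set-membership rule, again modelling nulls as `None` so that they are ignored. `allContained` is a hypothetical helper, not the library's method.

```scala
object AllowedValuesSketch {
  // Every non-null value must be a member of the allowed set; nulls are skipped.
  def allContained(values: Seq[Option[String]], allowed: Set[String]): Boolean =
    values.flatten.forall(allowed.contains)

  val priorities: Seq[Option[String]] = Seq(Some("high"), None, Some("low"))
}
```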
Asserts that, in each row, the value of columnA is greater than the value of columnB
Column to run the assertion on
Column to run the assertion on
A hint to provide additional context why a constraint could have failed
Asserts that, in each row, the value of columnA is greater than or equal to the value of columnB
Column to run the assertion on
Column to run the assertion on
A hint to provide additional context why a constraint could have failed
Asserts that, in each row, the value of columnA is less than the value of columnB
Column to run the assertion on
Column to run the assertion on
A hint to provide additional context why a constraint could have failed
Asserts that, in each row, the value of columnA is less than or equal to the value of columnB
Column to run the assertion on
Column to run the assertion on
A hint to provide additional context why a constraint could have failed
Creates a constraint that asserts that a column contains no negative values
Column to run the assertion on
A hint to provide additional context why a constraint could have failed
Creates a constraint that asserts on the primary key characteristics of one or more columns. Currently only checks uniqueness, but reserved for primary key checks if there is another assertion to run on primary key columns.
Columns to run the assertion on
A hint to provide additional context why a constraint could have failed
Creates a constraint that asserts on the primary key characteristics of one or more columns. Currently only checks uniqueness, but reserved for primary key checks if there is another assertion to run on primary key columns.
Columns to run the assertion on
Creates a constraint that asserts on a column's uniqueness.
Column to run the assertion on
A hint to provide additional context why a constraint could have failed
Assertion level of the check group. If any of the constraints fail, this level is used for the status of the check.
Creates a constraint that runs the given condition on the data frame.
Data frame column which is a combination of an expression and the column name. It has to comply with Spark SQL syntax and can be written in exactly the same way as a condition inside a WHERE clause.
A name that summarizes the check being made. This name is used to name the metrics for the analysis being done.
Function that receives a double input parameter and returns a boolean
A hint to provide additional context why a constraint could have failed
A class representing a list of constraints that can be applied to a given org.apache.spark.sql.DataFrame. In order to run the checks, use the run method. You can also use VerificationSuite.run to run your checks along with other Checks and Analysis objects. When run with VerificationSuite, Analyzers required by multiple checks/analysis blocks are optimized to run once.
Assertion level of the check group. If any of the constraints fail, this level is used for the status of the check.
The name describes the check block. It is generally shown in the logs.
The constraints to apply when this check is run. Adding a constraint returns a new Check object.