com.amazon.deequ.checks

Check

case class Check(level: CheckLevel.Value, description: String, constraints: Seq[Constraint] = Seq.empty) extends Product with Serializable

A class representing a list of constraints that can be applied to a given org.apache.spark.sql.DataFrame. To run the checks, use the run method. You can also use VerificationSuite.run to run your checks along with other Checks and Analysis objects. When run with VerificationSuite, analyzers required by multiple checks or analysis blocks are optimized to run only once.

level

Assertion level of the check group. If any of the constraints fails, this level is used as the status of the check.

description

A name that describes the check block. It is generally used in log output.

constraints

The constraints to apply when this check is run. Adding a constraint does not modify this object; it returns a new Check.
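
As a minimal sketch of how the three parameters fit together, the following builds a Check and runs it through VerificationSuite. The sample rows, the column names, and the local SparkSession are assumptions made for illustration, and deequ and Spark are assumed to be on the classpath.

```scala
import com.amazon.deequ.VerificationSuite
import com.amazon.deequ.checks.{Check, CheckLevel, CheckStatus}
import org.apache.spark.sql.SparkSession

object CheckQuickstart {
  // Runs one Check against a tiny in-memory DataFrame and returns the overall status.
  def run(spark: SparkSession): CheckStatus.Value = {
    import spark.implicits._

    // Hypothetical sample data: three rows, no nulls, unique ids.
    val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "name")

    val check = Check(CheckLevel.Error, "basic integrity")
      .hasSize(_ == 3)   // row count must be exactly 3
      .isComplete("id")  // no null values in id
      .isUnique("id")    // no duplicate ids

    VerificationSuite().onData(df).addCheck(check).run().status
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("check-quickstart")
      .getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```

Because the check is created with CheckLevel.Error, a single failing constraint would make the check report an error status rather than a warning.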

Linear Supertypes
Serializable, Serializable, Product, Equals, AnyRef, Any

Instance Constructors

  1. new Check(level: CheckLevel.Value, description: String, constraints: Seq[Constraint] = Seq.empty)

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def addConstraint(constraint: Constraint): Check

    Returns a new Check object with the given constraint added to the constraints list.

    constraint

    New constraint to be added
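
    The sketch below illustrates that addConstraint leaves the original Check untouched. It assumes that the Constraint.completenessConstraint factory is available in your deequ version; the column name is hypothetical.

```scala
import com.amazon.deequ.checks.{Check, CheckLevel}
import com.amazon.deequ.constraints.Constraint

object AddConstraintExample {
  // A check without any constraints yet.
  val base: Check = Check(CheckLevel.Warning, "customer table")

  // Assumption: Constraint.completenessConstraint exists with (column, assertion) parameters.
  val extended: Check =
    base.addConstraint(Constraint.completenessConstraint("customerId", _ == 1.0))

  def main(args: Array[String]): Unit = {
    println(s"base: ${base.constraints.size}, extended: ${extended.constraints.size}")
  }
}
```

    Since each call returns a new object, intermediate checks can be kept and extended in different directions without affecting each other.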

  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. def containsCreditCardNumber(column: String, assertion: (Double) ⇒ Boolean = Check.IsOne, hint: Option[String] = None): Check

    Check to run against the compliance of a column against a Credit Card pattern.

    column

    Name of the column that should be checked.

    assertion

    Function that receives a double input parameter and returns a boolean

    hint

    A hint to provide additional context why a constraint could have failed

  8. def containsEmail(column: String, assertion: (Double) ⇒ Boolean = Check.IsOne, hint: Option[String] = None): Check

    Check to run against the compliance of a column against an e-mail pattern.

    column

    Name of the column that should be checked.

    assertion

    Function that receives a double input parameter and returns a boolean

    hint

    A hint to provide additional context why a constraint could have failed

  9. def containsSocialSecurityNumber(column: String, assertion: (Double) ⇒ Boolean = Check.IsOne, hint: Option[String] = None): Check

    Check to run against the compliance of a column against the Social security number pattern for the US.

    column

    Name of the column that should be checked.

    assertion

    Function that receives a double input parameter and returns a boolean

    hint

    A hint to provide additional context why a constraint could have failed

  10. def containsURL(column: String, assertion: (Double) ⇒ Boolean = Check.IsOne, hint: Option[String] = None): Check

    Check to run against the compliance of a column against a URL pattern.

    column

    Name of the column that should be checked.

    assertion

    Function that receives a double input parameter and returns a boolean

    hint

    A hint to provide additional context why a constraint could have failed

  11. val description: String

    A name that describes the check block. It is generally used in log output.

  12. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  13. def evaluate(context: AnalyzerContext): CheckResult

    Evaluate this check on computed metrics

    context

    result of the metrics computation

  14. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  15. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  16. def hasApproxCountDistinct(column: String, assertion: (Double) ⇒ Boolean, hint: Option[String] = None): CheckWithLastConstraintFilterable

    Creates a constraint that asserts on the approximate count distinct of the given column

    column

    Column to run the assertion on

    assertion

    Function that receives a double input parameter and returns a boolean

    hint

    A hint to provide additional context why a constraint could have failed

  17. def hasApproxQuantile(column: String, quantile: Double, assertion: (Double) ⇒ Boolean, hint: Option[String] = None): Check

    Creates a constraint that asserts on an approximated quantile

    column

    Column to run the assertion on

    quantile

    Which quantile to assert on

    assertion

    Function that receives a double input parameter (the computed quantile) and returns a boolean

    hint

    A hint to provide additional context why a constraint could have failed
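
    As a hedged illustration, the following sketch asserts on two approximate quantiles of one column. The column name and the thresholds are hypothetical.

```scala
import com.amazon.deequ.checks.{Check, CheckLevel}

object ApproxQuantileExample {
  // "latencyMs" is a hypothetical numeric column.
  val check: Check = Check(CheckLevel.Warning, "latency budget")
    .hasApproxQuantile("latencyMs", 0.5, _ <= 200.0)   // approximate median
    .hasApproxQuantile("latencyMs", 0.99, _ <= 1500.0) // approximate 99th percentile
}
```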

  18. def hasCompleteness(column: String, assertion: (Double) ⇒ Boolean, hint: Option[String] = None): CheckWithLastConstraintFilterable

    Creates a constraint that asserts on the completeness of a column. Uses the given history selection strategy to retrieve historical completeness values on this column from the history provider.

    column

    Column to run the assertion on

    assertion

    Function that receives a double input parameter and returns a boolean

    hint

    A hint to provide additional context why a constraint could have failed
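
    A small sketch of an assertion on completeness, which deequ reports as the fraction of non-null values in the column. The column name, the threshold, and the hint text are assumptions chosen for illustration.

```scala
import com.amazon.deequ.checks.{Check, CheckLevel}

object CompletenessExample {
  // Tolerates up to 5% null values in the hypothetical "email" column.
  val check: Check = Check(CheckLevel.Warning, "contact data")
    .hasCompleteness("email", _ >= 0.95, Some("At most 5% of e-mail values may be null"))
}
```

    When every value must be present, isComplete expresses the same intent without an explicit assertion function.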

  19. def hasCorrelation(columnA: String, columnB: String, assertion: (Double) ⇒ Boolean, hint: Option[String] = None): CheckWithLastConstraintFilterable

    Creates a constraint that asserts on the Pearson correlation between two columns.

    columnA

    First column for correlation calculation

    columnB

    Second column for correlation calculation

    assertion

    Function that receives a double input parameter and returns a boolean

    hint

    A hint to provide additional context why a constraint could have failed

  20. def hasDataType(column: String, dataType: constraints.ConstrainableDataTypes.Value, assertion: (Double) ⇒ Boolean = Check.IsOne, hint: Option[String] = None): Check

    Check to run against the fraction of rows that conform to the given data type.

    column

    Name of the column that should be checked.

    dataType

    Data type that the column should be compared against.

    assertion

    Function that receives a double input parameter and returns a boolean

    hint

    A hint to provide additional context why a constraint could have failed

  21. def hasDistinctness(columns: Seq[String], assertion: (Double) ⇒ Boolean, hint: Option[String] = None): Check

    Creates a constraint on the distinctness in a single or combined set of key columns.

    columns

    Single column or combined set of key columns to check

    assertion

    Function that receives a double input parameter and returns a boolean. Refers to the fraction of distinct values.

    hint

    A hint to provide additional context why a constraint could have failed

  22. def hasEntropy(column: String, assertion: (Double) ⇒ Boolean, hint: Option[String] = None): Check

    Creates a constraint that asserts on a column's entropy.

    column

    Column to run the assertion on

    assertion

    Function that receives a double input parameter and returns a boolean

    hint

    A hint to provide additional context why a constraint could have failed

  23. def hasHistogramValues(column: String, assertion: (Distribution) ⇒ Boolean, binningUdf: Option[UserDefinedFunction] = None, maxBins: Integer = Histogram.MaximumAllowedDetailBins, hint: Option[String] = None): Check

    Creates a constraint that asserts on a column's value distribution.

    column

    Column to run the assertion on

    assertion

    Function that receives a Distribution input parameter and returns a boolean, e.g. .hasHistogramValues("att2", _.absolutes("f") == 3) or .hasHistogramValues("att2", _.ratios(Histogram.NullFieldReplacement) == 2/6.0)

    binningUdf

    An optional binning function

    maxBins

    Histogram details are only provided for the N column values with the highest counts; maxBins sets N.

    hint

    A hint to provide additional context why a constraint could have failed

  24. def hasMax(column: String, assertion: (Double) ⇒ Boolean, hint: Option[String] = None): CheckWithLastConstraintFilterable

    Creates a constraint that asserts on the maximum of the column

    column

    Column to run the assertion on

    assertion

    Function that receives a double input parameter and returns a boolean

    hint

    A hint to provide additional context why a constraint could have failed

  25. def hasMean(column: String, assertion: (Double) ⇒ Boolean, hint: Option[String] = None): Check

    Creates a constraint that asserts on the mean of the column

    column

    Column to run the assertion on

    assertion

    Function that receives a double input parameter and returns a boolean

    hint

    A hint to provide additional context why a constraint could have failed

  26. def hasMin(column: String, assertion: (Double) ⇒ Boolean, hint: Option[String] = None): CheckWithLastConstraintFilterable

    Creates a constraint that asserts on the minimum of the column

    column

    Column to run the assertion on

    assertion

    Function that receives a double input parameter and returns a boolean

    hint

    A hint to provide additional context why a constraint could have failed

  27. def hasMutualInformation(columnA: String, columnB: String, assertion: (Double) ⇒ Boolean, hint: Option[String] = None): Check

    Creates a constraint that asserts on the mutual information between two columns.

    columnA

    First column for mutual information calculation

    columnB

    Second column for mutual information calculation

    assertion

    Function that receives a double input parameter and returns a boolean

    hint

    A hint to provide additional context why a constraint could have failed

  28. def hasNumberOfDistinctValues(column: String, assertion: (Long) ⇒ Boolean, binningUdf: Option[UserDefinedFunction] = None, maxBins: Integer = Histogram.MaximumAllowedDetailBins, hint: Option[String] = None): Check

    Creates a constraint that asserts on the number of distinct values a column has.

    column

    Column to run the assertion on

    assertion

    Function that receives a long input parameter and returns a boolean

    binningUdf

    An optional binning function

    maxBins

    Histogram details are only provided for the N column values with the highest counts; maxBins sets N.

    hint

    A hint to provide additional context why a constraint could have failed

  29. def hasPattern(column: String, pattern: Regex, assertion: (Double) ⇒ Boolean = Check.IsOne, name: Option[String] = None, hint: Option[String] = None): Check

    Checks for pattern compliance. Given a column name and a regular expression, defines a Check on the average compliance of the column's values with the regular expression.

    column

    Name of the column that should be checked.

    pattern

    The column's values will be checked for a match against this pattern.

    assertion

    Function that receives a double input parameter and returns a boolean

    hint

    A hint to provide additional context why a constraint could have failed
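
    The following sketch shows a pattern check with a custom regular expression. The column name, the SKU format, and the 99% threshold are hypothetical.

```scala
import com.amazon.deequ.checks.{Check, CheckLevel}
import scala.util.matching.Regex

object PatternExample {
  // Hypothetical SKU format: three uppercase letters, a dash, four digits.
  val skuPattern: Regex = """^[A-Z]{3}-\d{4}$""".r

  // At least 99% of the values in "sku" are expected to match the pattern.
  val check: Check = Check(CheckLevel.Error, "product catalog")
    .hasPattern("sku", skuPattern, _ >= 0.99)
}
```

    Omitting the assertion falls back to the default Check.IsOne shown in the signature, which requires full compliance.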

  30. def hasSize(assertion: (Long) ⇒ Boolean, hint: Option[String] = None): CheckWithLastConstraintFilterable

    Creates a constraint that calculates the data frame size and runs the assertion on it.

    assertion

    Function that receives a long input parameter and returns a boolean. The assertion can refer to the data frame size with "_", e.g. .hasSize(_ > 5), meaning the number of rows should be greater than 5. A more elaborate function can also be provided, e.g. .hasSize { aNameForSize => aNameForSize > 0 && aNameForSize < 10 }

    hint

    A hint to provide additional context why a constraint could have failed
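
    Both assertion styles described for hasSize can be sketched as follows; the check description and the bounds are illustrative only.

```scala
import com.amazon.deequ.checks.{Check, CheckLevel}

object SizeExample {
  val check: Check = Check(CheckLevel.Error, "row count")
    // Placeholder syntax: the data frame must have more than 5 rows.
    .hasSize(_ > 5)
    // Named parameter: the row count must lie strictly between 0 and 10.
    .hasSize { rowCount => rowCount > 0 && rowCount < 10 }
}
```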

  31. def hasStandardDeviation(column: String, assertion: (Double) ⇒ Boolean, hint: Option[String] = None): CheckWithLastConstraintFilterable

    Creates a constraint that asserts on the standard deviation of the column

    column

    Column to run the assertion on

    assertion

    Function that receives a double input parameter and returns a boolean

    hint

    A hint to provide additional context why a constraint could have failed

  32. def hasSum(column: String, assertion: (Double) ⇒ Boolean, hint: Option[String] = None): CheckWithLastConstraintFilterable

    Creates a constraint that asserts on the sum of the column

    column

    Column to run the assertion on

    assertion

    Function that receives a double input parameter and returns a boolean

    hint

    A hint to provide additional context why a constraint could have failed

  33. def hasUniqueValueRatio(columns: Seq[String], assertion: (Double) ⇒ Boolean, hint: Option[String] = None): Check

    Creates a constraint on the unique value ratio in a single or combined set of key columns.

    columns

    Single column or combined set of key columns to check

    assertion

    Function that receives a double input parameter and returns a boolean. Refers to the fraction of distinct values.

    hint

    A hint to provide additional context why a constraint could have failed

  34. def hasUniqueness(column: String, assertion: (Double) ⇒ Boolean, hint: Option[String]): Check

    Creates a constraint that asserts on the uniqueness of a key column.

    column

    Key column

    assertion

    Function that receives a double input parameter and returns a boolean. Refers to the fraction of unique values.

    hint

    A hint to provide additional context why a constraint could have failed

  35. def hasUniqueness(column: String, assertion: (Double) ⇒ Boolean): Check

    Creates a constraint that asserts on the uniqueness of a key column.

    column

    Key column

    assertion

    Function that receives a double input parameter and returns a boolean. Refers to the fraction of unique values.

  36. def hasUniqueness(columns: Seq[String], assertion: (Double) ⇒ Boolean, hint: Option[String]): Check

    Creates a constraint that asserts on uniqueness in a single or combined set of key columns.

    columns

    Key columns

    assertion

    Function that receives a double input parameter and returns a boolean. Refers to the fraction of unique values

    hint

    A hint to provide additional context why a constraint could have failed

  37. def hasUniqueness(columns: Seq[String], assertion: (Double) ⇒ Boolean): Check

    Creates a constraint that asserts on uniqueness in a single or combined set of key columns.

    columns

    Key columns

    assertion

    Function that receives a double input parameter and returns a boolean. Refers to the fraction of unique values

  38. def isComplete(column: String, hint: Option[String] = None): CheckWithLastConstraintFilterable

    Creates a constraint that asserts on a column's completeness.

    column

    Column to run the assertion on

    hint

    A hint to provide additional context why a constraint could have failed

  39. def isContainedIn(column: String, lowerBound: Double, upperBound: Double, includeLowerBound: Boolean = true, includeUpperBound: Boolean = true, hint: Option[String] = None): CheckWithLastConstraintFilterable

    Asserts that the non-null values in a numeric column fall into the predefined interval

    column

    Column to run the assertion on

    lowerBound

    Lower bound of the interval

    upperBound

    Upper bound of the interval

    includeLowerBound

    Is a value equal to the lower bound allowed?

    includeUpperBound

    Is a value equal to the upper bound allowed?

    hint

    A hint to provide additional context why a constraint could have failed
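
    As a sketch of how the two inclusion flags shape the interval, the following check accepts values in the half-open interval [0.0, 1.0). The column name is hypothetical.

```scala
import com.amazon.deequ.checks.{Check, CheckLevel}

object IntervalExample {
  // Non-null values of "discountRate" must satisfy 0.0 <= value < 1.0.
  val check: Check = Check(CheckLevel.Error, "pricing")
    .isContainedIn("discountRate", 0.0, 1.0,
      includeLowerBound = true, includeUpperBound = false)
}
```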

  40. def isContainedIn(column: String, allowedValues: Array[String], hint: Option[String]): CheckWithLastConstraintFilterable

    Asserts that every non-null value in a column is contained in a set of predefined values

    column

    Column to run the assertion on

    allowedValues

    allowed values for the column

    hint

    A hint to provide additional context why a constraint could have failed

  41. def isContainedIn(column: String, allowedValues: Array[String]): CheckWithLastConstraintFilterable

    Asserts that every non-null value in a column is contained in a set of predefined values

    column

    Column to run the assertion on

    allowedValues

    allowed values for the column

  42. def isGreaterThan(columnA: String, columnB: String, hint: Option[String] = None): CheckWithLastConstraintFilterable

    Asserts that, in each row, the value of columnA is greater than the value of columnB

    columnA

    Column to run the assertion on

    columnB

    Column to run the assertion on

    hint

    A hint to provide additional context why a constraint could have failed

  43. def isGreaterThanOrEqualTo(columnA: String, columnB: String, hint: Option[String] = None): CheckWithLastConstraintFilterable

    Asserts that, in each row, the value of columnA is greater than or equal to the value of columnB

    columnA

    Column to run the assertion on

    columnB

    Column to run the assertion on

    hint

    A hint to provide additional context why a constraint could have failed

  44. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  45. def isLessThan(columnA: String, columnB: String, hint: Option[String] = None): CheckWithLastConstraintFilterable

    Asserts that, in each row, the value of columnA is less than the value of columnB

    columnA

    Column to run the assertion on

    columnB

    Column to run the assertion on

    hint

    A hint to provide additional context why a constraint could have failed

  46. def isLessThanOrEqualTo(columnA: String, columnB: String, hint: Option[String] = None): CheckWithLastConstraintFilterable

    Asserts that, in each row, the value of columnA is less than or equal to the value of columnB

    columnA

    Column to run the assertion on

    columnB

    Column to run the assertion on

    hint

    A hint to provide additional context why a constraint could have failed

  47. def isNonNegative(column: String, hint: Option[String] = None): CheckWithLastConstraintFilterable

    Creates a constraint that asserts that a column contains no negative values

    column

    Column to run the assertion on

    hint

    A hint to provide additional context why a constraint could have failed

  48. def isPrimaryKey(column: String, hint: Option[String], columns: String*): Check

    Creates a constraint that asserts on the primary key characteristics of one or more columns. Currently only checks uniqueness, but is reserved for additional primary key checks should further assertions on primary key columns be added.

    column

    Columns to run the assertion on

    hint

    A hint to provide additional context why a constraint could have failed

  49. def isPrimaryKey(column: String, columns: String*): Check

    Creates a constraint that asserts on the primary key characteristics of one or more columns. Currently only checks uniqueness, but is reserved for additional primary key checks should further assertions on primary key columns be added.

    column

    Columns to run the assertion on
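
    A brief sketch of both isPrimaryKey overloads with a single and a composite key. The column names and the hint text are hypothetical.

```scala
import com.amazon.deequ.checks.{Check, CheckLevel}

object PrimaryKeyExample {
  val check: Check = Check(CheckLevel.Error, "order lines")
    // Single-column key.
    .isPrimaryKey("lineId")
    // Composite key: the combination of orderId and lineNo must be unique.
    .isPrimaryKey("orderId", "lineNo")
    // Same composite key with an explanatory hint, using the overload that takes one.
    .isPrimaryKey("orderId", Some("Each order line may appear only once"), "lineNo")
}
```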

  50. def isUnique(column: String, hint: Option[String] = None): Check

    Creates a constraint that asserts on a column's uniqueness.

    column

    Column to run the assertion on

    hint

    A hint to provide additional context why a constraint could have failed

  51. val level: CheckLevel.Value

    Assertion level of the check group. If any of the constraints fails, this level is used as the status of the check.

  52. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  53. final def notify(): Unit

    Definition Classes
    AnyRef
  54. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  55. def requiredAnalyzers(): Set[Analyzer[_, Metric[_]]]

  56. def satisfies(columnCondition: String, constraintName: String, assertion: (Double) ⇒ Boolean = Check.IsOne, hint: Option[String] = None): CheckWithLastConstraintFilterable

    Creates a constraint that runs the given condition on the data frame.

    columnCondition

    A Spark SQL expression over one or more data frame columns. It must comply with Spark SQL syntax and can be written exactly like a condition inside a WHERE clause.

    constraintName

    A name that summarizes the check being made. It is used to name the metrics of the resulting analysis.

    assertion

    Function that receives a double input parameter and returns a boolean

    hint

    A hint to provide additional context why a constraint could have failed
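
    The sketch below shows satisfies with WHERE-style conditions. The column names, constraint names, and the 99% threshold are assumptions chosen for illustration.

```scala
import com.amazon.deequ.checks.{Check, CheckLevel}

object SatisfiesExample {
  val check: Check = Check(CheckLevel.Error, "order lines")
    // Default assertion (Check.IsOne): the condition must hold for every row.
    .satisfies("quantity > 0 AND unitPrice >= 0", "positive quantity and non-negative price")
    // Custom assertion: the condition must hold for at least 99% of the rows.
    .satisfies("discount <= unitPrice", "discount within price", _ >= 0.99)
}
```

    Choosing distinct constraint names keeps the resulting metrics distinguishable, since the name is what identifies each metric.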

  57. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  58. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  59. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  60. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
