



package diffy

Type Members

  1. class AvroDiffy[T <: GenericRecord] extends Diffy[T]


    Field level diff tool for Avro records.

  2. class BigDiffy[T] extends AnyRef


    Big diff between two data sets given a primary key.
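Before any field is compared, records from the two data sets have to be paired by primary key. The in-memory sketch below shows that pairing as a full outer join over plain Scala collections. `outerJoinByKey` and `User` are hypothetical names, and keys are assumed to be unique within each side. This is not BigDiffy's own API, whose signatures this page does not show.

```scala
object KeyedJoin {
  // Hypothetical helper: pair records from both sides by primary key.
  // A key present on only one side gets None for the other side.
  // Assumes keys are unique per side; duplicates would be collapsed by toMap.
  def outerJoinByKey[T](lhs: Seq[T], rhs: Seq[T])(keyFn: T => String): Map[String, (Option[T], Option[T])] = {
    val l = lhs.map(x => keyFn(x) -> x).toMap
    val r = rhs.map(x => keyFn(x) -> x).toMap
    (l.keySet ++ r.keySet).map(k => (k, (l.get(k), r.get(k)))).toMap
  }
}

case class User(id: String, name: String)

val joined = KeyedJoin.outerJoinByKey(
  Seq(User("1", "ann"), User("2", "bob")),
  Seq(User("2", "bobby"), User("3", "cid"))
)(_.id)
```

Keys present on only one side correspond to the MISSING_LHS and MISSING_RHS cases described under DiffType.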

  3. case class Delta(field: String, left: Any, right: Any, delta: DeltaValue) extends Product with Serializable


    Delta of a single field between two records.

    field - "." separated field identifier.
    left - left hand side value.
    right - right hand side value.
    delta - difference between the two values as a DeltaValue (UnknownDelta when no numeric delta is computed).
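Nested fields are addressed by "." separated identifiers such as user.geo.lat. As a rough illustration, the sketch below flattens a nested record, modeled here as a hypothetical Map[String, Any], into such identifiers. The actual diff tools walk Avro, ProtoBuf, or TableRow structures rather than maps, and `FieldPaths.flatten` is not part of the library.

```scala
object FieldPaths {
  // Recursively flatten nested maps into "."-separated field identifiers.
  // Non-map values (including lists) are treated as leaves.
  def flatten(record: Map[String, Any], prefix: String = ""): Map[String, Any] =
    record.flatMap { case (name, value) =>
      val path = if (prefix.isEmpty) name else s"$prefix.$name"
      value match {
        case nested: Map[_, _] => flatten(nested.asInstanceOf[Map[String, Any]], path)
        case leaf              => Map(path -> leaf)
      }
    }
}

val flat = FieldPaths.flatten(
  Map("id" -> 1, "user" -> Map("name" -> "ann", "geo" -> Map("lat" -> 1.5)))
)
```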

  4. case class DeltaStats(deltaType: DeltaType.Value, min: Double, max: Double, count: Long, mean: Double, variance: Double, stddev: Double, skewness: Double, kurtosis: Double) extends Product with Serializable


    Delta level statistics: count, min, max, and the first four moments of the delta distribution (mean, variance, skewness, kurtosis).

    deltaType - one of NUMERIC, STRING, VECTOR.
    min - minimum distance seen.
    max - maximum distance seen.
    count - number of differences seen.
    mean - mean of all differences.
    variance - expected squared deviation from the mean.
    stddev - standard deviation, the square root of the variance.
    skewness - measure of the asymmetry of the delta distribution.
    kurtosis - measure of the tail thickness of the delta distribution.
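The numeric fields of DeltaStats can be computed from a list of deltas with the usual moment formulas. This sketch assumes population variance (dividing by n) and excess kurtosis (0 for a normal distribution). The page does not say which conventions DeltaStats uses, and a large-scale implementation would typically accumulate these in a single pass instead of materializing the list. `DeltaMoments` and its `Stats` class are hypothetical, and deltaType is left out.

```scala
object DeltaMoments {
  // Hypothetical container mirroring the numeric fields of DeltaStats.
  case class Stats(min: Double, max: Double, count: Long, mean: Double,
                   variance: Double, stddev: Double, skewness: Double, kurtosis: Double)

  // Assumptions: population variance and excess kurtosis.
  def compute(deltas: Seq[Double]): Stats = {
    require(deltas.nonEmpty, "need at least one delta")
    val n = deltas.size.toDouble
    val mean = deltas.sum / n
    // p-th central moment: average of (d - mean)^p.
    def central(p: Int): Double = deltas.map(d => math.pow(d - mean, p)).sum / n
    val m2 = central(2)
    val m3 = central(3)
    val m4 = central(4)
    val stddev = math.sqrt(m2)
    Stats(
      min = deltas.min,
      max = deltas.max,
      count = deltas.size.toLong,
      mean = mean,
      variance = m2,
      stddev = stddev,
      skewness = m3 / math.pow(stddev, 3), // NaN when all deltas are equal
      kurtosis = m4 / (m2 * m2) - 3.0      // NaN when all deltas are equal
    )
  }
}

val s = DeltaMoments.compute(Seq(1.0, 2.0, 3.0, 4.0))
```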

  5. sealed trait DeltaValue extends AnyRef


    Delta value of a single node between two records.
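Because DeltaValue is sealed, consumers can pattern match over its subtypes and get exhaustiveness checking from the compiler. The sketch re-declares simplified stand-ins for DeltaType, DeltaValue, TypedDelta, UnknownDelta, and Delta so that it compiles on its own; `distanceOf` is a hypothetical helper and the delta values are chosen by hand.

```scala
// Simplified stand-ins mirroring the documented shapes (not the library's source).
object DeltaType extends Enumeration {
  val UNKNOWN, NUMERIC, STRING, VECTOR = Value
}

sealed trait DeltaValue
case class TypedDelta(deltaType: DeltaType.Value, value: Double) extends DeltaValue
case object UnknownDelta extends DeltaValue

case class Delta(field: String, left: Any, right: Any, delta: DeltaValue)

// Hypothetical helper: the computed distance, if any.
// Since DeltaValue is sealed, the compiler can warn if a subtype is not handled.
def distanceOf(d: Delta): Option[Double] = d.delta match {
  case TypedDelta(_, v) => Some(v)
  case UnknownDelta     => None
}

val deltas = Seq(
  Delta("user.name", "ann", "anne", TypedDelta(DeltaType.STRING, 1.0)),
  Delta("user.avatar", Array[Byte](1), Array[Byte](2), UnknownDelta)
)
val distances = deltas.map(distanceOf)
```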

  6. abstract class Diffy[T] extends Serializable


    Field level diff tool.

    Use ignore to specify the set of fields to ignore during comparison. Use unordered to specify the set of fields to be treated as unordered, i.e. sorted before comparison.
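To illustrate the ignore and unordered semantics, here is a sketch over flat records modeled as Map[String, Any] keyed by "." separated field names. `MapDiffy` and `FieldDiff` are hypothetical and not the library's Diffy API. One deliberate swap: unordered lists are compared as multisets (element counts) instead of being sorted, which has the same effect on equality without requiring an ordering on the elements.

```scala
// Simplified field difference: field name plus the value on each side (None if absent).
case class FieldDiff(field: String, left: Option[Any], right: Option[Any])

// Hypothetical diff over flat records; not the library's Diffy implementation.
class MapDiffy(ignore: Set[String] = Set.empty, unordered: Set[String] = Set.empty) {
  // For unordered fields, compare lists as multisets (element -> occurrence count).
  private def normalize(field: String, v: Any): Any = v match {
    case xs: Seq[_] if unordered.contains(field) =>
      val elems: Seq[Any] = xs
      elems.groupBy(identity).map { case (e, occurrences) => (e, occurrences.size) }
    case other => other
  }

  def apply(x: Map[String, Any], y: Map[String, Any]): Seq[FieldDiff] =
    (x.keySet ++ y.keySet).toSeq.sorted
      .filterNot(ignore.contains)
      .flatMap { f =>
        val same = x.get(f).map(normalize(f, _)) == y.get(f).map(normalize(f, _))
        if (same) None else Some(FieldDiff(f, x.get(f), y.get(f)))
      }
}

val diffy = new MapDiffy(ignore = Set("updatedAt"), unordered = Set("tags"))
val lhs = Map[String, Any]("id" -> 1, "tags" -> List("a", "b"), "score" -> 1.0, "updatedAt" -> 100L)
val rhs = Map[String, Any]("id" -> 1, "tags" -> List("b", "a"), "score" -> 2.0, "updatedAt" -> 200L)
val diffs = diffy(lhs, rhs)
```

Only score is reported here: updatedAt is ignored, and the two tags lists differ only in element order.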

  7. case class FieldStats(field: String, count: Long, fraction: Double, deltaStats: Option[DeltaStats]) extends Product with Serializable


    Field level statistics.

    field - "." separated field identifier.
    count - number of keys whose two records have different values for the given field.
    fraction - count divided by the total number of keys with different records on both sides.
    deltaStats - statistics of the field value deltas, if any.
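Assuming fraction is count divided by the number of keys whose records differ on both sides (numDiff in GlobalStats), per-field counts follow from the set of differing fields recorded for each such key. The input shape and `fieldStats` are hypothetical, and deltaStats is left out.

```scala
// Hypothetical per-field summary; the documented FieldStats also carries Option[DeltaStats].
case class FieldCount(field: String, count: Long, fraction: Double)

// diffsByKey: for each key whose two records differ, the set of differing field names.
def fieldStats(diffsByKey: Map[String, Set[String]]): Seq[FieldCount] = {
  val numDiff = diffsByKey.size
  diffsByKey.values.toSeq
    .flatten // one entry per (key, differing field)
    .groupBy(identity)
    .map { case (field, hits) => FieldCount(field, hits.size.toLong, hits.size.toDouble / numDiff) }
    .toSeq
    .sortBy(fc => (-fc.count, fc.field)) // most frequently differing fields first
}

val stats = fieldStats(Map(
  "k1" -> Set("name", "age"),
  "k2" -> Set("age"),
  "k3" -> Set("age", "email"),
  "k4" -> Set("name")
))
```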

  8. case class GlobalStats(numTotal: Long, numSame: Long, numDiff: Long, numMissingLhs: Long, numMissingRhs: Long) extends Product with Serializable


    Global level statistics.

    numTotal - total number of unique keys.
    numSame - number of keys with the same records on both sides.
    numDiff - number of keys with different records on both sides.
    numMissingLhs - number of keys with a missing left hand side record.
    numMissingRhs - number of keys with a missing right hand side record.
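GlobalStats can be derived by classifying every key seen on either side. This sketch uses a local stand-in with the same five fields and plain maps from primary key to record; `globalStats` is hypothetical. By construction, numTotal equals numSame + numDiff + numMissingLhs + numMissingRhs.

```scala
// Stand-in with the same fields as the documented GlobalStats.
case class GlobalStats(numTotal: Long, numSame: Long, numDiff: Long, numMissingLhs: Long, numMissingRhs: Long)

def globalStats[A](lhs: Map[String, A], rhs: Map[String, A]): GlobalStats = {
  // One (left, right) pair per key seen on either side; toSeq keeps duplicate pairs.
  val pairs = (lhs.keySet ++ rhs.keySet).toSeq.map(k => (lhs.get(k), rhs.get(k)))
  GlobalStats(
    numTotal = pairs.size.toLong,
    numSame = pairs.count { case (l, r) => l.isDefined && l == r }.toLong,
    numDiff = pairs.count { case (l, r) => l.isDefined && r.isDefined && l != r }.toLong,
    numMissingLhs = pairs.count(_._1.isEmpty).toLong,
    numMissingRhs = pairs.count(_._2.isEmpty).toLong
  )
}

val g = globalStats(Map("a" -> 1, "b" -> 2, "c" -> 3), Map("a" -> 1, "b" -> 20, "d" -> 4))
```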

  9. case class KeyStats(key: String, diffType: DiffType.Value, delta: Option[Delta]) extends Product with Serializable


    Key-field level DiffType and delta.

    If the DiffType is SAME, MISSING_LHS, or MISSING_RHS, the key appears once with no Delta. If the DiffType is DIFFERENT, there is one KeyStats for every field that differs for that key, carrying that field's Delta.

    key - primary key being compared.
    diffType - how the two records of the given key compare.
    delta - a single field's difference, including field name, values, and distance.
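The expansion rule above can be sketched for a single key as follows. DiffType and KeyStats are local stand-ins, records are hypothetical flat maps, and the field delta is reduced to the field name and the two optional values rather than a full Delta.

```scala
// Stand-ins mirroring the documented DiffType and KeyStats; the field delta is simplified.
object DiffType extends Enumeration {
  val SAME, DIFFERENT, MISSING_LHS, MISSING_RHS = Value
}
case class FieldDelta(field: String, left: Option[Any], right: Option[Any])
case class KeyStats(key: String, diffType: DiffType.Value, delta: Option[FieldDelta])

// Hypothetical expansion of one key into KeyStats rows.
// A key is assumed to exist on at least one side; (None, None) falls into MISSING_LHS.
def keyStats(key: String, lhs: Option[Map[String, Any]], rhs: Option[Map[String, Any]]): Seq[KeyStats] =
  (lhs, rhs) match {
    case (None, _) => Seq(KeyStats(key, DiffType.MISSING_LHS, None))
    case (_, None) => Seq(KeyStats(key, DiffType.MISSING_RHS, None))
    case (Some(l), Some(r)) =>
      val fields = (l.keySet ++ r.keySet).toSeq.sorted.filter(f => l.get(f) != r.get(f))
      if (fields.isEmpty) Seq(KeyStats(key, DiffType.SAME, None))
      else fields.map(f => KeyStats(key, DiffType.DIFFERENT, Some(FieldDelta(f, l.get(f), r.get(f)))))
  }

val rows = keyStats("k1", Some(Map("a" -> 1, "b" -> 2, "c" -> 3)), Some(Map("a" -> 1, "b" -> 5, "c" -> 4)))
```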

  10. class ProtoBufDiffy[T <: AbstractMessage] extends Diffy[T]


    Field level diff tool for ProtoBuf records.

  11. class TableRowDiffy extends Diffy[TableRow]


    Field level diff tool for TableRow records.

  12. case class TypedDelta(deltaType: DeltaType.Value, value: Double) extends DeltaValue with Product with Serializable


    Delta value with a known type and computed difference.

Value Members

  1. object BigDiffy


    Big diff between two data sets given a primary key.

  2. object CosineDistance


    Compute cosine distance between two vectors.
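A minimal sketch of cosine distance, 1.0 minus cosine similarity, over plain Double sequences. It assumes equal-length, non-zero vectors (a zero vector makes the result NaN). `CosineSketch.distance` is a hypothetical name and says nothing about the signature of the library's CosineDistance object. The result ranges from 0 (same direction) to 2 (opposite directions).

```scala
object CosineSketch {
  // distance = 1 - (a . b) / (|a| * |b|)
  def distance(a: Seq[Double], b: Seq[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    1.0 - dot / (normA * normB)
  }
}
```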

  3. object DeltaType extends Enumeration


    Delta type of a single node between two records.

    UNKNOWN - unknown type, no numeric delta is computed.
    NUMERIC - numeric type, e.g. Long, Double; default delta is numeric difference.
    STRING - string type; default delta is Levenshtein edit distance.
    VECTOR - repeated numeric type; default delta is 1.0 - cosine similarity.
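The default delta depends on the detected type. A minimal sketch of that detection for a pair of leaf values, assuming boxed JVM numbers (java.lang.Number) count as numeric and Scala Seq values containing only numbers count as vectors. DeltaType is a local stand-in and `deltaTypeOf` is hypothetical; the library's own detection rules may differ.

```scala
// Stand-in mirroring the documented DeltaType values.
object DeltaType extends Enumeration {
  val UNKNOWN, NUMERIC, STRING, VECTOR = Value
}

// True if every element is a boxed number (Int, Long, Double, ...).
def allNumeric(s: Seq[_]): Boolean = s.forall(_.isInstanceOf[Number])

// Hypothetical detection of the delta type for two leaf values.
def deltaTypeOf(x: Any, y: Any): DeltaType.Value = (x, y) match {
  case (_: Number, _: Number)                                => DeltaType.NUMERIC
  case (_: String, _: String)                                => DeltaType.STRING
  case (a: Seq[_], b: Seq[_]) if allNumeric(a) && allNumeric(b) => DeltaType.VECTOR
  case _                                                     => DeltaType.UNKNOWN // mixed or unsupported
}
```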

  4. object DiffType extends Enumeration


    Diff type between two records of the same key.

    SAME - the two records are identical.
    DIFFERENT - the two records are different.
    MISSING_LHS - left hand side record is missing.
    MISSING_RHS - right hand side record is missing.
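A minimal sketch of how the two optional records found under one key map onto these values. DiffType is a local stand-in and `diffType` is hypothetical. A key absent on both sides cannot occur when keys are collected from the data, so the sketch rejects it.

```scala
// Stand-in mirroring the documented DiffType values.
object DiffType extends Enumeration {
  val SAME, DIFFERENT, MISSING_LHS, MISSING_RHS = Value
}

// Hypothetical classifier for the left and right records of one key.
def diffType[A](lhs: Option[A], rhs: Option[A]): DiffType.Value = (lhs, rhs) match {
  case (Some(l), Some(r)) => if (l == r) DiffType.SAME else DiffType.DIFFERENT
  case (None, Some(_))    => DiffType.MISSING_LHS
  case (Some(_), None)    => DiffType.MISSING_RHS
  case (None, None)       => throw new IllegalArgumentException("key absent on both sides")
}
```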

  5. object Levenshtein


    Compute Levenshtein edit distance between two strings.
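A sketch of the textbook dynamic-programming algorithm for Levenshtein distance (minimum number of single-character insertions, deletions, and substitutions), keeping only one previous row so memory is O(length of b). `EditDistance.levenshtein` is a hypothetical name, not the signature of the library's Levenshtein object.

```scala
object EditDistance {
  // prev(j) holds the distance between the first i - 1 chars of a and the first j chars of b.
  def levenshtein(a: String, b: String): Int = {
    var prev = Array.tabulate(b.length + 1)(j => j) // distance from "" to b.take(j)
    for (i <- 1 to a.length) {
      val cur = new Array[Int](b.length + 1)
      cur(0) = i // delete all i chars of a
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        // insertion, deletion, substitution (or match)
        cur(j) = math.min(math.min(cur(j - 1) + 1, prev(j) + 1), prev(j - 1) + cost)
      }
      prev = cur
    }
    prev(b.length)
  }
}
```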

  6. object NumericDelta


    Companion objects for TypedDelta.

  7. object StringDelta

  8. object UnknownDelta extends DeltaValue with Product with Serializable


    Delta value of unknown type.

  9. object VectorDelta
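NumericDelta, StringDelta, and VectorDelta are documented only as companion objects for TypedDelta. One plausible shape, assumed here rather than taken from the library, is a factory plus extractor that fixes the deltaType. The DeltaType, DeltaValue, TypedDelta, and UnknownDelta declarations are local stand-ins, and `TypedDeltaCompanion` and `describe` are hypothetical.

```scala
// Stand-ins mirroring the documented shapes.
object DeltaType extends Enumeration {
  val UNKNOWN, NUMERIC, STRING, VECTOR = Value
}
sealed trait DeltaValue
case class TypedDelta(deltaType: DeltaType.Value, value: Double) extends DeltaValue
case object UnknownDelta extends DeltaValue

// Hypothetical base: each companion builds a TypedDelta of its own type
// and, as an extractor, matches only TypedDelta values carrying that type.
abstract class TypedDeltaCompanion(deltaType: DeltaType.Value) {
  def apply(value: Double): TypedDelta = TypedDelta(deltaType, value)
  def unapply(d: DeltaValue): Option[Double] = d match {
    case TypedDelta(t, v) if t == deltaType => Some(v)
    case _                                  => None
  }
}
object NumericDelta extends TypedDeltaCompanion(DeltaType.NUMERIC)
object StringDelta extends TypedDeltaCompanion(DeltaType.STRING)
object VectorDelta extends TypedDeltaCompanion(DeltaType.VECTOR)

def describe(d: DeltaValue): String = d match {
  case NumericDelta(v) => s"numeric difference $v"
  case StringDelta(v)  => s"edit distance $v"
  case VectorDelta(v)  => s"cosine distance $v"
  case _               => "no delta"
}
```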

