Sample wrapper function that manages the sampling pipeline based on determinism, precision, and data type. Can be used to build sampling for data types not supported out of the box.
Record type
Key type, usually Set[String]
The input SCollection to be sampled
The sample rate
Fields to construct hash over for determinism
Seed used to salt the deterministic hash
Desired output sample distribution
Fields to construct the distribution over (strata = set of unique values of these fields)
Approximate or Exact precision
Function to construct a hash given a record, field, and hasher
Function to extract a value that's safe to serialize and key on, given a record
Maximum allowed size per key (can be tweaked for very large data sets)
Determines how bytes are encoded prior to hashing.
SCollection containing the sample population
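
The interaction between the sample rate, the hashed fields, and the seed can be illustrated with a minimal sketch on plain Scala collections. This is not the wrapper's actual implementation: the names `User`, `bucket`, and `sample` are hypothetical, an in-memory `Seq` stands in for the SCollection, and `MurmurHash3` from the Scala standard library is swapped in for whatever hasher and byte encoding the real pipeline uses.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical record type, used only for illustration.
final case class User(id: String, country: String)

object DeterministicSampleSketch {
  // Maps the chosen field values plus a seed to a value in [0, 1).
  // The seed salts the hash, so different seeds select different samples.
  def bucket(fieldValues: Seq[String], seed: Int): Double = {
    val h = MurmurHash3.orderedHash(fieldValues, seed)
    // Reinterpret the signed 32-bit hash as unsigned, then scale by 2^32.
    (h.toLong & 0xffffffffL).toDouble / 4294967296.0
  }

  // Keeps a record when its bucket falls below the sample rate.
  // The same record, fields, and seed always give the same decision.
  def sample[T](records: Seq[T], rate: Double, seed: Int)(fields: T => Seq[String]): Seq[T] =
    records.filter(r => bucket(fields(r), seed) < rate)
}
```

Because the keep-or-drop decision depends only on the hashed fields and the seed, rerunning with the same inputs yields the same sample, and a sample taken at a lower rate is a subset of one taken at a higher rate with the same seed. This sketch corresponds to approximate precision only: the output size is close to, but not exactly, rate times the input size.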