Package

com.spotify.ratatool

samplers

Permalink

package samplers

Linear Supertypes
AnyRef, Any
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. samplers
  2. AnyRef
  3. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. All

Type Members

  1. class AvroSampler extends Sampler[GenericRecord]

    Permalink

    Sampler for Avro files.

  2. class BigQuerySampler extends Sampler[TableRow]

    Permalink

    Sampler for BigQuery tables.

    Sampler for BigQuery tables.

    Only head mode is supported.

  3. trait Sampler[T] extends AnyRef

    Permalink

    Trait for a data sampler.

Value Members

  1. object BigSampler extends Command

    Permalink
  2. def sampleAvro[T <: GenericRecord](coll: SCollection[T], fraction: Double, schema: ⇒ Schema, fields: Seq[String] = Seq(), seed: Option[Int] = None, distribution: Option[SampleDistribution] = None, distributionFields: Seq[String] = Seq(), precision: Precision = Approximate, maxKeySize: Int = 1e6.toInt, byteEncoding: ByteEncoding = RawEncoding)(implicit arg0: ClassTag[T], arg1: Coder[T]): SCollection[T]

    Permalink

    Sample wrapper function for Avro GenericRecord

    Sample wrapper function for Avro GenericRecord

    T

    Record Type

    coll

    The input SCollection to be sampled

    fraction

    The sample rate

    fields

    Fields to construct hash over for determinism

    seed

    Seed used to salt the deterministic hash

    distribution

    Desired output sample distribution

    distributionFields

    Fields to construct distribution over (strata = set of unique fields)

    precision

    Approximate or Exact precision

    maxKeySize

    Maximum allowed size per key (can be tweaked for very large data sets)

    byteEncoding

    Determines how bytes are encoded prior to hashing.

    returns

    SCollection containing Sample population

  3. def sampleProto[T <: AbstractMessage](coll: SCollection[T], fraction: Double, fields: Seq[String] = Seq(), seed: Option[Int] = None, distribution: Option[SampleDistribution] = None, distributionFields: Seq[String] = Seq(), precision: Precision = Approximate, maxKeySize: Int = 1e6.toInt, byteEncoding: ByteEncoding = RawEncoding)(implicit arg0: ClassTag[T]): SCollection[T]

    Permalink

    Sample wrapper function for Protobuf Message

    Sample wrapper function for Protobuf Message

    T

    Record Type

    coll

    The input SCollection to be sampled

    fraction

    The sample rate

    fields

    Fields to construct hash over for determinism

    seed

    Seed used to salt the deterministic hash

    distribution

    Desired output sample distribution

    distributionFields

    Fields to construct distribution over (strata = set of unique fields)

    precision

    Approximate or Exact precision

    maxKeySize

    Maximum allowed size per key (can be tweaked for very large data sets)

    byteEncoding

    Determines how bytes are encoded prior to hashing.

    returns

    SCollection containing Sample population

  4. def sampleTableRow(coll: SCollection[TableRow], fraction: Double, schema: TableSchema, fields: Seq[String] = Seq(), seed: Option[Int] = None, distribution: Option[SampleDistribution] = None, distributionFields: Seq[String] = Seq(), precision: Precision = Approximate, maxKeySize: Int = 1e6.toInt, byteEncoding: ByteEncoding = RawEncoding): SCollection[TableRow]

    Permalink

    Sample wrapper function for BigQuery TableRow

    Sample wrapper function for BigQuery TableRow

    coll

    The input SCollection to be sampled

    fraction

    The sample rate

    fields

    Fields to construct hash over for determinism

    seed

    Seed used to salt the deterministic hash

    distribution

    Desired output sample distribution

    distributionFields

    Fields to construct distribution over (strata = set of unique fields)

    precision

    Approximate or Exact precision

    maxKeySize

    Maximum allowed size per key (can be tweaked for very large data sets)

    byteEncoding

    Determines how bytes are encoded prior to hashing.

    returns

    SCollection containing Sample population

  5. package util

    Permalink

Inherited from AnyRef

Inherited from Any

Ungrouped