SCollection

Abstract Value Members

implicit abstract val ct: ClassTag[T]

Attributes
protected
Definition Classes
PCollectionWrapper
abstract val internal: PCollection[T]

The PCollection being wrapped internally.

Definition Classes
PCollectionWrapper

Concrete Value Members

final def !=(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def !=(arg0: Any): Boolean

Definition Classes
Any
final def ##(): Int

Definition Classes
AnyRef → Any
def ++(that: SCollection[T]): SCollection[T]

Return the union of this SCollection and another one. Any identical elements will appear multiple times (use .distinct() to eliminate them).
final def ==(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def ==(arg0: Any): Boolean

Definition Classes
Any
def aggregate[A, U](aggregator: Aggregator[T, A, U])(implicit arg0: ClassTag[A], arg1: ClassTag[U]): SCollection[U]

Aggregate with Aggregator. First each item T is mapped to A, then we reduce with a semigroup of A, then finally we present the results as U. This could be more powerful and better optimized in some cases.
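The prepare, reduce, and present pipeline described above can be sketched locally with plain Scala collections. `MiniAggregator` is a hypothetical stand-in written only for illustration. It is not Algebird's `Aggregator` class, and it runs on an in-memory `Seq` rather than an SCollection.

```scala
// Hypothetical stand-in for an Aggregator[T, A, U]; not the Algebird class.
final case class MiniAggregator[T, A, U](
    prepare: T => A,     // map each item T to A
    reduce: (A, A) => A, // associative (semigroup) operation on A
    present: A => U) {   // turn the reduced A into the final U
  // Like a semigroup reduce, this needs at least one item (there is no zero).
  def apply(items: Seq[T]): U = present(items.map(prepare).reduce(reduce))
}

object MiniAggregatorExample {
  // Mean of Ints: A = (sum, count), U = Double
  val mean: MiniAggregator[Int, (Double, Long), Double] =
    MiniAggregator[Int, (Double, Long), Double](
      x => (x.toDouble, 1L),
      (a, b) => (a._1 + b._1, a._2 + b._2),
      { case (sum, n) => sum / n })
}
```

Because only `reduce` runs across items, a distributed runner can apply it in any grouping, which is what leaves room for optimization.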
def aggregate[U](zeroValue: U)(seqOp: (U, T) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): SCollection[U]

Aggregate the elements using given combine functions and a neutral "zero value". This function can return a different result type, U, than the type of this SCollection, T. Thus, we need one operation for merging a T into a U and one operation for merging two U's. Both of these functions are allowed to modify and return their first argument instead of creating a new U to avoid memory allocation.
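A minimal local model of these semantics, using plain Scala collections rather than Scio: each inner `Seq` stands in for the elements one worker happens to process (an assumption made for illustration). The zero value is used once per bundle and once more for the final merge, which is why it must be neutral.

```scala
// Local model of aggregate(zeroValue)(seqOp, combOp) on in-memory data.
// Each inner Seq stands in for the elements one worker happens to process.
object AggregateModel {
  def aggregate[T, U](bundles: Seq[Seq[T]], zeroValue: U)(
      seqOp: (U, T) => U,
      combOp: (U, U) => U): U =
    bundles
      .map(_.foldLeft(zeroValue)(seqOp)) // fold each bundle's T values into a partial U
      .foldLeft(zeroValue)(combOp)       // merge the partial U values into one result

  // Example: (sum, count) of Ints, a result type U that differs from T
  def sumAndCount(bundles: Seq[Seq[Int]]): (Int, Int) =
    aggregate(bundles, (0, 0))(
      (acc, x) => (acc._1 + x, acc._2 + 1),
      (a, b) => (a._1 + b._1, a._2 + b._2))
}
```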
def apply[U](transform: PTransform[_ >: PCollection[T], PCollection[U]])(implicit arg0: ClassTag[U]): SCollection[U]

Attributes
protected
Definition Classes
PCollectionWrapper
final def asInstanceOf[T0]: T0

Definition Classes
Any
def asIterableSideInput: SideInput[Iterable[T]]

Convert this SCollection to a SideInput, mapping each window to an Iterable, to be used with SCollection.withSideInputs.
The values of the Iterable for a window are not required to fit in memory, but they may not be cached effectively. If it is known that every window fits in memory, and stronger caching is desired, use asListSideInput.
def asListSideInput: SideInput[List[T]]

Convert this SCollection to a SideInput, mapping each window to a List, to be used with SCollection.withSideInputs.
def asSingletonSideInput: SideInput[T]

Convert this SCollection of a single value per window to a SideInput, to be used with SCollection.withSideInputs.
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
def combine[C](createCombiner: (T) ⇒ C)(mergeValue: (C, T) ⇒ C)(mergeCombiners: (C, C) ⇒ C)(implicit arg0: ClassTag[C]): SCollection[C]

Generic function to combine the elements using a custom set of aggregation functions. Turns an SCollection[T] into a result of type SCollection[C], for a "combined type" C. Note that T and C can be different; for example, one might combine an SCollection of type Int into an SCollection of type Seq[Int]. Users provide three functions:
- createCombiner, which turns a T into a C (e.g., creates a one-element list)
- mergeValue, to merge a T into a C (e.g., adds it to the end of a list)
- mergeCombiners, to combine two C's into a single one.
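The three functions above can be modeled locally with plain Scala collections (not Scio). As an assumption for illustration, each inner `Seq` stands in for one worker's bundle: each non-empty bundle starts its combiner from its first element, and the per-bundle combiners are then merged.

```scala
// Local model of combine(createCombiner)(mergeValue)(mergeCombiners).
// Each inner Seq stands in for the elements one worker happens to process.
object CombineModel {
  def combine[T, C](bundles: Seq[Seq[T]])(createCombiner: T => C)(
      mergeValue: (C, T) => C)(mergeCombiners: (C, C) => C): C =
    bundles
      .filter(_.nonEmpty)
      // Start each bundle's combiner from its first element, then merge the rest in
      .map(b => b.tail.foldLeft(createCombiner(b.head))(mergeValue))
      // Merge the per-bundle combiners (requires at least one element overall)
      .reduce(mergeCombiners)

  // Example from the text: combine Ints into a Seq[Int]
  def collect(bundles: Seq[Seq[Int]]): List[Int] =
    combine(bundles)(x => List(x))((c, x) => c :+ x)(_ ++ _)
}
```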
def count(): SCollection[Long]

Count the number of elements in the SCollection.
returns
a new SCollection with the count
def countApproxDistinct(maximumEstimationError: Double = 0.02): SCollection[Long]

Count approximate number of distinct elements in the SCollection.
maximumEstimationError
the maximum estimation error, which should be in the range [0.01, 0.5]
def countApproxDistinct(sampleSize: Int): SCollection[Long]

Count approximate number of distinct elements in the SCollection.
sampleSize
the number of entries in the statistical sample; the higher this number, the more accurate the estimate will be; should be >= 16
def countByValue(): SCollection[(T, Long)]

Count of each unique value in this SCollection as an SCollection of (value, count) pairs.
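The shape of the result can be illustrated with a small local model on an in-memory `Seq` (plain Scala, not Scio): one (value, count) pair per distinct value.

```scala
// Local model of countByValue on an in-memory Seq (not Scio).
object CountByValueModel {
  def countByValue[T](xs: Seq[T]): Map[T, Long] =
    xs.groupBy(identity).map { case (value, occurrences) => value -> occurrences.size.toLong }
}
```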
def cross[U](that: SCollection[U])(implicit arg0: ClassTag[U]): SCollection[(T, U)]

Return the cross product with another SCollection by replicating that to all workers. The right side should be tiny and fit in memory.
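A local sketch of the output, using plain Scala collections rather than Scio: every element of the left side is paired with every element of `that`, so the output has |left| × |that| elements, one reason the right side should stay tiny.

```scala
object CrossModel {
  // `that` plays the role of the small side replicated to every worker,
  // so each element of `left` is paired with every element of `that`.
  def cross[T, U](left: Seq[T], that: Seq[U]): Seq[(T, U)] =
    for {
      t <- left
      u <- that
    } yield (t, u)
}
```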
def distinct(): SCollection[T]

Return a new SCollection containing the distinct elements in this SCollection.
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def filter(f: (T) ⇒ Boolean): SCollection[T]

Return a new SCollection containing only the elements that satisfy a predicate.
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def flatMap[U](f: (T) ⇒ TraversableOnce[U])(implicit arg0: ClassTag[U]): SCollection[U]

Return a new SCollection by first applying a function to all elements of this SCollection, and then flattening the results.
def fold(implicit mon: Monoid[T]): SCollection[T]

Fold with Monoid, which defines the associative function and "zero value" for T. This could be more powerful and better optimized in some cases.
def fold(zeroValue: T)(op: (T, T) ⇒ T): SCollection[T]

Aggregate the elements using a given associative function and a neutral "zero value". The function op(t1, t2) is allowed to modify t1 and return it as its result value to avoid object allocation; however, it should not modify t2.
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def groupBy[K](f: (T) ⇒ K)(implicit arg0: ClassTag[K]): SCollection[(K, Iterable[T])]

Return an SCollection of grouped items. Each group consists of a key and a sequence of elements mapping to that key. The ordering of elements within each group is not guaranteed, and may even differ each time the resulting SCollection is evaluated.
Note: This operation may be very expensive. If you are grouping in order to perform an aggregation (such as a sum or average) over each key, using PairSCollectionFunctions.aggregateByKey or PairSCollectionFunctions.reduceByKey will provide much better performance.
def hashCode(): Int

Definition Classes
AnyRef → Any
def hashLookup[V](that: SCollection[(T, V)])(implicit arg0: ClassTag[V]): SCollection[(T, Iterable[V])]

Look up values in an SCollection[(T, V)] for each element T in this SCollection by replicating that to all workers. The right side should be tiny and fit in memory.
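A local model of the lookup with plain Scala collections (not Scio): the small side is turned into an in-memory multimap once, as a replicated side input would be, and each element is paired with all matching values. This sketch assumes an element with no match is paired with an empty Iterable.

```scala
object HashLookupModel {
  def hashLookup[T, V](left: Seq[T], that: Seq[(T, V)]): Seq[(T, Iterable[V])] = {
    // Build the in-memory multimap once, as the replicated side would be on each worker
    val table: Map[T, Seq[V]] =
      that.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    // Pair every element with its matches (empty when there are none, by assumption)
    left.map(t => (t, table.getOrElse(t, Seq.empty[V])))
  }
}
```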
def intersection(that: SCollection[T]): SCollection[T]

Return the intersection of this SCollection and another one. The output will not contain any duplicate elements, even if the input SCollections did.
Note that this method performs a shuffle internally.
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def keyBy[K](f: (T) ⇒ K)(implicit arg0: ClassTag[K]): SCollection[(K, T)]

Create (key, element) tuples from the elements of this SCollection, where each key is computed by applying f to the element.
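The output shape can be shown with a one-line local model on an in-memory `Seq` (plain Scala, not Scio): the element is kept as the value and f(element) becomes its key.

```scala
object KeyByModel {
  // Each element is kept as the value, with f(element) as its key.
  def keyBy[T, K](xs: Seq[T])(f: T => K): Seq[(K, T)] = xs.map(t => (f(t), t))
}
```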
def map[U](f: (T) ⇒ U)(implicit arg0: ClassTag[U]): SCollection[U]

Return a new SCollection by applying a function to all elements of this SCollection.
def materialize: Future[Tap[T]]

Extract data from this SCollection as a Future. The Future will be completed once the pipeline completes successfully.
def max(implicit ord: Ordering[T]): SCollection[T]

Return the max of this SCollection as defined by the implicit Ordering[T].
returns
a new SCollection with the maximum element
def mean(implicit ev: Numeric[T]): SCollection[Double]

Return the mean of this SCollection as defined by the implicit Numeric[T].
returns
a new SCollection with the mean of elements
def min(implicit ord: Ordering[T]): SCollection[T]

Return the min of this SCollection as defined by the implicit Ordering[T].
returns
a new SCollection with the minimum element
def name: String

A friendly name for this SCollection.
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def partition(numPartitions: Int, f: (T) ⇒ Int): Seq[SCollection[T]]

Partition this SCollection with the provided function.
numPartitions
number of output partitions
f
function that assigns an output partition to each element; its result must be in the range [0, numPartitions - 1]
returns
partitioned SCollections in a Seq
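A local model of the partition contract with plain Scala collections (not Scio; the curried signature is chosen here only for type inference): f picks an index in [0, numPartitions - 1], and the result has exactly numPartitions parts, some possibly empty.

```scala
object PartitionModel {
  def partition[T](xs: Seq[T], numPartitions: Int)(f: T => Int): Seq[Seq[T]] = {
    val byIndex = xs.groupBy(f)
    // Enforce the documented range for f's result
    require(byIndex.keys.forall(i => i >= 0 && i < numPartitions),
      "f must return a value in [0, numPartitions - 1]")
    // One output per index, in order, empty when no element was assigned to it
    (0 until numPartitions).map(i => byIndex.getOrElse(i, Seq.empty[T]))
  }
}
```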
def quantilesApprox(numQuantiles: Int)(implicit ord: Ordering[T]): SCollection[Iterable[T]]

Compute the SCollection's data distribution using approximate N-tiles.
returns
a new SCollection whose single value is an Iterable of the approximate N-tiles of the elements
def randomSplit(weights: Array[Double]): Array[SCollection[T]]

Randomly splits this SCollection with the provided weights.
weights
weights for splits, will be normalized if they don't sum to 1
returns
split SCollections in an array
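The normalization of weights can be sketched locally with plain Scala (not Scio's implementation). The split below is one common way to do a weighted random assignment, using cumulative bounds; it is an assumption, not necessarily how Scio assigns elements.

```scala
import scala.util.Random

// Local model of a weighted random split (not Scio's implementation).
object RandomSplitModel {
  // Scale the weights so that they sum to 1.
  def normalize(weights: Array[Double]): Array[Double] = {
    val total = weights.sum
    weights.map(_ / total)
  }

  // Send each element to split i with probability equal to its normalized weight.
  def randomSplit[T](xs: Seq[T], weights: Array[Double], rng: Random): Seq[Seq[T]] = {
    val upperBounds = normalize(weights).scanLeft(0.0)(_ + _).tail // cumulative weights
    val assigned = xs.groupBy { _ =>
      val r = rng.nextDouble()
      val i = upperBounds.indexWhere(r < _)
      if (i >= 0) i else upperBounds.length - 1 // guard against rounding in the last bound
    }
    weights.indices.map(i => assigned.getOrElse(i, Seq.empty[T]))
  }
}
```

With weights `Array(1.0, 3.0)`, about a quarter of the elements land in the first split; sizes vary from run to run.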
def reduce(op: (T, T) ⇒ T): SCollection[T]

Reduce the elements of this SCollection using the specified commutative and associative binary operator.
def sample(withReplacement: Boolean, fraction: Double): SCollection[T]

Return a sampled subset of this SCollection.
def sample(sampleSize: Int): SCollection[Iterable[T]]

Return a sampled subset of this SCollection.
returns
a new SCollection whose single value is an Iterable of the samples
def saveAsAvroFile(path: String, numShards: Int = 0, schema: Schema = null): Future[Tap[T]]

Save this SCollection as an Avro file. Note that elements must be of type IndexedRecord.
schema
must not be null if T is of type GenericRecord.
def saveAsBigQuery(tableSpec: String, schema: TableSchema = null, writeDisposition: WriteDisposition = null, createDisposition: CreateDisposition = null)(implicit ev: <:<[T, TableRow]): Future[Tap[TableRow]]

Save this SCollection as a BigQuery table. Note that elements must be of type TableRow.
def saveAsBigQuery(table: TableReference, schema: TableSchema, writeDisposition: WriteDisposition, createDisposition: CreateDisposition)(implicit ev: <:<[T, TableRow]): Future[Tap[TableRow]]

Save this SCollection as a BigQuery table. Note that elements must be of type TableRow.
def saveAsDatastore(datasetId: String)(implicit ev: <:<[T, Entity]): Future[Tap[Entity]]

Save this SCollection as a Datastore dataset. Note that elements must be of type Entity.
def saveAsObjectFile(path: String, suffix: String = ".obj", numShards: Int = 0): Future[Tap[T]]

Save this SCollection as an object file using default serialization.
def saveAsPubsub(topic: String)(implicit ev: <:<[T, String]): Future[Tap[String]]

Save this SCollection as a Pub/Sub topic. Note that elements must be of type String.
def saveAsTableRowJsonFile(path: String, numShards: Int = 0)(implicit ev: <:<[T, TableRow]): Future[Tap[TableRow]]

Save this SCollection as a JSON text file. Note that elements must be of type TableRow.
def saveAsTextFile(path: String, suffix: String = ".txt", numShards: Int = 0)(implicit ev: <:<[T, String]): Future[Tap[String]]

Save this SCollection as a text file. Note that elements must be of type String.
def setCoder(coder: Coder[T]): SCollection[T]

Assign a Coder to this SCollection.
def setName(name: String): SCollection[T]

Assign a name to this SCollection.
def subtract(that: SCollection[T]): SCollection[T]

Return an SCollection with the elements from this SCollection that are not in that.
def sum(implicit sg: Semigroup[T]): SCollection[T]

Reduce with Semigroup. This could be more powerful and better optimized in some cases.
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def take(num: Long): SCollection[T]

Return a sampled subset of any num elements of the SCollection.
def timestampBy(f: (T) ⇒ Instant): SCollection[T]

Assign timestamps to values.
def toString(): String

Definition Classes
AnyRef → Any
def toWindowed: WindowedSCollection[T]

Convert this SCollection to a WindowedSCollection.
def top(num: Int)(implicit ord: Ordering[T]): SCollection[Iterable[T]]

Return the top num (largest) elements from this SCollection as defined by the specified implicit Ordering[T].
returns
a new SCollection whose single value is an Iterable of the top num elements
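A local model of "largest as defined by the Ordering" with plain Scala collections (not Scio). It fully sorts for clarity; a distributed implementation would more likely keep a bounded heap per worker, which is an assumption here. Passing a reversed Ordering turns the result into the smallest elements.

```scala
object TopModel {
  // Sort in descending order of the implicit Ordering and keep the first num elements.
  def top[T](xs: Seq[T], num: Int)(implicit ord: Ordering[T]): Seq[T] =
    xs.sorted(ord.reverse).take(num)
}
```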
def union(that: SCollection[T]): SCollection[T]

Return the union of this SCollection and another one. Any identical elements will appear multiple times (use .distinct() to eliminate them).
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
def windowByDays(number: Int, options: WindowOptions[IntervalWindow] = WindowOptions()): SCollection[T]

Window values into windows of the given number of days.
def windowByMonths(number: Int, options: WindowOptions[IntervalWindow] = WindowOptions()): SCollection[T]

Window values into windows of the given number of months.
def windowByWeeks(number: Int, startDayOfWeek: Int, options: WindowOptions[IntervalWindow] = WindowOptions()): SCollection[T]

Window values into windows of the given number of weeks, starting on startDayOfWeek.
def windowByYears(number: Int, options: WindowOptions[IntervalWindow] = WindowOptions()): SCollection[T]

Window values into windows of the given number of years.
def withAccumulator(acc: Accumulator[_]*): SCollectionWithAccumulator[T]

Convert this SCollection to an SCollectionWithAccumulator with one or more Accumulators, similar to Hadoop counters. Call SCollectionWithAccumulator.toSCollection when done with accumulators.
Note that each accumulator may be used in a single scope only.
Create accumulators with ScioContext.maxAccumulator, ScioContext.minAccumulator or ScioContext.sumAccumulator. For example:
```
val maxLineLength = sc.maxAccumulator[Int]("maxLineLength")
val minLineLength = sc.minAccumulator[Int]("minLineLength")
val emptyLines = sc.sumAccumulator[Long]("emptyLines")

val p: SCollection[String] = // ...
p
  .withAccumulator(maxLineLength, minLineLength, emptyLines)
  .filter { (l, c) =>
    val t = l.trim
    // Track the longest and shortest trimmed line
    c.addValue(maxLineLength, t.length).addValue(minLineLength, t.length)
    val b = t.isEmpty
    // Count empty lines and drop them from the output
    if (b) c.addValue(emptyLines, 1L)
    !b
  }
  .toSCollection
```
def withFanout(fanout: Int): SCollectionWithFanout[T]

Convert this SCollection to an SCollectionWithFanout that uses an intermediate node to combine parts of the data to reduce load on the final global combine step.
fanout
the number of intermediate keys that will be used
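The two-level combine can be modeled locally for a sum with plain Scala collections (not Scio or Beam). Spreading elements round-robin over the intermediate keys is an assumption made for illustration; the point is that the final combine sees at most `fanout` partial results instead of every element.

```scala
object FanoutModel {
  // Two-level sum: partial sums over `fanout` intermediate keys, then one global sum.
  def sumWithFanout(xs: Seq[Long], fanout: Int): Long = {
    require(fanout > 0, "fanout must be positive")
    val partials: Iterable[Long] = xs.zipWithIndex
      .groupBy { case (_, i) => i % fanout }          // round-robin over intermediate keys (an assumption)
      .map { case (_, group) => group.map(_._1).sum } // intermediate combine per key
    partials.sum // final combine sees at most `fanout` inputs
  }
}
```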
def withFixedWindows(duration: Duration, offset: Duration = Duration.ZERO, options: WindowOptions[IntervalWindow] = WindowOptions()): SCollection[T]

Window values into fixed windows.
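The window an element falls into can be computed with simple arithmetic. This sketch uses epoch milliseconds as `Long` values instead of Joda `Duration`/`Instant`, and assumes windows of the form [start, start + size) aligned to the offset, as Beam's fixed windows are.

```scala
object FixedWindowModel {
  // Start (in epoch millis) of the fixed window that contains tsMillis.
  // Windows are [start, start + sizeMillis), aligned to offsetMillis.
  def windowStart(tsMillis: Long, sizeMillis: Long, offsetMillis: Long = 0L): Long =
    tsMillis - Math.floorMod(tsMillis - offsetMillis, sizeMillis)
}
```

For example, with one-minute windows a timestamp of 125000 ms falls in the window starting at 120000 ms; a 10-second offset shifts it to the window starting at 70000 ms.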
def withGlobalWindow(options: WindowOptions[GlobalWindow] = WindowOptions()): SCollection[T]

Group values into a single global window.
def withSessionWindows(gapDuration: Duration, options: WindowOptions[IntervalWindow] = WindowOptions()): SCollection[T]

Window values based on sessions.
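Session windows can be modeled locally as in Beam: each element opens a window [t, t + gap), and overlapping windows merge into one session. This plain-Scala sketch works on epoch milliseconds and is not Scio's implementation.

```scala
object SessionWindowModel {
  // Each timestamp opens a proto-window [t, t + gapMillis); overlapping windows merge.
  // Returns the merged sessions as (start, end) pairs in epoch millis, end exclusive.
  def sessions(timestamps: Seq[Long], gapMillis: Long): List[(Long, Long)] =
    timestamps.sorted
      .foldLeft(List.empty[(Long, Long)]) {
        // t falls inside the current session: extend it
        case ((start, end) :: earlier, t) if t < end => (start, math.max(end, t + gapMillis)) :: earlier
        // otherwise start a new session
        case (acc, t) => (t, t + gapMillis) :: acc
      }
      .reverse
}
```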

def withSideInputs(sides: SideInput[_]*): SCollectionWithSideInput[T]

Convert this SCollection to an SCollectionWithSideInput with one or more SideInputs, similar to Spark broadcast variables. Call SCollectionWithSideInput.toSCollection when done with side inputs.

Note that the side inputs should be tiny and fit in memory.

```
val s1: SCollection[Int] = // ...
val s2: SCollection[String] = // ...
val s3: SCollection[(String, Double)] = // ...

// Prepare side inputs
val side1 = s1.asSingletonSideInput
val side2 = s2.asIterableSideInput
val side3 = s3.asMapSideInput

val p: SCollection[MyRecord] = // ...
p.withSideInputs(side1, side2, side3).map { (x, s) =>
  // Extract side inputs from context
  val s1: Int = s(side1)
  val s2: Iterable[String] = s(side2)
  val s3: Map[String, Iterable[Double]] = s(side3)
  // ...
}
```

def withSideOutputs(sides: SideOutput[_]*): SCollectionWithSideOutput[T]

Convert this SCollection to an SCollectionWithSideOutput with one or more SideOutputs, so that a single transform can write to multiple destinations.
```
// Prepare side outputs
val side1 = SideOutput[String]()
val side2 = SideOutput[Int]()

val p: SCollection[MyRecord] = // ...
p.withSideOutputs(side1, side2).map { (x, s) =>
  // Write to side outputs via context
  s.output(side1, "word").output(side2, 1)
  // ...
}
```
def withSlidingWindows(size: Duration, period: Duration = Duration.millis(1), offset: Duration = Duration.ZERO, options: WindowOptions[IntervalWindow] = WindowOptions()): SCollection[T]

Window values into sliding windows.
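An element belongs to every sliding window whose span covers it. This plain-Scala sketch lists those window starts in epoch milliseconds, assuming Beam-style windows [start, start + size) whose starts are spaced one period apart and aligned to the offset. With the default period of 1 ms, an element lands in size / 1 ms windows, so a coarser period is usually wanted.

```scala
object SlidingWindowModel {
  // Starts (epoch millis) of every sliding window [start, start + sizeMillis) that contains tsMillis.
  def windowStarts(tsMillis: Long, sizeMillis: Long, periodMillis: Long, offsetMillis: Long = 0L): List[Long] = {
    // Latest window start at or before tsMillis, aligned to the period and offset
    val lastStart = tsMillis - Math.floorMod(tsMillis - offsetMillis, periodMillis)
    // Walk back one period at a time while the window still covers tsMillis
    Iterator.iterate(lastStart)(_ - periodMillis).takeWhile(_ > tsMillis - sizeMillis).toList
  }
}
```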
def withTimestamp(): SCollection[(T, Instant)]

Convert values into pairs of (value, timestamp).
def withWindowFn[W <: BoundedWindow](fn: WindowFn[AnyRef, W], options: WindowOptions[W] = WindowOptions()): SCollection[T]

Window values with the given function.

sealed trait SCollection[T] extends PCollectionWrapper[T]
