PairSkewedSCollectionFunctions

Instance Constructors

new PairSkewedSCollectionFunctions(self: SCollection[(K, V)])

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def hashCode(): Int

Definition Classes
AnyRef → Any
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
val self: SCollection[(K, V)]
def skewedFullOuterJoin[W](rhs: SCollection[(K, W)], hotKeyThreshold: Long, cms: SCollection[CMS[K]])(implicit arg0: Coder[W], koder: Coder[K], voder: Coder[V]): SCollection[(K, (Option[V], Option[W]))]

N to 1 skew-proof flavor of PairSCollectionFunctions.fullOuterJoin.
N to 1 skew-proof flavor of PairSCollectionFunctions.fullOuterJoin.
Perform a skewed full outer join where some keys on the left hand may be hot, i.e.appear more thanhotKeyThreshold times. Frequency of a key is estimated with 1 - delta probability, and the estimate is within eps * N of the true frequency. true frequency <= estimate <= true frequency + eps * N, where N is the total size of the left hand side stream so far.
hotKeyThreshold
key with hotKeyThreshold values will be considered hot. Some runners have inefficient GroupByKey implementation for groups with more than 10K values. Thus it is recommended to set hotKeyThreshold to below 10K, keep upper estimation error in mind.
cms
left hand side key com.twitter.algebird.CMSMonoid
Example:
1. // Implicits that enabling CMS-hashing import com.twitter.algebird.CMSHasherImplicits._ val keyAggregator = CMS.aggregator[K](eps, delta, seed) val hotKeyCMS = self.keys.aggregate(keyAggregator) val p = logs.skewedJoin(logMetadata, hotKeyThreshold = 8500, cms=hotKeyCMS)
  Read more about CMS: com.twitter.algebird.CMSMonoid.
Note
Make sure to import com.twitter.algebird.CMSHasherImplicits before using this join.
def skewedFullOuterJoin[W](rhs: SCollection[(K, W)], hotKeyThreshold: Long = 9000, eps: Double = 0.001, seed: Int = 42, delta: Double = 1e-10, sampleFraction: Double = 1.0, withReplacement: Boolean = true)(implicit arg0: Coder[W], hasher: CMSHasher[K], koder: Coder[K], voder: Coder[V]): SCollection[(K, (Option[V], Option[W]))]

N to 1 skew-proof flavor of PairSCollectionFunctions.fullOuterJoin.
N to 1 skew-proof flavor of PairSCollectionFunctions.fullOuterJoin.
Perform a skewed full join where some keys on the left hand may be hot, i.e. appear more than hotKeyThreshold times. Frequency of a key is estimated with 1 - delta probability, and the estimate is within eps * N of the true frequency. true frequency <= estimate <= true frequency + eps * N, where N is the total size of the left hand side stream so far.
hotKeyThreshold
key with hotKeyThreshold values will be considered hot. Some runners have inefficient GroupByKey implementation for groups with more than 10K values. Thus it is recommended to set hotKeyThreshold to below 10K, keep upper estimation error in mind. If you sample input via sampleFraction make sure to adjust hotKeyThreshold accordingly.
eps
One-sided error bound on the error of each point query, i.e. frequency estimate. Must lie in (0, 1).
seed
A seed to initialize the random number generator used to create the pairwise independent hash functions.
delta
A bound on the probability that a query estimate does not lie within some small interval (an interval that depends on eps) around the truth. Must lie in (0, 1).
sampleFraction
left side sample fraction. Default is 1.0 - no sampling.
withReplacement
whether to use sampling with replacement, see SCollection.sample.
Example:
1. // Implicits that enabling CMS-hashing import com.twitter.algebird.CMSHasherImplicits._ val p = logs.skewedLeftJoin(logMetadata)
  Read more about CMS: com.twitter.algebird.CMSMonoid.
Note
Make sure to import com.twitter.algebird.CMSHasherImplicits before using this join.
def skewedJoin[W](rhs: SCollection[(K, W)], hotKeyThreshold: Long, cms: SCollection[CMS[K]])(implicit arg0: Coder[W], koder: Coder[K], voder: Coder[V]): SCollection[(K, (V, W))]

N to 1 skew-proof flavor of PairSCollectionFunctions.join.
N to 1 skew-proof flavor of PairSCollectionFunctions.join.
Perform a skewed join where some keys on the left hand may be hot, i.e. appear more than hotKeyThreshold times. Frequency of a key is estimated with 1 - delta probability, and the estimate is within eps * N of the true frequency. true frequency <= estimate <= true frequency + eps * N, where N is the total size of the left hand side stream so far.
hotKeyThreshold
key with hotKeyThreshold values will be considered hot. Some runners have inefficient GroupByKey implementation for groups with more than 10K values. Thus it is recommended to set hotKeyThreshold to below 10K, keep upper estimation error in mind.
cms
left hand side key com.twitter.algebird.CMSMonoid
Example:
1. // Implicits that enabling CMS-hashing import com.twitter.algebird.CMSHasherImplicits._ val keyAggregator = CMS.aggregator[K](eps, delta, seed) val hotKeyCMS = self.keys.aggregate(keyAggregator) val p = logs.skewedJoin(logMetadata, hotKeyThreshold = 8500, cms=hotKeyCMS)
  Read more about CMS: com.twitter.algebird.CMSMonoid.
Note
Make sure to import com.twitter.algebird.CMSHasherImplicits before using this join.
def skewedJoin[W](rhs: SCollection[(K, W)], hotKeyThreshold: Long = 9000, eps: Double = 0.001, seed: Int = 42, delta: Double = 1e-10, sampleFraction: Double = 1.0, withReplacement: Boolean = true)(implicit arg0: Coder[W], hasher: CMSHasher[K], koder: Coder[K], voder: Coder[V]): SCollection[(K, (V, W))]

N to 1 skew-proof flavor of PairSCollectionFunctions.join.
N to 1 skew-proof flavor of PairSCollectionFunctions.join.
Perform a skewed join where some keys on the left hand may be hot, i.e. appear more than hotKeyThreshold times. Frequency of a key is estimated with 1 - delta probability, and the estimate is within eps * N of the true frequency. true frequency <= estimate <= true frequency + eps * N, where N is the total size of the left hand side stream so far.
hotKeyThreshold
key with hotKeyThreshold values will be considered hot. Some runners have inefficient GroupByKey implementation for groups with more than 10K values. Thus it is recommended to set hotKeyThreshold to below 10K, keep upper estimation error in mind. If you sample input via sampleFraction make sure to adjust hotKeyThreshold accordingly.
eps
One-sided error bound on the error of each point query, i.e. frequency estimate. Must lie in (0, 1).
seed
A seed to initialize the random number generator used to create the pairwise independent hash functions.
delta
A bound on the probability that a query estimate does not lie within some small interval (an interval that depends on eps) around the truth. Must lie in (0, 1).
sampleFraction
left side sample fraction. Default is 1.0 - no sampling.
withReplacement
whether to use sampling with replacement, see SCollection.sample.
Example:
1. // Implicits that enabling CMS-hashing import com.twitter.algebird.CMSHasherImplicits._ val p = logs.skewedJoin(logMetadata)
  Read more about CMS: com.twitter.algebird.CMSMonoid.
Note
Make sure to import com.twitter.algebird.CMSHasherImplicits before using this join.
def skewedLeftOuterJoin[W](rhs: SCollection[(K, W)], hotKeyThreshold: Long, cms: SCollection[CMS[K]])(implicit arg0: Coder[W], koder: Coder[K], voder: Coder[V]): SCollection[(K, (V, Option[W]))]

N to 1 skew-proof flavor of PairSCollectionFunctions.leftOuterJoin.
N to 1 skew-proof flavor of PairSCollectionFunctions.leftOuterJoin.
Perform a skewed left join where some keys on the left hand may be hot, i.e. appear more than hotKeyThreshold times. Frequency of a key is estimated with 1 - delta probability, and the estimate is within eps * N of the true frequency. true frequency <= estimate <= true frequency + eps * N, where N is the total size of the left hand side stream so far.
hotKeyThreshold
key with hotKeyThreshold values will be considered hot. Some runners have inefficient GroupByKey implementation for groups with more than 10K values. Thus it is recommended to set hotKeyThreshold to below 10K, keep upper estimation error in mind.
cms
left hand side key com.twitter.algebird.CMSMonoid
Example:
1. // Implicits that enabling CMS-hashing import com.twitter.algebird.CMSHasherImplicits._ val keyAggregator = CMS.aggregator[K](eps, delta, seed) val hotKeyCMS = self.keys.aggregate(keyAggregator) val p = logs.skewedJoin(logMetadata, hotKeyThreshold = 8500, cms=hotKeyCMS)
  Read more about CMS: com.twitter.algebird.CMSMonoid.
Note
Make sure to import com.twitter.algebird.CMSHasherImplicits before using this join.
def skewedLeftOuterJoin[W](rhs: SCollection[(K, W)], hotKeyThreshold: Long = 9000, eps: Double = 0.001, seed: Int = 42, delta: Double = 1e-10, sampleFraction: Double = 1.0, withReplacement: Boolean = true)(implicit arg0: Coder[W], hasher: CMSHasher[K], koder: Coder[K], voder: Coder[V]): SCollection[(K, (V, Option[W]))]

N to 1 skew-proof flavor of PairSCollectionFunctions.leftOuterJoin.
N to 1 skew-proof flavor of PairSCollectionFunctions.leftOuterJoin.
Perform a skewed left join where some keys on the left hand may be hot, i.e. appear more than hotKeyThreshold times. Frequency of a key is estimated with 1 - delta probability, and the estimate is within eps * N of the true frequency. true frequency <= estimate <= true frequency + eps * N, where N is the total size of the left hand side stream so far.
hotKeyThreshold
key with hotKeyThreshold values will be considered hot. Some runners have inefficient GroupByKey implementation for groups with more than 10K values. Thus it is recommended to set hotKeyThreshold to below 10K, keep upper estimation error in mind. If you sample input via sampleFraction make sure to adjust hotKeyThreshold accordingly.
eps
One-sided error bound on the error of each point query, i.e. frequency estimate. Must lie in (0, 1).
seed
A seed to initialize the random number generator used to create the pairwise independent hash functions.
delta
A bound on the probability that a query estimate does not lie within some small interval (an interval that depends on eps) around the truth. Must lie in (0, 1).
sampleFraction
left side sample fraction. Default is 1.0 - no sampling.
withReplacement
whether to use sampling with replacement, see SCollection.sample.
Example:
1. // Implicits that enabling CMS-hashing import com.twitter.algebird.CMSHasherImplicits._ val p = logs.skewedLeftJoin(logMetadata)
  Read more about CMS: com.twitter.algebird.CMSMonoid.
Note
Make sure to import com.twitter.algebird.CMSHasherImplicits before using this join.
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

Deprecated Value Members

def skewedLeftJoin[W](rhs: SCollection[(K, W)], hotKeyThreshold: Long, cms: SCollection[CMS[K]])(implicit arg0: Coder[W], koder: Coder[K], voder: Coder[V]): SCollection[(K, (V, Option[W]))]

N to 1 skew-proof flavor of PairSCollectionFunctions.leftOuterJoin.
N to 1 skew-proof flavor of PairSCollectionFunctions.leftOuterJoin.
Perform a skewed left join where some keys on the left hand may be hot, i.e. appear more than hotKeyThreshold times. Frequency of a key is estimated with 1 - delta probability, and the estimate is within eps * N of the true frequency. true frequency <= estimate <= true frequency + eps * N, where N is the total size of the left hand side stream so far.
hotKeyThreshold
key with hotKeyThreshold values will be considered hot. Some runners have inefficient GroupByKey implementation for groups with more than 10K values. Thus it is recommended to set hotKeyThreshold to below 10K, keep upper estimation error in mind.
cms
left hand side key com.twitter.algebird.CMSMonoid
Annotations
@deprecated
Deprecated
(Since version 0.8.0) Use SCollection[(K, V)].skewedLeftOuterJoin(rhs, hotKeyThreshold, cms) instead.
Example:
1. // Implicits that enabling CMS-hashing import com.twitter.algebird.CMSHasherImplicits._ val keyAggregator = CMS.aggregator[K](eps, delta, seed) val hotKeyCMS = self.keys.aggregate(keyAggregator) val p = logs.skewedJoin(logMetadata, hotKeyThreshold = 8500, cms=hotKeyCMS)
  Read more about CMS: com.twitter.algebird.CMSMonoid.
Note
Make sure to import com.twitter.algebird.CMSHasherImplicits before using this join.
def skewedLeftJoin[W](rhs: SCollection[(K, W)], hotKeyThreshold: Long = 9000, eps: Double = 0.001, seed: Int = 42, delta: Double = 1e-10, sampleFraction: Double = 1.0, withReplacement: Boolean = true)(implicit arg0: Coder[W], hasher: CMSHasher[K], koder: Coder[K], voder: Coder[V]): SCollection[(K, (V, Option[W]))]

N to 1 skew-proof flavor of PairSCollectionFunctions.leftOuterJoin.
N to 1 skew-proof flavor of PairSCollectionFunctions.leftOuterJoin.
Perform a skewed left join where some keys on the left hand may be hot, i.e. appear more than hotKeyThreshold times. Frequency of a key is estimated with 1 - delta probability, and the estimate is within eps * N of the true frequency. true frequency <= estimate <= true frequency + eps * N, where N is the total size of the left hand side stream so far.
hotKeyThreshold
key with hotKeyThreshold values will be considered hot. Some runners have inefficient GroupByKey implementation for groups with more than 10K values. Thus it is recommended to set hotKeyThreshold to below 10K, keep upper estimation error in mind. If you sample input via sampleFraction make sure to adjust hotKeyThreshold accordingly.
eps
One-sided error bound on the error of each point query, i.e. frequency estimate. Must lie in (0, 1).
seed
A seed to initialize the random number generator used to create the pairwise independent hash functions.
delta
A bound on the probability that a query estimate does not lie within some small interval (an interval that depends on eps) around the truth. Must lie in (0, 1).
sampleFraction
left side sample fraction. Default is 1.0 - no sampling.
withReplacement
whether to use sampling with replacement, see SCollection.sample.
Annotations
@deprecated
Deprecated
(Since version 0.8.0) Use SCollection[(K, V)].skewedLeftOuterJoin(rhs) instead.
Example:
1. // Implicits that enabling CMS-hashing import com.twitter.algebird.CMSHasherImplicits._ val p = logs.skewedLeftJoin(logMetadata)
  Read more about CMS: com.twitter.algebird.CMSMonoid.
Note
Make sure to import com.twitter.algebird.CMSHasherImplicits before using this join.

Related Doc: package values

class PairSkewedSCollectionFunctions[K, V] extends AnyRef

Instance Constructors

new PairSkewedSCollectionFunctions(self: SCollection[(K, V)])

Value Members

final def !=(arg0: Any): Boolean

final def ##(): Int

final def ==(arg0: Any): Boolean

final def asInstanceOf[T0]: T0

def clone(): AnyRef

final def eq(arg0: AnyRef): Boolean

def equals(arg0: Any): Boolean

def finalize(): Unit

final def getClass(): Class[_]

def hashCode(): Int

final def isInstanceOf[T0]: Boolean

final def ne(arg0: AnyRef): Boolean

final def notify(): Unit

final def notifyAll(): Unit

val self: SCollection[(K, V)]

def skewedFullOuterJoin[W](rhs: SCollection[(K, W)], hotKeyThreshold: Long, cms: SCollection[CMS[K]])(implicit arg0: Coder[W], koder: Coder[K], voder: Coder[V]): SCollection[(K, (Option[V], Option[W]))]

def skewedJoin[W](rhs: SCollection[(K, W)], hotKeyThreshold: Long, cms: SCollection[CMS[K]])(implicit arg0: Coder[W], koder: Coder[K], voder: Coder[V]): SCollection[(K, (V, W))]

def skewedLeftOuterJoin[W](rhs: SCollection[(K, W)], hotKeyThreshold: Long, cms: SCollection[CMS[K]])(implicit arg0: Coder[W], koder: Coder[K], voder: Coder[V]): SCollection[(K, (V, Option[W]))]

final def synchronized[T0](arg0: ⇒ T0): T0

def toString(): String

final def wait(): Unit

final def wait(arg0: Long, arg1: Int): Unit

final def wait(arg0: Long): Unit

Deprecated Value Members

def skewedLeftJoin[W](rhs: SCollection[(K, W)], hotKeyThreshold: Long, cms: SCollection[CMS[K]])(implicit arg0: Coder[W], koder: Coder[K], voder: Coder[V]): SCollection[(K, (V, Option[W]))]

Inherited from AnyRef

Inherited from Any

Join Operations

Ungrouped