PairRDDFunctions

Instance Constructors

new PairRDDFunctions(rdd: RDD[(K, V)])(implicit arg0: ClassTag[K], arg1: ClassTag[V])

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def fullOuterMergeJoin[W](other: RDD[(K, W)])(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (Option[V], Option[W]))]

Perform a full outer join of this and other. For each element (k, v) in this, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for w in other, or the pair (k, (Some(v), None)) if no elements in other have key k. Similarly, for each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for v in this, or the pair (k, (None, Some(w))) if no elements in this have key k. Uses the default Partitioner to partition the output RDD.
During the join, the values of other (the right side) for each key k are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this.
For performance reasons, the RDD with the larger average number of values per distinct key should be this, and the one with fewer should be passed as other; because the values of other are the ones buffered, this reduces the likelihood of a spill.
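The merge-join strategy described above can be illustrated with a minimal in-memory sketch in plain Scala. It is not this library's implementation: it assumes both inputs are already sorted by key, works on local IndexedSeqs rather than RDDs, and never spills to disk; FullOuterMergeJoinSketch and its method are hypothetical names. For each matching key it buffers the right-side values and pairs them with every left-side value, producing the same (Option[V], Option[W]) result shape.

```scala
// Hypothetical in-memory sketch of a sort-merge full outer join (not the library's code).
// Assumes both inputs are already sorted by key; nothing is spilled to disk.
object FullOuterMergeJoinSketch {
  def fullOuterMergeJoin[K, V, W](left: IndexedSeq[(K, V)], right: IndexedSeq[(K, W)])(
      implicit ord: Ordering[K]): Vector[(K, (Option[V], Option[W]))] = {
    val out = Vector.newBuilder[(K, (Option[V], Option[W]))]
    var i = 0
    var j = 0
    while (i < left.length || j < right.length) {
      // Compare the current keys; an exhausted side sorts after everything.
      val cmp =
        if (i >= left.length) 1
        else if (j >= right.length) -1
        else ord.compare(left(i)._1, right(j)._1)
      if (cmp < 0) {
        // Key present only in the left input.
        out += ((left(i)._1, (Some(left(i)._2), None)))
        i += 1
      } else if (cmp > 0) {
        // Key present only in the right input.
        out += ((right(j)._1, (None, Some(right(j)._2))))
        j += 1
      } else {
        val k = left(i)._1
        // Buffer all right-side values for k (the step a real operator may spill).
        val buffered = Vector.newBuilder[W]
        while (j < right.length && ord.equiv(right(j)._1, k)) {
          buffered += right(j)._2
          j += 1
        }
        val ws = buffered.result()
        // Stream the left-side values for k, pairing each one with every buffered value.
        while (i < left.length && ord.equiv(left(i)._1, k)) {
          val v = left(i)._2
          ws.foreach(w => out += ((k, (Some(v), Some(w)))))
          i += 1
        }
      }
    }
    out.result()
  }
}
```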
def fullOuterMergeJoin[W](other: RDD[(K, W)], partitions: Int)(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (Option[V], Option[W]))]

Perform a full outer join of this and other. For each element (k, v) in this, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for w in other, or the pair (k, (Some(v), None)) if no elements in other have key k. Similarly, for each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for v in this, or the pair (k, (None, Some(w))) if no elements in this have key k. Uses org.apache.spark.HashPartitioner to partition the results into the number of partitions given by partitions.
During the join, the values of other (the right side) for each key k are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this.
For performance reasons, the RDD with the larger average number of values per distinct key should be this, and the one with fewer should be passed as other; because the values of other are the ones buffered, this reduces the likelihood of a spill.
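As an illustration of how keys are routed when a partition count is given, the following plain-Scala sketch re-implements the key-to-partition rule of org.apache.spark.HashPartitioner: a null key goes to partition 0, and any other key goes to its hashCode modulo the partition count, shifted to be non-negative. The object and method names are hypothetical and the sketch has no Spark dependency. Because both inputs are routed by the same rule, equal keys from this and other land in the same partition index, which is what lets a join of this kind process each partition independently.

```scala
// Plain-Scala re-implementation of the key-to-partition rule used by
// org.apache.spark.HashPartitioner (for illustration; no Spark dependency).
object HashPartitionSketch {
  // x mod n, shifted into the range [0, n) even when x is negative.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val rawMod = x % mod
    rawMod + (if (rawMod < 0) mod else 0)
  }

  // null keys go to partition 0; every other key is placed by its hashCode.
  def partitionFor(key: Any, numPartitions: Int): Int =
    if (key == null) 0 else nonNegativeMod(key.hashCode, numPartitions)
}
```

For example, with 4 partitions the Int key 7 maps to partition 3, and so does the key -1, because -1 % 4 is -1 and is shifted up by 4.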
def fullOuterMergeJoin[W](other: RDD[(K, W)], partitioner: Partitioner)(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (Option[V], Option[W]))]

Perform a full outer join of this and other. For each element (k, v) in this, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for w in other, or the pair (k, (Some(v), None)) if no elements in other have key k. Similarly, for each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for v in this, or the pair (k, (None, Some(w))) if no elements in this have key k. Uses the given Partitioner to partition the output RDD.
During the join, the values of other (the right side) for each key k are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this.
For performance reasons, the RDD with the larger average number of values per distinct key should be this, and the one with fewer should be passed as other; because the values of other are the ones buffered, this reduces the likelihood of a spill.
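The following plain-Scala sketch shows one plausible overall shape of a partitioned merge join, under the same assumptions as a local illustration (no Spark, no spilling, hypothetical names). Both inputs are routed with the same partition function, standing in for a Partitioner's getPartition; each partition is sorted by key using the Ordering[K]; and matching partitions are then merge-joined pairwise. For brevity it performs an inner join; the outer variants differ only in how unmatched keys are emitted.

```scala
// Hypothetical sketch of a partitioned sort-merge join (not the library's code).
object PartitionedMergeJoinSketch {
  // Inner merge join of two key-sorted sequences, buffering the right-side values per key.
  def mergeSorted[K, V, W](left: IndexedSeq[(K, V)], right: IndexedSeq[(K, W)])(
      implicit ord: Ordering[K]): Vector[(K, (V, W))] = {
    val out = Vector.newBuilder[(K, (V, W))]
    var i = 0
    var j = 0
    while (i < left.length && j < right.length) {
      val cmp = ord.compare(left(i)._1, right(j)._1)
      if (cmp < 0) i += 1      // left key has no match
      else if (cmp > 0) j += 1 // right key has no match
      else {
        val k = left(i)._1
        val start = j
        while (j < right.length && ord.equiv(right(j)._1, k)) j += 1
        val ws = right.slice(start, j).map(_._2) // buffered right-side values for k
        while (i < left.length && ord.equiv(left(i)._1, k)) {
          val v = left(i)._2
          ws.foreach(w => out += ((k, (v, w))))
          i += 1
        }
      }
    }
    out.result()
  }

  // Route both inputs with the same partition function, sort each partition by key,
  // then merge-join matching partitions; equal keys always share a partition index.
  def join[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)], numPartitions: Int)(
      partitionOf: K => Int)(implicit ord: Ordering[K]): Vector[Vector[(K, (V, W))]] = {
    def split[X](xs: Seq[(K, X)]): Vector[IndexedSeq[(K, X)]] =
      Vector.tabulate(numPartitions) { p =>
        xs.filter { case (k, _) => partitionOf(k) == p }.sortBy(_._1).toIndexedSeq
      }
    val ls = split(left)
    val rs = split(right)
    Vector.tabulate(numPartitions)(p => mergeSorted(ls(p), rs(p)))
  }
}
```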
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def hashCode(): Int

Definition Classes
AnyRef → Any
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def leftOuterMergeJoin[W](other: RDD[(K, W)])(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (V, Option[W]))]

Perform a left outer join of this and other. For each element (k, v) in this, the resulting RDD will either contain all pairs (k, (v, Some(w))) for w in other, or the pair (k, (v, None)) if no elements in other have key k. Uses the default Partitioner to partition the results.
During the join, the values of other (the right side) for each key k are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this. Values of this are never accumulated; only the values of other are.
For performance reasons, consider using other.rightOuterMergeJoin(this) if other is the larger of the two RDDs being joined, so that the smaller RDD becomes the buffered side. The result then has value tuples of type (Option[W], V) instead of (V, Option[W]).
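To make the buffering behavior concrete, here is a minimal streaming sketch in plain Scala; the names are hypothetical and it is not the library's code. It assumes both iterators are already sorted by key. Values from the left side (this) are consumed one at a time, while only the right-side (other) values for the current key are held in a buffer, which is the part a real implementation could spill to disk. No spilling is modeled here.

```scala
// Hypothetical streaming sketch of a left outer sort-merge join (not the library's code).
// Assumes both iterators are sorted by key; the right-side buffer is never spilled.
object LeftOuterMergeJoinSketch {
  def leftOuterMergeJoin[K, V, W](left: Iterator[(K, V)], right: Iterator[(K, W)])(
      implicit ord: Ordering[K]): Iterator[(K, (V, Option[W]))] = {
    val r = right.buffered
    var bufferedKey: Option[K] = None // key whose right-side values are in `buffer`
    var buffer: Vector[W] = Vector.empty
    left.flatMap { case (k, v) =>
      if (!bufferedKey.exists(ord.equiv(_, k))) {
        // New left key: skip smaller right-side keys, then buffer all values for k.
        while (r.hasNext && ord.lt(r.head._1, k)) r.next()
        val b = Vector.newBuilder[W]
        while (r.hasNext && ord.equiv(r.head._1, k)) b += r.next()._2
        buffer = b.result()
        bufferedKey = Some(k)
      }
      // Emit one row per buffered match, or a single (v, None) row when there is none.
      val rows: Iterator[(K, (V, Option[W]))] =
        if (buffer.isEmpty) Iterator.single((k, (v, None)))
        else buffer.iterator.map(w => (k, (v, Some(w))))
      rows
    }
  }
}
```

Repeated left keys reuse the existing buffer, so each right-side value is read from the input exactly once.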
def leftOuterMergeJoin[W](other: RDD[(K, W)], partitions: Int)(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (V, Option[W]))]

Perform a left outer join of this and other. For each element (k, v) in this, the resulting RDD will either contain all pairs (k, (v, Some(w))) for w in other, or the pair (k, (v, None)) if no elements in other have key k. Uses org.apache.spark.HashPartitioner to partition the results into the number of partitions given by partitions.
During the join, the values of other (the right side) for each key k are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this. Values of this are never accumulated; only the values of other are.
For performance reasons, consider using other.rightOuterMergeJoin(this) if other is the larger of the two RDDs being joined, so that the smaller RDD becomes the buffered side. The result then has value tuples of type (Option[W], V) instead of (V, Option[W]).
def leftOuterMergeJoin[W](other: RDD[(K, W)], partitioner: Partitioner)(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (V, Option[W]))]

Perform a left outer join of this and other. For each element (k, v) in this, the resulting RDD will either contain all pairs (k, (v, Some(w))) for w in other, or the pair (k, (v, None)) if no elements in other have key k. Uses the given Partitioner to partition the output RDD.
During the join, the values of other (the right side) for each key k are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this. Values of this are never accumulated; only the values of other are.
For performance reasons, consider using other.rightOuterMergeJoin(this) if other is the larger of the two RDDs being joined, so that the smaller RDD becomes the buffered side. The result then has value tuples of type (Option[W], V) instead of (V, Option[W]).
def mergeJoin[W](other: RDD[(K, W)])(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (V, W))]

Return an RDD containing all pairs of elements with matching keys in this and other. Each pair of elements will be returned as a (k, (v1, v2)) tuple, where (k, v1) is in this and (k, v2) is in other. Uses the default Partitioner to partition the results.
During the join, the values of other (the right side) for each key k are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this. Values of this are never accumulated; only the values of other are.
For performance reasons, the RDD with the larger average number of values per distinct key should be this, and the one with fewer should be passed as other; because only the values of other are buffered, this reduces the likelihood of a spill.
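The side-selection guidance above could be applied with a simple heuristic such as the following plain-Scala sketch, which compares the average number of values per distinct key on each side. The helper names are hypothetical and not part of this library. On real RDDs the same figures could be obtained with rdd.count() and rdd.keys.distinct().count(), at the cost of running extra Spark jobs.

```scala
// Hypothetical helper for choosing which side should be `this` (not part of the library).
object JoinSideHeuristicSketch {
  // Average number of values per distinct key (0.0 for an empty input).
  def avgValuesPerKey[K, V](pairs: Seq[(K, V)]): Double = {
    val distinctKeys = pairs.map(_._1).distinct.size
    if (distinctKeys == 0) 0.0 else pairs.size.toDouble / distinctKeys
  }

  // True if `a` should be the receiver (this) and `b` the argument (other):
  // the side with more values per key goes first, so the buffered side has fewer.
  def firstShouldBeThis[K, V, W](a: Seq[(K, V)], b: Seq[(K, W)]): Boolean =
    avgValuesPerKey(a) >= avgValuesPerKey(b)
}
```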
def mergeJoin[W](other: RDD[(K, W)], partitions: Int)(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (V, W))]

Return an RDD containing all pairs of elements with matching keys in this and other. Each pair of elements will be returned as a (k, (v1, v2)) tuple, where (k, v1) is in this and (k, v2) is in other. Uses org.apache.spark.HashPartitioner to partition the results into the number of partitions given by partitions.
During the join, the values of other (the right side) for each key k are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this. Values of this are never accumulated; only the values of other are.
For performance reasons, the RDD with the larger average number of values per distinct key should be this, and the one with fewer should be passed as other; because only the values of other are buffered, this reduces the likelihood of a spill.
def mergeJoin[W](other: RDD[(K, W)], partitioner: Partitioner)(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (V, W))]

Return an RDD containing all pairs of elements with matching keys in this and other. Each pair of elements will be returned as a (k, (v1, v2)) tuple, where (k, v1) is in this and (k, v2) is in other. Uses the given Partitioner to partition the output RDD.
During the join, the values of other (the right side) for each key k are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this. Values of this are never accumulated; only the values of other are.
For performance reasons, the RDD with the larger average number of values per distinct key should be this, and the one with fewer should be passed as other; because only the values of other are buffered, this reduces the likelihood of a spill.
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def rightOuterMergeJoin[W](other: RDD[(K, W)])(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (Option[V], W))]

Perform a right outer join of this and other. For each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), w)) for v in this, or the pair (k, (None, w)) if no elements in this have key k. Uses the default Partitioner to partition the results.
During the join, the values of other (the right side) for each key k are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this. Values of this are never accumulated; only the values of other are.
For performance reasons, consider using other.leftOuterMergeJoin(this) if other is the larger of the two RDDs being joined, so that the smaller RDD becomes the buffered side. The result then has value tuples of type (W, Option[V]) instead of (Option[V], W).
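The suggested rewrite relies on the right and left outer joins being mirror images of each other. The plain-Scala sketch below (hypothetical names, local data already sorted by key) checks that the right outer join of a with b contains the same pairs as the left outer join of b with a once each value tuple is swapped; only the order of rows within a key may differ. For brevity it groups both sides per key, so unlike the operators described here it buffers values from both inputs.

```scala
// Hypothetical sketch (not the library's code): outer joins built from per-key groups
// of two key-sorted inputs. Both sides are buffered here purely for brevity.
object OuterJoinSymmetrySketch {
  // Merge two key-sorted sequences into (key, leftValues, rightValues) groups.
  def mergeGroups[K, V, W](left: IndexedSeq[(K, V)], right: IndexedSeq[(K, W)])(
      implicit ord: Ordering[K]): Vector[(K, Vector[V], Vector[W])] = {
    val out = Vector.newBuilder[(K, Vector[V], Vector[W])]
    var i = 0
    var j = 0
    while (i < left.length || j < right.length) {
      // Smallest key still pending on either side.
      val k =
        if (j >= right.length) left(i)._1
        else if (i >= left.length) right(j)._1
        else ord.min(left(i)._1, right(j)._1)
      val vs = Vector.newBuilder[V]
      while (i < left.length && ord.equiv(left(i)._1, k)) { vs += left(i)._2; i += 1 }
      val ws = Vector.newBuilder[W]
      while (j < right.length && ord.equiv(right(j)._1, k)) { ws += right(j)._2; j += 1 }
      out += ((k, vs.result(), ws.result()))
    }
    out.result()
  }

  // Keeps every right-side pair; unmatched ones get None on the left.
  def rightOuterMergeJoin[K, V, W](left: IndexedSeq[(K, V)], right: IndexedSeq[(K, W)])(
      implicit ord: Ordering[K]): Vector[(K, (Option[V], W))] =
    mergeGroups(left, right).flatMap { case (k, vs, ws) =>
      if (vs.isEmpty) ws.map(w => (k, (None: Option[V], w)))
      else for (v <- vs; w <- ws) yield (k, (Some(v): Option[V], w))
    }

  // Keeps every left-side pair; unmatched ones get None on the right.
  def leftOuterMergeJoin[K, V, W](left: IndexedSeq[(K, V)], right: IndexedSeq[(K, W)])(
      implicit ord: Ordering[K]): Vector[(K, (V, Option[W]))] =
    mergeGroups(left, right).flatMap { case (k, vs, ws) =>
      if (ws.isEmpty) vs.map(v => (k, (v, None: Option[W])))
      else for (v <- vs; w <- ws) yield (k, (v, Some(w): Option[W]))
    }
}
```

Swapping each value tuple of OuterJoinSymmetrySketch.leftOuterMergeJoin(b, a) with map { case (k, (w, ov)) => (k, (ov, w)) } yields the same multiset of rows as OuterJoinSymmetrySketch.rightOuterMergeJoin(a, b).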
def rightOuterMergeJoin[W](other: RDD[(K, W)], partitions: Int)(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (Option[V], W))]

Perform a right outer join of this and other. For each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), w)) for v in this, or the pair (k, (None, w)) if no elements in this have key k. Uses org.apache.spark.HashPartitioner to partition the results into the number of partitions given by partitions.
During the join, the values of other (the right side) for each key k are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this. Values of this are never accumulated; only the values of other are.
For performance reasons, consider using other.leftOuterMergeJoin(this) if other is the larger of the two RDDs being joined, so that the smaller RDD becomes the buffered side. The result then has value tuples of type (W, Option[V]) instead of (Option[V], W).
def rightOuterMergeJoin[W](other: RDD[(K, W)], partitioner: Partitioner)(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (Option[V], W))]

Perform a right outer join of this and other. For each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), w)) for v in this, or the pair (k, (None, w)) if no elements in this have key k. Uses the given Partitioner to partition the output RDD.
During the join, the values of other (the right side) for each key k are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this. Values of this are never accumulated; only the values of other are.
For performance reasons, consider using other.leftOuterMergeJoin(this) if other is the larger of the two RDDs being joined, so that the smaller RDD becomes the buffered side. The result then has value tuples of type (W, Option[V]) instead of (Option[V], W).
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

Related Doc: package rdd

class PairRDDFunctions[K, V] extends Serializable
