Class

com.hindog.spark.rdd

PairRDDFunctions

Related Doc: package rdd

class PairRDDFunctions[K, V] extends Serializable

Merge-join operators that provide scalable equivalents to the existing Spark RDD join, leftOuterJoin, rightOuterJoin, and fullOuterJoin operators.

Refer to the documentation for MergeJoin for implementation details.

Linear Supertypes
Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new PairRDDFunctions(rdd: RDD[(K, V)])(implicit arg0: ClassTag[K], arg1: ClassTag[V])

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. def fullOuterMergeJoin[W](other: RDD[(K, W)])(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (Option[V], Option[W]))]

    Perform a full outer join of this and other. For each element (k, v) in this, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for w in other, or the pair (k, (Some(v), None)) if no elements in other have key k. Similarly, for each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for v in this, or the pair (k, (None, Some(w))) if no elements in this have key k. Uses the default Partitioner to partition the output RDD.

    During the join, the values for any given key k on the right side (other) are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this.

    For performance reasons, the side of the join that has, on average, the larger number of values per unique key should be this, joined against other, to reduce the likelihood that the values accumulated for other spill to disk.
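
    The full outer join semantics described above can be illustrated on plain Scala collections. This is a minimal sketch of the result shape only, not the library's merge-join implementation; the object and method names (FullOuterSketch, fullOuter) are hypothetical, and keys are sorted solely to make the output deterministic.

```scala
object FullOuterSketch {
  /** Reference semantics of a full outer join on local sequences. */
  def fullOuter[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)])
                        (implicit ord: Ordering[K]): Seq[(K, (Option[V], Option[W]))] = {
    // Group the values of each side by key.
    val vs: Map[K, Seq[V]] = left.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    val ws: Map[K, Seq[W]] = right.groupBy(_._1).map { case (k, kws) => k -> kws.map(_._2) }
    // Every key that occurs on either side appears in the output.
    val keys = (vs.keySet ++ ws.keySet).toSeq.sorted
    keys.flatMap { k =>
      // A side that lacks the key contributes a single None.
      val lv: Seq[Option[V]] = vs.get(k).map(_.map(v => Some(v): Option[V])).getOrElse(Seq(None))
      val rw: Seq[Option[W]] = ws.get(k).map(_.map(w => Some(w): Option[W])).getOrElse(Seq(None))
      // Cross product of the two sides for this key.
      for (v <- lv; w <- rw) yield (k, (v, w))
    }
  }
}
```

    Because each key comes from the union of both key sets, at least one side always has a value, so the pair (None, None) never appears.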

  10. def fullOuterMergeJoin[W](other: RDD[(K, W)], partitions: Int)(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (Option[V], Option[W]))]

    Perform a full outer join of this and other. For each element (k, v) in this, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for w in other, or the pair (k, (Some(v), None)) if no elements in other have key k. Similarly, for each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for v in this, or the pair (k, (None, Some(w))) if no elements in this have key k. Uses org.apache.spark.HashPartitioner to partition the results into {partitions} partitions.

    During the join, the values for any given key k on the right side (other) are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this.

    For performance reasons, the side of the join that has, on average, the larger number of values per unique key should be this, joined against other, to reduce the likelihood that the values accumulated for other spill to disk.

  11. def fullOuterMergeJoin[W](other: RDD[(K, W)], partitioner: Partitioner)(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (Option[V], Option[W]))]

    Perform a full outer join of this and other. For each element (k, v) in this, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for w in other, or the pair (k, (Some(v), None)) if no elements in other have key k. Similarly, for each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for v in this, or the pair (k, (None, Some(w))) if no elements in this have key k. Uses the given Partitioner to partition the output RDD.

    During the join, the values for any given key k on the right side (other) are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this.

    For performance reasons, the side of the join that has, on average, the larger number of values per unique key should be this, joined against other, to reduce the likelihood that the values accumulated for other spill to disk.

  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  14. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  15. def leftOuterMergeJoin[W](other: RDD[(K, W)])(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (V, Option[W]))]

    Perform a left outer join of this and other. For each element (k, v) in this, the resulting RDD will either contain all pairs (k, (v, Some(w))) for w in other, or the pair (k, (v, None)) if no elements in other have key k. Uses the default Partitioner to partition the results.

    During the join, the values for any given key k on the right side (other) are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this. Values are accumulated only for other, never for this.

    For performance reasons, consider using other.rightOuterMergeJoin(this) if other is the larger of the two RDDs being joined.
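
    As an illustration of the left outer result shape, here is a small sketch on local Scala collections. It assumes nothing about the library's internals; LeftOuterSketch and leftOuter are hypothetical names, and the grouping-based lookup stands in for the real merge join.

```scala
object LeftOuterSketch {
  /** Reference semantics of a left outer join on local sequences. */
  def leftOuter[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, Option[W]))] = {
    // Group the right-side values by key.
    val byKey: Map[K, Seq[W]] = right.groupBy(_._1).map { case (k, kws) => k -> kws.map(_._2) }
    left.flatMap { case (k, v) =>
      val ws: Seq[W] = byKey.getOrElse(k, Seq.empty[W])
      // No match on the right yields a single None; otherwise one Some per matching value.
      val joined: Seq[Option[W]] =
        if (ws.isEmpty) Seq(None) else ws.map(w => Some(w): Option[W])
      joined.map(w => (k, (v, w)))
    }
  }
}
```

    Every element of the left side appears at least once in the output, while right-side keys with no match on the left are dropped.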

  16. def leftOuterMergeJoin[W](other: RDD[(K, W)], partitions: Int)(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (V, Option[W]))]

    Perform a left outer join of this and other. For each element (k, v) in this, the resulting RDD will either contain all pairs (k, (v, Some(w))) for w in other, or the pair (k, (v, None)) if no elements in other have key k. Uses org.apache.spark.HashPartitioner to partition the results into {partitions} partitions.

    During the join, the values for any given key k on the right side (other) are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this. Values are accumulated only for other, never for this.

    For performance reasons, consider using other.rightOuterMergeJoin(this) if other is the larger of the two RDDs being joined.

  17. def leftOuterMergeJoin[W](other: RDD[(K, W)], partitioner: Partitioner)(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (V, Option[W]))]

    Perform a left outer join of this and other. For each element (k, v) in this, the resulting RDD will either contain all pairs (k, (v, Some(w))) for w in other, or the pair (k, (v, None)) if no elements in other have key k. Uses the given Partitioner to partition the output RDD.

    During the join, the values for any given key k on the right side (other) are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this. Values are accumulated only for other, never for this.

    For performance reasons, consider using other.rightOuterMergeJoin(this) if other is the larger of the two RDDs being joined.

  18. def mergeJoin[W](other: RDD[(K, W)])(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (V, W))]

    Return an RDD containing all pairs of elements with matching keys in this and other. Each pair of elements will be returned as a (k, (v1, v2)) tuple, where (k, v1) is in this and (k, v2) is in other. Uses the default Partitioner to partition the results.

    During the join, the values for any given key k on the right side (other) are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this. Values are accumulated only for other, never for this.

    For performance reasons, the side of the join that has, on average, the larger number of values per unique key should be this, joined against other, to reduce the likelihood that the values accumulated for other spill to disk.
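
    The buffering behaviour described above, where only the right side's values for the current key are held while the left side is streamed, can be sketched with a sort-merge join over two key-sorted local sequences. This is an assumption-laden illustration rather than the library's code: MergeJoinSketch is a hypothetical name, the inputs are assumed to be pre-sorted by key, and an in-memory ArrayBuffer stands in for the buffer that the real operator may spill to disk.

```scala
import scala.collection.mutable.ArrayBuffer

object MergeJoinSketch {
  /** Inner merge join of two sequences that are already sorted by key. */
  def mergeJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)])
                        (implicit ord: Ordering[K]): List[(K, (V, W))] = {
    val l = left.iterator.buffered
    val r = right.iterator.buffered
    val out = List.newBuilder[(K, (V, W))]
    while (l.hasNext && r.hasNext) {
      val c = ord.compare(l.head._1, r.head._1)
      if (c < 0) l.next()        // key only on the left: skip it
      else if (c > 0) r.next()   // key only on the right: skip it
      else {
        val k = l.head._1
        // Buffer every right-side value for k.
        val ws = ArrayBuffer.empty[W]
        while (r.hasNext && ord.equiv(r.head._1, k)) ws += r.next()._2
        // Stream the left-side values for k, pairing each with the buffered values.
        while (l.hasNext && ord.equiv(l.head._1, k)) {
          val v = l.next()._2
          ws.foreach(w => out += ((k, (v, w))))
        }
      }
    }
    out.result()
  }
}
```

    The buffer grows with the number of right-side values per key, which is why the side with fewer values per key is the better choice for other.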

  19. def mergeJoin[W](other: RDD[(K, W)], partitions: Int)(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (V, W))]

    Return an RDD containing all pairs of elements with matching keys in this and other. Each pair of elements will be returned as a (k, (v1, v2)) tuple, where (k, v1) is in this and (k, v2) is in other. Uses org.apache.spark.HashPartitioner to partition the results into {partitions} partitions.

    During the join, the values for any given key k on the right side (other) are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this. Values are accumulated only for other, never for this.

    For performance reasons, the side of the join that has, on average, the larger number of values per unique key should be this, joined against other, to reduce the likelihood that the values accumulated for other spill to disk.
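
    For readers unfamiliar with hash partitioning, the following sketch shows how a key is typically mapped to one of {partitions} partitions: the key's hash code is reduced with a non-negative modulus. It mirrors the general scheme of org.apache.spark.HashPartitioner but is a standalone illustration; HashPartitionSketch and its methods are hypothetical names and do not depend on Spark.

```scala
object HashPartitionSketch {
  /** Modulus that never returns a negative result, even for negative hash codes. */
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  /** Partition index in [0, partitions) for a key; null keys go to partition 0. */
  def partitionFor(key: Any, partitions: Int): Int =
    if (key == null) 0 else nonNegativeMod(key.hashCode, partitions)
}
```

    Equal keys always land in the same partition, so matching keys from this and other can be joined within a single partition.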

  20. def mergeJoin[W](other: RDD[(K, W)], partitioner: Partitioner)(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (V, W))]

    Return an RDD containing all pairs of elements with matching keys in this and other. Each pair of elements will be returned as a (k, (v1, v2)) tuple, where (k, v1) is in this and (k, v2) is in other. Uses the given Partitioner to partition the output RDD.

    During the join, the values for any given key k on the right side (other) are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this.

    For performance reasons, the side of the join that has, on average, the larger number of values per unique key should be this, joined against other, to reduce the likelihood that the values accumulated for other spill to disk.

  21. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  22. final def notify(): Unit

    Definition Classes
    AnyRef
  23. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  24. def rightOuterMergeJoin[W](other: RDD[(K, W)])(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (Option[V], W))]

    Perform a right outer join of this and other. For each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), w)) for v in this, or the pair (k, (None, w)) if no elements in this have key k. Uses the default Partitioner to partition the results.

    During the join, the values for any given key k on the right side (other) are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this. Values are accumulated only for other, never for this.

    For performance reasons, consider using other.leftOuterMergeJoin(this) if other is the larger of the two RDDs being joined.
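
    Swapping the sides as suggested above changes the shape of the values: per the signatures, other.leftOuterMergeJoin(this) yields (k, (w, Option[v])) rather than (k, (Option[v], w)), so the value tuple has to be flipped to recover the right outer form. The sketch below demonstrates that equivalence on local collections; RightOuterSketch and its methods are hypothetical names, not part of the library.

```scala
object RightOuterSketch {
  /** Reference semantics of a left outer join on local sequences. */
  def leftOuter[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, Option[W]))] =
    left.flatMap { case (k, v) =>
      // All right-side values whose key equals k.
      val ws = right.collect { case (rk, w) if rk == k => w }
      if (ws.isEmpty) Seq((k, (v, Option.empty[W])))
      else ws.map(w => (k, (v, Some(w): Option[W])))
    }

  /** Right outer join expressed as a left outer join with the sides swapped. */
  def rightOuter[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (Option[V], W))] =
    leftOuter(right, left).map { case (k, (w, v)) => (k, (v, w)) } // flip the value tuple back
}
```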

  25. def rightOuterMergeJoin[W](other: RDD[(K, W)], partitions: Int)(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (Option[V], W))]

    Perform a right outer join of this and other. For each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), w)) for v in this, or the pair (k, (None, w)) if no elements in this have key k. Uses org.apache.spark.HashPartitioner to partition the results into {partitions} partitions.

    During the join, the values for any given key k on the right side (other) are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this. Values are accumulated only for other, never for this.

    For performance reasons, consider using other.leftOuterMergeJoin(this) if other is the larger of the two RDDs being joined.

  26. def rightOuterMergeJoin[W](other: RDD[(K, W)], partitioner: Partitioner)(implicit arg0: ClassTag[W], ord: Ordering[K]): RDD[(K, (Option[V], W))]

    Perform a right outer join of this and other. For each element (k, w) in other, the resulting RDD will either contain all pairs (k, (Some(v), w)) for v in this, or the pair (k, (None, w)) if no elements in this have key k. Uses the given Partitioner to partition the output RDD.

    During the join, the values for any given key k on the right side (other) are accumulated in memory, and spilled to disk if necessary, so that they can be iterated over for every value of k in this. Values are accumulated only for other, never for this.

    For performance reasons, consider using other.leftOuterMergeJoin(this) if other is the larger of the two RDDs being joined.

  27. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  28. def toString(): String

    Definition Classes
    AnyRef → Any
  29. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  30. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  31. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
