Class

com.hindog.spark.rdd

MergeJoinRDD


class MergeJoinRDD[K, V, W, Out] extends RDD[Out]

:: @DeveloperApi ::

An RDD implementation of merge-join. It shuffles its inputs to partition them and sort them by key, using the implicit Ordering for K, and then delegates to an instance of MergeJoin to perform the actual merge logic.
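To make the merge step concrete, here is a minimal sketch of an inner merge join over two iterators that are already sorted by key. It is plain Scala with no Spark dependency, and it is not the library's MergeJoin: the object name MergeJoinSketch and the method innerJoin are hypothetical, and buffering each key group in memory is a simplifying assumption.

```scala
import scala.collection.BufferedIterator

// Hypothetical illustration only; not the MergeJoin class from this library.
object MergeJoinSketch {

  /** Inner merge join over two iterators that are each sorted by key. */
  def innerJoin[K, V, W](left: Iterator[(K, V)], right: Iterator[(K, W)])
                        (implicit ord: Ordering[K]): Iterator[(K, (V, W))] = {
    val l = left.buffered
    val r = right.buffered

    // Consume every consecutive record that carries the given key.
    def drain[A](it: BufferedIterator[(K, A)], key: K): Vector[A] = {
      val group = Vector.newBuilder[A]
      while (it.hasNext && ord.equiv(it.head._1, key)) group += it.next()._2
      group.result()
    }

    new Iterator[(K, (V, W))] {
      // Joined pairs for the key group currently being emitted.
      private var pending: Iterator[(K, (V, W))] = Iterator.empty

      // Move both cursors forward until a matching key group is found
      // or one of the inputs is exhausted.
      private def advance(): Unit =
        while (!pending.hasNext && l.hasNext && r.hasNext) {
          val cmp = ord.compare(l.head._1, r.head._1)
          if (cmp < 0) l.next()        // left key is smaller: skip it
          else if (cmp > 0) r.next()   // right key is smaller: skip it
          else {
            val key = l.head._1
            val vs = drain(l, key)
            val ws = drain(r, key)
            // Cross product of the two groups that share this key.
            pending = for (v <- vs.iterator; w <- ws.iterator) yield (key, (v, w))
          }
        }

      def hasNext: Boolean = { advance(); pending.hasNext }
      def next(): (K, (V, W)) = { advance(); pending.next() }
    }
  }
}
```

Because both inputs are sorted, each is read once from front to back, which is why the sort by key has to happen before the merge.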

As an optimization, the shuffle is skipped in some cases where left or right is already guaranteed to be partition-sorted (i.e., via repartitionAndSortWithinPartitions).
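As a rough picture of what "partition-sorted" means, the sketch below simulates the effect locally in plain Scala: records are assigned to buckets by a non-negative modulo of the key's hash code, and each bucket is then sorted by key. The names PartitionSortSketch and partitionAndSort are hypothetical, and the hash-based bucketing is an assumption made for illustration; in Spark the equivalent layout would come from repartitionAndSortWithinPartitions with whichever Partitioner is in use.

```scala
// Hypothetical local simulation; it does not use Spark or this library.
object PartitionSortSketch {

  /** Assign each record to one of `numPartitions` buckets by key hash,
    * then sort every bucket by key.
    */
  def partitionAndSort[K, V](records: Seq[(K, V)], numPartitions: Int)
                            (implicit ord: Ordering[K]): Vector[Vector[(K, V)]] = {
    require(numPartitions > 0, "numPartitions must be positive")

    // Non-negative modulo, so negative hash codes still map to a valid bucket.
    def bucket(key: K): Int = {
      val m = key.hashCode % numPartitions
      if (m < 0) m + numPartitions else m
    }

    val grouped = records.groupBy { case (k, _) => bucket(k) }

    // Emit one sorted vector per bucket index, including empty buckets.
    Vector.tabulate(numPartitions) { i =>
      grouped.getOrElse(i, Seq.empty[(K, V)]).sortBy(_._1).toVector
    }
  }
}
```

When two inputs are laid out this way with the same bucketing and the same key ordering, bucket i of one input can be merged directly with bucket i of the other, so no further shuffle is needed for that input.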

Annotations
@DeveloperApi()
Linear Supertypes
RDD[Out], Logging, scala.Serializable, java.io.Serializable, AnyRef, Any

Instance Constructors

  1. new MergeJoinRDD(left: RDD[(K, V)], right: RDD[(K, W)], partitionJoiner: (MergeJoinPartition[K, V, W], TaskContext) ⇒ Joiner[K, V, W, Out], part: Partitioner, serializer: Option[Serializer] = None)(implicit arg0: ClassTag[K], arg1: ClassTag[V], arg2: ClassTag[W], arg3: ClassTag[Out], ord: Ordering[K])


    left

    The left RDD to be used in the join

    right

    The right RDD to be used in the join

    partitionJoiner

    A function that creates the Joiner implementation used to perform the join

    part

    The partitioner to use

    serializer

    The serializer to use; if None, the default serializer is used

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. def ++(other: RDD[Out]): RDD[Out]

    Definition Classes
    RDD
  4. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  5. def aggregate[U](zeroValue: U)(seqOp: (U, Out) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): U

    Definition Classes
    RDD
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def cache(): MergeJoinRDD.this.type

    Definition Classes
    RDD
  8. def cartesian[U](other: RDD[U])(implicit arg0: ClassTag[U]): RDD[(Out, U)]

    Definition Classes
    RDD
  9. def checkpoint(): Unit

    Definition Classes
    RDD
  10. def clearDependencies(): Unit

    Attributes
    protected
    Definition Classes
    RDD
  11. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  12. def coalesce(numPartitions: Int, shuffle: Boolean)(implicit ord: Ordering[Out]): RDD[Out]

    Definition Classes
    RDD
  13. def collect[U](f: PartialFunction[Out, U])(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  14. def collect(): Array[Out]

    Definition Classes
    RDD
  15. def compute(part: Partition, context: TaskContext): Iterator[Out]

    Definition Classes
    MergeJoinRDD → RDD
    Annotations
    @DeveloperApi()
  16. def context: SparkContext

    Definition Classes
    RDD
  17. def count(): Long

    Definition Classes
    RDD
  18. def countApprox(timeout: Long, confidence: Double): PartialResult[BoundedDouble]

    Definition Classes
    RDD
    Annotations
    @Experimental()
  19. def countApproxDistinct(relativeSD: Double): Long

    Definition Classes
    RDD
  20. def countApproxDistinct(p: Int, sp: Int): Long

    Definition Classes
    RDD
    Annotations
    @Experimental()
  21. def countByValue()(implicit ord: Ordering[Out]): Map[Out, Long]

    Definition Classes
    RDD
  22. def countByValueApprox(timeout: Long, confidence: Double)(implicit ord: Ordering[Out]): PartialResult[Map[Out, BoundedDouble]]

    Definition Classes
    RDD
    Annotations
    @Experimental()
  23. final def dependencies: Seq[Dependency[_]]

    Definition Classes
    RDD
  24. def distinct(): RDD[Out]

    Definition Classes
    RDD
  25. def distinct(numPartitions: Int)(implicit ord: Ordering[Out]): RDD[Out]

    Definition Classes
    RDD
  26. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  27. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  28. def filter(f: (Out) ⇒ Boolean): RDD[Out]

    Definition Classes
    RDD
  29. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  30. def first(): Out

    Definition Classes
    RDD
  31. def firstParent[U](implicit arg0: ClassTag[U]): RDD[U]

    Attributes
    protected[org.apache.spark]
    Definition Classes
    RDD
  32. def flatMap[U](f: (Out) ⇒ TraversableOnce[U])(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  33. def fold(zeroValue: Out)(op: (Out, Out) ⇒ Out): Out

    Definition Classes
    RDD
  34. def foreach(f: (Out) ⇒ Unit): Unit

    Definition Classes
    RDD
  35. def foreachPartition(f: (Iterator[Out]) ⇒ Unit): Unit

    Definition Classes
    RDD
  36. def getCheckpointFile: Option[String]

    Definition Classes
    RDD
  37. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  38. def getDependencies: Seq[Dependency[_]]

    Attributes
    protected
    Definition Classes
    MergeJoinRDD → RDD
  39. def getPartitions: Array[Partition]

    Attributes
    protected
    Definition Classes
    MergeJoinRDD → RDD
  40. def getPreferredLocations(split: Partition): Seq[String]

    Attributes
    protected
    Definition Classes
    RDD
  41. def getStorageLevel: StorageLevel

    Definition Classes
    RDD
  42. def glom(): RDD[Array[Out]]

    Definition Classes
    RDD
  43. def groupBy[K](f: (Out) ⇒ K, p: Partitioner)(implicit kt: ClassTag[K], ord: Ordering[K]): RDD[(K, Iterable[Out])]

    Definition Classes
    RDD
  44. def groupBy[K](f: (Out) ⇒ K, numPartitions: Int)(implicit kt: ClassTag[K]): RDD[(K, Iterable[Out])]

    Definition Classes
    RDD
  45. def groupBy[K](f: (Out) ⇒ K)(implicit kt: ClassTag[K]): RDD[(K, Iterable[Out])]

    Definition Classes
    RDD
  46. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  47. val id: Int

    Definition Classes
    RDD
  48. def intersection(other: RDD[Out], numPartitions: Int): RDD[Out]

    Definition Classes
    RDD
  49. def intersection(other: RDD[Out], partitioner: Partitioner)(implicit ord: Ordering[Out]): RDD[Out]

    Definition Classes
    RDD
  50. def intersection(other: RDD[Out]): RDD[Out]

    Definition Classes
    RDD
  51. def isCheckpointed: Boolean

    Definition Classes
    RDD
  52. def isEmpty(): Boolean

    Definition Classes
    RDD
  53. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  54. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  55. final def iterator(split: Partition, context: TaskContext): Iterator[Out]

    Definition Classes
    RDD
  56. def keyBy[K](f: (Out) ⇒ K): RDD[(K, Out)]

    Definition Classes
    RDD
  57. def localCheckpoint(): MergeJoinRDD.this.type

    Definition Classes
    RDD
  58. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  59. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  60. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  61. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  62. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  63. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  64. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  65. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  66. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  67. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  68. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  69. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  70. def map[U](f: (Out) ⇒ U)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  71. def mapPartitions[U](f: (Iterator[Out]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  72. def mapPartitionsWithIndex[U](f: (Int, Iterator[Out]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  73. def max()(implicit ord: Ordering[Out]): Out

    Definition Classes
    RDD
  74. def min()(implicit ord: Ordering[Out]): Out

    Definition Classes
    RDD
  75. var name: String

    Definition Classes
    RDD
  76. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  77. final def notify(): Unit

    Definition Classes
    AnyRef
  78. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  79. def parent[U](j: Int)(implicit arg0: ClassTag[U]): RDD[U]

    Attributes
    protected[org.apache.spark]
    Definition Classes
    RDD
  80. val partitioner: Option[Partitioner]

    Definition Classes
    MergeJoinRDD → RDD
  81. final def partitions: Array[Partition]

    Definition Classes
    RDD
  82. def persist(): MergeJoinRDD.this.type

    Definition Classes
    RDD
  83. def persist(newLevel: StorageLevel): MergeJoinRDD.this.type

    Definition Classes
    RDD
  84. def pipe(command: Seq[String], env: Map[String, String], printPipeContext: ((String) ⇒ Unit) ⇒ Unit, printRDDElement: (Out, (String) ⇒ Unit) ⇒ Unit, separateWorkingDir: Boolean): RDD[String]

    Definition Classes
    RDD
  85. def pipe(command: String, env: Map[String, String]): RDD[String]

    Definition Classes
    RDD
  86. def pipe(command: String): RDD[String]

    Definition Classes
    RDD
  87. final def preferredLocations(split: Partition): Seq[String]

    Definition Classes
    RDD
  88. def randomSplit(weights: Array[Double], seed: Long): Array[RDD[Out]]

    Definition Classes
    RDD
  89. def reduce(f: (Out, Out) ⇒ Out): Out

    Definition Classes
    RDD
  90. def repartition(numPartitions: Int)(implicit ord: Ordering[Out]): RDD[Out]

    Definition Classes
    RDD
  91. def sample(withReplacement: Boolean, fraction: Double, seed: Long): RDD[Out]

    Definition Classes
    RDD
  92. def saveAsObjectFile(path: String): Unit

    Definition Classes
    RDD
  93. def saveAsTextFile(path: String, codec: Class[_ <: CompressionCodec]): Unit

    Definition Classes
    RDD
  94. def saveAsTextFile(path: String): Unit

    Definition Classes
    RDD
  95. def setName(_name: String): MergeJoinRDD.this.type

    Definition Classes
    RDD
  96. def sortBy[K](f: (Out) ⇒ K, ascending: Boolean, numPartitions: Int)(implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[Out]

    Definition Classes
    RDD
  97. def sparkContext: SparkContext

    Definition Classes
    RDD
  98. def subtract(other: RDD[Out], p: Partitioner)(implicit ord: Ordering[Out]): RDD[Out]

    Definition Classes
    RDD
  99. def subtract(other: RDD[Out], numPartitions: Int): RDD[Out]

    Definition Classes
    RDD
  100. def subtract(other: RDD[Out]): RDD[Out]

    Definition Classes
    RDD
  101. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  102. def take(num: Int): Array[Out]

    Definition Classes
    RDD
  103. def takeOrdered(num: Int)(implicit ord: Ordering[Out]): Array[Out]

    Definition Classes
    RDD
  104. def takeSample(withReplacement: Boolean, num: Int, seed: Long): Array[Out]

    Definition Classes
    RDD
  105. def toDebugString: String

    Definition Classes
    RDD
  106. def toJavaRDD(): JavaRDD[Out]

    Definition Classes
    RDD
  107. def toLocalIterator: Iterator[Out]

    Definition Classes
    RDD
  108. def toString(): String

    Definition Classes
    RDD → AnyRef → Any
  109. def top(num: Int)(implicit ord: Ordering[Out]): Array[Out]

    Definition Classes
    RDD
  110. def treeAggregate[U](zeroValue: U)(seqOp: (U, Out) ⇒ U, combOp: (U, U) ⇒ U, depth: Int)(implicit arg0: ClassTag[U]): U

    Definition Classes
    RDD
  111. def treeReduce(f: (Out, Out) ⇒ Out, depth: Int): Out

    Definition Classes
    RDD
  112. def union(other: RDD[Out]): RDD[Out]

    Definition Classes
    RDD
  113. def unpersist(blocking: Boolean): MergeJoinRDD.this.type

    Definition Classes
    RDD
  114. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  115. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  116. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  117. def zip[U](other: RDD[U])(implicit arg0: ClassTag[U]): RDD[(Out, U)]

    Definition Classes
    RDD
  118. def zipPartitions[B, C, D, V](rdd2: RDD[B], rdd3: RDD[C], rdd4: RDD[D])(f: (Iterator[Out], Iterator[B], Iterator[C], Iterator[D]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[D], arg3: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  119. def zipPartitions[B, C, D, V](rdd2: RDD[B], rdd3: RDD[C], rdd4: RDD[D], preservesPartitioning: Boolean)(f: (Iterator[Out], Iterator[B], Iterator[C], Iterator[D]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[D], arg3: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  120. def zipPartitions[B, C, V](rdd2: RDD[B], rdd3: RDD[C])(f: (Iterator[Out], Iterator[B], Iterator[C]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  121. def zipPartitions[B, C, V](rdd2: RDD[B], rdd3: RDD[C], preservesPartitioning: Boolean)(f: (Iterator[Out], Iterator[B], Iterator[C]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  122. def zipPartitions[B, V](rdd2: RDD[B])(f: (Iterator[Out], Iterator[B]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  123. def zipPartitions[B, V](rdd2: RDD[B], preservesPartitioning: Boolean)(f: (Iterator[Out], Iterator[B]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  124. def zipWithIndex(): RDD[(Out, Long)]

    Definition Classes
    RDD
  125. def zipWithUniqueId(): RDD[(Out, Long)]

    Definition Classes
    RDD

Deprecated Value Members

  1. def filterWith[A](constructA: (Int) ⇒ A)(p: (Out, A) ⇒ Boolean): RDD[Out]

    Definition Classes
    RDD
    Annotations
    @deprecated
    Deprecated

    (Since version 1.0.0) use mapPartitionsWithIndex and filter

  2. def flatMapWith[A, U](constructA: (Int) ⇒ A, preservesPartitioning: Boolean)(f: (Out, A) ⇒ Seq[U])(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
    Annotations
    @deprecated
    Deprecated

    (Since version 1.0.0) use mapPartitionsWithIndex and flatMap

  3. def foreachWith[A](constructA: (Int) ⇒ A)(f: (Out, A) ⇒ Unit): Unit

    Definition Classes
    RDD
    Annotations
    @deprecated
    Deprecated

    (Since version 1.0.0) use mapPartitionsWithIndex and foreach

  4. def mapPartitionsWithContext[U](f: (TaskContext, Iterator[Out]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
    Annotations
    @DeveloperApi() @deprecated
    Deprecated

    (Since version 1.2.0) use TaskContext.get

  5. def mapPartitionsWithSplit[U](f: (Int, Iterator[Out]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
    Annotations
    @deprecated
    Deprecated

    (Since version 0.7.0) use mapPartitionsWithIndex

  6. def mapWith[A, U](constructA: (Int) ⇒ A, preservesPartitioning: Boolean)(f: (Out, A) ⇒ U)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
    Annotations
    @deprecated
    Deprecated

    (Since version 1.0.0) use mapPartitionsWithIndex

  7. def toArray(): Array[Out]

    Definition Classes
    RDD
    Annotations
    @deprecated
    Deprecated

    (Since version 1.0.0) use collect
