# RichPairDataset

### Related Doc: package datasetops

#### implicit final class RichPairDataset[K, V] extends AnyVal

Linear Supertypes: AnyVal, Any

### Value Members

1. #### final def !=(arg0: Any): Boolean

Definition Classes
Any
2. #### final def ##(): Int

Definition Classes
Any
3. #### final def ==(arg0: Any): Boolean

Definition Classes
Any
4. #### def aggByKey[B, U](zero: B, reduce: (B, V) ⇒ B, merge: (B, B) ⇒ B, finish: (B) ⇒ U)(implicit envK: Encoder[K], envV: Encoder[V], encB: Encoder[B], encU: Encoder[U]): Dataset[(K, U)]

Use a zero element and Scala functions (reduce, merge, and finish) to aggregate the key-value Dataset's values for each key.

```scala
scala> val zero = 1.0
scala> val reduce = (x: Double, y: Int) => x / y
scala> val merge = (x: Double, y: Double) => x + y
scala> val finish = (x: Double) => x * 10
scala> import com.tresata.spark.datasetops.RichPairDataset
scala> Seq((1, 2), (1, 3)).toDS.aggByKey(zero, reduce, merge, finish).show
+-----+------------------+
|value|       anon$1(int)|
+-----+------------------+
|    1|18.333333333333332|
+-----+------------------+
```
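The zero/reduce/merge/finish contract can also be sketched on plain Scala collections, without Spark: each "partition" folds its values from the zero element with `reduce`, the per-partition partials are combined with `merge`, and `finish` maps the final buffer to the output type. `AggByKeySketch` and its `grouped`-based partitioning are illustrative assumptions, not the library's implementation:

```scala
object AggByKeySketch {
  // Plain-collections sketch of aggByKey's four-function contract (hypothetical helper):
  // split pairs into "partitions", fold each partition's values per key from zero with
  // reduce, combine partials across partitions with merge, then apply finish.
  def aggByKey[K, V, B, U](pairs: Seq[(K, V)], numPartitions: Int)(
      zero: B, reduce: (B, V) => B, merge: (B, B) => B, finish: B => U): Map[K, U] =
    pairs.grouped(math.max(1, pairs.size / numPartitions)).toSeq
      .flatMap(_.groupBy(_._1).map { case (k, kvs) =>
        k -> kvs.map(_._2).foldLeft(zero)(reduce)   // per-partition partial aggregate
      })
      .groupBy(_._1)
      .map { case (k, partials) =>
        k -> finish(partials.map(_._2).reduce(merge)) // merge partials, then finish
      }

  def main(args: Array[String]): Unit = {
    val out = aggByKey(Seq((1, 2), (1, 3)), numPartitions = 2)(
      1.0, (b: Double, v: Int) => b / v, (x: Double, y: Double) => x + y, (b: Double) => b * 10)
    // With each value in its own partition this is (1.0/2 + 1.0/3) * 10.
    println(out(1))
  }
}
```

Note that the result depends on how values are split across partitions, which is why `merge` must be associative and commutative for a deterministic answer.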
5. #### def aggByKey[B, U](zero: () ⇒ B, reduce: (B, V) ⇒ B, merge: (B, B) ⇒ B, finish: (B) ⇒ U)(implicit envK: Encoder[K], envV: Encoder[V], encB: Encoder[B], encU: Encoder[U]): Dataset[(K, U)]

Use Scala functions to aggregate the key-value Dataset's values for each key: zero, reduce, merge, and finish.

```scala
scala> val zero = () => 1.0
scala> val reduce = (x: Double, y: Int) => x / y
scala> val merge = (x: Double, y: Double) => x + y
scala> val finish = (x: Double) => x * 10
scala> import com.tresata.spark.datasetops.RichPairDataset
scala> Seq((1, 2), (1, 3)).toDS.aggByKey(zero, reduce, merge, finish).show
+-----+------------------+
|value|       anon$1(int)|
+-----+------------------+
|    1|18.333333333333332|
+-----+------------------+
```
6. #### def aggByKey[U1, U2, U3, U4](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3], col4: TypedColumn[V, U4])(implicit encK: Encoder[K], encV: Encoder[V]): Dataset[(K, U1, U2, U3, U4)]

Use four TypedColumns to aggregate the key-value Dataset's values for each key.

```scala
scala> import com.tresata.spark.datasetops.RichPairDataset
scala> val agg = typed.sum((x: Int) => x)
scala> Seq((1, 2), (1, 3), (2, 4)).toDS.aggByKey(agg, agg, agg, agg).show
+-----+-------------------+-------------------+-------------------+-------------------+
|value|TypedSumDouble(int)|TypedSumDouble(int)|TypedSumDouble(int)|TypedSumDouble(int)|
+-----+-------------------+-------------------+-------------------+-------------------+
|    2|                4.0|                4.0|                4.0|                4.0|
|    1|                5.0|                5.0|                5.0|                5.0|
+-----+-------------------+-------------------+-------------------+-------------------+
```
7. #### def aggByKey[U1, U2, U3](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3])(implicit encK: Encoder[K], encV: Encoder[V]): Dataset[(K, U1, U2, U3)]

Use three TypedColumns to aggregate the key-value Dataset's values for each key.

```scala
scala> import com.tresata.spark.datasetops.RichPairDataset
scala> Seq((1, 2), (1, 3), (2, 4)).toDS.aggByKey(typed.avg(x => x + 2), typed.sum(x => x), typed.avg(x => x - 2)).show
+-----+-----------------+-------------------+-----------------+
|value|TypedAverage(int)|TypedSumDouble(int)|TypedAverage(int)|
+-----+-----------------+-------------------+-----------------+
|    1|              4.5|                5.0|              0.5|
|    2|              6.0|                4.0|              2.0|
+-----+-----------------+-------------------+-----------------+
```
8. #### def aggByKey[U1, U2](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2])(implicit encK: Encoder[K], encV: Encoder[V]): Dataset[(K, U1, U2)]

Use two TypedColumns to aggregate the key-value Dataset's values for each key.

```scala
scala> import com.tresata.spark.datasetops.RichPairDataset
scala> Seq((1, 2), (1, 3), (2, 4)).toDS.aggByKey(typed.avg(x => x + 2), typed.sum(x => x)).show
+-----+-----------------+-------------------+
|value|TypedAverage(int)|TypedSumDouble(int)|
+-----+-----------------+-------------------+
|    1|              4.5|                5.0|
|    2|              6.0|                4.0|
+-----+-----------------+-------------------+
```
9. #### def aggByKey[U1](col1: TypedColumn[V, U1])(implicit encK: Encoder[K], encV: Encoder[V]): Dataset[(K, U1)]

Use a TypedColumn to aggregate the key-value Dataset's values for each key.

```scala
scala> import com.tresata.spark.datasetops.RichPairDataset
scala> Seq((1, 2), (1, 3), (2, 4)).toDS.aggByKey(typed.avg(x => x + 2)).show
+-----+-----------------+
|value|TypedAverage(int)|
+-----+-----------------+
|    1|              4.5|
|    2|              6.0|
+-----+-----------------+
```
10. #### final def asInstanceOf[T0]: T0

Definition Classes
Any
11. #### def countByKey()(implicit encK: Encoder[K]): Dataset[(K, Long)]

Count the number of rows in the key-value Dataset for each key.

```scala
scala> import com.tresata.spark.datasetops.RichPairDataset
scala> Seq((1, 2), (1, 3), (2, 4)).toDS.countByKey.show
+-----+--------+
|value|count(1)|
+-----+--------+
|    1|       2|
|    2|       1|
+-----+--------+
```

13. #### def flatMapValues[U](f: (V) ⇒ TraversableOnce[U])(implicit encKU: Encoder[(K, U)]): Dataset[(K, U)]

Flat-map the key-value Dataset's values for each key, with the provided function.

```scala
scala> import com.tresata.spark.datasetops.RichPairDataset
scala> Seq((1, 2)).toDS.flatMapValues{ case x => List(x, x + 1) }.show
+---+---+
| _1| _2|
+---+---+
|  1|  2|
|  1|  3|
+---+---+
```
14. #### def fullOuterJoinOnKey[V1](other: Dataset[(K, V1)])(implicit encKV: Encoder[(K, V)], encKV1: Encoder[(K, V1)], encKVOptV1: Encoder[(K, (Option[V], Option[V1]))]): Dataset[(K, (Option[V], Option[V1]))]

Full outer join with another key-value Dataset on their keys.

```scala
scala> import com.tresata.spark.datasetops.RichPairDataset
scala> Seq((1, 2), (1, 3), (2, 4)).toDS.fullOuterJoinOnKey(Seq((1, 4), (1, 5)).toDS).show
+---+--------+
| _1|      _2|
+---+--------+
|  1|   [2,4]|
|  1|   [2,5]|
|  1|   [3,4]|
|  1|   [3,5]|
|  2|[4,null]|
+---+--------+
```
15. #### def getClass(): Class[_ <: AnyVal]

Definition Classes
AnyVal → Any
16. #### final def isInstanceOf[T0]: Boolean

Definition Classes
Any
17. #### def joinOnKey[V1](other: Dataset[(K, V1)])(implicit encKV: Encoder[(K, V)], encKV1: Encoder[(K, V1)], encKVV1: Encoder[(K, (V, V1))]): Dataset[(K, (V, V1))]

Inner join with another key-value Dataset on their keys.

```scala
scala> import com.tresata.spark.datasetops.RichPairDataset
scala> Seq((1, 2), (1, 3), (2, 4)).toDS.joinOnKey(Seq((1,4)).toDS).show
+---+-----+
| _1|   _2|
+---+-----+
|  1|[2,4]|
|  1|[3,4]|
+---+-----+
```
18. #### def keys(implicit encK: Encoder[K]): Dataset[K]

Discard the key-value Dataset's values, leaving only the keys.

```scala
scala> import com.tresata.spark.datasetops.RichPairDataset
scala> Seq((1, 2), (1, 3), (2, 4)).toDS.keys.show
+-----+
|value|
+-----+
|    1|
|    1|
|    2|
+-----+
```
19. #### def leftOuterJoinOnKey[V1](other: Dataset[(K, V1)])(implicit encKV: Encoder[(K, V)], encKV1: Encoder[(K, V1)], encKVOptV1: Encoder[(K, (V, Option[V1]))]): Dataset[(K, (V, Option[V1]))]

Left outer join with another key-value Dataset on their keys.

```scala
scala> import com.tresata.spark.datasetops.RichPairDataset
scala> Seq((1, 2), (1, 3), (2, 4)).toDS.leftOuterJoinOnKey(Seq((1,4)).toDS).show
+---+--------+
| _1|      _2|
+---+--------+
|  1|   [2,4]|
|  1|   [3,4]|
|  2|[4,null]|
+---+--------+
```
20. #### def mapValues[U](f: (V) ⇒ U)(implicit encKU: Encoder[(K, U)]): Dataset[(K, U)]

Apply a provided function to the values of the key-value Dataset.

```scala
scala> import com.tresata.spark.datasetops.RichPairDataset
scala> Seq((1, 2), (1, 3)).toDS.mapValues(_ + 2).show
+---+---+
| _1| _2|
+---+---+
|  1|  4|
|  1|  5|
+---+---+
```
21. #### def partitionByKey(implicit encKV: Encoder[(K, V)]): Dataset[(K, V)]

Partition the key-value Dataset by key.

```scala
scala> import com.tresata.spark.datasetops.RichPairDataset
scala> val ds = Seq((1, 2), (1, 3), (2, 4)).toDS
scala> ds.rdd.partitions
res1: Array[org.apache.spark.Partition] = Array(...@20fe, ...@20ff, ...@2100)

scala> ds.partitionByKey.rdd.partitions
res2: Array[org.apache.spark.Partition] = Array(...@0, ...@1, ...@2, ...@3, ...@4, ...@5, ...@6, ...@7)
```
22. #### def partitionByKey(numPartitions: Int)(implicit encKV: Encoder[(K, V)]): Dataset[(K, V)]

Partition the key-value Dataset by key, using at most the given number of partitions.

```scala
scala> import com.tresata.spark.datasetops.RichPairDataset
scala> val ds = Seq((1, 2), (1, 3), (2, 4)).toDS
scala> ds.rdd.partitions
res1: Array[org.apache.spark.Partition] = Array(...@20fe, ...@20ff, ...@2100)

scala> ds.partitionByKey(1).rdd.partitions
res2: Array[org.apache.spark.Partition] = Array(...@0)
```
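How keys map to a bounded number of partitions can be sketched with a non-negative modulus of the key's hash code. This is an illustrative assumption about typical hash partitioning, not this library's exact code; `PartitionSketch` and `bucket` are hypothetical names:

```scala
object PartitionSketch {
  // Assign a key to one of numPartitions buckets via a non-negative hash modulus,
  // so rows with equal keys always land in the same bucket.
  def bucket(key: Any, numPartitions: Int): Int = {
    val m = key.hashCode % numPartitions
    if (m < 0) m + numPartitions else m // correct for negative hash codes
  }

  def main(args: Array[String]): Unit = {
    // With a single partition, every key lands in bucket 0.
    println(Seq(1, 2, 3).map(bucket(_, 1))) // List(0, 0, 0)
  }
}
```

This also shows why the number of resulting partitions is capped at `numPartitions`: the bucket index is always in `[0, numPartitions)`.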
23. #### def reduceByKey(f: (V, V) ⇒ V)(implicit encK: Encoder[K], encV: Encoder[V]): Dataset[(K, V)]

Reduce the key-value Dataset's values for each key, with the provided function.

```scala
scala> import com.tresata.spark.datasetops.RichPairDataset
scala> Seq((1, 2), (1, 3)).toDS.reduceByKey(_ + _).show
+-----+---------------------+
|value|ReduceAggregator(int)|
+-----+---------------------+
|    1|                    5|
+-----+---------------------+
```
24. #### def rightOuterJoinOnKey[V1](other: Dataset[(K, V1)])(implicit encKV: Encoder[(K, V)], encKV1: Encoder[(K, V1)], encKVOptV1: Encoder[(K, (Option[V], V1))]): Dataset[(K, (Option[V], V1))]

Right outer join with another key-value Dataset on their keys.

```scala
scala> import com.tresata.spark.datasetops.RichPairDataset
scala> Seq((1, 2), (1, 3), (2, 4)).toDS.rightOuterJoinOnKey(Seq((1,4)).toDS).show
+---+-----+
| _1|   _2|
+---+-----+
|  1|[3,4]|
|  1|[2,4]|
+---+-----+
```
25. #### def sortWithinPartitionsByKey(implicit encKV: Encoder[(K, V)]): Dataset[(K, V)]

Sort the key-value Dataset within partitions by key.

```scala
scala> import com.tresata.spark.datasetops.RichPairDataset
scala> val ds = Seq((1, 2), (3, 1), (2, 2), (2, 6), (1, 1)).toDS
scala> ds.partitionByKey(2).rdd.glom.map(_.map(_._1).toSeq).collect
res56: Array[Seq[Int]] = Array(WrappedArray(2, 2), WrappedArray(1, 3, 1))

scala> ds.partitionByKey(2).sortWithinPartitionsByKey.rdd.glom.map(_.map(_._1).toSeq).collect
res57: Array[Seq[Int]] = Array(WrappedArray(2, 2), WrappedArray(1, 1, 3))
```
26. #### def toString(): String

Definition Classes
Any
27. #### def values(implicit encV: Encoder[V]): Dataset[V]

Discard the key-value Dataset's keys, leaving only the values.

```scala
scala> import com.tresata.spark.datasetops.RichPairDataset
scala> Seq((1, 2), (1, 3), (2, 4)).toDS.values.show
+-----+
|value|
+-----+
|    2|
|    3|
|    4|
+-----+
```