

trait DList[A] extends DataSinks with Persistent[Seq[A]]

A list that is distributed across multiple machines.

It supports a few Traversable-like methods:

- parallelDo: a 'map' operation transforming elements of the list in parallel
- ++: to concatenate 2 DLists
- groupByKey: to group a list of (key, value) elements by key, so as to get (key, values)
- combine: a parallel 'reduce' operation
- materialise: transforms a distributed list into a non-distributed list
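
The way these operations chain together can be sketched with ordinary Scala collections. This is only a local model of the semantics listed above, not Scoobi code: a plain Seq stands in for DList, nothing is distributed, and the object name and sample data are made up for illustration.

```scala
object DListOverviewSketch {
  // Assumption: a plain Seq stands in for a DList; nothing here is distributed.
  val lines: Seq[String] = Seq("a b a", "b c")

  // 'map'-style step (parallelDo): turn every word into a (word, 1) pair.
  val pairs: Seq[(String, Int)] =
    lines.flatMap(_.split(" ")).map(word => (word, 1))

  // groupByKey: (key, value) elements become (key, values).
  val grouped: Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (key, kvs) => (key, kvs.map(_._2)) }

  // combine: reduce the values of each key with an associative function.
  val counts: Map[String, Int] =
    grouped.map { case (key, values) => (key, values.reduce(_ + _)) }

  // ++: concatenate two lists.
  val moreLines: Seq[String] = lines ++ Seq("c")
}
```

In the real API the final step would be materialise, which hands the result back to the client; in this local model the result is already an in-memory collection.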

Type Members

  1. abstract type C <: CompNode

  2. type T = DList[A]

Abstract Value Members

  1. abstract def ++(ins: DList[A]*): DList[A]

    Concatenate one or more distributed lists to this distributed list.

  2. abstract def addSink(sink: Sink): T

  3. abstract def combine[K, V](f: Reduction[V])(implicit ev: <:<[A, (K, Iterable[V])], wk: WireFormat[K], wv: WireFormat[V]): DList[(K, V)]

    Apply an associative function to reduce the collection of values to a single value in a key-value-collection distributed list.

  4. abstract def compressWith(codec: CompressionCodec, compressionType: CompressionType = CompressionType.BLOCK): T

  5. abstract def groupByKey[K, V](implicit ev: <:<[A, (K, V)], wk: WireFormat[K], gpk: Grouping[K], wv: WireFormat[V]): DList[(K, Iterable[V])]

    Group the values of a distributed list with key-value elements by key.

  6. abstract def materialise: DObject[Iterable[A]]

    Turn a distributed list into a normal, non-distributed collection that can be accessed by the client

  7. abstract def parallelDo[B](dofn: DoFn[A, B])(implicit arg0: WireFormat[B]): DList[B]

  8. abstract def parallelDo[B, E](env: DObject[E], dofn: EnvDoFn[A, B, E])(implicit arg0: WireFormat[B], arg1: WireFormat[E]): DList[B]

    Apply a specified function to "chunks" of elements from the distributed list to produce zero or more output elements. The resulting output elements from the many "chunks" form a new distributed list

  9. abstract def updateSinks(f: (Seq[Sink]) ⇒ Seq[Sink]): T
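
The chunk-wise behaviour described for parallelDo (each input element may produce zero or more outputs, and the outputs of all chunks form the new list) can be modelled locally as follows. This is a sketch under stated assumptions: LocalEmitter and localParallelDo are hypothetical stand-ins written for this example, not Scoobi types, and the chunks are plain in-memory sequences.

```scala
import scala.collection.mutable.ArrayBuffer

object ParallelDoSketch {
  // Hypothetical stand-in for an emitter: collects outputs for one chunk.
  trait LocalEmitter[B] { def emit(b: B): Unit }

  // Apply fn to every element of every chunk; each call may emit
  // zero or more outputs. Outputs of all chunks are concatenated.
  def localParallelDo[A, B](chunks: Seq[Seq[A]])(fn: (A, LocalEmitter[B]) => Unit): Seq[B] =
    chunks.flatMap { chunk =>
      val out = ArrayBuffer.empty[B]
      val emitter = new LocalEmitter[B] {
        def emit(b: B): Unit = { out += b; () }
      }
      chunk.foreach(a => fn(a, emitter))
      out.toList
    }

  // Two "chunks" of one logical list.
  val chunks: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(3, 4, 5))

  // Odd numbers emit nothing; even numbers emit themselves twice.
  val evensTwice: Seq[Int] = localParallelDo[Int, Int](chunks) { (n, e) =>
    if (n % 2 == 0) { e.emit(n); e.emit(n) }
  }
}
```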

Concrete Value Members

  7. def by[K](kf: (A) ⇒ K)(implicit arg0: WireFormat[K]): DList[(K, A)]

    Create a new distributed list that is keyed based on a specified function.

  9. def collect[B](pf: PartialFunction[A, B])(implicit arg0: WireFormat[B]): DList[B]

    Build a new DList by applying a partial function to all elements of this DList on which the function is defined

  10. def compress: T

  11. def count(p: (A) ⇒ Boolean): DObject[Int]

    Count the number of elements in the list which satisfy a predicate.

  12. def distinct: DList[A]

    Build a new distributed list from this list without any duplicate elements.

  15. def filter(p: (A) ⇒ Boolean): DList[A]

    Keep elements from the distributed list that pass a specified predicate function

  16. def filterNot(p: (A) ⇒ Boolean): DList[A]

    Keep elements from the distributed list that do not pass a specified predicate function

  18. def flatten[B](implicit ev: <:<[A, Iterable[B]], mB: Manifest[B], wtB: WireFormat[B]): DList[B]

    Converts a distributed list of iterable values into a distributed list in which all the values are concatenated.

  20. def groupBy[K](f: (A) ⇒ K)(implicit arg0: WireFormat[K], arg1: Grouping[K]): DList[(K, Iterable[A])]

    Group the values of a distributed list according to some discriminator function.

  21. def groupByKeyWith[K, V](grouping: Grouping[K])(implicit ev: <:<[A, (K, V)], wfk: WireFormat[K], wfv: WireFormat[V]): DList[(K, Iterable[V])]

    Group the values of a distributed list with key-value elements by key, using an explicitly supplied grouping. This is best used for things like secondary sorts, or groupings with unusual logic (such as making sure Nones/nulls are sprayed across all reducers).

  22. def groupWith[K](f: (A) ⇒ K)(gpk: Grouping[K])(implicit arg0: WireFormat[K]): DList[(K, Iterable[A])]

    Group the values of a distributed list according to some discriminator function and some grouping function.

  24. def head: DObject[A]


    Returns the head of the DList as a DObject. This is an unsafe operation; headOption is the safe alternative.

  25. def headOption: DObject[Option[A]]


    Returns the head of the DList as a DObject containing an Option.

  26. def isEqual(to: DList[A])(implicit cmp: Grouping[A]): DObject[Boolean]

    Returns whether the other DList has the same elements. A DList is unordered, so order is not considered. Almost any Grouping will work (including groupings designed for secondary sorting), but for completeness it must send two equal As to the same partition, and its sortCompare must provide a total ordering.

  28. def keys[K, V](implicit ev: <:<[A, (K, V)], mwk: WireFormat[K], mwv: WireFormat[V]): DList[K]

    Create a distributed list containing just the keys of a key-value distributed list.

  29. def length: DObject[Int]

    The length of the distributed list.

  30. def map[B](f: (A) ⇒ B)(implicit arg0: WireFormat[B]): DList[B]

    For each element of the distributed list produce a new element by applying a specified function. The resulting collection of elements form a new distributed list

  31. def mapFlatten[B](f: (A) ⇒ Iterable[B])(implicit arg0: WireFormat[B]): DList[B]

    For each element of the distributed list produce zero or more elements by applying a specified function. The resulting collection of elements form a new distributed list

  32. def max(implicit cmp: Ordering[A]): DObject[A]

    Find the largest element in the distributed list.

  33. def maxBy[B](f: (A) ⇒ B)(cmp: Ordering[B]): DObject[A]

    Find the element of the distributed list for which the function f yields the largest value under cmp.

  34. def min(implicit cmp: Ordering[A]): DObject[A]

    Find the smallest element in the distributed list.

  35. def minBy[B](f: (A) ⇒ B)(cmp: Ordering[B]): DObject[A]

    Find the element of the distributed list for which the function f yields the smallest value under cmp.

  39. def parallelDo[B](fn: (A, ScoobiJobContext) ⇒ B)(implicit wf: WireFormat[B], p: ImplicitParameter2): DList[B]

  40. def parallelDo[B](fn: (A, Heartbeat) ⇒ B)(implicit wf: WireFormat[B], p: ImplicitParameter1): DList[B]

  41. def parallelDo[B](fn: (A, Counters) ⇒ B)(implicit wf: WireFormat[B], p: ImplicitParameter): DList[B]

  42. def parallelDo[B](fn: (A, Emitter[B]) ⇒ Unit)(implicit arg0: WireFormat[B]): DList[B]

  43. def partition(p: (A) ⇒ Boolean): (DList[A], DList[A])

    Partitions this distributed list into a pair of distributed lists according to some predicate. The first distributed list consists of elements that satisfy the predicate and the second of all elements that don't.

  44. def product(implicit num: Numeric[A]): DObject[A]

    Multiply up the elements of this distributed list.

  45. def reduce(op: Reduction[A]): DObject[A]

    Reduce the elements of this distributed list using the specified associative binary operator. The order in which the elements are reduced is unspecified and may be non-deterministic

  46. def reduceOption(op: Reduction[A]): DObject[Option[A]]

    Reduce the elements of this distributed list using the specified associative binary operator and a default value if the list is empty. The order in which the elements are reduced is unspecified and may be non-deterministic

  47. def shuffle: DList[A]

    Randomly shuffle a DList.

  48. def size: DObject[Int]

    The size of the distributed list.

  49. def sum(implicit num: Numeric[A]): DObject[A]

    Sum up the elements of this distributed list.

  52. def values[K, V](implicit ev: <:<[A, (K, V)], mwk: WireFormat[K], mwv: WireFormat[V]): DList[V]

    Create a distributed list containing just the values of a key-value distributed list.

  56. implicit def wf: WireFormat[A]

  57. def withFilter(p: (A) ⇒ Boolean): DList[A]

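    Keep elements that pass the predicate, as filter does. Scala calls withFilter when desugaring guards (if clauses) in for-comprehensions.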

  58. def zipWithIndex: DList[(A, Long)]

    Add an index (Long) to the DList, where the index is between 0 and size - 1 of the DList.
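
A few of the members above can be illustrated with a local model. The sketch below again uses a plain Seq in place of a DList (an assumption made so the code runs without a cluster) and shows partition, the 0 to size - 1 range of zipWithIndex, and why reduce asks for an associative operator: since the reduction order is unspecified, an operator such as subtraction can give different answers for different orders.

```scala
object ConcreteMembersSketch {
  // Assumption: a plain Seq stands in for a DList.
  val xs: Seq[Int] = Seq(5, 1, 4, 2, 3)

  // partition: first list satisfies the predicate, second does not.
  val (small, large) = xs.partition(_ < 3)

  // zipWithIndex: indices run from 0 to size - 1, as Long values.
  val indexed: Seq[(Int, Long)] =
    xs.zipWithIndex.map { case (x, i) => (x, i.toLong) }

  // reduce with an associative, commutative operator: order does not matter.
  val sumForward: Int = xs.reduce(_ + _)
  val sumReversed: Int = xs.reverse.reduce(_ + _)

  // reduce with subtraction (not associative): the result depends on
  // the order, which a distributed reduce does not guarantee.
  val diffForward: Int = xs.reduce(_ - _)
  val diffReversed: Int = xs.reverse.reduce(_ - _)
}
```

Reversing the sequence is just one stand-in for "some other reduction order"; a distributed run may combine elements in any grouping.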

Deprecated Value Members

  1. def flatMap[B](f: (A) ⇒ Iterable[B])(implicit arg0: WireFormat[B]): DList[B]


    (Since version 0.7.0) Use mapFlatten instead: DList is not a subclass of Iterator, and a well-behaved flatMap operation accepts an argument of type A => DList[B].

  2. def materialize: DObject[Iterable[A]]


    (Since version 0.6.0) use materialise instead
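
The type distinction behind the flatMap deprecation can be sketched with a toy container. Box is a hypothetical class invented for this example, not part of Scoobi; it only shows that mapFlatten takes a function returning an Iterable, whereas a conventional flatMap takes a function returning the container type itself.

```scala
object MapFlattenSketch {
  // Hypothetical minimal container standing in for DList.
  final case class Box[A](items: Seq[A]) {
    // Shape of mapFlatten: the function returns a plain Iterable.
    def mapFlatten[B](f: A => Iterable[B]): Box[B] =
      Box(items.flatMap(f))

    // Shape of a conventional flatMap: the function returns a Box.
    def flatMap[B](f: A => Box[B]): Box[B] =
      Box(items.flatMap(a => f(a).items))
  }

  val viaMapFlatten: Box[Int] = Box(Seq(1, 2)).mapFlatten(n => List(n, n))
  val viaFlatMap: Box[Int] = Box(Seq(1, 2)).flatMap(n => Box(Seq(n, n * 10)))
}
```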

