com.twitter.scalding.typed

KeyedListLike

trait KeyedListLike[K, +T, +This[K, +T] <: KeyedListLike[K, T, This]] extends Serializable

Represents sharded lists of items of type T. There are exactly two fundamental operations:

  1. toTypedPipe: marks the end of the grouped-on-key operations.
  2. mapValueStream: further transforms all values, in order, one at a time, with a function from Iterator to another Iterator.
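
To make the two operations concrete, here is a minimal in-memory sketch built on plain Scala collections rather than on Scalding itself. KeyedModel, Keyed and groups are hypothetical names invented for this illustration, and a List of pairs stands in for TypedPipe; the real trait operates on distributed data.

```scala
// Hypothetical in-memory model of a keyed list; not part of Scalding.
object KeyedModel {
  final case class Keyed[K, T](groups: Map[K, List[T]]) {
    // Transform each key's values as a stream; keys left with no values disappear.
    def mapValueStream[V](f: Iterator[T] => Iterator[V]): Keyed[K, V] =
      Keyed(
        groups
          .map { case (k, vs) => k -> f(vs.iterator).toList }
          .filter { case (_, vs) => vs.nonEmpty }
      )

    // Leave the keyed structure and return flat (key, value) pairs.
    def toTypedPipe: List[(K, T)] =
      groups.toList.flatMap { case (k, vs) => vs.map(v => (k, v)) }
  }
}
```

In this model, a call such as mapValueStream(_.filter(_ > 3)) drops every key whose values are all filtered out, which mirrors the note on mapGroup about keys that lose all their values.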

Linear Supertypes
Serializable, AnyRef, Any

Abstract Value Members

  1. abstract def bufferedTake(n: Int): This[K, T]

    This is like take, except that the items are kept in memory and we attempt to partially execute on the mappers if possible. For very large values of n, this could create memory pressure (as you may aggregate n items in a memory heap for each key). If you get OOM issues, try the method take instead.

  2. abstract def filterKeys(fn: (K) ⇒ Boolean): This[K, T]

    Filter keys on a predicate. More efficient than filter if you are only looking at keys.

  3. abstract def mapGroup[V](smfn: (K, Iterator[T]) ⇒ Iterator[V]): This[K, V]

    Operate on an Iterator[T] of all the values for each key at one time. Prefer this to toList when you can avoid accumulating the whole list in memory. Where a sum suffices, prefer sum, which is partially executed map-side by default. Use mapValueStream when you don't care about the key for the group.

    The Iterator is always non-empty. Note that any key that has all its values removed will not appear in subsequent .mapGroup/mapValueStream calls.

  4. abstract def toTypedPipe: TypedPipe[(K, T)]

    End of the operations on values. From this point on the keyed structure is lost, and another shuffle is generally required to reconstruct it.

Concrete Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. def aggregate[B, C](agg: Aggregator[T, B, C]): This[K, C]

    Use Algebird Aggregator to do the reduction

  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. def count(fn: (T) ⇒ Boolean): This[K, Long]

    For each key, count the number of values that satisfy a predicate

  10. def distinctSize: This[K, Long]

    For each key, give the number of unique values. WARNING: May OOM. This assumes the values for each key can fit in memory.

  11. def distinctValues: This[K, T]

    For each key, remove duplicate values. WARNING: May OOM. This assumes the values for each key can fit in memory.

  12. def drop(n: Int): This[K, T]

    For each key, select all elements except the first n.

  13. def dropWhile(p: (T) ⇒ Boolean): This[K, T]

    For each key, drop the longest prefix of elements that satisfy the given predicate.

  14. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  15. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  16. def filter(fn: ((K, T)) ⇒ Boolean): This[K, T]

    .filter(fn).toTypedPipe == .toTypedPipe.filter(fn). It is generally better to avoid going back to a TypedPipe as long as possible: this minimizes the number of times we go in and out of Cascading/Hadoop types.

  17. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  18. def flatMapValues[V](fn: (T) ⇒ TraversableOnce[V]): This[K, V]

    Similar to mapValues, but works like flatMap, returning a collection of outputs for each value input.

  19. def flattenValues[U](implicit ev: <:<[T, TraversableOnce[U]]): This[K, U]

    Flatten the values. Useful after sortedTake, for instance.

  20. def fold[V](f: Fold[T, V]): This[K, V]

    Folds are composable aggregations that make one pass over the data. If you need to do several custom folds over the same data, use Fold.join and this method

  21. def foldLeft[B](z: B)(fn: (B, T) ⇒ B): This[K, B]

    For each key, fold the values. See scala.collection.Iterable.foldLeft.

  22. def foldWithKey[V](fn: (K) ⇒ Fold[T, V]): This[K, V]

    If the fold depends on the key, use this method to construct the fold for each key

  23. def forall(fn: (T) ⇒ Boolean): This[K, Boolean]

    For each key, check whether a predicate is true for all values.

  24. def forceToReducers: This[K, T]

    This is just shorthand for mapValueStream(identity); it makes sure the planner sees that you want to force a shuffle. For expert tuning.

  25. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  26. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  27. def head: This[K, T]

    Use this to get the first value encountered. Prefer this to take(1).

  28. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  29. def keys: TypedPipe[K]

    Convert to a TypedPipe and only keep the keys

  30. def mapValueStream[V](smfn: (Iterator[T]) ⇒ Iterator[V]): This[K, V]

    Use this when you don't care about the key for the group; otherwise use mapGroup.

  31. def mapValues[V](fn: (T) ⇒ V): This[K, V]

    This is a special case of mapValueStream, but it can be optimized because it doesn't need all the values for a given key at once. An unoptimized implementation is mapValueStream { _.map { fn } }, but for Grouped we can avoid resorting to mapValueStream.
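
    The unoptimized form mentioned above can be sketched for a single key's values with plain iterators. MapValuesSketch and both of its methods are hypothetical stand-ins written for this illustration, not the Scalding implementation.

    ```scala
    object MapValuesSketch {
      // Stand-in for mapValueStream applied to one key's values.
      def mapValueStream[T, V](values: Iterator[T])(f: Iterator[T] => Iterator[V]): Iterator[V] =
        f(values)

      // mapValues written the unoptimized way: mapValueStream { _.map { fn } }.
      def mapValues[T, V](values: Iterator[T])(fn: T => V): Iterator[V] =
        mapValueStream(values)(_.map(fn))
    }
    ```

    Because fn only ever sees one value at a time, nothing in this form requires all of a key's values to be present at once, which is the property the description says allows the optimization.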

  32. def max[B >: T](implicit cmp: Ordering[B]): This[K, T]

    For each key, give the maximum value

  33. def maxBy[B](fn: (T) ⇒ B)(implicit cmp: Ordering[B]): This[K, T]

    For each key, give the maximum value by some function

  34. def min[B >: T](implicit cmp: Ordering[B]): This[K, T]

    For each key, give the minimum value

  35. def minBy[B](fn: (T) ⇒ B)(implicit cmp: Ordering[B]): This[K, T]

    For each key, give the minimum value by some function

  36. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  37. final def notify(): Unit

    Definition Classes
    AnyRef
  38. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  39. def product[U >: T](implicit ring: Ring[U]): This[K, U]

    For each key, return the product of all the values.

  40. def reduce[U >: T](fn: (U, U) ⇒ U): This[K, U]

    Reduce with fn, which must be associative and commutative. This can be optimized in some Grouped cases. If you don't have a commutative operator, use reduceLeft.

  41. def reduceLeft[U >: T](fn: (U, U) ⇒ U): This[K, U]

    Similar to reduce, but always on the reduce side (never optimized to map-side), and named for the Scala function. fn need not be associative and/or commutative. Makes sense when you want to reduce in a particular sorted order. The old value comes in on the left.
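
    Why ordering matters for a non-commutative fn can be seen with the standard library alone. The sketch below uses an ordinary List to stand in for one key's values; ReduceOrderSketch and its fields are hypothetical names, and no Scalding call is involved.

    ```scala
    object ReduceOrderSketch {
      // Values for one key, in the order they would be consumed.
      val sorted: List[Int] = List(10, 3, 2)

      // Subtraction is neither associative nor commutative, so order changes the result.
      val leftToRight: Int = sorted.reduceLeft(_ - _)         // (10 - 3) - 2
      val otherOrder: Int  = sorted.reverse.reduceLeft(_ - _) // (2 - 3) - 10

      // Addition is associative and commutative, so any order agrees.
      val sumA: Int = sorted.reduce(_ + _)
      val sumB: Int = sorted.reverse.reduce(_ + _)
    }
    ```

    Only an operator like addition, where the two orders agree, is safe to pass to reduce; one like subtraction needs reduceLeft together with a defined order.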

  42. def scanLeft[B](z: B)(fn: (B, T) ⇒ B): This[K, B]

    For each key, scanLeft the values. See scala.collection.Iterable.scanLeft.
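
    As a reminder of what the referenced standard library method does, here is a small sketch on one key's values. It assumes the per-key behavior matches scala.collection.Iterable.scanLeft, including the initial value z in the output; ScanLeftSketch and runningTotals are hypothetical names.

    ```scala
    object ScanLeftSketch {
      // Running totals for one key's values, starting from z = 0.
      // The standard scanLeft emits z first, then each intermediate result.
      def runningTotals(values: List[Int]): List[Int] =
        values.scanLeft(0)(_ + _)
    }
    ```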

  43. def size: This[K, Long]

    For each key, give the number of values

  44. def sortWithTake[U >: T](k: Int)(lessThan: (U, U) ⇒ Boolean): This[K, Seq[T]]

    Like sortedTake, but with a less-than operation for the ordering.

  45. def sortedReverseTake(k: Int)(implicit ord: Ordering[_ >: T]): This[K, Seq[T]]

    Take the largest k things according to the implicit ordering. Useful for top-k without having to call ord.reverse

  46. def sortedTake(k: Int)(implicit ord: Ordering[_ >: T]): This[K, Seq[T]]

    This implements bottom-k (smallest k items) on each mapper for each key, then sends those to reducers to get the result. This is faster than using .take if k * (number of Keys) is small enough to fit in memory.
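
    The two-phase idea (bottom-k on each mapper, then a merge on the reducer) can be sketched with plain sequences. This is a simplified model under the assumption that each inner Seq represents one mapper's share of a single key's values; BottomKSketch, bottomK and twoPhase are hypothetical names, and the real method keeps a bounded structure rather than sorting everything.

    ```scala
    object BottomKSketch {
      // Smallest k items of one partition, in ascending order.
      def bottomK[T](items: Seq[T], k: Int)(implicit ord: Ordering[T]): Seq[T] =
        items.sorted.take(k)

      // Each "mapper" keeps only its own bottom-k; the "reducer" merges the partial results.
      def twoPhase[T](mappers: Seq[Seq[T]], k: Int)(implicit ord: Ordering[T]): Seq[T] =
        bottomK(mappers.flatMap(m => bottomK(m, k)), k)
    }
    ```

    Each mapper forwards at most k items per key, so the reducer receives far less data than with a plain take; the cost is holding those k items per key in memory.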

  47. def sum[U >: T](implicit sg: Semigroup[U]): This[K, U]

    Add all items according to the implicit Semigroup. If there is no sorting, we default to assuming the Semigroup is commutative. If you don't want that, define an ordering on the values, sort, or call .forceToReducers.

    Semigroups MAY have a faster implementation of sum for iterators, so prefer using sum/sumLeft to reduce.
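
    The reason commutativity matters is that partial sums may be computed on the map side and combined later in no particular order. The sketch below models this with plain sequences; the Semigroup trait here is a minimal hypothetical stand-in for Algebird's, and PartialSumSketch, sum and mapSideSum are names made up for the illustration.

    ```scala
    object PartialSumSketch {
      // Minimal stand-in for Algebird's Semigroup; hypothetical, for illustration only.
      trait Semigroup[T] { def plus(a: T, b: T): T }

      implicit val intAdd: Semigroup[Int] = new Semigroup[Int] {
        def plus(a: Int, b: Int): Int = a + b
      }

      // Sum a non-empty group of values.
      def sum[T](values: Seq[T])(implicit sg: Semigroup[T]): T =
        values.reduce((a, b) => sg.plus(a, b))

      // Sum each mapper's share first, then combine the partial sums on the reducer.
      def mapSideSum[T](mappers: Seq[Seq[T]])(implicit sg: Semigroup[T]): T =
        sum(mappers.filter(_.nonEmpty).map(m => sum(m)))
    }
    ```

    For an associative and commutative plus, mapSideSum and a single sum over all values agree regardless of how the values are split across mappers.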

  48. def sumLeft[U >: T](implicit sg: Semigroup[U]): This[K, U]

    Semigroups MAY have a faster implementation of sum for iterators, so prefer using sum/sumLeft to reduce/reduceLeft

  49. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  50. def take(n: Int): This[K, T]

    For each key, select the first n elements. Don't use this if n == 1; head is faster in that case.

  51. def takeWhile(p: (T) ⇒ Boolean): This[K, T]

    For each key, take the longest prefix of elements that satisfy the given predicate.

  52. def toList: This[K, List[T]]

    AVOID THIS IF POSSIBLE. For each key, accumulate all the values into a List. WARNING: May OOM. Only use this method if you are sure all the values will fit in memory. You really should ask why you need all the values; if you want to do some custom reduction, do it in mapGroup or mapValueStream.

  53. def toSet[U >: T]: This[K, Set[U]]

    AVOID THIS IF POSSIBLE. The same risks apply here as to toList: you may OOM. See toList. Note that toSet needs to be parameterized even though toList does not. This is because List is covariant in its type parameter in the Scala API, but Set is invariant. See: http://stackoverflow.com/questions/676615/why-is-scalas-immutable-set-not-covariant-in-its-type
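
    The variance difference can be demonstrated with the standard library alone. VarianceSketch and its fields are hypothetical names used only for this illustration.

    ```scala
    object VarianceSketch {
      val ints: List[Int] = List(1, 2, 2)

      // List is covariant: a List[Int] can be used where a List[Any] is expected.
      val widenedList: List[Any] = ints

      // Set is invariant: `val bad: Set[Any] = ints.toSet[Int]` would not compile,
      // so the wider element type has to be chosen when the set is built.
      val widenedSet: Set[Any] = ints.toSet[Any]
    }
    ```

    This is why the toSet signature carries the U >: T parameter: the caller picks the element type up front, since a Set[T] cannot be widened afterwards.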

  54. def toString(): String

    Definition Classes
    AnyRef → Any
  55. def values: TypedPipe[T]

    Convert to a TypedPipe and only keep the values

  56. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  57. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  58. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
