com.twitter.scalding.typed

KeyedListLike

trait KeyedListLike[K, +T, +This[K, +T] <: KeyedListLike[K, T, This]] extends Serializable

Represents sharded lists of items of type T. There are exactly two fundamental operations. toTypedPipe marks the end of the grouped-on-key operations. mapValueStream further transforms all values, in order, one at a time, with a function from Iterator to another Iterator.
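The two operations can be pictured with a small plain-Scala model. This is only a sketch of the per-key semantics and not Scalding code: the `KeyedListModel` object, the `Keyed` alias, and the `toPairs` helper are hypothetical names, and an in-memory `Map[K, List[T]]` stands in for the sharded, distributed structure.

```scala
// Plain-Scala model (an assumption of this sketch, not the Scalding implementation):
// each key owns an ordered list of values.
object KeyedListModel {
  type Keyed[K, T] = Map[K, List[T]]

  // Models mapValueStream: run an Iterator => Iterator function over each key's values.
  def mapValueStream[K, T, V](kl: Keyed[K, T])(fn: Iterator[T] => Iterator[V]): Keyed[K, V] =
    kl.map { case (k, vs) => k -> fn(vs.iterator).toList }

  // Models toTypedPipe: flatten to (key, value) pairs; the grouping is gone afterwards.
  def toPairs[K, T](kl: Keyed[K, T]): List[(K, T)] =
    kl.toList.flatMap { case (k, vs) => vs.map(v => (k, v)) }
}
```

For instance, passing `_.scanLeft(0)(_ + _).drop(1)` to the modeled `mapValueStream` yields a running total per key, and `toPairs` then returns a flat list in which the per-key grouping no longer exists.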

Linear Supertypes
Serializable, AnyRef, Any

Abstract Value Members

  1. abstract def filterKeys(fn: (K) ⇒ Boolean): This[K, T]

    Filter keys on a predicate. More efficient than filter if you are only looking at keys.

  2. abstract def mapGroup[V](smfn: (K, Iterator[T]) ⇒ Iterator[V]): This[K, V]

    Operate on an Iterator[T] of all the values for each key at one time. Avoid accumulating the whole list in memory if you can. Prefer sum, which is partially executed map-side by default.
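
    The advice about not accumulating the whole list can be sketched in plain Scala. This is a model of the semantics only, not Scalding code: `MapGroupModel`, its `mapGroup` stand-in over `Map[K, List[T]]`, and the `mean` helper are hypothetical names. The point is that `mean` walks the iterator once and keeps only a running sum and count.

    ```scala
    // Plain-Scala model (an assumption of this sketch, not the Scalding implementation).
    object MapGroupModel {
      // Models mapGroup: the function sees the key and an Iterator over that key's values.
      def mapGroup[K, T, V](kl: Map[K, List[T]])(fn: (K, Iterator[T]) => Iterator[V]): Map[K, List[V]] =
        kl.map { case (k, vs) => k -> fn(k, vs.iterator).toList }

      // Single pass over the values; only (sum, count) is held in memory.
      def mean(values: Iterator[Double]): Iterator[Double] = {
        val (sum, n) = values.foldLeft((0.0, 0L)) { case ((s, c), v) => (s + v, c + 1) }
        if (n == 0) Iterator.empty else Iterator.single(sum / n)
      }
    }
    ```

    A function such as `(k, it) => MapGroupModel.mean(it).map(m => s"$k:$m")` uses both the key and the values, which is the case mapGroup is meant for. Calling `it.toList` inside the function would instead hold every value for the key in memory at once.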

  3. abstract def toTypedPipe: TypedPipe[(K, T)]

    End of the operations on values. From this point on the keyed structure is lost, and another shuffle is generally required to reconstruct it.

Concrete Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. def aggregate[B, C](agg: Aggregator[T, B, C]): This[K, C]

    Use an Algebird Aggregator to do the reduction.

  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. def count(fn: (T) ⇒ Boolean): This[K, Long]

  10. def drop(n: Int): This[K, T]

    Selects all elements except the first n.

  11. def dropWhile(p: (T) ⇒ Boolean): This[K, T]

    Drops the longest prefix of elements that satisfy the given predicate.

  12. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  13. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  14. def filter(fn: ((K, T)) ⇒ Boolean): This[K, T]

    .filter(fn).toTypedPipe == .toTypedPipe.filter(fn). It is generally better to avoid going back to a TypedPipe as long as possible: this minimizes the number of times we go in and out of Cascading/Hadoop types.

  15. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  16. def foldLeft[B](z: B)(fn: (B, T) ⇒ B): This[K, B]

  17. def forall(fn: (T) ⇒ Boolean): This[K, Boolean]

  18. def forceToReducers: This[K, T]

    This is just shorthand for mapValueStream(identity); it makes sure the planner sees that you want to force a shuffle. For expert tuning.

  19. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  20. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  21. def head: This[K, T]

    Use this to get the first value encountered. Prefer this to take(1).

  22. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  23. def keys: TypedPipe[K]

  24. def mapValueStream[V](smfn: (Iterator[T]) ⇒ Iterator[V]): This[K, V]

    Use this when you don't care about the key for the group; otherwise use mapGroup.

  25. def mapValues[V](fn: (T) ⇒ V): This[K, V]

    This is a special case of mapValueStream, but can be optimized because it doesn't need all the values for a given key at once. An unoptimized implementation is mapValueStream { _.map { fn } }, but for Grouped we can avoid resorting to mapValueStream.

  26. def max[B >: T](implicit cmp: Ordering[B]): This[K, T]

  27. def maxBy[B](fn: (T) ⇒ B)(implicit cmp: Ordering[B]): This[K, T]

  28. def min[B >: T](implicit cmp: Ordering[B]): This[K, T]

  29. def minBy[B](fn: (T) ⇒ B)(implicit cmp: Ordering[B]): This[K, T]

  30. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  31. final def notify(): Unit

    Definition Classes
    AnyRef
  32. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  33. def product[U >: T](implicit ring: Ring[U]): This[K, U]

  34. def reduce[U >: T](fn: (U, U) ⇒ U): This[K, U]

    Reduce with fn, which must be associative and commutative. Like sum, this can be optimized in some Grouped cases. If you don't have a commutative operator, use reduceLeft.
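
    Why commutativity matters can be shown with plain Scala collections. This is a sketch, not Scalding code: `ReduceOrderDemo` and `reduceInOrder` are hypothetical names, and the two lists stand for two possible orders in which a key's values might be combined.

    ```scala
    // Plain-Scala illustration (an assumption of this sketch, not the Scalding implementation).
    object ReduceOrderDemo {
      // Combines values strictly in the order given, as a stand-in for one arrival order.
      def reduceInOrder[U](values: List[U])(fn: (U, U) => U): U =
        values.reduceLeft(fn)

      // Two possible arrival orders for the same set of values.
      val order1: List[String] = List("a", "b", "c")
      val order2: List[String] = List("c", "a", "b")
    }
    ```

    Integer addition is commutative, so any arrival order gives the same total. String concatenation is associative but not commutative, so `order1` and `order2` reduce to different strings; an operator like that is the case the text points to reduceLeft for.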

  35. def reduceLeft[U >: T](fn: (U, U) ⇒ U): This[K, U]

  36. def scanLeft[B](z: B)(fn: (B, T) ⇒ B): This[K, B]

  37. def size: This[K, Long]

  38. def sortWithTake[U >: T](k: Int)(lessThan: (U, U) ⇒ Boolean): This[K, Seq[T]]

    Like sortedTake, but with a less-than operation for the ordering.

  39. def sortedReverseTake(k: Int)(implicit ord: Ordering[_ >: T]): This[K, Seq[T]]

    Take the largest k things according to the implicit ordering. Useful for top-k without having to call ord.reverse.

  40. def sortedTake(k: Int)(implicit ord: Ordering[_ >: T]): This[K, Seq[T]]

    This implements bottom-k (smallest k items) on each mapper for each key, then sends those to reducers to get the result. This is faster than using .take if k * (number of Keys) is small enough to fit in memory.
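
    The two-stage idea described above (bottom-k per mapper, then a final bottom-k on the reducer) can be sketched in plain Scala. This is not Scalding code: `BottomKModel`, `bottomK`, and `merge` are hypothetical names, each inner `Seq` stands for the values one mapper saw for a single key, and the full sort is used only for brevity.

    ```scala
    // Plain-Scala model (an assumption of this sketch, not the Scalding implementation).
    object BottomKModel {
      // Smallest k items of one partition: what a single mapper would emit for a key.
      def bottomK[T](values: Seq[T], k: Int)(implicit ord: Ordering[T]): Seq[T] =
        values.sorted.take(k)

      // Reducer side: merge the per-mapper partial results and keep the k smallest again.
      def merge[T](partials: Seq[Seq[T]], k: Int)(implicit ord: Ordering[T]): Seq[T] =
        bottomK(partials.flatten, k)
    }
    ```

    Because each mapper forwards at most k items per key, the reducer merges at most k * (number of mappers) items per key, and the merged result matches taking the k smallest over all the values directly.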

  41. def sum[U >: T](implicit sg: Semigroup[U]): This[K, U]

    If there is no ordering, we default to assuming the Semigroup is commutative. If you don't want that, define an ordering on the Values, or .forceToReducers.

    Semigroups MAY have a faster implementation of sum for iterators, so prefer using sum/sumLeft to reduce
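
    What a faster sum over an iterator might look like can be sketched with a local stand-in trait. This is not Algebird's `Semigroup`: the trait below is a hypothetical, simplified model with a pairwise `plus` and an overridable `sumOption`, and `SumOptionModel` and `ConcatSemigroup` are illustrative names.

    ```scala
    // Local stand-in for illustration only (an assumption of this sketch, not Algebird's trait).
    object SumOptionModel {
      trait Semigroup[T] {
        def plus(l: T, r: T): T
        // Default: pairwise reduction. Implementations may override with something faster.
        def sumOption(items: Iterator[T]): Option[T] =
          items.reduceLeftOption((a, b) => plus(a, b))
      }

      // String concatenation: pairwise plus re-copies the accumulated prefix each step,
      // while a single StringBuilder pass appends each piece once.
      object ConcatSemigroup extends Semigroup[String] {
        def plus(l: String, r: String): String = l + r
        override def sumOption(items: Iterator[String]): Option[String] =
          if (!items.hasNext) None
          else {
            val sb = new StringBuilder
            items.foreach(s => sb.append(s))
            Some(sb.toString)
          }
      }
    }
    ```

    A hand-written reduce only ever sees the pairwise function, so it cannot benefit from an override like this; a sum-style operation that hands the whole iterator to the semigroup can.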

  42. def sumLeft[U >: T](implicit sg: Semigroup[U]): This[K, U]

    Semigroups MAY have a faster implementation of sum for iterators, so prefer using sum/sumLeft to reduce/reduceLeft

  43. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  44. def take(n: Int): This[K, T]

    Selects the first n elements. Don't use this if n == 1; head is faster in that case.

  45. def takeWhile(p: (T) ⇒ Boolean): This[K, T]

    Takes the longest prefix of elements that satisfy the given predicate.

  46. def toList: This[K, List[T]]

  47. def toSet[U >: T]: This[K, Set[U]]

  48. def toString(): String

    Definition Classes
    AnyRef → Any
  49. def values: TypedPipe[T]

  50. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  51. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  52. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
