Merge two TypedPipes (no order is guaranteed). This is only realized when a group (or join) is performed.
Same as groupAll.aggregate.values
Put the items of this pipe into the keys, with unit as the value, in a Group. In some sense, this is the dual of groupAll.
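To make the duality concrete, here is a minimal in-memory sketch. It assumes a pipe can be modelled as a plain Scala List and a Group as a Map from each key to its values; GroupAllModel and its methods are hypothetical names, not Scalding's API.

```scala
// Illustrative model only: a "pipe" is a List[T] and a "Group" is a Map
// from each key to the values grouped under it.
object GroupAllModel {
  // groupAll: every item becomes a value under the single key ().
  // An empty pipe produces no group at all.
  def groupAll[T](pipe: List[T]): Map[Unit, List[T]] =
    if (pipe.isEmpty) Map.empty else Map(((), pipe))

  // asKeys: every item becomes a key, with () as its value.
  // Duplicate items land in the same group, one () per occurrence.
  def asKeys[T](pipe: List[T]): Map[T, List[Unit]] =
    pipe.groupBy(identity).map { case (k, occurrences) => (k, occurrences.map(_ => ())) }
}
```

groupAll puts everything under one key; asKeys gives every distinct item its own key.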
Filter and map. See scala.collection.List.collect. For example, collect { case Some(x) => fn(x) } drops the None items and applies fn to the contents of each Some.
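The collect pattern behaves the same way on an ordinary Scala List, which this small sketch uses as a stand-in for a TypedPipe; CollectModel and keepSomeAndApply are hypothetical names.

```scala
// Illustrative model only: a List stands in for a TypedPipe.
object CollectModel {
  // Filter and map in one pass: None items are dropped, and the contents
  // of each Some are passed to fn.
  def keepSomeAndApply[A, B](pipe: List[Option[A]])(fn: A => B): List[B] =
    pipe.collect { case Some(x) => fn(x) }
}
```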
Attach a ValuePipe to each element of this TypedPipe.
Prints the current pipe to stdout.
Returns the set of distinct elements in the TypedPipe
Returns the elements of the TypedPipe that are distinct under a given extractor function: one element is kept for each distinct value the function returns.
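A sketch of these semantics on a plain List, assumed here as a stand-in for a TypedPipe. This model keeps the first element seen for each extractor value; in a distributed job it is safer not to assume any particular representative survives. DistinctByModel is a hypothetical name.

```scala
import scala.collection.mutable

// Illustrative model only: a List stands in for a TypedPipe.
object DistinctByModel {
  // Keep one element for each distinct value of extractor.
  // HashSet.add returns true only the first time a key is added,
  // so this keeps the first element seen for each key.
  def distinctBy[T, K](pipe: List[T])(extractor: T => K): List[T] = {
    val seen = mutable.HashSet.empty[K]
    pipe.filter(t => seen.add(extractor(t)))
  }
}
```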
Merge two TypedPipes of different types by using Either
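A minimal sketch of merging through Either, modelling each pipe as a List (an assumption for illustration; EitherModel is hypothetical). Items of the first pipe are tagged Left and items of the second are tagged Right, so the merged pipe still records where each item came from. A real merge gives no ordering guarantee; only this List model happens to be ordered.

```scala
// Illustrative model only: Lists stand in for the two TypedPipes.
object EitherModel {
  // Tag each side, then merge: first pipe -> Left, second pipe -> Right.
  def either[A, B](left: List[A], right: List[B]): List[Either[A, B]] =
    left.map(a => Left(a): Either[A, B]) ++ right.map(b => Right(b): Either[A, B])
}
```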
Sometimes useful for implementing custom joins with groupBy + mapValueStream when you know that the value/key can fit in memory. Beware.
Keep only items that satisfy this predicate
If T is a (K, V) for some V, then we can use this function to filter on the keys. This is here to match the function in KeyedListLike, where it is optimized.
Keep only items that don't satisfy the predicate.
filterNot is the same as filter with a negated predicate.
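Both filtering variants reduce to an ordinary filter, as this List-based sketch shows. The List stand-in and the FilterModel name are assumptions for illustration, not Scalding's implementation.

```scala
// Illustrative model only: a List stands in for a TypedPipe.
object FilterModel {
  // For a pipe of pairs, keep the pairs whose key satisfies p.
  def filterKeys[K, V](pipe: List[(K, V)])(p: K => Boolean): List[(K, V)] =
    pipe.filter { case (k, _) => p(k) }

  // filterNot is filter with the predicate negated.
  def filterNot[T](pipe: List[T])(p: T => Boolean): List[T] =
    pipe.filter(t => !p(t))
}
```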
Common pattern of attaching a value and then filtering.
Common pattern of attaching a value and then calling flatMap.
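Both patterns can be sketched on a List. Here the ValuePipe is modelled as an Option[V], on the assumption that it may be empty, so the callback receives that Option alongside each element. WithValueModel and its methods are hypothetical names.

```scala
// Illustrative model only: a List stands in for a TypedPipe, and the
// ValuePipe is an Option because it may be empty.
object WithValueModel {
  // Attach the value to each element, then keep elements where f is true.
  def filterWithValue[T, V](pipe: List[T], value: Option[V])(f: (T, Option[V]) => Boolean): List[T] =
    pipe.filter(t => f(t, value))

  // Attach the value to each element, then expand each element with f.
  def flatMapWithValue[T, V, U](pipe: List[T], value: Option[V])(f: (T, Option[V]) => Iterable[U]): List[U] =
    pipe.flatMap(t => f(t, value))
}
```

A typical use is a threshold computed elsewhere in the job (for example, an average) and then applied to every element.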
Flatten a pipe of Iterables into a pipe of their elements.
Flatten just the values. This is more useful on KeyedListLike, but is added here to reduce asymmetry in the APIs.
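The difference between the two flattening operations is easiest to see on concrete data. In this sketch a List stands in for a TypedPipe (an assumption), and FlattenModel is a hypothetical name.

```scala
// Illustrative model only: a List stands in for a TypedPipe.
object FlattenModel {
  // flatten: a pipe of Iterables becomes a pipe of their elements.
  def flatten[T](pipe: List[Iterable[T]]): List[T] =
    pipe.flatten

  // flattenValues: each (k, values) becomes one (k, v) pair per value;
  // a key whose Iterable is empty disappears from the output.
  def flattenValues[K, V](pipe: List[(K, Iterable[V])]): List[(K, V)] =
    pipe.flatMap { case (k, vs) => vs.map(v => (k, v)) }
}
```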
Force a materialization of this pipe prior to the next operation. This is useful if you filter almost everything before a hashJoin, for instance.
If you are going to create two branches or forks, it may be more efficient to call this method first which will create a node in the cascading graph. Without this, both full branches of the fork will be put into separate cascading.
Ideally the planner would see this
This is the default means of grouping all pairs with the same key. Generally this triggers one Map/Reduce transition.
Send all items to a single reducer
Given a key function, add the key, then call .group
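The relation between groupBy and group can be sketched in memory: groupBy only computes and attaches a key, then delegates to group. The List and Map stand-ins and the GroupModel name are assumptions for illustration; a real group performs a shuffle rather than an in-memory groupBy.

```scala
// Illustrative model only: a List of pairs stands in for the pipe, and the
// resulting Map stands in for the Group.
object GroupModel {
  // group: collect the values of all pairs that share a key.
  def group[K, V](pipe: List[(K, V)]): Map[K, List[V]] =
    pipe.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }

  // groupBy: compute a key with fn, attach it to each item, then group.
  def groupBy[T, K](pipe: List[T])(fn: T => K): Map[K, List[T]] =
    group(pipe.map(t => (fn(t), t)))
}
```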
Forces a shuffle by randomly assigning each item into one of the partitions.
This is for the case where your mappers take a long time, and it is faster to shuffle them to more reducers and then operate.
You probably want shard if you are just forcing a shuffle.
These operations look like joins, but they do not force any communication of the current TypedPipe. They are mapping operations where this pipe is streamed through one item at a time.
WARNING: These behave semantically very differently from cogroup, because (K, V) tuples on the left are handled as they are seen. The iterable on the right is over all elements with a matching key K, and it may be empty if there are no values for that key.
Do an inner-join without shuffling this TypedPipe, but replicating argument to all tasks
Do a left join without shuffling this TypedPipe, but replicating the argument to all tasks.
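The streaming behaviour of these map-side joins can be sketched in memory. The replicated side is a Map from key to all of its values, and the other side is walked one pair at a time without being grouped. This is an illustrative model under those assumptions (HashJoinModel is a hypothetical name), not Scalding's implementation. A left pair with no match is dropped by the inner join and kept with None by the left join.

```scala
// Illustrative model only: `big` is the streamed side (never shuffled),
// `small` is the replicated side, already collected into memory by key.
object HashJoinModel {
  // Inner join: each (k, v) is emitted once per matching w; pairs with
  // no match on the small side are dropped.
  def hashJoin[K, V, W](big: List[(K, V)], small: Map[K, List[W]]): List[(K, (V, W))] =
    big.flatMap { case (k, v) => small.getOrElse(k, Nil).map(w => (k, (v, w))) }

  // Left join: like hashJoin, but a pair with no match is kept with None.
  def hashLeftJoin[K, V, W](big: List[(K, V)], small: Map[K, List[W]]): List[(K, (V, Option[W]))] =
    big.flatMap { case (k, v) =>
      small.getOrElse(k, Nil) match {
        case Nil => List((k, (v, Option.empty[W])))
        case ws  => ws.map(w => (k, (v, Option(w))))
      }
    }
}
```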
For each element, do a map-side (hash) left join to look up a value
Just keep the keys, or ._1 (if this type is a Tuple2)
Uses hashJoin, but attaches None if thatPipe is empty.
A ValuePipe may be empty, so this attaches it as an Option. cross is the same as leftCross(p).collect { case (t, Some(v)) => (t, v) }
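Modelling the ValuePipe as an Option (an assumption; CrossModel is a hypothetical name and the List stands in for a TypedPipe), the relation between leftCross and cross becomes a one-line definition.

```scala
// Illustrative model only: a List stands in for a TypedPipe, and an
// Option stands in for a ValuePipe that may be empty.
object CrossModel {
  // leftCross: attach the (possibly absent) value to every element.
  def leftCross[T, V](pipe: List[T], value: Option[V]): List[(T, Option[V])] =
    pipe.map(t => (t, value))

  // cross: leftCross, then keep only the elements where the value exists.
  def cross[T, V](pipe: List[T], value: Option[V]): List[(T, V)] =
    leftCross(pipe, value).collect { case (t, Some(v)) => (t, v) }
}
```

When the value is absent, cross yields an empty pipe while leftCross keeps every element paired with None.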
Limit the output to at most count items. Useful for debugging, but probably that's about it. The number may be less than count, and the items are not sampled by any particular method.
Transform each element via the function f
Transform only the values (this sometimes requires giving the types explicitly, due to Scala type inference).
Common pattern of attaching a value and then mapping.
Used to force a shuffle into a given number of nodes. Only use this if your mappers are taking far longer than the time to shuffle.
Build a sketch of this TypedPipe so that you can do a skew-join with another Grouped
Reasonably common shortcut for cases of associative/commutative reduction. Returns a typed pipe with only one element.
Reasonably common shortcut for cases of associative/commutative reduction by Key
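Scalding performs these reductions with a Semigroup; this sketch substitutes an explicit plus function, which must be associative and commutative for a distributed result to be well defined. SumModel is a hypothetical name and the List is a stand-in for a pipe. For an empty input, this model's sum returns no element at all.

```scala
// Illustrative model only: a List stands in for a TypedPipe, and `plus`
// stands in for a Semigroup; it must be associative and commutative.
object SumModel {
  // sum: reduce the whole pipe to at most one element (none if it is empty).
  def sum[T](pipe: List[T])(plus: (T, T) => T): List[T] =
    pipe.reduceOption(plus).toList

  // sumByKey: reduce the values of each key independently.
  def sumByKey[K, V](pipe: List[(K, V)])(plus: (V, V) => V): Map[K, V] =
    pipe.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2).reduce(plus)) }
}
```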
This does a sum of values WITHOUT triggering a shuffle. The contract is: if followed by a group.sum, the result is the same with or without this present, and it never increases the number of items. BUT, due to the cost of caching, it might not be faster if there is poor key locality.
It is only useful for expert tuning, and best avoided unless you are struggling with performance problems. If you are not sure you need this, you probably don't.
The main use case is to reduce the values down before a key expansion such as is often done in a data cube.
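The contract can be checked with a small model in which each inner List stands for the input of one mapper. sumLocally combines values per key within a chunk only, and a global per-key sum then runs over the results. The names are hypothetical. The real operator caches partial sums, so (as the key-locality caveat suggests) it may combine only some duplicates; that still satisfies the contract, since any partial combining is harmless for an associative, commutative sum.

```scala
// Illustrative model only: each chunk stands for one mapper's input,
// and `plus` stands in for an associative, commutative Semigroup.
object LocalSumModel {
  // Sum values per key within a single chunk; no data crosses chunks.
  // The output never has more items than the chunk it came from.
  def sumLocally[K, V](chunk: List[(K, V)])(plus: (V, V) => V): List[(K, V)] =
    chunk.groupBy(_._1).toList.map { case (k, kvs) => (k, kvs.map(_._2).reduce(plus)) }

  // The later group.sum: a global per-key reduction.
  def sumByKey[K, V](pipe: List[(K, V)])(plus: (V, V) => V): Map[K, V] =
    pipe.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2).reduce(plus)) }
}
```

Summing the locally combined chunks gives the same per-key totals as summing the raw input directly, with fewer items reaching the shuffle.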
Swap the keys with the values.
This actually runs all the pure map functions in one Cascading Each. This approach is more efficient than untyped scalding because we don't use TupleConverters/Setters after each map.
Use a TupleUnpacker to flatten U out into a cascading Tuple.
Just keep the values, or ._2 (if this type is a Tuple2)
Safely write to a TypedSink[T]. If you want to write to a Source (not a Sink) you need to do something like: toPipe(fieldNames).write(dest)
a pipe equivalent to the current pipe.
This is an instance of a TypedPipe that wraps a cascading Pipe