Type T is the type of the input field (input to map: T => X). Type X is the intermediate type, which your reduce function operates on (reduce: (X, X) => X). Type U is the final result type (final map: X => U).
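As a sketch of this contract, a single-machine analogue of the three stages can be written in plain Scala (the helper below is illustrative, not the actual Scalding signature):

```scala
// Illustrative single-machine analogue of the T => X, (X, X) => X, X => U
// staging described above; not the real Scalding method signature.
def mapReduceMap[T, X, U](values: Seq[T])(mapFn: T => X)(reduceFn: (X, X) => X)(map2Fn: X => U): U =
  map2Fn(values.map(mapFn).reduce(reduceFn))

// Example: average of Ints with T = Int, X = (sum, count), U = Double
val avg = mapReduceMap(Seq(1, 2, 3, 4))(i => (i, 1)) {
  (a, b) => (a._1 + b._1, a._2 + b._2)
} { case (sum, cnt) => sum.toDouble / cnt }
// avg == 2.5
```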
The previous output goes into the reduce function on the left, like foldLeft, so be aware if your operation is faster with the accumulator on a particular side.
Assumed to be a commutative operation. If you don't want that, use .forceToReducers
Pretty much a synonym for mapReduceMap with the methods collected into a trait.
Approximate number of unique values. We use about m = (104/errPercent)^2 bytes of memory per key. Uses .toString.getBytes to serialize the data, so you MUST ensure that .toString is an equivalence on your counted fields (i.e. x.toString == y.toString if and only if x == y).
For each key:
10% error ~ 256 bytes
5% error ~ 1kB
2% error ~ 4kB
1% error ~ 16kB
0.5% error ~ 64kB
0.25% error ~ 256kB
Uses a more stable online algorithm which should be suitable for large numbers of records.
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
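The merge step from that article can be sketched in plain Scala as follows (the Moments name and field layout are illustrative assumptions, not Scalding's internal representation):

```scala
// Sketch of the parallel/online variance merge from the linked article:
// each partition carries (count, mean, M2), and two partitions combine
// with the standard pairwise update.
case class Moments(count: Long, mean: Double, m2: Double) {
  def variance: Double = m2 / count  // population variance
}

def merge(a: Moments, b: Moments): Moments = {
  val n = a.count + b.count
  val delta = b.mean - a.mean
  val mean = a.mean + delta * b.count / n
  val m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
  Moments(n, mean, m2)
}

def moments(xs: Seq[Double]): Moments =
  xs.map(x => Moments(1L, x, 0.0)).reduce(merge _)

val all = moments(Seq(1.0, 2.0, 3.0, 4.0))
// all.mean == 2.5, all.variance == 1.25
```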
This is count with a predicate: only counts the tuples for which fn(tuple) is true.
First do "times" on each pair, then "plus" them all together.
groupBy('x) { _.dot('y,'z, 'ydotz) }
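In plain Scala, the per-group computation amounts to the following (the dot helper below is illustrative, not the Scalding method itself):

```scala
// "times" each (y, z) pair, then "plus" the products together.
def dot(pairs: Seq[(Double, Double)]): Double =
  pairs.map { case (y, z) => y * z }.sum

val ydotz = dot(Seq((1.0, 2.0), (3.0, 4.0)))
// 1*2 + 3*4 == 14.0
```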
Return the first value; probably only useful in the sorted case.
Collect all the values into a List[T] and then operate on that list. This fundamentally uses as much memory as it takes to store the list. The list is in the reverse of the order it was encountered (it is built as a stack for efficiency reasons); if you care about order, call .reverse in your fn.
STRONGLY PREFER TO AVOID THIS. Try reduce or plus and an O(1) memory algorithm.
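The reversed ordering comes from the usual stack-building pattern, which a plain-Scala sketch makes concrete:

```scala
// Building a list by prepending (a stack) is O(1) per element,
// but yields elements in reverse encounter order.
val encountered = Seq(1, 2, 3)
val asStack = encountered.foldLeft(List.empty[Int])((acc, x) => x :: acc)
// asStack == List(3, 2, 1); call .reverse if order matters
```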
These will only be called if a tuple is not passed, meaning just one column.
Similar to scala.collection.Iterable.mkString. Takes the source and destination fieldnames, each of which should be a single field. The result will be start, then each item.toString separated by sep, followed by end. For convenience, there are several common variants below.
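The result string therefore matches Scala's own mkString, e.g.:

```scala
// start, then each item.toString separated by sep, then end
val joined = Seq(1, 2, 3).map(_.toString).mkString("[", ",", "]")
// joined == "[1,2,3]"
```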
Opposite of RichPipe.unpivot. See SQL/Excel for more on this function; it converts a row-wise representation into a column-wise one.
pivot(('feature, 'value) -> ('clicks, 'impressions, 'requests))
It will find the feature named "clicks", and put the value in the column with the field named 'clicks.
Absent fields result in null unless a default value is provided. Unnamed output fields are ignored.
Duplicated fields will result in an error.
If you want more precision, first do a
map('value -> 'value) { x : AnyRef => Option(x) }
and you will have non-nulls for all present values, and Nones for values that were present but previously null. All nulls in the final output will be those truly missing. Similarly, if you want to check if there are any items present that shouldn't be:
map('feature -> 'feature) { fname : String => if (!goodFeatures(fname)) { throw new Exception("ohnoes") } else fname }
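The Option trick works because Option(x) collapses a present-but-null value to None, while a truly missing field stays null after the pivot; a small demonstration:

```scala
// Option(x) distinguishes "present but null" from an actual value;
// after the pivot, only truly missing fields remain null.
val presentValue: AnyRef = "3.2"
val presentNull: AnyRef = null
val wrappedValue = Option(presentValue)  // Some("3.2")
val wrappedNull = Option(presentNull)    // None
```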
Apply an associative/commutative operation on the left field.
reduce(('mass, 'allids) -> ('totalMass, 'idset)) { (left: (Double, Set[Long]), right: (Double, Set[Long])) => (left._1 + right._1, left._2 ++ right._2) }
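Per key, this behaves like a pairwise reduce over (mass, id set) values; a plain-Scala sketch (the combine name is illustrative):

```scala
// Pairwise combination of (mass, id set) values within one key.
def combine(left: (Double, Set[Long]), right: (Double, Set[Long])): (Double, Set[Long]) =
  (left._1 + right._1, left._2 ++ right._2)

val (totalMass, idSet) =
  Seq((1.5, Set(1L)), (2.5, Set(2L, 3L))).reduce(combine _)
// totalMass == 4.0, idSet == Set(1L, 2L, 3L)
```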
Equivalent to a mapReduceMap with trivial (identity) map functions.
Assumed to be a commutative operation. If you don't want that, use .forceToReducers
The previous output goes into the reduce function on the left, like foldLeft, so be aware if your operation is faster with the accumulator on a particular side.
How many values are there for this key?
Compute the count, average and standard deviation in one pass. Example: g.sizeAveStdev('x -> ('cntx, 'avex, 'stdevx))
Equivalent to sorting by a comparison function then take-ing k items. This is MUCH more efficient than doing a total sort followed by a take, since these bounded sorts are done on the mapper, so only a sort of size k is needed.
sortWithTake(('clicks, 'tweet) -> 'topClicks, 5) { (t0: (Long, Long), t1: (Long, Long)) => t0._1 < t1._1 }
topClicks will be a List[(Long,Long)]
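The O(k) behavior can be sketched with a bounded priority queue (plain Scala; boundedTake is an illustrative helper, not the Scalding implementation):

```scala
import scala.collection.mutable

// Keep only the best k items seen so far: the queue's head is the
// current worst of the k, so memory stays O(k) regardless of input size.
def boundedTake[T](items: Seq[T], k: Int)(lt: (T, T) => Boolean): List[T] = {
  val q = mutable.PriorityQueue.empty[T](Ordering.fromLessThan(lt))
  items.foreach { x =>
    q.enqueue(x)
    if (q.size > k) q.dequeue()  // drop the current worst
  }
  q.dequeueAll.toList.reverse    // best-first order
}

val top2 = boundedTake(Seq(5, 1, 4, 2, 3), 2)(_ < _)
// top2 == List(1, 2)
```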
Reverse of above when the implicit ordering makes sense.
Same as above but useful when the implicit ordering makes sense.
The same as sum(fs -> fs)
Assumed to be a commutative operation. If you don't want that, use .forceToReducers
Use Semigroup.plus
to compute a sum. Not called sum to avoid conflicting with standard sum
Your Semigroup[T] should be associative and commutative, else this doesn't make sense.
Assumed to be a commutative operation. If you don't want that, use .forceToReducers
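A minimal sketch of the Semigroup idea (modeled loosely on Algebird, which Scalding uses; the trait and helper names here are illustrative):

```scala
// A Semigroup supplies an associative plus; for this use it should
// be commutative as well.
trait Semigroup[T] { def plus(l: T, r: T): T }

implicit val intSemigroup: Semigroup[Int] =
  new Semigroup[Int] { def plus(l: Int, r: Int) = l + r }

// Sum a non-empty sequence using only Semigroup.plus.
def sumByPlus[T](xs: Seq[T])(implicit sg: Semigroup[T]): T =
  xs.reduce(sg.plus _)

val total = sumByPlus(Seq(1, 2, 3, 4))
// total == 10
```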
The same as times(fs -> fs)
Returns the product of all the items in this grouping
Convert a subset of fields into a list of Tuples. Need to provide the types of the tuple fields.
Implements reductions on top of a simple abstraction for the Fields API. This is for associative and commutative operations (Monoids and Semigroups in particular play a big role here).
We use the f-bounded polymorphism trick to return the type called Self in each operation.
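A minimal illustration of the trick (Ops and Builder are hypothetical names, not Scalding's):

```scala
// F-bounded polymorphism: each operation returns Self, so chained
// calls keep the concrete subtype instead of widening to the trait.
trait Ops[Self <: Ops[Self]] {
  def emit(s: String): Self
}

class Builder(val log: List[String]) extends Ops[Builder] {
  def emit(s: String): Builder = new Builder(log :+ s)
}

val b = new Builder(Nil).emit("a").emit("b")  // still a Builder
// b.log == List("a", "b")
```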