Builder classes used internally to implement coGroups (joins).
Csv value source: fields are separated by commas, and all fields are wrapped in quotes.
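As a hedged sketch of using such a source (the paths, field names, and the age filter are all hypothetical):

```scala
import com.twitter.scalding._

// Hypothetical job: read a quoted two-column CSV, keep rows with
// age >= 18, and write the result back out as CSV.
class CsvFilterJob(args: Args) extends Job(args) {
  Csv(args("input"), fields = ('name, 'age))
    .read
    .filter('age) { age: Int => age >= 18 }
    .write(Csv(args("output")))
}
```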
Sets up an implicit dateRange to use in your sources and an implicit timezone.
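A minimal sketch of building such a range by hand, assuming a Scalding version where RichDate parses date strings; the dates are illustrative:

```scala
import com.twitter.scalding._

object DateRangeExample {
  // An implicit TimeZone must be in scope to parse date strings.
  implicit val tz: java.util.TimeZone = DateOps.UTC
  // One week, inclusive of both endpoints.
  val week: DateRange = DateRange(RichDate("2014-01-01"), RichDate("2014-01-07"))
}
```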
Mix this in for delimited schemes such as TSV or one-separated values. By default, TSV is used.
This is a base class for file-based sources.
This handles the mapReduceMap work on the map-side of the operation.
Implements reductions on top of a simple abstraction for the Fields API. We use the F-bounded polymorphism trick so that each operation returns the type Self.
This controls the sequence of reductions that happen inside a particular grouping operation.
Represents a grouping, which is the transition from the map phase to the reduce phase in Hadoop.
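The classic word count shows this transition: everything before groupBy runs map-side, and the size count runs reduce-side. A sketch (paths and field names are hypothetical):

```scala
import com.twitter.scalding._

class WordCountJob(args: Args) extends Job(args) {
  TextLine(args("input"))
    .flatMap('line -> 'word) { line: String => line.split("""\s+""") }
    .groupBy('word) { _.size }   // map phase ends here; reduce begins
    .write(Tsv(args("output")))
}
```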
Thrown when validateTaps fails.
Allows an iterable defined in the job (on the submitter) to be used within a Job as you would a Pipe/RichPipe.
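A sketch, assuming a small in-memory list (the field name 'n and the output path are hypothetical):

```scala
import com.twitter.scalding._

// The List lives on the submitter but behaves like any other pipe.
class SquaresJob(args: Args) extends Job(args) {
  IterableSource(List(1, 2, 3, 4), 'n)
    .map('n -> 'square) { n: Int => n * n }
    .write(Tsv(args("output")))
}
```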
This class is used to construct unit tests for scalding jobs.
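A sketch of testing the hypothetical WordCountJob above: sources and sinks are replaced with in-memory buffers, so no cluster is needed:

```scala
import com.twitter.scalding._

object WordCountJobTest {
  def main(unused: Array[String]): Unit =
    JobTest(new WordCountJob(_))
      .arg("input", "inFile")
      .arg("output", "outFile")
      .source(TextLine("inFile"), List((0, "hello world hello")))
      .sink[(String, Long)](Tsv("outFile")) { buf =>
        assert(buf.toSet == Set(("hello", 2L), ("world", 1L)))
      }
      .run
      .finish
}
```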
This Source writes out the TupleEntry as a simple JSON object, using the field names as keys and the string representation of the values.
Represents sharded lists of items of type T.
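This is the core abstraction of the typed API; a sketch in the style of the Scalding README's word count (paths are hypothetical):

```scala
import com.twitter.scalding._

class TypedWordCountJob(args: Args) extends Job(args) {
  TypedPipe.from(TextLine(args("input")))
    .flatMap(_.split("""\s+"""))
    .groupBy(identity)   // Grouped[String, String]
    .size                // one (word, count) pair per word
    .write(TypedTsv[(String, Long)](args("output")))
}
```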
The MapReduceMapBy class.
Usually as soon as we open a source, we read and do some mapping operation on a single column or set of columns.
An implementation of map-side combining which is appropriate for associative and commutative functions. If a cacheSize is given, it is used; otherwise we query the config for Cascading's equivalent threshold setting.
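The typed API relies on this when summing; as a hedged sketch, sumByKey over (key, count) pairs can pre-aggregate map-side because Long addition is associative and commutative (the sources here are hypothetical):

```scala
import com.twitter.scalding._

class SumCountsJob(args: Args) extends Job(args) {
  TypedPipe.from(TypedTsv[(String, Long)](args("input")))
    .sumByKey   // partial sums are combined map-side before the shuffle
    .write(TypedTsv[(String, Long)](args("output")))
}
```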
There are three ways to run jobs; sourceStrictness is set to true.
Delimited-file source that allows overriding the separator and quotation characters and the header configuration.
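A sketch of those overrides via Csv's constructor parameters (the path and separator are hypothetical):

```scala
import com.twitter.scalding._

object DelimitedOverrides {
  // A pipe-separated file whose first line is a header row.
  val pipeSeparated = Csv("events.psv", separator = "|", skipHeader = true)
}
```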
A tap that outputs nothing.
This just blindly uses the first public constructor with the same arity as the fields size.
One-separated value source (commonly used by Pig).
Implements reductions on top of a simple abstraction for the Fields API. This is for associative and commutative operations (Monoids in particular play a big role here).
Packs a tuple into any object with set methods, e.g. a JavaBean-style object.
Scala 2.
Represents a strategy for replicating rows when performing skewed joins.
See https://github.
Every source must have a correct toString method.
A simple trait for a releasable resource.
Ensures that a _SUCCESS file is present in the Source path.
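A hedged sketch of mixing this into a custom TSV-like source, assuming the FixedPathSource/DelimitedScheme base types from scalding-core (the class name is hypothetical):

```scala
import com.twitter.scalding._

// Reads succeed only when the input path also contains Hadoop's
// _SUCCESS marker file.
class GuardedTsv(p: String) extends FixedPathSource(p)
  with DelimitedScheme with SuccessFileSource
```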
Memory-only testing for unit tests.
The fields here are ('offset, 'line).
This will automatically produce a globbed version of the given path.
Tab-separated value source.
Mixed in to both TupleConverter and TupleSetter to improve arity safety of Cascading jobs before we run anything on Hadoop.
Represents a phase in a distributed computation on an input data source. Wraps a Cascading Pipe object and holds the transformations done up to that point.
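A sketch of the kind of transformation chain it accumulates (field names and paths are hypothetical):

```scala
import com.twitter.scalding._

class CleanupJob(args: Args) extends Job(args) {
  Tsv(args("input"), ('id, 'name, 'score))
    .read
    .filter('score) { s: Double => s > 0.0 } // drop non-positive scores
    .rename('name -> 'user)
    .project('id, 'user, 'score)
    .write(Tsv(args("output")))
}
```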
Provides handlers and mappings for exceptions.
This object has all the implicit functions and values that are used to make the Scalding DSL.
TODO: at the next binary-incompatible version, remove the AbstractFunction2/scala.
A source that outputs nothing.
A helper for working with class reflection.
Provides an apply method for creating XHandlers with default or custom settings, and contains messages and mappings.
Implicits for the type-safe DSL; import TDsl._ to bring them into scope.
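A sketch of crossing from the fields API into the typed API with these implicits (field names and types are hypothetical):

```scala
import com.twitter.scalding._
import TDsl._

class ToTypedJob(args: Args) extends Job(args) {
  Tsv(args("input"), ('user, 'score))
    .read
    .toTypedPipe[(String, Double)](('user, 'score))
    .filter { case (_, score) => score > 0.5 }
    .write(TypedTsv[(String, Double)](args("output")))
}
```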
Base class for classes which pack a Tuple into a serializable object.
Base class for objects which unpack an object into a tuple.
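A sketch of both directions using the reflection-based packer and unpacker; the Person bean and the field names are hypothetical:

```scala
import com.twitter.scalding._

// Hypothetical bean: reflection discovers the set*/get* methods.
class Person {
  private var name: String = ""
  private var age: Int = 0
  def setName(n: String): Unit = { name = n }
  def getName: String = name
  def setAge(a: Int): Unit = { age = a }
  def getAge: Int = age
}

class PackUnpackJob(args: Args) extends Job(args) {
  Tsv(args("input"), ('name, 'age))
    .read
    .pack[Person](('name, 'age) -> 'person)   // fields -> object
    .unpack[Person]('person -> ('name, 'age)) // object -> fields
    .write(Tsv(args("output")))
}
```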
Factory methods for TypedPipe.
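Two common factories, sketched (the path is hypothetical; a recent Scalding version is assumed):

```scala
import com.twitter.scalding._

object TypedPipeFactories {
  val nums: TypedPipe[Int]     = TypedPipe.from(List(1, 2, 3))
  val lines: TypedPipe[String] = TypedPipe.from(TextLine("input.txt"))
}
```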
Allows you to set the types explicitly; prefer this. If T is a subclass of Product, we assume it is a tuple.
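For instance (paths hypothetical; the Tuple1 wrapper follows the convention just described for non-tuple column types):

```scala
import com.twitter.scalding._

object TypedSources {
  // Two columns: a String and an Int.
  val pairs = TypedTsv[(String, Int)]("pairs.tsv")
  // A single non-tuple column, wrapped in Tuple1.
  val names = TypedTsv[Tuple1[String]]("names.tsv")
}
```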
(Since version 0.8.3) Use Ordering.fromLessThan, duh.