



package scalding

Linear Supertypes
AnyRef, Any

Type Members

  1. sealed abstract class AccessMode extends AnyRef

  2. class AdaptiveMapsideCache[K, V] extends MapsideCache[K, V]

  3. trait ArgHelper extends AnyRef

  4. trait BaseNullSource extends Source

  5. case class BooleanArg(key: String, description: String) extends DescribedArg with Product with Serializable

  6. class BufferOp[I, T, X] extends BaseOperation[Any] with Buffer[Any] with ScaldingPrepare[Any]

  7. abstract class CascadeJob extends Job

  8. class CascadeTest extends JobTest

  9. trait CascadingLocal extends Mode

  10. trait CaseClassPackers extends LowPriorityTuplePackers

  11. class CleanupIdentityFunction extends BaseOperation[Any] with Function[Any] with ScaldingPrepare[Any]

  12. class CoGroupBuilder extends GroupBuilder


    Builder classes used internally to implement coGroups (joins). Can also be used for more generalized joins, e.g., star joins.

  13. class CollectFunction[S, T] extends BaseOperation[Any] with Function[Any] with ScaldingPrepare[Any]

  14. trait Config extends Serializable


    This is a wrapper class on top of Map[String, String]

  15. trait CounterVerification extends Job


    Allows custom counter verification logic when the job completes.

  16. case class Csv(p: String, separator: String = ",", fields: Fields = Fields.ALL, skipHeader: Boolean = false, writeHeader: Boolean = false, quote: String = "\"", sinkMode: SinkMode = SinkMode.REPLACE) extends FixedPathSource with DelimitedScheme with Product with Serializable


    CSV source: values are separated by commas, and quotes wrap all fields.

  17. trait DefaultDateRangeJob extends Job


    Sets up an implicit dateRange to use in your sources and an implicit timezone. Example args: --date 2011-10-02 2011-10-04 --tz UTC. If no timezone is given, Pacific is assumed.

  18. trait DelimitedScheme extends SchemedSource


    Mix this in for delimited schemes such as TSV or one-separated values. By default, TSV is given.

  19. sealed trait DescribedArg extends AnyRef

  20. class DescriptionValidationException extends RuntimeException

  21. sealed trait Execution[+T] extends Serializable


    Execution[T] represents a computation that can be run and will produce a value T and keep track of counters incremented inside of TypedPipes using a Stat.

    Execution[T] is the recommended way to compose multistep computations that involve branching (if/then), intermediate calls to remote services, file operations, or looping (e.g. testing for convergence).

    Library functions are encouraged to implement functions from TypedPipes or ValuePipes to Execution[R] for some result R. Refrain from calling run in library code. Let the caller of your library call run.

    Note this is a Monad, meaning flatMap composes in series as you expect. It is also an applicative functor, which means zip (called join in some libraries) composes two Executions in parallel. Prefer zip to flatMap if you want to run two Executions in parallel.
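
    The zip versus flatMap distinction can be sketched as follows. This is a minimal illustration, assuming the Scalding library is on the classpath; the object name ExecutionSketch and the toy data are hypothetical and not part of the documented API.

```scala
import com.twitter.scalding.{Config, Execution, Local}
import com.twitter.scalding.typed.TypedPipe
import scala.util.Try

// Hypothetical example object; only Execution, TypedPipe, Config and Local
// come from Scalding itself.
object ExecutionSketch {
  // Two independent computations over small in-memory pipes.
  val evens: Execution[Iterable[Int]] =
    TypedPipe.from(1 to 10).filter(_ % 2 == 0).toIterableExecution
  val odds: Execution[Iterable[Int]] =
    TypedPipe.from(1 to 10).filter(_ % 2 == 1).toIterableExecution

  // zip: neither side depends on the other, so both may run in parallel.
  val both: Execution[(Iterable[Int], Iterable[Int])] = evens.zip(odds)

  // flatMap: the second step needs the first step's result, so it runs in series.
  val total: Execution[Int] =
    both.flatMap { case (e, o) => Execution.from(e.sum + o.sum) }

  // Library code would stop at `total`; only the caller decides to run it.
  def runLocally(): Try[Int] = total.waitFor(Config.default, Local(true))
}
```

    Nothing is executed until runLocally() is called, which matches the advice above to let the caller of a library call run.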

  22. trait ExecutionApp extends Serializable

  23. trait ExecutionContext extends AnyRef

  24. trait ExecutionCounters extends AnyRef


    This represents the counters portion of the JobStats that are returned. Counters are just a vector of longs with counter name, group keys.

  25. abstract class ExecutionJob[+T] extends Job


    This is a simple job that allows you to launch Execution[T] instances using scalding.Tool and scald.rb. You cannot print the graph.

  26. sealed trait Field[T] extends Serializable

  27. trait FieldConversions extends LowPriorityFieldConversions

  28. abstract class FileSource extends SchemedSource with LocalSourceOverride with HfsTapProvider


    This is a base class for File-based sources

  29. class FilterFunction[T] extends BaseOperation[Any] with Filter[Any] with ScaldingPrepare[Any]

  30. abstract class FixedPathSource extends FileSource

  31. class FlatMapFunction[S, T] extends BaseOperation[Any] with Function[Any] with ScaldingPrepare[Any]

  32. case class FlowState(sourceMap: Map[String, Source] = Map.empty, flowConfigUpdates: Set[(String, String)] = Set()) extends Product with Serializable


    Immutable state that we attach to the Flow using the FlowStateMap

  33. class FoldAggregator[T, X] extends BaseOperation[X] with Aggregator[X] with ScaldingPrepare[X]

  34. abstract class FoldFunctor[X] extends Functor


    This handles the mapReduceMap work on the map-side of the operation. The code below attempts to be optimal with respect to memory allocations and performance, not functional style purity.

  35. trait FoldOperations[+Self <: FoldOperations[Self]] extends ReduceOperations[Self] with Sortable[Self]


    Implements reductions on top of a simple abstraction for the Fields API. We use the f-bounded polymorphism trick to return the type called Self in each operation.
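
    The f-bounded polymorphism trick can be illustrated with a small standalone sketch. The names Ops, Builder, step and describe are hypothetical and are not Scalding's; only the shape of the type parameter mirrors the declaration above.

```scala
// Hypothetical miniature of the pattern; these names are not Scalding's.
trait Ops[+Self <: Ops[Self]] {
  // Each operation returns the concrete builder type, not the base trait.
  def step(name: String): Self
}

final case class Builder(steps: List[String]) extends Ops[Builder] {
  def step(name: String): Builder = Builder(steps :+ name)
  // A method that exists only on the concrete type.
  def describe: String = steps.mkString(" -> ")
}

object FBoundedDemo {
  // Chaining keeps the static type Builder, so describe is still available.
  val plan: String = Builder(Nil).step("sum").step("max").describe
}
```

    Without the Self parameter, step would have to return the base trait, and the call to describe at the end of the chain would not compile.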

  36. class FutureCache[-K, V] extends AnyRef


    This is a map for values that are produced in futures as is common in Execution

  37. trait GeneratedTupleAdders extends AnyRef

  38. trait GeneratedTupleConverters extends LowPriorityTupleConverters

  39. trait GeneratedTupleSetters extends LowPriorityTupleSetters

  40. class GroupBuilder extends FoldOperations[GroupBuilder] with StreamOperations[GroupBuilder]


    This controls the sequence of reductions that happen inside a particular grouping operation. Not all elements can be combined, for instance, a scanLeft/foldLeft generally requires a sorting but such sorts are (at least for now) incompatible with doing a combine which includes some map-side reductions.

  41. type Grouped[K, +V] = scalding.typed.Grouped[K, V]

  42. case class HadoopArgs(toArray: Array[String]) extends Product with Serializable

  43. trait HadoopMode extends Mode

  44. case class HadoopTest(conf: Configuration, buffers: (Source) ⇒ Option[Buffer[Tuple]]) extends HadoopMode with TestMode with Product with Serializable

  45. case class Hdfs(strict: Boolean, conf: Configuration) extends HadoopMode with Product with Serializable

  46. class HelpException extends RuntimeException

  47. trait HfsConfPropertySetter extends HfsTapProvider

  48. trait HfsTapProvider extends AnyRef

  49. case class IntField[T](id: Integer)(implicit ord: Ordering[T], mf: Option[Manifest[T]]) extends Field[T] with Product with Serializable

  50. class IntegralComparator extends Comparator[AnyRef] with Hasher[AnyRef] with Serializable

  51. class InvalidJoinModeException extends Exception

  52. class InvalidSourceException extends RuntimeException


    thrown when validateTaps fails

  53. class InvalidSourceTap extends SourceTap[JobConf, RecordReader[_, _]]


    InvalidSourceTap used in createTap method when we want to defer the failures to validateTaps method.

    This is used because for Job classes, createTap method on sources is called when the class is initialized. In most cases though, we want any exceptions to be thrown by validateTaps method, which is called subsequently during flow planning.

    hdfsPaths represents user-supplied list that was detected as not containing any valid paths.

  54. case class IterableSource[+T](iter: Iterable[T], inFields: Fields = Fields.NONE)(implicit set: TupleSetter[T], conv: TupleConverter[T]) extends Source with Mappable[T] with Product with Serializable


    Allows working with an iterable object defined in the job (on the submitter) to be used within a Job as you would a Pipe/RichPipe

    These lists should probably be very tiny by Hadoop standards. If they are getting large, you should probably dump them to HDFS and use the normal mechanisms to address the data (a FileSource).

  55. class Job extends FieldConversions with Serializable


    Job is a convenience class to make using Scalding easier. Subclasses of Job automatically have a number of nice implicits to enable more concise syntax, including: conversion from Pipe, Source or Iterable to RichPipe; conversion from Source or Iterable to Pipe; and conversion from collections or Tuple[1-22] to cascading.tuple.Fields.

    Additionally, the job provides an implicit Mode and FlowDef so that functions that register starts or ends of a flow graph, specifically anything that reads or writes data on Hadoop, has the needed implicits available.

    If you want to write code outside of a Job, you will want to either:

    1. Make all methods that may read or write data accept implicit FlowDef and Mode parameters.

    2. Write code that, rather than returning values, returns a (FlowDef, Mode) => T. These functions can be combined monadically using algebird.monad.Reader.

  56. case class JobStats(toMap: Map[String, Any]) extends Product with Serializable

  57. class JobTest extends AnyRef


    This class is used to construct unit tests for scalding jobs. You should not use it unless you are writing tests. For examples of how to do that, see the tests included in the main scalding repository.

  58. trait JoinAlgorithms extends AnyRef

  59. sealed abstract class JoinMode extends AnyRef

  60. type KeyedList[K, +V] = scalding.typed.KeyedList[K, V]

  61. case class ListArg(key: String, description: String) extends DescribedArg with Product with Serializable

  62. case class Local(strictSources: Boolean) extends CascadingLocal with Product with Serializable

  63. trait LocalSourceOverride extends SchemedSource


    A trait which provides a method to create a local tap.

  64. trait LocalTapSource extends SchemedSource with LocalSourceOverride


    Use this class to add support for Cascading local mode via the Hadoop tap. Put another way, this runs a Hadoop tap outside of Hadoop in the Cascading local mode.

  65. trait LowPriorityFieldConversions extends AnyRef

  66. trait LowPriorityTupleConverters extends Serializable

  67. trait LowPriorityTupleGetter extends Serializable

  68. trait LowPriorityTuplePackers extends Serializable

  69. trait LowPriorityTupleSetters extends Serializable

  70. trait LowPriorityTupleUnpackers extends AnyRef

  71. class MRMAggregator[T, X, U] extends BaseOperation[Tuple] with Aggregator[Tuple] with ScaldingPrepare[Tuple]

  72. class MRMBy[T, X, U] extends AggregateBy


    MapReduceMapBy Class

  73. class MRMFunctor[T, X] extends FoldFunctor[X]


    This handles the mapReduceMap work on the map-side of the operation. The code below attempts to be optimal with respect to memory allocations and performance, not functional style purity.

  74. class MapFunction[S, T] extends BaseOperation[Any] with Function[Any] with ScaldingPrepare[Any]

  75. trait Mappable[+T] extends Source with TypedSource[T]


    Usually as soon as we open a source, we read and do some mapping operation on a single column or set of columns. T is the type of the single column. If doing multiple columns, T will be a TupleN representing the types, e.g. (Int,Long,String).

    Prefer to use TypedSource unless you are working with the fields API

    NOTE: If we don't make this extend Source, established implicits are ambiguous when TDsl is in scope.

  76. trait Mappable1[A] extends Source with Mappable[(A)]

  77. trait Mappable10[A, B, C, D, E, F, G, H, I, J] extends Source with Mappable[(A, B, C, D, E, F, G, H, I, J)]

  78. trait Mappable11[A, B, C, D, E, F, G, H, I, J, K] extends Source with Mappable[(A, B, C, D, E, F, G, H, I, J, K)]

  79. trait Mappable12[A, B, C, D, E, F, G, H, I, J, K, L] extends Source with Mappable[(A, B, C, D, E, F, G, H, I, J, K, L)]

  80. trait Mappable13[A, B, C, D, E, F, G, H, I, J, K, L, M] extends Source with Mappable[(A, B, C, D, E, F, G, H, I, J, K, L, M)]

  81. trait Mappable14[A, B, C, D, E, F, G, H, I, J, K, L, M, N] extends Source with Mappable[(A, B, C, D, E, F, G, H, I, J, K, L, M, N)]

  82. trait Mappable15[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O] extends Source with Mappable[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O)]

  83. trait Mappable16[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P] extends Source with Mappable[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P)]

  84. trait Mappable17[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q] extends Source with Mappable[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q)]

  85. trait Mappable18[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R] extends Source with Mappable[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R)]

  86. trait Mappable19[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S] extends Source with Mappable[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S)]

  87. trait Mappable2[A, B] extends Source with Mappable[(A, B)]

  88. trait Mappable20[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T] extends Source with Mappable[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T)]

  89. trait Mappable21[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U] extends Source with Mappable[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U)]

  90. trait Mappable22[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V] extends Source with Mappable[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V)]

  91. trait Mappable3[A, B, C] extends Source with Mappable[(A, B, C)]

  92. trait Mappable4[A, B, C, D] extends Source with Mappable[(A, B, C, D)]

  93. trait Mappable5[A, B, C, D, E] extends Source with Mappable[(A, B, C, D, E)]

  94. trait Mappable6[A, B, C, D, E, F] extends Source with Mappable[(A, B, C, D, E, F)]

  95. trait Mappable7[A, B, C, D, E, F, G] extends Source with Mappable[(A, B, C, D, E, F, G)]

  96. trait Mappable8[A, B, C, D, E, F, G, H] extends Source with Mappable[(A, B, C, D, E, F, G, H)]

  97. trait Mappable9[A, B, C, D, E, F, G, H, I] extends Source with Mappable[(A, B, C, D, E, F, G, H, I)]

  98. sealed trait MapsideCache[K, V] extends AnyRef

  99. class MapsideReduce[V] extends BaseOperation[MapsideCache[Tuple, V]] with Function[MapsideCache[Tuple, V]] with ScaldingPrepare[MapsideCache[Tuple, V]]

  100. class MemoryTap[In, Out] extends Tap[Properties, In, Out]

  101. class MemoryTupleEntryCollector extends TupleEntryCollector

  102. trait Mode extends Serializable

  103. case class ModeException(message: String) extends RuntimeException with Product with Serializable

  104. case class ModeLoadException(message: String, origin: ClassNotFoundException) extends RuntimeException with Product with Serializable

  105. abstract class MostRecentGoodSource extends TimePathedSource

  106. case class MultipleDelimitedFiles(f: Fields, separator: String, quote: String, skipHeader: Boolean, writeHeader: Boolean, p: String*) extends FixedPathSource with DelimitedScheme with Product with Serializable


    Delimited files source that allows overriding the separator and quotation characters and the header configuration.

  107. case class MultipleSequenceFiles(p: String*) extends FixedPathSource with SequenceFileScheme with LocalTapSource with Product with Serializable

  108. case class MultipleTextLineFiles(p: String*) extends FixedPathSource with TextLineScheme with Product with Serializable

  109. case class MultipleTsvFiles(p: Seq[String], fields: Fields = Fields.ALL, skipHeader: Boolean = false, writeHeader: Boolean = false) extends FixedPathSource with DelimitedScheme with Product with Serializable


    Allows the use of multiple Tsv input paths. The Tsv files will be processed through your flow as if they are a single pipe. Tsv files must have the same schema. For more details on how multiple files are handled, check the cascading docs.

  110. case class MultipleWritableSequenceFiles[K <: Writable, V <: Writable](p: Seq[String], f: Fields)(implicit evidence$7: Manifest[K], evidence$8: Manifest[V]) extends FixedPathSource with WritableSequenceFileScheme with LocalTapSource with TypedSource[(K, V)] with Product with Serializable


    This is only a TypedSource as sinking into multiple directories is not well defined

  111. class NamedPoolThreadFactory extends ThreadFactory

  112. case class NonHadoopArgs(toArray: Array[String]) extends Product with Serializable

  113. class NullTap[Config, Input, Output, SourceContext, SinkContext] extends SinkTap[Config, Output]


    A tap that outputs nothing. It is used to drive execution of a task for side effects only. This can be used to drive a pipe without actually writing to HDFS.

  114. class OffsetTextLine extends FixedPathSource with Mappable[(Long, String)] with TextSourceScheme


    Alternate typed TextLine source that keeps both 'offset and 'line fields.

  115. case class OptionalArg(key: String, description: String) extends DescribedArg with Product with Serializable

  116. case class OptionalSource[T](src: Mappable[T]) extends Source with Mappable[T] with Product with Serializable

  117. class OrderedConstructorConverter[T] extends TupleConverter[T]

  118. class OrderedTuplePacker[T] extends TuplePacker[T]


    This just blindly uses the first public constructor with the same arity as the fields size

  119. case class Osv(p: String, f: Fields = Fields.ALL, sinkMode: SinkMode = SinkMode.REPLACE) extends FixedPathSource with DelimitedScheme with Product with Serializable


    One separated value (commonly used by Pig)

  120. abstract class PartitionSource extends SchemedSource with HfsTapProvider


    This is a base class for partition-based output sources

  121. case class PartitionedSequenceFile(basePath: String, partition: Partition, sequenceFields: Fields, sinkMode: SinkMode) extends PartitionSource with SequenceFileScheme with Product with Serializable


    An implementation of SequenceFile output, split over a partition tap.

    basePath: The root path for the output.

    partition: The partitioning strategy to use.

    sequenceFields: The set of fields to use for the sequence file.

    sinkMode: How to handle conflicts with existing output.

  122. case class PartitionedTsv(basePath: String, partition: Partition, writeHeader: Boolean, tsvFields: Fields, sinkMode: SinkMode) extends PartitionSource with DelimitedScheme with Product with Serializable


    An implementation of TSV output, split over a partition tap.

    basePath: The root path for the output.

    partition: The partitioning strategy to use.

    writeHeader: Flag to indicate that the header should be written to the file.

    sinkMode: How to handle conflicts with existing output.

  123. case class PipeDebug(output: Output = Output.STDERR, prefix: String = null, printFieldsEvery: Option[Int] = None, printTuplesEvery: Int = 1) extends Product with Serializable


    This is a builder for Cascading's Debug object. The default instance is the same default as cascading's new Debug(). This is based on work by:

  124. trait ReduceOperations[+Self <: ReduceOperations[Self]] extends Serializable


    Implements reductions on top of a simple abstraction for the Fields API. This is for associative and commutative operations (particularly Monoids and Semigroups play a big role here).

    We use the f-bounded polymorphism trick to return the type called Self in each operation.
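
    Why associativity and commutativity matter here can be shown with a standalone sketch in plain Scala. It does not use Scalding; the partitioning into groups of three and the name CombineSketch are assumptions made only to imitate partial reduction on mappers.

```scala
object CombineSketch {
  val values: List[Long] = List(3L, 1L, 4L, 1L, 5L, 9L, 2L, 6L)

  // Reducing everything in one place, as a single reducer would.
  val direct: Long = values.reduce(_ + _)

  // Pretend each group of three lives on a different mapper: reduce each
  // partition first, then merge the partial results afterwards.
  val partials: List[Long] = values.grouped(3).map(_.reduce(_ + _)).toList

  // Partial results may arrive in any order; reversing imitates that.
  val combined: Long = partials.reverse.reduce(_ + _)
}
```

    Because addition is associative and commutative, direct and combined agree regardless of how the values are partitioned or in which order the partial results are merged. An operation without those properties (for example, string concatenation under reordering) would not be safe to reduce this way.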

  125. class ReflectionSetter[T] extends TupleSetter[T]

  126. class ReflectionTupleConverter[T] extends TupleConverter[T]

  127. class ReflectionTuplePacker[T] extends TuplePacker[T]


    Packs a tuple into any object with set methods, e.g. thrift or proto objects. TODO: verify that protobuf setters for field camel_name are of the form setCamelName. In that case this code works for proto.

  128. class ReflectionTupleUnpacker[T] extends TupleUnpacker[T]

  129. case class RequiredArg(key: String, description: String) extends DescribedArg with Product with Serializable

  130. case class RichFields(toFieldList: List[Field[_]]) extends Fields with Product with Serializable

  131. class RichFlowDef extends AnyRef


    This is an enrichment-pattern class for cascading.flow.FlowDef. The rule is to never use this class directly in input or return types, but only to add methods to FlowDef.

  132. class RichPathFilter extends AnyRef

  133. class RichPipe extends Serializable with JoinAlgorithms


    This is an enrichment-pattern class for cascading.pipe.Pipe. The rule is to never use this class directly in input or return types, but only to add methods to Pipe.

  134. class SampleWithReplacement extends BaseOperation[Poisson] with Function[Poisson] with ScaldingPrepare[Poisson]

  135. class ScaldingMultiSourceTap extends MultiSourceTap[Tap[JobConf, RecordReader[_, _], OutputCollector[_, _]], JobConf, RecordReader[_, _]]

  136. trait ScaldingPrepare[C] extends Operation[C]

  137. class ScanLeftIterator[T, U] extends Iterator[U] with Serializable


    Scala 2.8 Iterators don't support scanLeft, so we have to reimplement it. The Scala 2.9 implementation creates an off-by-one bug with the unused fields in the Fields API.
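
    For readers unfamiliar with scanLeft semantics, here is a minimal standalone sketch of such an iterator. It is not the library's implementation; the class name ScanLeftSketch and its constructor parameters are hypothetical.

```scala
// Hypothetical re-implementation for illustration only.
final class ScanLeftSketch[T, U](it: Iterator[T], init: U, fn: (U, T) => U)
    extends Iterator[U] {
  // The next value to emit; None once the underlying iterator is exhausted.
  private var pending: Option[U] = Some(init)

  def hasNext: Boolean = pending.isDefined

  def next(): U = {
    val out = pending.getOrElse(throw new NoSuchElementException("next on empty iterator"))
    // Compute the following accumulator lazily, one input element at a time.
    pending = if (it.hasNext) Some(fn(out, it.next())) else None
    out
  }
}

object ScanLeftDemo {
  // Running sums: the initial value is emitted first, then each accumulation.
  val sums: List[Int] =
    new ScanLeftSketch(Iterator(1, 2, 3), 0, (acc: Int, x: Int) => acc + x).toList
}
```

    The result always has one more element than the input, because the initial value is emitted before any input is consumed.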

  138. abstract class SchemedSource extends Source


    A base class for sources that take a scheme trait.

  139. class ScriptJob extends Job

  140. case class SequenceFile(p: String, f: Fields = Fields.ALL, sinkMode: SinkMode = SinkMode.REPLACE) extends FixedPathSource with SequenceFileScheme with LocalTapSource with Product with Serializable

  141. trait SequenceFileScheme extends SchemedSource

  142. abstract class SideEffectBaseOperation[C] extends BaseOperation[C] with ScaldingPrepare[C]

  143. class SideEffectBufferOp[I, T, C, X] extends SideEffectBaseOperation[C] with Buffer[C]

  144. class SideEffectFlatMapFunction[S, C, T] extends SideEffectBaseOperation[C] with Function[C]

  145. class SideEffectMapFunction[S, C, T] extends SideEffectBaseOperation[C] with Function[C]

  146. trait SingleMappable[T] extends Source with Mappable[T]


    Mappable extension that defines the proper converter implementation for a Mappable with a single item.

  147. sealed abstract class SkewReplication extends AnyRef


    Represents a strategy for replicating rows when performing skewed joins.

  148. case class SkewReplicationA(replicationFactor: Int = 1) extends SkewReplication with Product with Serializable



  149. case class SkewReplicationB(maxKeysInMemory: Int = 1E6.toInt, maxReducerOutput: Int = 1E7.toInt) extends SkewReplication with Product with Serializable



  150. trait Sortable[+Self] extends AnyRef

  151. abstract class Source extends Serializable


    Every source must have a correct toString method. If you use case classes for instances of sources, you will get this for free. This is one of the several reasons we recommend using case classes. Serializable is needed if the Source is going to have any methods attached that run on mappers or reducers, which will happen if you implement transformForRead or transformForWrite.

  152. trait Stat extends Serializable

  153. case class StatKey(counter: String, group: String) extends Serializable with Product with Serializable

  154. trait Stateful extends AnyRef


    A simple trait for a releasable resource. Provides a no-op implementation.

  155. class StatsFlowListener extends FlowListener


    FlowListener that checks counter values against a function.

  156. trait StreamOperations[+Self <: StreamOperations[Self]] extends Sortable[Self] with Serializable


    Implements reductions on top of a simple abstraction for the Fields API. We use the f-bounded polymorphism trick to return the type called Self in each operation.

  157. case class StringField[T](id: String)(implicit ord: Ordering[T], mf: Option[Manifest[T]]) extends Field[T] with Product with Serializable

  158. trait SuccessFileSource extends FileSource


    Ensures that a _SUCCESS file is present in every directory included by a glob, as well as the requirements of FileSource.pathIsGood. The set of directories to check for _SUCCESS is determined by examining the list of all paths returned by globPaths and adding parent directories of the non-hidden files encountered. pathIsGood should still be considered just a best-effort test. As an illustration the following layout with an in-flight job is accepted for the glob dir*/*:


    Similarly if dir1 is physically empty pathIsGood is still true for dir*/* above

    On the other hand it will reject an empty output directory of a finished job:


  159. class SummingMapsideCache[K, V] extends MapsideCache[K, V]

  160. abstract class TemplateSource extends SchemedSource with HfsTapProvider


    This is a base class for template based output sources

  161. case class TemplatedSequenceFile(basePath: String, template: String, sequenceFields: Fields = Fields.ALL, pathFields: Fields = Fields.ALL, sinkMode: SinkMode = SinkMode.REPLACE) extends TemplateSource with SequenceFileScheme with Product with Serializable


    An implementation of SequenceFile output, split over a template tap.

    basePath: The root path for the output.

    template: The java formatter style string to use as the template, e.g. %s/%s.

    sequenceFields: The set of fields to use for the sequence file.

    pathFields: The set of fields to apply to the path.

    sinkMode: How to handle conflicts with existing output.
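
    As a rough illustration of what a java formatter style template such as %s/%s means, the standalone sketch below expands it with plain Scala string formatting. It does not call Scalding or Cascading; the field values and the name TemplateSketch are hypothetical.

```scala
object TemplateSketch {
  // Hypothetical values for two path fields, e.g. a country and a date.
  val pathValues: Seq[String] = Seq("US", "2011-10-02")

  // java.util.Formatter syntax: each %s is replaced by the next value in order.
  val subPath: String = "%s/%s".format(pathValues: _*)
}
```

    Here subPath is the sub-directory name that the template yields for one combination of path field values; the number of %s placeholders is expected to match the number of path fields.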

  162. case class TemplatedTsv(basePath: String, template: String, pathFields: Fields = Fields.ALL, writeHeader: Boolean = false, sinkMode: SinkMode = SinkMode.REPLACE, fields: Fields = Fields.ALL) extends TemplateSource with DelimitedScheme with Product with Serializable


    An implementation of TSV output, split over a template tap.

    basePath: The root path for the output.

    template: The java formatter style string to use as the template, e.g. %s/%s.

    pathFields: The set of fields to apply to the path.

    writeHeader: Flag to indicate that the header should be written to the file.

    sinkMode: How to handle conflicts with existing output.

    fields: The set of fields to apply to the output.

  163. case class Test(buffers: (Source) ⇒ Option[Buffer[Tuple]]) extends TestMode with CascadingLocal with Product with Serializable


    Memory only testing for unit tests

  164. trait TestMode extends Mode

  165. class TestTapFactory extends Serializable

  166. class TextLine extends FixedPathSource with TextLineScheme

  167. trait TextLineScheme extends SchemedSource with TextSourceScheme with SingleMappable[String]

  168. trait TextSourceScheme extends SchemedSource


    The fields here are ('offset, 'line)

  169. abstract class TimePathedSource extends TimeSeqPathedSource


    This will automatically produce a globbed version of the given path. THIS MEANS YOU MUST END WITH A / followed by * to match a file. For writing, we write to the directory specified by the END time.

  170. abstract class TimeSeqPathedSource extends FileSource

  171. class Tool extends Configured with org.apache.hadoop.util.Tool

  172. case class Tsv(p: String, fields: Fields = Fields.ALL, skipHeader: Boolean = false, writeHeader: Boolean = false, sinkMode: SinkMode = SinkMode.REPLACE) extends FixedPathSource with DelimitedScheme with Product with Serializable


    Tab separated value source

  173. trait TupleArity extends AnyRef


    Mixed in to both TupleConverter and TupleSetter to improve arity safety of cascading jobs before we run anything on Hadoop.

  174. trait TupleConverter[T] extends Serializable with TupleArity


    Typeclass to represent converting from cascading TupleEntry to some type T. The most common application is to convert to scala Tuple objects for use with the Fields API. The typed API internally manually handles its mapping to cascading Tuples, so the implicit resolution mechanism is not used.

    WARNING: if you are seeing issues with the singleConverter being found when you expect something else, you may have an issue where the enclosing scope needs to take an implicit TupleConverter of the correct type.

    Unfortunately, the semantics we want (prefer to flatten tuples, but otherwise put everything into one position in the tuple) are somewhat difficult to encode in Scala.

  175. trait TupleGetter[T] extends Serializable


    Typeclass roughly equivalent to a Lens, which allows getting items out of a tuple. This is useful because cascading has type coercion (string to int, for instance) that users expect in the fields API. This code is not used in the typesafe API, which does not allow such silent coercion. See the generated TupleConverters for an example of where this is used.

  176. trait TuplePacker[T] extends Serializable


    Typeclass for packing a cascading Tuple into some type T, this is used to put fields of a cascading tuple into Thrift, Protobuf, or case classes, for instance, but you can add your own instances to control how this is done.

  177. trait TupleSetter[T] extends Serializable with TupleArity


    Typeclass to represent converting back to (setting into) a cascading Tuple. This looks like it can be contravariant, but it can't: because of our approach of falling back to the singleSetter, you really want the most specific setter you can get. Put more directly: a TupleSetter[Any] is not just as good as TupleSetter[(Int, Int)] from the scalding DSL's point of view. The latter will flatten the (Int, Int), but the former won't.
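
    The fallback behaviour described above can be imitated with a small standalone typeclass. This is a sketch of the general low-priority-implicit technique, not Scalding's code; Setter, LowPrioritySetters, pairSetter and toRow are hypothetical names, and a List[Any] stands in for a cascading Tuple.

```scala
// Invariant on purpose: a Setter[Any] must not be usable as a Setter[(Int, Int)].
trait Setter[T] { def apply(t: T): List[Any] }

trait LowPrioritySetters {
  // Fallback: put the whole value into a single position.
  implicit def singleSetter[A]: Setter[A] = new Setter[A] {
    def apply(a: A): List[Any] = List(a)
  }
}

object Setter extends LowPrioritySetters {
  // More specific instance: flatten a pair into two positions.
  implicit def pairSetter[A, B]: Setter[(A, B)] = new Setter[(A, B)] {
    def apply(t: (A, B)): List[Any] = List(t._1, t._2)
  }
}

object SetterDemo {
  def toRow[T](t: T)(implicit s: Setter[T]): List[Any] = s(t)

  // Resolved at type (Int, Int): the pair is flattened into two positions.
  val flattened: List[Any] = toRow((1, 2))

  // Resolved at type Any: only the fallback applies, so the pair stays whole.
  val single: List[Any] = toRow[Any]((1, 2))
}
```

    If Setter were contravariant, the fallback instance at Any would satisfy every request and could compete with the flattening instance, which is the situation the description above warns against.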

  178. trait TupleUnpacker[T] extends Serializable

  179. class TupleUnpackerException extends Exception

  180. trait TypeDescriptor[T] extends Serializable


    This class is used to bind together a Fields instance (which may contain a type array via getTypes), a TupleConverter, and a TupleSetter, the latter two being inverses of one another. Note that the size of the Fields object and the arity values for the converter and setter are all the same. The com.twitter.scalding.macros package contains macros that generate this for case classes, which may be very convenient.
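
    The invariants described above (converter and setter are inverses, and all arities agree) can be sketched in plain Scala. RowDescriptor, User, and RowDescriptorDemo are hypothetical names, and a Vector[Any] stands in for a cascading Tuple; this models the idea only and is not the TypeDescriptor API.

```scala
final case class User(id: Long, name: String)

// Hypothetical model: field names plus a converter/setter pair over untyped rows.
final case class RowDescriptor[T](
    fields: Vector[String],
    converter: Vector[Any] => T, // row -> T (the role of a TupleConverter)
    setter: T => Vector[Any]     // T -> row (the role of a TupleSetter)
) {
  // The number of fields defines the arity shared by converter and setter.
  def arity: Int = fields.size

  // Checks that the setter emits `arity` values and that the converter inverts it.
  def roundTrips(t: T): Boolean = {
    val row = setter(t)
    row.size == arity && converter(row) == t
  }
}

object RowDescriptorDemo {
  val userDescriptor: RowDescriptor[User] = RowDescriptor[User](
    Vector("id", "name"),
    row => User(row(0).asInstanceOf[Long], row(1).asInstanceOf[String]),
    u => Vector(u.id, u.name)
  )

  val ok: Boolean = userDescriptor.roundTrips(User(1L, "ada"))
}
```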

    @implicitNotFound( ... )
  181. class TypedBufferOp[K, V, U] extends BaseOperation[Any] with Buffer[Any] with ScaldingPrepare[Any]


    In the typed API every reduce operation is handled by this Buffer

  182. class TypedMapsideReduce[K, V] extends BaseOperation[MapsideCache[K, V]] with Function[MapsideCache[K, V]] with ScaldingPrepare[MapsideCache[K, V]]

  183. type TypedPipe[+T] = scalding.typed.TypedPipe[T]

  184. trait TypedSeperatedFile extends Serializable


    Trait to assist with creating objects such as TypedTsv to read from separated files. Override separator, skipHeader, writeHeader as needed.

  185. type TypedSink[-T] = scalding.typed.TypedSink[T]

  186. trait TypedSink1[A] extends TypedSink[(A)]

  187. trait TypedSink10[A, B, C, D, E, F, G, H, I, J] extends TypedSink[(A, B, C, D, E, F, G, H, I, J)]

  188. trait TypedSink11[A, B, C, D, E, F, G, H, I, J, K] extends TypedSink[(A, B, C, D, E, F, G, H, I, J, K)]

  189. trait TypedSink12[A, B, C, D, E, F, G, H, I, J, K, L] extends TypedSink[(A, B, C, D, E, F, G, H, I, J, K, L)]

  190. trait TypedSink13[A, B, C, D, E, F, G, H, I, J, K, L, M] extends TypedSink[(A, B, C, D, E, F, G, H, I, J, K, L, M)]

  191. trait TypedSink14[A, B, C, D, E, F, G, H, I, J, K, L, M, N] extends TypedSink[(A, B, C, D, E, F, G, H, I, J, K, L, M, N)]

  192. trait TypedSink15[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O] extends TypedSink[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O)]

  193. trait TypedSink16[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P] extends TypedSink[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P)]

  194. trait TypedSink17[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q] extends TypedSink[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q)]

  195. trait TypedSink18[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R] extends TypedSink[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R)]

  196. trait TypedSink19[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S] extends TypedSink[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S)]

  197. trait TypedSink2[A, B] extends TypedSink[(A, B)]

  198. trait TypedSink20[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T] extends TypedSink[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T)]

  199. trait TypedSink21[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U] extends TypedSink[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U)]

  200. trait TypedSink22[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V] extends TypedSink[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V)]

  201. trait TypedSink3[A, B, C] extends TypedSink[(A, B, C)]

  202. trait TypedSink4[A, B, C, D] extends TypedSink[(A, B, C, D)]

  203. trait TypedSink5[A, B, C, D, E] extends TypedSink[(A, B, C, D, E)]

  204. trait TypedSink6[A, B, C, D, E, F] extends TypedSink[(A, B, C, D, E, F)]

  205. trait TypedSink7[A, B, C, D, E, F, G] extends TypedSink[(A, B, C, D, E, F, G)]

  206. trait TypedSink8[A, B, C, D, E, F, G, H] extends TypedSink[(A, B, C, D, E, F, G, H)]

  207. trait TypedSink9[A, B, C, D, E, F, G, H, I] extends TypedSink[(A, B, C, D, E, F, G, H, I)]

  208. type TypedSource[+T] = scalding.typed.TypedSource[T]

  209. trait TypedSource1[A] extends TypedSource[(A)]

  210. trait TypedSource10[A, B, C, D, E, F, G, H, I, J] extends TypedSource[(A, B, C, D, E, F, G, H, I, J)]

  211. trait TypedSource11[A, B, C, D, E, F, G, H, I, J, K] extends TypedSource[(A, B, C, D, E, F, G, H, I, J, K)]

  212. trait TypedSource12[A, B, C, D, E, F, G, H, I, J, K, L] extends TypedSource[(A, B, C, D, E, F, G, H, I, J, K, L)]

  213. trait TypedSource13[A, B, C, D, E, F, G, H, I, J, K, L, M] extends TypedSource[(A, B, C, D, E, F, G, H, I, J, K, L, M)]

  214. trait TypedSource14[A, B, C, D, E, F, G, H, I, J, K, L, M, N] extends TypedSource[(A, B, C, D, E, F, G, H, I, J, K, L, M, N)]

  215. trait TypedSource15[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O] extends TypedSource[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O)]

  216. trait TypedSource16[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P] extends TypedSource[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P)]

  217. trait TypedSource17[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q] extends TypedSource[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q)]

  218. trait TypedSource18[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R] extends TypedSource[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R)]

  219. trait TypedSource19[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S] extends TypedSource[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S)]

  220. trait TypedSource2[A, B] extends TypedSource[(A, B)]

  221. trait TypedSource20[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T] extends TypedSource[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T)]

  222. trait TypedSource21[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U] extends TypedSource[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U)]

  223. trait TypedSource22[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V] extends TypedSource[(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V)]

  224. trait TypedSource3[A, B, C] extends TypedSource[(A, B, C)]

  225. trait TypedSource4[A, B, C, D] extends TypedSource[(A, B, C, D)]

  226. trait TypedSource5[A, B, C, D, E] extends TypedSource[(A, B, C, D, E)]

  227. trait TypedSource6[A, B, C, D, E, F] extends TypedSource[(A, B, C, D, E, F)]

  228. trait TypedSource7[A, B, C, D, E, F, G] extends TypedSource[(A, B, C, D, E, F, G)]

  229. trait TypedSource8[A, B, C, D, E, F, G, H] extends TypedSource[(A, B, C, D, E, F, G, H)]

  230. trait TypedSource9[A, B, C, D, E, F, G, H, I] extends TypedSource[(A, B, C, D, E, F, G, H, I)]

  231. case class UniqueID(get: String) extends Product with Serializable


    Used to inject a typed unique identifier to uniquely name each scalding flow. This is here mostly to deal with the case of testing, where there are many concurrent threads running Flows. Users should never have to worry about these.

  232. trait UtcDateRangeJob extends Job with DefaultDateRangeJob

  233. type ValuePipe[+T] = scalding.typed.ValuePipe[T]

  234. case class WritableSequenceFile[K <: Writable, V <: Writable](p: String, f: Fields, sinkMode: SinkMode = SinkMode.REPLACE)(implicit evidence$3: Manifest[K], evidence$4: Manifest[V]) extends FixedPathSource with WritableSequenceFileScheme with LocalTapSource with TypedSink[(K, V)] with TypedSource[(K, V)] with Product with Serializable

  235. trait WritableSequenceFileScheme extends SchemedSource

  236. class XHandler extends AnyRef


    Provide handlers and mapping for exceptions

  237. class FixedPathTypedDelimited[T] extends FixedPathSource with TypedDelimited[T]


    (Since version 2015-07) Use FixedTypedText instead

  238. trait TupleConversions extends AnyRef


    (Since version 0.9.0) This trait does nothing now

  239. trait TypedDelimited[T] extends SchemedSource with DelimitedScheme with Mappable[T] with TypedSink[T]


    Allows you to set the types. If T is a subclass of Product, we assume it is a tuple. If it is not, wrap T in a Tuple1, e.g. TypedTsv[Tuple1[List[Int]]].


    (Since version 2015-07) Use TypedTextDelimited instead

Value Members

  1. object AcceptAllPathFilter extends PathFilter

  2. object ArgHelp extends ArgHelper

  3. object BijectedOrderedSerialization

  4. object CascadeTest

  5. object CascadingTokenUpdater

  6. object CastHfsTap

  7. object Config extends Serializable

  8. object Dsl extends FieldConversions with Serializable


    This object has all the implicit functions and values that are used to make the scalding DSL, which includes the functions for automatically creating cascading.tuple.Fields objects from scala tuples of Strings, Symbols or Ints, as well as the cascading.pipe.Pipe enrichment to RichPipe which adds the scala.collections-like API to Pipe.

    It's useful to import Dsl._ when you are writing scalding code outside of a Job.

  9. object Execution extends Serializable


    Execution has many methods for creating Execution[T] instances, which are the preferred way to compose computations in scalding libraries.

  10. object ExecutionApp extends Serializable

  11. object ExecutionContext

  12. object ExecutionCounters


    The companion gives several ways to create ExecutionCounters from other CascadingStats, JobStats, or Maps

  13. object ExecutionUtil

  14. object ExpandLibJarsGlobs

  15. object Field extends Serializable

  16. object FileSource extends Serializable

  17. object FixedPathTypedDelimited extends Serializable

  18. object FlowStateMap


    This is a mutable threadsafe store for attaching scalding information to the mutable flowDef

    NOTE: there is a subtle bug in Scala regarding case classes with multiple sets of arguments, and their equality. For this reason, we use Source.sourceId as the key in this map.

  19. object FunctionImplicits

  20. object HadoopSchemeInstance

  21. object HiddenFileFilter extends PathFilter

  22. object IdentityFunction extends BaseOperation[Any] with Function[Any] with ScaldingPrepare[Any]

  23. object InnerJoinMode extends JoinMode with Product with Serializable

  24. object Job extends Serializable

  25. object JobStats extends Serializable

  26. object JobTest

  27. object JoinAlgorithms extends Serializable

  28. object LineNumber

  29. object MapsideCache

  30. object MapsideReduce extends Serializable


    An implementation of map-side combining which is appropriate for associative and commutative functions. If a cacheSize is given, it is used; otherwise we query the config for cascading.aggregateby.threshold (the standard cascading param for an equivalent case); otherwise we use a default value of 100,000.

    This keeps a cache of keys up to the cache size, summing values as keys collide. On eviction, or on completion of this Operation, the key-value pairs are put into outputCollector.

    This NEVER spills to disk and should generally never be a performance penalty. If you have poor locality in the keys, you simply get no benefit, at little added cost.

    Note this means that you may still have repeated keys in the output even on a single mapper since the key space may be so large that you can't fit all of them in the cache at the same time.

    You can use this with the Fields-API by doing:

    val msr = new MapsideReduce(Semigroup.from(fn), 'key, 'value, None)
    // MUST map onto the same key,value space (may be multiple fields)
    val mapSideReduced = pipe.eachTo(('key, 'value) -> ('key, 'value)) { _ => msr }

    That said, this is equivalent to AggregateBy, and the only value is that it is much simpler than AggregateBy. AggregateBy assumes several parallel reductions are happening, and thus has many loops, and array lookups to deal with that. Since this does many fewer allocations, and has a smaller code-path it may be faster for the typed-API.
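
    The caching behaviour described above can be modelled with a small sketch in plain Scala. SummingCache and SummingCacheDemo are hypothetical names, and the oldest-inserted-first eviction policy is an assumption made for illustration; the actual cache used by scalding may evict differently.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of a bounded map-side summing cache.
final class SummingCache[K, V](capacity: Int)(plus: (V, V) => V) {
  require(capacity > 0, "capacity must be positive")

  // LinkedHashMap keeps insertion order, so `head` is the oldest entry.
  private val cache = mutable.LinkedHashMap.empty[K, V]

  /** Adds a pair and returns any pair evicted as a result (to be emitted downstream). */
  def put(k: K, v: V): List[(K, V)] =
    cache.get(k) match {
      case Some(old) =>
        cache.update(k, plus(old, v)) // key collision: combine in place
        Nil
      case None =>
        cache.update(k, v)
        if (cache.size > capacity) {
          val oldest = cache.head
          cache.remove(oldest._1)
          List(oldest)
        } else Nil
    }

  /** Emits everything still cached, as happens when the operation completes. */
  def flush(): List[(K, V)] = {
    val all = cache.toList
    cache.clear()
    all
  }
}

object SummingCacheDemo {
  def run(): List[(String, Int)] = {
    val cache = new SummingCache[String, Int](2)(_ + _)
    val input = List("a" -> 1, "b" -> 1, "a" -> 1, "c" -> 1, "a" -> 1)
    input.flatMap { case (k, v) => cache.put(k, v) } ++ cache.flush()
  }
}
```

    With a capacity of 2 and three distinct keys, the key "a" is emitted twice in this sketch (once on eviction and once on flush), which mirrors the note that repeated keys can still appear in the output of a single mapper. The downstream reduce is what produces the final per-key totals.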

  31. object Mode extends Serializable

  32. object MultipleWritableSequenceFiles extends Serializable

  33. object NullSource extends Source with BaseNullSource


    A source that outputs nothing. It is used to drive execution of a task for its side effects only.

  34. object OffsetTextLine extends Serializable


    Alternate typed TextLine source that keeps both 'offset and 'line fields.

  35. object OuterJoinMode extends JoinMode with Product with Serializable

  36. object PartitionedSequenceFile extends Serializable


    An implementation of SequenceFile output, split over a partition tap.

    apply assumes user wants a DelimitedPartition (the only strategy bundled with Cascading).

  37. object PartitionedTsv extends Serializable


    An implementation of TSV output, split over a partition tap.

    Similar to TemplateSource, but with addition of tsvFields, to let users explicitly specify which fields they want to see in the TSV (allows user to discard path fields).

    apply assumes user wants a DelimitedPartition (the only strategy bundled with Cascading).

  38. object Read extends AccessMode with Product with Serializable

  39. object ReflectionUtils


    A helper for working with class reflection. Allows us to avoid code repetition.

  40. object RichFields extends Serializable

  41. object RichPathFilter

  42. object RichPipe extends Serializable

  43. object RichXHandler


    Provides an apply method for creating XHandlers with default or custom settings, and contains the messages and mapping.

  44. object RuntimeStats extends Serializable


    Wrapper around a FlowProcess, useful for, e.g., incrementing counters.

  45. object Stat extends Serializable

  46. object StatKey extends Serializable

  47. object Stats

  48. object StringUtility

  49. object SuccessFileFilter extends PathFilter

  50. val TDsl: scalding.typed.TDsl.type


    The objects for the Typed-API live in the scalding.typed package but are aliased here.

  51. object TestTapFactory extends Serializable


    Use this to create Taps for testing.

  52. object TextLine extends Serializable

  53. object TimePathedSource extends Serializable

  54. object Tool

  55. object Tracing


    Calling init registers "com.twitter.scalding" as a "tracing boundary" for Cascading. That means that when Cascading sends trace information to a DocumentService such as Driven, the trace will have information about the caller of Scalding instead of about the internals of Scalding. com.twitter.scalding.Job and its subclasses will automatically initialize Tracing.

    The register and unregister methods are provided for testing, but should not be needed for most development.

  56. object TupleConverter extends GeneratedTupleConverters

  57. object TupleGetter extends LowPriorityTupleGetter

  58. object TuplePacker extends CaseClassPackers

  59. object TupleSetter extends GeneratedTupleSetters

  60. object TupleUnpacker extends LowPriorityTupleUnpackers with Serializable


    Typeclass for objects which unpack an object into a tuple. The packer can verify the arity, types, and also the existence of the getter methods at plan time, without having the job blow up in the middle of a run.

  61. object TypeDescriptor extends Serializable

  62. object TypedCsv extends TypedSeperatedFile


    Typed comma separated values file

  63. object TypedOsv extends TypedSeperatedFile


    Typed one separated values file (commonly used by Pig)

  64. val TypedPipe: scalding.typed.TypedPipe.type

  65. object TypedPipeChecker


    This class is used to assist with testing a TypedPipe

  66. object TypedPsv extends TypedSeperatedFile


    Typed pipe separated values file

  67. object TypedTsv extends TypedSeperatedFile


    Typed tab separated values file

  68. object UniqueID extends Serializable

  69. object WritableSequenceFile extends Serializable

  70. object Write extends AccessMode with Product with Serializable

  71. package bdd

  72. package cascading_interop

  73. package filecache

  74. package macros

  75. package mathematics

  76. package reducer_estimation

  77. val scaldingVersion: String


    Make sure this is in sync with version.sbt

  78. package serialization

  79. package source

  80. package typed


