Package org.apache.spark.rdd

package rdd

Linear Supertypes
AnyRef, Any

Type Members

  1. class FunctionRecorder extends Serializable

  2. class InstrumentedOrderedRDDFunctions[K, V] extends Serializable

    A version of OrderedRDDFunctions which enables instrumentation of its operations. For more details and usage instructions see the MetricsContext class.

  3. abstract class InstrumentedOutputFormat[K, V] extends OutputFormat[K, V]

    Implementation of org.apache.hadoop.mapreduce.OutputFormat, which instruments its RecordWriter's write method. Classes should extend this one and provide the class of the underlying output format using the outputFormatClass method.

    This class is intended for use with the methods in InstrumentedPairRDDFunctions that save Hadoop files (saveAs*HadoopFile); a usage sketch is given after this list of type members.

  4. class InstrumentedPairRDDFunctions[K, V] extends Serializable

    A version of PairRDDFunctions which enables instrumentation of its operations. For more details and usage instructions see the MetricsContext class.

  5. class InstrumentedRDD[T] extends RDD[T]

    An RDD which instruments its operations. For further details and usage instructions see the MetricsContext class.

    Note

    This class needs to be in the org.apache.spark.rdd package, otherwise Spark will record the incorrect call site (which in turn becomes the stage name). This can be fixed when we use Spark 1.1.1 (needs SPARK-1853).

  6. class InstrumentedRDDFunctions[T] extends AnyRef

    Functions which permit creation of instrumented RDDs, as well as the ability to stop instrumentation by calling the unInstrument method. For more details and usage instructions see the MetricsContext class.

  7. class Timer extends Serializable

    Represents a timer, for timing a function. Call the time method, passing it the function to be timed; a usage sketch is given after this list of type members.

    To record metrics, the Timer uses the passed-in MetricsRecorder if it is defined; otherwise it looks in the Metrics.Recorder field for a recorder. If neither of these is defined, no metrics are recorded (the function is executed without recording metrics).

    The overhead of recording metrics has been measured at around 100 nanoseconds on an Intel i7-3720QM. The overhead of calling the time method when no metrics are being recorded (a recorder is not defined) is negligible.

    Note

    This class needs to be in the org.apache.spark.rdd package, otherwise Spark records a location inside the time method as the call site (which in turn becomes the stage name). This can be fixed when Spark 1.1.1 is released (needs SPARK-1853).
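    Sketch for InstrumentedOutputFormat (item 3 above): a minimal example of a subclass that delegates to Hadoop's TextOutputFormat. The subclass name is hypothetical and the exact signature of outputFormatClass is assumed here, since only the method name is documented:

    import org.apache.hadoop.mapreduce.OutputFormat
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

    // Hypothetical subclass: write calls are instrumented by InstrumentedOutputFormat,
    // while the actual writing is delegated to the underlying TextOutputFormat.
    class InstrumentedTextOutputFormat[K, V] extends InstrumentedOutputFormat[K, V] {
      override def outputFormatClass: Class[_ <: OutputFormat[K, V]] =
        classOf[TextOutputFormat[K, V]]
    }

    The resulting format class can then be used with the saveAs*HadoopFile methods of InstrumentedPairRDDFunctions in place of the underlying output format.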
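    Sketch for Timer (item 7 above): a minimal example of timing a block of code, assuming a Timer can be constructed from a name and that time takes the block to be timed and returns its result; the constructor parameters shown are assumptions, not documented here.

    // Hypothetical construction: the real constructor may take additional parameters.
    val timer = new Timer("sum integers")

    // time runs the passed block, records its duration if a recorder is available
    // (either passed in explicitly or found in Metrics.Recorder), and returns the result.
    val total = timer.time {
      (1 to 1000000).sum
    }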

Value Members

  1. object InstrumentedPairRDDFunctions extends Serializable

  2. object InstrumentedRDD extends Serializable

  3. object MetricsContext

    Contains implicit conversions which enable instrumentation of Spark operations. This class should be used instead of org.apache.spark.SparkContext when instrumentation is required. Usage is as follows:

    import org.bdgenomics.utils.instrumentation.Metrics._
    import org.apache.spark.rdd.MetricsContext._
    Metrics.initialize(sparkContext)
    val instrumentedRDD = rdd.instrument()

    Then, when any operation is performed on instrumentedRDD, the RDD operation is instrumented, along with any functions that operate on its data. All subsequent RDD operations are instrumented until the unInstrument method is called on an RDD (a short sketch follows the note below).

    Note

    When using this class, it is not a good idea to import SparkContext._, as the implicit conversions there may conflict with those defined here; instead, import only the specific parts of SparkContext that are needed.
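    Continuing the usage example above, a minimal sketch of switching instrumentation off again; the parameterless form of unInstrument is an assumption, as only the method name is documented:

    // Operations on instrumentedRDD, and on RDDs derived from it, are instrumented...
    val filtered = instrumentedRDD.filter(_ != null)

    // ...until unInstrument is called, which returns a plain, uninstrumented RDD.
    val plainRDD = filtered.unInstrument()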
