A version of OrderedRDDFunctions which enables instrumentation of its operations. For more details and usage instructions see the MetricsContext class.
Implementation of org.apache.hadoop.mapreduce.OutputFormat, which instruments its RecordWriter's write method. Classes should extend this one and provide the class of the underlying output format using the outputFormatClass method.
This class is intended for use with the methods in InstrumentedPairRDDFunctions that save Hadoop files (saveAs*HadoopFile).
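The idea of wrapping a writer so that each write call is timed can be sketched without Hadoop. This is only an illustration under stated assumptions: SimpleWriter, TimedWriter, TimedFormat, underlying and onWrite are hypothetical names invented for this sketch, not part of Hadoop or of this library, and the real class delegates to a Hadoop RecordWriter instead.

```scala
// Hypothetical, Hadoop-free stand-ins used for illustration only.
trait SimpleWriter[K, V] {
  def write(key: K, value: V): Unit
}

// Wraps an underlying writer and reports how long each write call took (in nanoseconds).
class TimedWriter[K, V](delegate: SimpleWriter[K, V], onWrite: Long => Unit)
    extends SimpleWriter[K, V] {
  override def write(key: K, value: V): Unit = {
    val start = System.nanoTime()
    try delegate.write(key, value)
    finally onWrite(System.nanoTime() - start)
  }
}

// Subclasses supply the underlying writer; this loosely mirrors the role that
// outputFormatClass plays in the class described above (an assumption, not its API).
abstract class TimedFormat[K, V] {
  def underlying(): SimpleWriter[K, V]

  def writer(onWrite: Long => Unit): SimpleWriter[K, V] =
    new TimedWriter(underlying(), onWrite)
}
```

A subclass only has to say what it wraps; the timing stays in one place and the underlying writer is left unchanged.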
A version of PairRDDFunctions which enables instrumentation of its operations. For more details and usage instructions see the MetricsContext class.
An RDD which instruments its operations. For further details and usage instructions see the MetricsContext class.
This class needs to be in the org.apache.spark.rdd package; otherwise Spark will record an incorrect call site (which in turn becomes the stage name). This can be fixed when we use Spark 1.1.1 (needs SPARK-1853).
Functions which permit creation of instrumented RDDs, as well as the ability to stop instrumentation by calling the unInstrument method. For more details and usage instructions see the MetricsContext class.
Represents a timer for timing a function. Call the time method, passing it the function to be timed.
To record metrics, the Timer uses the passed-in MetricsRecorder if it is defined; otherwise it looks in the Metrics.Recorder field for a recorder. If neither is defined, no metrics are recorded (the function is executed without recording metrics).
The overhead of recording metrics has been measured at around 100 nanoseconds on an Intel i7-3720QM. The overhead of calling the time method when no metrics are being recorded (a recorder is not defined) is negligible.
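The recorder lookup described above can be sketched as follows. This is a minimal illustration, not the library's implementation: Recorder, GlobalRecorder and SketchTimer are hypothetical names, and a plain variable is assumed in place of the Metrics.Recorder field.

```scala
import scala.collection.mutable

// Hypothetical recorder: collects elapsed times (in nanoseconds) per timer name.
class Recorder {
  private val timings = mutable.Map.empty[String, Vector[Long]]

  def record(name: String, nanos: Long): Unit = synchronized {
    timings(name) = timings.getOrElse(name, Vector.empty) :+ nanos
  }

  def count(name: String): Int = synchronized {
    timings.get(name).map(_.size).getOrElse(0)
  }
}

// Hypothetical stand-in for a globally visible recorder slot that may be unset.
object GlobalRecorder {
  @volatile var current: Option[Recorder] = None
}

class SketchTimer(name: String, explicitRecorder: Option[Recorder] = None) {
  // Runs f and returns its result. The elapsed time is recorded only if a
  // recorder is available: the explicit one first, then the global one.
  def time[A](f: => A): A =
    explicitRecorder.orElse(GlobalRecorder.current) match {
      case Some(recorder) =>
        val start = System.nanoTime()
        try f
        finally recorder.record(name, System.nanoTime() - start)
      case None =>
        f // no recorder defined: run the function without recording anything
    }
}
```

Because the function is passed by name, the call site reads like `timer.time { work() }`, and the path without a recorder adds only an Option check.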
This class needs to be in the org.apache.spark.rdd package, otherwise Spark records somewhere in the time method as the call site (which in turn becomes the stage name). This can be fixed when Spark 1.1.1 is released (needs SPARK-1853).
Contains implicit conversions which enable instrumentation of Spark operations. This class should be used instead of org.apache.spark.SparkContext when instrumentation is required. Usage is as follows:
import org.bdgenomics.utils.instrumentation.Metrics._
import org.apache.spark.rdd.MetricsContext._
Metrics.initialize(sparkContext)
val instrumentedRDD = rdd.instrument()
Then, when any operations are performed on instrumentedRDD, the RDD operation will be instrumented, along with any functions that operate on its data. All subsequent RDD operations will be instrumented until the unInstrument method is called on an RDD.
When using this class, it is not a good idea to import SparkContext._, as the implicit conversions in there may conflict with those in here; instead, it is better to import only the specific parts of SparkContext that are needed.