Package org.broadinstitute.hellbender.tools.spark.pipelines.metrics
Class descriptions:

- Collects base distribution per cycle in SAM/BAM/CRAM file(s).
- Collects insert size distribution information in alignment data.
- Runs multiple metrics collection modules for a given alignment file.
- Collects quality yield metrics in SAM/BAM/CRAM file(s).
- Worker class that collects insert size metrics, adds metrics to a file, and provides accessors to the stats of groups at different levels.
- Program to generate a data table and chart of mean quality by cycle from a BAM file.

MetricsCollectorSpark<T extends MetricsArgumentCollection>

Each metrics collector has to be able to run from 4 different contexts:

- a standalone walker tool
- the org.broadinstitute.hellbender.metrics.analysis.CollectMultipleMetrics walker tool
- a standalone Spark tool
- the CollectMultipleMetricsSpark tool

In order to allow a single collector implementation to be shared across all of these contexts (standalone and CollectMultiple, Spark and non-Spark), collectors should be factored into the following classes, where X in the class names represents the specific type of metrics being collected:

- XMetrics extends MetricBase: defines the aggregate metrics that we're trying to collect
- XMetricsArgumentCollection: defines parameters for XMetrics, extends MetricsArgumentCollection
- XMetricsCollector: processes a single read, and has a reduce/combiner. For multi-level collectors, XMetricsCollector is composed of several classes:
    - XMetricsCollector extends MultiLevelReducibleCollector<XMetrics, HISTOGRAM_KEY, XMetricsCollectorArgs, XMetricsPerUnitCollector>
    - XMetricsPerUnitCollector: per-level collector, implements PerUnitMetricCollector<XMetrics, HISTOGRAM_KEY, XMetricsCollectorArgs> (requires a combiner)
    - XMetricsCollectorArgs: per-record argument (type argument for MultiLevelReducibleCollector)
- XMetricsCollectorSpark: adapter/bridge between RDD and the (read-based) XMetricsCollector, implements MetricsCollectorSpark
- CollectXMetrics extends org.broadinstitute.hellbender.metrics.analysis.SinglePassSamProgram
- CollectXMetricsSpark extends MetricsCollectorSparkTool
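The factoring above can be illustrated with a minimal, dependency-free sketch. It does not use the GATK or Spark classes: every name in it (ReadLengthMetrics, ReadLengthMetricsArgs, ReadLengthMetricsCollector, acceptRead, combine) is a hypothetical stand-in chosen for illustration, reads are modeled as plain strings, and a list of lists stands in for the partitions of an RDD. It is only meant to show why a collector that processes one read at a time and exposes a combiner can be reused by both single-pass and partitioned callers.

```java
import java.util.Arrays;
import java.util.List;

public class CollectorFactoringSketch {

    /** Hypothetical stand-in for "XMetrics": the aggregate values being collected. */
    static final class ReadLengthMetrics {
        long readCount;
        long totalBases;

        double meanReadLength() {
            return readCount == 0 ? 0.0 : (double) totalBases / readCount;
        }
    }

    /** Hypothetical stand-in for "XMetricsArgumentCollection": parameters for the metrics. */
    static final class ReadLengthMetricsArgs {
        int minReadLength = 1; // illustrative parameter, not a real GATK argument
    }

    /** Hypothetical stand-in for "XMetricsCollector": handles one read at a time, plus a combiner. */
    static final class ReadLengthMetricsCollector {
        private final ReadLengthMetricsArgs args;
        private final ReadLengthMetrics metrics = new ReadLengthMetrics();

        ReadLengthMetricsCollector(ReadLengthMetricsArgs args) {
            this.args = args;
        }

        /** Processes a single read (modeled here as its base string). */
        void acceptRead(String bases) {
            if (bases.length() >= args.minReadLength) {
                metrics.readCount++;
                metrics.totalBases += bases.length();
            }
        }

        /** Folds another collector's partial results into this one. */
        ReadLengthMetricsCollector combine(ReadLengthMetricsCollector other) {
            metrics.readCount += other.metrics.readCount;
            metrics.totalBases += other.metrics.totalBases;
            return this;
        }

        ReadLengthMetrics getMetrics() {
            return metrics;
        }
    }

    /** Runs one collector per partition, then combines the partial results. */
    static ReadLengthMetrics collect(List<List<String>> partitions, ReadLengthMetricsArgs args) {
        ReadLengthMetricsCollector combined = new ReadLengthMetricsCollector(args);
        for (List<String> partition : partitions) {
            ReadLengthMetricsCollector perPartition = new ReadLengthMetricsCollector(args);
            partition.forEach(perPartition::acceptRead);
            combined.combine(perPartition);
        }
        return combined.getMetrics();
    }

    public static void main(String[] argv) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("ACGT", "ACGTAC"),
                Arrays.asList("AC"));
        ReadLengthMetrics m = collect(partitions, new ReadLengthMetricsArgs());
        System.out.println(m.readCount + " reads, mean length " + m.meanReadLength());
    }
}
```

A single-pass caller would use one ReadLengthMetricsCollector and never call combine; a partitioned caller uses the combiner, as collect does here.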
The following schematic shows the general relationships of these collector component classes in the context of various tools, with the arrows indicating a "delegates to" relationship via composition or inheritance:

    CollectXMetrics       CollectMultipleMetrics
                \             /
                 \           /
                  v         v
    _______________________________________
    | XMetricsCollector ==================|=========> MultiLevelReducibleCollector
    |         |                           |                        |
    |         V                           |                        |
    | XMetrics                            |                        V
    | XMetricsCollectorArgumentCollection |             PerUnitXMetricCollector
    ---------------------------------------
                       ^
                       |
                       |
            XMetricsCollectorSpark
                ^             ^
               /               \
              /                 \
    CollectXMetricsSpark    CollectMultipleMetricsSpark

The general lifecycle of a Spark collector (XMetricsCollectorSpark in the diagram above) looks like this:

    CollectorType collector = new CollectorType();
    CollectorArgType args = // get metric-specific input arguments

    // NOTE: getDefaultReadFilters is called before the collector's initialize
    // method is called, so the read filters cannot access argument values
    ReadFilter filter = collector.getDefaultReadFilters();

    // pass the input arguments to the collector for initialization
    collector.initialize(args, defaultMetricsHeaders);

    collector.collectMetrics(
        getReads().filter(filter),
        samFileHeader
    );
    collector.saveMetrics(getReadSourceName());

- Base class for standalone Spark metrics collector tools.
- Charts quality score distribution within a BAM file.
- QualityYieldMetricsCollector for Spark.
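The Spark collector lifecycle described for MetricsCollectorSpark can be made concrete with a small runnable sketch. This is an illustration under stated assumptions, not the GATK interface: CollectorSketch, ReadCountArgs and ReadCountCollector are hypothetical names, reads are plain strings, a List with a Predicate stands in for an RDD with a ReadFilter, and saveMetrics returns the text it would write instead of writing a file. Only the call order (filters, initialize, collectMetrics, saveMetrics) is taken from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class SparkCollectorLifecycleSketch {

    /** Simplified, hypothetical stand-in for a Spark collector interface. */
    interface CollectorSketch<ARGS> {
        Predicate<String> getDefaultReadFilters();
        void initialize(ARGS args, List<String> defaultMetricsHeaders);
        void collectMetrics(List<String> filteredReads, String samFileHeader);
        String saveMetrics(String readSourceName);
    }

    /** Hypothetical metric-specific arguments. */
    static final class ReadCountArgs {
        String label = "reads";
    }

    /** Hypothetical collector that counts the reads passing its default filter. */
    static final class ReadCountCollector implements CollectorSketch<ReadCountArgs> {
        private ReadCountArgs args;
        private final List<String> headers = new ArrayList<>();
        private long count;

        @Override
        public Predicate<String> getDefaultReadFilters() {
            // Called before initialize(), so this must not depend on 'args'.
            return read -> !read.isEmpty();
        }

        @Override
        public void initialize(ReadCountArgs args, List<String> defaultMetricsHeaders) {
            this.args = args;
            this.headers.addAll(defaultMetricsHeaders);
        }

        @Override
        public void collectMetrics(List<String> filteredReads, String samFileHeader) {
            // The header is unused in this sketch; a real collector might read groups from it.
            count = filteredReads.size();
        }

        @Override
        public String saveMetrics(String readSourceName) {
            // Returns the report text rather than writing a file, to keep the sketch testable.
            return String.join("\n", headers) + "\n" + readSourceName + ": " + args.label + "=" + count;
        }
    }

    /** Drives the collector through the lifecycle in the order described. */
    static String run(List<String> reads, String samFileHeader, String readSourceName) {
        ReadCountCollector collector = new ReadCountCollector();
        ReadCountArgs args = new ReadCountArgs();

        // 1. Ask for the default filters before the collector is initialized.
        Predicate<String> filter = collector.getDefaultReadFilters();

        // 2. Pass the input arguments to the collector for initialization.
        collector.initialize(args, Arrays.asList("# sketch header"));

        // 3. Collect metrics over the filtered reads.
        List<String> filtered = reads.stream().filter(filter).collect(Collectors.toList());
        collector.collectMetrics(filtered, samFileHeader);

        // 4. Save (here: return) the metrics.
        return collector.saveMetrics(readSourceName);
    }

    public static void main(String[] argv) {
        System.out.println(run(Arrays.asList("ACGT", "", "GG"), "@HD", "input.bam"));
    }
}
```

Because the filter is requested before initialize is called, ReadCountCollector builds it without touching its arguments, which mirrors the constraint noted in the lifecycle comments.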