Package org.broadinstitute.hellbender.tools.spark.pipelines.metrics


  Class
    Description
    Collects base distribution per cycle in SAM/BAM/CRAM file(s).
    Collects insert size distribution information in alignment data.
    Runs multiple metrics collection modules for a given alignment file.
    Collects quality yield metrics in SAM/BAM/CRAM file(s).
    Worker class to collect insert size metrics, add metrics to file, and provide accessors to stats of groups at different levels.
    Program to generate a data table and chart of mean quality by cycle from a BAM file.
    Each metrics collector has to be able to run from 4 different contexts:
      - a standalone walker tool
      - the org.broadinstitute.hellbender.metrics.analysis.CollectMultipleMetrics walker tool
      - a standalone Spark tool
      - the CollectMultipleMetricsSpark tool

    In order to allow a single collector implementation to be shared across all of these contexts (standalone and CollectMultiple, Spark and non-Spark), collectors should be factored into the following classes, where X in the class names represents the specific type of metrics being collected:
      - XMetrics extends MetricBase: defines the aggregate metrics that we're trying to collect
      - XMetricsArgumentCollection extends MetricsArgumentCollection: defines parameters for XMetrics
      - XMetricsCollector: processes a single read, and has a reduce/combiner. For multi-level collectors, XMetricsCollector is composed of several classes:
          - XMetricsCollector extends MultiLevelReducibleCollector<XMetrics, HISTOGRAM_KEY, XMetricsCollectorArgs, XMetricsPerUnitCollector>
          - XMetricsPerUnitCollector: per-level collector, implements PerUnitMetricCollector<XMetrics, HISTOGRAM_KEY, XMetricsCollectorArgs> (requires a combiner)
          - XMetricsCollectorArgs: per-record argument (type argument for MultiLevelReducibleCollector)
      - XMetricsCollectorSpark: adapter/bridge between the RDD and the (read-based) XMetricsCollector, implements MetricsCollectorSpark
      - CollectXMetrics extends org.broadinstitute.hellbender.metrics.analysis.SinglePassSamProgram
      - CollectXMetricsSpark extends MetricsCollectorSparkTool

    The following schematic shows the general relationships of these collector component classes in the context of various tools, with the arrows indicating a "delegates to" relationship via composition or inheritance:

          CollectXMetrics     CollectMultipleMetrics
                  \             /
                   \           /
                    v         v
       _________________________________
      |   XMetricsCollector ============|=========> MultiLevelReducibleCollector
      |           |                     |                         |
      |           V                     |                         |
      |       XMetrics                  |                         V
      |  XMetricsArgumentCollection     |             XMetricsPerUnitCollector
       ---------------------------------
                       ^
                       |
                       |
            XMetricsCollectorSpark
                 ^           ^
                /             \
               /               \
      CollectXMetricsSpark   CollectMultipleMetricsSpark

    The general lifecycle of a Spark collector (XMetricsCollectorSpark in the diagram above) looks like this:

      CollectorType collector = new CollectorType();
      CollectorArgType args = // get metric-specific input arguments
      // NOTE: getDefaultReadFilters is called before the collector's initialize
      // method is called, so the read filters cannot access argument values
      ReadFilter filter = collector.getDefaultReadFilters();
      // pass the input arguments to the collector for initialization
      collector.initialize(args, defaultMetricsHeaders);
      collector.collectMetrics(getReads().filter(filter), samFileHeader);
      collector.saveMetrics(getReadSourceName());
    Base class for standalone Spark metrics collector tools.
    Charts quality score distribution within a BAM file.
    QualityYieldMetricsCollector for Spark.
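
The factoring described above (an XMetrics class holding the aggregate values, plus an XMetricsCollector that processes a single read and has a reduce/combiner) is what lets one collector serve both a sequential walker and a partitioned Spark job. The following is a minimal, self-contained sketch of that idea in plain Java, assuming reads are represented as base strings and Spark partitions as lists. ReadCountMetrics, ReadCountCollector, acceptRead and combine are hypothetical names for illustration only; they are not GATK classes or methods.

```java
import java.util.List;

/**
 * Hypothetical sketch of the XMetrics / XMetricsCollector factoring.
 * None of these names are real GATK APIs; reads are plain base strings.
 */
public class CollectorFactoringSketch {

    /** Stand-in for XMetrics: the aggregate values being collected. */
    static final class ReadCountMetrics {
        long totalReads;
        long totalBases;
    }

    /** Stand-in for XMetricsCollector: processes one read at a time and can be combined. */
    static final class ReadCountCollector {
        private final ReadCountMetrics metrics = new ReadCountMetrics();

        // Process a single read.
        void acceptRead(String bases) {
            metrics.totalReads++;
            metrics.totalBases += bases.length();
        }

        // Reduce/combine step: merge another partial collector's totals into this one.
        ReadCountCollector combine(ReadCountCollector other) {
            metrics.totalReads += other.metrics.totalReads;
            metrics.totalBases += other.metrics.totalBases;
            return this;
        }

        ReadCountMetrics getMetrics() {
            return metrics;
        }
    }

    /** Walker-style use: one collector sees every read sequentially. */
    static ReadCountMetrics collectSequentially(List<String> reads) {
        ReadCountCollector collector = new ReadCountCollector();
        reads.forEach(collector::acceptRead);
        return collector.getMetrics();
    }

    /** Spark-style use (simulated): one collector per partition, then partials are combined. */
    static ReadCountMetrics collectPartitioned(List<List<String>> partitions) {
        ReadCountCollector combined = new ReadCountCollector();
        for (List<String> partition : partitions) {
            ReadCountCollector partial = new ReadCountCollector();
            partition.forEach(partial::acceptRead);
            combined.combine(partial);
        }
        return combined.getMetrics();
    }

    public static void main(String[] args) {
        ReadCountMetrics seq = collectSequentially(List.of("ACGT", "GG", "TTTAA"));
        ReadCountMetrics par = collectPartitioned(List.of(List.of("ACGT"), List.of("GG", "TTTAA")));
        System.out.println(seq.totalReads + " " + seq.totalBases); // prints "3 11"
        System.out.println(par.totalReads + " " + par.totalBases); // prints "3 11"
    }
}
```

Both paths produce the same totals because the per-read logic lives only in acceptRead; the Spark path adds nothing but the combine step, which is why the combiner is a required part of the collector.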
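
The Spark collector lifecycle above obtains the read filter before initialize is called, so a filter cannot depend on the metric-specific arguments. The sketch below illustrates that ordering with plain Java collections standing in for an RDD. The method names getDefaultReadFilters, initialize, collectMetrics and saveMetrics follow the lifecycle text, but the simplified signatures (a single Predicate, no headers) and the MinLengthArgs/MinLengthCollector classes are assumptions for illustration, not the real MetricsCollectorSpark interface.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of the collector lifecycle ordering; lists stand in for an RDD
 * and signatures are simplified assumptions, not the real GATK API.
 */
public class CollectorLifecycleSketch {

    /** Hypothetical argument collection: minimum read length to count. */
    static final class MinLengthArgs {
        final int minLength;
        MinLengthArgs(int minLength) { this.minLength = minLength; }
    }

    /** Hypothetical collector that counts reads at or above a length threshold. */
    static final class MinLengthCollector {
        private MinLengthArgs args;  // still null when getDefaultReadFilters() is called
        private long count;
        private String savedTo;

        // Called BEFORE initialize(), so this filter must not read args.
        Predicate<String> getDefaultReadFilters() {
            return read -> !read.isEmpty();
        }

        void initialize(MinLengthArgs args) {
            this.args = args;
        }

        // Arguments are available here because initialize() has already run.
        void collectMetrics(List<String> filteredReads) {
            count = filteredReads.stream().filter(r -> r.length() >= args.minLength).count();
        }

        // A real tool would write a metrics file; this sketch only records the name.
        void saveMetrics(String readSourceName) {
            savedTo = readSourceName;
        }

        long getCount() { return count; }
        String getSavedTo() { return savedTo; }
    }

    /** Drives the collector in the order given by the lifecycle description. */
    static MinLengthCollector run(List<String> reads, MinLengthArgs args, String sourceName) {
        MinLengthCollector collector = new MinLengthCollector();
        Predicate<String> filter = collector.getDefaultReadFilters(); // before initialize
        collector.initialize(args);
        List<String> filtered = reads.stream().filter(filter).collect(Collectors.toList());
        collector.collectMetrics(filtered);
        collector.saveMetrics(sourceName);
        return collector;
    }

    public static void main(String[] argv) {
        MinLengthCollector c = run(List.of("ACGT", "", "GG", "TTTAA"), new MinLengthArgs(3), "sample.bam");
        System.out.println(c.getCount() + " reads saved to " + c.getSavedTo()); // prints "2 reads saved to sample.bam"
    }
}
```

Keeping argument-dependent logic out of the filter and inside collectMetrics (or initialize) is the design consequence of the NOTE in the lifecycle: anything the filter touches must be fixed at construction time.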