Class CreateReadCountPanelOfNormals
- All Implemented Interfaces:
Serializable, org.broadinstitute.barclay.argparser.CommandLinePluginProvider

Creates a panel of normals (PoN) for read-count denoising given the read counts for samples in the panel. The resulting panel of normals can be used with DenoiseReadCounts to denoise other samples.
The input read counts are first transformed to log2 fractional coverages and preprocessed
according to specified filtering and imputation parameters. Singular value decomposition (SVD)
is then performed to find the first number-of-eigensamples
principal components,
which are stored in the PoN. Some or all of these principal components can then be used for
denoising case samples with DenoiseReadCounts
; it is assumed that the principal components used
represent systematic sequencing biases (rather than statistical noise). Examining the singular values,
which are also stored in the PoN, may be useful in determining the appropriate number
of principal components to use for denoising.
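For intuition, the core transform-center-SVD step can be sketched in a few lines of Python with NumPy. This is an illustrative sketch only, not the tool's implementation: the filtering, imputation, and GC-correction steps described in this document are omitted, and the function and variable names are hypothetical.

    import numpy as np

    def conceptual_pon(counts, num_eigensamples=20):
        # counts: (num_samples x num_intervals) array of integer read counts.
        # Transform to log2 fractional coverage (each sample normalized to sum to 1).
        fractional = counts / counts.sum(axis=1, keepdims=True)
        log2_coverage = np.log2(np.maximum(fractional, 1e-10))  # guard against zero counts

        # Center each interval by its median across the panel samples.
        interval_medians = np.median(log2_coverage, axis=0)
        centered = log2_coverage - interval_medians

        # SVD: rows of vt are principal components ("eigensamples") in interval space.
        u, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

        # A PoN conceptually stores the medians, singular values, and first k components.
        k = min(num_eigensamples, len(singular_values))
        return interval_medians, singular_values, vt[:k]

Examining the returned singular values (for example, looking for the point at which they level off) is one way to choose how many components to retain.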
If annotated intervals are provided, explicit GC-bias correction will be performed by GCBiasCorrector
before filtering and SVD. GC-content information for the intervals will be stored in the PoN
and used to perform explicit GC-bias correction identically in DenoiseReadCounts
.
Note that if annotated intervals are not provided, it is still likely that GC-bias correction is
implicitly performed by the SVD denoising process (i.e., some of the principal components arise from GC bias).
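As a rough illustration of what an explicit GC-bias correction can look like, the following sketch shifts the intervals in each GC bin so that their median log2 coverage matches the global median. This is a simplified, hypothetical stand-in and is not guaranteed to match the behavior of GCBiasCorrector.

    import numpy as np

    def binned_median_gc_correction(log2_coverage, gc_content, num_bins=101):
        # log2_coverage: per-interval log2 fractional coverage for one sample.
        # gc_content: per-interval GC fraction in [0, 1], e.g. from annotated intervals.
        log2_coverage = np.asarray(log2_coverage, dtype=float)
        gc = np.asarray(gc_content, dtype=float)
        bins = np.minimum((gc * (num_bins - 1)).astype(int), num_bins - 1)
        corrected = log2_coverage.copy()
        global_median = np.median(log2_coverage)
        for b in range(num_bins):
            mask = bins == b
            if mask.any():
                # Shift this GC bin so its median matches the overall median.
                corrected[mask] += global_median - np.median(log2_coverage[mask])
        return corrected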
Note that such SVD denoising cannot distinguish between variance due to systematic sequencing biases and that
due to true common germline CNVs present in the panel; signal from the latter may thus be inadvertently denoised
away. Furthermore, variance arising from coverage on the sex chromosomes may also significantly contribute
to the principal components if the panel contains samples of mixed sex. Therefore, if sex chromosomes
are not excluded from coverage collection, it is strongly recommended that users avoid creating panels of
mixed sex and take care to denoise case samples only with panels containing only individuals of the same sex
as the case samples. (See GermlineCNVCaller
, which avoids these issues by simultaneously learning
a probabilistic model for systematic bias and calling rare and common germline CNVs for samples in the panel.)
Inputs
- Counts files (TSV or HDF5 output of CollectReadCounts).
- (Optional) GC-content annotated-intervals file from AnnotateIntervals. Explicit GC-bias correction will be performed on the panel samples and identically for subsequent case samples.
Outputs
- Panel-of-normals file. This is an HDF5 file containing the panel data in the paths defined in HDF5SVDReadCountPanelOfNormals. HDF5 files may be viewed using hdfview or loaded in Python using PyTables or h5py.
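For example, the datasets stored in the output PoN can be listed from Python with h5py; the paths are those defined by HDF5SVDReadCountPanelOfNormals, so this sketch discovers them at runtime rather than hard-coding any names (the filename matches the usage examples below).

    import h5py

    # Print every dataset path in the panel of normals, along with its shape.
    with h5py.File("cnv.pon.hdf5", "r") as pon:
        pon.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))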
Usage examples

    gatk CreateReadCountPanelOfNormals \
        -I sample_1.counts.hdf5 \
        -I sample_2.counts.hdf5 \
        ... \
        -O cnv.pon.hdf5

    gatk CreateReadCountPanelOfNormals \
        -I sample_1.counts.hdf5 \
        -I sample_2.counts.tsv \
        ... \
        --annotated-intervals annotated_intervals.tsv \
        -O cnv.pon.hdf5
-
Nested Class Summary
Nested classes/interfaces inherited from class org.broadinstitute.hellbender.cmdline.CommandLineProgram
CommandLineProgram.AutoCloseableNoCheckedExceptions
-
Field Summary
Fields
static final String MINIMUM_INTERVAL_MEDIAN_PERCENTILE_LONG_NAME
static final String MAXIMUM_ZEROS_IN_SAMPLE_PERCENTAGE_LONG_NAME
static final String MAXIMUM_ZEROS_IN_INTERVAL_PERCENTAGE_LONG_NAME
static final String EXTREME_SAMPLE_MEDIAN_PERCENTILE_LONG_NAME
static final String IMPUTE_ZEROS_LONG_NAME
static final String EXTREME_OUTLIER_TRUNCATION_PERCENTILE_LONG_NAME
static final String MAXIMUM_CHUNK_SIZE
Fields inherited from class org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram
programName, SPARK_PROGRAM_NAME_LONG_NAME, sparkArgs
Fields inherited from class org.broadinstitute.hellbender.cmdline.CommandLineProgram
GATK_CONFIG_FILE, logger, NIO_MAX_REOPENS, NIO_PROJECT_FOR_REQUESTER_PAYS, QUIET, specialArgumentsCollection, tmpDir, useJdkDeflater, useJdkInflater, VERBOSITY
-
Constructor Summary
Constructors
CreateReadCountPanelOfNormals()
Method Summary
protected void runPipeline(org.apache.spark.api.java.JavaSparkContext ctx)
    Runs the pipeline.
Methods inherited from class org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram
afterPipeline, doWork, getProgramName
Methods inherited from class org.broadinstitute.hellbender.cmdline.CommandLineProgram
customCommandLineValidation, getCommandLine, getCommandLineParser, getDefaultHeaders, getMetricsFile, getPluginDescriptors, getSupportInformation, getToolkitName, getToolkitShortName, getToolStatusWarning, getUsage, getVersion, instanceMain, instanceMainPostParseArgs, isBetaFeature, isExperimentalFeature, onShutdown, onStartup, parseArgs, printLibraryVersions, printSettings, printStartupMessage, runTool, setDefaultHeaders, warnOnToolStatus
-
Field Details
- MINIMUM_INTERVAL_MEDIAN_PERCENTILE_LONG_NAME
- MAXIMUM_ZEROS_IN_SAMPLE_PERCENTAGE_LONG_NAME
- MAXIMUM_ZEROS_IN_INTERVAL_PERCENTAGE_LONG_NAME
- EXTREME_SAMPLE_MEDIAN_PERCENTILE_LONG_NAME
- IMPUTE_ZEROS_LONG_NAME
- EXTREME_OUTLIER_TRUNCATION_PERCENTILE_LONG_NAME
- MAXIMUM_CHUNK_SIZE
Constructor Details
- CreateReadCountPanelOfNormals
  public CreateReadCountPanelOfNormals()
Method Details
- runPipeline
  protected void runPipeline(org.apache.spark.api.java.JavaSparkContext ctx)
  Description copied from class: SparkCommandLineProgram
  Runs the pipeline.
  Specified by: runPipeline in class SparkCommandLineProgram