@DocumentedFeature public final class CreateReadCountPanelOfNormals extends SparkCommandLineProgram
Creates a panel of normals (PoN) from read counts for the samples in the panel. The resulting PoN can be used with DenoiseReadCounts to denoise other samples.
The input read counts are first transformed to log2 fractional coverages and preprocessed according to the specified filtering and imputation parameters. Singular value decomposition (SVD) is then performed to find the first number-of-eigensamples principal components, which are stored in the PoN. Some or all of these principal components can then be used to denoise case samples with DenoiseReadCounts; it is assumed that the principal components used represent systematic sequencing biases (rather than statistical noise). Examining the singular values, which are also stored in the PoN, may be useful in determining the appropriate number of principal components to use for denoising.
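The SVD step described above can be sketched as follows. This is an illustrative outline in numpy, not the actual GATK implementation; the function name, parameter names, and the exact normalization and centering choices here are invented for illustration:

```python
import numpy as np

def build_pon_and_denoise(panel_counts, case_counts, num_eigensamples=5):
    """Illustrative outline of SVD denoising; not the actual GATK implementation.

    panel_counts: (num_samples, num_intervals) array of read counts.
    case_counts:  (num_intervals,) array of read counts for one case sample.
    Returns the denoised log2 copy ratios and the panel's singular values.
    """
    eps = 1e-10
    # Transform to log2 fractional coverage (counts normalized per sample).
    panel = np.log2(panel_counts / panel_counts.sum(axis=1, keepdims=True) + eps)
    # Center each interval at its panel median.
    medians = np.median(panel, axis=0)
    centered = panel - medians
    # SVD: rows of vt are principal components over the intervals.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:num_eigensamples]
    # Denoise the case sample by removing its projection onto the components.
    case = np.log2(case_counts / case_counts.sum() + eps) - medians
    denoised = case - components.T @ (components @ case)
    return denoised, singular_values
```

Inspecting `singular_values` (which decay from largest to smallest) is the analogue of examining the singular values stored in the PoN when choosing how many components to keep.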
If annotated intervals are provided, explicit GC-bias correction will be performed by GCBiasCorrector before filtering and SVD. GC-content information for the intervals will be stored in the PoN and used to perform identical explicit GC-bias correction in DenoiseReadCounts.
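The core idea behind GC-bias correction can be sketched by binning intervals on GC content and normalizing each bin's median coverage. This is a deliberate simplification for illustration, not the algorithm used by GCBiasCorrector, and the function and parameter names are invented:

```python
import numpy as np

def gc_bias_correct(log2_coverage, gc_content, num_bins=20):
    """Simplified sketch of GC-bias correction; not GATK's GCBiasCorrector.

    log2_coverage: (num_intervals,) log2 fractional coverage of one sample.
    gc_content:    (num_intervals,) GC fraction in [0, 1] for each interval.
    """
    bins = np.clip((gc_content * num_bins).astype(int), 0, num_bins - 1)
    corrected = log2_coverage.copy()
    global_median = np.median(log2_coverage)
    for b in range(num_bins):
        mask = bins == b
        if mask.any():
            # Shift each GC bin so its median matches the global median.
            corrected[mask] += global_median - np.median(log2_coverage[mask])
    return corrected
```

After correction, coverage no longer trends with GC content, which is the property the explicit correction aims for before filtering and SVD.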
Note that if annotated intervals are not provided, it is still likely that GC-bias correction is
implicitly performed by the SVD denoising process (i.e., some of the principal components arise from GC bias).
Note that such SVD denoising cannot distinguish between variance due to systematic sequencing biases and that
due to true common germline CNVs present in the panel; signal from the latter may thus be inadvertently denoised
away. Furthermore, variance arising from coverage on the sex chromosomes may also contribute significantly to the principal components if the panel contains samples of mixed sex. Therefore, if sex chromosomes are not excluded from coverage collection, it is strongly recommended to avoid creating panels of mixed sex and to denoise case samples only with panels composed of individuals of the same sex as the case samples. (See GermlineCNVCaller, which avoids these issues by simultaneously learning a probabilistic model for systematic bias and calling rare and common germline CNVs for the samples in the panel.)
Inputs:

- Counts files (output of CollectReadCounts).
- (Optional) GC-content annotated-intervals file from AnnotateIntervals. Explicit GC-bias correction will be performed on the panel samples and identically for subsequent case samples.

Output:

- Panel-of-normals file (an HDF5SVDReadCountPanelOfNormals). HDF5 files may be viewed using hdfview or loaded in python using PyTables or h5py.
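As a quick way to explore such HDF5 files with h5py, one might list every dataset with its shape. The toy file and dataset path below are invented for illustration and do not reflect the PoN's actual internal layout:

```python
import h5py
import numpy as np

# Write a toy HDF5 file, then inspect it the way one might inspect a PoN.
# The path "panel/singular_values" is illustrative only.
with h5py.File("toy.pon.hdf5", "w") as f:
    f.create_dataset("panel/singular_values", data=np.array([3.0, 1.5, 0.2]))

def list_datasets(path):
    """Return {dataset_name: shape} for every dataset in an HDF5 file."""
    found = {}
    with h5py.File(path, "r") as f:
        f.visititems(lambda name, obj: found.update({name: obj.shape})
                     if isinstance(obj, h5py.Dataset) else None)
    return found

print(list_datasets("toy.pon.hdf5"))
```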
gatk CreateReadCountPanelOfNormals \
    -I sample_1.counts.hdf5 \
    -I sample_2.counts.hdf5 \
    ... \
    -O cnv.pon.hdf5

gatk CreateReadCountPanelOfNormals \
    -I sample_1.counts.hdf5 \
    -I sample_2.counts.tsv \
    ... \
    --annotated-intervals annotated_intervals.tsv \
    -O cnv.pon.hdf5
Modifier and Type | Field and Description |
---|---|
static java.lang.String | EXTREME_OUTLIER_TRUNCATION_PERCENTILE_LONG_NAME |
static java.lang.String | EXTREME_SAMPLE_MEDIAN_PERCENTILE_LONG_NAME |
static java.lang.String | IMPUTE_ZEROS_LONG_NAME |
static java.lang.String | MAXIMUM_CHUNK_SIZE |
static java.lang.String | MAXIMUM_ZEROS_IN_INTERVAL_PERCENTAGE_LONG_NAME |
static java.lang.String | MAXIMUM_ZEROS_IN_SAMPLE_PERCENTAGE_LONG_NAME |
static java.lang.String | MINIMUM_INTERVAL_MEDIAN_PERCENTILE_LONG_NAME |
Fields inherited from class SparkCommandLineProgram: programName, SPARK_PROGRAM_NAME_LONG_NAME, sparkArgs

Fields inherited from class CommandLineProgram: GATK_CONFIG_FILE, logger, NIO_MAX_REOPENS, NIO_PROJECT_FOR_REQUESTER_PAYS, QUIET, specialArgumentsCollection, tmpDir, useJdkDeflater, useJdkInflater, VERBOSITY
Constructor and Description |
---|
CreateReadCountPanelOfNormals() |
Modifier and Type | Method and Description |
---|---|
protected void | runPipeline(org.apache.spark.api.java.JavaSparkContext ctx): Runs the pipeline. |
Methods inherited from class SparkCommandLineProgram: afterPipeline, doWork, getProgramName

Methods inherited from class CommandLineProgram: customCommandLineValidation, getCommandLine, getCommandLineParser, getDefaultHeaders, getMetricsFile, getPluginDescriptors, getSupportInformation, getToolkitName, getToolkitShortName, getToolStatusWarning, getUsage, getVersion, instanceMain, instanceMainPostParseArgs, isBetaFeature, isExperimentalFeature, onShutdown, onStartup, parseArgs, printLibraryVersions, printSettings, printStartupMessage, runTool, setDefaultHeaders, warnOnToolStatus
public static final java.lang.String MINIMUM_INTERVAL_MEDIAN_PERCENTILE_LONG_NAME
public static final java.lang.String MAXIMUM_ZEROS_IN_SAMPLE_PERCENTAGE_LONG_NAME
public static final java.lang.String MAXIMUM_ZEROS_IN_INTERVAL_PERCENTAGE_LONG_NAME
public static final java.lang.String EXTREME_SAMPLE_MEDIAN_PERCENTILE_LONG_NAME
public static final java.lang.String IMPUTE_ZEROS_LONG_NAME
public static final java.lang.String EXTREME_OUTLIER_TRUNCATION_PERCENTILE_LONG_NAME
public static final java.lang.String MAXIMUM_CHUNK_SIZE
protected void runPipeline(org.apache.spark.api.java.JavaSparkContext ctx)
Runs the pipeline.

Overrides: runPipeline in class SparkCommandLineProgram