Class CreateReadCountPanelOfNormals
- All Implemented Interfaces:
Serializable, org.broadinstitute.barclay.argparser.CommandLinePluginProvider

Creates a panel of normals (PoN) for read-count denoising given the read counts for samples in the panel. The resulting panel of normals can be used with DenoiseReadCounts to denoise other samples.
The input read counts are first transformed to log2 fractional coverages and preprocessed
according to specified filtering and imputation parameters. Singular value decomposition (SVD)
is then performed to find the first number-of-eigensamples
principal components,
which are stored in the PoN. Some or all of these principal components can then be used for
denoising case samples with DenoiseReadCounts
; it is assumed that the principal components used
represent systematic sequencing biases (rather than statistical noise). Examining the singular values,
which are also stored in the PoN, may be useful in determining the appropriate number
of principal components to use for denoising.
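For intuition, the core transform-center-SVD step can be sketched in a few lines of Python with NumPy. This is an illustrative sketch only, not the tool's implementation: the filtering, imputation, and GC-correction steps described in this document are omitted, and the function and variable names are hypothetical.

    import numpy as np

    def conceptual_pon(counts, num_eigensamples=20):
        # counts: (num_samples x num_intervals) array of integer read counts.
        # Transform to log2 fractional coverage (each sample normalized to sum to 1).
        fractional = counts / counts.sum(axis=1, keepdims=True)
        log2_coverage = np.log2(np.maximum(fractional, 1e-10))  # guard against zero counts

        # Center each interval by its median across the panel samples.
        interval_medians = np.median(log2_coverage, axis=0)
        centered = log2_coverage - interval_medians

        # SVD: rows of vt are principal components ("eigensamples") in interval space.
        u, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

        # A PoN conceptually stores the medians, singular values, and first k components.
        k = min(num_eigensamples, len(singular_values))
        return interval_medians, singular_values, vt[:k]

Examining the returned singular values (for example, looking for the point at which they level off) is one way to choose how many components to retain.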
If annotated intervals are provided, explicit GC-bias correction will be performed by GCBiasCorrector
before filtering and SVD. GC-content information for the intervals will be stored in the PoN
and used to perform explicit GC-bias correction identically in DenoiseReadCounts
.
Note that if annotated intervals are not provided, it is still likely that GC-bias correction is
implicitly performed by the SVD denoising process (i.e., some of the principal components arise from GC bias).
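As a rough illustration of what an explicit GC-bias correction can look like, the following sketch shifts the intervals in each GC bin so that their median log2 coverage matches the global median. This is a simplified, hypothetical stand-in and is not guaranteed to match the behavior of GCBiasCorrector.

    import numpy as np

    def binned_median_gc_correction(log2_coverage, gc_content, num_bins=101):
        # log2_coverage: per-interval log2 fractional coverage for one sample.
        # gc_content: per-interval GC fraction in [0, 1], e.g. from annotated intervals.
        log2_coverage = np.asarray(log2_coverage, dtype=float)
        gc = np.asarray(gc_content, dtype=float)
        bins = np.minimum((gc * (num_bins - 1)).astype(int), num_bins - 1)
        corrected = log2_coverage.copy()
        global_median = np.median(log2_coverage)
        for b in range(num_bins):
            mask = bins == b
            if mask.any():
                # Shift this GC bin so its median matches the overall median.
                corrected[mask] += global_median - np.median(log2_coverage[mask])
        return corrected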
Note that such SVD denoising cannot distinguish between variance due to systematic sequencing biases and that
due to true common germline CNVs present in the panel; signal from the latter may thus be inadvertently denoised
away. Furthermore, variance arising from coverage on the sex chromosomes may also significantly contribute
to the principal components if the panel contains samples of mixed sex. Therefore, if sex chromosomes
are not excluded from coverage collection, it is strongly recommended that users avoid creating panels of
mixed sex and take care to denoise case samples only with panels containing only individuals of the same sex
as the case samples. (See GermlineCNVCaller
, which avoids these issues by simultaneously learning
a probabilistic model for systematic bias and calling rare and common germline CNVs for samples in the panel.)
Inputs
- Counts files (TSV or HDF5 output of CollectReadCounts).
- (Optional) GC-content annotated-intervals file from AnnotateIntervals. Explicit GC-bias correction will be performed on the panel samples and identically for subsequent case samples.
Outputs
- Panel-of-normals file. This is an HDF5 file containing the panel data in the paths defined in HDF5SVDReadCountPanelOfNormals. HDF5 files may be viewed using hdfview or loaded in Python using PyTables or h5py.
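For example, the datasets stored in the output PoN can be listed from Python with h5py; the paths are those defined by HDF5SVDReadCountPanelOfNormals, so this sketch discovers them at runtime rather than hard-coding any names (the filename matches the usage examples below).

    import h5py

    # Print every dataset path in the panel of normals, along with its shape.
    with h5py.File("cnv.pon.hdf5", "r") as pon:
        pon.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))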
Usage examples

    gatk CreateReadCountPanelOfNormals \
        -I sample_1.counts.hdf5 \
        -I sample_2.counts.hdf5 \
        ... \
        -O cnv.pon.hdf5

    gatk CreateReadCountPanelOfNormals \
        -I sample_1.counts.hdf5 \
        -I sample_2.counts.tsv \
        ... \
        --annotated-intervals annotated_intervals.tsv \
        -O cnv.pon.hdf5
-
Nested Class Summary
Nested classes/interfaces inherited from class org.broadinstitute.hellbender.cmdline.CommandLineProgram
CommandLineProgram.AutoCloseableNoCheckedExceptions
-
Field Summary
Fields
static final String MINIMUM_INTERVAL_MEDIAN_PERCENTILE_LONG_NAME
static final String MAXIMUM_ZEROS_IN_SAMPLE_PERCENTAGE_LONG_NAME
static final String MAXIMUM_ZEROS_IN_INTERVAL_PERCENTAGE_LONG_NAME
static final String EXTREME_SAMPLE_MEDIAN_PERCENTILE_LONG_NAME
static final String IMPUTE_ZEROS_LONG_NAME
static final String EXTREME_OUTLIER_TRUNCATION_PERCENTILE_LONG_NAME
static final String MAXIMUM_CHUNK_SIZE
Fields inherited from class org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram
programName, SPARK_PROGRAM_NAME_LONG_NAME, sparkArgs
Fields inherited from class org.broadinstitute.hellbender.cmdline.CommandLineProgram
GATK_CONFIG_FILE, logger, NIO_MAX_REOPENS, NIO_PROJECT_FOR_REQUESTER_PAYS, QUIET, specialArgumentsCollection, tmpDir, useJdkDeflater, useJdkInflater, VERBOSITY
-
Constructor Summary
Constructors
CreateReadCountPanelOfNormals()
Method Summary
protected void runPipeline(org.apache.spark.api.java.JavaSparkContext ctx)
    Runs the pipeline.
Methods inherited from class org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram
afterPipeline, doWork, getProgramName
Methods inherited from class org.broadinstitute.hellbender.cmdline.CommandLineProgram
customCommandLineValidation, getCommandLine, getCommandLineParser, getDefaultHeaders, getMetricsFile, getPluginDescriptors, getSupportInformation, getToolkitName, getToolkitShortName, getToolStatusWarning, getUsage, getVersion, instanceMain, instanceMainPostParseArgs, isBetaFeature, isExperimentalFeature, onShutdown, onStartup, parseArgs, printLibraryVersions, printSettings, printStartupMessage, runTool, setDefaultHeaders, warnOnToolStatus
-
Field Details
- MINIMUM_INTERVAL_MEDIAN_PERCENTILE_LONG_NAME
- MAXIMUM_ZEROS_IN_SAMPLE_PERCENTAGE_LONG_NAME
- MAXIMUM_ZEROS_IN_INTERVAL_PERCENTAGE_LONG_NAME
- EXTREME_SAMPLE_MEDIAN_PERCENTILE_LONG_NAME
- IMPUTE_ZEROS_LONG_NAME
- EXTREME_OUTLIER_TRUNCATION_PERCENTILE_LONG_NAME
- MAXIMUM_CHUNK_SIZE
Constructor Details
- CreateReadCountPanelOfNormals
  public CreateReadCountPanelOfNormals()
Method Details
- runPipeline
  protected void runPipeline(org.apache.spark.api.java.JavaSparkContext ctx)
  Description copied from class: SparkCommandLineProgram
  Runs the pipeline.
  Specified by: runPipeline in class SparkCommandLineProgram