@DocumentedFeature public final class CreateReadCountPanelOfNormals extends SparkCommandLineProgram
Creates a panel of normals (PoN) from read counts for the samples in the panel. The resulting PoN can be used with DenoiseReadCounts to denoise other samples.
The input read counts are first transformed to log2 fractional coverages and preprocessed according to the specified filtering and imputation parameters. Singular value decomposition (SVD) is then performed to find the first number-of-eigensamples principal components, which are stored in the PoN. Some or all of these principal components can then be used to denoise case samples with DenoiseReadCounts; it is assumed that the principal components used represent systematic sequencing biases (rather than statistical noise). Examining the singular values, which are also stored in the PoN, may be useful in determining the appropriate number of principal components to use for denoising.
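The SVD step described above can be sketched as follows. This is an illustrative outline in numpy, not the actual GATK implementation; the function name, parameter names, and the exact normalization and centering choices here are invented for illustration:

```python
import numpy as np

def build_pon_and_denoise(panel_counts, case_counts, num_eigensamples=5):
    """Illustrative outline of SVD denoising; not the actual GATK implementation.

    panel_counts: (num_samples, num_intervals) array of read counts.
    case_counts:  (num_intervals,) array of read counts for one case sample.
    Returns the denoised log2 copy ratios and the panel's singular values.
    """
    eps = 1e-10
    # Transform to log2 fractional coverage (counts normalized per sample).
    panel = np.log2(panel_counts / panel_counts.sum(axis=1, keepdims=True) + eps)
    # Center each interval at its panel median.
    medians = np.median(panel, axis=0)
    centered = panel - medians
    # SVD: rows of vt are principal components over the intervals.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:num_eigensamples]
    # Denoise the case sample by removing its projection onto the components.
    case = np.log2(case_counts / case_counts.sum() + eps) - medians
    denoised = case - components.T @ (components @ case)
    return denoised, singular_values
```

Inspecting `singular_values` (which decay from largest to smallest) is the analogue of examining the singular values stored in the PoN when choosing how many components to keep.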
If annotated intervals are provided, explicit GC-bias correction will be performed by GCBiasCorrector before filtering and SVD. GC-content information for the intervals will be stored in the PoN and used to perform identical explicit GC-bias correction in DenoiseReadCounts.
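The core idea behind GC-bias correction can be sketched by binning intervals on GC content and normalizing each bin's median coverage. This is a deliberate simplification for illustration, not the algorithm used by GCBiasCorrector, and the function and parameter names are invented:

```python
import numpy as np

def gc_bias_correct(log2_coverage, gc_content, num_bins=20):
    """Simplified sketch of GC-bias correction; not GATK's GCBiasCorrector.

    log2_coverage: (num_intervals,) log2 fractional coverage of one sample.
    gc_content:    (num_intervals,) GC fraction in [0, 1] for each interval.
    """
    bins = np.clip((gc_content * num_bins).astype(int), 0, num_bins - 1)
    corrected = log2_coverage.copy()
    global_median = np.median(log2_coverage)
    for b in range(num_bins):
        mask = bins == b
        if mask.any():
            # Shift each GC bin so its median matches the global median.
            corrected[mask] += global_median - np.median(log2_coverage[mask])
    return corrected
```

After correction, coverage no longer trends with GC content, which is the property the explicit correction aims for before filtering and SVD.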
Note that if annotated intervals are not provided, it is still likely that GC-bias correction is
implicitly performed by the SVD denoising process (i.e., some of the principal components arise from GC bias).
Note that such SVD denoising cannot distinguish between variance due to systematic sequencing biases and that
due to true common germline CNVs present in the panel; signal from the latter may thus be inadvertently denoised
away. Furthermore, variance arising from coverage on the sex chromosomes may also contribute significantly to the principal components if the panel contains samples of mixed sex. Therefore, if sex chromosomes are not excluded from coverage collection, it is strongly recommended to avoid creating panels of mixed sex and to denoise case samples only with panels composed of individuals of the same sex as the case samples. (See GermlineCNVCaller, which avoids these issues by simultaneously learning a probabilistic model for systematic bias and calling rare and common germline CNVs for the samples in the panel.)
Inputs:

- Counts files (output of CollectReadCounts).
- (Optional) GC-content annotated-intervals file from AnnotateIntervals. Explicit GC-bias correction will be performed on the panel samples and identically for subsequent case samples.

Output:

- Panel-of-normals file (an HDF5SVDReadCountPanelOfNormals). HDF5 files may be viewed using hdfview or loaded in python using PyTables or h5py.
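As a quick way to explore such HDF5 files with h5py, one might list every dataset with its shape. The toy file and dataset path below are invented for illustration and do not reflect the PoN's actual internal layout:

```python
import h5py
import numpy as np

# Write a toy HDF5 file, then inspect it the way one might inspect a PoN.
# The path "panel/singular_values" is illustrative only.
with h5py.File("toy.pon.hdf5", "w") as f:
    f.create_dataset("panel/singular_values", data=np.array([3.0, 1.5, 0.2]))

def list_datasets(path):
    """Return {dataset_name: shape} for every dataset in an HDF5 file."""
    found = {}
    with h5py.File(path, "r") as f:
        f.visititems(lambda name, obj: found.update({name: obj.shape})
                     if isinstance(obj, h5py.Dataset) else None)
    return found

print(list_datasets("toy.pon.hdf5"))
```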
gatk CreateReadCountPanelOfNormals \
    -I sample_1.counts.hdf5 \
    -I sample_2.counts.hdf5 \
    ... \
    -O cnv.pon.hdf5

gatk CreateReadCountPanelOfNormals \
    -I sample_1.counts.hdf5 \
    -I sample_2.counts.tsv \
    ... \
    --annotated-intervals annotated_intervals.tsv \
    -O cnv.pon.hdf5
Modifier and Type | Field and Description |
---|---|
static java.lang.String | EXTREME_OUTLIER_TRUNCATION_PERCENTILE_LONG_NAME |
static java.lang.String | EXTREME_SAMPLE_MEDIAN_PERCENTILE_LONG_NAME |
static java.lang.String | IMPUTE_ZEROS_LONG_NAME |
static java.lang.String | MAXIMUM_CHUNK_SIZE |
static java.lang.String | MAXIMUM_ZEROS_IN_INTERVAL_PERCENTAGE_LONG_NAME |
static java.lang.String | MAXIMUM_ZEROS_IN_SAMPLE_PERCENTAGE_LONG_NAME |
static java.lang.String | MINIMUM_INTERVAL_MEDIAN_PERCENTILE_LONG_NAME |
Fields inherited from class SparkCommandLineProgram: programName, SPARK_PROGRAM_NAME_LONG_NAME, sparkArgs

Fields inherited from class CommandLineProgram: GATK_CONFIG_FILE, logger, NIO_MAX_REOPENS, NIO_PROJECT_FOR_REQUESTER_PAYS, QUIET, specialArgumentsCollection, tmpDir, useJdkDeflater, useJdkInflater, VERBOSITY
Constructor and Description |
---|
CreateReadCountPanelOfNormals() |
Modifier and Type | Method and Description |
---|---|
protected void | runPipeline(org.apache.spark.api.java.JavaSparkContext ctx): Runs the pipeline. |
Methods inherited from class SparkCommandLineProgram: afterPipeline, doWork, getProgramName

Methods inherited from class CommandLineProgram: customCommandLineValidation, getCommandLine, getCommandLineParser, getDefaultHeaders, getMetricsFile, getPluginDescriptors, getSupportInformation, getToolkitName, getToolkitShortName, getToolStatusWarning, getUsage, getVersion, instanceMain, instanceMainPostParseArgs, isBetaFeature, isExperimentalFeature, onShutdown, onStartup, parseArgs, printLibraryVersions, printSettings, printStartupMessage, runTool, setDefaultHeaders, warnOnToolStatus
public static final java.lang.String MINIMUM_INTERVAL_MEDIAN_PERCENTILE_LONG_NAME
public static final java.lang.String MAXIMUM_ZEROS_IN_SAMPLE_PERCENTAGE_LONG_NAME
public static final java.lang.String MAXIMUM_ZEROS_IN_INTERVAL_PERCENTAGE_LONG_NAME
public static final java.lang.String EXTREME_SAMPLE_MEDIAN_PERCENTILE_LONG_NAME
public static final java.lang.String IMPUTE_ZEROS_LONG_NAME
public static final java.lang.String EXTREME_OUTLIER_TRUNCATION_PERCENTILE_LONG_NAME
public static final java.lang.String MAXIMUM_CHUNK_SIZE
protected void runPipeline(org.apache.spark.api.java.JavaSparkContext ctx)
Runs the pipeline.

Overrides: runPipeline in class SparkCommandLineProgram