@DocumentedFeature public final class Mutect2 extends AssemblyRegionWalker
Call somatic short mutations via local assembly of haplotypes. Short mutations include single nucleotide (SNA) and insertion and deletion (indel) alterations. The caller uses a Bayesian somatic genotyping model that differs from the original MuTect by Cibulskis et al., 2013 and uses the assembly-based machinery of HaplotypeCaller. Of note, Mutect2 v4.1.0.0 onwards enables joint analysis of multiple samples.
This tool is featured in the Somatic Short Mutation calling Best Practice Workflow. See Tutorial#11136 for a step-by-step description of the workflow and Article#11127 for an overview of what traditional somatic calling entails. For the latest pipeline scripts, see the Mutect2 WDL scripts directory. For pipelines with example data, see the gatk-workflows repository. Although we present the tool for somatic calling, it may apply to other contexts, such as mitochondrial variant calling and detection of somatic mosaicism.
Starting with v4.1.0.0 Mutect2 accommodates extremely high depths, e.g. 20,000X. See the following articles for details on this and additional applications.
Example commands show how to run Mutect2 for typical scenarios. The three modes are (i) tumor-normal mode where a tumor sample is matched with a normal sample in analysis, (ii) tumor-only mode where a single sample's alignment data undergoes analysis, and (iii) mitochondrial mode where sensitive calling at high depths is desirable.
Given a matched normal, Mutect2 is designed to call somatic variants only. The tool includes logic to skip emitting variants that are clearly present in the germline based on provided evidence, e.g. in the matched normal. This is done at an early stage to avoid spending computational resources on germline events. If the variant's germline status is borderline, then Mutect2 will emit the variant to the callset for subsequent filtering by FilterMutectCalls and review.
gatk Mutect2 \
  -R reference.fa \
  -I tumor.bam \
  -I normal.bam \
  -normal normal_sample_name \
  --germline-resource af-only-gnomad.vcf.gz \
  --panel-of-normals pon.vcf.gz \
  -O somatic.vcf.gz
Mutect2 also generates a stats file named [output vcf].stats. In the above example, the stats file would be named somatic.vcf.gz.stats and would be in the same folder as somatic.vcf.gz. As of GATK 4.1.1 this file is a required input to FilterMutectCalls.
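The naming rule above is simple enough to sketch in code. The following is a minimal illustration, not GATK's implementation: the class and method names are hypothetical, and the ".stats" suffix is taken from the example above rather than read from DEFAULT_STATS_EXTENSION.

```java
import java.io.File;

final class StatsFileName {
    // Assumption: suffix taken from the "[output vcf].stats" naming described above.
    static final String STATS_SUFFIX = ".stats";

    // Appends the suffix to the full output path, so the stats file sits next to the VCF.
    static File statsFileFor(File outputVcf) {
        return new File(outputVcf.getPath() + STATS_SUFFIX);
    }

    public static void main(String[] args) {
        System.out.println(statsFileFor(new File("somatic.vcf.gz")).getPath());
    }
}
```

A helper like this can be handy in pipeline glue code that has to hand the stats file to a later filtering step.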
As of v4.1 Mutect2 supports joint calling of multiple tumor and normal samples from the same individual. The only difference is that -I and -normal must be specified for each additional sample.
gatk Mutect2 \
  -R reference.fa \
  -I tumor1.bam \
  -I tumor2.bam \
  -I normal1.bam \
  -I normal2.bam \
  -normal normal1_sample_name \
  -normal normal2_sample_name \
  --germline-resource af-only-gnomad.vcf.gz \
  --panel-of-normals pon.vcf.gz \
  -O somatic.vcf.gz
This mode runs on a single type of sample, e.g. the tumor or the normal. To create a PoN, call on each normal sample in this mode, then use CreateSomaticPanelOfNormals to generate the PoN.
gatk Mutect2 \
  -R reference.fa \
  -I sample.bam \
  -O single_sample.vcf.gz
To call mutations on a tumor sample, call in this mode using a PoN and germline resource. After FilterMutectCalls filtering, consider additional filtering by functional significance with Funcotator.
gatk Mutect2 \
  -R reference.fa \
  -I sample.bam \
  --germline-resource af-only-gnomad.vcf.gz \
  --panel-of-normals pon.vcf.gz \
  -O single_sample.vcf.gz
Mutect2 automatically sets parameters appropriately for calling on mitochondria with the --mitochondria flag.
gatk Mutect2 \
  -R reference.fa \
  -L chrM \
  --mitochondria \
  -I mitochondria.bam \
  -O mitochondria.vcf.gz
The mode accepts only a single sample, which can be provided in multiple files.
This mode force-calls all alleles in force-call-alleles.vcf in addition to any other variants Mutect2 discovers.
gatk Mutect2 \
  -R reference.fa \
  -I sample.bam \
  -alleles force-call-alleles.vcf \
  -O single_sample.vcf.gz
If the sample is suspected to exhibit orientation bias artifacts (such as in the case of FFPE tumor samples), one should also collect F1R2 metrics by adding an --f1r2-tar-gz argument as shown below. This file contains information that can then be passed to LearnReadOrientationModel, which generates an artifact prior table for each tumor sample for FilterMutectCalls to use.
gatk Mutect2 \
  -R reference.fa \
  -I sample.bam \
  --f1r2-tar-gz f1r2.tar.gz \
  -O single_sample.vcf.gz
#CHROM POS ID REF ALT QUAL FILTER INFO
1 10067 . T TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC 30.35 PASS AC=3;AF=7.384E-5
1 10108 . CAACCCT C 46514.32 PASS AC=6;AF=1.525E-4
1 10109 . AACCCTAACCCT AAACCCT,* 89837.27 PASS AC=48,5;AF=0.001223,1.273E-4
1 10114 . TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAAACCCTA *,CAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAAACCCTA,T 36728.97 PASS AC=55,9,1;AF=0.001373,2.246E-4,2.496E-5
1 10119 . CT C,* 251.23 PASS AC=5,1;AF=1.249E-4,2.498E-5
1 10120 . TA CA,* 14928.74 PASS AC=10,6;AF=2.5E-4,1.5E-4
1 10128 . ACCCTAACCCTAACCCTAAC A,* 285.71 PASS AC=3,1;AF=7.58E-5,2.527E-5
1 10131 . CT C,* 378.93 PASS AC=7,5;AF=1.765E-4,1.261E-4
1 10132 . TAACCC *,T 18025.11 PASS AC=12,2;AF=3.03E-4,5.049E-5
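In the records above, the INFO column carries one AF value per ALT allele. As a rough sketch of how such a column could be read, the snippet below extracts the AF values from an INFO string. The class and method names are hypothetical, the sample line is re-typed with tab separators as VCF requires, and real code would normally use a VCF library rather than string splitting.

```java
import java.util.Arrays;

final class GermlineAf {
    // Returns the AF values from a VCF INFO string such as "AC=48,5;AF=0.001223,1.273E-4".
    // Returns an empty array when no AF key is present.
    static double[] alleleFrequencies(String info) {
        for (String entry : info.split(";")) {
            if (entry.startsWith("AF=")) {
                String[] parts = entry.substring(3).split(",");
                double[] afs = new double[parts.length];
                for (int i = 0; i < parts.length; i++) {
                    afs[i] = Double.parseDouble(parts[i]);
                }
                return afs;
            }
        }
        return new double[0];
    }

    public static void main(String[] args) {
        // One record from the listing above, with tabs between the eight columns.
        String line = "1\t10109\t.\tAACCCTAACCCT\tAAACCCT,*\t89837.27\tPASS\tAC=48,5;AF=0.001223,1.273E-4";
        String[] columns = line.split("\t");
        System.out.println(Arrays.toString(alleleFrequencies(columns[7])));
    }
}
```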
Modifier and Type | Field and Description |
---|---|
static java.lang.String | DEFAULT_STATS_EXTENSION |
protected M2ArgumentCollection | MTAC |
static java.lang.String | MUTECT_STATS_SHORT_NAME |
java.io.File | outputVCF |
activeProbThreshold, activityProfileOut, ASSEMBLY_PADDING_LONG_NAME, ASSEMBLY_REGION_OUT_LONG_NAME, assemblyRegionOut, assemblyRegionPadding, FORCE_ACTIVE_REGIONS_LONG_NAME, forceActive, MAX_ASSEMBLY_LONG_NAME, MAX_STARTS_LONG_NAME, maxAssemblyRegionSize, maxProbPropagationDistance, maxReadsPerAlignmentStart, MIN_ASSEMBLY_LONG_NAME, minAssemblyRegionSize, PROFILE_OUT_LONG_NAME, PROPAGATION_LONG_NAME, THRESHOLD_LONG_NAME
addOutputSAMProgramRecord, addOutputVCFCommandLine, cloudIndexPrefetchBuffer, cloudPrefetchBuffer, createOutputBamIndex, createOutputBamMD5, createOutputVariantIndex, createOutputVariantMD5, disableBamIndexCaching, features, intervalArgumentCollection, lenientVCFProcessing, outputSitesOnlyVCFs, progressMeter, readArguments, referenceArguments, SECONDS_BETWEEN_PROGRESS_UPDATES_NAME, seqValidationArguments
GATK_CONFIG_FILE, logger, NIO_MAX_REOPENS, NIO_PROJECT_FOR_REQUESTER_PAYS, QUIET, specialArgumentsCollection, tmpDir, useJdkDeflater, useJdkInflater, VERBOSITY
Constructor and Description |
---|
Mutect2() |
Modifier and Type | Method and Description |
---|---|
void | apply(AssemblyRegion region, ReferenceContext referenceContext, FeatureContext featureContext) Process an individual AssemblyRegion. |
AssemblyRegionEvaluator | assemblyRegionEvaluator() |
void | closeTool() This method is called by the GATK framework at the end of the GATKTool.doWork() template method. |
protected ReadsDownsampler | createDownsampler() |
protected double | defaultActiveProbThreshold() |
protected int | defaultAssemblyRegionPadding() |
protected int | defaultMaxAssemblyRegionSize() |
protected int | defaultMaxProbPropagationDistance() |
protected int | defaultMaxReadsPerAlignmentStart() |
protected int | defaultMinAssemblyRegionSize() |
java.util.List<ReadFilter> | getDefaultReadFilters() Returns the default list of CommandLineReadFilters that are used for this tool. |
java.util.List<java.lang.Class<? extends Annotation>> | getDefaultVariantAnnotationGroups() Returns the default list of annotation groups that are used for this tool. |
protected boolean | includeReadsWithDeletionsInIsActivePileups() |
ReadTransformer | makePostReadFilterTransformer() Returns the post-filter read transformer (simple or composite) that will be applied to the reads after filtering. |
java.util.Collection<Annotation> | makeVariantAnnotations() Returns a list of annotations that can be applied to VariantContexts. |
void | onTraversalStart() Operations performed just prior to the start of traversal. |
java.lang.Object | onTraversalSuccess() Operations performed immediately after a successful traversal (ie when no uncaught exceptions were thrown during the traversal). |
boolean | useVariantAnnotations() Must be overridden in order to add annotation arguments to the engine. |
getProgressMeterRecordLabel, onShutdown, onStartup, requiresReads, requiresReference, traverse
directlyAccessEngineFeatureManager, directlyAccessEngineReadsDataSource, directlyAccessEngineReferenceDataSource
addFeatureInputsAfterInitialization, createSAMWriter, createSAMWriter, createVCFWriter, createVCFWriter, doWork, getBestAvailableSequenceDictionary, getDefaultCloudIndexPrefetchBufferSize, getDefaultCloudPrefetchBufferSize, getDefaultToolVCFHeaderLines, getDefaultVariantAnnotations, getGenomicsDBOptions, getHeaderForFeatures, getHeaderForReads, getHeaderForSAMWriter, getMasterSequenceDictionary, getPluginDescriptors, getReferenceDictionary, getSequenceDictionaryValidationArgumentCollection, getToolName, getTransformedReadStream, getTraversalIntervals, hasFeatures, hasReads, hasReference, hasUserSuppliedIntervals, makePreReadFilterTransformer, makeReadFilter, requiresFeatures, requiresIntervals, transformTraversalIntervals
customCommandLineValidation, getCommandLine, getCommandLineParser, getDefaultHeaders, getMetricsFile, getSupportInformation, getToolkitName, getToolkitShortName, getToolStatusWarning, getUsage, getVersion, instanceMain, instanceMainPostParseArgs, isBetaFeature, isExperimentalFeature, parseArgs, printLibraryVersions, printSettings, printStartupMessage, runTool, setDefaultHeaders, warnOnToolStatus
public static final java.lang.String MUTECT_STATS_SHORT_NAME
public static final java.lang.String DEFAULT_STATS_EXTENSION
@ArgumentCollection protected M2ArgumentCollection MTAC
@Argument(fullName="output", shortName="O", doc="File to which variants should be written") public java.io.File outputVCF
protected int defaultMinAssemblyRegionSize()
defaultMinAssemblyRegionSize in class AssemblyRegionWalker
Returns the default value for the AssemblyRegionWalker.minAssemblyRegionSize parameter, if none is provided on the command line.
protected int defaultMaxAssemblyRegionSize()
defaultMaxAssemblyRegionSize in class AssemblyRegionWalker
Returns the default value for the AssemblyRegionWalker.maxAssemblyRegionSize parameter, if none is provided on the command line.
protected int defaultAssemblyRegionPadding()
defaultAssemblyRegionPadding in class AssemblyRegionWalker
Returns the default value for the AssemblyRegionWalker.assemblyRegionPadding parameter, if none is provided on the command line.
protected int defaultMaxReadsPerAlignmentStart()
defaultMaxReadsPerAlignmentStart in class AssemblyRegionWalker
Returns the default value for the AssemblyRegionWalker.maxReadsPerAlignmentStart parameter, if none is provided on the command line.
protected double defaultActiveProbThreshold()
defaultActiveProbThreshold in class AssemblyRegionWalker
Returns the default value for the AssemblyRegionWalker.activeProbThreshold parameter, if none is provided on the command line.
protected int defaultMaxProbPropagationDistance()
defaultMaxProbPropagationDistance in class AssemblyRegionWalker
Returns the default value for the AssemblyRegionWalker.maxProbPropagationDistance parameter, if none is provided on the command line.
protected boolean includeReadsWithDeletionsInIsActivePileups()
includeReadsWithDeletionsInIsActivePileups in class AssemblyRegionWalker
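The default* methods above follow a common pattern: the walker holds an argument that may be left unset, and each tool supplies the fallback. The sketch below illustrates that pattern only. All class, field, and method names are hypothetical stand-ins, and the value 50 is illustrative rather than Mutect2's actual default.

```java
final class DefaultFallbackSketch {
    // Hypothetical stand-in for a walker whose argument may be left unset on the command line.
    abstract static class Walker {
        Integer minRegionSize; // null means "not provided on the command line"

        // Hook that each concrete tool implements to supply its own default.
        protected abstract int defaultMinRegionSize();

        int effectiveMinRegionSize() {
            return minRegionSize != null ? minRegionSize : defaultMinRegionSize();
        }
    }

    static final class ExampleTool extends Walker {
        @Override
        protected int defaultMinRegionSize() {
            return 50; // illustrative value only, not Mutect2's actual default
        }
    }

    public static void main(String[] args) {
        ExampleTool tool = new ExampleTool();
        System.out.println(tool.effectiveMinRegionSize()); // tool default applies
        tool.minRegionSize = 120;
        System.out.println(tool.effectiveMinRegionSize()); // user-supplied value wins
    }
}
```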
public boolean useVariantAnnotations()
Description copied from class GATKTool: must be overridden in order to add annotation arguments to the engine. When this returns true, the engine looks for Annotations in the package defined by org.broadinstitute.hellbender.cmdline.GATKPlugin.GATKAnnotationPluginDescriptor#pluginPackageName and automatically generates and adds command line arguments allowing the user to specify which annotations or groups of annotations to use.
To specify default annotations for a tool, return them from GATKTool.getDefaultVariantAnnotationGroups() or GATKTool.getDefaultVariantAnnotations().
To access instantiated annotation objects, use GATKTool.makeVariantAnnotations().
useVariantAnnotations in class GATKTool
public java.util.List<ReadFilter> getDefaultReadFilters()
Description copied from class AssemblyRegionWalker: returns the default list of CommandLineReadFilters that are used for this tool. The default implementation uses the WellformedReadFilter filter with all default options, as well as the ReadFilterLibrary.MappedReadFilter.
Subclasses can override to provide alternative filters.
Note: this method is called before command line parsing begins, and thus before a SAMFileHeader is available through getHeaderForReads().
getDefaultReadFilters in class AssemblyRegionWalker
public ReadTransformer makePostReadFilterTransformer()
Description copied from class GATKTool: returns the post-filter read transformer (simple or composite) that will be applied to the reads after filtering. The default implementation uses ReadTransformer.identity().
The default implementation of GATKTool.traverse() calls this method once before iterating over the reads and reuses the transformer object to avoid object allocation.
Subclasses can extend this to provide their own transformers (i.e. override and call super).
Multiple transformers can be composed by using ReadTransformer composition methods.
makePostReadFilterTransformer in class GATKTool
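The "override and call super" advice for makePostReadFilterTransformer() can be sketched as follows. This is an illustration under stated assumptions: String stands in for a read and java.util.function.Function stands in for ReadTransformer, so no GATK types or composition methods are used, and the tool classes are hypothetical.

```java
import java.util.function.Function;

final class TransformerCompositionSketch {
    // Hypothetical base tool; String stands in for a read, Function for ReadTransformer.
    static class BaseTool {
        Function<String, String> makePostReadFilterTransformer() {
            return Function.identity();
        }
    }

    static class UppercasingTool extends BaseTool {
        @Override
        Function<String, String> makePostReadFilterTransformer() {
            // Keep whatever the parent does, then apply this tool's own step.
            return super.makePostReadFilterTransformer().andThen(read -> read.toUpperCase());
        }
    }

    public static void main(String[] args) {
        // Build the transformer once and reuse it for every read.
        Function<String, String> transformer = new UppercasingTool().makePostReadFilterTransformer();
        for (String read : new String[] {"acgt", "ttga"}) {
            System.out.println(transformer.apply(read));
        }
    }
}
```

Calling super first preserves any transformation a parent class already applies, which is the reason the description recommends extending rather than replacing.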
public java.util.List<java.lang.Class<? extends Annotation>> getDefaultVariantAnnotationGroups()
Description copied from class GATKTool: returns the default list of annotation groups that are used for this tool. See also GATKTool.getDefaultVariantAnnotations(). Returned annotation groups are subject to selective enabling/disabling by the user via the command line. The default implementation returns an empty list.
getDefaultVariantAnnotationGroups in class GATKTool
protected ReadsDownsampler createDownsampler()
createDownsampler in class AssemblyRegionWalker
public AssemblyRegionEvaluator assemblyRegionEvaluator()
assemblyRegionEvaluator in class AssemblyRegionWalker
public void onTraversalStart()
Description copied from class GATKTool: operations performed just prior to the start of traversal.
onTraversalStart in class GATKTool
public java.util.Collection<Annotation> makeVariantAnnotations()
Description copied from class GATKTool: returns a list of annotations that can be applied to VariantContexts. The implementation combines the tool's default annotations (GATKTool.getDefaultVariantAnnotations() and GATKTool.getDefaultVariantAnnotationGroups()) with any annotation command line directives specified by the user (such as enabling other annotations/groups or disabling default annotations) and returns a collection of all the annotation arguments instantiated.
NOTE: Most tools will not need to override this method, and should only do so in order to provide custom behavior or processing of the final annotations based on other command line input. To change the default annotations used by the tool, override GATKTool.getDefaultVariantAnnotations() instead.
To apply returned annotations to a VariantContext, use a VariantAnnotatorEngine constructed with the discovered annotations.
makeVariantAnnotations in class GATKTool
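The merge described for makeVariantAnnotations() (tool defaults plus user-enabled items, minus user-disabled ones) might look roughly like the sketch below. This is an assumption-laden illustration rather than the GATK logic: Strings stand in for annotation objects, the names are placeholders, and the real plugin handles groups and argument validation that are omitted here.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class AnnotationResolutionSketch {
    // Hypothetical merge: defaults plus user-enabled names, minus user-disabled names.
    // Strings stand in for annotation objects; insertion order is preserved.
    static Collection<String> resolve(List<String> defaults, List<String> enabled, List<String> disabled) {
        Set<String> result = new LinkedHashSet<>(defaults);
        result.addAll(enabled);
        result.removeAll(disabled);
        return result;
    }

    public static void main(String[] args) {
        Collection<String> active = resolve(
                Arrays.asList("AnnotationA", "AnnotationB"), // placeholder tool defaults
                Arrays.asList("AnnotationC"),                // placeholder user-enabled
                Arrays.asList("AnnotationB"));               // placeholder user-disabled
        System.out.println(active);
    }
}
```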
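public java.lang.Object onTraversalSuccess()
Description copied from class GATKTool: operations performed immediately after a successful traversal (i.e. when no uncaught exceptions were thrown during the traversal).
onTraversalSuccess in class GATKTool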
public void apply(AssemblyRegion region, ReferenceContext referenceContext, FeatureContext featureContext)
Description copied from class AssemblyRegionWalker: process an individual AssemblyRegion. Each region arrives pre-marked as either active or inactive (see AssemblyRegionWalker.assemblyRegionEvaluator()). This method will be called once for each active AND inactive region, and it is up to the implementation how to handle/process active vs. inactive regions.
apply in class AssemblyRegionWalker
region - region to process (pre-marked as either active or inactive)
referenceContext - reference data overlapping the full extended span of the assembly region
featureContext - features overlapping the full extended span of the assembly region
public void closeTool()
Description copied from class GATKTool: this method is called by the GATK framework at the end of the GATKTool.doWork() template method.
It is called regardless of whether GATKTool.traverse() has succeeded or not.
It is called after GATKTool.onTraversalSuccess() has completed (successfully or not) but before the GATKTool.doWork() method returns.
In other words, on successful runs both GATKTool.onTraversalSuccess() and GATKTool.closeTool() will be called (in this order), while on failed runs (when GATKTool.traverse() causes an exception), only GATKTool.closeTool() will be called.
The default implementation does nothing.
Subclasses should override this method to close any resources that must be closed regardless of the success of traversal.
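The call ordering described above can be made concrete with a small template-method sketch. This is a hypothetical miniature and not the GATKTool source: only the three methods named in the description are modeled, and each one just records that it ran.

```java
import java.util.ArrayList;
import java.util.List;

final class ToolLifecycleSketch {
    // Hypothetical miniature of a template method; records the order of lifecycle calls.
    static class MiniTool {
        final List<String> calls = new ArrayList<>();
        final boolean failDuringTraversal;

        MiniTool(boolean failDuringTraversal) {
            this.failDuringTraversal = failDuringTraversal;
        }

        void traverse() {
            calls.add("traverse");
            if (failDuringTraversal) {
                throw new IllegalStateException("traversal failed");
            }
        }

        Object onTraversalSuccess() {
            calls.add("onTraversalSuccess");
            return null;
        }

        void closeTool() {
            calls.add("closeTool");
        }

        Object doWork() {
            try {
                traverse();
                return onTraversalSuccess(); // skipped when traverse() throws
            } finally {
                closeTool(); // runs on success and on failure
            }
        }
    }

    public static void main(String[] args) {
        MiniTool ok = new MiniTool(false);
        ok.doWork();
        System.out.println(ok.calls);

        MiniTool bad = new MiniTool(true);
        try {
            bad.doWork();
        } catch (IllegalStateException e) {
            // expected in this sketch; closeTool() has already run
        }
        System.out.println(bad.calls);
    }
}
```

Placing closeTool() in a finally block is what guarantees that resources are released on both paths, which is why overriding it is the recommended place for cleanup.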