Class CreateSomaticPanelOfNormals

All Implemented Interfaces:
org.broadinstitute.barclay.argparser.CommandLinePluginProvider

@DocumentedFeature @BetaFeature public class CreateSomaticPanelOfNormals extends VariantWalker
Create a panel of normals (PoN) containing germline and artifactual sites for use with Mutect2.

The tool takes multiple normal-sample callsets produced by Mutect2's tumor-only mode and collates sites present in multiple samples (two by default, set by the --min-sample-count argument) into a sites-only VCF. The PoN captures common artifacts, and Mutect2 then uses it to filter variants at the site level. The --max-germline-probability argument sets the germline probability above which a genotype is not counted toward the PoN. Its default of 0.5 excludes likely germline events. This is usually the correct behavior, because germline variants are best handled by probabilistic modeling via Mutect2's --germline-resource argument. A germline resource, such as gnomAD for human data, is a much more refined tool for germline filtering than any PoN could be.
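The site-selection rule described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: the class `PonSiteFilter` and its `keepSite` method are hypothetical, and each site is assumed to arrive as a list of germline probabilities, one per normal sample that carries the variant. The default constants mirror the documented defaults (two samples, 0.5).

```java
import java.util.List;

/**
 * Hypothetical sketch of PoN site selection; not GATK's implementation.
 * A site is kept when at least minSampleCount normal samples carry the variant
 * with a germline probability no greater than maxGermlineProbability.
 */
final class PonSiteFilter {
    // Mirror the defaults stated in the tool documentation.
    static final int DEFAULT_MIN_SAMPLE_COUNT = 2;
    static final double DEFAULT_MAX_GERMLINE_PROBABILITY = 0.5;

    private PonSiteFilter() {}

    /**
     * @param germlineProbabilities assumed input: one germline probability per
     *        normal sample that carries the variant at this site
     */
    static boolean keepSite(List<Double> germlineProbabilities,
                            int minSampleCount,
                            double maxGermlineProbability) {
        long supportingSamples = germlineProbabilities.stream()
                // Skip genotypes whose germline probability exceeds the threshold.
                .filter(p -> p <= maxGermlineProbability)
                .count();
        return supportingSamples >= minSampleCount;
    }

    /** Same rule, using the documented default thresholds. */
    static boolean keepSite(List<Double> germlineProbabilities) {
        return keepSite(germlineProbabilities,
                DEFAULT_MIN_SAMPLE_COUNT, DEFAULT_MAX_GERMLINE_PROBABILITY);
    }
}
```

With the defaults, a site seen in two normals with germline probabilities 0.01 and 0.9 is dropped, because only one genotype passes the 0.5 threshold.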

This tool is featured in the Somatic Short Mutation calling Best Practice Workflow. See Tutorial#11136 for a step-by-step description of the workflow and Article#11127 for an overview of what traditional somatic calling entails. For the latest pipeline scripts, see the Mutect2 WDL scripts directory.

Example workflow

Step 1. Run Mutect2 in tumor-only mode for each normal sample.

Note that, as of May 2019, -max-mnp-distance must be set to zero to avoid a bug in GenomicsDBImport.

 gatk Mutect2 -R reference.fasta -I normal1.bam -max-mnp-distance 0 -O normal1.vcf.gz
 

Step 2. Create a GenomicsDB from the normal Mutect2 calls.

    gatk GenomicsDBImport -R reference.fasta -L intervals.interval_list \
       --genomicsdb-workspace-path pon_db \
       -V normal1.vcf.gz \
       -V normal2.vcf.gz \
       -V normal3.vcf.gz
  

Step 3. Combine the normal calls using CreateSomaticPanelOfNormals.

 gatk CreateSomaticPanelOfNormals -R reference.fasta -V gendb://pon_db -O pon.vcf.gz
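When there are many normals, the three commands above can be generated rather than typed. The following Java sketch is hypothetical (`PonWorkflowCommands` and its methods are illustrative names, not part of GATK) and uses only the flags shown in the steps above; each method returns an argument list suitable for `java.lang.ProcessBuilder`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assembles the three workflow commands for any number of normals. */
final class PonWorkflowCommands {
    private PonWorkflowCommands() {}

    // Step 1: one tumor-only Mutect2 run per normal BAM, with MNPs disabled.
    static List<String> mutect2Normal(String reference, String bam, String outVcf) {
        return List.of("gatk", "Mutect2", "-R", reference, "-I", bam,
                "-max-mnp-distance", "0", "-O", outVcf);
    }

    // Step 2: import every per-normal VCF into a single GenomicsDB workspace.
    static List<String> genomicsDbImport(String reference, String intervals,
                                         String workspace, List<String> vcfs) {
        List<String> cmd = new ArrayList<>(List.of("gatk", "GenomicsDBImport",
                "-R", reference, "-L", intervals,
                "--genomicsdb-workspace-path", workspace));
        for (String vcf : vcfs) {
            cmd.add("-V");   // one -V per normal callset
            cmd.add(vcf);
        }
        return cmd;
    }

    // Step 3: build the PoN by reading the workspace through a gendb:// URI.
    static List<String> createPon(String reference, String workspace, String outVcf) {
        return List.of("gatk", "CreateSomaticPanelOfNormals", "-R", reference,
                "-V", "gendb://" + workspace, "-O", outVcf);
    }
}
```

For example, `new ProcessBuilder(PonWorkflowCommands.createPon("reference.fasta", "pon_db", "pon.vcf.gz")).inheritIO().start().waitFor()` would run Step 3, assuming `gatk` is on the `PATH`.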
 
  • Field Details

    • MIN_SAMPLE_COUNT_LONG_NAME

      public static final String MIN_SAMPLE_COUNT_LONG_NAME
    • DEFAULT_MIN_SAMPLE_COUNT

      public static final int DEFAULT_MIN_SAMPLE_COUNT
    • MAX_GERMLINE_PROBABILITY_LONG_NAME

      public static final String MAX_GERMLINE_PROBABILITY_LONG_NAME
    • DEFAULT_MAX_GERMLINE_PROBABILITY

      public static final double DEFAULT_MAX_GERMLINE_PROBABILITY
    • FRACTION_INFO_FIELD

      public static final String FRACTION_INFO_FIELD
    • BETA_SHAPE_INFO_FIELD

      public static final String BETA_SHAPE_INFO_FIELD
    • germlineResource

      @Argument(fullName="germline-resource", doc="Population vcf of germline sequencing containing allele fractions.", optional=true) public FeatureInput<htsjdk.variant.variantcontext.VariantContext> germlineResource
      A resource, such as gnomAD, containing population allele frequencies of common and rare variants. We use this to remove germline variants from the panel of normals, keeping only technical artifacts.
    • maxGermlineProbability

      @Argument(fullName="max-germline-probability", doc="Skip genotypes with germline probability greater than this value", optional=true) public double maxGermlineProbability
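The FRACTION_INFO_FIELD and BETA_SHAPE_INFO_FIELD constants name per-site INFO annotations in the PoN, but their exact definitions are not given here. The following Java sketch therefore rests on assumptions: that the fraction is the share of normal samples carrying the variant, and that the beta shape summarizes the spread of allele fractions across those samples. The method-of-moments fit is a simple stand-in chosen for illustration; GATK may estimate the shape differently. `PonSiteSummary` is a hypothetical class.

```java
import java.util.List;

/**
 * Hypothetical per-site summaries suggested by the FRACTION and BETA_SHAPE field names.
 * Assumed semantics; not GATK's estimator.
 */
final class PonSiteSummary {
    private PonSiteSummary() {}

    /** Assumed meaning: fraction of all normal samples in which the variant was observed. */
    static double sampleFraction(int samplesWithVariant, int totalSamples) {
        if (totalSamples <= 0 || samplesWithVariant < 0 || samplesWithVariant > totalSamples) {
            throw new IllegalArgumentException("need totalSamples > 0 and 0 <= samplesWithVariant <= totalSamples");
        }
        return (double) samplesWithVariant / totalSamples;
    }

    /**
     * Method-of-moments Beta(alpha, beta) fit to allele fractions in (0, 1).
     * Returns {alpha, beta}. Requires 0 < variance < mean * (1 - mean).
     */
    static double[] betaShapeByMoments(List<Double> alleleFractions) {
        int n = alleleFractions.size();
        if (n < 2) {
            throw new IllegalArgumentException("need at least two allele fractions");
        }
        double mean = alleleFractions.stream().mapToDouble(Double::doubleValue).average().getAsDouble();
        double variance = alleleFractions.stream()
                .mapToDouble(f -> (f - mean) * (f - mean))
                .sum() / n;  // population variance
        if (variance <= 0 || variance >= mean * (1 - mean)) {
            throw new IllegalArgumentException("moments are not consistent with a beta distribution");
        }
        // alpha + beta = mean * (1 - mean) / variance - 1
        double common = mean * (1 - mean) / variance - 1;
        return new double[] {mean * common, (1 - mean) * common};
    }
}
```

For allele fractions 0.2 and 0.4, the mean is 0.3 and the variance 0.01, which gives a shape of roughly (6, 14).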
  • Constructor Details

    • CreateSomaticPanelOfNormals

      public CreateSomaticPanelOfNormals()
  • Method Details

    • getGenomicsDBOptions

      protected GenomicsDBOptions getGenomicsDBOptions()
      Description copied from class: GATKTool
      Get the GenomicsDB read settings for the current tool
      Overrides:
      getGenomicsDBOptions in class VariantWalkerBase
      Returns:
      By default, the vanilla (unmodified) GenomicsDB options
    • onTraversalStart

      public void onTraversalStart()
      Description copied from class: GATKTool
      Operations performed just prior to the start of traversal. Should be overridden by tool authors who need to process arguments local to their tool or perform other kinds of local initialization. Default implementation does nothing.
      Overrides:
      onTraversalStart in class GATKTool
    • apply

      public void apply(htsjdk.variant.variantcontext.VariantContext vc, ReadsContext rc, ReferenceContext ref, FeatureContext fc)
      Description copied from class: VariantWalker
      Process an individual variant. Must be implemented by tool authors. In general, tool authors should simply stream their output from apply(), and maintain as little internal state as possible.
      Specified by:
      apply in class VariantWalker
      Parameters:
      vc - Current variant being processed.
      rc - Reads overlapping the current variant. Will be an empty, but non-null, context object if there is no backing source of reads data (in which case all queries on it will return an empty array/iterator)
      ref - Reference bases spanning the current variant. Will be an empty, but non-null, context object if there is no backing source of reference data (in which case all queries on it will return an empty array/iterator). Can request extra bases of context around the current variant's interval by invoking ReferenceContext.setWindow(int, int) on this object before calling ReferenceContext.getBases()
      fc - Features spanning the current variant. Will be an empty, but non-null, context object if there is no backing source of Feature data (in which case all queries on it will return an empty List).
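The contract described for `ref` (an empty but non-null context when no reference is available, plus an optional window of extra bases around the variant) can be illustrated with a simplified model. `SketchReferenceContext` below is hypothetical and is not GATK's `ReferenceContext`; it assumes 1-based inclusive coordinates, a single in-memory contig, and clipping of the window at the contig ends.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified reference context (not GATK's ReferenceContext).
 * Coordinates are 1-based and inclusive; contigBases is null when no reference is available.
 */
final class SketchReferenceContext {
    private final byte[] contigBases;
    private final int start;
    private final int end;
    private int leadingWindow = 0;
    private int trailingWindow = 0;

    SketchReferenceContext(byte[] contigBases, int start, int end) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("invalid interval " + start + "-" + end);
        }
        this.contigBases = contigBases;
        this.start = start;
        this.end = end;
    }

    /** Requests extra bases before and after the variant's interval. */
    void setWindow(int leadingBases, int trailingBases) {
        if (leadingBases < 0 || trailingBases < 0) {
            throw new IllegalArgumentException("window sizes must be non-negative");
        }
        this.leadingWindow = leadingBases;
        this.trailingWindow = trailingBases;
    }

    /** Returns the windowed bases, or an empty (never null) array when there is nothing to return. */
    byte[] getBases() {
        if (contigBases == null) {
            return new byte[0];  // no backing reference: empty, non-null result
        }
        int from = Math.max(1, start - leadingWindow);                // clip at contig start
        int to = Math.min(contigBases.length, end + trailingWindow);  // clip at contig end
        if (from > to) {
            return new byte[0];  // interval lies past the end of the contig
        }
        return Arrays.copyOfRange(contigBases, from - 1, to);         // upper bound is exclusive
    }
}
```

On the contig `ACGTACGTAC`, a variant at positions 4 to 5 yields `TA`; after `setWindow(2, 1)` the same context yields positions 2 to 6, `CGTAC`.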
    • onTraversalSuccess

      public Object onTraversalSuccess()
      Description copied from class: GATKTool
      Operations performed immediately after a successful traversal (i.e., when no uncaught exceptions were thrown during the traversal). Should be overridden by tool authors who need to close local resources, etc., after traversal. Also allows tools to return a value representing the traversal result, which is printed by the engine. Default implementation does nothing and returns null.
      Overrides:
      onTraversalSuccess in class GATKTool
      Returns:
      Object representing the traversal result, or null if a tool does not return a value
    • closeTool

      public void closeTool()
      Description copied from class: GATKTool
      This method is called by the GATK framework at the end of the GATKTool.doWork() template method. It is called regardless of whether GATKTool.traverse() succeeded, after GATKTool.onTraversalSuccess() has completed (successfully or not) but before GATKTool.doWork() returns. In other words, on successful runs both GATKTool.onTraversalSuccess() and GATKTool.closeTool() are called (in that order), while on failed runs (when GATKTool.traverse() throws an exception) only GATKTool.closeTool() is called. The default implementation does nothing. Subclasses should override this method to close any resources that must be closed regardless of whether traversal succeeded.
      Overrides:
      closeTool in class GATKTool
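The ordering guarantees described for onTraversalSuccess() and closeTool() amount to a template method built on `try`/`finally`. The Java sketch below is a hypothetical, stripped-down model of that lifecycle (`SketchTool`, `CountingTool`, and the string-typed variants are illustrative, not GATK classes): on success the recorded events are start, success, close; if `apply` throws, `onTraversalSuccess()` is skipped but `closeTool()` still runs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the tool lifecycle; not GATK code. */
abstract class SketchTool {
    final List<String> events = new ArrayList<>();  // records which hooks ran

    void onTraversalStart() {}
    abstract void apply(String variant);
    Object onTraversalSuccess() { return null; }
    void closeTool() {}

    /** Template method mirroring the documented ordering around doWork(). */
    final Object doWork(List<String> variants) {
        try {
            onTraversalStart();
            for (String variant : variants) {
                apply(variant);           // traversal: one call per variant
            }
            return onTraversalSuccess();  // reached only if traversal did not throw
        } finally {
            closeTool();                  // always runs, on success or failure
        }
    }
}

/** Example tool that streams through variants and keeps only a counter as state. */
final class CountingTool extends SketchTool {
    private int count = 0;

    @Override void onTraversalStart() { events.add("start"); }

    @Override void apply(String variant) {
        if (variant.isEmpty()) {
            throw new IllegalStateException("malformed record");
        }
        count++;
    }

    @Override Object onTraversalSuccess() { events.add("success"); return count; }

    @Override void closeTool() { events.add("close"); }
}
```

Putting `closeTool()` in the `finally` block is what guarantees it runs after a failed traversal, while the `return onTraversalSuccess()` inside the `try` ensures success-only work is skipped when an exception escapes `apply`.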