All Implemented Interfaces:
org.broadinstitute.barclay.argparser.CommandLinePluginProvider
Direct Known Subclasses:
AlleleFrequencyQC

@DocumentedFeature @BetaFeature public class VariantEval extends MultiVariantWalkerGroupedOnStart

Given a variant callset, it is common to calculate various quality control metrics. These metrics include raw or filtered SNP counts; the ratio of transitions to transversions; the concordance of a particular sample's calls with a genotyping chip; the number of singletons per sample; etc. It is also often useful to stratify these metrics by criteria such as functional class (missense, nonsense, silent), whether the site is a CpG site, the amino acid degeneracy of the site, etc. VariantEval facilitates these calculations in two ways: by providing several built-in evaluation and stratification modules, and by providing a framework that makes it easy to develop new evaluation and stratification modules.
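
As an illustration of one of the metrics named above, the transition/transversion ratio can be sketched in a few lines of plain Java. This is not VariantEval's evaluator code: the class name TiTvSketch, its methods, and the two-character "ref+alt" encoding of a SNP are hypothetical and chosen only for this example, which assumes uppercase A/C/G/T bases with ref different from alt.

```java
import java.util.List;

class TiTvSketch {
    /** Purines are A and G; the remaining bases (C, T) are pyrimidines. */
    static boolean isPurine(char base) {
        return base == 'A' || base == 'G';
    }

    /** A transition stays within one chemical class (A<->G or C<->T). */
    static boolean isTransition(char ref, char alt) {
        return isPurine(ref) == isPurine(alt);
    }

    /**
     * Each entry is a hypothetical two-character code: reference base, then alternate base.
     * Returns Infinity when there are no transversions, and NaN for an empty list.
     */
    static double tiTvRatio(List<String> refAltPairs) {
        long transitions = refAltPairs.stream()
                .filter(p -> isTransition(p.charAt(0), p.charAt(1)))
                .count();
        long transversions = refAltPairs.size() - transitions;
        return (double) transitions / transversions;
    }

    public static void main(String[] args) {
        // Two transitions (A>G, C>T) and one transversion (A>C).
        System.out.println(tiTvRatio(List.of("AG", "CT", "AC"))); // prints 2.0
    }
}
```

Stratifying the metric, for example by CpG status, would amount to partitioning the input list first and calling tiTvRatio on each partition.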

Input

One or more variant sets to evaluate plus any number of comparison sets.

Output

Evaluation tables detailing the results of the eval modules which were applied. For example:

 output.eval.grp:
 ##:GATKReport.v0.1 CountVariants : Counts different classes of variants in the sample
 CountVariants  CompFeatureInput  CpG      EvalFeatureInput  JexlExpression  Novelty  nProcessedLoci  nCalledLoci  nRefLoci  nVariantLoci  variantRate ...
 CountVariants  dbsnp             CpG      eval              none            all      65900028        135770       0         135770        0.00206024  ...
 CountVariants  dbsnp             CpG      eval              none            known    65900028        47068        0         47068         0.00071423  ...
 CountVariants  dbsnp             CpG      eval              none            novel    65900028        88702        0         88702         0.00134601  ...
 CountVariants  dbsnp             all      eval              none            all      65900028        330818       0         330818        0.00502000  ...
 CountVariants  dbsnp             all      eval              none            known    65900028        120685       0         120685        0.00183133  ...
 CountVariants  dbsnp             all      eval              none            novel    65900028        210133       0         210133        0.00318866  ...
 CountVariants  dbsnp             non_CpG  eval              none            all      65900028        195048       0         195048        0.00295976  ...
 CountVariants  dbsnp             non_CpG  eval              none            known    65900028        73617        0         73617         0.00111710  ...
 CountVariants  dbsnp             non_CpG  eval              none            novel    65900028        121431       0         121431        0.00184265  ...
 ...
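
The sample rows above are internally consistent in two ways that are easy to check: variantRate matches nVariantLoci divided by nProcessedLoci, and the known and novel counts add up to the "all" count within one stratum. The sketch below verifies this in plain Java. It assumes the column order shown in the header row and whitespace-delimited fields; the class CountVariantsRowCheck and its helpers are hypothetical and not part of GATK, and the ratio is inferred from the sample numbers rather than from the tool's source.

```java
import java.util.Locale;

class CountVariantsRowCheck {
    // Zero-based column positions, assumed from the header row of the sample report.
    static final int N_PROCESSED_LOCI = 6;
    static final int N_VARIANT_LOCI = 9;

    /** Extracts one integer column from a whitespace-delimited data row. */
    static long field(String row, int index) {
        return Long.parseLong(row.trim().split("\\s+")[index]);
    }

    /** Recomputes the rate as variant loci per processed locus. */
    static double variantRate(String row) {
        return (double) field(row, N_VARIANT_LOCI) / field(row, N_PROCESSED_LOCI);
    }

    public static void main(String[] args) {
        // The three CpG rows from the sample report (trailing "..." columns omitted).
        String all   = "CountVariants  dbsnp  CpG  eval  none  all    65900028  135770  0  135770  0.00206024";
        String known = "CountVariants  dbsnp  CpG  eval  none  known  65900028  47068   0  47068   0.00071423";
        String novel = "CountVariants  dbsnp  CpG  eval  none  novel  65900028  88702   0  88702   0.00134601";

        // Recomputed rate for the "all" row, formatted to the report's eight decimals.
        System.out.printf(Locale.ROOT, "%.8f%n", variantRate(all));

        // Known and novel partition the "all" row of the same stratum.
        System.out.println(field(known, N_VARIANT_LOCI) + field(novel, N_VARIANT_LOCI)
                == field(all, N_VARIANT_LOCI)); // prints true
    }
}
```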
 

Usage examples

 gatk VariantEval \
   -R reference.fasta \
   -O output.eval.grp \
   --eval set1:set1.vcf \
   --eval set2:set2.vcf \
   [--comp comp.vcf]
 
Count Mendelian violations for each family in a callset with multiple families (with a pedigree provided):
 gatk VariantEval \
   -R reference.fasta \
   -O output.MVs.byFamily.table \
   --eval multiFamilyCallset.vcf \
   -no-ev -no-st \
   -ST Family \
   -EV MendelianViolationEvaluator
 

Caveat

Some stratifications and evaluators are incompatible with each other because of their combined memory requirements, for example AlleleCount with VariantSummary, or Sample with VariantSummary. If you specify such a combination, the program reports an error and asks you to disable one of the options. We do not currently provide an exhaustive list of incompatible combinations, so we recommend trying the combinations you are interested in on a dummy command line to find out quickly whether they work.

  • Field Details

    • engine

      protected VariantEvalEngine engine
    • outFile

      @Argument(fullName="output", shortName="O", doc="File to which variants should be written") protected File outFile
    • variantEvalArgs

      @ArgumentCollection protected VariantEvalArgumentCollection variantEvalArgs
    • LIST

      @Argument(fullName="list", shortName="ls", doc="List the available eval modules and exit", optional=true) protected Boolean LIST
      Note that the --list argument only works when the rest of the command line is fully resolved and correct.
  • Constructor Details

    • VariantEval

      public VariantEval()
  • Method Details

    • getMultiVariantInputArgumentCollection

      protected MultiVariantInputArgumentCollection getMultiVariantInputArgumentCollection()
      Description copied from class: MultiVariantWalker
      Return an argument collection that provides the driving variants. This allows subclasses to override it and use an argument pattern other than the default -V.
      Overrides:
      getMultiVariantInputArgumentCollection in class MultiVariantWalker
    • initializeDrivingVariants

      protected void initializeDrivingVariants()
      Description copied from class: VariantWalkerBase
      Process the feature inputs that represent the primary driving source(s) of variants for this tool, and perform any necessary header and sequence dictionary validation. Called by the framework during feature initialization.
      Overrides:
      initializeDrivingVariants in class MultiVariantWalker
    • onTraversalStart

      public void onTraversalStart()
      Description copied from class: GATKTool
      Operations performed just prior to the start of traversal. Should be overridden by tool authors who need to process arguments local to their tool or perform other kinds of local initialization. Default implementation does nothing.
      Overrides:
      onTraversalStart in class GATKTool
    • listModulesAndExit

      public void listModulesAndExit()
      List all of the available evaluation modules, then exit successfully
    • apply

      public void apply(List<htsjdk.variant.variantcontext.VariantContext> variantContexts, ReferenceContext referenceContext, List<ReadsContext> readsContexts)
      Description copied from class: MultiVariantWalkerGroupedOnStart
      This method must be implemented by tool authors. This is the primary traversal for any MultiVariantWalkerGroupedOnStart walkers. Will traverse over input variant contexts and call #apply() exactly once for each unique reference start position. All variants starting at each locus across source files will be grouped and passed as a list of VariantContext objects.
      Specified by:
      apply in class MultiVariantWalkerGroupedOnStart
      Parameters:
      variantContexts - VariantContexts from driving variants with matching start position. Note: this list will never be empty.
      referenceContext - ReferenceContext object covering the reference of the longest spanning VariantContext
    • onTraversalSuccess

      public Object onTraversalSuccess()
      Description copied from class: GATKTool
      Operations performed immediately after a successful traversal (i.e., when no uncaught exceptions were thrown during the traversal). Should be overridden by tool authors who need to close local resources, etc., after traversal. Also allows tools to return a value representing the traversal result, which is printed by the engine. Default implementation does nothing and returns null.
      Overrides:
      onTraversalSuccess in class GATKTool
      Returns:
      Object representing the traversal result, or null if a tool does not return a value
    • getPluginDescriptors

      public List<? extends org.broadinstitute.barclay.argparser.CommandLinePluginDescriptor<?>> getPluginDescriptors()
      Description copied from class: GATKTool
      Return the list of GATKCommandLinePluginDescriptors to be used for this tool. Uses the read filter plugin.
      Specified by:
      getPluginDescriptors in interface org.broadinstitute.barclay.argparser.CommandLinePluginProvider
      Overrides:
      getPluginDescriptors in class GATKTool
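
The grouping behaviour described for apply (one call per unique reference start position, with all variants starting there passed together as a non-empty list) can be illustrated with a self-contained sketch. It is not the MultiVariantWalkerGroupedOnStart implementation: the Variant class is a hypothetical stand-in for htsjdk's VariantContext, and the sketch assumes the merged input is already sorted by contig and start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class GroupOnStartSketch {
    /** Hypothetical minimal variant record; not htsjdk's VariantContext. */
    static final class Variant {
        final String contig;
        final int start;
        final String source;

        Variant(String contig, int start, String source) {
            this.contig = contig;
            this.start = start;
            this.source = source;
        }
    }

    static boolean sameStart(Variant a, Variant b) {
        return a.contig.equals(b.contig) && a.start == b.start;
    }

    /** Invokes the callback once per unique (contig, start); each group is non-empty. */
    static void traverse(List<Variant> sorted, Consumer<List<Variant>> apply) {
        List<Variant> group = new ArrayList<>();
        for (Variant v : sorted) {
            // A new start position closes the current group and hands it to the callback.
            if (!group.isEmpty() && !sameStart(group.get(0), v)) {
                apply.accept(group);
                group = new ArrayList<>();
            }
            group.add(v);
        }
        if (!group.isEmpty()) {
            apply.accept(group); // flush the final group
        }
    }

    /** Convenience helper: the size of each group, in traversal order. */
    static List<Integer> groupSizes(List<Variant> sorted) {
        List<Integer> sizes = new ArrayList<>();
        traverse(sorted, g -> sizes.add(g.size()));
        return sizes;
    }

    public static void main(String[] args) {
        List<Variant> merged = List.of(
                new Variant("chr1", 100, "set1"),
                new Variant("chr1", 100, "set2"),
                new Variant("chr1", 250, "set1"));
        System.out.println(groupSizes(merged)); // prints [2, 1]
    }
}
```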