Class UpdateVCFSequenceDictionary

All Implemented Interfaces:
org.broadinstitute.barclay.argparser.CommandLinePluginProvider

@DocumentedFeature public final class UpdateVCFSequenceDictionary extends VariantWalker
Updates the reference contigs in the header of a VCF file, i.e. the reference dictionary, using the dictionary from a variant, alignment, reference, or dictionary file.

This tool updates the sequence dictionary in a variant file using a dictionary from another variant, alignment, dictionary, or reference file. The dictionary must be valid for the target file, i.e. it must contain a sequence record for every contig on which the target file has variants. In a VCF header, the dictionary lines start with '##contig='.
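
To make the correspondence between a source dictionary and the VCF header concrete, the sketch below turns SAM-style @SQ records (the form used in a .dict file or a BAM header) into minimal '##contig=' lines. It is an illustration only, not the tool's implementation: the function name sq_to_contig is hypothetical, and the real tool may write additional attributes on each contig line.

```shell
# Illustrative only: map @SQ records to minimal VCF contig header lines.
# sq_to_contig is a hypothetical helper, not part of GATK.
sq_to_contig() {
    awk -F '\t' '
        $1 == "@SQ" {
            name = ""; len = ""
            # SN: carries the contig name, LN: its length; other tags are ignored.
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len = substr($i, 4)
            }
            if (name != "" && len != "")
                printf "##contig=<ID=%s,length=%s>\n", name, len
        }'
}

# Example input: one @HD line (skipped) and two @SQ records.
printf '@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:248956422\n@SQ\tSN:chrM\tLN:16569\n' | sq_to_contig
```

Every contig that appears in the variant records of the target file needs a matching record of this kind in the source dictionary.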

By specifying both --replace and --disable-sequence-dictionary-validation, one can force the replacement of an invalid sequence dictionary in a variant file with a valid sequence dictionary from another file.
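
A forced replacement of this kind might be invoked as sketched below. The file names are hypothetical placeholders, and the wrapper function force_replace_dictionary is only a convenience for this example; the function prints the command rather than running it, so it can be inspected first.

```shell
# Hypothetical wrapper: prints a force-replace invocation for review.
# Arguments: input VCF, dictionary source, output VCF (all placeholder names).
force_replace_dictionary() {
    # Drop the leading "echo" to run the command instead of printing it.
    echo gatk UpdateVCFSequenceDictionary \
        -V "$1" \
        --source-dictionary "$2" \
        --output "$3" \
        --replace=true \
        --disable-sequence-dictionary-validation
}

force_replace_dictionary broken.vcf.gz reference.dict fixed.vcf.gz
```

Because validation is switched off here, it is worth confirming separately that the replacement dictionary really covers the contigs in the variant file.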

Usage example

Use the contig dictionary from a BAM (SQ lines) to replace an existing dictionary in the header of a VCF.

 gatk UpdateVCFSequenceDictionary \
     -V cohort.vcf.gz \
     --source-dictionary sample.bam \
     --output cohort_replacedcontiglines.vcf.gz \
     --replace=true
 

Use a reference dictionary to add reference contig lines to a VCF without any.

 gatk UpdateVCFSequenceDictionary \
     -V resource.vcf.gz \
     --source-dictionary reference.dict \
     --output resource_newcontiglines.vcf.gz
 

Use the reference (-R) to add contig lines to a VCF without any.

 gatk UpdateVCFSequenceDictionary \
     -V resource.vcf.gz \
     -R reference.fasta \
     --output resource_newcontiglines.vcf.gz
 

The -O (--output) argument specifies the name of the updated file. The --source-dictionary argument specifies the file that supplies the sequence dictionary. The --replace argument is optional and forces replacement of the dictionary if the input file already has one.
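
Since --replace only matters when the input already carries a dictionary, it can help to check the header before choosing the arguments. The sketch below counts '##contig=' lines in a plain or gzip/bgzip-compressed VCF. Both function names are hypothetical helpers, and the check assumes that any '##contig=' line in the header means a dictionary is present.

```shell
# Hypothetical helper: count "##contig=" header lines in a VCF.
# Files ending in .gz are decompressed on the fly (bgzip output is gzip-compatible).
count_contig_lines() {
    case "$1" in
        *.gz) gzip -cd -- "$1" ;;
        *)    cat -- "$1" ;;
    esac | awk '
        /^##contig=/ { n++ }
        !/^#/        { exit }          # first variant record: the header is over
        END          { print n + 0 }'  # prints 0 when no contig line was seen
}

# Hypothetical helper: succeeds when the VCF already has contig lines,
# i.e. when --replace would be needed to overwrite them.
needs_replace() {
    [ "$(count_contig_lines "$1")" -gt 0 ]
}
```

For example, `if needs_replace cohort.vcf.gz; then echo "--replace=true"; fi` prints the extra argument only for a file whose header already lists contigs.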

  • Field Details

    • outFile

      @Argument(fullName="output", shortName="O", doc="File to which updated variants should be written") public GATKPath outFile
    • DICTIONARY_ARGUMENT_NAME

      public static final String DICTIONARY_ARGUMENT_NAME
    • REPLACE_ARGUMENT_NAME

      public static final String REPLACE_ARGUMENT_NAME
  • Constructor Details

    • UpdateVCFSequenceDictionary

      public UpdateVCFSequenceDictionary()
  • Method Details

    • onTraversalStart

      public void onTraversalStart()
      Description copied from class: GATKTool
      Operations performed just prior to the start of traversal. Should be overridden by tool authors who need to process arguments local to their tool or perform other kinds of local initialization. Default implementation does nothing.
      Overrides:
      onTraversalStart in class GATKTool
    • apply

      public void apply(htsjdk.variant.variantcontext.VariantContext vc, ReadsContext readsContext, ReferenceContext ref, FeatureContext featureContext)
      Description copied from class: VariantWalker
      Process an individual variant. Must be implemented by tool authors. In general, tool authors should simply stream their output from apply(), and maintain as little internal state as possible.
      Specified by:
      apply in class VariantWalker
      Parameters:
      vc - Current variant being processed.
      readsContext - Reads overlapping the current variant. Will be an empty, but non-null, context object if there is no backing source of reads data (in which case all queries on it will return an empty array/iterator)
      ref - Reference bases spanning the current variant. Will be an empty, but non-null, context object if there is no backing source of reference data (in which case all queries on it will return an empty array/iterator). Can request extra bases of context around the current variant's interval by invoking ReferenceContext.setWindow(int, int) on this object before calling ReferenceContext.getBases()
      featureContext - Features spanning the current variant. Will be an empty, but non-null, context object if there is no backing source of Feature data (in which case all queries on it will return an empty List).
    • closeTool

      public void closeTool()
      Close out the new variants file.
      Overrides:
      closeTool in class GATKTool
    • getBestAvailableSequenceDictionary

      public htsjdk.samtools.SAMSequenceDictionary getBestAvailableSequenceDictionary()
      Description copied from class: VariantWalkerBase
      Overriding the superclass method to preferentially choose the sequence dictionary from the driving source of variants.
      Overrides:
      getBestAvailableSequenceDictionary in class VariantWalkerBase
      Returns:
      the best available sequence dictionary given our inputs, or null if no single dictionary is the best one.