Class CollectSVEvidence

All Implemented Interfaces:
org.broadinstitute.barclay.argparser.CommandLinePluginProvider

@BetaFeature @DocumentedFeature public class CollectSVEvidence extends ReadWalker
Creates discordant read pair, split read, site depth, and read depth evidence files for use in the GATK-SV pipeline. This tool emulates the functionality of "svtk collect-pesr", which was used in v1 of the GATK-SV pipeline. The first output file, which should be named "*.pe.txt" or "*.pe.txt.gz", is a tab-delimited file containing information on discordant read pairs in the input bam or cram, with the following columns:
  • read contig
  • read start
  • read strand
  • mate contig
  • mate start
  • mate strand
  • sample name
Only one record is emitted for each discordant read pair: the record for the read with the "upstream" start position, as determined by the sequence dictionary contig ordering and the coordinate.
The second output file, which should be named "*.sr.txt" or "*.sr.txt.gz", contains the locations of all split read clippings in the input bam or cram, with the following columns:
  • contig
  • clipping position
  • direction: side of the read that was clipped (either "left" or "right")
  • count: the number of reads clipped at this location in this direction
  • sample name
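The five split read columns above can be read back with a few lines of Java. This is a minimal sketch, not part of GATK: the class name SplitReadLine and its fields are hypothetical, and it assumes an uncompressed line with exactly the columns listed above.

```java
import java.util.Locale;

/** Hypothetical holder for one line of a "*.sr.txt" file; not a GATK class. */
final class SplitReadLine {
    /** Side of the read that was clipped, as written in the direction column. */
    enum Direction { LEFT, RIGHT }

    final String contig;
    final int position;
    final Direction direction;
    final int count;
    final String sample;

    private SplitReadLine(String contig, int position, Direction direction,
                          int count, String sample) {
        this.contig = contig;
        this.position = position;
        this.direction = direction;
        this.count = count;
        this.sample = sample;
    }

    /** Parses one tab-delimited line: contig, position, direction, count, sample. */
    static SplitReadLine parse(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length != 5) {
            throw new IllegalArgumentException(
                    "expected 5 columns but found " + fields.length);
        }
        // "left" or "right" in the file maps onto the enum constants.
        Direction direction = Direction.valueOf(fields[2].toUpperCase(Locale.ROOT));
        return new SplitReadLine(fields[0], Integer.parseInt(fields[1]), direction,
                Integer.parseInt(fields[3]), fields[4]);
    }
}
```

For example, SplitReadLine.parse("chr1\t1000\tleft\t3\tNA12878") yields a record with position 1000, direction LEFT, and count 3.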
The third output file, which should be named "*.sd.txt" or "*.sd.txt.gz", specifies site depth counts: for each locus specified in an input VCF as a simple, biallelic SNP, it gives a count, for each base call, of the number of reads that cover that locus. It has the following columns:
  • contig
  • position
  • sampleName
  • A observations
  • C observations
  • G observations
  • T observations
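The per-base counting behind a site depth row might look like the following sketch. It is illustrative only and is not the GATK implementation: the class SiteDepthTally is hypothetical, and it assumes that the thresholds set by site-depth-min-mapq and site-depth-min-baseq are applied inclusively, which this page does not state.

```java
/** Hypothetical per-locus tally of A/C/G/T observations; not the GATK implementation. */
final class SiteDepthTally {
    private final int[] counts = new int[4]; // order: A, C, G, T

    /** Index of a base in the counts array, or -1 for anything else (for example N). */
    private static int indexOf(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return -1;
        }
    }

    /** Counts one observed base if it passes both thresholds (assumed inclusive). */
    void observe(char base, int mapQ, int baseQ, int minMapQ, int minBaseQ) {
        int index = indexOf(base);
        if (index >= 0 && mapQ >= minMapQ && baseQ >= minBaseQ) {
            counts[index]++;
        }
    }

    /** Formats a row in the column order listed above. */
    String toLine(String contig, int position, String sampleName) {
        return contig + "\t" + position + "\t" + sampleName + "\t"
                + counts[0] + "\t" + counts[1] + "\t" + counts[2] + "\t" + counts[3];
    }
}
```

Keeping the four counts in a fixed A, C, G, T order lets the row be written without any lookup at output time.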
The fourth output file, which should be named "*.rd.txt" or "*.rd.txt.gz", specifies read depths: for each interval specified by a 3-column, tab-delimited input file, the number of reads that start in that interval is reported. It has the following columns:
  • contig
  • starting position
  • ending position
  • read count
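The read depth rule (count the reads whose start falls inside each interval) can be sketched as follows. The class ReadDepthCounter is hypothetical, the interval bounds are assumed to be inclusive, and a linear scan is used for clarity rather than speed; none of this is taken from the GATK source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical read depth counter; a sketch of the counting rule only. */
final class ReadDepthCounter {
    private static final class Interval {
        final String contig;
        final int start;
        final int end;
        int readCount;

        Interval(String contig, int start, int end) {
            this.contig = contig;
            this.start = start;
            this.end = end;
        }
    }

    private final List<Interval> intervals = new ArrayList<>();

    /** Registers one row of the 3-column interval file (bounds assumed inclusive). */
    void addInterval(String contig, int start, int end) {
        intervals.add(new Interval(contig, start, end));
    }

    /** Counts a read in every interval that contains its start position. */
    void countRead(String contig, int readStart) {
        for (Interval interval : intervals) {
            if (interval.contig.equals(contig)
                    && readStart >= interval.start && readStart <= interval.end) {
                interval.readCount++;
            }
        }
    }

    /** Emits rows as contig, starting position, ending position, read count. */
    List<String> toLines() {
        List<String> lines = new ArrayList<>();
        for (Interval interval : intervals) {
            lines.add(interval.contig + "\t" + interval.start + "\t"
                    + interval.end + "\t" + interval.readCount);
        }
        return lines;
    }
}
```

Only the start position matters here, so a read that begins in one interval and extends into the next is counted once, in the first interval.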
Note: when collecting only RD evidence, users should consider providing the same interval list to -L as to --depth-evidence-intervals, in order to avoid processing unused reads outside the intervals.
Each of these output files may also be written as a block-compressed interval file, rather than as a tab-delimited text file, by specifying an output file name that ends with ".bci" rather than ".txt". These files are self-indexing, and contain complete header information including sample name(s) and a dictionary for the contigs.
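The naming convention described above can be summarized in a small helper. This is a hypothetical sketch that only classifies file names by the suffixes mentioned on this page; it does not reproduce how the tool itself selects a writer.

```java
/** Hypothetical classifier for evidence output file names; not a GATK class. */
final class EvidenceFileNames {
    enum Format { TEXT, COMPRESSED_TEXT, BLOCK_COMPRESSED_INTERVAL, UNRECOGNIZED }

    /** Maps a file name onto the output formats named in the class description. */
    static Format formatOf(String fileName) {
        // Check ".txt.gz" before ".txt" is unnecessary for endsWith, but keeps intent clear.
        if (fileName.endsWith(".txt.gz")) {
            return Format.COMPRESSED_TEXT;
        }
        if (fileName.endsWith(".txt")) {
            return Format.TEXT;
        }
        if (fileName.endsWith(".bci")) {
            return Format.BLOCK_COMPRESSED_INTERVAL;
        }
        return Format.UNRECOGNIZED;
    }
}
```

For instance, "sample.pe.txt.gz" is classified as compressed text and "sample.pe.bci" as a block-compressed interval file.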
  • Field Details

    • PAIRED_END_FILE_ARGUMENT_SHORT_NAME

      public static final String PAIRED_END_FILE_ARGUMENT_SHORT_NAME
    • PAIRED_END_FILE_ARGUMENT_LONG_NAME

      public static final String PAIRED_END_FILE_ARGUMENT_LONG_NAME
    • SPLIT_READ_FILE_ARGUMENT_SHORT_NAME

      public static final String SPLIT_READ_FILE_ARGUMENT_SHORT_NAME
    • SPLIT_READ_FILE_ARGUMENT_LONG_NAME

      public static final String SPLIT_READ_FILE_ARGUMENT_LONG_NAME
    • SITE_DEPTH_OUTPUT_ARGUMENT_SHORT_NAME

      public static final String SITE_DEPTH_OUTPUT_ARGUMENT_SHORT_NAME
    • SITE_DEPTH_OUTPUT_ARGUMENT_LONG_NAME

      public static final String SITE_DEPTH_OUTPUT_ARGUMENT_LONG_NAME
    • SITE_DEPTH_INPUT_ARGUMENT_SHORT_NAME

      public static final String SITE_DEPTH_INPUT_ARGUMENT_SHORT_NAME
    • SITE_DEPTH_INPUT_ARGUMENT_LONG_NAME

      public static final String SITE_DEPTH_INPUT_ARGUMENT_LONG_NAME
    • DEPTH_EVIDENCE_OUTPUT_FILE_ARGUMENT_SHORT_NAME

      public static final String DEPTH_EVIDENCE_OUTPUT_FILE_ARGUMENT_SHORT_NAME
    • DEPTH_EVIDENCE_OUTPUT_FILE_ARGUMENT_LONG_NAME

      public static final String DEPTH_EVIDENCE_OUTPUT_FILE_ARGUMENT_LONG_NAME
    • DEPTH_EVIDENCE_SUMMARY_FILE_ARGUMENT_SHORT_NAME

      public static final String DEPTH_EVIDENCE_SUMMARY_FILE_ARGUMENT_SHORT_NAME
    • DEPTH_EVIDENCE_SUMMARY_FILE_ARGUMENT_LONG_NAME

      public static final String DEPTH_EVIDENCE_SUMMARY_FILE_ARGUMENT_LONG_NAME
    • DEPTH_EVIDENCE_INTERVALS_INPUT_FILE_ARGUMENT_SHORT_NAME

      public static final String DEPTH_EVIDENCE_INTERVALS_INPUT_FILE_ARGUMENT_SHORT_NAME
    • DEPTH_EVIDENCE_INTERVALS_INPUT_FILE_ARGUMENT_LONG_NAME

      public static final String DEPTH_EVIDENCE_INTERVALS_INPUT_FILE_ARGUMENT_LONG_NAME
    • MIN_DEPTH_EVIDENCE_MAPQ_ARGUMENT_NAME

      public static final String MIN_DEPTH_EVIDENCE_MAPQ_ARGUMENT_NAME
    • MIN_SITE_DEPTH_MAPQ_ARGUMENT_NAME

      public static final String MIN_SITE_DEPTH_MAPQ_ARGUMENT_NAME
    • MIN_SITE_DEPTH_BASEQ_ARGUMENT_NAME

      public static final String MIN_SITE_DEPTH_BASEQ_ARGUMENT_NAME
    • SAMPLE_NAME_ARGUMENT_LONG_NAME

      public static final String SAMPLE_NAME_ARGUMENT_LONG_NAME
    • COMPRESSION_LEVEL_ARGUMENT_LONG_NAME

      public static final String COMPRESSION_LEVEL_ARGUMENT_LONG_NAME
    • peFile

      @Argument(shortName="PE", fullName="pe-file", doc="Output file for paired end evidence", optional=true) public GATKPath peFile
    • srFile

      @Argument(shortName="SR", fullName="sr-file", doc="Output file for split read evidence", optional=true) public GATKPath srFile
    • siteDepthOutputFilename

      @Argument(shortName="SD", fullName="sd-file", doc="Output file for site depth counts", optional=true) public GATKPath siteDepthOutputFilename
    • siteDepthInputFilename

      @Argument(shortName="F", fullName="site-depth-locs-vcf", doc="Input VCF of SNPs marking loci for site depth counts", optional=true) public GATKPath siteDepthInputFilename
    • depthEvidenceOutputFilename

      @Argument(shortName="RD", fullName="depth-evidence-file", doc="Output file for depth evidence", optional=true) public GATKPath depthEvidenceOutputFilename
    • depthEvidenceSummaryFilename

      @Argument(shortName="DS", fullName="depth-summary-file", doc="Output file for depth evidence summary statistics", optional=true) public GATKPath depthEvidenceSummaryFilename
    • depthEvidenceInputFilename

      @Argument(shortName="DI", fullName="depth-evidence-intervals", doc="Input feature file specifying intervals where depth evidence will be gathered", optional=true) public GATKPath depthEvidenceInputFilename
    • minDepthEvidenceMapQ

      @Argument(fullName="depth-evidence-min-mapq", doc="minimum mapping quality for read to be counted as depth evidence", optional=true) public int minDepthEvidenceMapQ
    • minMapQ

      @Argument(fullName="site-depth-min-mapq", doc="minimum mapping quality for read to be counted toward site depth", optional=true) public int minMapQ
    • minQ

      @Argument(fullName="site-depth-min-baseq", doc="minimum base call quality for SNP to be counted toward site depth", optional=true) public int minQ
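The argument names above combine into command lines such as the ones assembled below. This is a hedged sketch: the helper class and the output file prefix are hypothetical, the "gatk" launcher and "-I" input flag are the usual GATK conventions rather than something this page documents, and other arguments that the tool may require (for example the sample name, whose flag is not shown in this section) are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that assembles CollectSVEvidence command lines. */
final class CollectSVEvidenceCommands {

    /** Paired end and split read evidence only. */
    static List<String> pairedEndAndSplitRead(String reads, String outputPrefix) {
        return new ArrayList<>(Arrays.asList(
                "gatk", "CollectSVEvidence",
                "-I", reads,
                "--pe-file", outputPrefix + ".pe.txt.gz",
                "--sr-file", outputPrefix + ".sr.txt.gz"));
    }

    /**
     * Read depth evidence only. The same interval file is passed to -L and to
     * --depth-evidence-intervals, as the class description suggests, so that
     * reads outside the intervals are not processed.
     */
    static List<String> readDepthOnly(String reads, String intervals, String outputPrefix) {
        return new ArrayList<>(Arrays.asList(
                "gatk", "CollectSVEvidence",
                "-I", reads,
                "-L", intervals,
                "--depth-evidence-intervals", intervals,
                "--depth-evidence-file", outputPrefix + ".rd.txt.gz"));
    }
}
```

The returned list can be handed to java.lang.ProcessBuilder or joined with spaces for logging.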
  • Constructor Details

    • CollectSVEvidence

      public CollectSVEvidence()
  • Method Details

    • requiresReads

      public boolean requiresReads()
      Description copied from class: GATKTool
      Does this tool require reads? Traversal types and/or tools that do should override to return true.
      Overrides:
      requiresReads in class ReadWalker
      Returns:
      true if this tool requires reads, otherwise false
    • onTraversalStart

      public void onTraversalStart()
      Description copied from class: GATKTool
      Operations performed just prior to the start of traversal. Should be overridden by tool authors who need to process arguments local to their tool or perform other kinds of local initialization. Default implementation does nothing.
      Overrides:
      onTraversalStart in class GATKTool
    • getDefaultReadFilters

      public List<ReadFilter> getDefaultReadFilters()
      Description copied from class: ReadWalker
      Returns the default list of CommandLineReadFilters that are used for this tool. The filters returned by this method are subject to selective enabling/disabling by the user via the command line. The default implementation uses the WellformedReadFilter filter with all default options. Subclasses can override to provide alternative filters. Note: this method is called before command line parsing begins, and thus before a SAMFileHeader is available through getHeaderForReads().
      Overrides:
      getDefaultReadFilters in class ReadWalker
      Returns:
      List of individual filters to be applied for this tool.
    • apply

      public void apply(GATKRead read, ReferenceContext referenceContext, FeatureContext featureContext)
      Description copied from class: ReadWalker
      Process an individual read (with optional contextual information). Must be implemented by tool authors. In general, tool authors should simply stream their output from apply(), and maintain as little internal state as possible. TODO: Determine whether and to what degree the GATK engine should provide a reduce operation to complement this operation. At a minimum, we should make apply() return a value to discourage statefulness in walkers, but how this value should be handled is TBD.
      Specified by:
      apply in class ReadWalker
      Parameters:
      read - current read
      referenceContext - Reference bases spanning the current read. Will be an empty, but non-null, context object if there is no backing source of reference data (in which case all queries on it will return an empty array/iterator). Extra bases of context around the current read's interval can be requested by invoking ReferenceContext.setWindow(int, int) on this object before calling ReferenceContext.getBases().
      featureContext - Features spanning the current read. Will be an empty, but non-null, context object if there is no backing source of Feature data (in which case all queries on it will return an empty List).
    • getReportableDiscordantReadPair

      public org.broadinstitute.hellbender.tools.walkers.sv.CollectSVEvidence.DiscordantRead getReportableDiscordantReadPair(GATKRead read, Set<String> observedDiscordantNamesAtThisLocus, htsjdk.samtools.SAMSequenceDictionary samSequenceDictionary)
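This method has no description here, but the class description states that one record is emitted per discordant pair, at the read with the "upstream" start position according to the sequence dictionary contig ordering and coordinate. The sketch below illustrates that rule under stated assumptions: the class UpstreamRule is hypothetical, the dictionary is reduced to an ordered list of contig names, and the tie-breaking use of a set of read names at the current locus is a guess at the role of observedDiscordantNamesAtThisLocus, not a description of the actual method body.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of the "report the upstream read" rule. */
final class UpstreamRule {

    /**
     * Returns true if this read, rather than its mate, should produce the record.
     * contigOrder stands in for the sequence dictionary; contigs missing from it
     * get index -1 and therefore sort first in this sketch.
     */
    static boolean isReportable(List<String> contigOrder,
                                String readName,
                                String readContig, int readStart,
                                String mateContig, int mateStart,
                                Set<String> namesSeenAtLocus) {
        int readIndex = contigOrder.indexOf(readContig);
        int mateIndex = contigOrder.indexOf(mateContig);
        if (readIndex != mateIndex) {
            return readIndex < mateIndex;   // earlier contig in the dictionary wins
        }
        if (readStart != mateStart) {
            return readStart < mateStart;   // same contig: smaller coordinate wins
        }
        // Both mates start at the same place: report whichever is seen first.
        // Set.add returns false when the name was already recorded.
        return namesSeenAtLocus.add(readName);
    }
}
```

With a contig order of chr1 then chr2, a read on chr1 whose mate is on chr2 is reportable, while the mate itself is not.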
    • countSplitRead

      public void countSplitRead(GATKRead read, PriorityQueue<org.broadinstitute.hellbender.tools.walkers.sv.CollectSVEvidence.SplitPos> splitCounts, FeatureSink<SplitReadEvidence> srWriter)
      Adds split read information about the current read to the counts in splitCounts. Flushes split read counts to srWriter if necessary.
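One way to picture the "add to the counts, flush if necessary" behavior is a priority queue of clip positions that is drained once the traversal has moved past them. The sketch below is an assumption-laden illustration, not the GATK code: SplitCountBuffer and its nested SplitPos are hypothetical, a single contig and coordinate-sorted input are assumed, and a plain list stands in for the srWriter sink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical buffer that counts clips per (position, direction) and flushes in order. */
final class SplitCountBuffer {
    private static final class SplitPos implements Comparable<SplitPos> {
        final int position;
        final boolean left;

        SplitPos(int position, boolean left) {
            this.position = position;
            this.left = left;
        }

        @Override
        public int compareTo(SplitPos other) {
            int byPosition = Integer.compare(position, other.position);
            return byPosition != 0 ? byPosition : Boolean.compare(left, other.left);
        }
    }

    private final PriorityQueue<SplitPos> queue = new PriorityQueue<>();
    private final List<String> flushed = new ArrayList<>();

    /** Records one clip from a read that starts at readStart. */
    void add(int readStart, int clipPosition, boolean leftClip) {
        // With sorted input, no later read can clip before the current read start.
        flushBefore(readStart);
        queue.add(new SplitPos(clipPosition, leftClip));
    }

    /** Emits position, direction, count for every queued clip before the given position. */
    void flushBefore(int position) {
        while (!queue.isEmpty() && queue.peek().position < position) {
            SplitPos first = queue.poll();
            int count = 1;
            while (!queue.isEmpty() && queue.peek().compareTo(first) == 0) {
                queue.poll();
                count++;
            }
            flushed.add(first.position + "\t" + (first.left ? "left" : "right") + "\t" + count);
        }
    }

    /** Flushes everything that remains, for use at the end of traversal. */
    void flushAll() {
        flushBefore(Integer.MAX_VALUE);
    }

    List<String> flushedLines() {
        return flushed;
    }
}
```

Because equal entries leave the queue consecutively, each count can be accumulated while polling, without a separate map.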
    • onTraversalSuccess

      public Object onTraversalSuccess()
      Description copied from class: GATKTool
      Operations performed immediately after a successful traversal (i.e., when no uncaught exceptions were thrown during the traversal). Should be overridden by tool authors who need to close local resources, etc., after traversal. Also allows tools to return a value representing the traversal result, which is printed by the engine. Default implementation does nothing and returns null.
      Overrides:
      onTraversalSuccess in class GATKTool
      Returns:
      Object representing the traversal result, or null if a tool does not return a value
    • closeTool

      public void closeTool()
      Description copied from class: GATKTool
      This method is called by the GATK framework at the end of the GATKTool.doWork() template method. It is called regardless of whether the GATKTool.traverse() has succeeded or not. It is called after the GATKTool.onTraversalSuccess() has completed (successfully or not) but before the GATKTool.doWork() method returns. In other words, on successful runs both GATKTool.onTraversalSuccess() and GATKTool.closeTool() will be called (in this order) while on failed runs (when GATKTool.traverse() causes an exception), only GATKTool.closeTool() will be called. The default implementation does nothing. Subclasses should override this method to close any resources that must be closed regardless of the success of traversal.
      Overrides:
      closeTool in class GATKTool
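The call order described for onTraversalSuccess() and closeTool() corresponds to a try/finally template method. The sketch below is a simplified stand-in and not the GATKTool source: the class LifecycleSketch is hypothetical, and placing onTraversalStart() inside doWork() is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the engine's template method; not the GATKTool source. */
class LifecycleSketch {
    final List<String> calls = new ArrayList<>();
    private final boolean failDuringTraversal;

    LifecycleSketch(boolean failDuringTraversal) {
        this.failDuringTraversal = failDuringTraversal;
    }

    void onTraversalStart() {
        calls.add("onTraversalStart");
    }

    void traverse() {
        calls.add("traverse");
        if (failDuringTraversal) {
            throw new IllegalStateException("simulated traversal failure");
        }
    }

    Object onTraversalSuccess() {
        calls.add("onTraversalSuccess");
        return null;
    }

    void closeTool() {
        calls.add("closeTool");
    }

    /** closeTool() runs whether or not traversal succeeds; onTraversalSuccess() only on success. */
    final Object doWork() {
        try {
            onTraversalStart();
            traverse();
            return onTraversalSuccess();
        } finally {
            closeTool();
        }
    }
}
```

This is why resources that must always be released belong in closeTool(), while work that only makes sense after a complete traversal belongs in onTraversalSuccess().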