org.broadinstitute.hellbender.tools.walkers.sv.CollectSVEvidence

All Implemented Interfaces: org.broadinstitute.barclay.argparser.CommandLinePluginProvider

@BetaFeature @DocumentedFeature public class CollectSVEvidence extends ReadWalker

Creates discordant read pair, split read, site depth, and read depth evidence files for use in the GATK-SV pipeline. This tool emulates the functionality of "svtk collect-pesr", which was used in v1 of the GATK-SV pipeline. The first output file, which should be named "*.pe.txt" or "*.pe.txt.gz", is a tab-delimited file containing information on discordant read pairs in the input bam or cram, with the following columns:

read contig
read start
read strand
mate contig
mate start
mate strand
sample name
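The one-record-per-pair rule for this file can be sketched as follows, assuming coordinate-sorted input: a pair is reported from the read whose start is upstream of its mate's, comparing contig order in the sequence dictionary first and then position. In this sketch a contig-to-index map stands in for the SAMSequenceDictionary, and the class, record, and method names are hypothetical. Breaking ties (both reads starting at the same position) with a set of read names seen at the current locus is an assumption inferred from the observedDiscordantNamesAtThisLocus parameter of getReportableDiscordantReadPair, not documented behavior.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the one-record-per-pair rule; not GATK's implementation.
class DiscordantPairReporter {
    // A read's own position and its mate's position.
    record Alignment(String name, String contig, int start, String mateContig, int mateStart) {}

    private final Map<String, Integer> contigOrder; // stands in for the sequence dictionary
    // Assumed role: names of pairs already reported at the current locus.
    private final Set<String> observedNamesAtLocus = new HashSet<>();
    private String locusContig;
    private int locusStart = -1;

    DiscordantPairReporter(Map<String, Integer> contigOrder) {
        this.contigOrder = contigOrder;
    }

    /** Returns true if the pair's record should be emitted when visiting this read. */
    boolean shouldReport(Alignment a) {
        if (!a.contig().equals(locusContig) || a.start() != locusStart) {
            // Coordinate-sorted input: names seen at earlier loci can be forgotten.
            observedNamesAtLocus.clear();
            locusContig = a.contig();
            locusStart = a.start();
        }
        int cmp = Integer.compare(contigOrder.get(a.contig()), contigOrder.get(a.mateContig()));
        if (cmp == 0) {
            cmp = Integer.compare(a.start(), a.mateStart());
        }
        if (cmp < 0) return true;  // this read is upstream of its mate
        if (cmp > 0) return false; // the mate is upstream and carries the record
        // Both reads start at the same position: report only the first one seen.
        return observedNamesAtLocus.add(a.name());
    }

    public static void main(String[] args) {
        DiscordantPairReporter r = new DiscordantPairReporter(Map.of("chr1", 0, "chr2", 1));
        System.out.println(r.shouldReport(new Alignment("p1", "chr1", 100, "chr2", 50)));  // true
        System.out.println(r.shouldReport(new Alignment("p2", "chr1", 200, "chr1", 200))); // true
        System.out.println(r.shouldReport(new Alignment("p2", "chr1", 200, "chr1", 200))); // false
        System.out.println(r.shouldReport(new Alignment("p1", "chr2", 50, "chr1", 100)));  // false
    }
}
```

Clearing the name set whenever the locus changes keeps memory bounded to the reads at a single position, which is only safe because the input is sorted.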

Only one record is emitted for each discordant read pair: it is written at whichever read of the pair has the "upstream" start position, ordering first by contig (according to the sequence dictionary) and then by coordinate. The second output file, which should be named "*.sr.txt" or "*.sr.txt.gz", contains the locations of all split read clippings in the input bam or cram, with the following columns:

contig
clipping position
direction: side of the read that was clipped (either "left" or "right")
count: the number of reads clipped at this location in this direction
sample name
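Because the input is coordinate-sorted, split read counts can be aggregated in a streaming fashion; countSplitRead is documented as adding each read's split information to counts held in a PriorityQueue and flushing them to srWriter when necessary. The sketch below is one way to do this, not the tool's code: it handles a single contig, writes lines to a Consumer<String> instead of a FeatureSink, takes each read's clipping events as given (deriving them from the CIGAR is not shown), and assumes no clipping position lies before its read's start, so positions behind the current read start can no longer change. SplitCountFlusher and its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Hypothetical streaming aggregator for split read counts on one contig; not GATK's code.
class SplitCountFlusher {
    enum Direction { LEFT, RIGHT }

    // One clipping event: position and which side of the read was clipped.
    record SplitPos(int pos, Direction dir) {}

    private final PriorityQueue<SplitPos> pending = new PriorityQueue<>(
            Comparator.comparingInt(SplitPos::pos).thenComparing(SplitPos::dir));
    private final String contig;
    private final String sample;
    private final Consumer<String> out;

    SplitCountFlusher(String contig, String sample, Consumer<String> out) {
        this.contig = contig;
        this.sample = sample;
        this.out = out;
    }

    /** Adds one read's clipping events; readStart must not decrease between calls. */
    void addRead(int readStart, List<SplitPos> clips) {
        flushBefore(readStart); // assumed: no later read can clip before readStart
        pending.addAll(clips);
    }

    /** Emits one line per (position, direction) below limit, with its count. */
    void flushBefore(int limit) {
        while (!pending.isEmpty() && pending.peek().pos() < limit) {
            SplitPos first = pending.poll();
            int count = 1;
            // Identical events leave the queue consecutively, so count the run.
            while (!pending.isEmpty() && pending.peek().equals(first)) {
                pending.poll();
                count++;
            }
            String dir = first.dir() == Direction.LEFT ? "left" : "right";
            out.accept(contig + "\t" + first.pos() + "\t" + dir + "\t" + count + "\t" + sample);
        }
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        SplitCountFlusher f = new SplitCountFlusher("chr1", "S1", lines::add);
        f.addRead(100, List.of(new SplitPos(150, Direction.RIGHT)));
        f.addRead(120, List.of(new SplitPos(150, Direction.RIGHT)));
        f.addRead(140, List.of(new SplitPos(140, Direction.LEFT)));
        f.flushBefore(Integer.MAX_VALUE); // end of contig: emit everything left
        lines.forEach(System.out::println);
    }
}
```

Queuing one entry per clipping event and counting runs at flush time avoids updating counts in place inside the priority queue.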

The third output file, which should be named "*.sd.txt" or "*.sd.txt.gz", specifies site depth counts: for each locus specified in an input VCF as a simple, biallelic SNP, it gives the number of reads covering that locus with each base call (A, C, G, T). It has the following columns:

contig
position
sampleName
A observations
C observations
G observations
T observations
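Each site depth row is a tally of base calls at one SNP locus. The sketch below assumes the observations at a locus are already available as (base, base quality, mapping quality) triples and applies the --site-depth-min-baseq and --site-depth-min-mapq thresholds as inclusive minimums, which is an assumption; SiteDepthTally and BaseObs are hypothetical names, and calls other than A, C, G, T are ignored.

```java
import java.util.List;

// Hypothetical tally of A/C/G/T calls at one locus; not GATK's implementation.
class SiteDepthTally {
    // One read's base call at the locus plus the qualities used for filtering.
    record BaseObs(char base, int baseQual, int mapQual) {}

    /** Returns counts in A, C, G, T order; other symbols (e.g. N) are ignored. */
    static int[] countBases(List<BaseObs> pileup, int minBaseQ, int minMapQ) {
        int[] counts = new int[4];
        for (BaseObs o : pileup) {
            // Assumed: thresholds are inclusive minimums.
            if (o.baseQual() < minBaseQ || o.mapQual() < minMapQ) continue;
            switch (Character.toUpperCase(o.base())) {
                case 'A' -> counts[0]++;
                case 'C' -> counts[1]++;
                case 'G' -> counts[2]++;
                case 'T' -> counts[3]++;
                default -> { } // skip N and other symbols
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<BaseObs> pileup = List.of(
                new BaseObs('A', 30, 60), new BaseObs('G', 35, 60),
                new BaseObs('A', 5, 60),  // fails base quality
                new BaseObs('G', 30, 0),  // fails mapping quality
                new BaseObs('N', 30, 60));
        int[] c = countBases(pileup, 10, 10);
        // contig, position, and sample name would precede these columns in an output row
        System.out.println(c[0] + "\t" + c[1] + "\t" + c[2] + "\t" + c[3]); // 1	0	1	0
    }
}
```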

The fourth output file, which should be named "*.rd.txt" or "*.rd.txt.gz", specifies read depths: for each interval specified by a 3-column, tab-delimited input file, it reports the number of reads that start in that interval. It has the following columns:

contig
starting position
ending position
read count
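Counting read starts per interval amounts to looking up the interval that contains each read's start. A minimal sketch, assuming the intervals for one contig are sorted, non-overlapping, and treated as closed [start, end] ranges (the coordinate convention of the interval file is not stated here), uses Arrays.binarySearch to find the candidate interval. The minMapQ parameter mirrors --depth-evidence-min-mapq and is assumed to be an inclusive minimum; ReadDepthCounter and its methods are hypothetical.

```java
import java.util.Arrays;

// Hypothetical read-start counter; assumes sorted, non-overlapping, closed intervals on one contig.
class ReadDepthCounter {
    private final int[] starts;
    private final int[] ends;
    private final long[] counts;

    ReadDepthCounter(int[] starts, int[] ends) {
        this.starts = starts.clone();
        this.ends = ends.clone();
        this.counts = new long[starts.length];
    }

    /** Counts the read if its start lies in an interval and its MAPQ passes the threshold. */
    void addRead(int readStart, int mapQ, int minMapQ) {
        if (mapQ < minMapQ) return;
        // Index of the last interval whose start is <= readStart.
        int i = Arrays.binarySearch(starts, readStart);
        if (i < 0) i = -i - 2; // (insertion point) - 1
        if (i >= 0 && readStart <= ends[i]) counts[i]++;
    }

    long[] counts() {
        return counts.clone();
    }

    public static void main(String[] args) {
        ReadDepthCounter rd = new ReadDepthCounter(new int[]{1, 101, 301}, new int[]{100, 200, 400});
        rd.addRead(50, 60, 30);  // interval 0
        rd.addRead(101, 60, 30); // interval 1 (start boundary)
        rd.addRead(250, 60, 30); // falls in a gap, not counted
        rd.addRead(350, 10, 30); // MAPQ too low
        rd.addRead(400, 60, 30); // interval 2 (end boundary)
        System.out.println(Arrays.toString(rd.counts())); // [1, 1, 1]
    }
}
```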

Note: when collecting only RD evidence, users should consider providing the same interval list with -L as with --depth-evidence-intervals, to avoid processing unused reads outside the intervals.

Each of these output files may also be written as a block-compressed interval file rather than as a tab-delimited text file, by specifying an output file name that ends with ".bci" rather than ".txt". These files are self-indexing and contain complete header information, including sample name(s) and a dictionary for the contigs.
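The choice between text and block-compressed output is driven by the output file name. A hypothetical classifier for the suffixes named above; how the tool treats any other suffix is not described here, so the sketch simply rejects it.

```java
// Hypothetical classifier for evidence output names by suffix; not GATK's code.
class EvidenceOutputFormat {
    enum Format { TEXT, GZIPPED_TEXT, BLOCK_COMPRESSED_INTERVAL }

    static Format fromFileName(String name) {
        if (name.endsWith(".bci")) return Format.BLOCK_COMPRESSED_INTERVAL;
        if (name.endsWith(".txt.gz")) return Format.GZIPPED_TEXT;
        if (name.endsWith(".txt")) return Format.TEXT;
        throw new IllegalArgumentException("unrecognized evidence file suffix: " + name);
    }

    public static void main(String[] args) {
        for (String n : new String[]{"s1.pe.txt.gz", "s1.sr.txt", "s1.rd.bci"}) {
            System.out.println(n + " -> " + fromFileName(n));
        }
    }
}
```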

Nested Class Summary

Nested Classes

Modifier and Type     Class
static final class    CollectSVEvidence.BAFSiteIterator

Nested classes/interfaces inherited from class org.broadinstitute.hellbender.cmdline.CommandLineProgram
CommandLineProgram.AutoCloseableNoCheckedExceptions

Field Summary

Fields

Modifier and Type     Field
static final String   COMPRESSION_LEVEL_ARGUMENT_LONG_NAME
static final String   DEPTH_EVIDENCE_INTERVALS_INPUT_FILE_ARGUMENT_LONG_NAME
static final String   DEPTH_EVIDENCE_INTERVALS_INPUT_FILE_ARGUMENT_SHORT_NAME
static final String   DEPTH_EVIDENCE_OUTPUT_FILE_ARGUMENT_LONG_NAME
static final String   DEPTH_EVIDENCE_OUTPUT_FILE_ARGUMENT_SHORT_NAME
static final String   DEPTH_EVIDENCE_SUMMARY_FILE_ARGUMENT_LONG_NAME
static final String   DEPTH_EVIDENCE_SUMMARY_FILE_ARGUMENT_SHORT_NAME
GATKPath              depthEvidenceInputFilename
GATKPath              depthEvidenceOutputFilename
GATKPath              depthEvidenceSummaryFilename
static final String   MIN_DEPTH_EVIDENCE_MAPQ_ARGUMENT_NAME
static final String   MIN_SITE_DEPTH_BASEQ_ARGUMENT_NAME
static final String   MIN_SITE_DEPTH_MAPQ_ARGUMENT_NAME
int                   minDepthEvidenceMapQ
int                   minMapQ
int                   minQ
static final String   PAIRED_END_FILE_ARGUMENT_LONG_NAME
static final String   PAIRED_END_FILE_ARGUMENT_SHORT_NAME
GATKPath              peFile
static final String   SAMPLE_NAME_ARGUMENT_LONG_NAME
static final String   SITE_DEPTH_INPUT_ARGUMENT_LONG_NAME
static final String   SITE_DEPTH_INPUT_ARGUMENT_SHORT_NAME
static final String   SITE_DEPTH_OUTPUT_ARGUMENT_LONG_NAME
static final String   SITE_DEPTH_OUTPUT_ARGUMENT_SHORT_NAME
GATKPath              siteDepthInputFilename
GATKPath              siteDepthOutputFilename
static final String   SPLIT_READ_FILE_ARGUMENT_LONG_NAME
static final String   SPLIT_READ_FILE_ARGUMENT_SHORT_NAME
GATKPath              srFile

Fields inherited from class org.broadinstitute.hellbender.engine.ReadWalker
FEATURE_CACHE_LOOKAHEAD

Fields inherited from class org.broadinstitute.hellbender.engine.GATKTool
addOutputSAMProgramRecord, addOutputVCFCommandLine, cloudIndexPrefetchBuffer, cloudPrefetchBuffer, createOutputBamIndex, createOutputBamMD5, createOutputVariantIndex, createOutputVariantMD5, disableBamIndexCaching, features, intervalArgumentCollection, lenientVCFProcessing, outputSitesOnlyVCFs, progressMeter, readArguments, referenceArguments, SECONDS_BETWEEN_PROGRESS_UPDATES_NAME, seqValidationArguments

Fields inherited from class org.broadinstitute.hellbender.cmdline.CommandLineProgram
GATK_CONFIG_FILE, logger, NIO_MAX_REOPENS, NIO_PROJECT_FOR_REQUESTER_PAYS, QUIET, specialArgumentsCollection, tmpDir, useJdkDeflater, useJdkInflater, VERBOSITY

Constructor Summary

Constructors

Constructor           Description
CollectSVEvidence()

Method Summary

Modifier and Type   Method
    Description

void   apply(GATKRead read, ReferenceContext referenceContext, FeatureContext featureContext)
    Process an individual read (with optional contextual information).

void   closeTool()
    This method is called by the GATK framework at the end of the GATKTool.doWork() template method.

void   countSplitRead(GATKRead read, PriorityQueue<org.broadinstitute.hellbender.tools.walkers.sv.CollectSVEvidence.SplitPos> splitCounts, FeatureSink<SplitReadEvidence> srWriter)
    Adds split read information about the current read to the counts in splitCounts.

List<ReadFilter>   getDefaultReadFilters()
    Returns the default list of CommandLineReadFilters that are used for this tool.

org.broadinstitute.hellbender.tools.walkers.sv.CollectSVEvidence.DiscordantRead   getReportableDiscordantReadPair(GATKRead read, Set<String> observedDiscordantNamesAtThisLocus, htsjdk.samtools.SAMSequenceDictionary samSequenceDictionary)

void   onTraversalStart()
    Operations performed just prior to the start of traversal.

Object   onTraversalSuccess()
    Operations performed immediately after a successful traversal (i.e., when no uncaught exceptions were thrown during the traversal).

boolean   requiresReads()
    Does this tool require reads? Traversal types and/or tools that do should override to return true.

Methods inherited from class org.broadinstitute.hellbender.engine.ReadWalker
getProgressMeterRecordLabel, onShutdown, onStartup, resetReadsDataSource, traverse

Methods inherited from class org.broadinstitute.hellbender.engine.WalkerBase
directlyAccessEngineFeatureManager, directlyAccessEngineReadsDataSource, directlyAccessEngineReferenceDataSource

Methods inherited from class org.broadinstitute.hellbender.engine.GATKTool
addFeatureInputsAfterInitialization, bamIndexCachingShouldBeEnabled, createSAMWriter, createVCFWriter, createVCFWriter, createVCFWriter, disableProgressMeter, doWork, getBestAvailableSequenceDictionary, getDefaultCloudIndexPrefetchBufferSize, getDefaultCloudPrefetchBufferSize, getDefaultToolVCFHeaderLines, getDefaultVariantAnnotationGroups, getDefaultVariantAnnotations, getGenomicsDBOptions, getHeaderForFeatures, getHeaderForReads, getHeaderForSAMWriter, getMasterSequenceDictionary, getPluginDescriptors, getReferenceDictionary, getSequenceDictionaryValidationArgumentCollection, getToolName, getTransformedReadStream, getTraversalIntervals, getUserSuppliedIntervals, hasFeatures, hasReads, hasReference, hasUserSuppliedIntervals, initializeProgressMeter, makePostReadFilterTransformer, makePreReadFilterTransformer, makeReadFilter, makeSamReaderFactory, makeVariantAnnotations, requiresFeatures, requiresIntervals, requiresReference, transformTraversalIntervals, useVariantAnnotations

Methods inherited from class org.broadinstitute.hellbender.cmdline.CommandLineProgram
customCommandLineValidation, getCommandLine, getCommandLineParser, getDefaultHeaders, getMetricsFile, getSupportInformation, getToolkitName, getToolkitShortName, getToolStatusWarning, getUsage, getVersion, instanceMain, instanceMainPostParseArgs, isBetaFeature, isExperimentalFeature, parseArgs, printLibraryVersions, printSettings, printStartupMessage, runTool, setDefaultHeaders, warnOnToolStatus

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- PAIRED_END_FILE_ARGUMENT_SHORT_NAME
  
  public static final String PAIRED_END_FILE_ARGUMENT_SHORT_NAME
  See Also:
  
  Constant Field Values
- PAIRED_END_FILE_ARGUMENT_LONG_NAME
  
  public static final String PAIRED_END_FILE_ARGUMENT_LONG_NAME
  See Also:
  
  Constant Field Values
- SPLIT_READ_FILE_ARGUMENT_SHORT_NAME
  
  public static final String SPLIT_READ_FILE_ARGUMENT_SHORT_NAME
  See Also:
  
  Constant Field Values
- SPLIT_READ_FILE_ARGUMENT_LONG_NAME
  
  public static final String SPLIT_READ_FILE_ARGUMENT_LONG_NAME
  See Also:
  
  Constant Field Values
- SITE_DEPTH_OUTPUT_ARGUMENT_SHORT_NAME
  
  public static final String SITE_DEPTH_OUTPUT_ARGUMENT_SHORT_NAME
  See Also:
  
  Constant Field Values
- SITE_DEPTH_OUTPUT_ARGUMENT_LONG_NAME
  
  public static final String SITE_DEPTH_OUTPUT_ARGUMENT_LONG_NAME
  See Also:
  
  Constant Field Values
- SITE_DEPTH_INPUT_ARGUMENT_SHORT_NAME
  
  public static final String SITE_DEPTH_INPUT_ARGUMENT_SHORT_NAME
  See Also:
  
  Constant Field Values
- SITE_DEPTH_INPUT_ARGUMENT_LONG_NAME
  
  public static final String SITE_DEPTH_INPUT_ARGUMENT_LONG_NAME
  See Also:
  
  Constant Field Values
- DEPTH_EVIDENCE_OUTPUT_FILE_ARGUMENT_SHORT_NAME
  
  public static final String DEPTH_EVIDENCE_OUTPUT_FILE_ARGUMENT_SHORT_NAME
  See Also:
  
  Constant Field Values
- DEPTH_EVIDENCE_OUTPUT_FILE_ARGUMENT_LONG_NAME
  
  public static final String DEPTH_EVIDENCE_OUTPUT_FILE_ARGUMENT_LONG_NAME
  See Also:
  
  Constant Field Values
- DEPTH_EVIDENCE_SUMMARY_FILE_ARGUMENT_SHORT_NAME
  
  public static final String DEPTH_EVIDENCE_SUMMARY_FILE_ARGUMENT_SHORT_NAME
  See Also:
  
  Constant Field Values
- DEPTH_EVIDENCE_SUMMARY_FILE_ARGUMENT_LONG_NAME
  
  public static final String DEPTH_EVIDENCE_SUMMARY_FILE_ARGUMENT_LONG_NAME
  See Also:
  
  Constant Field Values
- DEPTH_EVIDENCE_INTERVALS_INPUT_FILE_ARGUMENT_SHORT_NAME
  
  public static final String DEPTH_EVIDENCE_INTERVALS_INPUT_FILE_ARGUMENT_SHORT_NAME
  See Also:
  
  Constant Field Values
- DEPTH_EVIDENCE_INTERVALS_INPUT_FILE_ARGUMENT_LONG_NAME
  
  public static final String DEPTH_EVIDENCE_INTERVALS_INPUT_FILE_ARGUMENT_LONG_NAME
  See Also:
  
  Constant Field Values
- MIN_DEPTH_EVIDENCE_MAPQ_ARGUMENT_NAME
  
  public static final String MIN_DEPTH_EVIDENCE_MAPQ_ARGUMENT_NAME
  See Also:
  
  Constant Field Values
- MIN_SITE_DEPTH_MAPQ_ARGUMENT_NAME
  
  public static final String MIN_SITE_DEPTH_MAPQ_ARGUMENT_NAME
  See Also:
  
  Constant Field Values
- MIN_SITE_DEPTH_BASEQ_ARGUMENT_NAME
  
  public static final String MIN_SITE_DEPTH_BASEQ_ARGUMENT_NAME
  See Also:
  
  Constant Field Values
- SAMPLE_NAME_ARGUMENT_LONG_NAME
  
  public static final String SAMPLE_NAME_ARGUMENT_LONG_NAME
  See Also:
  
  Constant Field Values
- COMPRESSION_LEVEL_ARGUMENT_LONG_NAME
  
  public static final String COMPRESSION_LEVEL_ARGUMENT_LONG_NAME
  See Also:
  
  Constant Field Values
- peFile
  
  @Argument(shortName="PE", fullName="pe-file", doc="Output file for paired end evidence", optional=true) public GATKPath peFile
- srFile
  
  @Argument(shortName="SR", fullName="sr-file", doc="Output file for split read evidence", optional=true) public GATKPath srFile
- siteDepthOutputFilename
  
  @Argument(shortName="SD", fullName="sd-file", doc="Output file for site depth counts", optional=true) public GATKPath siteDepthOutputFilename
- siteDepthInputFilename
  
  @Argument(shortName="F", fullName="site-depth-locs-vcf", doc="Input VCF of SNPs marking loci for site depth counts", optional=true) public GATKPath siteDepthInputFilename
- depthEvidenceOutputFilename
  
  @Argument(shortName="RD", fullName="depth-evidence-file", doc="Output file for depth evidence", optional=true) public GATKPath depthEvidenceOutputFilename
- depthEvidenceSummaryFilename
  
  @Argument(shortName="DS", fullName="depth-summary-file", doc="Output file for depth evidence summary statistics", optional=true) public GATKPath depthEvidenceSummaryFilename
- depthEvidenceInputFilename
  
  @Argument(shortName="DI", fullName="depth-evidence-intervals", doc="Input feature file specifying intervals where depth evidence will be gathered", optional=true) public GATKPath depthEvidenceInputFilename
- minDepthEvidenceMapQ
  
  @Argument(fullName="depth-evidence-min-mapq", doc="minimum mapping quality for read to be counted as depth evidence", optional=true) public int minDepthEvidenceMapQ
- minMapQ
  
  @Argument(fullName="site-depth-min-mapq", doc="minimum mapping quality for read to be counted toward site depth", optional=true) public int minMapQ
- minQ
  
  @Argument(fullName="site-depth-min-baseq", doc="minimum base call quality for SNP to be counted toward site depth", optional=true) public int minQ
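For orientation, the sketch below assembles a partial argument list for an RD-only run, using the fullName values from the @Argument annotations above plus -L, as the class description recommends. -R and -I are the standard GATK reference and reads arguments; the file names and the MAPQ value of 30 are illustrative placeholders, and RdOnlyArgs is hypothetical. Other arguments the tool may require, such as the sample name (whose flag is defined by SAMPLE_NAME_ARGUMENT_LONG_NAME, not shown on this page), are deliberately omitted, so this is not a complete command.

```java
import java.util.List;

// Hypothetical helper that assembles a partial CollectSVEvidence argument list for an
// RD-only run. Other required arguments (e.g. the sample name) are intentionally omitted.
class RdOnlyArgs {
    static List<String> build(String reference, String reads, String intervals,
                              String rdOutput, int minMapQ) {
        return List.of(
                "CollectSVEvidence",
                "-R", reference,                        // CRAM decoding needs the reference
                "-I", reads,
                "-L", intervals,                        // traverse only the depth intervals...
                "--depth-evidence-intervals", intervals, // ...which are also the RD bins
                "--depth-evidence-file", rdOutput,
                "--depth-evidence-min-mapq", Integer.toString(minMapQ));
    }

    public static void main(String[] args) {
        List<String> cmd = build("ref.fasta", "sample.cram", "bins.bed", "sample.rd.txt.gz", 30);
        System.out.println("gatk " + String.join(" ", cmd));
    }
}
```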
Constructor Details
- CollectSVEvidence
  
  public CollectSVEvidence()
Method Details
- requiresReads
  
  public boolean requiresReads()
  
  Description copied from class: GATKTool
  
  Does this tool require reads? Traversal types and/or tools that do should override to return true.
  
  Overrides:
  
  requiresReads in class ReadWalker
  
  Returns:
  
  true if this tool requires reads, otherwise false
- onTraversalStart
  
  public void onTraversalStart()
  
  Description copied from class: GATKTool
  
  Operations performed just prior to the start of traversal. Should be overridden by tool authors who need to process arguments local to their tool or perform other kinds of local initialization. Default implementation does nothing.
  
  Overrides:
  
  onTraversalStart in class GATKTool
- getDefaultReadFilters
  
  public List<ReadFilter> getDefaultReadFilters()
  
  Description copied from class: ReadWalker
  
  Returns the default list of CommandLineReadFilters that are used for this tool. The filters returned by this method are subject to selective enabling/disabling by the user via the command line. The default implementation uses the WellformedReadFilter with all default options. Subclasses can override to provide alternative filters. Note: this method is called before command line parsing begins, and thus before a SAMFileHeader is available through getHeaderForReads().
  
  Overrides:
  
  getDefaultReadFilters in class ReadWalker
  
  Returns:
  
  List of individual filters to be applied for this tool.
- apply
  
  public void apply(GATKRead read, ReferenceContext referenceContext, FeatureContext featureContext)
  
  Description copied from class: ReadWalker
  
  Process an individual read (with optional contextual information). Must be implemented by tool authors. In general, tool authors should simply stream their output from apply() and maintain as little internal state as possible. TODO: Determine whether, and to what degree, the GATK engine should provide a reduce operation to complement this operation. At a minimum, we should make apply() return a value to discourage statefulness in walkers, but how this value should be handled is TBD.
  
  Specified by:
  
  apply in class ReadWalker
  
  Parameters:
  
  read - current read
  
  referenceContext - Reference bases spanning the current read. Will be an empty, but non-null, context object if there is no backing source of reference data (in which case all queries on it will return an empty array/iterator). Can request extra bases of context around the current read's interval by invoking ReferenceContext.setWindow(int, int) on this object before calling ReferenceContext.getBases()
  
  featureContext - Features spanning the current read. Will be an empty, but non-null, context object if there is no backing source of Feature data (in which case all queries on it will return an empty List).
- getReportableDiscordantReadPair
  
  public org.broadinstitute.hellbender.tools.walkers.sv.CollectSVEvidence.DiscordantRead getReportableDiscordantReadPair(GATKRead read, Set<String> observedDiscordantNamesAtThisLocus, htsjdk.samtools.SAMSequenceDictionary samSequenceDictionary)
- countSplitRead
  
  public void countSplitRead(GATKRead read, PriorityQueue<org.broadinstitute.hellbender.tools.walkers.sv.CollectSVEvidence.SplitPos> splitCounts, FeatureSink<SplitReadEvidence> srWriter)
  
  Adds split read information about the current read to the counts in splitCounts. Flushes split read counts to srWriter if necessary.
- onTraversalSuccess
  
  public Object onTraversalSuccess()
  
  Description copied from class: GATKTool
  
  Operations performed immediately after a successful traversal (i.e., when no uncaught exceptions were thrown during the traversal). Should be overridden by tool authors who need to close local resources, etc., after traversal. Also allows tools to return a value representing the traversal result, which is printed by the engine. Default implementation does nothing and returns null.
  
  Overrides:
  
  onTraversalSuccess in class GATKTool
  
  Returns:
  
  Object representing the traversal result, or null if a tool does not return a value
- closeTool
  
  public void closeTool()
  
  Description copied from class: GATKTool
  
  This method is called by the GATK framework at the end of the GATKTool.doWork() template method. It is called regardless of whether the GATKTool.traverse() has succeeded or not. It is called after the GATKTool.onTraversalSuccess() has completed (successfully or not) but before the GATKTool.doWork() method returns. In other words, on successful runs both GATKTool.onTraversalSuccess() and GATKTool.closeTool() will be called (in this order) while on failed runs (when GATKTool.traverse() causes an exception), only GATKTool.closeTool() will be called. The default implementation does nothing. Subclasses should override this method to close any resources that must be closed regardless of the success of traversal.
  
  Overrides:
  
  closeTool in class GATKTool
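The ordering described for onTraversalSuccess() and closeTool() is the familiar try/finally template: onTraversalSuccess() runs only when traversal completes, while closeTool() runs either way. LifecycleSketch below is a hypothetical stand-in, not GATKTool's doWork(); it records the order of calls for a successful and a failing traversal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the lifecycle described above; not GATKTool's code.
class LifecycleSketch {
    final List<String> calls = new ArrayList<>();
    private final boolean failDuringTraversal;

    LifecycleSketch(boolean failDuringTraversal) {
        this.failDuringTraversal = failDuringTraversal;
    }

    void traverse() {
        calls.add("traverse");
        if (failDuringTraversal) throw new IllegalStateException("simulated traversal failure");
    }

    Object onTraversalSuccess() {
        calls.add("onTraversalSuccess");
        return null;
    }

    void closeTool() {
        calls.add("closeTool");
    }

    /** Template method: closeTool() runs whether or not traversal succeeds. */
    Object doWork() {
        try {
            traverse();
            return onTraversalSuccess(); // skipped if traverse() throws
        } finally {
            closeTool();                 // always runs
        }
    }

    public static void main(String[] args) {
        LifecycleSketch ok = new LifecycleSketch(false);
        ok.doWork();
        System.out.println(ok.calls); // [traverse, onTraversalSuccess, closeTool]

        LifecycleSketch failing = new LifecycleSketch(true);
        try {
            failing.doWork();
        } catch (IllegalStateException e) {
            System.out.println(failing.calls); // [traverse, closeTool]
        }
    }
}
```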
