Class CollectReadCounts

All Implemented Interfaces:
org.broadinstitute.barclay.argparser.CommandLinePluginProvider

@DocumentedFeature public final class CollectReadCounts extends ReadWalker
Collects read counts at specified intervals. The count for each interval is calculated by counting the number of read starts that lie in the interval.
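The following Java sketch illustrates this counting rule. It assigns each read to the interval that contains its start position and increments that interval's count. It makes several assumptions. A read's start is taken to be its leftmost aligned position. Intervals are closed and 1-based. Intervals on each contig are sorted by start and do not overlap. The class ReadStartCounter and its nested types are hypothetical and are not part of GATK.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not GATK code): counts read starts per interval. */
final class ReadStartCounter {
    /** Closed, 1-based interval [start, end] on one contig. */
    record Interval(String contig, int start, int end) {}

    /** A read reduced to the only fields this rule uses: contig and start position. */
    record Read(String contig, int start) {}

    /** Assumes intervals on each contig are sorted by start and non-overlapping. */
    static long[] count(List<Interval> intervals, List<Read> reads) {
        // Group interval indices by contig, preserving their (sorted) input order.
        Map<String, List<Integer>> byContig = new HashMap<>();
        for (int i = 0; i < intervals.size(); i++) {
            byContig.computeIfAbsent(intervals.get(i).contig(), k -> new ArrayList<>()).add(i);
        }
        long[] counts = new long[intervals.size()];
        for (Read read : reads) {
            List<Integer> onContig = byContig.get(read.contig());
            if (onContig == null) {
                continue; // no intervals on this contig, so the read is not counted
            }
            // Binary search for the last interval whose start is <= the read start.
            int lo = 0;
            int hi = onContig.size() - 1;
            int best = -1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                if (intervals.get(onContig.get(mid)).start() <= read.start()) {
                    best = mid;
                    lo = mid + 1;
                } else {
                    hi = mid - 1;
                }
            }
            // Count the read only if its start also lies at or before that interval's end.
            if (best >= 0) {
                Interval candidate = intervals.get(onContig.get(best));
                if (read.start() <= candidate.end()) {
                    counts[onContig.get(best)]++;
                }
            }
        }
        return counts;
    }
}
```

Each read contributes to at most one interval. Reads whose start lies outside every interval are not counted, even if the rest of the alignment overlaps an interval.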

Inputs

  • SAM format read data
  • Intervals at which counts will be collected. The interval-merging-rule argument must be set to IntervalMergingRule.OVERLAPPING_ONLY, and all other common arguments for interval padding or merging must be left at their defaults.
  • Output file format (optional), which selects TSV or HDF5 output.
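The merging requirement matters mainly for adjacent intervals. GATK's IntervalMergingRule.ALL merges both overlapping and abutting intervals. IntervalMergingRule.OVERLAPPING_ONLY merges only overlapping ones. The single-contig sketch below shows the difference. The class MergeRuleDemo is a hypothetical stand-in for GATK's interval merging, not its implementation. Presumably the tool insists on OVERLAPPING_ONLY so that abutting bins remain distinct count bins.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of two interval-merging rules on a single contig. */
final class MergeRuleDemo {
    /** Closed, 1-based interval [start, end]. */
    record Interval(int start, int end) {}

    /**
     * mergeAbutting == false behaves like OVERLAPPING_ONLY (only overlaps merge);
     * mergeAbutting == true behaves like ALL (overlapping and abutting intervals merge).
     */
    static List<Interval> merge(List<Interval> input, boolean mergeAbutting) {
        List<Interval> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparingInt(Interval::start));
        List<Interval> out = new ArrayList<>();
        for (Interval cur : sorted) {
            if (!out.isEmpty()) {
                Interval last = out.get(out.size() - 1);
                // Abutting means cur.start == last.end + 1; only ALL-style merging joins those.
                int limit = mergeAbutting ? last.end() + 1 : last.end();
                if (cur.start() <= limit) {
                    out.set(out.size() - 1, new Interval(last.start(), Math.max(last.end(), cur.end())));
                    continue;
                }
            }
            out.add(cur);
        }
        return out;
    }
}
```

With the bins [1, 100] and [101, 200], the OVERLAPPING_ONLY-style call keeps two intervals and the ALL-style call returns the single interval [1, 200].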

Outputs

  • Counts file. By default, the tool produces results in HDF5 format, and the format option can change this to TSV. HDF5 is the default because using HDF5 files with CreateReadCountPanelOfNormals can decrease runtime by reducing time spent on IO. The HDF5 format stores information at the paths defined in HDF5SimpleCountCollection. HDF5 files may be viewed using hdfview or loaded in Python using PyTables or h5py. The TSV format has a SAM-style header containing a read group sample name and a sequence dictionary. The header is followed by a row of the column headers defined in SimpleCountCollection.SimpleCountTableColumn, and then by the corresponding entry rows.
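A reader for the TSV output has three steps. It skips the SAM-style header lines, which begin with @. It takes the next line as the column headers. It parses the remaining rows. The Java sketch below assumes the columns are named CONTIG, START, END, and COUNT. Check those names against SimpleCountCollection.SimpleCountTableColumn for your GATK version. The class CountsTsvReader is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical reader for a counts TSV; the column names are an assumption. */
final class CountsTsvReader {
    record CountRow(String contig, int start, int end, long count) {}

    private static final String[] EXPECTED = {"CONTIG", "START", "END", "COUNT"};

    static List<CountRow> read(Reader source) throws IOException {
        List<CountRow> rows = new ArrayList<>();
        int[] col = null; // positions of the EXPECTED columns, set once the header row is seen
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty() || line.startsWith("@")) {
                    continue; // SAM-style header: read group and sequence dictionary lines
                }
                String[] fields = line.split("\t");
                if (col == null) {
                    // First non-@ line names the columns; look them up by name, not position.
                    List<String> names = Arrays.asList(fields);
                    col = new int[EXPECTED.length];
                    for (int i = 0; i < EXPECTED.length; i++) {
                        col[i] = names.indexOf(EXPECTED[i]);
                        if (col[i] < 0) {
                            throw new IOException("Missing column " + EXPECTED[i] + " in header: " + line);
                        }
                    }
                    continue;
                }
                rows.add(new CountRow(
                        fields[col[0]],
                        Integer.parseInt(fields[col[1]]),
                        Integer.parseInt(fields[col[2]]),
                        Long.parseLong(fields[col[3]])));
            }
        }
        return rows;
    }
}
```

Because the reader looks up columns by name, it tolerates a different column order. A missing column produces an IOException rather than silently misreading the data.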

Usage examples

     gatk CollectReadCounts \
          -I sample.bam \
          -L intervals.interval_list \
          --interval-merging-rule OVERLAPPING_ONLY \
          -O sample.counts.hdf5
 
  • Constructor Details

    • CollectReadCounts

      public CollectReadCounts()
  • Method Details

    • requiresIntervals

      public boolean requiresIntervals()
      Description copied from class: GATKTool
      Does this tool require intervals? Traversal types and/or tools that do should override this method to return true.
      Overrides:
      requiresIntervals in class GATKTool
      Returns:
      true if this tool requires intervals, otherwise false
    • getDefaultReadFilters

      public List<ReadFilter> getDefaultReadFilters()
      Description copied from class: ReadWalker
      Returns the default list of CommandLineReadFilters that are used for this tool. Users can selectively enable or disable the filters returned by this method from the command line. The default implementation uses the WellformedReadFilter filter with all default options. Subclasses can override this method to provide alternative filters. Note: this method is called before command-line parsing begins, and thus before a SAMFileHeader is available through getHeaderForReads().
      Overrides:
      getDefaultReadFilters in class ReadWalker
      Returns:
      List of individual filters to be applied for this tool.
    • onTraversalStart

      public void onTraversalStart()
      Description copied from class: GATKTool
      Operations performed just prior to the start of traversal. Should be overridden by tool authors who need to process arguments local to their tool or perform other kinds of local initialization. Default implementation does nothing.
      Overrides:
      onTraversalStart in class GATKTool
    • apply

      public void apply(GATKRead read, ReferenceContext referenceContext, FeatureContext featureContext)
      Description copied from class: ReadWalker
      Process an individual read (with optional contextual information). Must be implemented by tool authors. In general, tool authors should simply stream their output from apply() and maintain as little internal state as possible. TODO: Determine whether and to what degree the GATK engine should provide a reduce operation to complement this operation. At a minimum, we should make apply() return a value to discourage statefulness in walkers, but how this value should be handled is TBD.
      Specified by:
      apply in class ReadWalker
      Parameters:
      read - current read
      referenceContext - Reference bases spanning the current read. Will be an empty, but non-null, context object if there is no backing source of reference data (in which case all queries on it will return an empty array/iterator). Extra bases of context around the current read's interval can be requested by invoking ReferenceContext.setWindow(int, int) on this object before calling ReferenceContext.getBases().
      featureContext - Features spanning the current read. Will be an empty, but non-null, context object if there is no backing source of Feature data (in which case all queries on it will return an empty List).
    • onTraversalSuccess

      public Object onTraversalSuccess()
      Description copied from class: GATKTool
      Operations performed immediately after a successful traversal (i.e., when no uncaught exceptions were thrown during the traversal). Should be overridden by tool authors who need to close local resources, etc., after traversal. Also allows tools to return a value representing the traversal result, which is printed by the engine. Default implementation does nothing and returns null.
      Overrides:
      onTraversalSuccess in class GATKTool
      Returns:
      Object representing the traversal result, or null if a tool does not return a value
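A small stand-in hierarchy can illustrate the requiresIntervals() contract. The base class reports that intervals are optional. A tool such as CollectReadCounts overrides the method to return true. An engine-side check then rejects a run that supplies no intervals, for example one that omits -L. MiniTool, MiniCountTool, and validate are hypothetical names, and the exact check GATK performs is an assumption here.

```java
import java.util.List;

/** Hypothetical stand-in for a tool base class; not GATK's GATKTool. */
abstract class MiniTool {
    /** Default: intervals are optional. */
    boolean requiresIntervals() {
        return false;
    }

    /** Hypothetical engine-side check run before traversal starts. */
    final void validate(List<String> userIntervals) {
        if (requiresIntervals() && (userIntervals == null || userIntervals.isEmpty())) {
            throw new IllegalArgumentException(getClass().getSimpleName() + " requires intervals (-L)");
        }
    }
}

/** Counts are only defined per interval, so this tool demands intervals. */
final class MiniCountTool extends MiniTool {
    @Override
    boolean requiresIntervals() {
        return true;
    }
}
```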
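According to getDefaultReadFilters(), a tool's default filters apply unless the user switches them off individually on the command line. The sketch below models this with an ordered map of named predicates and a set of names the user has disabled. The filter names, the mapping-quality threshold of 30, and the class FilterSelection are illustrative assumptions. They do not reproduce CollectReadCounts' actual defaults or GATK's ReadFilter API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch of default filters that a user can selectively disable. */
final class FilterSelection {
    record Read(String name, int mappingQuality, boolean duplicate) {}

    /** Illustrative tool defaults, kept in application order. */
    static Map<String, Predicate<Read>> defaultFilters() {
        Map<String, Predicate<Read>> filters = new LinkedHashMap<>();
        filters.put("NotDuplicate", r -> !r.duplicate());
        filters.put("MinMappingQuality", r -> r.mappingQuality() >= 30);
        return filters;
    }

    /** Keeps reads that pass every default filter whose name is not in {@code disabled}. */
    static List<Read> apply(List<Read> reads, Set<String> disabled) {
        List<Predicate<Read>> active = new ArrayList<>();
        defaultFilters().forEach((name, p) -> {
            if (!disabled.contains(name)) {
                active.add(p);
            }
        });
        List<Read> kept = new ArrayList<>();
        for (Read r : reads) {
            if (active.stream().allMatch(p -> p.test(r))) {
                kept.add(r);
            }
        }
        return kept;
    }
}
```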
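The apply() parameters follow an "empty but non-null" convention. When no reference is supplied, referenceContext still exists but returns no bases, so tool code needs no null checks. The sketch below shows this null-object pattern using a hypothetical RefContext interface and a hypothetical GcCounter consumer. It does not reproduce GATK's ReferenceContext API.

```java
/** Hypothetical stand-in for a reference context; not GATK's ReferenceContext. */
interface RefContext {
    byte[] getBases();

    /** Used when there is no backing reference: every query returns an empty array. */
    static RefContext empty() {
        return () -> new byte[0];
    }

    /** Context backed by a defensive copy of the given bases. */
    static RefContext of(byte[] bases) {
        byte[] copy = bases.clone();
        return () -> copy.clone();
    }
}

/** Hypothetical consumer: needs no null check because a missing reference yields zero bases. */
final class GcCounter {
    static int gcBases(RefContext ref) {
        int n = 0;
        for (byte b : ref.getBases()) {
            if (b == 'G' || b == 'C' || b == 'g' || b == 'c') {
                n++;
            }
        }
        return n;
    }
}
```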
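Together, onTraversalStart(), apply(), and onTraversalSuccess() form a start, per-read, finish lifecycle. The simplified driver below is a hypothetical stand-in for the GATK engine. MiniReadWalker, ReadTally, and traverse are not GATK names. The sketch performs local initialization, streams every read through apply(), and returns a result that the engine would print.

```java
/** Hypothetical, simplified walker lifecycle; method names mirror the documented ones. */
abstract class MiniReadWalker<R> {
    /** Default: no local initialization. */
    void onTraversalStart() {}

    /** Called once per read; implementations should keep little internal state. */
    abstract void apply(R read);

    /** Default: no traversal result. */
    Object onTraversalSuccess() {
        return null;
    }

    /** Hypothetical driver: initialize, stream every read through apply(), then finish. */
    final Object traverse(Iterable<R> reads) {
        onTraversalStart();
        for (R read : reads) {
            apply(read);
        }
        return onTraversalSuccess();
    }
}

/** Tallies reads overall; per-interval bookkeeping is omitted for brevity. */
final class ReadTally extends MiniReadWalker<String> {
    private long count;

    @Override
    void onTraversalStart() {
        count = 0;
    }

    @Override
    void apply(String read) {
        count++;
    }

    @Override
    Object onTraversalSuccess() {
        return "Processed " + count + " reads";
    }
}
```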