Class CoverageOutputWriter

java.lang.Object
org.broadinstitute.hellbender.tools.walkers.coverage.CoverageOutputWriter
All Implemented Interfaces:
Closeable, AutoCloseable

public class CoverageOutputWriter extends Object implements Closeable
This class manages the output formatting and files for DepthOfCoverage. Output for DepthOfCoverage can be organized as a combination of Partition, Aggregation, and OutputType, and this writer stores an internal list of streams to which output is written, organized by DoCOutputType objects. Generally speaking, this writer is responsible for managing three patterns:

1. writePerLocusDepthSummary() - This should be called for every locus and is responsible for producing the locus coverage output table summarizing the coverage (and possibly base counts) for every partition type x relevant samples. It takes as input the output from CoverageUtils.getBaseCountsByPartition(org.broadinstitute.hellbender.engine.AlignmentContext, byte, byte, org.broadinstitute.hellbender.tools.walkers.coverage.CoverageUtils.CountPileupType, java.util.Collection<org.broadinstitute.hellbender.tools.walkers.coverage.DoCOutputType.Partition>, htsjdk.samtools.SAMFileHeader).

2a. writePerIntervalDepthInformation() - This should be called once per traversal interval, once the interval is finished, and takes as input a DepthOfCoveragePartitionedDataStore object corresponding to the coverage information over the whole interval. It is used to write both the "_interval_summary" and "_interval_statistics" files for each partition.

2b. writePerGeneDepthInformation() - Similarly to 2a, this should be called once per gene in the interval traversal, and it takes the same gene-interval coverage summary as 2a.

NOTE: In practice this may not be called for every gene if the traversal intervals provided to the tool do not cover any bases of the gene in question. In that case the gene is passed over.

TODO: In the future this should be changed to emit empty coverage counts for untraversed genes.

3. writeTraversalCumulativeCoverage() - This method takes a DepthOfCoveragePartitionedDataStore object that should correspond to the aggregated partitioned counts for every base traversed by DepthOfCoverage. This method is responsible for outputting the "_cumulative_coverage_counts", "_cumulative_coverage_proportions", "_statistics", and "_summary" files.
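
The three patterns above imply a call order over a traversal. The following is a minimal sketch of that order only. It does not use the real CoverageOutputWriter: `CallOrderSketch` and the labels it records are hypothetical, and it assumes that headers are written once before traversal begins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: records which writer pattern would be invoked, in order.
class CallOrderSketch {

    // lociPerInterval[i] is the number of loci in the i-th traversal interval.
    static List<String> traverse(int[] lociPerInterval) {
        List<String> calls = new ArrayList<>();
        calls.add("headers");              // assumed: once, before traversal
        for (int nLoci : lociPerInterval) {
            for (int i = 0; i < nLoci; i++) {
                calls.add("perLocus");     // pattern 1: once for every locus
            }
            calls.add("perInterval");      // pattern 2a: once the interval is finished
        }
        calls.add("cumulative");           // pattern 3: once for the whole traversal
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(traverse(new int[] {2, 1}));
    }
}
```

Per-gene output (pattern 2b) would follow the same shape as the per-interval step, with genes that have no traversed bases skipped.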
  • Constructor Details

    • CoverageOutputWriter

      public CoverageOutputWriter(CoverageOutputWriter.DEPTH_OF_COVERAGE_OUTPUT_FORMAT outputFormat, EnumSet<DoCOutputType.Partition> partitionsToCover, String outputBaseName, boolean includeGeneOutputPerSample, boolean printBaseCounts, boolean omitDepthOutput, boolean omitIntervals, boolean omitSampleSummary, boolean omitLocusTable, List<Integer> coverageThresholds) throws IOException
      Creates a CoverageOutputWriter for managing all of the output files for DepthOfCoverage.

      This object holds on to output writers for each of the tables and ensures, among other things, that the data are written into the correct table corresponding to each data output call. This class is also responsible for generating the data needed to format some output lines for DepthOfCoverage. This means that for most data types, DepthOfCoverage need only be provided with the proper DepthOfCoverageStats object, and this class will handle extracting the necessary summary data therein. Furthermore, this class understands how to write CSV/TSV files, which all DoC output files approximately follow.

      Parameters:
      outputFormat - type of table (CSV/TSV) to output results as
      partitionsToCover - partitioning for the data (e.g. sample, library, etc.) that will be computed in parallel
      outputBaseName - base file path for the output tables (will be prepended to the output suffix and .tsv or .csv)
      includeGeneOutputPerSample - whether to produce an output sink for gene partition data
      printBaseCounts - whether to summarize per-nucleotide counts at each locus
      omitDepthOutput - if true will not generate locus output files
      omitIntervals - if true then will not generate "_interval_summary" output files
      omitSampleSummary - if true then will not generate "_statistics" and "_summary" output sinks
      omitLocusTable - if true will not generate "_cumulative_coverage_counts" or "_cumulative_coverage_proportions"
      coverageThresholds - Threshold coverage level to use for reporting.
      Throws:
      IOException
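
      How the omit flags and the base name relate to the output tables can be sketched as below. This is an illustration under stated assumptions, not the constructor's actual logic: `OutputSinkSketch`, its `Format` enum, and both methods are hypothetical, each omit flag is assumed to suppress exactly the suffixes named in its parameter description, and real file names may also encode the partition.

```java
import java.util.ArrayList;
import java.util.List;

class OutputSinkSketch {

    // Hypothetical stand-in for the CSV/TSV choice made by outputFormat.
    enum Format { CSV, TSV }

    // Suffixes that would still be produced, given the omit flags described above.
    static List<String> enabledSuffixes(boolean omitIntervals,
                                        boolean omitSampleSummary,
                                        boolean omitLocusTable) {
        List<String> suffixes = new ArrayList<>();
        if (!omitIntervals) {
            suffixes.add("_interval_summary");
        }
        if (!omitSampleSummary) {
            suffixes.add("_statistics");
            suffixes.add("_summary");
        }
        if (!omitLocusTable) {
            suffixes.add("_cumulative_coverage_counts");
            suffixes.add("_cumulative_coverage_proportions");
        }
        return suffixes;
    }

    // Assumed composition: base name, then suffix, then the format extension.
    static String outputPath(String outputBaseName, String suffix, Format format) {
        String extension = (format == Format.CSV) ? ".csv" : ".tsv";
        return outputBaseName + suffix + extension;
    }

    public static void main(String[] args) {
        for (String suffix : enabledSuffixes(false, false, true)) {
            System.out.println(outputPath("out/doc", suffix, Format.TSV));
        }
    }
}
```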
  • Method Details

    • writeCoverageOutputHeaders

      public void writeCoverageOutputHeaders(Map<DoCOutputType.Partition,List<String>> sortedSamplesByPartition)
      Initialize and write the output headers for the output streams that need to be continuously updated during traversal.
      Parameters:
      sortedSamplesByPartition - global map of partitionType to sorted list of samples associated with that partition
    • writePerLocusDepthSummary

      public void writePerLocusDepthSummary(SimpleInterval locus, Map<DoCOutputType.Partition,Map<String,int[]>> countsBySampleByType, Map<DoCOutputType.Partition,List<String>> identifiersByType, boolean includeDeletions)
      Writes a summary of the given per-locus data out to the main output track. Output data for any given sample may optionally contain a summary of which bases were present in the pileup at that site, given as a list of per-base counts in the format 'A:0 C:0 T:0 G:0 N:0'.

      This writer is responsible for determining the total depth at the site by polling one of the output tracks and using that value to calculate the mean per-sample coverage for each partitioning. These depth summary columns as well as the locus site are present in the first lines of the locusSummary output.

      Parameters:
      locus - The site corresponding to the data that will be written out
      countsBySampleByType - A map of partition to sample and finally to 4 element array of bases. This output is expected to be produced by CoverageUtils.getBaseCountsByPartition(AlignmentContext, byte, byte, CoverageUtils.CountPileupType, Collection, SAMFileHeader)
      identifiersByType - A global map of sorted samples in each partition, to be used for ordering. NOTE: this map should remain unchanged between every call of this method or correct output is not guaranteed.
      includeDeletions - Whether or not to include deletions in the summary line
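
      The base-count string and the mean per-sample coverage described above can be illustrated with a small sketch. `LocusSummarySketch` is hypothetical, and it assumes a five-element count array ordered A, C, T, G, N to match the example string; the array actually returned by CoverageUtils.getBaseCountsByPartition may be laid out differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LocusSummarySketch {

    // Assumed ordering, chosen to match the 'A:0 C:0 T:0 G:0 N:0' example.
    private static final char[] BASES = {'A', 'C', 'T', 'G', 'N'};

    // Formats one sample's counts as "A:n C:n T:n G:n N:n".
    static String formatBaseCounts(int[] counts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < BASES.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(BASES[i]).append(':').append(counts[i]);
        }
        return sb.toString();
    }

    // Depth for one sample is taken here as the sum of its base counts.
    static int depth(int[] counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Mean per-sample coverage: total depth at the site divided by the sample count.
    static double meanDepthPerSample(Map<String, int[]> countsBySample) {
        if (countsBySample.isEmpty()) {
            return 0.0;
        }
        long total = 0;
        for (int[] counts : countsBySample.values()) {
            total += depth(counts);
        }
        return (double) total / countsBySample.size();
    }

    public static void main(String[] args) {
        Map<String, int[]> counts = new LinkedHashMap<>();
        counts.put("sampleA", new int[] {3, 0, 1, 2, 0});
        counts.put("sampleB", new int[] {1, 1, 1, 1, 0});
        System.out.println(formatBaseCounts(counts.get("sampleA")));
        System.out.println(meanDepthPerSample(counts));
    }
}
```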
    • writePerIntervalDepthInformation

      public void writePerIntervalDepthInformation(DoCOutputType.Partition partition, SimpleInterval interval, DepthOfCoverageStats intervalStats, List<String> sortedSamples)
      Method that should be called once per partition at the end of each coverage interval. It is responsible for extending the per-interval depth summary information for each sample.
      Parameters:
      partition - Partition corresponding to the data to be written
      interval - interval spanned by the input data
      intervalStats - statistics for coverage overlapped by the interval
      sortedSamples - sorted list of samples for this partition
    • writePerGeneDepthInformation

      public void writePerGeneDepthInformation(RefSeqFeature gene, DepthOfCoverageStats intervalStats, List<String> sortedSamples)
      Writes summary information for a gene. This method should be called for each gene in the coverage input that was covered by the traversal.
      Parameters:
      gene - gene corresponding to the coverage statistics
      intervalStats - statistics for coverage overlapped by the gene (may or may not be dropped by exons)
      sortedSamples - sorted list of samples for this partition
    • writePerLocusCumulativeCoverageMetrics

      public void writePerLocusCumulativeCoverageMetrics(DepthOfCoveragePartitionedDataStore coverageProfilesForEntireTraversal, DoCOutputType.Partition partition, List<String> sortedSampleLists)
      Write per-locus cumulative coverage metrics. Note that this should be invoked on a coverage partitioner that has been updated for every locus across the traversal intervals.
      Parameters:
      coverageProfilesForEntireTraversal - DepthOfCoveragePartitionedDataStore object corresponding to the entire tool traversal.
      partition - Partition corresponding to the data to be written
      sortedSampleLists - sorted list of samples for this partition
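
      As a rough illustration of what cumulative coverage counts and proportions mean, the sketch below derives both from a per-depth histogram. `CumulativeCoverageSketch` is hypothetical and does not use DepthOfCoveragePartitionedDataStore; it assumes "cumulative" means the number of loci covered at or above each depth, which may not match the tool's exact binning.

```java
import java.util.Arrays;

class CumulativeCoverageSketch {

    // lociAtDepth[d] = number of loci observed with depth exactly d.
    // Returns c where c[d] = number of loci with depth >= d.
    static long[] cumulativeCounts(long[] lociAtDepth) {
        long[] cumulative = new long[lociAtDepth.length];
        long running = 0;
        for (int d = lociAtDepth.length - 1; d >= 0; d--) {
            running += lociAtDepth[d];
            cumulative[d] = running;
        }
        return cumulative;
    }

    // Same as cumulativeCounts, expressed as a fraction of all loci.
    static double[] cumulativeProportions(long[] lociAtDepth) {
        long[] cumulative = cumulativeCounts(lociAtDepth);
        long total = (cumulative.length == 0) ? 0 : cumulative[0];
        double[] proportions = new double[cumulative.length];
        for (int d = 0; d < cumulative.length; d++) {
            proportions[d] = (total == 0) ? 0.0 : (double) cumulative[d] / total;
        }
        return proportions;
    }

    public static void main(String[] args) {
        long[] histogram = {1, 2, 1};
        System.out.println(Arrays.toString(cumulativeCounts(histogram)));
        System.out.println(Arrays.toString(cumulativeProportions(histogram)));
    }
}
```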
    • writeCumulativeOutputSummaryFiles

      public void writeCumulativeOutputSummaryFiles(DepthOfCoveragePartitionedDataStore coverageProfilesForEntireTraversal, DoCOutputType.Partition partition, List<String> sortedSamples)
      Write output summary histogram based metrics. Note that this should be invoked on a coverage partitioner that has been updated for every locus across the traversal intervals.
      Parameters:
      coverageProfilesForEntireTraversal - DepthOfCoveragePartitionedDataStore object corresponding to the entire tool traversal.
      partition - Partition corresponding to the data to be written
      sortedSamples - sorted list of samples for this partition
    • writeOutputIntervalStatistics

      public void writeOutputIntervalStatistics(DoCOutputType.Partition partition, int[][] nTargetsByAvgCvgBySample, int[] binEndpoints)
      Write out the interval summary statistics. Note that this method expects as input that the provided table has had CoverageUtils.updateTargetTable(int[][], DepthOfCoverageStats) called on it exactly once for each interval summarized in this traversal.
      Parameters:
      partition - Partition corresponding to the data to be written
      nTargetsByAvgCvgBySample - Target sample coverage histogram for the given partition to be written out
      binEndpoints - Bin endpoints used in the construction of the provided histogram
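
      The relationship between the histogram table and the bin endpoints can be sketched as follows. This is not CoverageUtils.updateTargetTable: `TargetTableSketch` is hypothetical, the table is assumed to be indexed as [bin][sample], and a value equal to an endpoint is assumed to fall into the higher bin. It shows one update per summarized interval, which is the precondition the method description states.

```java
class TargetTableSketch {

    // Number of endpoints that the value has reached; yields binEndpoints.length + 1 bins.
    static int binIndex(int[] binEndpoints, double meanCoverage) {
        int bin = 0;
        while (bin < binEndpoints.length && meanCoverage >= binEndpoints[bin]) {
            bin++;
        }
        return bin;
    }

    // Call exactly once per interval: adds one target to each sample's matching bin.
    static void updateOnce(int[][] table, int[] binEndpoints, double[] meanCoverageBySample) {
        for (int sample = 0; sample < meanCoverageBySample.length; sample++) {
            table[binIndex(binEndpoints, meanCoverageBySample[sample])][sample]++;
        }
    }

    public static void main(String[] args) {
        int[] binEndpoints = {10, 20, 30};
        int[][] table = new int[binEndpoints.length + 1][2];
        updateOnce(table, binEndpoints, new double[] {5.0, 25.0});   // first interval
        updateOnce(table, binEndpoints, new double[] {15.0, 25.0});  // second interval
        System.out.println(java.util.Arrays.deepToString(table));
    }
}
```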
    • writeOutputGeneStatistics

      public void writeOutputGeneStatistics(int[][] nTargetsByAvgCvgBySample, int[] binEndpoints)
      Write out the gene statistics. Note that this method expects as input that the provided table has had CoverageUtils.updateTargetTable(int[][], DepthOfCoverageStats) called on it exactly once for each interval summarized in this traversal.
      Parameters:
      nTargetsByAvgCvgBySample - Target sample coverage histogram for the given partition to be written out
      binEndpoints - Bin endpoints used in the construction of the provided histogram
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
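
      Because the class implements Closeable, it can be managed with try-with-resources so that its output streams are released even if writing fails. The sketch below shows the idiom with a hypothetical `StubWriter` standing in for CoverageOutputWriter; constructing the real writer requires the arguments listed under Constructor Details.

```java
import java.io.Closeable;

class CloseSketch {

    // Hypothetical stand-in that counts lines instead of writing files.
    static class StubWriter implements Closeable {
        boolean closed = false;
        int linesWritten = 0;

        void writeLine(String line) {
            if (closed) {
                throw new IllegalStateException("writer already closed");
            }
            linesWritten++;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Writes every line, then closes the writer when the try block exits.
    static StubWriter writeAll(String... lines) {
        StubWriter created = new StubWriter();
        try (StubWriter writer = created) {
            for (String line : lines) {
                writer.writeLine(line);
            }
        }
        return created;
    }

    public static void main(String[] args) {
        StubWriter writer = writeAll("header", "row");
        System.out.println(writer.linesWritten + " lines, closed=" + writer.closed);
    }
}
```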