Class CoverageOutputWriter
All Implemented Interfaces: Closeable, AutoCloseable
DepthOfCoverage output can be organized as a combination of Partition, Aggregation, and OutputType. This writer stores an internal list of streams to which output is written, organized by DoCOutputType objects. Generally speaking, there are three patterns that this writer is responsible for managing:
1. writePerLocusDepthSummary() - This should be called for every locus and is responsible for producing the locus coverage output table summarizing the coverage (and possibly base counts) for every partition type x relevant samples. This takes as input the output from CoverageUtils.getBaseCountsByPartition(AlignmentContext, byte, byte, CoverageUtils.CountPileupType, Collection<DoCOutputType.Partition>, SAMFileHeader).
2a. writePerIntervalDepthInformation() - This should be called once per traversal interval once it is finished and takes as input a DepthOfCoveragePartitionedDataStore object corresponding to the coverage information over the whole interval. This is used to write both "_interval_summary" and "_interval_statistics" files for each partition.
2b. writePerGeneDepthInformation() - Similarly to 2a, this should be called once per gene in the interval traversal and it takes the same gene-interval coverage summary as 2a.
NOTE: This may not actually be called for every gene if the provided traversal intervals for the tool do not cover any bases of the gene in question. In that case the gene will be passed over. TODO: in the future this should be changed to emit empty coverage counts for untraversed genes.
3. writeTraversalCumulativeCoverage() - This method takes a DepthOfCoveragePartitionedDataStore object that should correspond to the partitioned counts for every base traversed by DepthOfCoverage, aggregated. This method is responsible for outputting the "_cumulative_coverage_counts", "_cumulative_coverage_proportions", "_statistics", and "_summary" files.
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static enum
-
Constructor Summary
Constructors
Constructor | Description
CoverageOutputWriter(CoverageOutputWriter.DEPTH_OF_COVERAGE_OUTPUT_FORMAT outputFormat, EnumSet<DoCOutputType.Partition> partitionsToCover, String outputBaseName, boolean includeGeneOutputPerSample, boolean printBaseCounts, boolean omitDepthOutput, boolean omitIntervals, boolean omitSampleSummary, boolean omitLocusTable, List<Integer> coverageThresholds) | Creates a CoverageOutputWriter for managing all of the output files for DepthOfCoverage.
Method Summary
Modifier and Type | Method | Description
void | close() |
void | writeCoverageOutputHeaders(Map<DoCOutputType.Partition, List<String>> sortedSamplesByPartition) | Initialize and write the output headers for the output streams that need to be continuously updated during traversal.
void | writeCumulativeOutputSummaryFiles(DepthOfCoveragePartitionedDataStore coverageProfilesForEntireTraversal, DoCOutputType.Partition partition, List<String> sortedSamples) | Write output summary histogram based metrics.
void | writeOutputGeneStatistics(int[][] nTargetsByAvgCvgBySample, int[] binEndpoints) | Write out the gene statistics.
void | writeOutputIntervalStatistics(DoCOutputType.Partition partition, int[][] nTargetsByAvgCvgBySample, int[] binEndpoints) | Write out the interval summary statistics.
void | writePerGeneDepthInformation(RefSeqFeature gene, DepthOfCoverageStats intervalStats, List<String> sortedSamples) | Write summary information for a gene.
void | writePerIntervalDepthInformation(DoCOutputType.Partition partition, SimpleInterval interval, DepthOfCoverageStats intervalStats, List<String> sortedSamples) | Method that should be called once per-partition at the end of each coverage interval.
void | writePerLocusCumulativeCoverageMetrics(DepthOfCoveragePartitionedDataStore coverageProfilesForEntireTraversal, DoCOutputType.Partition partition, List<String> sortedSampleLists) | Write per-locus cumulative coverage metrics.
void | writePerLocusDepthSummary(SimpleInterval locus, Map<DoCOutputType.Partition, Map<String, int[]>> countsBySampleByType, Map<DoCOutputType.Partition, List<String>> identifiersByType, boolean includeDeletions) | Writes a summary of the given per-locus data out to the main output track.
-
Constructor Details
-
CoverageOutputWriter
public CoverageOutputWriter(CoverageOutputWriter.DEPTH_OF_COVERAGE_OUTPUT_FORMAT outputFormat, EnumSet<DoCOutputType.Partition> partitionsToCover, String outputBaseName, boolean includeGeneOutputPerSample, boolean printBaseCounts, boolean omitDepthOutput, boolean omitIntervals, boolean omitSampleSummary, boolean omitLocusTable, List<Integer> coverageThresholds) throws IOException
Creates a CoverageOutputWriter for managing all of the output files for DepthOfCoverage.
This object holds on to output writers for each of the tables and ensures, among other things, that the data are written into the correct table corresponding to each data output call. This class is also responsible for generating the data needed to format some output lines for DepthOfCoverage. This means that for most data types, DepthOfCoverage need only provide the proper DepthOfCoverageStats object, and this class will handle extracting the necessary summary data therein. Furthermore, this class understands how to write CSV/TSV files, which all DoC output files approximately follow.
Parameters:
outputFormat - type of table (CSV/TSV) to output results as
partitionsToCover - partitioning for the data (e.g. sample, library, etc.) that will be computed in parallel
outputBaseName - base file path for the output tables (will be prepended to the output suffix and .tsv or .csv)
includeGeneOutputPerSample - whether to produce an output sink for gene partition data
printBaseCounts - whether to summarize per-nucleotide counts at each locus
omitDepthOutput - if true will not generate locus output files
omitIntervals - if true will not generate "_interval_summary" output files
omitSampleSummary - if true will not generate "_statistics" and "_summary" output sinks
omitLocusTable - if true will not generate "_cumulative_coverage_counts" or "_cumulative_coverage_proportions"
coverageThresholds - threshold coverage levels to use for reporting
Throws:
IOException
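The parameter descriptions above imply a naming scheme for the output files. The following is a minimal sketch of that scheme, not the library's implementation. It assumes that a file path is the base name, the suffix, and the extension joined by plain concatenation, and that each omit flag suppresses exactly the suffixes named in its description. The `OutputNameSketch` class, its `Format` enum, and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; it does not use or reproduce the real CoverageOutputWriter.
final class OutputNameSketch {

    // Hypothetical stand-in for the output format; the real enum's constants are not listed on this page.
    enum Format {
        CSV(".csv"), TSV(".tsv");

        final String extension;

        Format(String extension) {
            this.extension = extension;
        }
    }

    // Assumption: base name + suffix + extension, joined by plain concatenation.
    static String outputPath(String outputBaseName, String suffix, Format format) {
        return outputBaseName + suffix + format.extension;
    }

    // Assumption: each omit flag suppresses exactly the suffixes named in its parameter description.
    static List<String> enabledSuffixes(boolean omitIntervals, boolean omitSampleSummary, boolean omitLocusTable) {
        List<String> suffixes = new ArrayList<>();
        if (!omitIntervals) {
            suffixes.add("_interval_summary");
        }
        if (!omitSampleSummary) {
            suffixes.add("_statistics");
            suffixes.add("_summary");
        }
        if (!omitLocusTable) {
            suffixes.add("_cumulative_coverage_counts");
            suffixes.add("_cumulative_coverage_proportions");
        }
        return suffixes;
    }

    public static void main(String[] args) {
        // Print the paths that would remain when only the sample summary is omitted.
        for (String suffix : enabledSuffixes(false, true, false)) {
            System.out.println(outputPath("out/doc", suffix, Format.TSV));
        }
    }
}
```

The real writer may insert partition names or other separators into the path, so treat the concatenation rule here as illustrative only.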
-
-
Method Details
-
writeCoverageOutputHeaders
public void writeCoverageOutputHeaders(Map<DoCOutputType.Partition, List<String>> sortedSamplesByPartition)
Initialize and write the output headers for the output streams that need to be continuously updated during traversal.
Parameters:
sortedSamplesByPartition - global map of partitionType to sorted list of samples associated with that partition
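Headers are written once, before the per-locus calls begin. To place this method in context, the sketch below records a plausible call order for the patterns described in the class overview. It is an assumption-laden illustration, not the tool's actual driver code: `WriterCallOrderSketch` and `callsFor` are hypothetical, a single partition is assumed, gene output is left out, and the relative order of the two end-of-traversal calls is a guess.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that only records which writer method would be called;
// it does not use or reproduce the real CoverageOutputWriter.
final class WriterCallOrderSketch {

    // Returns the sequence of calls for a traversal over intervals of the given lengths (in loci).
    static List<String> callsFor(int[] intervalLengths) {
        List<String> calls = new ArrayList<>();
        calls.add("writeCoverageOutputHeaders"); // once, before traversal
        for (int length : intervalLengths) {
            for (int locus = 0; locus < length; locus++) {
                calls.add("writePerLocusDepthSummary"); // pattern 1: every locus
            }
            calls.add("writePerIntervalDepthInformation"); // pattern 2a: once per finished interval
        }
        // Pattern 3: whole-traversal outputs (order between these two is an assumption).
        calls.add("writePerLocusCumulativeCoverageMetrics");
        calls.add("writeCumulativeOutputSummaryFiles");
        calls.add("close");
        return calls;
    }

    public static void main(String[] args) {
        // Two intervals: one spanning two loci, one spanning a single locus.
        callsFor(new int[] {2, 1}).forEach(System.out::println);
    }
}
```

With more than one partition, the per-interval and end-of-traversal calls would be repeated for each partition, since those methods take a single partition argument.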
-
writePerLocusDepthSummary
public void writePerLocusDepthSummary(SimpleInterval locus, Map<DoCOutputType.Partition, Map<String, int[]>> countsBySampleByType, Map<DoCOutputType.Partition, List<String>> identifiersByType, boolean includeDeletions)
Writes a summary of the given per-locus data out to the main output track. Output data for any given sample may optionally contain a summary of which bases were present in the pileup at that site, formatted as a list of per-base counts such as 'A:0 C:0 T:0 G:0 N:0'.
This writer is responsible for determining the total depth at the site by polling one of the output tracks and using that value to calculate the mean per-sample coverage for each partitioning. These depth summary columns as well as the locus site are present in the first lines of the locusSummary output.
Parameters:
locus - The site corresponding to the data that will be written out
countsBySampleByType - A map from partition to sample, and finally to an array of per-base counts. This input is expected to be produced by CoverageUtils.getBaseCountsByPartition(AlignmentContext, byte, byte, CoverageUtils.CountPileupType, Collection, SAMFileHeader)
identifiersByType - A global map of sorted samples in each partition, to be used for ordering. NOTE: this map should remain unchanged between every call of this method or correct output is not guaranteed.
includeDeletions - Whether or not to include deletions in the summary line
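The base-count format and the mean per-sample depth can be illustrated with a small sketch. This is not the writer's code. It assumes each sample's counts arrive as a five-entry array ordered A, C, T, G, N (taken from the example string above, not from the real array layout) and that deletions are ignored. `LocusSummarySketch` and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the array layout is an assumption, not the real one.
final class LocusSummarySketch {
    // Assumed order of entries in each counts array, matching the example string.
    private static final char[] BASES = {'A', 'C', 'T', 'G', 'N'};

    // Formats counts as e.g. "A:3 C:0 T:1 G:2 N:0".
    static String formatBaseCounts(int[] counts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < BASES.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(BASES[i]).append(':').append(counts[i]);
        }
        return sb.toString();
    }

    // Depth for one sample is the sum of its per-base counts.
    static int totalDepth(int[] counts) {
        int total = 0;
        for (int i = 0; i < BASES.length; i++) {
            total += counts[i];
        }
        return total;
    }

    // Mean per-sample depth: total depth at the site divided by the number of samples.
    static double meanDepth(Map<String, int[]> countsBySample) {
        if (countsBySample.isEmpty()) {
            return 0.0;
        }
        long total = 0;
        for (int[] counts : countsBySample.values()) {
            total += totalDepth(counts);
        }
        return (double) total / countsBySample.size();
    }

    public static void main(String[] args) {
        Map<String, int[]> countsBySample = new LinkedHashMap<>();
        countsBySample.put("sampleA", new int[] {3, 0, 1, 2, 0});
        countsBySample.put("sampleB", new int[] {1, 1, 0, 0, 0});
        countsBySample.forEach((sample, counts) ->
                System.out.println(sample + "\t" + totalDepth(counts) + "\t" + formatBaseCounts(counts)));
        System.out.println("mean\t" + meanDepth(countsBySample));
    }
}
```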
-
writePerIntervalDepthInformation
public void writePerIntervalDepthInformation(DoCOutputType.Partition partition, SimpleInterval interval, DepthOfCoverageStats intervalStats, List<String> sortedSamples)
Method that should be called once per partition at the end of each coverage interval. This method is responsible for extending the per-interval depth summary information for each sample.
Parameters:
partition - Partition corresponding to the data to be written
interval - interval spanned by the input data
intervalStats - statistics for coverage overlapped by the interval
sortedSamples - sorted list of samples for this partition
-
writePerGeneDepthInformation
public void writePerGeneDepthInformation(RefSeqFeature gene, DepthOfCoverageStats intervalStats, List<String> sortedSamples)
Write summary information for a gene. This method should be called for each gene in the input that the traversal has covered.
Parameters:
gene - gene corresponding to the coverage statistics
intervalStats - statistics for coverage overlapped by the gene (may or may not be dropped by exons)
sortedSamples - sorted list of samples for this partition
-
writePerLocusCumulativeCoverageMetrics
public void writePerLocusCumulativeCoverageMetrics(DepthOfCoveragePartitionedDataStore coverageProfilesForEntireTraversal, DoCOutputType.Partition partition, List<String> sortedSampleLists)
Write per-locus cumulative coverage metrics. Note that this should be invoked on a coverage partitioner that has been updated for every locus across the traversal intervals.
Parameters:
coverageProfilesForEntireTraversal - DepthOfCoveragePartitionedDataStore object corresponding to the entire tool traversal.
partition - Partition corresponding to the data to be written
sortedSampleLists - sorted list of samples for this partition
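As a worked illustration of what cumulative coverage counts and proportions usually mean, the sketch below turns a per-depth histogram into "number of loci with depth at least d" and the matching proportions. It assumes this conventional definition and a simple array histogram. The real data store, its binning, and its output layout are not shown on this page, and `CumulativeCoverageSketch` is a hypothetical name.

```java
// Hypothetical helper illustrating cumulative coverage; not the library's implementation.
final class CumulativeCoverageSketch {

    // depthHistogram[d] = number of loci observed at exactly depth d.
    // Returns an array where result[d] = number of loci with depth >= d.
    static long[] cumulativeCounts(long[] depthHistogram) {
        long[] cumulative = new long[depthHistogram.length];
        long running = 0;
        // Walk from the deepest bin down so each entry accumulates everything at or above it.
        for (int d = depthHistogram.length - 1; d >= 0; d--) {
            running += depthHistogram[d];
            cumulative[d] = running;
        }
        return cumulative;
    }

    // Proportion of all loci with depth >= d; all zeros if no loci were observed.
    static double[] cumulativeProportions(long[] depthHistogram) {
        long[] counts = cumulativeCounts(depthHistogram);
        double[] proportions = new double[counts.length];
        long total = counts.length == 0 ? 0 : counts[0];
        for (int d = 0; d < counts.length; d++) {
            proportions[d] = total == 0 ? 0.0 : (double) counts[d] / total;
        }
        return proportions;
    }

    public static void main(String[] args) {
        long[] histogram = {1, 2, 1}; // one locus at depth 0, two at depth 1, one at depth 2
        long[] counts = cumulativeCounts(histogram);
        double[] proportions = cumulativeProportions(histogram);
        for (int d = 0; d < counts.length; d++) {
            System.out.println("gte_" + d + "\t" + counts[d] + "\t" + proportions[d]);
        }
    }
}
```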
-
writeCumulativeOutputSummaryFiles
public void writeCumulativeOutputSummaryFiles(DepthOfCoveragePartitionedDataStore coverageProfilesForEntireTraversal, DoCOutputType.Partition partition, List<String> sortedSamples)
Write output summary histogram-based metrics. Note that this should be invoked on a coverage partitioner that has been updated for every locus across the traversal intervals.
Parameters:
coverageProfilesForEntireTraversal - DepthOfCoveragePartitionedDataStore object corresponding to the entire tool traversal.
partition - Partition corresponding to the data to be written
sortedSamples - sorted list of samples for this partition
-
writeOutputIntervalStatistics
public void writeOutputIntervalStatistics(DoCOutputType.Partition partition, int[][] nTargetsByAvgCvgBySample, int[] binEndpoints)
Write out the interval summary statistics. Note that this method expects that the provided table has had CoverageUtils.updateTargetTable(int[][], DepthOfCoverageStats) called on it exactly once for each interval summarized in this traversal.
Parameters:
partition - Partition corresponding to the data to be written
nTargetsByAvgCvgBySample - Target sample coverage histogram for the given partition to be written out
binEndpoints - Bin endpoints used in the construction of the provided histogram
-
writeOutputGeneStatistics
public void writeOutputGeneStatistics(int[][] nTargetsByAvgCvgBySample, int[] binEndpoints)
Write out the gene statistics. Note that this method expects that the provided table has had CoverageUtils.updateTargetTable(int[][], DepthOfCoverageStats) called on it exactly once for each interval summarized in this traversal.
Parameters:
nTargetsByAvgCvgBySample - Target sample coverage histogram for the given partition to be written out
binEndpoints - Bin endpoints used in the construction of the provided histogram
-
close
public void close()
Specified by: close in interface AutoCloseable
Specified by: close in interface Closeable
-
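Because the class implements Closeable and AutoCloseable, it can be managed with try-with-resources so that every underlying stream is closed even if writing fails. The sketch below shows that pattern with a hypothetical stand-in, `MultiStreamWriterSketch`, that holds several in-memory writers keyed by suffix. It does not reproduce the real class's fields or behaviour, and it wraps IOException in UncheckedIOException only to keep the example short.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a writer that owns several output streams.
final class MultiStreamWriterSketch implements Closeable {
    private final Map<String, Writer> streams = new LinkedHashMap<>();
    private boolean closed = false;

    MultiStreamWriterSketch(String... suffixes) {
        for (String suffix : suffixes) {
            streams.put(suffix, new StringWriter()); // in-memory sink instead of a file
        }
    }

    // Appends one line to the stream registered under the given suffix.
    void writeLine(String suffix, String line) {
        try {
            streams.get(suffix).write(line + "\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String contents(String suffix) {
        return streams.get(suffix).toString();
    }

    boolean isClosed() {
        return closed;
    }

    // Closes every owned stream; called automatically at the end of a try-with-resources block.
    @Override
    public void close() {
        try {
            for (Writer writer : streams.values()) {
                writer.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            closed = true;
        }
    }

    public static void main(String[] args) {
        try (MultiStreamWriterSketch writer = new MultiStreamWriterSketch("_summary", "_statistics")) {
            writer.writeLine("_summary", "sample\tmean");
            System.out.print(writer.contents("_summary"));
        }
    }
}
```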