object CompareADAM extends BDGCommandCompanion with Serializable
CompareADAM is a tool for pairwise comparison of ADAM files (or merged sets of ADAM files, see the
note on the -recurse{1,2} optional parameters, below).
The canonical use-case for CompareADAM involves a single input file run through (for example) two
different implementations of the same pipeline, producing two comparable ADAM files at the end.
CompareADAM loads these ADAM files and performs an equi-join on read name. It then computes
one or more metrics (embodied as BucketComparisons values), as specified on the command line,
across the joined records and aggregates each metric into a histogram. (The aggregation step
could be changed if other aggregations are required in the future.) The resulting histograms
are written as text files to a specified directory.
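The join-then-aggregate flow described above can be illustrated with a minimal sketch over plain Scala collections. This is not CompareADAM's implementation: `Read`, `joinByName`, `positionDelta`, and `histogram` are hypothetical names introduced here for illustration, and the real tool operates on ADAM records rather than in-memory sequences.

```scala
// Hypothetical, simplified stand-in for a read record; not the ADAM schema.
final case class Read(name: String, start: Long)

object CompareSketch {
  // Inner equi-join of two read sets on read name.
  def joinByName(a: Seq[Read], b: Seq[Read]): Seq[(Read, Read)] = {
    val byName: Map[String, Seq[Read]] = b.groupBy(_.name)
    for {
      left  <- a
      right <- byName.getOrElse(left.name, Seq.empty)
    } yield (left, right)
  }

  // Hypothetical metric: absolute difference between the two start positions.
  def positionDelta(pair: (Read, Read)): Long =
    math.abs(pair._1.start - pair._2.start)

  // Aggregate a metric over the joined pairs into a histogram (value -> count).
  def histogram[T](pairs: Seq[(Read, Read)])(metric: ((Read, Read)) => T): Map[T, Int] =
    pairs.groupBy(metric).map { case (value, group) => value -> group.size }

  def main(args: Array[String]): Unit = {
    val first  = Seq(Read("r1", 100L), Read("r2", 200L), Read("r3", 300L))
    val second = Seq(Read("r1", 100L), Read("r2", 205L), Read("r4", 400L))

    // Only r1 and r2 appear in both inputs, so only they survive the join.
    val joined = joinByName(first, second)
    val hist   = histogram(joined)(positionDelta)

    // One line per histogram bin: metric value, tab, count.
    hist.toSeq.sortBy(_._1).foreach { case (value, count) => println(s"$value\t$count") }
  }
}
```

Reads present in only one input (`r3` and `r4` here) drop out of an inner join, so a histogram built this way only describes reads common to both files.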
There is an R script in the adam-scripts module to process those outputs into a figure.
The available metrics to be calculated are defined, by name, in the DefaultComparisons object.
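As a rough sketch of what "defined by name" can look like, the following shows a name-keyed registry of metrics. It does not reproduce the DefaultComparisons object: `MetricRegistrySketch`, `Metric`, and the metric names `same` and `delta` are all hypothetical, and the real metrics operate on ADAM records rather than integers.

```scala
object MetricRegistrySketch {
  // Hypothetical metric: a name plus a function that maps a joined pair of values to a bucket label.
  final case class Metric(name: String, bucket: (Int, Int) => String)

  // Hypothetical set of available metrics.
  val metrics: Seq[Metric] = Seq(
    Metric("same", (a, b) => if (a == b) "equal" else "different"),
    Metric("delta", (a, b) => (a - b).abs.toString)
  )

  private val byName: Map[String, Metric] = metrics.map(m => m.name -> m).toMap

  // Look up a metric by the name a user supplied; None if no such metric exists.
  def find(name: String): Option[Metric] = byName.get(name)
}
```

Returning an `Option` lets the caller report an unknown metric name explicitly instead of failing with an exception.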
A subsequent tool like FindReads can be used to track down which reads give rise to particular aggregated
bins in the output histograms, if further diagnosis is needed.
Linear Supertypes
Serializable, Serializable, BDGCommandCompanion, AnyRef, Any