Summary
Clusters the results from a CrosscheckFingerprints run according to the LOD score. The resulting metric file can be used to assist in diagnosing results from CrosscheckFingerprints. The tool clusters the connectivity graph between the different groups: two groups are connected if they have a LOD score greater than LOD_THRESHOLD.
Details
The results of running CrosscheckFingerprints can be difficult to analyze, especially when many groups are related (meaning LOD greater than LOD_THRESHOLD) in a non-transitive manner (A is related to B, B is related to C, but A doesn't seem to be related to C). ClusterCrosscheckMetrics clusters the metrics from CrosscheckFingerprints so that all the groups in a cluster are related to each other either directly or indirectly (thus A, B and C would end up in one cluster). Two samples can end up in different clusters only if no sample from one cluster gets a high LOD score when compared to any sample from the other.
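The clustering described above amounts to finding the connected components of a graph whose edges are the comparisons with a LOD score above the threshold. The following is a minimal Python sketch of that idea using a union-find structure. It is not Picard's implementation: the function name `cluster_groups`, the tuple layout, and the sample scores are assumptions made for illustration only.

```python
from collections import defaultdict


def cluster_groups(comparisons, lod_threshold):
    """Return clusters of groups that are related directly or indirectly.

    comparisons: iterable of (left_group, right_group, lod_score) tuples
    (a hypothetical layout, not the Picard metrics format).
    Two groups are connected when their LOD score exceeds lod_threshold.
    Groups with no connection at all, not even to themselves, are omitted.
    """
    parent = {}

    def find(group):
        # Walk up to the root, shortening the path along the way.
        while parent[group] != group:
            parent[group] = parent[parent[group]]
            group = parent[group]
        return group

    for left, right, lod in comparisons:
        if lod > lod_threshold:
            parent.setdefault(left, left)
            parent.setdefault(right, right)
            root_left, root_right = find(left), find(right)
            if root_left != root_right:
                parent[root_right] = root_left  # merge the two components

    clusters = defaultdict(set)
    for group in parent:
        clusters[find(group)].add(group)
    return sorted(sorted(members) for members in clusters.values())


# Made-up scores: A~B and B~C are high, A~C is not, D matches only itself,
# and E does not even reach the threshold against itself.
comparisons = [
    ("A", "A", 20.0), ("A", "B", 12.5), ("B", "B", 22.0),
    ("B", "C", 8.1), ("C", "C", 19.0), ("A", "C", -1.3),
    ("D", "D", 25.0), ("A", "D", -30.0), ("B", "D", -28.0),
    ("C", "D", -31.0), ("E", "E", 1.0),
]
clusters = cluster_groups(comparisons, lod_threshold=3)
print(clusters)  # [['A', 'B', 'C'], ['D']]
```

In this sketch A and C land in the same cluster through B even though their own comparison is negative, which is the non-transitive case described above.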
Example
java -jar picard.jar ClusterCrosscheckMetrics \
INPUT=sample.crosscheck_metrics \
LOD_THRESHOLD=3 \
OUTPUT=sample.clustered.crosscheck_metrics
The resulting file contains metrics of the ClusteredCrosscheckMetric class. It holds the original crosscheck metric values for groups that end up in the same cluster (regardless of the LOD score of each comparison). In addition, it notes the ClusteredCrosscheckMetric.CLUSTER identifier and the size of the cluster (in ClusteredCrosscheckMetric.CLUSTER_SIZE).
A group that does not have a high LOD score with any group (including itself!) will not be included in the metric file.
Note that comparisons between groups in different clusters are not included in the metric file.
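To make the contents of the output concrete, here is a small Python sketch of how rows could be selected once each group has a cluster assignment. It assumes the reading that a comparison is kept only when both groups belong to the same cluster. The function `clustered_rows`, the `cluster_of` mapping, the integer cluster identifiers, the lowercase keys, and the scores are all hypothetical; only the CLUSTER and CLUSTER_SIZE names come from the description above.

```python
from collections import Counter


def clustered_rows(comparisons, cluster_of):
    """Keep comparisons whose two groups share a cluster.

    comparisons: iterable of (left_group, right_group, lod_score) tuples.
    cluster_of: dict mapping a group name to its cluster identifier;
    groups that were never clustered are simply absent from it.
    """
    # Cluster size = number of distinct groups assigned to the cluster.
    sizes = Counter(cluster_of.values())
    rows = []
    for left, right, lod in comparisons:
        cluster = cluster_of.get(left)
        if cluster is not None and cluster == cluster_of.get(right):
            rows.append({
                "left": left,
                "right": right,
                "lod": lod,  # kept even when the score itself is low
                "CLUSTER": cluster,
                "CLUSTER_SIZE": sizes[cluster],
            })
    return rows


# Made-up input: A, B and C form one cluster, D is alone, E was never clustered.
comparisons = [
    ("A", "A", 20.0), ("A", "B", 12.5), ("B", "B", 22.0),
    ("B", "C", 8.1), ("C", "C", 19.0), ("A", "C", -1.3),
    ("D", "D", 25.0), ("A", "D", -30.0), ("B", "D", -28.0),
    ("C", "D", -31.0), ("E", "E", 1.0),
]
cluster_of = {"A": 0, "B": 0, "C": 0, "D": 1}
rows = clustered_rows(comparisons, cluster_of)
for row in rows:
    print(row)
```

The low-scoring A versus C comparison is retained because both groups sit in the same cluster, while every comparison involving D and another group, and the lone E row, are dropped.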