Package picard.sam.markduplicates
Class UmiGraph
java.lang.Object
picard.sam.markduplicates.UmiGraph
UmiGraph is used to identify UMIs that come from the same original source molecule. It assumes that UMIs
separated by a small edit distance are more likely to be sequencing errors than distinct molecules.
The algorithm joins every pair of UMIs whose edit distance is at most maxEditDistanceToJoin. Because joining
is transitive, UMIs A, B and C can all be assigned to the same source molecule even if two of them are farther
apart than maxEditDistanceToJoin. Suppose A = "ATCC", B = "AACC", C = "AACG" and maxEditDistanceToJoin = 1.
A and B have a Hamming distance of 1, so they are joined. B and C also have a Hamming distance of 1, so they are
joined. As a result, A and C end up in the same group even though their Hamming distance is 2.
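The transitive joining described above amounts to taking connected components of a graph whose edges link UMIs within maxEditDistanceToJoin. The following is a minimal sketch of that idea using union-find, assuming equal-length UMIs and Hamming distance as the edit distance. The class name UmiGroupingSketch and its methods are hypothetical and are not Picard's UmiGraph API; the actual implementation may traverse the graph differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Picard's UmiGraph): groups UMIs by transitive joining,
 * assuming equal-length UMIs and Hamming distance as the edit distance.
 */
public class UmiGroupingSketch {

    /** Counts mismatched positions; assumes both UMIs have the same length. */
    static int hammingDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("UMIs must have equal length");
        }
        int mismatches = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    /** Union-find root lookup with path halving. */
    static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    /** Joins every pair within maxEditDistanceToJoin and returns the connected components. */
    static List<List<String>> groupUmis(List<String> umis, int maxEditDistanceToJoin) {
        int n = umis.size();
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        // Compare all pairs (O(n^2)); merge the two components whenever a pair is close enough.
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (hammingDistance(umis.get(i), umis.get(j)) <= maxEditDistanceToJoin) {
                    parent[find(parent, i)] = find(parent, j);
                }
            }
        }
        // Collect UMIs by component root, ordered by each component's first UMI in the input.
        Map<Integer, List<String>> groups = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            groups.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(umis.get(i));
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        // A, B and C from the example above, plus an unrelated UMI.
        List<String> umis = List.of("ATCC", "AACC", "AACG", "GGTT");
        System.out.println(groupUmis(umis, 1)); // prints [[ATCC, AACC, AACG], [GGTT]]
    }
}
```

With maxEditDistanceToJoin = 1, "ATCC" and "AACG" land in the same group only through "AACC"; with maxEditDistanceToJoin = 0, each of the three UMIs stays in its own group.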
Method Summary