Class UmiGraph

java.lang.Object
picard.sam.markduplicates.UmiGraph

public class UmiGraph extends Object
UmiGraph is used to identify UMIs that come from the same original source molecule. The assumption is that UMIs separated by a small edit distance are more likely to be sequencing errors than distinct molecules. The algorithm joins every pair of UMIs whose distance is at most maxEditDistanceToJoin. Because joining is transitive, a set of UMIs A, B and C can all be assigned to the same source molecule even if two of them have a Hamming distance larger than maxEditDistanceToJoin. Suppose A = "ATCC", B = "AACC", C = "AACG" and maxEditDistanceToJoin = 1. A and B have a Hamming distance of 1, so they are joined; B and C have a Hamming distance of 1, so they are joined. Because A is joined to B and B is joined to C, A and C end up in the same group even though their Hamming distance is 2.
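The transitive joining described above can be illustrated with a small standalone sketch. This is not the Picard implementation. The class name UmiJoinSketch and the methods hammingDistance, find and joinUmis are hypothetical names introduced for illustration, and the union-find structure is one assumed way to group UMIs. The sketch also assumes that all UMIs have the same length and that distance is measured as Hamming distance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of picard.sam.markduplicates.
public class UmiJoinSketch {

    // Number of mismatching positions between two equal-length strings.
    static int hammingDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("UMIs must have equal length");
        }
        int distance = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    // Union-find lookup with path halving: returns the root of i's group.
    static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    // Joins every pair of UMIs within maxEditDistanceToJoin and returns the
    // resulting groups. Groups are transitive: A-B and B-C puts A and C together.
    static List<List<String>> joinUmis(List<String> umis, int maxEditDistanceToJoin) {
        int n = umis.size();
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i; // each UMI starts in its own group
        }

        // Compare all pairs; merge the groups of any pair that is close enough.
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (hammingDistance(umis.get(i), umis.get(j)) <= maxEditDistanceToJoin) {
                    parent[find(parent, i)] = find(parent, j);
                }
            }
        }

        // Collect UMIs by group root, preserving first-seen order.
        Map<Integer, List<String>> groups = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            groups.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(umis.get(i));
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        // "GGGG" is an extra UMI added here to show a group that stays separate.
        List<String> umis = Arrays.asList("ATCC", "AACC", "AACG", "GGGG");
        System.out.println(joinUmis(umis, 1)); // prints [[ATCC, AACC, AACG], [GGGG]]
    }
}
```

With maxEditDistanceToJoin = 1, "ATCC" and "AACG" land in the same group even though their Hamming distance is 2, because each is within distance 1 of "AACC". The all-pairs comparison is quadratic in the number of UMIs, which is acceptable for a sketch but may not reflect how a production implementation chooses to search for neighbors.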