- All Implemented Interfaces:
- java.io.Serializable, java.util.function.Function<GATKRead,GATKRead>, java.util.function.UnaryOperator<GATKRead>, ReadTransformer, SerializableFunction<GATKRead,GATKRead>
public final class PalindromeArtifactClipReadTransformer
extends java.lang.Object
implements ReadTransformer
Trims (hard clips) soft-clipped bases due to the following artifact:
When a sequence and its reverse complement occur near opposite ends of a fragment, DNA damage (especially in FFPE
samples and ancient DNA) can disrupt base-pairing, causing a single-strand loop in which the sequence pairs with its
reverse complement, after which end repair copies the true 5' end of the fragment onto the 3' end of the fragment.
That is, the artifact looks like this (A' denotes the reverse complement of A):
Biological sequence:
Forward strand 3' A B . . . B' C 5'
Reverse strand 5' A' B' . . . B C' 3'
Loop structure (B/B', C/C' between forward and reverse strands are *not* hybridized due to damage):
Reverse strand 3' C' B . . . . . .
Forward strand 5' C B' . . . . . .
| . .
Forward strand 3' B . . . . . .
Reverse strand 5' B' . . . . . .
After end repair of the 5' overhang of sequence C on the self-looped forward strand:
Reverse strand 3' C' B . . . . . .
Forward strand 5' C B' . . . . . .
| | . .
Forward strand 3' C' B . . . . . .
Reverse strand 5' B' . . . . . .
Forward strand with artifact after denaturing (C' replaces A):
Forward strand 3' C' B . . . B' C
Because GATK tools have no good way to pair reads with their mates early in processing, here we check only for the
case where sequence C matches the reference, so that the reference sequence is a proxy for the 5' end of the mate.
This catches most errors and saves a lot of runtime by reducing false active regions and by simplifying the assembly.
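
The check described above can be illustrated with a minimal sketch. This is not the transformer's actual code: the class name PalindromeArtifactSketch, the method names, and the maxMismatches tolerance are all hypothetical, and the sketch assumes the caller has already extracted the soft-clipped bases at the 3' end of the read and an equally long stretch of reference bases at the 5' end of the fragment (the stand-in for sequence C). It simply tests whether the clipped bases look like C', the reverse complement of C.

```java
// Hypothetical illustration only; not the GATK implementation.
final class PalindromeArtifactSketch {
    private PalindromeArtifactSketch() {}

    /** Returns the reverse complement of a DNA string; unrecognized symbols become N. */
    static String reverseComplement(String bases) {
        StringBuilder sb = new StringBuilder(bases.length());
        for (int i = bases.length() - 1; i >= 0; i--) {
            sb.append(complement(bases.charAt(i)));
        }
        return sb.toString();
    }

    private static char complement(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            default:  return 'N';
        }
    }

    /**
     * Returns true when the soft-clipped bases match the reverse complement of the
     * reference bases at the 5' end of the fragment (the proxy for sequence C),
     * allowing at most maxMismatches differences (an assumed tolerance).
     */
    static boolean looksLikePalindromeArtifact(String softClippedBases,
                                               String referenceAtFragmentStart,
                                               int maxMismatches) {
        if (softClippedBases.isEmpty()
                || softClippedBases.length() != referenceAtFragmentStart.length()) {
            return false;
        }
        String expected = reverseComplement(referenceAtFragmentStart); // C'
        int mismatches = 0;
        for (int i = 0; i < expected.length(); i++) {
            if (Character.toUpperCase(softClippedBases.charAt(i)) != expected.charAt(i)) {
                mismatches++;
            }
        }
        return mismatches <= maxMismatches;
    }
}
```

A read whose soft clips pass such a check would be a candidate for hard clipping. A real implementation would also need to handle strand orientation, mate position, and fragment length, which this sketch leaves out.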
- See Also:
- Serialized Form