Class PalindromeArtifactClipReadTransformer

java.lang.Object
org.broadinstitute.hellbender.transformers.PalindromeArtifactClipReadTransformer
All Implemented Interfaces:
Serializable, Function<GATKRead,GATKRead>, UnaryOperator<GATKRead>, ReadTransformer, SerializableFunction<GATKRead,GATKRead>

public final class PalindromeArtifactClipReadTransformer extends Object implements ReadTransformer
Trims (hard clips) soft-clipped bases due to the following artifact:

When a sequence and its reverse complement occur near opposite ends of a fragment, DNA damage (especially in the case of FFPE samples and ancient DNA) can disrupt base-pairing, causing a single-strand loop of the sequence and its reverse complement, after which end repair copies the true 5' end of the fragment onto the 3' end of the fragment.

That is, the artifact looks like this (A' denotes the reverse complement of A):

Biological sequence:
Forward strand 3' A  B  . . . B' C  5'
Reverse strand 5' A' B' . . . B  C' 3'

Loop structure (B/B', C/C' between forward and reverse strands are *not* hybridized due to damage):

Reverse strand 3' C' B  . . . . . .
Forward strand 5' C  B' . . . . . .
                     |              .
                                    .
Forward strand 3'    B  . . . . . .
Reverse strand 5'    B' . . . . . .

After end repair of 5' overhang of sequence C on self-looped forward strand:

Reverse strand 3' C' B  . . . . . .
Forward strand 5' C  B' . . . . . .
                  |  |              .
                                    .
Forward strand 3' C' B  . . . . . .
Reverse strand 5'    B' . . . . . .

Forward strand with artifact after denaturing (C' replaces A):
Forward strand 3' C' B . . . B' C

Since there is no good way early in GATK tools to collect reads and mates, here we filter only for the case where sequence C matches the reference, so that the reference sequence is a proxy for the 5' end of the mate. This catches most errors and saves a lot of runtime by reducing false active regions and by simplifying the assembly.
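
The reference-proxy check described above can be illustrated with a minimal, self-contained sketch. It is not the GATK implementation and uses no GATK types: reads are plain strings written 5' to 3', and the class name PalindromeArtifactSketch, the method clipIfArtifact, and the parameters referenceProxy and minPalindromeSize are all hypothetical names chosen for illustration. The sketch assumes the soft-clipped bases sit at the 3' end of the read and treats them as an artifact only when they are exactly the reverse complement of the reference bases standing in for sequence C.

```java
// Hypothetical sketch of the reverse-complement test; not the GATK implementation.
final class PalindromeArtifactSketch {

    /** Complement of a single base; anything other than A, C, G, T maps to N. */
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            default:  return 'N';
        }
    }

    /** Reverse complement of a base string, e.g. C -> C' in the diagram. */
    static String reverseComplement(String bases) {
        StringBuilder sb = new StringBuilder(bases.length());
        for (int i = bases.length() - 1; i >= 0; i--) {
            sb.append(complement(bases.charAt(i)));
        }
        return sb.toString();
    }

    /**
     * Drops the soft-clipped 3' tail of a read when the tail equals the reverse
     * complement of referenceProxy (assumed to stand in for sequence C).
     * Tails shorter than minPalindromeSize (an assumed threshold) are left alone.
     */
    static String clipIfArtifact(String readBases, int softClippedLength,
                                 String referenceProxy, int minPalindromeSize) {
        if (softClippedLength < minPalindromeSize
                || softClippedLength > readBases.length()
                || softClippedLength != referenceProxy.length()) {
            return readBases;
        }
        int tailStart = readBases.length() - softClippedLength;
        String tail = readBases.substring(tailStart);
        return tail.equals(reverseComplement(referenceProxy))
                ? readBases.substring(0, tailStart)
                : readBases;
    }

    public static void main(String[] args) {
        String c = "ACGTT";                              // reference proxy for sequence C
        String artifactRead = "GGATCC" + reverseComplement(c);
        System.out.println(clipIfArtifact(artifactRead, 5, c, 5));   // prints GGATCC
        System.out.println(clipIfArtifact("GGATCCAAAAA", 5, c, 5));  // prints GGATCCAAAAA
    }
}
```

Requiring an exact match keeps the sketch short; a real tool would likely tolerate mismatches and work from the read's CIGAR and alignment coordinates rather than from a caller-supplied tail length.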
See Also: