Class AssemblyRegion
java.lang.Object
org.broadinstitute.hellbender.engine.AssemblyRegion
- All Implemented Interfaces:
htsjdk.samtools.util.Locatable
Region of the genome that gets assembled by the local assembly engine.
An AssemblyRegion is defined by two intervals (a primary interval containing the territory for variant calling, and a second, padded interval for assembly) together with the reads overlapping the padded interval. Although we do not call variants in the padded interval, assembling over a larger territory improves calls in the primary territory.
This concept is complicated somewhat by the fact that these intervals are mutable and by the fact that the AssemblyRegion object persists after assembly and is used again during local realignment with the PairHMM. Here is an example of the life cycle of an AssemblyRegion:
Suppose that the HaplotypeCaller engine finds evidence for a het in a pileup at locus 400; that is, it produces an ActivityProfileState with non-zero probability at site 400 and passes it to its ActivityProfile. The ActivityProfile eventually produces an AssemblyRegion based on the AssemblyRegionArgumentCollection parameters.
Let's suppose that this initial region has primary span 350-450 and padded span 100-700.
Next, the assembly engine assembles all reads that overlap the padded interval to find variant haplotypes and the variants they contain. The AssemblyRegion is then trimmed down to a new primary interval, bounded by all assembled variants within the original primary interval, and a new padded interval. The amount of padding in the new padded interval around the variants depends on the needs of local realignment and, as such, need not equal the original padding that was used for assembly.
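The trimming step of this life cycle is plain interval arithmetic. The sketch below is hypothetical and not the GATK implementation: it replaces SimpleInterval with a small Span class, works on a single contig, ignores contig-boundary clipping, and assumes the assembled variants are given as a list of positions. It shrinks the primary span to the variants that lie inside the original primary span and then re-pads it by a new, independently chosen amount.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of the two intervals of an assembly region.
 * Not the GATK API: positions are 1-based and closed on a single contig.
 */
final class RegionLifecycleSketch {

    /** A closed interval [start, end]; a stand-in for GATK's SimpleInterval. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            if (start > end) {
                throw new IllegalArgumentException("start must be <= end");
            }
            this.start = start;
            this.end = end;
        }

        boolean contains(int pos) {
            return pos >= start && pos <= end;
        }
    }

    /** A primary (variant-calling) span plus a padded (assembly) span. */
    static final class Region {
        final Span primary;
        final Span padded;

        Region(Span primary, Span padded) {
            this.primary = primary;
            this.padded = padded;
        }
    }

    /**
     * Shrinks the primary span to the assembled variants that lie inside the
     * original primary span, then pads it by newPadding for realignment.
     * Returns null when no variant falls in the primary span (an assumption:
     * the engine's behavior in that case is not described here).
     * Clipping the padded span to contig bounds is omitted.
     */
    static Region trimToVariants(Region region, List<Integer> variantPositions, int newPadding) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int pos : variantPositions) {
            // Variants outside the original primary span (e.g. only in the padding) are ignored.
            if (region.primary.contains(pos)) {
                min = Math.min(min, pos);
                max = Math.max(max, pos);
            }
        }
        if (min > max) {
            return null;
        }
        Span newPrimary = new Span(min, max);
        Span newPadded = new Span(min - newPadding, max + newPadding);
        return new Region(newPrimary, newPadded);
    }
}
```

Starting from the example region (primary 350-450, padded 100-700), hypothetical variants at 400, 420 and 600 with a realignment padding of 25 give a new primary span of 400-420 and a padded span of 375-445; the variant at 600 lies only in the padding and is ignored. The positions and the padding of 25 are illustrative values, not GATK defaults.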
Constructor Summary
Constructors
- AssemblyRegion(SimpleInterval activeSpan, boolean isActive, int padding, htsjdk.samtools.SAMFileHeader header)
  Create a new AssemblyRegion containing no reads.
- AssemblyRegion(SimpleInterval activeSpan, int padding, htsjdk.samtools.SAMFileHeader header)
  Simple interface to create an assembly region that isActive without any profile state.
- AssemblyRegion(SimpleInterval activeSpan, SimpleInterval paddedSpan, boolean isActive, htsjdk.samtools.SAMFileHeader header)
  Create a new AssemblyRegion containing no reads.
-
Method Summary
- void add(GATKRead read)
  Add read to this region.
- void addAll(Collection<GATKRead> readsToAdd)
  Add all readsToAdd to this region.
- void addAllAlignmentData(List<AlignmentAndReferenceContext> alignmentData)
  Method for adding alignment data to the collection of AlignmentData associated with this assembly region.
- void addHardClippedPileupReads(Collection<GATKRead> readsToAdd)
- void clearReads()
  Clear all of the reads currently in this region.
- List<AlignmentAndReferenceContext> getAlignmentData()
  Method for obtaining the alignment data which is attached to the assembly region.
- byte[] getAssemblyRegionReference(htsjdk.samtools.reference.ReferenceSequenceFile referenceReader)
  See getAssemblyRegionReference(htsjdk.samtools.reference.ReferenceSequenceFile, int) with padding == 0.
- byte[] getAssemblyRegionReference(htsjdk.samtools.reference.ReferenceSequenceFile referenceReader, int padding)
  Get the reference bases from referenceReader spanned by the padded span of this region, including an additional padding bp on either side.
- String getContig()
- int getEnd()
- List<GATKRead> getHardClippedPileupReads()
  Get an unmodifiable copy of the list of hard-clipped pileup reads currently in this assembly region.
- htsjdk.samtools.SAMFileHeader getHeader()
  Returns the header for the reads in this region.
- SimpleInterval getPaddedSpan()
  Get the span of this assembly region including the padding value.
- List<GATKRead> getReads()
  Get an unmodifiable copy of the list of reads currently in this assembly region.
- SimpleInterval getSpan()
  Get the raw span of this assembly region (excluding the padding).
- int getStart()
- boolean isActive()
  Does this region represent an active region (all isActiveProbs above threshold) or an inactive region (all isActiveProbs below threshold)?
- boolean isFinalized()
- void removeAll(Collection<GATKRead> readsToRemove)
  Remove all of the reads in readsToRemove from this region.
- void setFinalized(boolean value)
- int size()
  Get the number of reads currently in this region.
- String toString()
- AssemblyRegion trim(SimpleInterval span, int padding)
  Trim this region to just the span, producing a new assembly region without any reads that has only the extent of span intersected with the current extent.
- AssemblyRegion trim(SimpleInterval span, SimpleInterval paddedSpan)
  Trim this region to no more than the span, producing a new assembly region with properly trimmed reads that attempts to provide the best possible representation of this region covering the span.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface htsjdk.samtools.util.Locatable
contains, contigsMatch, getLengthOnReference, overlaps, withinDistanceOf
-
Constructor Details
-
AssemblyRegion
public AssemblyRegion(SimpleInterval activeSpan, boolean isActive, int padding, htsjdk.samtools.SAMFileHeader header)
Create a new AssemblyRegion containing no reads.
- Parameters:
activeSpan - the span of this active region
isActive - indicates whether this is an active region or an inactive one
padding - the active region padding to use for this active region
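This constructor takes a padding value rather than an explicit padded span. A minimal sketch of how a padded span could be derived from activeSpan and padding follows; it works on plain integers instead of SimpleInterval and SAMFileHeader, and the class and method names are hypothetical. Clipping the result to the contig bounds (whose length would come from the header's sequence dictionary) is an assumption, not documented behavior of this constructor.

```java
/**
 * Hypothetical sketch: derive a padded span from an active span and a padding
 * value. Clipping to the contig bounds [1, contigLength] is an assumption
 * about the real constructor, not something this page documents.
 */
final class PaddingSketch {

    /** Returns {paddedStart, paddedEnd} for the 1-based closed span [activeStart, activeEnd]. */
    static int[] paddedSpan(int activeStart, int activeEnd, int padding, int contigLength) {
        if (padding < 0) {
            throw new IllegalArgumentException("padding must be >= 0");
        }
        if (activeStart < 1 || activeStart > activeEnd || activeEnd > contigLength) {
            throw new IllegalArgumentException("active span must lie on the contig");
        }
        // Extend symmetrically, but never past the first or last base of the contig.
        int start = Math.max(1, activeStart - padding);
        int end = Math.min(contigLength, activeEnd + padding);
        return new int[] {start, end};
    }
}
```

With an active span of 350-450 and a padding of 250 on a long enough contig, this reproduces the padded span 100-700 from the life-cycle example in the class description.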
-
AssemblyRegion
public AssemblyRegion(SimpleInterval activeSpan, SimpleInterval paddedSpan, boolean isActive, htsjdk.samtools.SAMFileHeader header)
Create a new AssemblyRegion containing no reads.
- Parameters:
activeSpan - the span of this active region
paddedSpan - the padded span of this active region
isActive - indicates whether this is an active region or an inactive one
-
AssemblyRegion
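public AssemblyRegion(SimpleInterval activeSpan, int padding, htsjdk.samtools.SAMFileHeader header)
Simple interface to create an assembly region that isActive without any profile state.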
-
-
Method Details
-
getAlignmentData
public List<AlignmentAndReferenceContext> getAlignmentData()
Method for obtaining the alignment data which is attached to the assembly region.
- Returns:
- the list of AlignmentData objects associated with this assembly region
-
addAllAlignmentData
public void addAllAlignmentData(List<AlignmentAndReferenceContext> alignmentData)
Method for adding alignment data to the collection of AlignmentData associated with this assembly region.
-
getContig
public String getContig()
- Specified by:
getContig in interface htsjdk.samtools.util.Locatable
-
getStart
public int getStart()
- Specified by:
getStart in interface htsjdk.samtools.util.Locatable
-
getEnd
public int getEnd()
- Specified by:
getEnd in interface htsjdk.samtools.util.Locatable
-
toString
-
isActive
public boolean isActive()
Does this region represent an active region (all isActiveProbs above threshold) or an inactive region (all isActiveProbs below threshold)?
-
getPaddedSpan
public SimpleInterval getPaddedSpan()
Get the span of this assembly region including the padding value.
- Returns:
- a non-null SimpleInterval
-
getSpan
public SimpleInterval getSpan()
Get the raw span of this assembly region (excluding the padding).
- Returns:
- a non-null SimpleInterval
-
getReads
public List<GATKRead> getReads()
Get an unmodifiable copy of the list of reads currently in this assembly region. The reads are sorted by their coordinate position.
- Returns:
- an unmodifiable and immutable copy of the reads in the assembly region
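"Unmodifiable copy" combines two guarantees: callers cannot change the returned list, and the list does not change when the region's reads change later. A minimal sketch of that contract follows, using plain strings in place of GATKRead; the class is hypothetical and is not GATK's source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of the contract documented for getReads(): callers get
 * a snapshot that they cannot modify and that does not track later changes.
 * Reads are plain strings here instead of GATKRead.
 */
final class ReadListSketch {
    private final List<String> reads = new ArrayList<>();

    void add(String read) {
        reads.add(read);
    }

    List<String> getReads() {
        // Copy first, then wrap: the wrapper blocks writes, the copy decouples the result from this region.
        return Collections.unmodifiableList(new ArrayList<>(reads));
    }
}
```

Wrapping the internal list without copying it would also block writes, but would hand callers a live read-only view that reflects later additions and removals; copying first is what makes the result a snapshot.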
-
getHardClippedPileupReads
public List<GATKRead> getHardClippedPileupReads()
Get an unmodifiable copy of the list of hard-clipped pileup reads currently in this assembly region. The reads are sorted by their coordinate position.
- Returns:
- an unmodifiable and immutable copy of the hard-clipped pileup reads in the assembly region
-
getHeader
public htsjdk.samtools.SAMFileHeader getHeader()
Returns the header for the reads in this region.
-
trim
public AssemblyRegion trim(SimpleInterval span, int padding)
Trim this region to just the span, producing a new assembly region without any reads that has only the extent of span intersected with the current extent.
- Parameters:
span - the new extent of the active region we want
padding - the padding size we want for the newly trimmed active region
- Returns:
- a non-null, empty assembly region
-
trim
public AssemblyRegion trim(SimpleInterval span, SimpleInterval paddedSpan)
Trim this region to no more than the span, producing a new assembly region with properly trimmed reads that attempts to provide the best possible representation of this region covering the span.
The challenge here is that span may (1) be larger than can be represented by this assembly region plus its original padding, and (2) the padding must be symmetric on both sides. This algorithm therefore determines how best to represent span as a subset of the span of this region, with a padding value that captures as much of span as possible.
For example, suppose this active region is
  Active: 100-200 with padding of 50, so that the true span is 50-250
  NewExtent: 150-225, saying that we'd ideally like to have just bases 150-225
Here we represent the assembly region as a region from 150-200 with 25 bp of padding.
The overall constraint is that the region can never exceed the original region, and the padding is chosen to maximize overlap with the desired region.
- Parameters:
span - the new extent of the active region we want
- Returns:
- a non-null active region
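One reading of the example above, written as a hypothetical sketch on plain integers (not the GATK implementation, which takes SimpleInterval arguments and also trims the reads): intersect the requested span with the active span, compute how much padding each side would need to reach the requested span, and cap that padding at the room left inside the original padded span so the result never exceeds the original region.

```java
/**
 * Hypothetical sketch of the trimming rule described above: keep the part of
 * the requested span that lies inside the current active span, then choose a
 * single symmetric padding that covers as much of the requested span as
 * possible without leaving the original padded span. Not GATK's code.
 */
final class TrimSketch {

    /** The new active span [start, end] and its symmetric padding. */
    static final class Trimmed {
        final int start;
        final int end;
        final int padding;

        Trimmed(int start, int end, int padding) {
            this.start = start;
            this.end = end;
            this.padding = padding;
        }
    }

    static Trimmed trim(int activeStart, int activeEnd, int padding, int wantStart, int wantEnd) {
        int paddedStart = activeStart - padding;
        int paddedEnd = activeEnd + padding;
        // The new active span can never exceed the current active span.
        int newStart = Math.max(activeStart, wantStart);
        int newEnd = Math.min(activeEnd, wantEnd);
        if (newStart > newEnd) {
            throw new IllegalArgumentException("requested span does not overlap the active span");
        }
        // Padding needed on the worse side to reach the requested span.
        int need = Math.max(Math.max(0, newStart - wantStart), Math.max(0, wantEnd - newEnd));
        // Largest symmetric padding that stays inside the original padded span.
        int room = Math.min(newStart - paddedStart, paddedEnd - newEnd);
        return new Trimmed(newStart, newEnd, Math.min(need, room));
    }
}
```

With the example's numbers (active 100-200, padding 50, requested 150-225) this yields 150-200 with 25 bp of padding, i.e. a padded span of 125-225. Requesting 20-300 instead keeps 100-200 and caps the padding at the original 50.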
-
add
public void add(GATKRead read)
Add read to this region. The read must have an alignment start >= that of the last read currently in this region.
- Parameters:
read - a non-null GATKRead
- Throws:
IllegalArgumentException - if read doesn't overlap the padded span of this region
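The two preconditions of add can be checked in constant time per read. A minimal sketch follows, assuming reads are reduced to (start, end) pairs on a single contig; the class is hypothetical, and throwing IllegalArgumentException for an out-of-order read is an assumption (only the overlap failure is documented above).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the preconditions documented for add(read):
 * the read must overlap the padded span, and its alignment start must not be
 * smaller than that of the last read already in the region.
 */
final class AddReadSketch {
    private final int paddedStart;
    private final int paddedEnd;
    private final List<int[]> reads = new ArrayList<>(); // each entry is {start, end}

    AddReadSketch(int paddedStart, int paddedEnd) {
        this.paddedStart = paddedStart;
        this.paddedEnd = paddedEnd;
    }

    void add(int readStart, int readEnd) {
        // Documented: reject reads that do not overlap the padded span.
        if (readEnd < paddedStart || readStart > paddedEnd) {
            throw new IllegalArgumentException("read does not overlap the padded span");
        }
        // Documented ordering rule; the exception type here is an assumption.
        if (!reads.isEmpty() && readStart < reads.get(reads.size() - 1)[0]) {
            throw new IllegalArgumentException("reads must be added in coordinate order");
        }
        reads.add(new int[] {readStart, readEnd});
    }

    int size() {
        return reads.size();
    }
}
```

Comparing only against the last stored read is enough because the ordering rule keeps the list sorted by start, which is also what lets getReads return reads in coordinate order without re-sorting.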
-
size
public int size()
Get the number of reads currently in this region.
- Returns:
- an integer >= 0
-
clearReads
public void clearReads()
Clear all of the reads currently in this region.
-
removeAll
public void removeAll(Collection<GATKRead> readsToRemove)
Remove all of the reads in readsToRemove from this region.
- Parameters:
readsToRemove - the set of reads we want to remove
-
addAll
public void addAll(Collection<GATKRead> readsToAdd)
Add all readsToAdd to this region.
- Parameters:
readsToAdd - a collection of reads to add to this region
-
addHardClippedPileupReads
public void addHardClippedPileupReads(Collection<GATKRead> readsToAdd)
-
getAssemblyRegionReference
public byte[] getAssemblyRegionReference(htsjdk.samtools.reference.ReferenceSequenceFile referenceReader)
See getAssemblyRegionReference(htsjdk.samtools.reference.ReferenceSequenceFile, int) with padding == 0.
-
getAssemblyRegionReference
public byte[] getAssemblyRegionReference(htsjdk.samtools.reference.ReferenceSequenceFile referenceReader, int padding)
Get the reference bases from referenceReader spanned by the padded span of this region, including an additional padding bp on either side. If this expanded region would exceed the boundaries of the region's contig, the returned result is truncated to include only on-genome reference bases.
- Parameters:
referenceReader - the source of the reference genome bases
padding - the padding, in bp, to add to either side of this region's padded span
- Returns:
- a non-null array of bytes holding the reference bases in referenceReader
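The truncation rule reduces to clamping the window [paddedStart - padding, paddedEnd + padding] to [1, contigLength]. A minimal sketch of that arithmetic follows, assuming the whole contig is already in memory as a byte array; a real implementation would instead ask the ReferenceSequenceFile for the window (for example via htsjdk's getSubsequenceAt(contig, start, stop), whose ReferenceSequence result exposes getBases()). The class and method names are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the window computed by getAssemblyRegionReference:
 * the padded span widened by padding on both sides, truncated to the contig.
 * contigBases holds the whole contig; coordinates are 1-based and closed.
 */
final class ReferenceWindowSketch {

    static byte[] regionReference(byte[] contigBases, int paddedStart, int paddedEnd, int padding) {
        if (padding < 0) {
            throw new IllegalArgumentException("padding must be >= 0");
        }
        // Keep only on-genome bases: clamp to [1, contig length].
        int start = Math.max(1, paddedStart - padding);
        int end = Math.min(contigBases.length, paddedEnd + padding);
        // Convert the 1-based closed [start, end] to the 0-based half-open range copyOfRange expects.
        return Arrays.copyOfRange(contigBases, start - 1, end);
    }
}
```

On the 10 bp contig ACGTACGTAC, a padded span of 2-5 with 3 bp of extra padding would ask for positions -1 to 8; the sketch truncates that to 1-8 and returns ACGTACGT. With padding 0 it returns exactly the padded span, matching the one-argument overload.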
-
setFinalized
public void setFinalized(boolean value)
-
isFinalized
public boolean isFinalized()
-