Class AssemblyRegion

java.lang.Object
org.broadinstitute.hellbender.engine.AssemblyRegion
All Implemented Interfaces:
htsjdk.samtools.util.Locatable

public final class AssemblyRegion extends Object implements htsjdk.samtools.util.Locatable
Region of the genome that gets assembled by the local assembly engine. An AssemblyRegion is defined by two intervals -- a primary interval containing a territory for variant calling and a second, padded interval for assembly -- as well as the reads overlapping the padded interval. Although we do not call variants in the padded interval, assembling over a larger territory improves calls in the primary territory. This concept is complicated somewhat by the fact that these intervals are mutable and that the AssemblyRegion object lives on after assembly, through local realignment in PairHMM.

Here is an example of the life cycle of an AssemblyRegion: Suppose that the HaplotypeCaller engine finds evidence for a het in a pileup at locus 400 -- that is, it produces an ActivityProfileState with non-zero probability at site 400 and passes it to its ActivityProfile. The ActivityProfile eventually produces an AssemblyRegion based on the AssemblyRegionArgumentCollection parameters. Let's suppose that this initial region has primary span 350-450 and padded span 100-700.

Next, the assembly engine assembles all reads that overlap the padded interval to find variant haplotypes and the variants they contain. The AssemblyRegion is then trimmed down to a new primary interval bounded by all assembled variants within the original primary interval, and a new padded interval. The amount of padding of the new padded interval around the variants depends on the needs of local realignment and as such need not equal the original padding that was used for assembly.
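
The interval bookkeeping in this life cycle can be sketched in plain Java. This is a minimal illustration and not GATK code: the `Span` class, the `trimToVariants` and `pad` helpers, the variant positions and the realignment padding of 20 are all hypothetical. Only the spans 350-450 and 100-700 come from the example above.

```java
import java.util.Arrays;

// Hypothetical stand-in for an inclusive genomic interval (not GATK's SimpleInterval).
final class Span {
    final int start;
    final int end;

    Span(int start, int end) {
        if (start > end) {
            throw new IllegalArgumentException("start must be <= end");
        }
        this.start = start;
        this.end = end;
    }

    boolean contains(int pos) {
        return start <= pos && pos <= end;
    }

    @Override
    public String toString() {
        return start + "-" + end;
    }
}

final class LifeCycleSketch {
    // New primary span: the tightest interval around the variants that fall
    // inside the original primary span. Variants outside it are ignored.
    static Span trimToVariants(Span primary, int[] variantPositions) {
        int[] inside = Arrays.stream(variantPositions).filter(primary::contains).toArray();
        if (inside.length == 0) {
            throw new IllegalArgumentException("no variants inside the primary span");
        }
        int min = Arrays.stream(inside).min().getAsInt();
        int max = Arrays.stream(inside).max().getAsInt();
        return new Span(min, max);
    }

    // New padded span: the trimmed primary span widened by `padding` on each side,
    // kept inside `limit` (here, the original padded span). Assumption of this sketch.
    static Span pad(Span primary, int padding, Span limit) {
        int start = Math.max(limit.start, primary.start - padding);
        int end = Math.min(limit.end, primary.end + padding);
        return new Span(start, end);
    }
}
```

With hypothetical variants at 395 and 410, `trimToVariants(new Span(350, 450), ...)` yields a primary span of 395-410, and `pad(..., 20, new Span(100, 700))` yields a padded span of 375-430, which is much narrower than the 100-700 used for assembly.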
  • Constructor Details

    • AssemblyRegion

      public AssemblyRegion(SimpleInterval activeSpan, boolean isActive, int padding, htsjdk.samtools.SAMFileHeader header)
      Create a new AssemblyRegion containing no reads
      Parameters:
      activeSpan - the span of this active region
      isActive - indicates whether this is an active region, or an inactive one
      padding - the active region padding to use for this active region
    • AssemblyRegion

      public AssemblyRegion(SimpleInterval activeSpan, SimpleInterval paddedSpan, boolean isActive, htsjdk.samtools.SAMFileHeader header)
      Create a new AssemblyRegion containing no reads
      Parameters:
      activeSpan - the span of this active region
      paddedSpan - the padded span of this active region
      isActive - indicates whether this is an active region, or an inactive one
    • AssemblyRegion

      public AssemblyRegion(SimpleInterval activeSpan, int padding, htsjdk.samtools.SAMFileHeader header)
      Convenience constructor that creates an active assembly region (isActive is true) without any profile state
  • Method Details

    • getAlignmentData

      public List<AlignmentAndReferenceContext> getAlignmentData()
      Method for obtaining the alignment data which is attached to the assembly region.
      Returns:
      The list of AlignmentAndReferenceContext objects attached to this AssemblyRegion.
    • addAllAlignmentData

      public void addAllAlignmentData(List<AlignmentAndReferenceContext> alignmentData)
      Method for adding alignment data to the collection of AlignmentAndReferenceContext objects attached to this AssemblyRegion.
    • getContig

      public String getContig()
      Specified by:
      getContig in interface htsjdk.samtools.util.Locatable
    • getStart

      public int getStart()
      Specified by:
      getStart in interface htsjdk.samtools.util.Locatable
    • getEnd

      public int getEnd()
      Specified by:
      getEnd in interface htsjdk.samtools.util.Locatable
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • isActive

      public boolean isActive()
      Does this region represent an active region (all isActiveProbs above threshold) or an inactive region (all isActiveProbs below threshold)?
    • getPaddedSpan

      public SimpleInterval getPaddedSpan()
      Get the span of this assembly region including the padding value
      Returns:
      a non-null SimpleInterval
    • getSpan

      public SimpleInterval getSpan()
      Get the raw span of this assembly region (excluding the padding)
      Returns:
      a non-null SimpleInterval
    • getReads

      public List<GATKRead> getReads()
      Get an unmodifiable copy of the list of reads currently in this assembly region. The reads are sorted by their coordinate position.
      Returns:
      an unmodifiable and immutable copy of the reads in the assembly region.
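
An "unmodifiable copy" accessor of this kind is commonly written as a defensive copy wrapped in an unmodifiable view. The sketch below shows that pattern under stated assumptions: `ReadListSketch` is a hypothetical class, and plain strings stand in for `GATKRead` objects. It does not claim to reproduce the GATK implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder of reads; strings stand in for GATKRead.
final class ReadListSketch {
    private final List<String> reads = new ArrayList<>();

    void add(String read) {
        reads.add(read);
    }

    // Copy first, then wrap: the caller receives a snapshot that it cannot
    // modify and that does not change when reads are added to the region later.
    List<String> getReads() {
        return Collections.unmodifiableList(new ArrayList<>(reads));
    }
}
```

The copy is what makes the result a snapshot. Wrapping the internal list directly would block writes by the caller but would still reflect later changes to the region.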
    • getHardClippedPileupReads

      public List<GATKRead> getHardClippedPileupReads()
      Get an unmodifiable copy of the list of hard-clipped pileup reads currently in this assembly region. The reads are sorted by their coordinate position.
      Returns:
      an unmodifiable and immutable copy of the reads in the assembly region.
    • getHeader

      public htsjdk.samtools.SAMFileHeader getHeader()
      Returns the header for the reads in this region.
    • trim

      public AssemblyRegion trim(SimpleInterval span, int padding)
      Trim this region to just the span, producing a new assembly region without any reads that covers only the intersection of span with the current extent
      Parameters:
      span - the new extent of the active region we want
      padding - the padding size we want for the newly trimmed active region
      Returns:
      a non-null, empty assembly region
    • trim

      public AssemblyRegion trim(SimpleInterval span, SimpleInterval paddedSpan)
      Trim this region to no more than the span, producing a new assembly region with properly trimmed reads that attempts to provide the best possible representation of this region covering the span. The challenge here is that (1) span may be larger than can be represented by this assembly region plus its original padding and (2) the padding must be symmetric on both sides. This algorithm therefore determines how best to represent span as a subset of the span of this region with a padding value that captures as much of the span as possible.
      For example, suppose this active region is
        Active: 100-200 with padding of 50, so that the true span is 50-250
        NewExtent: 150-225, saying that we'd ideally like to just have bases 150-225
      Here we represent the assembly region as a region from 150-200 with 25 bp of padding. The overall constraint is that the region can never exceed the original region, and the padding is chosen to maximize overlap with the desired region.
      Parameters:
      span - the new extent of the active region we want
      Returns:
      a non-null, empty active region
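
The worked example in the description (active 100-200, padding 50, desired 150-225) can be reproduced with a few lines of arithmetic. The sketch below is one possible reading of the symmetric-padding rule and not the GATK implementation: `TrimSketch` and its parameters are hypothetical, and it takes the original padding as an integer, whereas the documented method takes a `paddedSpan` interval.

```java
// Hypothetical helper reproducing the documented example with plain integers.
final class TrimSketch {
    // Returns {start, end, padding} of the trimmed region.
    static int[] trim(int activeStart, int activeEnd, int originalPadding,
                      int wantStart, int wantEnd) {
        // The trimmed region can never exceed the original active span.
        int start = Math.max(activeStart, wantStart);
        int end = Math.min(activeEnd, wantEnd);
        if (start > end) {
            throw new IllegalArgumentException("desired span does not overlap the active span");
        }
        // Symmetric padding needed to reach the desired span on its far side.
        int needed = Math.max(start - wantStart, wantEnd - end);
        // Symmetric padding available before leaving the original padded span.
        int paddedStart = activeStart - originalPadding;
        int paddedEnd = activeEnd + originalPadding;
        int available = Math.min(start - paddedStart, paddedEnd - end);
        return new int[] {start, end, Math.min(needed, available)};
    }
}
```

Calling `TrimSketch.trim(100, 200, 50, 150, 225)` gives the region 150-200 with 25 bp of padding, matching the example: the desired end at 225 lies beyond the active span, so it is covered by padding instead.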
    • add

      public void add(GATKRead read)
      Add read to this region. The read must have an alignment start >= that of the last read currently in this region.
      Parameters:
      read - a non-null GATKRead
      Throws:
      IllegalArgumentException - if read doesn't overlap the padded region of this active region
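
The two preconditions on `add` (overlap with the padded span, and non-decreasing alignment start) can be illustrated with a small stand-in. Everything in this sketch is hypothetical: `OrderedReadsSketch` is not a GATK class, reads are reduced to start and end coordinates, and throwing `IllegalArgumentException` for an out-of-order read is an assumption, since the documentation above only states that exception for the overlap check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical region that accepts reads given as inclusive {start, end} pairs.
final class OrderedReadsSketch {
    private final int paddedStart;
    private final int paddedEnd;
    private final List<int[]> reads = new ArrayList<>();

    OrderedReadsSketch(int paddedStart, int paddedEnd) {
        this.paddedStart = paddedStart;
        this.paddedEnd = paddedEnd;
    }

    void add(int readStart, int readEnd) {
        // The read must overlap the padded span by at least one base.
        boolean overlaps = readStart <= paddedEnd && readEnd >= paddedStart;
        if (!overlaps) {
            throw new IllegalArgumentException("read does not overlap the padded span");
        }
        // Reads must arrive in order of alignment start (ties are allowed).
        // The exception type for this case is an assumption of the sketch.
        if (!reads.isEmpty() && readStart < reads.get(reads.size() - 1)[0]) {
            throw new IllegalArgumentException("read starts before the last read added");
        }
        reads.add(new int[] {readStart, readEnd});
    }

    int size() {
        return reads.size();
    }
}
```

Enforcing the order at insertion time keeps the read list sorted by coordinate without a separate sort.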
    • size

      public int size()
      Get the number of reads currently in this region
      Returns:
      an integer >= 0
    • clearReads

      public void clearReads()
      Clear all of the reads currently in this region
    • removeAll

      public void removeAll(Collection<GATKRead> readsToRemove)
      Remove all of the reads in readsToRemove from this region
      Parameters:
      readsToRemove - the set of reads we want to remove
    • addAll

      public void addAll(Collection<GATKRead> readsToAdd)
      Add all readsToAdd to this region
      Parameters:
      readsToAdd - a collection of reads to add to this active region
    • addHardClippedPileupReads

      public void addHardClippedPileupReads(Collection<GATKRead> readsToAdd)
    • getAssemblyRegionReference

      public byte[] getAssemblyRegionReference(htsjdk.samtools.reference.ReferenceSequenceFile referenceReader)
    • getAssemblyRegionReference

      public byte[] getAssemblyRegionReference(htsjdk.samtools.reference.ReferenceSequenceFile referenceReader, int padding)
      Get the reference bases from referenceReader spanned by the padded span of this active region, including additional padding bp on either side. If this expanded region would exceed the boundaries of the active region's contig, the returned result will be truncated to only include on-genome reference bases.
      Parameters:
      referenceReader - the source of the reference genome bases
      padding - the padding, in bp, we want to add to either side of this active region's padded span
      Returns:
      a non-null array of bytes holding the reference bases in referenceReader
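
The truncation to on-genome bases described above amounts to clamping the expanded window to the contig before slicing. The sketch below illustrates that with an in-memory contig. `ReferenceWindowSketch` is hypothetical, a `byte[]` stands in for the `ReferenceSequenceFile`, and 1-based inclusive coordinates are assumed.

```java
import java.util.Arrays;

// Hypothetical helper: the whole contig is held in memory as a byte array.
final class ReferenceWindowSketch {
    // Coordinates are assumed 1-based and inclusive, so position 1 is contigBases[0].
    static byte[] referenceBases(byte[] contigBases, int paddedStart, int paddedEnd, int padding) {
        // Expand by `padding` on both sides, then clamp to the contig boundaries.
        int start = Math.max(1, paddedStart - padding);
        int end = Math.min(contigBases.length, paddedEnd + padding);
        // copyOfRange takes a 0-based, end-exclusive range.
        return Arrays.copyOfRange(contigBases, start - 1, end);
    }
}
```

On a 10-base contig `ACGTACGTAC` with a padded span of 3-8, a padding of 1 returns bases 2-9, while a padding of 5 would run off both ends and is truncated to the full contig.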
    • setFinalized

      public void setFinalized(boolean value)
    • isFinalized

      public boolean isFinalized()