Class SVReferenceUtils

java.lang.Object
org.broadinstitute.hellbender.tools.spark.sv.utils.SVReferenceUtils

public final class SVReferenceUtils extends Object
  • Constructor Details

    • SVReferenceUtils

      public SVReferenceUtils()
  • Method Details

    • getReferenceBasesRDD

      public static org.apache.spark.api.java.JavaRDD<byte[]> getReferenceBasesRDD(org.apache.spark.api.java.JavaSparkContext ctx, int kSize, ReferenceMultiSparkSource ref, htsjdk.samtools.SAMSequenceDictionary dict, int refRecordLen, int refRecordsPerPartition)
      Create an RDD from the reference sequences. The reference sequences are transformed into a single, large collection of byte arrays, which is then parallelized into an RDD. Each contig that exceeds refRecordLen bases is broken into a series of chunks of at most refRecordLen bases, with a kSize - 1 base overlap between successive chunks (i.e., for kSize = 63, the last 62 bases of chunk n match the first 62 bases of chunk n+1). The overlap ensures that no kmers are lost to the chunking, so each record can be kmerized independently.
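      The chunking scheme can be illustrated with a small sketch. The helper below (hypothetical; `chunkContig` and its arguments are not part of the actual API) splits a contig's bases into chunks of at most refRecordLen bases, advancing by refRecordLen - (kSize - 1) bases each step so that successive chunks share a kSize - 1 base overlap:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkDemo {
    // Hypothetical helper illustrating the chunking scheme described above:
    // split a contig into chunks of at most refRecordLen bases, with a
    // kSize - 1 base overlap between successive chunks so that every kmer
    // of length kSize appears whole in at least one chunk.
    static List<byte[]> chunkContig(final byte[] bases, final int refRecordLen, final int kSize) {
        final List<byte[]> chunks = new ArrayList<>();
        final int stride = refRecordLen - (kSize - 1); // bases advanced per chunk
        for (int start = 0; start < bases.length; start += stride) {
            final int end = Math.min(start + refRecordLen, bases.length);
            chunks.add(Arrays.copyOfRange(bases, start, end));
            if (end == bases.length) break; // last chunk reaches the contig end
        }
        return chunks;
    }

    public static void main(String[] args) {
        // A 200-base placeholder contig (not real reference data).
        final byte[] contig = new byte[200];
        for (int i = 0; i < contig.length; i++) contig[i] = (byte) ('A' + (i % 4));

        // With refRecordLen = 100 and kSize = 63, the stride is 38 bases.
        final int kSize = 63;
        final List<byte[]> chunks = chunkContig(contig, 100, kSize);

        // Verify the documented invariant: the last kSize - 1 bases of chunk n
        // match the first kSize - 1 bases of chunk n+1.
        boolean overlapsMatch = true;
        for (int i = 1; i < chunks.size(); i++) {
            final byte[] prev = chunks.get(i - 1);
            final byte[] tail = Arrays.copyOfRange(prev, prev.length - (kSize - 1), prev.length);
            final byte[] head = Arrays.copyOfRange(chunks.get(i), 0, kSize - 1);
            overlapsMatch &= Arrays.equals(tail, head);
        }
        System.out.println(chunks.size() + " " + overlapsMatch);
    }
}
```

      Because each chunk fully contains every kmer that starts within its first refRecordLen - (kSize - 1) bases, kmerizing the records independently recovers exactly the kmers of the original contig.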