Class SVReferenceUtils
java.lang.Object
org.broadinstitute.hellbender.tools.spark.sv.utils.SVReferenceUtils
Constructor Summary
Constructors
SVReferenceUtils()
Method Summary
Modifier and Type: static org.apache.spark.api.java.JavaRDD<byte[]>
Method: getReferenceBasesRDD(org.apache.spark.api.java.JavaSparkContext ctx, int kSize, ReferenceMultiSparkSource ref, htsjdk.samtools.SAMSequenceDictionary dict, int refRecordLen, int refRecordsPerPartition)
Description: Create an RDD from the reference sequences.
Constructor Details
SVReferenceUtils
public SVReferenceUtils()
Method Details
getReferenceBasesRDD
public static org.apache.spark.api.java.JavaRDD<byte[]> getReferenceBasesRDD(org.apache.spark.api.java.JavaSparkContext ctx, int kSize, ReferenceMultiSparkSource ref, htsjdk.samtools.SAMSequenceDictionary dict, int refRecordLen, int refRecordsPerPartition)

Create an RDD from the reference sequences. The reference sequences are transformed into a single, large collection of byte arrays, and that collection is then parallelized into an RDD. Each contig that exceeds the size given by refRecordLen is broken into a series of refRecordLen chunks with a kSize - 1 base overlap between successive chunks (i.e., for kSize = 63, the last 62 bases in chunk n match the first 62 bases in chunk n+1), so that we don't miss any kmers due to the chunking: we can simply kmerize each record independently.
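To make the overlap rule concrete, here is a minimal standalone sketch of the chunking scheme described above. It is not the GATK implementation; the names ChunkingSketch and chunkContig are hypothetical, and it assumes refRecordLen > kSize - 1.

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not GATK code): split one contig's bases into chunks
// of at most refRecordLen bytes, where each chunk after the first starts
// kSize - 1 bases before the previous chunk ended, so every kmer of length
// kSize lies wholly inside at least one chunk.
public class ChunkingSketch {

    static List<byte[]> chunkContig(final byte[] contigBases, final int refRecordLen, final int kSize) {
        final List<byte[]> chunks = new ArrayList<>();
        final int step = refRecordLen - (kSize - 1); // new bases contributed by each successive chunk
        for (int start = 0; start < contigBases.length; start += step) {
            final int end = Math.min(start + refRecordLen, contigBases.length);
            final byte[] chunk = new byte[end - start];
            System.arraycopy(contigBases, start, chunk, 0, end - start);
            chunks.add(chunk);
            if (end == contigBases.length) break; // last chunk reached the contig end
        }
        return chunks;
    }

    public static void main(final String[] args) {
        // Toy numbers: a 10-base contig, refRecordLen = 6, kSize = 3.
        // Prints "ACGTAC" (bases 0-5) and "ACGTAC" (bases 4-9): the last
        // kSize - 1 = 2 bases of chunk 0 cover the same reference positions
        // as the first 2 bases of chunk 1.
        for (final byte[] chunk : chunkContig("ACGTACGTAC".getBytes(), 6, 3)) {
            System.out.println(new String(chunk));
        }
    }
}

In getReferenceBasesRDD itself, the chunk records for all contigs are gathered into one collection and handed to Spark for parallelization; judging by its name, refRecordsPerPartition presumably controls how many of these records land in each partition of the resulting RDD.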