Class SVReferenceUtils

java.lang.Object
org.broadinstitute.hellbender.tools.spark.sv.utils.SVReferenceUtils

public final class SVReferenceUtils extends Object
  • Constructor Details

    • SVReferenceUtils

      public SVReferenceUtils()
  • Method Details

    • getReferenceBasesRDD

      public static org.apache.spark.api.java.JavaRDD<byte[]> getReferenceBasesRDD(org.apache.spark.api.java.JavaSparkContext ctx, int kSize, ReferenceMultiSparkSource ref, htsjdk.samtools.SAMSequenceDictionary dict, int refRecordLen, int refRecordsPerPartition)
      Create an RDD from the reference sequences. The reference sequences are transformed into a single, large collection of byte arrays, which is then parallelized into an RDD. Each contig that exceeds refRecordLen bases is broken into a series of chunks of at most refRecordLen bases, with a kSize - 1 base overlap between successive chunks (i.e., for kSize = 63, the last 62 bases of chunk n match the first 62 bases of chunk n+1). The overlap ensures that no kmers are lost to the chunking, so each record can be kmerized independently.
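      The chunking scheme can be illustrated with a small sketch. The helper below (hypothetical; `chunkContig` and its arguments are not part of the actual API) splits a contig's bases into chunks of at most refRecordLen bases, advancing by refRecordLen - (kSize - 1) bases each step so that successive chunks share a kSize - 1 base overlap:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkDemo {
    // Hypothetical helper illustrating the chunking scheme described above:
    // split a contig into chunks of at most refRecordLen bases, with a
    // kSize - 1 base overlap between successive chunks so that every kmer
    // of length kSize appears whole in at least one chunk.
    static List<byte[]> chunkContig(final byte[] bases, final int refRecordLen, final int kSize) {
        final List<byte[]> chunks = new ArrayList<>();
        final int stride = refRecordLen - (kSize - 1); // bases advanced per chunk
        for (int start = 0; start < bases.length; start += stride) {
            final int end = Math.min(start + refRecordLen, bases.length);
            chunks.add(Arrays.copyOfRange(bases, start, end));
            if (end == bases.length) break; // last chunk reaches the contig end
        }
        return chunks;
    }

    public static void main(String[] args) {
        // A 200-base placeholder contig (not real reference data).
        final byte[] contig = new byte[200];
        for (int i = 0; i < contig.length; i++) contig[i] = (byte) ('A' + (i % 4));

        // With refRecordLen = 100 and kSize = 63, the stride is 38 bases.
        final int kSize = 63;
        final List<byte[]> chunks = chunkContig(contig, 100, kSize);

        // Verify the documented invariant: the last kSize - 1 bases of chunk n
        // match the first kSize - 1 bases of chunk n+1.
        boolean overlapsMatch = true;
        for (int i = 1; i < chunks.size(); i++) {
            final byte[] prev = chunks.get(i - 1);
            final byte[] tail = Arrays.copyOfRange(prev, prev.length - (kSize - 1), prev.length);
            final byte[] head = Arrays.copyOfRange(chunks.get(i), 0, kSize - 1);
            overlapsMatch &= Arrays.equals(tail, head);
        }
        System.out.println(chunks.size() + " " + overlapsMatch);
    }
}
```

      Because each chunk fully contains every kmer that starts within its first refRecordLen - (kSize - 1) bases, kmerizing the records independently recovers exactly the kmers of the original contig.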