public class FindAssemblyRegionsSpark
extends java.lang.Object
Constructor and Description |
---|
FindAssemblyRegionsSpark() |
Modifier and Type | Method and Description |
---|---|
static org.apache.spark.api.java.JavaRDD<AssemblyRegionWalkerContext> | getAssemblyRegionsFast(org.apache.spark.api.java.JavaSparkContext ctx, org.apache.spark.api.java.JavaRDD<GATKRead> reads, htsjdk.samtools.SAMFileHeader header, htsjdk.samtools.SAMSequenceDictionary sequenceDictionary, java.lang.String referenceFileName, FeatureManager features, java.util.List<ShardBoundary> intervalShards, org.apache.spark.broadcast.Broadcast<java.util.function.Supplier<AssemblyRegionEvaluator>> assemblyRegionEvaluatorSupplierBroadcast, AssemblyRegionReadShardArgumentCollection shardingArgs, AssemblyRegionArgumentCollection assemblyRegionArgs, boolean includeReadsWithDeletionsInIsActivePileups, boolean shuffle) Get an RDD of assembly regions for the given reads and intervals using the fast algorithm (looks for assembly regions in each read shard in parallel). |
static org.apache.spark.api.java.JavaRDD<AssemblyRegionWalkerContext> | getAssemblyRegionsStrict(org.apache.spark.api.java.JavaSparkContext ctx, org.apache.spark.api.java.JavaRDD<GATKRead> reads, htsjdk.samtools.SAMFileHeader header, htsjdk.samtools.SAMSequenceDictionary sequenceDictionary, java.lang.String referenceFileName, FeatureManager features, java.util.List<ShardBoundary> intervalShards, org.apache.spark.broadcast.Broadcast<java.util.function.Supplier<AssemblyRegionEvaluator>> assemblyRegionEvaluatorSupplierBroadcast, AssemblyRegionReadShardArgumentCollection shardingArgs, AssemblyRegionArgumentCollection assemblyRegionArgs, boolean includeReadsWithDeletionsInIsActivePileups, boolean shuffle) Get an RDD of assembly regions for the given reads and intervals using the strict algorithm (looks for assembly regions in each contig in parallel). |
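
The two methods in the summary above differ in their unit of parallelism: read shards for the fast algorithm, contigs for the strict one. The following is a toy sketch of that difference only, in plain Java with no Spark or GATK dependency. `Locus`, `activeRuns`, `perShard`, `perContig`, and the fixed shard size are all hypothetical names and simplifications introduced here; the sketch does not reproduce GATK's activity profile, padding, or region sizing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class RegionGroupingSketch {
    // Hypothetical stand-in for a locus with an activity flag; not a GATK type.
    static final class Locus {
        final String contig;
        final int pos;
        final boolean active;
        Locus(String contig, int pos, boolean active) {
            this.contig = contig;
            this.pos = pos;
            this.active = active;
        }
    }

    // Collect maximal runs of adjacent active loci as "contig:start-end" strings.
    static List<String> activeRuns(List<Locus> loci) {
        List<String> runs = new ArrayList<>();
        int start = -1;   // start of the current run, or -1 if no run is open
        int prev = -1;    // position of the previously visited locus
        String contig = null;
        for (Locus l : loci) {
            boolean extendsRun = start != -1 && l.active
                    && l.contig.equals(contig) && l.pos == prev + 1;
            if (!extendsRun) {
                if (start != -1) runs.add(contig + ":" + start + "-" + prev);
                start = l.active ? l.pos : -1;
                contig = l.contig;
            }
            prev = l.pos;
        }
        if (start != -1) runs.add(contig + ":" + start + "-" + prev);
        return runs;
    }

    // Group loci by key (first-seen order), then scan each group independently, in parallel.
    static List<String> runsPerGroup(List<Locus> loci, Function<Locus, String> key) {
        Map<String, List<Locus>> groups = loci.stream()
                .collect(Collectors.groupingBy(key, LinkedHashMap::new, Collectors.toList()));
        return new ArrayList<>(groups.values()).parallelStream()
                .flatMap(group -> activeRuns(group).stream())
                .collect(Collectors.toList());
    }

    // Toy analogue of shard-level parallelism: one group per fixed-size shard.
    static List<String> perShard(List<Locus> loci, int shardSize) {
        return runsPerGroup(loci, l -> l.contig + "#" + (l.pos / shardSize));
    }

    // Toy analogue of contig-level parallelism: one group per contig.
    static List<String> perContig(List<Locus> loci) {
        return runsPerGroup(loci, l -> l.contig);
    }

    // Ten loci on one contig, active at positions 3 through 6.
    static List<Locus> toyLoci() {
        List<Locus> loci = new ArrayList<>();
        for (int pos = 0; pos < 10; pos++) {
            loci.add(new Locus("chr1", pos, pos >= 3 && pos <= 6));
        }
        return loci;
    }

    public static void main(String[] args) {
        System.out.println(perShard(toyLoci(), 5));
        System.out.println(perContig(toyLoci()));
    }
}
```

In this toy, a run of active loci that crosses a shard boundary is reported as two pieces by `perShard` and as one piece by `perContig`. Whether and how the real methods differ in their output depends on the sharding and assembly region arguments, which this sketch does not model.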
public static org.apache.spark.api.java.JavaRDD<AssemblyRegionWalkerContext> getAssemblyRegionsFast(org.apache.spark.api.java.JavaSparkContext ctx, org.apache.spark.api.java.JavaRDD<GATKRead> reads, htsjdk.samtools.SAMFileHeader header, htsjdk.samtools.SAMSequenceDictionary sequenceDictionary, java.lang.String referenceFileName, FeatureManager features, java.util.List<ShardBoundary> intervalShards, org.apache.spark.broadcast.Broadcast<java.util.function.Supplier<AssemblyRegionEvaluator>> assemblyRegionEvaluatorSupplierBroadcast, AssemblyRegionReadShardArgumentCollection shardingArgs, AssemblyRegionArgumentCollection assemblyRegionArgs, boolean includeReadsWithDeletionsInIsActivePileups, boolean shuffle)
ctx - the Spark context
reads - the coordinate-sorted reads
header - the header for the reads
sequenceDictionary - the sequence dictionary for the reads
referenceFileName - the file name for the reference
features - source of arbitrary features (may be null)
intervalShards - the sharded intervals to find assembly regions for
assemblyRegionEvaluatorSupplierBroadcast - evaluator used to determine whether a locus is active
shardingArgs - the arguments for sharding reads
assemblyRegionArgs - the arguments for finding assembly regions
includeReadsWithDeletionsInIsActivePileups - include reads with deletion at loci
shuffle - whether to use a shuffle or not when sharding reads

public static org.apache.spark.api.java.JavaRDD<AssemblyRegionWalkerContext> getAssemblyRegionsStrict(org.apache.spark.api.java.JavaSparkContext ctx, org.apache.spark.api.java.JavaRDD<GATKRead> reads, htsjdk.samtools.SAMFileHeader header, htsjdk.samtools.SAMSequenceDictionary sequenceDictionary, java.lang.String referenceFileName, FeatureManager features, java.util.List<ShardBoundary> intervalShards, org.apache.spark.broadcast.Broadcast<java.util.function.Supplier<AssemblyRegionEvaluator>> assemblyRegionEvaluatorSupplierBroadcast, AssemblyRegionReadShardArgumentCollection shardingArgs, AssemblyRegionArgumentCollection assemblyRegionArgs, boolean includeReadsWithDeletionsInIsActivePileups, boolean shuffle)
ctx - the Spark context
reads - the coordinate-sorted reads
header - the header for the reads
sequenceDictionary - the sequence dictionary for the reads
referenceFileName - the file name for the reference
features - source of arbitrary features (may be null)
intervalShards - the sharded intervals to find assembly regions for
assemblyRegionEvaluatorSupplierBroadcast - evaluator used to determine whether a locus is active
shardingArgs - the arguments for sharding reads
assemblyRegionArgs - the arguments for finding assembly regions
includeReadsWithDeletionsInIsActivePileups - include reads with deletion at loci
shuffle - whether to use a shuffle or not when sharding reads
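
Both methods take the evaluator as a `Broadcast<Supplier<AssemblyRegionEvaluator>>` rather than as an evaluator instance. A plausible reading, stated here as an assumption rather than documented behavior, is that each worker obtains its own evaluator from the supplier instead of sharing one object. The sketch below shows that supplier pattern in plain Java; `ToyEvaluator`, `thresholdSupplier`, and `evaluatePartition` are hypothetical stand-ins, not GATK or Spark types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

class EvaluatorSupplierSketch {
    // Hypothetical evaluator with per-instance mutable state; not the GATK interface.
    static final class ToyEvaluator {
        private final int minDepth;
        private int lociSeen = 0;

        ToyEvaluator(int minDepth) {
            this.minDepth = minDepth;
        }

        // A locus counts as active in this toy when its depth reaches the threshold.
        boolean isActive(int depth) {
            lociSeen++;
            return depth >= minDepth;
        }

        int getLociSeen() {
            return lociSeen;
        }
    }

    // The supplier builds a new evaluator on every call, so callers never share state.
    static Supplier<ToyEvaluator> thresholdSupplier(int minDepth) {
        return () -> new ToyEvaluator(minDepth);
    }

    // Stand-in for per-partition work: fetch one evaluator, then apply it to each locus.
    static List<Boolean> evaluatePartition(Supplier<ToyEvaluator> supplier, List<Integer> depths) {
        ToyEvaluator evaluator = supplier.get();
        List<Boolean> flags = new ArrayList<>();
        for (int depth : depths) {
            flags.add(evaluator.isActive(depth));
        }
        return flags;
    }

    public static void main(String[] args) {
        Supplier<ToyEvaluator> supplier = thresholdSupplier(10);
        System.out.println(evaluatePartition(supplier, Arrays.asList(4, 12, 10)));
        System.out.println(evaluatePartition(supplier, Arrays.asList(30, 2)));
    }
}
```

In a real Spark job the supplier would also need to be serializable so that it can be broadcast; that requirement is not modeled here.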