public abstract class LocusWalkerSpark extends GATKSparkTool

A Spark version of LocusWalker. Subclasses should implement processAlignments(JavaRDD, JavaSparkContext) and operate on the passed-in RDD.

Nested classes/interfaces inherited from class GATKSparkTool: GATKSparkTool.ReadInputMergingPolicy
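To make the subclassing contract concrete, here is a minimal sketch of a tool built on this class. ExampleLocusWalkerSpark is a hypothetical name and the body only counts covered loci; it assumes LocusWalkerContext exposes getAlignmentContext() as the single-machine LocusWalker does, and uses the inherited logger field listed below.

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.broadinstitute.hellbender.engine.spark.LocusWalkerContext;
import org.broadinstitute.hellbender.engine.spark.LocusWalkerSpark;

public final class ExampleLocusWalkerSpark extends LocusWalkerSpark {
    private static final long serialVersionUID = 1L;

    @Override
    protected void processAlignments(JavaRDD<LocusWalkerContext> rdd, JavaSparkContext ctx) {
        // Count loci whose pileup contains at least one read; the RDD handed
        // in by runTool(ctx) is already bounded by -L intervals, if any.
        long covered = rdd.filter(locus -> !locus.getAlignmentContext().getBasePileup().isEmpty())
                          .count();
        logger.info("Loci with coverage: " + covered);
    }
}
```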
Modifier and Type | Field and Description |
---|---|
protected int | maxDepthPerSample |
int | readShardSize |
boolean | shuffle |
Fields inherited from class GATKSparkTool: addOutputVCFCommandLine, BAM_PARTITION_SIZE_LONG_NAME, bamPartitionSplitSize, CREATE_OUTPUT_BAM_SPLITTING_INDEX_LONG_NAME, createOutputBamIndex, createOutputBamSplittingIndex, createOutputVariantIndex, features, intervalArgumentCollection, NUM_REDUCERS_LONG_NAME, numReducers, OUTPUT_SHARD_DIR_LONG_NAME, readArguments, referenceArguments, sequenceDictionaryValidationArguments, SHARDED_OUTPUT_LONG_NAME, shardedOutput, shardedPartsDir, USE_NIO, useNio
Fields inherited from class SparkCommandLineProgram: programName, SPARK_PROGRAM_NAME_LONG_NAME, sparkArgs
Fields inherited from class CommandLineProgram: GATK_CONFIG_FILE, logger, NIO_MAX_REOPENS, NIO_PROJECT_FOR_REQUESTER_PAYS, QUIET, specialArgumentsCollection, tmpDir, useJdkDeflater, useJdkInflater, VERBOSITY
Constructor and Description |
---|
LocusWalkerSpark() |
Modifier and Type | Method and Description |
---|---|
protected int | defaultMaxDepthPerSample() Returns default value for the maxDepthPerSample parameter, if none is provided on the command line. |
boolean | emitEmptyLoci() Does this tool emit information for uncovered loci? Tools that do should override to return true. |
org.apache.spark.api.java.JavaRDD<LocusWalkerContext> | getAlignments(org.apache.spark.api.java.JavaSparkContext ctx) Loads alignments and the corresponding reference and features into a JavaRDD for the intervals specified. |
protected LIBSDownsamplingInfo | getDownsamplingInfo() Returns the downsampling info using maxDepthPerSample as target coverage. |
protected abstract void | processAlignments(org.apache.spark.api.java.JavaRDD<LocusWalkerContext> rdd, org.apache.spark.api.java.JavaSparkContext ctx) Process the alignments and write output. |
boolean | requiresReads() Does this tool require reads? Tools that do should override to return true. |
protected void | runTool(org.apache.spark.api.java.JavaSparkContext ctx) Runs the tool itself after initializing and validating inputs. |
Methods inherited from class GATKSparkTool: addReferenceFilesForSpark, addVCFsForSpark, editIntervals, getBestAvailableSequenceDictionary, getDefaultReadFilters, getDefaultToolVCFHeaderLines, getDefaultVariantAnnotationGroups, getDefaultVariantAnnotations, getGatkReadJavaRDD, getHeaderForReads, getIntervals, getPluginDescriptors, getReadInputMergingPolicy, getReads, getReadSourceHeaderMap, getReadSourceName, getRecommendedNumReducers, getReference, getReferenceSequenceDictionary, getReferenceWindowFunction, getSequenceDictionaryValidationArgumentCollection, getTargetPartitionSize, getUnfilteredReads, hasReads, hasReference, hasUserSuppliedIntervals, makeReadFilter, makeReadFilter, makeVariantAnnotations, requiresIntervals, requiresReference, runPipeline, useVariantAnnotations, validateSequenceDictionaries, writeReads, writeReads
Methods inherited from class SparkCommandLineProgram: afterPipeline, doWork, getProgramName
Methods inherited from class CommandLineProgram: customCommandLineValidation, getCommandLine, getCommandLineParser, getDefaultHeaders, getMetricsFile, getSupportInformation, getToolkitName, getToolkitShortName, getToolStatusWarning, getUsage, getVersion, instanceMain, instanceMainPostParseArgs, isBetaFeature, isExperimentalFeature, onShutdown, onStartup, parseArgs, printLibraryVersions, printSettings, printStartupMessage, runTool, setDefaultHeaders, warnOnToolStatus
@Argument(fullName="max-depth-per-sample", shortName="max-depth-per-sample", doc="Maximum number of reads to retain per sample per locus. Reads above this threshold will be downsampled. Set to 0 to disable.", optional=true) protected int maxDepthPerSample
@Argument(fullName="read-shard-size", shortName="read-shard-size", doc="Maximum size of each read shard, in bases.", optional=true) public int readShardSize
@Argument(doc="whether to use the shuffle implementation or overlaps partitioning (the default)", shortName="shuffle", fullName="shuffle", optional=true) public boolean shuffle
protected int defaultMaxDepthPerSample()
Returns default value for the maxDepthPerSample parameter, if none is provided on the command line. Default implementation returns 0 (no downsampling by default).
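For example, a subclass that wants downsampling on by default could override this method; a minimal sketch, with 1000 as an arbitrary illustrative cap:

```java
// Hypothetical override: retain at most 1000 reads per sample per locus
// unless the user sets --max-depth-per-sample on the command line.
@Override
protected int defaultMaxDepthPerSample() {
    return 1000;
}
```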
public boolean requiresReads()
Description copied from class: GATKSparkTool
Does this tool require reads? Tools that do should override to return true.
Overrides: requiresReads in class GATKSparkTool
protected final LIBSDownsamplingInfo getDownsamplingInfo()
Returns the downsampling info using maxDepthPerSample as target coverage.
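The method body is not shown in this reference, but the mapping from the field to the returned object is presumably along these lines; this is a sketch, not the actual GATK source, and it assumes a LIBSDownsamplingInfo(boolean, int) constructor:

```java
protected final LIBSDownsamplingInfo getDownsamplingInfo() {
    // Sketch: a positive maxDepthPerSample enables per-sample downsampling
    // to that target coverage; the default of 0 leaves pileups untouched.
    return new LIBSDownsamplingInfo(maxDepthPerSample > 0, maxDepthPerSample);
}
```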
public boolean emitEmptyLoci()
Does this tool emit information for uncovered loci? Tools that do should override to return true.
NOTE: Typically, this should only be used when intervals are specified.
NOTE: If MappedReadFilter is removed, then emitting empty loci will fail.
NOTE: If there is no available sequence dictionary and this is set to true, there should be a failure. Please consider requiring reads and/or references for all tools that wish to set this to true.
Returns: true if this tool requires uncovered loci information to be emitted, false otherwise
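A subclass that must report uncovered loci would override this as below; a sketch that, per the notes above, assumes intervals are supplied and the mapped-read filter is kept:

```java
// Hypothetical override: emit a LocusWalkerContext for every locus in the
// requested intervals, including loci with an empty pileup.
@Override
public boolean emitEmptyLoci() {
    return true;
}
```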
public org.apache.spark.api.java.JavaRDD<LocusWalkerContext> getAlignments(org.apache.spark.api.java.JavaSparkContext ctx)
Loads alignments and the corresponding reference and features into a JavaRDD for the intervals specified. If no intervals were specified, returns all the alignments.
Returns: all alignments as a JavaRDD, bounded by intervals if specified.
protected void runTool(org.apache.spark.api.java.JavaSparkContext ctx)
Description copied from class: GATKSparkTool
Runs the tool itself after initializing and validating inputs.
Specified by: runTool in class GATKSparkTool
Parameters: ctx - our Spark context
protected abstract void processAlignments(org.apache.spark.api.java.JavaRDD<LocusWalkerContext> rdd, org.apache.spark.api.java.JavaSparkContext ctx)
Process the alignments and write output.
Parameters:
rdd - a distributed collection of LocusWalkerContext
ctx - our Spark context
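Finally, a hedged sketch of what an implementation might do with the RDD: write per-locus depth as text. The outputPath field is a hypothetical @Argument, and the AlignmentContext accessors (getContig(), getStart(), getBasePileup()) are assumed from the single-machine LocusWalker API.

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.broadinstitute.hellbender.engine.AlignmentContext;
import org.broadinstitute.hellbender.engine.spark.LocusWalkerContext;

// ...inside a hypothetical LocusWalkerSpark subclass...
@Override
protected void processAlignments(JavaRDD<LocusWalkerContext> rdd, JavaSparkContext ctx) {
    // One "contig:position<TAB>depth" record per locus; Spark writes the
    // result as sharded text files under outputPath.
    rdd.map(locus -> {
        AlignmentContext ac = locus.getAlignmentContext();
        return ac.getContig() + ":" + ac.getStart() + "\t" + ac.getBasePileup().size();
    }).saveAsTextFile(outputPath); // outputPath: hypothetical @Argument String field
}
```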