public abstract class ReadWalkerSpark extends GATKSparkTool
A Spark version of ReadWalker. Subclasses should implement processReads(JavaRDD, JavaSparkContext) and operate on the passed-in RDD.

Nested classes/interfaces inherited from class GATKSparkTool:
GATKSparkTool.ReadInputMergingPolicy
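For orientation, here is a minimal sketch of what a concrete subclass looks like. The tool name and the counting logic are invented for illustration; the import paths follow the GATK source layout but may need adjusting, and a real tool would also carry the Barclay @CommandLineProgramProperties annotation and a program group.

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.broadinstitute.hellbender.engine.spark.ReadWalkerContext;
import org.broadinstitute.hellbender.engine.spark.ReadWalkerSpark;

// Hypothetical example tool: counts the reads it walks over.
public final class CountReadsExampleSpark extends ReadWalkerSpark {
    private static final long serialVersionUID = 1L;

    @Override
    protected void processReads(final JavaRDD<ReadWalkerContext> rdd, final JavaSparkContext ctx) {
        // Each ReadWalkerContext pairs one read with its reference and feature
        // context; this sketch only counts the elements of the RDD.
        final long nReads = rdd.count();
        // logger is inherited from CommandLineProgram (see the inherited-fields list below)
        logger.info("Walked over " + nReads + " reads.");
    }
}
```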
Modifier and Type | Field and Description |
---|---|
static int | FEATURE_CACHE_LOOKAHEAD: This number controls the size of the cache for our FeatureInputs (specifically, the number of additional bases worth of overlapping records to cache when querying feature sources). |
Fields inherited from class GATKSparkTool:
addOutputVCFCommandLine, BAM_PARTITION_SIZE_LONG_NAME, bamPartitionSplitSize, CREATE_OUTPUT_BAM_SPLITTING_INDEX_LONG_NAME, createOutputBamIndex, createOutputBamSplittingIndex, createOutputVariantIndex, features, intervalArgumentCollection, NUM_REDUCERS_LONG_NAME, numReducers, OUTPUT_SHARD_DIR_LONG_NAME, readArguments, referenceArguments, sequenceDictionaryValidationArguments, SHARDED_OUTPUT_LONG_NAME, shardedOutput, shardedPartsDir, USE_NIO, useNio

Fields inherited from class SparkCommandLineProgram:
programName, SPARK_PROGRAM_NAME_LONG_NAME, sparkArgs

Fields inherited from class CommandLineProgram:
GATK_CONFIG_FILE, logger, NIO_MAX_REOPENS, NIO_PROJECT_FOR_REQUESTER_PAYS, QUIET, specialArgumentsCollection, tmpDir, useJdkDeflater, useJdkInflater, VERBOSITY
Constructor and Description |
---|
ReadWalkerSpark() |
Modifier and Type | Method and Description |
---|---|
org.apache.spark.api.java.JavaRDD<ReadWalkerContext> | getReads(org.apache.spark.api.java.JavaSparkContext ctx): Loads reads and the corresponding reference and features into a JavaRDD for the intervals specified. |
protected abstract void | processReads(org.apache.spark.api.java.JavaRDD<ReadWalkerContext> rdd, org.apache.spark.api.java.JavaSparkContext ctx): Process the reads and write output. |
boolean | requiresReads(): Does this tool require reads? Tools that do should override to return true. |
protected void | runTool(org.apache.spark.api.java.JavaSparkContext ctx): Runs the tool itself after initializing and validating inputs. |
Methods inherited from class GATKSparkTool:
addReferenceFilesForSpark, addVCFsForSpark, editIntervals, getBestAvailableSequenceDictionary, getDefaultReadFilters, getDefaultToolVCFHeaderLines, getDefaultVariantAnnotationGroups, getDefaultVariantAnnotations, getGatkReadJavaRDD, getHeaderForReads, getIntervals, getPluginDescriptors, getReadInputMergingPolicy, getReads, getReadSourceHeaderMap, getReadSourceName, getRecommendedNumReducers, getReference, getReferenceSequenceDictionary, getReferenceWindowFunction, getSequenceDictionaryValidationArgumentCollection, getTargetPartitionSize, getUnfilteredReads, hasReads, hasReference, hasUserSuppliedIntervals, makeReadFilter, makeReadFilter, makeVariantAnnotations, requiresIntervals, requiresReference, runPipeline, useVariantAnnotations, validateSequenceDictionaries, writeReads, writeReads

Methods inherited from class SparkCommandLineProgram:
afterPipeline, doWork, getProgramName

Methods inherited from class CommandLineProgram:
customCommandLineValidation, getCommandLine, getCommandLineParser, getDefaultHeaders, getMetricsFile, getSupportInformation, getToolkitName, getToolkitShortName, getToolStatusWarning, getUsage, getVersion, instanceMain, instanceMainPostParseArgs, isBetaFeature, isExperimentalFeature, onShutdown, onStartup, parseArgs, printLibraryVersions, printSettings, printStartupMessage, runTool, setDefaultHeaders, warnOnToolStatus
public static final int FEATURE_CACHE_LOOKAHEAD
public boolean requiresReads()

Description copied from class: GATKSparkTool
Does this tool require reads? Tools that do should override to return true.

Overrides:
requiresReads in class GATKSparkTool
public org.apache.spark.api.java.JavaRDD<ReadWalkerContext> getReads(org.apache.spark.api.java.JavaSparkContext ctx)

Loads reads and the corresponding reference and features into a JavaRDD for the intervals specified. If no intervals were specified, returns all the reads.

Returns:
a JavaRDD, bounded by intervals if specified.

protected void runTool(org.apache.spark.api.java.JavaSparkContext ctx)
Description copied from class: GATKSparkTool
Runs the tool itself after initializing and validating inputs.

Overrides:
runTool in class GATKSparkTool

Parameters:
ctx - our Spark context
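Reading the two methods together, runTool presumably just wires getReads into processReads. A plausible sketch, inferred from the docs above rather than copied from the GATK source:

```java
@Override
protected void runTool(final JavaSparkContext ctx) {
    // Load the per-read contexts (bounded by any user-supplied intervals)
    // and hand them to the subclass's processReads implementation.
    processReads(getReads(ctx), ctx);
}
```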
protected abstract void processReads(org.apache.spark.api.java.JavaRDD<ReadWalkerContext> rdd, org.apache.spark.api.java.JavaSparkContext ctx)

Process the reads and write output.

Parameters:
rdd - a distributed collection of ReadWalkerContext
ctx - our Spark context
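To make the contract concrete, here is a hedged sketch of a processReads implementation that filters reads by mapping quality and writes them out with the inherited writeReads helper. The ReadWalkerContext::getRead accessor, the exact writeReads signature, and the output field are assumptions for illustration; GATKRead is org.broadinstitute.hellbender.utils.read.GATKRead.

```java
@Override
protected void processReads(final JavaRDD<ReadWalkerContext> rdd, final JavaSparkContext ctx) {
    // Assumed accessor: ReadWalkerContext::getRead yields the GATKRead being walked.
    final JavaRDD<GATKRead> highMapq = rdd
            .map(ReadWalkerContext::getRead)
            .filter(read -> read.getMappingQuality() >= 20);
    // writeReads is inherited from GATKSparkTool (see the inherited-methods list above);
    // "output" stands in for a tool @Argument naming the destination BAM. Signature assumed.
    writeReads(ctx, output, highMapq);
}
```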