```
@DocumentedFeature
public final class PathSeqBwaSpark
extends GATKSparkTool
```
See PathSeqPipelineSpark for an overview of the PathSeq pipeline.
This is a specialized version of BwaSpark designed for the PathSeq pipeline. The main difference is that alignments with SAM bit flag 0x100 or 0x800 set (secondary or supplementary alignments) are omitted from the output.
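As a minimal sketch (not GATK source code), the filtering described above can be expressed as a check against those two SAM flag bits; the constant names below are illustrative, though the bit values come from the SAM specification:

```java
// Illustrative sketch of the secondary/supplementary filter described above.
// The bit values 0x100 and 0x800 are defined by the SAM format specification.
public final class AlignmentFilter {
    static final int SECONDARY_ALIGNMENT = 0x100;     // SAM flag: secondary alignment
    static final int SUPPLEMENTARY_ALIGNMENT = 0x800; // SAM flag: supplementary alignment

    // Returns true if the record would be kept in the output
    // (i.e., it is neither secondary nor supplementary).
    public static boolean isPrimaryLine(int samFlag) {
        return (samFlag & (SECONDARY_ALIGNMENT | SUPPLEMENTARY_ALIGNMENT)) == 0;
    }
}
```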
A standard microbe reference is available in the GATK Resource Bundle.
This tool can be run without explicitly specifying Spark options; the example command given without Spark options will run locally. See Tutorial#10060 for an example of how to set up and run a Spark tool on a cloud Spark cluster.
```
gatk PathSeqBwaSpark \
    --paired-input input_reads_paired.bam \
    --unpaired-input input_reads_unpaired.bam \
    --paired-output output_reads_paired.bam \
    --unpaired-output output_reads_unpaired.bam \
    --microbe-bwa-image reference.img \
    --microbe-fasta reference.fa
```
```
gatk PathSeqBwaSpark \
    --paired-input gs://my-gcs-bucket/input_reads_paired.bam \
    --unpaired-input gs://my-gcs-bucket/input_reads_unpaired.bam \
    --paired-output gs://my-gcs-bucket/output_reads_paired.bam \
    --unpaired-output gs://my-gcs-bucket/output_reads_unpaired.bam \
    --microbe-bwa-image /references/reference.img \
    --microbe-fasta hdfs://my-cluster-m:8020//references/reference.fa \
    --bam-partition-size 4000000 \
    -- \
    --sparkRunner GCS \
    --cluster my_cluster \
    --driver-memory 8G \
    --executor-memory 32G \
    --num-executors 4 \
    --executor-cores 30 \
    --conf spark.executor.memoryOverhead=132000
```
Note that the microbe BWA image must be copied to the same path on every worker node. The microbe FASTA may likewise be copied to the same path on every worker node, or placed in HDFS.
For small input BAMs, it is recommended that the user reduce the BAM partition size in order to increase parallelism. Note that insert size is estimated separately for each Spark partition; consequently, partition size and other Spark parameters can affect the output for paired-end alignment.
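To illustrate why per-partition estimation makes the output partitioning-dependent, consider a toy sketch (not GATK code) in which a simple mean is taken as the estimator: the same reads grouped into different partitions yield different estimates. The class and method names here are hypothetical.

```java
import java.util.List;

// Toy illustration: a statistic computed per partition generally differs
// from the same statistic computed over the whole dataset, so the result
// depends on how reads are split into partitions.
public final class InsertSizeDemo {
    // Mean insert size of one "partition" of reads (illustrative estimator).
    static double mean(List<Integer> insertSizes) {
        return insertSizes.stream().mapToInt(Integer::intValue).average().orElse(0);
    }
}
```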
To minimize output file size, header lines are included only for sequences with at least one alignment.
Nested classes/interfaces inherited from class `GATKSparkTool`: `GATKSparkTool.ReadInputMergingPolicy`
| Modifier and Type | Field and Description |
|---|---|
| `PSBwaArgumentCollection` | `bwaArgs` |
| `java.lang.String` | `inputPaired` |
| `java.lang.String` | `inputUnpaired` |
| `java.lang.String` | `outputPaired` |
| `java.lang.String` | `outputUnpaired` |
| `static java.lang.String` | `PAIRED_INPUT_LONG_NAME` |
| `static java.lang.String` | `PAIRED_INPUT_SHORT_NAME` |
| `static java.lang.String` | `PAIRED_OUTPUT_LONG_NAME` |
| `static java.lang.String` | `PAIRED_OUTPUT_SHORT_NAME` |
| `static java.lang.String` | `UNPAIRED_INPUT_LONG_NAME` |
| `static java.lang.String` | `UNPAIRED_INPUT_SHORT_NAME` |
| `static java.lang.String` | `UNPAIRED_OUTPUT_LONG_NAME` |
| `static java.lang.String` | `UNPAIRED_OUTPUT_SHORT_NAME` |
Fields inherited from class `GATKSparkTool`: addOutputVCFCommandLine, BAM_PARTITION_SIZE_LONG_NAME, bamPartitionSplitSize, CREATE_OUTPUT_BAM_SPLITTING_INDEX_LONG_NAME, createOutputBamIndex, createOutputBamSplittingIndex, createOutputVariantIndex, features, intervalArgumentCollection, NUM_REDUCERS_LONG_NAME, numReducers, OUTPUT_SHARD_DIR_LONG_NAME, readArguments, referenceArguments, sequenceDictionaryValidationArguments, SHARDED_OUTPUT_LONG_NAME, shardedOutput, shardedPartsDir, SPLITTING_INDEX_GRANULARITY, splittingIndexGranularity, USE_NIO, useNio

Fields inherited from class `SparkCommandLineProgram`: programName, SPARK_PROGRAM_NAME_LONG_NAME, sparkArgs

Fields inherited from class `CommandLineProgram`: GATK_CONFIG_FILE, logger, NIO_MAX_REOPENS, NIO_PROJECT_FOR_REQUESTER_PAYS, QUIET, specialArgumentsCollection, tmpDir, useJdkDeflater, useJdkInflater, VERBOSITY
| Constructor and Description |
|---|
| `PathSeqBwaSpark()` |
| Modifier and Type | Method and Description |
|---|---|
| `protected void` | `runTool(org.apache.spark.api.java.JavaSparkContext ctx)` Runs the tool itself after initializing and validating inputs. |
Methods inherited from class `GATKSparkTool`: addReferenceFilesForSpark, addVCFsForSpark, editIntervals, getBestAvailableSequenceDictionary, getDefaultReadFilters, getDefaultToolVCFHeaderLines, getDefaultVariantAnnotationGroups, getDefaultVariantAnnotations, getGatkReadJavaRDD, getHeaderForReads, getIntervals, getPluginDescriptors, getReadInputMergingPolicy, getReads, getReadSourceHeaderMap, getReadSourceName, getRecommendedNumReducers, getReference, getReferenceSequenceDictionary, getReferenceWindowFunction, getSequenceDictionaryValidationArgumentCollection, getTargetPartitionSize, getUnfilteredReads, hasReads, hasReference, hasUserSuppliedIntervals, makeReadFilter, makeReadFilter, makeVariantAnnotations, requiresIntervals, requiresReads, requiresReference, runPipeline, useVariantAnnotations, validateSequenceDictionaries, writeReads, writeReads

Methods inherited from class `SparkCommandLineProgram`: afterPipeline, doWork, getProgramName

Methods inherited from class `CommandLineProgram`: customCommandLineValidation, getCommandLine, getCommandLineParser, getDefaultHeaders, getMetricsFile, getSupportInformation, getToolkitName, getToolkitShortName, getToolStatusWarning, getUsage, getVersion, instanceMain, instanceMainPostParseArgs, isBetaFeature, isExperimentalFeature, onShutdown, onStartup, parseArgs, printLibraryVersions, printSettings, printStartupMessage, runTool, setDefaultHeaders, warnOnToolStatus
```
public static final java.lang.String PAIRED_INPUT_LONG_NAME
public static final java.lang.String PAIRED_INPUT_SHORT_NAME
public static final java.lang.String UNPAIRED_INPUT_LONG_NAME
public static final java.lang.String UNPAIRED_INPUT_SHORT_NAME
public static final java.lang.String PAIRED_OUTPUT_LONG_NAME
public static final java.lang.String PAIRED_OUTPUT_SHORT_NAME
public static final java.lang.String UNPAIRED_OUTPUT_LONG_NAME
public static final java.lang.String UNPAIRED_OUTPUT_SHORT_NAME
```
```
@Argument(doc="Input queryname-sorted BAM containing only paired reads",
          fullName="paired-input", shortName="PI", optional=true)
public java.lang.String inputPaired

@Argument(doc="Input BAM containing only unpaired reads",
          fullName="unpaired-input", shortName="UI", optional=true)
public java.lang.String inputUnpaired

@Argument(doc="Output BAM containing only paired reads",
          fullName="paired-output", shortName="PO", optional=true)
public java.lang.String outputPaired

@Argument(doc="Output BAM containing only unpaired reads",
          fullName="unpaired-output", shortName="UO", optional=true)
public java.lang.String outputUnpaired

@ArgumentCollection
public PSBwaArgumentCollection bwaArgs
```
```
protected void runTool(org.apache.spark.api.java.JavaSparkContext ctx)
```

Runs the tool itself after initializing and validating inputs.

Overrides: `runTool` in class `GATKSparkTool`

Parameters: `ctx` - our Spark context
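The override above follows the template-method contract of `GATKSparkTool`: the framework initializes Spark and validates inputs, then hands a context to `runTool(...)`. A minimal self-contained sketch of that pattern, using hypothetical stub classes rather than the real GATK hierarchy (the real base class passes a `JavaSparkContext`):

```java
// Hypothetical stubs illustrating the template-method contract:
// the framework drives the run and the subclass only implements runTool(...).
abstract class SparkToolBase {
    // Subclasses implement the tool body; the real GATK signature
    // takes org.apache.spark.api.java.JavaSparkContext.
    protected abstract void runTool(String ctx);

    // Framework entry point: initialization/validation would happen here
    // before delegating to the subclass.
    public final void run() {
        runTool("local-context");
    }
}

final class MyPathSeqLikeTool extends SparkToolBase {
    String lastContext;

    @Override
    protected void runTool(String ctx) {
        lastContext = ctx; // a real tool would align reads here
    }
}
```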