```
@DocumentedFeature
public final class PathSeqBwaSpark
extends GATKSparkTool
```
See PathSeqPipelineSpark for an overview of the PathSeq pipeline.
This is a specialized version of BwaSpark designed for the PathSeq pipeline. The main difference is that alignments with SAM bit flag 0x100 or 0x800 set (secondary or supplementary alignments) are omitted from the output.
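As a minimal sketch (not GATK source code), the filtering described above can be expressed as a check against those two SAM flag bits; the constant names below are illustrative, though the bit values come from the SAM specification:

```java
// Illustrative sketch of the secondary/supplementary filter described above.
// The bit values 0x100 and 0x800 are defined by the SAM format specification.
public final class AlignmentFilter {
    static final int SECONDARY_ALIGNMENT = 0x100;     // SAM flag: secondary alignment
    static final int SUPPLEMENTARY_ALIGNMENT = 0x800; // SAM flag: supplementary alignment

    // Returns true if the record would be kept in the output
    // (i.e., it is neither secondary nor supplementary).
    public static boolean isPrimaryLine(int samFlag) {
        return (samFlag & (SECONDARY_ALIGNMENT | SUPPLEMENTARY_ALIGNMENT)) == 0;
    }
}
```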
A standard microbe reference is available in the GATK Resource Bundle.
This tool can be run without explicitly specifying Spark options; the example command given without Spark options will run locally. See Tutorial#10060 for an example of how to set up and run a Spark tool on a cloud Spark cluster.
```
gatk PathSeqBwaSpark \
    --paired-input input_reads_paired.bam \
    --unpaired-input input_reads_unpaired.bam \
    --paired-output output_reads_paired.bam \
    --unpaired-output output_reads_unpaired.bam \
    --microbe-bwa-image reference.img \
    --microbe-fasta reference.fa
```
```
gatk PathSeqBwaSpark \
    --paired-input gs://my-gcs-bucket/input_reads_paired.bam \
    --unpaired-input gs://my-gcs-bucket/input_reads_unpaired.bam \
    --paired-output gs://my-gcs-bucket/output_reads_paired.bam \
    --unpaired-output gs://my-gcs-bucket/output_reads_unpaired.bam \
    --microbe-bwa-image /references/reference.img \
    --microbe-fasta hdfs://my-cluster-m:8020//references/reference.fa \
    --bam-partition-size 4000000 \
    -- \
    --sparkRunner GCS \
    --cluster my_cluster \
    --driver-memory 8G \
    --executor-memory 32G \
    --num-executors 4 \
    --executor-cores 30 \
    --conf spark.executor.memoryOverhead=132000
```
Note that the microbe BWA image must be copied to the same path on every worker node. The microbe FASTA may likewise be copied to the same path on every worker node, or placed in HDFS.
For small input BAMs, it is recommended that the user reduce the BAM partition size in order to increase parallelism. Note that insert size is estimated separately for each Spark partition; consequently, partition size and other Spark parameters can affect the output for paired-end alignment.
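To illustrate why per-partition estimation makes the output partitioning-dependent, consider a toy sketch (not GATK code) in which a simple mean is taken as the estimator: the same reads grouped into different partitions yield different estimates. The class and method names here are hypothetical.

```java
import java.util.List;

// Toy illustration: a statistic computed per partition generally differs
// from the same statistic computed over the whole dataset, so the result
// depends on how reads are split into partitions.
public final class InsertSizeDemo {
    // Mean insert size of one "partition" of reads (illustrative estimator).
    static double mean(List<Integer> insertSizes) {
        return insertSizes.stream().mapToInt(Integer::intValue).average().orElse(0);
    }
}
```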
To minimize output file size, header lines are included only for sequences with at least one alignment.
Nested classes/interfaces inherited from class `GATKSparkTool`: `GATKSparkTool.ReadInputMergingPolicy`
| Modifier and Type | Field and Description |
|---|---|
| `PSBwaArgumentCollection` | `bwaArgs` |
| `java.lang.String` | `inputPaired` |
| `java.lang.String` | `inputUnpaired` |
| `java.lang.String` | `outputPaired` |
| `java.lang.String` | `outputUnpaired` |
| `static java.lang.String` | `PAIRED_INPUT_LONG_NAME` |
| `static java.lang.String` | `PAIRED_INPUT_SHORT_NAME` |
| `static java.lang.String` | `PAIRED_OUTPUT_LONG_NAME` |
| `static java.lang.String` | `PAIRED_OUTPUT_SHORT_NAME` |
| `static java.lang.String` | `UNPAIRED_INPUT_LONG_NAME` |
| `static java.lang.String` | `UNPAIRED_INPUT_SHORT_NAME` |
| `static java.lang.String` | `UNPAIRED_OUTPUT_LONG_NAME` |
| `static java.lang.String` | `UNPAIRED_OUTPUT_SHORT_NAME` |
Fields inherited from class `GATKSparkTool`: addOutputVCFCommandLine, BAM_PARTITION_SIZE_LONG_NAME, bamPartitionSplitSize, CREATE_OUTPUT_BAM_SPLITTING_INDEX_LONG_NAME, createOutputBamIndex, createOutputBamSplittingIndex, createOutputVariantIndex, features, intervalArgumentCollection, NUM_REDUCERS_LONG_NAME, numReducers, OUTPUT_SHARD_DIR_LONG_NAME, readArguments, referenceArguments, sequenceDictionaryValidationArguments, SHARDED_OUTPUT_LONG_NAME, shardedOutput, shardedPartsDir, SPLITTING_INDEX_GRANULARITY, splittingIndexGranularity, USE_NIO, useNio

Fields inherited from class `SparkCommandLineProgram`: programName, SPARK_PROGRAM_NAME_LONG_NAME, sparkArgs

Fields inherited from class `CommandLineProgram`: GATK_CONFIG_FILE, logger, NIO_MAX_REOPENS, NIO_PROJECT_FOR_REQUESTER_PAYS, QUIET, specialArgumentsCollection, tmpDir, useJdkDeflater, useJdkInflater, VERBOSITY
| Constructor and Description |
|---|
| `PathSeqBwaSpark()` |
| Modifier and Type | Method and Description |
|---|---|
| `protected void` | `runTool(org.apache.spark.api.java.JavaSparkContext ctx)` Runs the tool itself after initializing and validating inputs. |
Methods inherited from class `GATKSparkTool`: addReferenceFilesForSpark, addVCFsForSpark, editIntervals, getBestAvailableSequenceDictionary, getDefaultReadFilters, getDefaultToolVCFHeaderLines, getDefaultVariantAnnotationGroups, getDefaultVariantAnnotations, getGatkReadJavaRDD, getHeaderForReads, getIntervals, getPluginDescriptors, getReadInputMergingPolicy, getReads, getReadSourceHeaderMap, getReadSourceName, getRecommendedNumReducers, getReference, getReferenceSequenceDictionary, getReferenceWindowFunction, getSequenceDictionaryValidationArgumentCollection, getTargetPartitionSize, getUnfilteredReads, hasReads, hasReference, hasUserSuppliedIntervals, makeReadFilter, makeReadFilter, makeVariantAnnotations, requiresIntervals, requiresReads, requiresReference, runPipeline, useVariantAnnotations, validateSequenceDictionaries, writeReads, writeReads

Methods inherited from class `SparkCommandLineProgram`: afterPipeline, doWork, getProgramName

Methods inherited from class `CommandLineProgram`: customCommandLineValidation, getCommandLine, getCommandLineParser, getDefaultHeaders, getMetricsFile, getSupportInformation, getToolkitName, getToolkitShortName, getToolStatusWarning, getUsage, getVersion, instanceMain, instanceMainPostParseArgs, isBetaFeature, isExperimentalFeature, onShutdown, onStartup, parseArgs, printLibraryVersions, printSettings, printStartupMessage, runTool, setDefaultHeaders, warnOnToolStatus
```
public static final java.lang.String PAIRED_INPUT_LONG_NAME
public static final java.lang.String PAIRED_INPUT_SHORT_NAME
public static final java.lang.String UNPAIRED_INPUT_LONG_NAME
public static final java.lang.String UNPAIRED_INPUT_SHORT_NAME
public static final java.lang.String PAIRED_OUTPUT_LONG_NAME
public static final java.lang.String PAIRED_OUTPUT_SHORT_NAME
public static final java.lang.String UNPAIRED_OUTPUT_LONG_NAME
public static final java.lang.String UNPAIRED_OUTPUT_SHORT_NAME
```
```
@Argument(doc="Input queryname-sorted BAM containing only paired reads",
          fullName="paired-input", shortName="PI", optional=true)
public java.lang.String inputPaired

@Argument(doc="Input BAM containing only unpaired reads",
          fullName="unpaired-input", shortName="UI", optional=true)
public java.lang.String inputUnpaired

@Argument(doc="Output BAM containing only paired reads",
          fullName="paired-output", shortName="PO", optional=true)
public java.lang.String outputPaired

@Argument(doc="Output BAM containing only unpaired reads",
          fullName="unpaired-output", shortName="UO", optional=true)
public java.lang.String outputUnpaired

@ArgumentCollection
public PSBwaArgumentCollection bwaArgs
```
```
protected void runTool(org.apache.spark.api.java.JavaSparkContext ctx)
```

Runs the tool itself after initializing and validating inputs.

Overrides: `runTool` in class `GATKSparkTool`

Parameters: `ctx` - our Spark context
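The override above follows the template-method contract of `GATKSparkTool`: the framework initializes Spark and validates inputs, then hands a context to `runTool(...)`. A minimal self-contained sketch of that pattern, using hypothetical stub classes rather than the real GATK hierarchy (the real base class passes a `JavaSparkContext`):

```java
// Hypothetical stubs illustrating the template-method contract:
// the framework drives the run and the subclass only implements runTool(...).
abstract class SparkToolBase {
    // Subclasses implement the tool body; the real GATK signature
    // takes org.apache.spark.api.java.JavaSparkContext.
    protected abstract void runTool(String ctx);

    // Framework entry point: initialization/validation would happen here
    // before delegating to the subclass.
    public final void run() {
        runTool("local-context");
    }
}

final class MyPathSeqLikeTool extends SparkToolBase {
    String lastContext;

    @Override
    protected void runTool(String ctx) {
        lastContext = ctx; // a real tool would align reads here
    }
}
```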