public abstract class GATKSparkTool extends SparkCommandLineProgram
- Tools must implement runTool(org.apache.spark.api.java.JavaSparkContext) (see the sketch after this list).
- Tools should override requiresReference(), requiresReads(), and/or requiresIntervals() as appropriate to indicate required inputs.
- Tools can query whether certain inputs are present via hasReference(), hasReads(), and hasUserSuppliedIntervals().
- Tools can load the reads via getReads(), access the reference via getReference(), and access the intervals via getIntervals(). Any intervals specified are automatically applied to the reads. Input metadata is available via getHeaderForReads(), getReferenceSequenceDictionary(), and getBestAvailableSequenceDictionary().
- Tools that require a custom reference window function (extra bases of reference context around each read) may override getReferenceWindowFunction() to supply one. This function will be propagated to the reference source returned by getReference().
Modifier and Type | Class and Description
---|---
static class | GATKSparkTool.ReadInputMergingPolicy
Modifier and Type | Field and Description
---|---
boolean | addOutputVCFCommandLine
static java.lang.String | BAM_PARTITION_SIZE_LONG_NAME
protected long | bamPartitionSplitSize
static java.lang.String | CREATE_OUTPUT_BAM_SPLITTING_INDEX_LONG_NAME
boolean | createOutputBamIndex
boolean | createOutputBamSplittingIndex
boolean | createOutputVariantIndex
protected FeatureManager | features
protected IntervalArgumentCollection | intervalArgumentCollection
static java.lang.String | NUM_REDUCERS_LONG_NAME
protected int | numReducers
static java.lang.String | OUTPUT_SHARD_DIR_LONG_NAME
ReadInputArgumentCollection | readArguments
ReferenceInputArgumentCollection | referenceArguments
protected SequenceDictionaryValidationArgumentCollection | sequenceDictionaryValidationArguments
static java.lang.String | SHARDED_OUTPUT_LONG_NAME
protected boolean | shardedOutput
protected java.lang.String | shardedPartsDir
static java.lang.String | USE_NIO
protected boolean | useNio
Fields inherited from class SparkCommandLineProgram: programName, SPARK_PROGRAM_NAME_LONG_NAME, sparkArgs

Fields inherited from class CommandLineProgram: GATK_CONFIG_FILE, logger, NIO_MAX_REOPENS, NIO_PROJECT_FOR_REQUESTER_PAYS, QUIET, specialArgumentsCollection, tmpDir, useJdkDeflater, useJdkInflater, VERBOSITY
Constructor and Description |
---|
GATKSparkTool() |
Modifier and Type | Method and Description
---|---
protected static java.lang.String | addReferenceFilesForSpark(org.apache.spark.api.java.JavaSparkContext ctx, java.lang.String referenceFile): Register the reference file (and associated dictionary and index) to be downloaded to every node using Spark's copying mechanism (SparkContext#addFile()).
protected static java.util.List<java.lang.String> | addVCFsForSpark(org.apache.spark.api.java.JavaSparkContext ctx, java.util.List<java.lang.String> vcfFileNames): Register the VCF file (and associated index) to be downloaded to every node using Spark's copying mechanism (SparkContext#addFile()).
protected java.util.List<SimpleInterval> | editIntervals(java.util.List<SimpleInterval> rawIntervals): Transform the intervals during loading.
htsjdk.samtools.SAMSequenceDictionary | getBestAvailableSequenceDictionary(): Returns the "best available" sequence dictionary.
java.util.List<ReadFilter> | getDefaultReadFilters(): Returns the default list of ReadFilters that are used for this tool.
protected java.util.Set<htsjdk.variant.vcf.VCFHeaderLine> | getDefaultToolVCFHeaderLines()
java.util.List<java.lang.Class<? extends Annotation>> | getDefaultVariantAnnotationGroups()
java.util.List<Annotation> | getDefaultVariantAnnotations()
protected org.apache.spark.api.java.JavaRDD<GATKRead> | getGatkReadJavaRDD(TraversalParameters traversalParameters, ReadsSparkSource source, java.lang.String input)
htsjdk.samtools.SAMFileHeader | getHeaderForReads()
java.util.List<SimpleInterval> | getIntervals()
java.util.List<? extends org.broadinstitute.barclay.argparser.CommandLinePluginDescriptor<?>> | getPluginDescriptors(): Return the list of GATKCommandLinePluginDescriptor objects to be used for this CLP.
GATKSparkTool.ReadInputMergingPolicy | getReadInputMergingPolicy(): Does this tool support multiple inputs? Tools that do should override this method with the desired GATKSparkTool.ReadInputMergingPolicy.
org.apache.spark.api.java.JavaRDD<GATKRead> | getReads(): Loads the reads into a JavaRDD using the intervals specified, and filters them using the filter returned by makeReadFilter().
protected java.util.LinkedHashMap<java.lang.String,htsjdk.samtools.SAMFileHeader> | getReadSourceHeaderMap(): Returns a map of read input to header.
protected java.util.List<java.lang.String> | getReadSourceName(): Returns the name of the source of reads data.
int | getRecommendedNumReducers(): Return the recommended number of reducers for a pipeline processing the reads.
ReferenceMultiSparkSource | getReference()
htsjdk.samtools.SAMSequenceDictionary | getReferenceSequenceDictionary()
SerializableFunction<GATKRead,SimpleInterval> | getReferenceWindowFunction(): Window function that controls how much reference context to return for each read when using the reference source returned by getReference().
protected SequenceDictionaryValidationArgumentCollection | getSequenceDictionaryValidationArgumentCollection(): Subclasses can override this to provide different default behavior for sequence dictionary validation.
int | getTargetPartitionSize(): Returns the size of each input partition (in bytes) that is used to determine the recommended number of reducers for running a processing pipeline.
org.apache.spark.api.java.JavaRDD<GATKRead> | getUnfilteredReads(): Loads the reads into a JavaRDD using the intervals specified, and returns them without applying any filtering.
boolean | hasReads(): Are sources of reads available?
boolean | hasReference(): Is a source of reference data available?
boolean | hasUserSuppliedIntervals(): Are sources of intervals available?
ReadFilter | makeReadFilter(): Returns a read filter (simple or composite) that can be applied to the reads returned from getReads().
protected ReadFilter | makeReadFilter(htsjdk.samtools.SAMFileHeader samFileHeader): Like makeReadFilter() but with the ability to pass a different SAMFileHeader.
java.util.Collection<Annotation> | makeVariantAnnotations()
boolean | requiresIntervals(): Does this tool require intervals? Tools that do should override to return true.
boolean | requiresReads(): Does this tool require reads? Tools that do should override to return true.
boolean | requiresReference(): Does this tool require reference data? Tools that do should override to return true.
protected void | runPipeline(org.apache.spark.api.java.JavaSparkContext sparkContext): Runs the pipeline.
protected abstract void | runTool(org.apache.spark.api.java.JavaSparkContext ctx): Runs the tool itself after initializing and validating inputs.
boolean | useVariantAnnotations()
protected void | validateSequenceDictionaries(): Validates standard tool inputs against each other.
void | writeReads(org.apache.spark.api.java.JavaSparkContext ctx, java.lang.String outputFile, org.apache.spark.api.java.JavaRDD<GATKRead> reads): Writes the reads from a JavaRDD to an output file.
void | writeReads(org.apache.spark.api.java.JavaSparkContext ctx, java.lang.String outputFile, org.apache.spark.api.java.JavaRDD<GATKRead> reads, htsjdk.samtools.SAMFileHeader header, boolean sortReadsToHeader): Writes the reads from a JavaRDD to an output file.
Methods inherited from class SparkCommandLineProgram: afterPipeline, doWork, getProgramName

Methods inherited from class CommandLineProgram: customCommandLineValidation, getCommandLine, getCommandLineParser, getDefaultHeaders, getMetricsFile, getSupportInformation, getToolkitName, getToolkitShortName, getToolStatusWarning, getUsage, getVersion, instanceMain, instanceMainPostParseArgs, isBetaFeature, isExperimentalFeature, onShutdown, onStartup, parseArgs, printLibraryVersions, printSettings, printStartupMessage, runTool, setDefaultHeaders, warnOnToolStatus
public static final java.lang.String BAM_PARTITION_SIZE_LONG_NAME
public static final java.lang.String NUM_REDUCERS_LONG_NAME
public static final java.lang.String SHARDED_OUTPUT_LONG_NAME
public static final java.lang.String OUTPUT_SHARD_DIR_LONG_NAME
public static final java.lang.String CREATE_OUTPUT_BAM_SPLITTING_INDEX_LONG_NAME
public static final java.lang.String USE_NIO
@ArgumentCollection public final ReferenceInputArgumentCollection referenceArguments
@ArgumentCollection public final ReadInputArgumentCollection readArguments
@ArgumentCollection protected IntervalArgumentCollection intervalArgumentCollection
@Argument(doc="maximum number of bytes to read from a file into each partition of reads. Setting this higher will result in fewer partitions. Note that this will not be equal to the size of the partition in memory. Defaults to 0, which uses the default split size (determined by the Hadoop input format, typically the size of one HDFS block).", fullName="bam-partition-size", optional=true) protected long bamPartitionSplitSize
@Argument(doc="Whether to use NIO or the Hadoop filesystem (default) for reading files. (Note that the Hadoop filesystem is always used for writing files.)", fullName="use-nio", optional=true) protected boolean useNio
@ArgumentCollection protected SequenceDictionaryValidationArgumentCollection sequenceDictionaryValidationArguments
@Argument(fullName="add-output-vcf-command-line", shortName="add-output-vcf-command-line", doc="If true, adds a command line header line to created VCF files.", optional=true, common=true) public boolean addOutputVCFCommandLine
@Argument(doc="For tools that write an output, write the output in multiple pieces (shards)", fullName="sharded-output", optional=true, mutex="output-shard-tmp-dir") protected boolean shardedOutput
@Argument(doc="when writing a bam, in single sharded mode this directory to write the temporary intermediate output shards, if not specified .parts/ will be used", fullName="output-shard-tmp-dir", optional=true, mutex="sharded-output") protected java.lang.String shardedPartsDir
@Argument(doc="For tools that shuffle data or write an output, sets the number of reducers. Defaults to 0, which gives one partition per 10MB of input.", fullName="num-reducers", optional=true) protected int numReducers
@Argument(fullName="create-output-bam-index", shortName="OBI", doc="If true, create a BAM index when writing a coordinate-sorted BAM file.", optional=true, common=true) public boolean createOutputBamIndex
@Argument(fullName="create-output-bam-splitting-index", doc="If true, create a BAM splitting index (SBI) when writing a coordinate-sorted BAM file.", optional=true, common=true) public boolean createOutputBamSplittingIndex
@Argument(fullName="create-output-variant-index", shortName="OVI", doc="If true, create a VCF index when writing a coordinate-sorted VCF file.", optional=true, common=true) public boolean createOutputVariantIndex
protected FeatureManager features
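Subclasses declare their own tool-specific options the same way, using Barclay @Argument annotations on fields. A minimal hypothetical declaration (the option name, short name, and doc string are illustrative, not fields of GATKSparkTool):

```java
// Hypothetical tool-specific argument declared in a subclass; not part of GATKSparkTool itself.
@Argument(fullName = "output", shortName = "O", doc = "Path to write the output BAM/CRAM.")
private String output;
```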
public java.util.List<? extends org.broadinstitute.barclay.argparser.CommandLinePluginDescriptor<?>> getPluginDescriptors()
Return the list of GATKCommandLinePluginDescriptor objects to be used for this CLP.
Specified by: getPluginDescriptors in interface org.broadinstitute.barclay.argparser.CommandLinePluginProvider
Overrides: getPluginDescriptors in class CommandLineProgram
public boolean requiresReference()
public boolean requiresReads()
public GATKSparkTool.ReadInputMergingPolicy getReadInputMergingPolicy()
Does this tool support multiple inputs? Tools that do should override this method with the desired GATKSparkTool.ReadInputMergingPolicy.

public boolean requiresIntervals()
public final boolean hasReference()
public final boolean hasReads()
public final boolean hasUserSuppliedIntervals()
public SerializableFunction<GATKRead,SimpleInterval> getReferenceWindowFunction()
Window function that controls how much reference context to return for each read when using the reference source returned by getReference(). Tools should override as appropriate. The default function is the identity function (i.e., return exactly the reference bases that span each read). A hypothetical override is sketched below.

protected SequenceDictionaryValidationArgumentCollection getSequenceDictionaryValidationArgumentCollection()
Subclasses can override this to provide different default behavior for sequence dictionary validation.
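As a sketch of the getReferenceWindowFunction() override mentioned above, the following hypothetical implementation pads the reference window by 10 bases on each side of every read. The padding value is illustrative, and trimming windows that run past the end of a contig is not addressed here.

```java
// Hypothetical override; requires org.broadinstitute.hellbender.utils.SerializableFunction,
// org.broadinstitute.hellbender.utils.SimpleInterval, and GATKRead.
@Override
public SerializableFunction<GATKRead, SimpleInterval> getReferenceWindowFunction() {
    return read -> new SimpleInterval(
            read.getContig(),
            Math.max(1, read.getStart() - 10),  // clamp the lower bound at position 1
            read.getEnd() + 10);
}
```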
public htsjdk.samtools.SAMSequenceDictionary getBestAvailableSequenceDictionary()
public htsjdk.samtools.SAMSequenceDictionary getReferenceSequenceDictionary()
public htsjdk.samtools.SAMFileHeader getHeaderForReads()
public org.apache.spark.api.java.JavaRDD<GATKRead> getReads()
Loads the reads into a JavaRDD using the intervals specified, and filters them using the filter returned by makeReadFilter(). If no intervals were specified, returns all the reads (both mapped and unmapped).
Returns: a JavaRDD of reads, bounded by intervals if specified, and filtered using the filter from makeReadFilter().

public org.apache.spark.api.java.JavaRDD<GATKRead> getUnfilteredReads()
Loads the reads into a JavaRDD using the intervals specified, and returns them without applying any filtering. If no intervals were specified, returns all the reads (both mapped and unmapped).
Returns: a JavaRDD of reads, bounded by intervals if specified, and unfiltered.

protected org.apache.spark.api.java.JavaRDD<GATKRead> getGatkReadJavaRDD(TraversalParameters traversalParameters, ReadsSparkSource source, java.lang.String input)
public void writeReads(org.apache.spark.api.java.JavaSparkContext ctx, java.lang.String outputFile, org.apache.spark.api.java.JavaRDD<GATKRead> reads)
Writes the reads from a JavaRDD to an output file.
Parameters:
ctx - the JavaSparkContext to write.
outputFile - path to the output bam/cram.
reads - reads to write.

public void writeReads(org.apache.spark.api.java.JavaSparkContext ctx, java.lang.String outputFile, org.apache.spark.api.java.JavaRDD<GATKRead> reads, htsjdk.samtools.SAMFileHeader header, boolean sortReadsToHeader)
Writes the reads from a JavaRDD to an output file.
Parameters:
ctx - the JavaSparkContext to write.
outputFile - path to the output bam/cram.
reads - reads to write.
header - the header to write.
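A hypothetical runTool() body that pairs getReads() with the five-argument writeReads() overload; the output field stands in for a tool-defined @Argument path (see the sketched argument declaration in the field detail section above).

```java
// Hypothetical usage; "output" is an illustrative @Argument-backed path, not part of GATKSparkTool.
@Override
protected void runTool(final JavaSparkContext ctx) {
    // getReads() returns interval-bounded, filtered reads; write them back out with the input header.
    final JavaRDD<GATKRead> reads = getReads();
    writeReads(ctx, output, reads, getHeaderForReads(), true);
}
```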
public int getRecommendedNumReducers()
Return the recommended number of reducers for a pipeline processing the reads, based on getTargetPartitionSize(). Subclasses that want to control the recommended number of reducers should typically override getTargetPartitionSize() rather than this method.

public int getTargetPartitionSize()
Returns the size of each input partition (in bytes) that is used to determine the recommended number of reducers for running a processing pipeline.
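Assuming a tool wants larger partitions (and therefore fewer recommended reducers), it could override getTargetPartitionSize() as in this hypothetical sketch:

```java
// Hypothetical override: target roughly 100 MB of input per partition instead of the default.
@Override
public int getTargetPartitionSize() {
    return 100 * 1024 * 1024;
}
```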
public ReadFilter makeReadFilter()
Returns a read filter (simple or composite) that can be applied to the reads returned from getReads().
This implementation combines the default read filters for this tool (returned by getDefaultReadFilters()) along with any read filter command line directives specified by the user (such as enabling other filters or disabling default filters), and returns a single composite filter resulting from the list by and'ing them together.
NOTE: Most tools will not need to override this method, and should only do so in order to provide custom behavior or processing of the final merged read filter. To change the default read filters used by the tool, override getDefaultReadFilters() instead.
Multiple filters can be composed by using ReadFilter composition methods.

protected ReadFilter makeReadFilter(htsjdk.samtools.SAMFileHeader samFileHeader)
Like makeReadFilter() but with the ability to pass a different SAMFileHeader.

public java.util.List<ReadFilter> getDefaultReadFilters()
Returns the default list of ReadFilters that are used for this tool. The default implementation uses the WellformedReadFilter filter with all default options. Subclasses can override to provide alternative filters.
Note: this method is called before command line parsing begins, and thus before a SAMFileHeader is available through getHeaderForReads(). The actual SAMFileHeader is propagated to the read filters by makeReadFilter() after the filters have been merged with command line arguments.
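A hypothetical override of getDefaultReadFilters() that keeps the standard defaults and additionally requires mapped reads might look like this (ReadFilterLibrary.MAPPED is a standard GATK filter; the combination shown is illustrative):

```java
// Hypothetical override; requires java.util.ArrayList, java.util.List,
// org.broadinstitute.hellbender.engine.filters.ReadFilter and ReadFilterLibrary.
@Override
public List<ReadFilter> getDefaultReadFilters() {
    final List<ReadFilter> filters = new ArrayList<>(super.getDefaultReadFilters());
    filters.add(ReadFilterLibrary.MAPPED);  // additionally drop unmapped reads
    return filters;
}
```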
public boolean useVariantAnnotations()
See Also: GATKTool.useVariantAnnotations()
public java.util.List<Annotation> getDefaultVariantAnnotations()
See Also: GATKTool.getDefaultVariantAnnotations()
public java.util.List<java.lang.Class<? extends Annotation>> getDefaultVariantAnnotationGroups()
protected java.util.Set<htsjdk.variant.vcf.VCFHeaderLine> getDefaultToolVCFHeaderLines()
public java.util.Collection<Annotation> makeVariantAnnotations()
See Also: GATKTool.makeVariantAnnotations()
protected java.util.List<java.lang.String> getReadSourceName()
protected java.util.LinkedHashMap<java.lang.String,htsjdk.samtools.SAMFileHeader> getReadSourceHeaderMap()
public ReferenceMultiSparkSource getReference()
public java.util.List<SimpleInterval> getIntervals()
protected void runPipeline(org.apache.spark.api.java.JavaSparkContext sparkContext)
Runs the pipeline.
Specified by: runPipeline in class SparkCommandLineProgram
protected java.util.List<SimpleInterval> editIntervals(java.util.List<SimpleInterval> rawIntervals)
Transform the intervals during loading.
Parameters:
rawIntervals - Intervals specified on command line by user (-L). Can be null.
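A hypothetical editIntervals() override that pads each user-supplied interval by 100 bases on either side (the padding value is illustrative; trimming past the end of a contig is not handled here):

```java
// Hypothetical override; requires java.util.List, java.util.stream.Collectors and SimpleInterval.
@Override
protected List<SimpleInterval> editIntervals(final List<SimpleInterval> rawIntervals) {
    if (rawIntervals == null) {
        return null;  // no -L intervals were supplied
    }
    return rawIntervals.stream()
            .map(i -> new SimpleInterval(i.getContig(), Math.max(1, i.getStart() - 100), i.getEnd() + 100))
            .collect(Collectors.toList());
}
```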
protected void validateSequenceDictionaries()
protected static java.lang.String addReferenceFilesForSpark(org.apache.spark.api.java.JavaSparkContext ctx, java.lang.String referenceFile)
Register the reference file (and associated dictionary and index) to be downloaded to every node using Spark's copying mechanism (SparkContext#addFile()).
Parameters:
ctx - the Spark context
referenceFile - the reference file, can be a local file or a remote path
Returns: the name of the reference file, which can be used to locate the downloaded copy on each node via SparkFiles#get()
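A hypothetical runTool() fragment showing how a tool might register its reference for executor-side access; the literal path is illustrative (tools normally take it from referenceArguments).

```java
// Hypothetical usage; the reference path shown is made up for illustration.
@Override
protected void runTool(final JavaSparkContext ctx) {
    final String referenceName = addReferenceFilesForSpark(ctx, "/data/ref/example.fasta");
    // Executor code can locate the downloaded copy via org.apache.spark.SparkFiles.get(referenceName).
}
```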
protected static java.util.List<java.lang.String> addVCFsForSpark(org.apache.spark.api.java.JavaSparkContext ctx, java.util.List<java.lang.String> vcfFileNames)
Register the VCF file (and associated index) to be downloaded to every node using Spark's copying mechanism (SparkContext#addFile()).
Parameters:
ctx - the Spark context
vcfFileNames - the VCF files, can be local files or remote paths
Returns: the names of the VCF files, which can be used to locate the downloaded copies on each node via SparkFiles#get()
protected abstract void runTool(org.apache.spark.api.java.JavaSparkContext ctx)
Runs the tool itself after initializing and validating inputs.
Parameters:
ctx - our Spark context