@DocumentedFeature @BetaFeature public class BaseRecalibratorSpark extends GATKSparkTool
This walker generates recalibration tables based on user-specified covariates. It performs a by-locus traversal, operating only at sites that are not in the known-sites resources. ExAC, gnomAD, or dbSNP resources can be used as known sites of variation. All reference mismatches seen outside of known variant sites are assumed to be sequencing errors, indicative of poor base quality. Because a large amount of data is available, one can then calculate an empirical probability of error given the particular covariates seen at each site, where p(error) = num mismatches / num observations. The output file is a table of the covariate values, number of observations, number of mismatches, and empirical quality score.
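As a rough illustration of that calculation (a minimal sketch, not GATK's actual recalibration engine; the counts below are hypothetical), the empirical quality score is simply the Phred-scaled error probability for one covariate bin:

```java
// Tallies for one combination of covariate values (hypothetical numbers).
long numMismatches = 120;
long numObservations = 1_000_000;

// p(error) = num mismatches / num observations, as described above.
double pError = (double) numMismatches / numObservations;

// Phred scale: Q = -10 * log10(p). Here pError = 1.2e-4, so Q ≈ 39.
long empiricalQuality = Math.round(-10.0 * Math.log10(pError));
```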
The output is a GATK Report file with many tables.

Usage example (Spark cluster on Google Cloud Dataproc):

```
gatk BaseRecalibratorSpark \
    -I gs://my-gcs-bucket/my_reads.bam \
    -R gs://my-gcs-bucket/reference.fasta \
    --known-sites gs://my-gcs-bucket/sites_of_variation.vcf \
    --known-sites gs://my-gcs-bucket/another/optional/setOfSitesToMask.vcf \
    -O gs://my-gcs-bucket/recal_data.table \
    -- \
    --sparkRunner GCS \
    --cluster my-dataproc-cluster
```
Modifier and Type | Field and Description |
---|---|
`int` | `readShardPadding` |
`int` | `readShardSize` |
Fields inherited from class org.broadinstitute.hellbender.engine.spark.GATKSparkTool: BAM_PARTITION_SIZE_LONG_NAME, bamPartitionSplitSize, features, intervalArgumentCollection, NUM_REDUCERS_LONG_NAME, numReducers, OUTPUT_SHARD_DIR_LONG_NAME, readArguments, referenceArguments, sequenceDictionaryValidationArguments, SHARDED_OUTPUT_LONG_NAME, shardedOutput, shardedPartsDir

Fields inherited from class org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram: programName, SPARK_PROGRAM_NAME_LONG_NAME, sparkArgs

Fields inherited from class org.broadinstitute.hellbender.cmdline.CommandLineProgram: GATK_CONFIG_FILE, logger, NIO_MAX_REOPENS, QUIET, specialArgumentsCollection, TMP_DIR, useJdkDeflater, useJdkInflater, VERBOSITY
Constructor and Description |
---|
`BaseRecalibratorSpark()` |
Modifier and Type | Method and Description |
---|---|
`java.util.List<ReadFilter>` | `getDefaultReadFilters()` Returns the default list of ReadFilters that are used for this tool. |
`SerializableFunction<GATKRead,SimpleInterval>` | `getReferenceWindowFunction()` Window function that controls how much reference context to return for each read when using the reference source returned by `GATKSparkTool.getReference()`. |
`boolean` | `requiresReads()` Does this tool require reads? Tools that do should override to return true. |
`boolean` | `requiresReference()` Does this tool require reference data? Tools that do should override to return true. |
`protected void` | `runTool(org.apache.spark.api.java.JavaSparkContext ctx)` Runs the tool itself after initializing and validating inputs. |
Methods inherited from class org.broadinstitute.hellbender.engine.spark.GATKSparkTool: editIntervals, getBestAvailableSequenceDictionary, getDefaultVariantAnnotationGroups, getDefaultVariantAnnotations, getHeaderForReads, getIntervals, getPluginDescriptors, getReads, getReadSourceName, getRecommendedNumReducers, getReference, getReferenceSequenceDictionary, getSequenceDictionaryValidationArgumentCollection, getTargetPartitionSize, getUnfilteredReads, hasIntervals, hasReads, hasReference, makeReadFilter, makeReadFilter, makeVariantAnnotations, requiresIntervals, runPipeline, useVariantAnnotations, validateSequenceDictionaries, writeReads, writeReads

Methods inherited from class org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram: afterPipeline, doWork, getProgramName

Methods inherited from class org.broadinstitute.hellbender.cmdline.CommandLineProgram: customCommandLineValidation, getCommandLine, getCommandLineParser, getDefaultHeaders, getMetricsFile, getSupportInformation, getToolkitName, getToolStatusWarning, getUsage, getVersion, instanceMain, instanceMainPostParseArgs, isBetaFeature, isExperimentalFeature, onShutdown, onStartup, parseArgs, printLibraryVersions, printSettings, printStartupMessage, runTool, setDefaultHeaders, warnOnToolStatus
@Argument(fullName="read-shard-size", doc="Maximum size of each read shard, in bases. Only applies when using the OVERLAPS_PARTITIONER join strategy.", optional=true) public int readShardSize
@Argument(fullName="read-shard-padding", doc="Each read shard has this many bases of extra context on each side. Only applies when using the OVERLAPS_PARTITIONER join strategy.", optional=true) public int readShardPadding
public boolean requiresReads()

Description copied from class: GATKSparkTool
Does this tool require reads? Tools that do should override to return true.

Overrides: requiresReads in class GATKSparkTool
public boolean requiresReference()

Description copied from class: GATKSparkTool
Does this tool require reference data? Tools that do should override to return true.

Overrides: requiresReference in class GATKSparkTool
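Taken together, these two overrides are how a Spark tool declares its required inputs to the engine; a minimal sketch of the assumed pattern:

```java
// BaseRecalibratorSpark needs both reads and a reference, so it returns true
// from both hooks; the engine then validates those inputs before runTool runs.
@Override
public boolean requiresReads() {
    return true;
}

@Override
public boolean requiresReference() {
    return true;
}
```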
public SerializableFunction<GATKRead,SimpleInterval> getReferenceWindowFunction()

Description copied from class: GATKSparkTool
Window function that controls how much reference context to return for each read when using the reference source returned by GATKSparkTool.getReference(). Tools should override as appropriate. The default function is the identity function (i.e., return exactly the reference bases that span each read).

Overrides: getReferenceWindowFunction in class GATKSparkTool
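For example, a tool that needs extra reference context around each read could override this with a padded window. A minimal sketch (the pad size is a hypothetical value, and a real implementation would also clamp the end coordinate against the sequence dictionary):

```java
private static final int WINDOW_PAD = 10; // hypothetical padding, in bases

@Override
public SerializableFunction<GATKRead, SimpleInterval> getReferenceWindowFunction() {
    // Expand each read's span by WINDOW_PAD bases on both sides, clamping the
    // start at position 1; the end should also be clamped to the contig length.
    return read -> new SimpleInterval(
            read.getContig(),
            Math.max(1, read.getStart() - WINDOW_PAD),
            read.getEnd() + WINDOW_PAD);
}
```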
public java.util.List<ReadFilter> getDefaultReadFilters()

Description copied from class: GATKSparkTool
Returns the default list of ReadFilters that are used for this tool. The default implementation uses the WellformedReadFilter filter with all default options. Subclasses can override to provide alternative filters.

Note: this method is called before command line parsing begins, and thus before a SAMFileHeader is available through GATKSparkTool.getHeaderForReads(). The actual SAMFileHeader is propagated to the read filters by GATKSparkTool.makeReadFilter() after the filters have been merged with command line arguments.

Overrides: getDefaultReadFilters in class GATKSparkTool
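A subclass that wants one more filter on top of the defaults could extend the inherited list; a minimal sketch (the MappingQualityReadFilter threshold of 20 is an arbitrary example, assuming that filter's single-int constructor):

```java
@Override
public java.util.List<ReadFilter> getDefaultReadFilters() {
    // Start from the superclass defaults, then append an extra filter. The
    // SAMFileHeader is attached later by makeReadFilter(), per the note above.
    final java.util.List<ReadFilter> filters =
            new java.util.ArrayList<>(super.getDefaultReadFilters());
    filters.add(new MappingQualityReadFilter(20)); // example threshold
    return filters;
}
```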
protected void runTool(org.apache.spark.api.java.JavaSparkContext ctx)

Description copied from class: GATKSparkTool
Runs the tool itself after initializing and validating inputs.

Specified by: runTool in class GATKSparkTool

Parameters: ctx - our Spark context
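To see where runTool fits in the tool lifecycle, a minimal sketch of a hypothetical GATKSparkTool subclass (not BaseRecalibratorSpark's actual logic): the engine hands runTool a live JavaSparkContext only after arguments are parsed and inputs validated, so the body can go straight to the distributed work.

```java
public final class ReadCountSketch extends GATKSparkTool {
    private static final long serialVersionUID = 1L;

    @Override
    public boolean requiresReads() {
        return true;
    }

    @Override
    protected void runTool(final org.apache.spark.api.java.JavaSparkContext ctx) {
        // getReads() (inherited from GATKSparkTool) yields the filtered reads
        // as a Spark RDD; count() triggers the distributed computation.
        final long totalReads = getReads().count();
        logger.info("Processed " + totalReads + " reads.");
    }
}
```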