@DocumentedFeature public final class BaseRecalibrator extends ReadWalker
This walker generates recalibration tables based on the specified covariates. It does a by-locus traversal operating only at sites that are not in the known sites VCF. ExAC, gnomAD, or dbSNP resources can be used as known sites of variation. Because known variant sites are skipped, we assume that all reference mismatches we see are errors and indicative of poor base quality. Since there is a large amount of data, one can then calculate an empirical probability of error given the particular covariates seen at this site, where p(error) = num mismatches / num observations. The output file is a table (of the several covariate values, num observations, num mismatches, empirical quality score).
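The relationship between the counts and the empirical quality score can be sketched as follows. This is a minimal illustration, assuming the plain ratio p(error) = num mismatches / num observations from the description above and the standard Phred scaling Q = -10 * log10(p). The class name, method names, and the quality cap of 93 are hypothetical choices for this sketch; the tool itself may apply smoothing or priors that are not shown here.

```java
final class EmpiricalQualitySketch {
    // Assumed cap so that zero observed mismatches does not yield an infinite quality.
    static final double MAX_QUALITY = 93.0;

    // p(error) = num mismatches / num observations
    static double errorProbability(long mismatches, long observations) {
        if (observations <= 0) {
            throw new IllegalArgumentException("observations must be positive");
        }
        if (mismatches < 0 || mismatches > observations) {
            throw new IllegalArgumentException("mismatches must be in [0, observations]");
        }
        return (double) mismatches / observations;
    }

    // Phred-scaled quality: Q = -10 * log10(p), capped at MAX_QUALITY.
    static double empiricalQuality(long mismatches, long observations) {
        double p = errorProbability(mismatches, observations);
        if (p == 0.0) {
            return MAX_QUALITY;
        }
        return Math.min(MAX_QUALITY, -10.0 * Math.log10(p));
    }

    public static void main(String[] args) {
        // 10 mismatches in 1000 observations is an error rate of 1%, i.e. roughly Q20.
        System.out.println(empiricalQuality(10, 1000));
    }
}
```

A table row in this sketch would carry the covariate values together with the two counts and the derived quality.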
Input: The input read data whose base quality scores need to be assessed.
Input: A database of known polymorphic sites to skip over.
Output: A GATK Report file with many tables.
gatk BaseRecalibrator \
    -I my_reads.bam \
    -R reference.fasta \
    --known-sites sites_of_variation.vcf \
    --known-sites another/optional/setOfSitesToMask.vcf \
    -O recal_data.table
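The same invocation can be assembled programmatically, for example when driving the tool from a pipeline. The sketch below only builds the argument list using the flags shown in the usage example (-I, -R, --known-sites, -O); the class and method names are hypothetical, and actually launching the process assumes a gatk launcher is available on the PATH.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BaseRecalibratorCommandSketch {
    // Builds the argument list; --known-sites is repeated once per resource.
    static List<String> buildCommand(String reads, String reference,
                                     List<String> knownSites, String output) {
        List<String> cmd = new ArrayList<>();
        cmd.add("gatk");
        cmd.add("BaseRecalibrator");
        cmd.add("-I");
        cmd.add(reads);
        cmd.add("-R");
        cmd.add(reference);
        for (String sites : knownSites) {
            cmd.add("--known-sites");
            cmd.add(sites);
        }
        cmd.add("-O");
        cmd.add(output);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(
                "my_reads.bam",
                "reference.fasta",
                Arrays.asList("sites_of_variation.vcf",
                              "another/optional/setOfSitesToMask.vcf"),
                "recal_data.table");
        System.out.println(String.join(" ", cmd));
        // To run it (assumption: a `gatk` launcher is on the PATH):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```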
Modifier and Type | Field and Description |
---|---|
static java.lang.String | KNOWN_SITES_ARG_FULL_NAME |
protected static org.apache.logging.log4j.Logger | logger |
static java.lang.String | USAGE_ONE_LINE_SUMMARY |
static java.lang.String | USAGE_SUMMARY |
FEATURE_CACHE_LOOKAHEAD
addOutputSAMProgramRecord, addOutputVCFCommandLine, cloudIndexPrefetchBuffer, cloudPrefetchBuffer, createOutputBamIndex, createOutputBamMD5, createOutputVariantIndex, createOutputVariantMD5, disableBamIndexCaching, intervalArgumentCollection, lenientVCFProcessing, outputSitesOnlyVCFs, progressMeter, readArguments, referenceArguments, SECONDS_BETWEEN_PROGRESS_UPDATES_NAME
GATK_CONFIG_FILE, NIO_MAX_REOPENS, QUIET, specialArgumentsCollection, TMP_DIR, useJdkDeflater, useJdkInflater, VERBOSITY
Constructor and Description |
---|
BaseRecalibrator() |
Modifier and Type | Method and Description |
---|---|
void | apply(GATKRead read, ReferenceContext ref, FeatureContext featureContext) For each read at this locus, get the various covariate values and increment that location in the map based on whether or not the base matches the reference at this particular location. |
static java.util.List<ReadFilter> | getBQSRSpecificReadFilterList() Return the list of basic, raw read filters used for BQSR contexts, not including WellFormed. |
java.util.List<ReadFilter> | getDefaultReadFilters() Returns the default list of CommandLineReadFilters that are used for this tool. |
static java.util.List<ReadFilter> | getStandardBQSRReadFilterList() Return the full list of raw read filters used for BQSR contexts, including WellFormed. |
void | onTraversalStart() Parse the -cov arguments and create a list of covariates to be used here. Based on the covariates' estimates for initial capacity, allocate the data hashmap. |
java.lang.Object | onTraversalSuccess() Operations performed immediately after a successful traversal (i.e., when no uncaught exceptions were thrown during the traversal). |
boolean | requiresReference() Does this tool require reference data? Traversal types and/or tools that do should override to return true. |
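The counting step summarized for apply above can be pictured with a small sketch. It is not the tool's implementation: the key format, the choice of covariates (read group, reported quality, cycle), the 0-based positions, and the assumption of an ungapped alignment that lies fully inside the reference are all illustrative, and every class and method name is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class CovariateCountingSketch {
    // Per-covariate-combination tallies.
    static final class Counts {
        long observations;
        long mismatches;
    }

    final Map<String, Counts> table = new HashMap<>();

    // Increment the entry for this covariate combination; count a mismatch
    // when the read base differs from the reference base.
    void observe(String covariateKey, char readBase, char refBase) {
        Counts c = table.computeIfAbsent(covariateKey, k -> new Counts());
        c.observations++;
        if (Character.toUpperCase(readBase) != Character.toUpperCase(refBase)) {
            c.mismatches++;
        }
    }

    // Walk an ungapped read, skipping positions listed as known variant sites.
    void processRead(String readGroup, String bases, int[] quals, int start,
                     String reference, Set<Integer> knownSites) {
        for (int i = 0; i < bases.length(); i++) {
            int refPos = start + i;
            if (knownSites.contains(refPos)) {
                continue; // known polymorphic site: not evidence of a sequencing error
            }
            String key = readGroup + "|Q" + quals[i] + "|cycle" + (i + 1);
            observe(key, bases.charAt(i), reference.charAt(refPos));
        }
    }

    public static void main(String[] args) {
        CovariateCountingSketch sketch = new CovariateCountingSketch();
        sketch.processRead("rg1", "ACGT", new int[] {30, 30, 30, 30}, 0,
                "ACCTA", Collections.singleton(1));
        sketch.table.forEach((key, c) ->
                System.out.println(key + " obs=" + c.observations + " mm=" + c.mismatches));
    }
}
```

Keying the map on the full covariate combination keeps the update to a single lookup per base, which matters when the table is filled from a large amount of read data.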
getProgressMeterRecordLabel, onShutdown, onStartup, requiresReads, traverse
addFeatureInputsAfterInitialization, closeTool, createSAMWriter, createSAMWriter, createVCFWriter, doWork, getBestAvailableSequenceDictionary, getDefaultCloudIndexPrefetchBufferSize, getDefaultCloudPrefetchBufferSize, getDefaultToolVCFHeaderLines, getDefaultVariantAnnotationGroups, getDefaultVariantAnnotations, getHeaderForFeatures, getHeaderForReads, getHeaderForSAMWriter, getMasterSequenceDictionary, getPluginDescriptors, getReferenceDictionary, getSequenceDictionaryValidationArgumentCollection, getToolkitShortName, getToolName, getTransformedReadStream, hasFeatures, hasIntervals, hasReads, hasReference, makePostReadFilterTransformer, makePreReadFilterTransformer, makeReadFilter, makeVariantAnnotations, requiresFeatures, requiresIntervals, useVariantAnnotations
customCommandLineValidation, getCommandLine, getCommandLineParser, getDefaultHeaders, getMetricsFile, getSupportInformation, getToolkitName, getToolStatusWarning, getUsage, getVersion, instanceMain, instanceMainPostParseArgs, isBetaFeature, isExperimentalFeature, parseArgs, printLibraryVersions, printSettings, printStartupMessage, runTool, setDefaultHeaders, warnOnToolStatus
public static final java.lang.String USAGE_ONE_LINE_SUMMARY
public static final java.lang.String USAGE_SUMMARY
public static final java.lang.String KNOWN_SITES_ARG_FULL_NAME
protected static final org.apache.logging.log4j.Logger logger
public boolean requiresReference()
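Description copied from class: GATKTool
Does this tool require reference data? Traversal types and/or tools that do should override to return true.
Overrides: requiresReference in class GATKTool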
public void onTraversalStart()
Overrides: onTraversalStart in class GATKTool
public java.util.List<ReadFilter> getDefaultReadFilters()
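Description copied from class: ReadWalker
Returns the default list of CommandLineReadFilters that are used for this tool. The default implementation uses the WellformedReadFilter filter with all default options. Subclasses can override to provide alternative filters.
Note: this method is called before command line parsing begins, and thus before a SAMFileHeader is available through getHeaderForReads().
Overrides: getDefaultReadFilters in class ReadWalker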
public static java.util.List<ReadFilter> getStandardBQSRReadFilterList()
public static java.util.List<ReadFilter> getBQSRSpecificReadFilterList()
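The relationship between these two lists, as described in the method summary, is that the standard list is the BQSR-specific list plus the WellFormed filter. A minimal sketch of that composition follows. It does not use the GATK ReadFilter API: the Read type, the two placeholder predicates, and the position of the well-formedness check at the end of the list are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class BqsrFilterListSketch {
    // Hypothetical stand-in for a read; only the fields the placeholder filters need.
    static final class Read {
        final int mappingQuality;
        final boolean duplicate;
        final boolean wellFormed;

        Read(int mappingQuality, boolean duplicate, boolean wellFormed) {
            this.mappingQuality = mappingQuality;
            this.duplicate = duplicate;
            this.wellFormed = wellFormed;
        }
    }

    // Basic filters, not including the well-formedness check (placeholders).
    static List<Predicate<Read>> bqsrSpecificFilters() {
        List<Predicate<Read>> filters = new ArrayList<>();
        filters.add(r -> r.mappingQuality > 0);
        filters.add(r -> !r.duplicate);
        return filters;
    }

    // Full list: the basic filters plus the well-formedness check.
    static List<Predicate<Read>> standardFilters() {
        List<Predicate<Read>> filters = bqsrSpecificFilters();
        filters.add(r -> r.wellFormed);
        return filters;
    }

    // A read is kept only if every filter in the list accepts it.
    static boolean passesAll(List<Predicate<Read>> filters, Read read) {
        return filters.stream().allMatch(f -> f.test(read));
    }

    public static void main(String[] args) {
        Read malformed = new Read(60, false, false);
        System.out.println(passesAll(bqsrSpecificFilters(), malformed));
        System.out.println(passesAll(standardFilters(), malformed));
    }
}
```

Exposing the shorter list separately lets a caller that already validates reads elsewhere reuse the BQSR-specific checks without repeating the well-formedness test.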
public void apply(GATKRead read, ReferenceContext ref, FeatureContext featureContext)
Specified by: apply in class ReadWalker
Parameters:
read - current read
ref - Reference bases spanning the current read. Will be an empty, but non-null, context object if there is no backing source of reference data (in which case all queries on it will return an empty array/iterator). Can request extra bases of context around the current read's interval by invoking ReferenceContext.setWindow(int, int) on this object before calling ReferenceContext.getBases().
featureContext - Features spanning the current read. Will be an empty, but non-null, context object if there is no backing source of Feature data (in which case all queries on it will return an empty List).
public java.lang.Object onTraversalSuccess()
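Description copied from class: GATKTool
Operations performed immediately after a successful traversal (i.e., when no uncaught exceptions were thrown during the traversal).
Overrides: onTraversalSuccess in class GATKTool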