@DocumentedFeature public final class VariantFiltration extends VariantWalker
This tool is designed for hard-filtering variant calls based on certain criteria. Records are hard-filtered by changing the value in the FILTER field to something other than PASS. Filtered records will be preserved in the output unless their removal is requested in the command line.
A filtered VCF in which passing variants are annotated as PASS and failing variants are annotated with the name(s) of the filter(s) they failed.
gatk VariantFiltration \
    -R reference.fasta \
    -V input.vcf.gz \
    -O output.vcf.gz \
    --filter-name "my_filter1" \
    --filter-expression "AB < 0.2" \
    --filter-name "my_filter2" \
    --filter-expression "MQ0 > 50"
Composing filtering expressions can range from very simple to extremely complicated depending on what you're trying to do.
Compound expressions (ones that specify multiple conditions connected by &&, AND, ||, or OR, and reference multiple attributes) require special consideration. By default, variants that are missing one or more of the attributes referenced in a compound expression are treated as PASS for the entire expression, even if the variant would satisfy the filter criteria for another part of the expression. This can lead to unexpected results if any of the attributes referenced in a compound expression are present for some variants, but missing for others.
It is strongly recommended that such expressions be provided as individual arguments, each referencing a single attribute and specifying a single criterion. This ensures that all of the individual expressions are applied to each variant, even if a given variant is missing values for some of the expression conditions.
As an example, multiple individual expressions provided like this:
gatk VariantFiltration \
    -R reference.fasta \
    -V input.vcf.gz \
    -O output.vcf.gz \
    --filter-name "my_filter1" \
    --filter-expression "AB < 0.2" \
    --filter-name "my_filter2" \
    --filter-expression "MQ0 > 50"
are preferable to a single compound expression such as this:
gatk VariantFiltration \
    -R reference.fasta \
    -V input.vcf.gz \
    -O output.vcf.gz \
    --filter-name "my_filter" \
    --filter-expression "AB < 0.2 || MQ0 > 50"
See this article about using JEXL expressions for more information.
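The difference between a compound expression and separate single-attribute expressions can be illustrated with a small simulation. This is a sketch only: it uses neither GATK nor JEXL, the names `MissingAttributeDemo`, `Condition`, `compoundOrMatches`, and `separateFilters` are hypothetical, and a missing annotation is modeled simply as an absent map key. The `missingValuesFail` flag is an assumed stand-in for the effect of --missing-values-evaluate-as-failing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoublePredicate;

/** Toy model of missing-attribute handling; not GATK or JEXL code. */
final class MissingAttributeDemo {

    /** One single-attribute condition such as "AB < 0.2" (hypothetical helper type). */
    record Condition(String attribute, DoublePredicate test) {}

    /**
     * Models a compound OR expression. If ANY referenced attribute is missing,
     * the expression cannot be evaluated and the result is missingValuesFail
     * (false models the default: the record is left unfiltered).
     */
    static boolean compoundOrMatches(Map<String, Double> info,
                                     List<Condition> conditions,
                                     boolean missingValuesFail) {
        for (Condition c : conditions) {
            if (!info.containsKey(c.attribute())) {
                return missingValuesFail;
            }
        }
        return conditions.stream()
                .anyMatch(c -> c.test().test(info.get(c.attribute())));
    }

    /** Models separate named filters: each condition is evaluated on its own. */
    static List<String> separateFilters(Map<String, Double> info,
                                        Map<String, Condition> namedFilters) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Condition> entry : namedFilters.entrySet()) {
            Condition c = entry.getValue();
            Double value = info.get(c.attribute());
            // A missing attribute only skips this one filter, not the others.
            if (value != null && c.test().test(value)) {
                failed.add(entry.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Condition lowAb = new Condition("AB", v -> v < 0.2);
        Condition highMq0 = new Condition("MQ0", v -> v > 50);

        // This record carries MQ0 but has no AB annotation.
        Map<String, Double> info = Map.of("MQ0", 80.0);

        // Compound form: the missing AB hides the MQ0 violation.
        System.out.println(compoundOrMatches(info, List.of(lowAb, highMq0), false));

        // Separate form: my_filter2 is still applied.
        Map<String, Condition> named = new LinkedHashMap<>();
        named.put("my_filter1", lowAb);
        named.put("my_filter2", highMq0);
        System.out.println(separateFilters(info, named));
    }
}
```

In this model the compound form reports no match for a record that lacks AB, while the separate form still reports `my_filter2`, which is the reason single-attribute expressions are recommended.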
Modifier and Type | Field and Description |
---|---|
static java.lang.String | ALLELE_SPECIFIC_LONG_NAME |
boolean | applyForAllele |
static java.lang.String | CLUSTER_SIZE_LONG_NAME |
static java.lang.String | CLUSTER_WINDOW_SIZE_LONG_NAME |
static java.lang.String | CLUSTERED_SNP_FILTER_NAME |
java.lang.Integer | clusterSize: Works together with the --cluster-window-size argument. |
java.lang.Integer | clusterWindow: Works together with the --cluster-size argument. |
java.lang.Boolean | failMissingValues: By default, if JEXL cannot evaluate your expression for a particular record because one of the annotations is not present, the whole expression evaluates as PASSing. |
static java.lang.String | FILTER_EXPRESSION_LONG_NAME |
static java.lang.String | FILTER_NAME_LONG_NAME |
static java.lang.String | FILTER_NOT_IN_MASK_LONG_NAME |
java.util.List<java.lang.String> | filterExpressions: VariantFiltration accepts any number of JEXL expressions (so you can have two named filters by using --filter-name One --filter-expression "X < 1" --filter-name Two --filter-expression "X > 2"). |
java.util.List<java.lang.String> | filterNames: This name is put in the FILTER field for variants that get filtered. |
boolean | filterRecordsNotInMask: By default, if the --mask argument is used, any variant falling in a mask will be filtered. |
static java.lang.String | GENOTYPE_FILTER_EXPRESSION_LONG_NAME |
static java.lang.String | GENOTYPE_FILTER_NAME_LONG_NAME |
java.util.List<java.lang.String> | genotypeFilterExpressions: Similar to the INFO field based expressions, but used on the FORMAT (genotype) fields instead. |
java.util.List<java.lang.String> | genotypeFilterNames: Similar to the INFO field based expressions, but used on the FORMAT (genotype) fields instead. |
static java.lang.String | INVERT_GT_LONG_NAME |
static java.lang.String | INVERT_LONG_NAME |
boolean | invertFilterExpression: Invert the selection criteria for --filter-expression |
boolean | invertGenotypeFilterExpression: Invert the selection criteria for --genotype-filter-expression |
FeatureInput<htsjdk.tribble.Feature> | mask: Any variant which overlaps entries from the provided mask file will be filtered. |
static java.lang.String | MASK_EXTENSION_LONG_NAME |
static java.lang.String | MASK_NAME_LONG_NAME |
java.lang.Integer | maskExtension |
java.lang.String | maskName: When using the --mask argument, the mask-name will be annotated in the variant record. |
static java.lang.String | MISSING_VAL_LONG_NAME |
static java.lang.String | NO_CALL_GTS_LONG_NAME |
GATKPath | out |
boolean | setFilteredGenotypesToNocall: If this argument is provided, set filtered genotypes to no-call (./.). |
drivingVariantFile
DEFAULT_DRIVING_VARIANTS_LOOKAHEAD_BASES, genomicsDBOptions
addOutputSAMProgramRecord, addOutputVCFCommandLine, cloudIndexPrefetchBuffer, cloudPrefetchBuffer, createOutputBamIndex, createOutputBamMD5, createOutputVariantIndex, createOutputVariantMD5, disableBamIndexCaching, features, intervalArgumentCollection, lenientVCFProcessing, outputSitesOnlyVCFs, progressMeter, readArguments, referenceArguments, SECONDS_BETWEEN_PROGRESS_UPDATES_NAME, seqValidationArguments
GATK_CONFIG_FILE, logger, NIO_MAX_REOPENS, NIO_PROJECT_FOR_REQUESTER_PAYS, QUIET, specialArgumentsCollection, tmpDir, useJdkDeflater, useJdkInflater, VERBOSITY
Constructor and Description |
---|
VariantFiltration() |
Modifier and Type | Method and Description |
---|---|
void | apply(htsjdk.variant.variantcontext.VariantContext variant, ReadsContext readsContext, ReferenceContext ref, FeatureContext featureContext): Process an individual variant. |
void | closeTool(): This method is called by the GATK framework at the end of the GATKTool.doWork() template method. |
void | onTraversalStart(): Operations performed just prior to the start of traversal. |
getDrivingVariantsFeatureInput, getHeaderForVariants, getSequenceDictionaryForDrivingVariants, getSpliteratorForDrivingVariants, initializeDrivingVariants, onShutdown, onStartup, traverse
getBestAvailableSequenceDictionary, getDrivingVariantCacheLookAheadBases, getGenomicsDBOptions, getProgressMeterRecordLabel, getTransformedVariantStream, getTransformedVariantStream, makePostVariantFilterTransformer, makePreVariantFilterTransformer, makeVariantFilter, requiresFeatures
directlyAccessEngineFeatureManager, directlyAccessEngineReadsDataSource, directlyAccessEngineReferenceDataSource
addFeatureInputsAfterInitialization, createSAMWriter, createVCFWriter, createVCFWriter, createVCFWriter, doWork, getDefaultCloudIndexPrefetchBufferSize, getDefaultCloudPrefetchBufferSize, getDefaultReadFilters, getDefaultToolVCFHeaderLines, getDefaultVariantAnnotationGroups, getDefaultVariantAnnotations, getHeaderForFeatures, getHeaderForReads, getHeaderForSAMWriter, getMasterSequenceDictionary, getPluginDescriptors, getReferenceDictionary, getSequenceDictionaryValidationArgumentCollection, getToolName, getTransformedReadStream, getTraversalIntervals, hasFeatures, hasReads, hasReference, hasUserSuppliedIntervals, initializeProgressMeter, makePostReadFilterTransformer, makePreReadFilterTransformer, makeReadFilter, makeVariantAnnotations, onTraversalSuccess, requiresIntervals, requiresReads, requiresReference, transformTraversalIntervals, useVariantAnnotations
customCommandLineValidation, getCommandLine, getCommandLineParser, getDefaultHeaders, getMetricsFile, getSupportInformation, getToolkitName, getToolkitShortName, getToolStatusWarning, getUsage, getVersion, instanceMain, instanceMainPostParseArgs, isBetaFeature, isExperimentalFeature, parseArgs, printLibraryVersions, printSettings, printStartupMessage, runTool, setDefaultHeaders, warnOnToolStatus
public static final java.lang.String FILTER_EXPRESSION_LONG_NAME
public static final java.lang.String FILTER_NAME_LONG_NAME
public static final java.lang.String GENOTYPE_FILTER_EXPRESSION_LONG_NAME
public static final java.lang.String GENOTYPE_FILTER_NAME_LONG_NAME
public static final java.lang.String CLUSTER_SIZE_LONG_NAME
public static final java.lang.String CLUSTER_WINDOW_SIZE_LONG_NAME
public static final java.lang.String MASK_EXTENSION_LONG_NAME
public static final java.lang.String MASK_NAME_LONG_NAME
public static final java.lang.String FILTER_NOT_IN_MASK_LONG_NAME
public static final java.lang.String MISSING_VAL_LONG_NAME
public static final java.lang.String INVERT_LONG_NAME
public static final java.lang.String INVERT_GT_LONG_NAME
public static final java.lang.String NO_CALL_GTS_LONG_NAME
public static final java.lang.String ALLELE_SPECIFIC_LONG_NAME
@Argument(fullName="mask", shortName="mask", doc="Input mask", optional=true) public FeatureInput<htsjdk.tribble.Feature> mask
@Argument(doc="File to which variants should be written", fullName="output", shortName="O", optional=false) public GATKPath out
@Argument(fullName="filter-expression", shortName="filter", doc="One or more expressions used with INFO fields to filter", optional=true) public java.util.List<java.lang.String> filterExpressions
@Argument(fullName="filter-name", doc="Names to use for the list of filters", optional=true) public java.util.List<java.lang.String> filterNames
@Argument(fullName="genotype-filter-expression", shortName="G-filter", doc="One or more expressions used with FORMAT (sample/genotype-level) fields to filter (see documentation guide for more info)", optional=true) public java.util.List<java.lang.String> genotypeFilterExpressions
@Argument(fullName="genotype-filter-name", shortName="G-filter-name", doc="Names to use for the list of sample/genotype filters (must be a 1-to-1 mapping); this name is put in the FILTER field for variants that get filtered", optional=true) public java.util.List<java.lang.String> genotypeFilterNames
@Argument(fullName="cluster-size", shortName="cluster", doc="The number of SNPs which make up a cluster. Must be at least 2", optional=true) public java.lang.Integer clusterSize
@Argument(fullName="cluster-window-size", shortName="window", doc="The window size (in bases) in which to evaluate clustered SNPs", optional=true) public java.lang.Integer clusterWindow
@Argument(fullName="mask-extension", doc="How many bases beyond records from a provided \'mask\' should variants be filtered", optional=true) public java.lang.Integer maskExtension
@Argument(fullName="mask-name", doc="The text to put in the FILTER field if a \'mask\' is provided and overlaps with a variant call", optional=true) public java.lang.String maskName
@Argument(fullName="filter-not-in-mask", doc="Filter records NOT in given input mask.", optional=true) public boolean filterRecordsNotInMask
@Argument(fullName="missing-values-evaluate-as-failing", doc="When evaluating the JEXL expressions, missing values should be considered failing the expression", optional=true) public java.lang.Boolean failMissingValues
@Argument(fullName="invert-filter-expression", shortName="invfilter", doc="Invert the selection criteria for --filter-expression", optional=true) public boolean invertFilterExpression
@Argument(fullName="invert-genotype-filter-expression", shortName="invG-filter", doc="Invert the selection criteria for --genotype-filter-expression", optional=true) public boolean invertGenotypeFilterExpression
@Argument(fullName="set-filtered-genotype-to-no-call", optional=true, doc="Set filtered genotypes to no-call") public boolean setFilteredGenotypesToNocall
@Argument(fullName="apply-allele-specific-filters", optional=true, doc="Set mask at the allele level. This option is not compatible with clustering.") public boolean applyForAllele
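How the mask, mask-extension, and filter-not-in-mask arguments above might interact can be sketched as a simple interval check. This is an illustration under stated assumptions, not the tool's implementation: `MaskDemo`, `Interval`, `overlapsMask`, and `isFilteredByMask` are hypothetical names, all intervals are assumed to be closed and on the same contig, and the extension is assumed to widen each mask record by the same number of bases on both sides.

```java
import java.util.List;

/** Toy model of mask-based filtering; not GATK code. */
final class MaskDemo {

    /** Closed interval on a single contig (hypothetical helper type). */
    record Interval(int start, int end) {}

    /** True if the variant overlaps any mask record widened by maskExtension bases. */
    static boolean overlapsMask(Interval variant, List<Interval> mask, int maskExtension) {
        for (Interval m : mask) {
            int extendedStart = m.start() - maskExtension;
            int extendedEnd = m.end() + maskExtension;
            if (variant.start() <= extendedEnd && variant.end() >= extendedStart) {
                return true;
            }
        }
        return false;
    }

    /**
     * Decides whether the mask filter applies. With filterRecordsNotInMask set,
     * the decision is inverted: records outside the mask are filtered instead.
     */
    static boolean isFilteredByMask(Interval variant, List<Interval> mask,
                                    int maskExtension, boolean filterRecordsNotInMask) {
        boolean inMask = overlapsMask(variant, mask, maskExtension);
        return filterRecordsNotInMask ? !inMask : inMask;
    }

    public static void main(String[] args) {
        List<Interval> mask = List.of(new Interval(100, 100));
        Interval snp = new Interval(105, 105);

        System.out.println(isFilteredByMask(snp, mask, 0, false)); // no extension
        System.out.println(isFilteredByMask(snp, mask, 5, false)); // 5-base extension
        System.out.println(isFilteredByMask(snp, mask, 5, true));  // inverted mask
    }
}
```

In this model a SNP five bases away from a mask record is only filtered once the extension reaches it, and the inverted mode flips that decision.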
public static final java.lang.String CLUSTERED_SNP_FILTER_NAME
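The clustered-SNP filter is controlled by cluster-size (the number of SNPs that make up a cluster) and cluster-window-size (the window, in bases, in which clusters are evaluated). A minimal sliding-window sketch of that idea follows. The class and method names are hypothetical, positions are assumed to be sorted and on one contig, and the boundary convention (a span of at most the window size counts as clustered) is an assumption that the text above does not specify.

```java
import java.util.Arrays;

/** Toy model of clustered-SNP detection; not GATK code. */
final class ClusterDemo {

    /**
     * Flags every SNP that belongs to at least one run of clusterSize consecutive
     * SNPs whose first and last positions are at most clusterWindow bases apart.
     */
    static boolean[] flagClusteredSnps(int[] sortedPositions, int clusterSize, int clusterWindow) {
        if (clusterSize < 2) {
            throw new IllegalArgumentException("clusterSize must be at least 2");
        }
        boolean[] clustered = new boolean[sortedPositions.length];
        for (int first = 0; first + clusterSize - 1 < sortedPositions.length; first++) {
            int last = first + clusterSize - 1;
            if (sortedPositions[last] - sortedPositions[first] <= clusterWindow) {
                // Mark every SNP in this run, including the ones in between.
                Arrays.fill(clustered, first, last + 1, true);
            }
        }
        return clustered;
    }

    public static void main(String[] args) {
        int[] positions = {100, 103, 108, 500, 900};
        // Three SNPs within 10 bases form a cluster; the isolated ones do not.
        System.out.println(Arrays.toString(flagClusteredSnps(positions, 3, 10)));
        // prints [true, true, true, false, false]
    }
}
```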
public void onTraversalStart()
Overrides: onTraversalStart in class GATKTool
public void apply(htsjdk.variant.variantcontext.VariantContext variant, ReadsContext readsContext, ReferenceContext ref, FeatureContext featureContext)
Specified by: apply in class VariantWalker
Parameters:
variant - Current variant being processed.
readsContext - Reads overlapping the current variant. Will be an empty, but non-null, context object if there is no backing source of reads data (in which case all queries on it will return an empty array/iterator).
ref - Reference bases spanning the current variant. Will be an empty, but non-null, context object if there is no backing source of reference data (in which case all queries on it will return an empty array/iterator). Can request extra bases of context around the current variant's interval by invoking ReferenceContext.setWindow(int, int) on this object before calling ReferenceContext.getBases().
featureContext - Features spanning the current variant. Will be an empty, but non-null, context object if there is no backing source of Feature data (in which case all queries on it will return an empty List).
public void closeTool()
Description copied from class: GATKTool
This method is called by the GATK framework at the end of the GATKTool.doWork() template method. It is called regardless of whether GATKTool.traverse() has succeeded or not. It is called after GATKTool.onTraversalSuccess() has completed (successfully or not) but before the GATKTool.doWork() method returns.
In other words, on successful runs both GATKTool.onTraversalSuccess() and GATKTool.closeTool() will be called (in this order), while on failed runs (when GATKTool.traverse() throws an exception) only GATKTool.closeTool() will be called.
The default implementation does nothing. Subclasses should override this method to close any resources that must be closed regardless of the success of traversal.
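The call order described for closeTool() corresponds to a template method with a try/finally block. The sketch below is a stand-in written for illustration, not the GATKTool source: `LifecycleDemo`, `SketchTool`, and `run` are hypothetical names, and only the method names doWork, traverse, onTraversalSuccess, and closeTool are taken from the text.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the tool lifecycle; not GATK code. */
final class LifecycleDemo {

    /** Minimal stand-in for a tool base class that records which hooks ran. */
    static class SketchTool {
        final List<String> calls = new ArrayList<>();
        private final boolean failDuringTraversal;

        SketchTool(boolean failDuringTraversal) {
            this.failDuringTraversal = failDuringTraversal;
        }

        void traverse() {
            calls.add("traverse");
            if (failDuringTraversal) {
                throw new IllegalStateException("traversal failed");
            }
        }

        void onTraversalSuccess() {
            calls.add("onTraversalSuccess");
        }

        void closeTool() {
            calls.add("closeTool");
        }

        /** Template method: closeTool() runs whether or not traversal succeeded. */
        final void doWork() {
            try {
                traverse();
                onTraversalSuccess();
            } finally {
                closeTool();
            }
        }
    }

    /** Runs a tool and returns the recorded call order. */
    static List<String> run(boolean failDuringTraversal) {
        SketchTool tool = new SketchTool(failDuringTraversal);
        try {
            tool.doWork();
        } catch (IllegalStateException e) {
            tool.calls.add("exception propagated");
        }
        return tool.calls;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [traverse, onTraversalSuccess, closeTool]
        System.out.println(run(true));  // prints [traverse, closeTool, exception propagated]
    }
}
```

Placing resource cleanup in closeTool() rather than in onTraversalSuccess() means it also runs when traversal fails, which is the guarantee the description above states.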