@DocumentedFeature public final class CombineGVCFs extends MultiVariantWalkerGroupedOnStart
CombineGVCFs merges GVCFs that will eventually be input into GenotypeGVCFs. It is an alternative to GenomicsDBImport for genotyping multiple individual GVCFs: first combine them into a single GVCF with CombineGVCFs, then pass the result to GenotypeGVCFs. The main advantage of CombineGVCFs over GenomicsDBImport is that it can combine multiple intervals at once without building a GenomicsDB. However, CombineGVCFs is slower than GenomicsDBImport, so it is recommended only when there are few samples to merge.
Input: two or more HaplotypeCaller GVCFs to combine.
Output: a combined multi-sample gVCF.
Usage example:

    gatk CombineGVCFs \
        -R reference.fasta \
        --variant sample1.g.vcf.gz \
        --variant sample2.g.vcf.gz \
        -O cohort.g.vcf.gz
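With more than a handful of samples, typing one `--variant` per GVCF becomes tedious. The Java sketch below assembles the same command line as an argument list with one `--variant` per input. The class name `CombineGVCFsCommand` is hypothetical, and the "at least two inputs" check mirrors the input description on this page rather than any validation the tool is known to perform.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of GATK): builds the CombineGVCFs command line
// from the usage example, emitting one --variant argument per input GVCF.
final class CombineGVCFsCommand {
    private CombineGVCFsCommand() {}

    static List<String> build(String reference, List<String> gvcfs, String output) {
        if (gvcfs.size() < 2) {
            // Mirrors the documented input: two or more GVCFs to combine.
            throw new IllegalArgumentException("need at least two GVCFs, got " + gvcfs.size());
        }
        List<String> args = new ArrayList<>();
        args.add("gatk");
        args.add("CombineGVCFs");
        args.add("-R");
        args.add(reference);
        for (String gvcf : gvcfs) {
            args.add("--variant");
            args.add(gvcf);
        }
        args.add("-O");
        args.add(output);
        return args;
    }
}
```

The resulting list could be handed to `java.lang.ProcessBuilder`, assuming the `gatk` launcher is on the `PATH`.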
Only GVCF files produced by HaplotypeCaller (or CombineGVCFs) can be used as input for this tool. Some other programs produce files they call GVCFs, but these lack important information (accurate genotype likelihoods for every position) that GenotypeGVCFs requires.
If the GVCF files contain allele-specific annotations, add `-G Standard -G AS_Standard` to the command line.
Users generating large callsets (1000+ samples) may prefer GenomicsDBImport, which uses Intel's GenomicsDB and scales to much larger sample sizes than CombineGVCFs. CombineGVCFs provides a pure Java reference implementation of the combine operation that is available on all architectures.
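The class declaration shows that CombineGVCFs extends `MultiVariantWalkerGroupedOnStart`, whose `apply` method receives, for each start position, the records from all inputs that begin there. As a rough sketch of that grouping step only (not of the merge itself, and not GATK's actual traversal code), the Java below runs a priority-queue merge over per-sample lists sorted by start and emits one group per distinct start. It assumes a single contig and reduces each record to a hypothetical `Rec` holding a sample index and a start position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Simplified, hypothetical model: one contig, and each GVCF record reduced to
// (sample index, start). Not GATK's implementation.
final class GroupOnStart {
    static final class Rec {
        final int sample;
        final int start;
        Rec(int sample, int start) { this.sample = sample; this.start = start; }
        @Override public String toString() { return "s" + sample + "@" + start; }
    }

    // head = {index of sample list, index of record within that list}
    private static int startOf(List<List<Rec>> perSample, int[] head) {
        return perSample.get(head[0]).get(head[1]).start;
    }

    /** Merges per-sample lists (each sorted by start) into groups of records sharing a start. */
    static List<List<Rec>> group(List<List<Rec>> perSample) {
        PriorityQueue<int[]> heads = new PriorityQueue<>(
                (x, y) -> Integer.compare(startOf(perSample, x), startOf(perSample, y)));
        for (int i = 0; i < perSample.size(); i++) {
            if (!perSample.get(i).isEmpty()) heads.add(new int[] {i, 0});
        }
        List<List<Rec>> groups = new ArrayList<>();
        while (!heads.isEmpty()) {
            int start = startOf(perSample, heads.peek());
            List<Rec> group = new ArrayList<>();
            // Take every head record that begins at this start, advancing its source list.
            while (!heads.isEmpty() && startOf(perSample, heads.peek()) == start) {
                int[] h = heads.poll();
                group.add(perSample.get(h[0]).get(h[1]));
                if (h[1] + 1 < perSample.get(h[0]).size()) {
                    heads.add(new int[] {h[0], h[1] + 1});
                }
            }
            groups.add(group); // never empty, and all records share one start
        }
        return groups;
    }
}
```

Within a group the order of samples is not guaranteed here, because `PriorityQueue` does not break ties in any particular order.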
| Modifier and Type | Field and Description |
|---|---|
| `static java.lang.String` | `ALLELE_FRACTION_DELTA_LONG_NAME` |
| `static java.lang.String` | `BP_RES_LONG_NAME` |
| `static java.lang.String` | `BREAK_BANDS_LONG_NAME` |
| `protected DbsnpArgumentCollection` | `dbsnp`: The rsIDs from this file are used to populate the ID column of the output. |
| `static java.lang.String` | `DROP_SOMATIC_FILTERING_ANNOTATIONS_LONG_NAME` |
| `protected boolean` | `dropSomaticFilteringAnnotations`: Rather than move the per-sample INFO annotations used for filtering to the FORMAT field, drop them entirely. |
| `protected int` | `multipleAtWhichToBreakBands`: To reduce file sizes, our gVCFs group similar reference positions into bands. |
| `static java.lang.String` | `SOMATIC_INPUT_LONG_NAME` |
| `protected boolean` | `somaticInput`: Merge somatic GVCFs, retaining LOD and haplotype event count information in the FORMAT field. Note that the Mutect2 reference confidence mode is in BETA: the likelihoods model and output format are subject to change in subsequent versions. |
| `protected boolean` | `useBpResolution` |
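Fields inherited from superclasses, one line per class, starting with the direct superclass MultiVariantWalkerGroupedOnStart:
COMBINE_VARIANTS_DISTANCE, distanceToCombineVariants, IGNORE_VARIANTS_THAT_START_OUTSIDE_INTERVAL, MAX_COMBINED_DISTANCE, maxCombinedDistance, REFERENCE_WINDOW_PADDING, referenceWindowPadding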
multiVariantInputArgumentCollection
DEFAULT_DRIVING_VARIANTS_LOOKAHEAD_BASES, genomicsDBOptions
addOutputSAMProgramRecord, addOutputVCFCommandLine, cloudIndexPrefetchBuffer, cloudPrefetchBuffer, createOutputBamIndex, createOutputBamMD5, createOutputVariantIndex, createOutputVariantMD5, disableBamIndexCaching, features, intervalArgumentCollection, lenientVCFProcessing, outputSitesOnlyVCFs, progressMeter, readArguments, referenceArguments, SECONDS_BETWEEN_PROGRESS_UPDATES_NAME, seqValidationArguments
GATK_CONFIG_FILE, logger, NIO_MAX_REOPENS, NIO_PROJECT_FOR_REQUESTER_PAYS, QUIET, specialArgumentsCollection, tmpDir, useJdkDeflater, useJdkInflater, VERBOSITY
Constructor and Description |
---|
CombineGVCFs() |
| Modifier and Type | Method and Description |
|---|---|
| `void` | `apply(java.util.List<htsjdk.variant.variantcontext.VariantContext> variantContexts, ReferenceContext referenceContext, java.util.List<ReadsContext> readsContexts)`: This method must be implemented by tool authors. |
| `void` | `closeTool()`: This method is called by the GATK framework at the end of the `GATKTool.doWork()` template method. |
| `java.util.List<java.lang.Class<? extends Annotation>>` | `getDefaultVariantAnnotationGroups()`: Returns the default list of annotation groups that are used for this tool. |
| `protected static java.util.Set<java.lang.Integer>` | `getIntermediateStopSites(SimpleInterval intervalToClose, int breakBandMultiple)` |
| `void` | `mergeWithNewVCs(java.util.List<htsjdk.variant.variantcontext.VariantContext> variantContexts, ReferenceContext referenceContext)`: Calls endPreviousStates at the appropriate places, given a new startingStates object and an OverallState object corresponding to the currently accumulated reads. |
| `void` | `onTraversalStart()`: Operations performed just prior to the start of traversal. |
| `java.lang.Object` | `onTraversalSuccess()`: Operations performed immediately after a successful traversal (i.e., when no uncaught exceptions were thrown during the traversal). |
| `boolean` | `useVariantAnnotations()`: Must be overridden in order to add annotation arguments to the engine. |
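Methods inherited from superclasses, one line per class, starting with the direct superclass MultiVariantWalkerGroupedOnStart:
apply, apply, defaultDistanceToGroupVariants, defaultMaxGroupedSpan, defaultReferenceWindowPadding, isWithinInterval, requiresReference, traverse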
doDictionaryCrossValidation, getDrivingVariantsFeatureInputs, getHeaderForVariants, getMultiVariantInputArgumentCollection, getSamplesForVariants, getSequenceDictionaryForDrivingVariants, getSpliteratorForDrivingVariants, initializeDrivingVariants, onShutdown, onStartup
getBestAvailableSequenceDictionary, getDrivingVariantCacheLookAheadBases, getGenomicsDBOptions, getProgressMeterRecordLabel, getTransformedVariantStream, getTransformedVariantStream, makePostVariantFilterTransformer, makePreVariantFilterTransformer, makeVariantFilter, requiresFeatures
directlyAccessEngineFeatureManager, directlyAccessEngineReadsDataSource, directlyAccessEngineReferenceDataSource
addFeatureInputsAfterInitialization, createSAMWriter, createVCFWriter, createVCFWriter, createVCFWriter, doWork, getDefaultCloudIndexPrefetchBufferSize, getDefaultCloudPrefetchBufferSize, getDefaultReadFilters, getDefaultToolVCFHeaderLines, getDefaultVariantAnnotations, getHeaderForFeatures, getHeaderForReads, getHeaderForSAMWriter, getMasterSequenceDictionary, getPluginDescriptors, getReferenceDictionary, getSequenceDictionaryValidationArgumentCollection, getToolName, getTransformedReadStream, getTraversalIntervals, hasFeatures, hasReads, hasReference, hasUserSuppliedIntervals, initializeProgressMeter, makePostReadFilterTransformer, makePreReadFilterTransformer, makeReadFilter, makeVariantAnnotations, requiresIntervals, requiresReads, transformTraversalIntervals
customCommandLineValidation, getCommandLine, getCommandLineParser, getDefaultHeaders, getMetricsFile, getSupportInformation, getToolkitName, getToolkitShortName, getToolStatusWarning, getUsage, getVersion, instanceMain, instanceMainPostParseArgs, isBetaFeature, isExperimentalFeature, parseArgs, printLibraryVersions, printSettings, printStartupMessage, runTool, setDefaultHeaders, warnOnToolStatus
public static final java.lang.String BP_RES_LONG_NAME
public static final java.lang.String BREAK_BANDS_LONG_NAME
public static final java.lang.String SOMATIC_INPUT_LONG_NAME
public static final java.lang.String DROP_SOMATIC_FILTERING_ANNOTATIONS_LONG_NAME
public static final java.lang.String ALLELE_FRACTION_DELTA_LONG_NAME
@Argument(fullName="convert-to-base-pair-resolution", doc="If specified, convert banded gVCFs to all-sites gVCFs", optional=true) protected boolean useBpResolution
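With `--convert-to-base-pair-resolution`, banded reference blocks are written out as individual sites. A minimal sketch of just the expansion step, assuming a band is a closed 1-based interval `[start, end]` on one contig; the class `BpResolution` is hypothetical, and how the real tool fills in per-site values is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: expands one reference band into its individual positions.
final class BpResolution {
    private BpResolution() {}

    /** Returns every position covered by the band, from start to end inclusive. */
    static List<Integer> expandBand(int start, int end) {
        if (end < start) {
            throw new IllegalArgumentException("band end " + end + " precedes start " + start);
        }
        List<Integer> positions = new ArrayList<>(end - start + 1);
        for (int pos = start; pos <= end; pos++) {
            positions.add(pos); // one all-sites record per base in the band
        }
        return positions;
    }
}
```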
@Argument(fullName="break-bands-at-multiples-of", doc="If > 0, reference bands will be broken up at genomic positions that are multiples of this number", optional=true) protected int multipleAtWhichToBreakBands
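The `getIntermediateStopSites(SimpleInterval intervalToClose, int breakBandMultiple)` method returns a `Set<Integer>` of positions at which a band is closed. Its exact convention is not documented on this page, so the sketch below assumes the band closes at `m - 1` for each multiple `m` of the break value inside the interval, so that a new band begins exactly at `m`. The class `BandBreaks` and its `int start, int end` parameters are hypothetical simplifications of the `SimpleInterval` argument.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of band breaking; the "close at m - 1" convention is an assumption.
final class BandBreaks {
    private BandBreaks() {}

    /**
     * Stop sites for a band spanning [start, end] (1-based, inclusive): m - 1 for every
     * multiple m of breakMultiple with start < m <= end. Empty when breakMultiple <= 0,
     * matching the "If > 0" condition of the argument.
     */
    static Set<Integer> stopSites(int start, int end, int breakMultiple) {
        Set<Integer> stops = new LinkedHashSet<>();
        if (breakMultiple <= 0) {
            return stops; // band breaking disabled
        }
        // First multiple strictly greater than start.
        for (int m = (start / breakMultiple + 1) * breakMultiple; m <= end; m += breakMultiple) {
            stops.add(m - 1); // close here so the next band starts at m
        }
        return stops;
    }
}
```

For example, a band over positions 1 to 250 with a multiple of 100 would be closed at 99 and 199 under this convention.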
@Argument(fullName="input-is-somatic", doc="Merge input GVCFs according to somatic (i.e. Mutect2) annotations (BETA)") protected boolean somaticInput
@Argument(fullName="drop-somatic-filtering-annotations", doc="For input somatic GVCFs (i.e. from Mutect2) drop filtering annotations") protected boolean dropSomaticFilteringAnnotations
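For somatic input, the field descriptions indicate that per-sample INFO annotations used for filtering are moved to the FORMAT field, while `--drop-somatic-filtering-annotations` drops them instead. The sketch below models that choice on a plain `Map`. Which keys count as filtering annotations is left to the caller, and the class `SomaticFilteringAnnotations` and the keys in any example are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: "move to FORMAT" versus "drop" for per-sample filtering annotations.
final class SomaticFilteringAnnotations {
    private SomaticFilteringAnnotations() {}

    /**
     * Removes filteringKeys from the mutable INFO map. If drop is false, the removed
     * entries are returned for the caller to place in FORMAT; if drop is true, they
     * are discarded and an empty map is returned.
     */
    static Map<String, Object> extract(Map<String, Object> info, Set<String> filteringKeys, boolean drop) {
        Map<String, Object> forFormat = new LinkedHashMap<>();
        for (String key : filteringKeys) {
            if (info.containsKey(key)) {
                Object value = info.remove(key);
                if (!drop) {
                    forFormat.put(key, value);
                }
            }
        }
        return forFormat;
    }
}
```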
@ArgumentCollection protected DbsnpArgumentCollection dbsnp
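The `dbsnp` argument collection supplies rsIDs for the output's ID column. The following minimal lookup sketch matches on contig and position only (a real annotator may also compare alleles) and uses `.`, the VCF missing-value marker, when no rsID is known. The class `DbsnpIds` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: position-keyed rsID lookup used to fill the VCF ID column.
final class DbsnpIds {
    private final Map<String, String> rsIdBySite = new HashMap<>();

    private static String key(String contig, int pos) {
        return contig + ":" + pos;
    }

    void add(String contig, int pos, String rsId) {
        rsIdBySite.put(key(contig, pos), rsId);
    }

    /** Value for the ID column: the known rsID, or "." when the site is absent. */
    String idFor(String contig, int pos) {
        return rsIdBySite.getOrDefault(key(contig, pos), ".");
    }
}
```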
public boolean useVariantAnnotations()
Description copied from class: GATKTool
Must be overridden in order to add annotation arguments to the engine. If this returns true, the engine automatically looks for all Annotations in the packages defined by GATKAnnotationPluginDescriptor.getPackageNames() and automatically generates and adds command line arguments allowing the user to specify which annotations or groups of annotations to use.
To specify default annotations for a tool, specify them using GATKTool.getDefaultVariantAnnotationGroups() or GATKTool.getDefaultVariantAnnotations().
To access instantiated annotation objects, use GATKTool.makeVariantAnnotations().
Overrides: useVariantAnnotations in class GATKTool
public java.util.List<java.lang.Class<? extends Annotation>> getDefaultVariantAnnotationGroups()
Description copied from class: GATKTool
Returns the default list of annotation groups that are used for this tool. See also GATKTool.getDefaultVariantAnnotations(). Returned annotation groups are subject to selective enabling/disabling by the user via the command line. The default implementation returns an empty list.
Overrides: getDefaultVariantAnnotationGroups in class GATKTool
public void apply(java.util.List<htsjdk.variant.variantcontext.VariantContext> variantContexts, ReferenceContext referenceContext, java.util.List<ReadsContext> readsContexts)
Description copied from class: MultiVariantWalkerGroupedOnStart
This method must be implemented by tool authors.
Specified by: apply in class MultiVariantWalkerGroupedOnStart
Parameters:
variantContexts - VariantContexts from driving variants with matching start position. NOTE: This will never be empty.
referenceContext - ReferenceContext object covering the reference of the longest spanning VariantContext.

protected static final java.util.Set<java.lang.Integer> getIntermediateStopSites(SimpleInterval intervalToClose, int breakBandMultiple)
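public void onTraversalStart()
Description copied from class: GATKTool
Operations performed just prior to the start of traversal.
Overrides: onTraversalStart in class GATKTool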
public void mergeWithNewVCs(java.util.List<htsjdk.variant.variantcontext.VariantContext> variantContexts, ReferenceContext referenceContext)
Calls endPreviousStates at the appropriate places, given a new startingStates object and an OverallState object corresponding to the currently accumulated reads.
Parameters:
variantContexts - list of variant contexts with the same start position to be reduced
referenceContext - ReferenceContext object overlapping the provided VariantContexts

public java.lang.Object onTraversalSuccess()
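Description copied from class: GATKTool
Operations performed immediately after a successful traversal (i.e., when no uncaught exceptions were thrown during the traversal).
Overrides: onTraversalSuccess in class GATKTool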
public void closeTool()
Description copied from class: GATKTool
This method is called by the GATK framework at the end of the GATKTool.doWork() template method. It is called regardless of whether GATKTool.traverse() has succeeded or not. It is called after GATKTool.onTraversalSuccess() has completed (successfully or not) but before the GATKTool.doWork() method returns.
In other words, on successful runs both GATKTool.onTraversalSuccess() and GATKTool.closeTool() will be called (in this order), while on failed runs (when GATKTool.traverse() causes an exception), only GATKTool.closeTool() will be called.
The default implementation does nothing. Subclasses should override this method to close any resources that must be closed regardless of the success of traversal.
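The ordering guarantees described for `closeTool()` amount to a `try`/`finally` around traversal. The sketch below is a hypothetical stand-in named `ToolLifecycle`, not GATK's `GATKTool`. It only records which hooks run, to show that `closeTool()` runs after `onTraversalSuccess()` on success and runs alone when `traverse()` throws.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the documented template-method ordering; not GATKTool.
abstract class ToolLifecycle {
    final List<String> calls = new ArrayList<>();

    protected abstract void traverse();

    protected Object onTraversalSuccess() {
        calls.add("onTraversalSuccess");
        return null;
    }

    protected void closeTool() {
        calls.add("closeTool"); // close resources whether or not traversal succeeded
    }

    final Object doWork() {
        try {
            traverse();
            // Reached only if traverse() completed without throwing.
            return onTraversalSuccess();
        } finally {
            // Runs on success and on failure, after onTraversalSuccess() whenever it ran.
            closeTool();
        }
    }
}
```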