public abstract class GATKTool extends CommandLineProgram
Modifier and Type | Field and Description |
---|---|
boolean | addOutputSAMProgramRecord |
boolean | addOutputVCFCommandLine |
int | cloudIndexPrefetchBuffer |
int | cloudPrefetchBuffer |
boolean | createOutputBamIndex |
boolean | createOutputBamMD5 |
boolean | createOutputVariantIndex |
boolean | createOutputVariantMD5 |
boolean | disableBamIndexCaching |
FeatureManager | features: Our source of Feature data (null if no source of Features was provided) |
protected IntervalArgumentCollection | intervalArgumentCollection |
protected boolean | lenientVCFProcessing |
boolean | outputSitesOnlyVCFs |
protected ProgressMeter | progressMeter: Progress meter to print out traversal statistics. |
protected ReadInputArgumentCollection | readArguments |
protected ReferenceInputArgumentCollection | referenceArguments |
static java.lang.String | SECONDS_BETWEEN_PROGRESS_UPDATES_NAME |
protected SequenceDictionaryValidationArgumentCollection | seqValidationArguments |
Fields inherited from class CommandLineProgram: GATK_CONFIG_FILE, logger, NIO_MAX_REOPENS, NIO_PROJECT_FOR_REQUESTER_PAYS, QUIET, specialArgumentsCollection, tmpDir, useJdkDeflater, useJdkInflater, VERBOSITY
Constructor and Description |
---|
GATKTool() |
Modifier and Type | Method and Description |
---|---|
FeatureInput<? extends htsjdk.tribble.Feature> | addFeatureInputsAfterInitialization(java.lang.String filePath, java.lang.String name, java.lang.Class<? extends htsjdk.tribble.Feature> featureType, int featureQueryLookahead): A method to allow a user to inject FeatureInputs after initialization that were not specified as command-line arguments. |
void | closeTool(): This method is called by the GATK framework at the end of the doWork() template method. |
SAMFileGATKReadWriter | createSAMWriter(java.io.File outputFile, boolean preSorted) |
SAMFileGATKReadWriter | createSAMWriter(java.nio.file.Path outputPath, boolean preSorted) |
htsjdk.variant.variantcontext.writer.VariantContextWriter | createVCFWriter(java.io.File outFile): Creates a VariantContextWriter whose output file type is determined by outFile's extension, using the best available sequence dictionary for this tool, and default index, leniency and md5 generation settings. |
htsjdk.variant.variantcontext.writer.VariantContextWriter | createVCFWriter(java.nio.file.Path outPath): Creates a VariantContextWriter whose output file type is determined by outPath's extension, using the best available sequence dictionary for this tool, and default index, leniency and md5 generation settings. |
protected FeatureManager | directlyAccessEngineFeatureManager(): Get the FeatureManager for this GATKTool. |
protected ReadsDataSource | directlyAccessEngineReadsDataSource(): Get the ReadsDataSource for this GATKTool. |
protected ReferenceDataSource | directlyAccessEngineReferenceDataSource(): Get the ReferenceDataSource for this GATKTool. |
protected java.lang.Object | doWork(): Do the work after command line has been parsed. |
htsjdk.samtools.SAMSequenceDictionary | getBestAvailableSequenceDictionary(): Returns the "best available" sequence dictionary or null if there is no single best dictionary. |
int | getDefaultCloudIndexPrefetchBufferSize() |
int | getDefaultCloudPrefetchBufferSize() |
java.util.List<ReadFilter> | getDefaultReadFilters(): Returns the default list of ReadFilters that are used for this tool. |
protected java.util.Set<htsjdk.variant.vcf.VCFHeaderLine> | getDefaultToolVCFHeaderLines() |
java.util.List<java.lang.Class<? extends Annotation>> | getDefaultVariantAnnotationGroups(): Returns the default list of annotation groups that are used for this tool. |
java.util.List<Annotation> | getDefaultVariantAnnotations(): Returns the default list of Annotations that are used for this tool. |
protected GenomicsDBOptions | getGenomicsDBOptions(): Get the GenomicsDB read settings for the current tool. |
<T extends htsjdk.tribble.Feature> java.lang.Object | getHeaderForFeatures(FeatureInput<T> featureDescriptor): Returns the header for the specified source of Features. |
htsjdk.samtools.SAMFileHeader | getHeaderForReads(): Returns the SAM header for the reads data source. |
protected htsjdk.samtools.SAMFileHeader | getHeaderForSAMWriter(): Returns the SAM header suitable for writing SAM/BAM/CRAM files produced by this tool. |
htsjdk.samtools.SAMSequenceDictionary | getMasterSequenceDictionary(): Returns the master sequence dictionary if it has been set, otherwise null. |
java.util.List<? extends org.broadinstitute.barclay.argparser.CommandLinePluginDescriptor<?>> | getPluginDescriptors(): Return the list of GATKCommandLinePluginDescriptors to be used for this tool. |
java.lang.String | getProgressMeterRecordLabel() |
htsjdk.samtools.SAMSequenceDictionary | getReferenceDictionary(): Returns the reference sequence dictionary if there is a reference (hasReference() == true), otherwise null. |
protected SequenceDictionaryValidationArgumentCollection | getSequenceDictionaryValidationArgumentCollection(): Get the SequenceDictionaryValidationArgumentCollection for the tool. |
java.lang.String | getToolName(): Returns the name of this tool. |
protected java.util.stream.Stream<GATKRead> | getTransformedReadStream(ReadFilter filter): Returns a stream over the reads, which are transformed with makePreReadFilterTransformer(), filtered with filter, and then transformed with makePostReadFilterTransformer(). |
java.util.List<SimpleInterval> | getTraversalIntervals(): Returns the list of intervals to iterate, either limited to the user-supplied intervals or the entire reference genome if none were specified. |
boolean | hasFeatures(): Are sources of Features available? |
boolean | hasReads(): Are sources of reads available? |
boolean | hasReference(): Is a source of reference data available? |
boolean | hasUserSuppliedIntervals(): Are sources of intervals available? |
ReadTransformer | makePostReadFilterTransformer(): Returns the post-filter read transformer (simple or composite) that will be applied to the reads after filtering. |
ReadTransformer | makePreReadFilterTransformer(): Returns the pre-filter read transformer (simple or composite) that will be applied to the reads before filtering. |
CountingReadFilter | makeReadFilter(): Returns a read filter (simple or composite) that can be applied to reads. |
java.util.Collection<Annotation> | makeVariantAnnotations(): Returns a list of annotations that can be applied to VariantContexts. |
protected void | onShutdown(): Close all data sources on shutdown. |
protected void | onStartup(): Initialize our data sources, make sure that all tool requirements for input data have been satisfied and start the progress meter. |
void | onTraversalStart(): Operations performed just prior to the start of traversal. |
java.lang.Object | onTraversalSuccess(): Operations performed immediately after a successful traversal (i.e., when no uncaught exceptions were thrown during the traversal). |
boolean | requiresFeatures(): Does this tool require features? Traversal types and/or tools that do should override to return true. |
boolean | requiresIntervals(): Does this tool require intervals? Traversal types and/or tools that do should override to return true. |
boolean | requiresReads(): Does this tool require reads? Traversal types and/or tools that do should override to return true. |
boolean | requiresReference(): Does this tool require reference data? Traversal types and/or tools that do should override to return true. |
protected java.util.List<SimpleInterval> | transformTraversalIntervals(java.util.List<SimpleInterval> getIntervals, htsjdk.samtools.SAMSequenceDictionary sequenceDictionary) |
abstract void | traverse(): A complete traversal from start to finish. |
boolean | useVariantAnnotations(): Must be overridden in order to add annotation arguments to the engine. |
Methods inherited from class CommandLineProgram: customCommandLineValidation, getCommandLine, getCommandLineParser, getDefaultHeaders, getMetricsFile, getSupportInformation, getToolkitName, getToolkitShortName, getToolStatusWarning, getUsage, getVersion, instanceMain, instanceMainPostParseArgs, isBetaFeature, isExperimentalFeature, parseArgs, printLibraryVersions, printSettings, printStartupMessage, runTool, setDefaultHeaders, warnOnToolStatus
@ArgumentCollection protected IntervalArgumentCollection intervalArgumentCollection
@ArgumentCollection protected final ReadInputArgumentCollection readArguments
@ArgumentCollection protected final ReferenceInputArgumentCollection referenceArguments
public static final java.lang.String SECONDS_BETWEEN_PROGRESS_UPDATES_NAME
@ArgumentCollection protected SequenceDictionaryValidationArgumentCollection seqValidationArguments
@Argument(fullName="create-output-bam-index", shortName="OBI", doc="If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file.", optional=true, common=true) public boolean createOutputBamIndex
@Argument(fullName="create-output-bam-md5", shortName="OBM", doc="If true, create a MD5 digest for any BAM/SAM/CRAM file created", optional=true, common=true) public boolean createOutputBamMD5
@Argument(fullName="create-output-variant-index", shortName="OVI", doc="If true, create a VCF index when writing a coordinate-sorted VCF file.", optional=true, common=true) public boolean createOutputVariantIndex
@Argument(fullName="create-output-variant-md5", shortName="OVM", doc="If true, create a a MD5 digest any VCF file created.", optional=true, common=true) public boolean createOutputVariantMD5
@Argument(fullName="lenient", shortName="LE", doc="Lenient processing of VCF files", common=true, optional=true) protected boolean lenientVCFProcessing
@Argument(fullName="add-output-sam-program-record", shortName="add-output-sam-program-record", doc="If true, adds a PG tag to created SAM/BAM/CRAM files.", optional=true, common=true) public boolean addOutputSAMProgramRecord
@Argument(fullName="add-output-vcf-command-line", shortName="add-output-vcf-command-line", doc="If true, adds a command line header line to created VCF files.", optional=true, common=true) public boolean addOutputVCFCommandLine
@Argument(fullName="cloud-prefetch-buffer", shortName="CPB", doc="Size of the cloud-only prefetch buffer (in MB; 0 to disable).", optional=true) public int cloudPrefetchBuffer
@Argument(fullName="cloud-index-prefetch-buffer", shortName="CIPB", doc="Size of the cloud-only prefetch buffer (in MB; 0 to disable). Defaults to cloudPrefetchBuffer if unset.", optional=true) public int cloudIndexPrefetchBuffer
@Argument(fullName="disable-bam-index-caching", shortName="DBIC", doc="If true, don\'t cache bam indexes, this will reduce memory requirements but may harm performance if many intervals are specified. Caching is automatically disabled if there are no intervals specified.", optional=true) public boolean disableBamIndexCaching
@Argument(fullName="sites-only-vcf-output", doc="If true, don\'t emit genotype fields when writing vcf file output.", optional=true) public boolean outputSitesOnlyVCFs
public FeatureManager features
protected ProgressMeter progressMeter
Progress meter to print out traversal statistics. Subclasses should call ProgressMeter.update(Locatable) after each record processed from the primary input in their traverse() method.
protected ReferenceDataSource directlyAccessEngineReferenceDataSource()
Get the ReferenceDataSource for this GATKTool. Will throw a GATKException if the reference is null. Clients are expected to call the hasReference() method prior to calling this.
Should only be called by walker base classes in the engine (such as ReadWalker), or by "free-form" tools that extend the GATKTool class directly rather than one of the built-in walker types. Tools that extend a walker type should get their data via apply() rather than directly accessing the engine datasources.
Returns: the ReferenceDataSource for this GATKTool. Never null.
protected ReadsDataSource directlyAccessEngineReadsDataSource()
Get the ReadsDataSource for this GATKTool. Will throw a GATKException if the reads are null. Clients are expected to call the hasReads() method prior to calling this.
Should only be called by walker base classes in the engine (such as ReadWalker), or by "free-form" tools that extend the GATKTool class directly rather than one of the built-in walker types. Tools that extend a walker type should get their data via apply() rather than directly accessing the engine datasources.
Returns: the ReadsDataSource for this GATKTool. Never null.
protected FeatureManager directlyAccessEngineFeatureManager()
Get the FeatureManager for this GATKTool. Will throw a GATKException if the features are null. Clients are expected to call the hasFeatures() method prior to calling this.
Should only be called by walker base classes in the engine (such as ReadWalker), or by "free-form" tools that extend the GATKTool class directly rather than one of the built-in walker types. Tools that extend a walker type should get their data via apply() rather than directly accessing the engine datasources.
Returns: the FeatureManager for this GATKTool. Never null.
public java.util.List<? extends org.broadinstitute.barclay.argparser.CommandLinePluginDescriptor<?>> getPluginDescriptors()
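Return the list of GATKCommandLinePluginDescriptors to be used for this tool.
Specified by: getPluginDescriptors in interface org.broadinstitute.barclay.argparser.CommandLinePluginProvider
Overrides: getPluginDescriptors in class CommandLineProgram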
public java.util.List<ReadFilter> getDefaultReadFilters()
Returns the default list of ReadFilters that are used for this tool.
The actual SAMFileHeader (see getHeaderForReads()) is propagated to the read filters by makeReadFilter() after the filters have been merged with command line arguments.
public CountingReadFilter makeReadFilter()
Returns a read filter (simple or composite) that can be applied to reads. The default implementation merges the filters returned by getDefaultReadFilters() with any read filter command line directives specified by the user (such as enabling other filters or disabling default filters); wraps each filter in the resulting list with a CountingReadFilter; and returns a single composite filter formed by and'ing them together.
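The wrap-and-combine pattern described above can be pictured with plain java.util.function.Predicate. This is only a minimal sketch: CountingFilter and andAll are hypothetical stand-ins (not GATK's CountingReadFilter or its API), and reads are modeled as strings.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of counting filters and'ed into one composite; not the GATK classes. */
final class CountingFilterSketch {

    /** Wraps a predicate and counts how many items it rejects. */
    static final class CountingFilter<T> implements Predicate<T> {
        private final String name;
        private final Predicate<T> delegate;
        private long filteredCount = 0;

        CountingFilter(String name, Predicate<T> delegate) {
            this.name = name;
            this.delegate = delegate;
        }

        @Override
        public boolean test(T item) {
            boolean keep = delegate.test(item);
            if (!keep) {
                filteredCount++;
            }
            return keep;
        }

        long getFilteredCount() { return filteredCount; }
        String getName() { return name; }
    }

    /** Ands a list of filters into one composite; an empty list accepts everything. */
    static <T> Predicate<T> andAll(List<? extends Predicate<T>> filters) {
        Predicate<T> composite = item -> true;
        for (Predicate<T> f : filters) {
            composite = composite.and(f);
        }
        return composite;
    }

    public static void main(String[] args) {
        CountingFilter<String> nonEmpty = new CountingFilter<>("nonEmpty", s -> !s.isEmpty());
        CountingFilter<String> noN = new CountingFilter<>("noN", s -> !s.contains("N"));
        List<CountingFilter<String>> filters = List.of(nonEmpty, noN);

        // Build the composite once, before iterating, and reuse it for every read.
        Predicate<String> composite = andAll(filters);

        int kept = 0;
        for (String read : List.of("ACGT", "", "ANNT")) {
            if (composite.test(read)) {
                kept++;
            }
        }
        System.out.println("kept=" + kept);
        for (CountingFilter<String> f : filters) {
            System.out.println(f.getName() + " filtered " + f.getFilteredCount());
        }
    }
}
```

Because Predicate.and short-circuits, a later filter in this sketch only counts items that passed every earlier filter.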
NOTE: Most tools will not need to override this method, and should only do so in order to provide custom behavior or processing of the final merged read filter. To change the default read filters used by the tool, override getDefaultReadFilters() instead.
Implementations of traverse() should call this method once before iterating over the reads, in order to avoid unnecessary object allocation. Nevertheless, keeping state in filter objects is strongly discouraged.
Multiple filters can be composed by using ReadFilter composition methods.
public boolean useVariantAnnotations()
Must be overridden in order to add annotation arguments to the engine. When this returns true, the engine discovers the Annotations in the package defined by org.broadinstitute.hellbender.cmdline.GATKPlugin.GATKAnnotationPluginDescriptor#pluginPackageName and automatically generates and adds command line arguments allowing the user to specify which annotations or groups of annotations to use.
To specify default annotations for a tool, override getDefaultVariantAnnotationGroups() or getDefaultVariantAnnotations(). To access instantiated annotation objects, use makeVariantAnnotations().
public java.util.List<Annotation> getDefaultVariantAnnotations()
Returns the default list of Annotations that are used for this tool. The annotations returned by this method are subject to selective enabling/disabling by the user via the command line. The default implementation returns an empty list. Subclasses can override to provide alternative annotations.
public java.util.List<java.lang.Class<? extends Annotation>> getDefaultVariantAnnotationGroups()
Returns the default list of annotation groups that are used for this tool (compare getDefaultVariantAnnotations()). Returned annotation groups are subject to selective enabling/disabling by the user via the command line. The default implementation returns an empty list.
public java.util.Collection<Annotation> makeVariantAnnotations()
Returns a list of annotations that can be applied to VariantContexts. The default implementation combines the default annotations for this tool (returned by getDefaultVariantAnnotations() and getDefaultVariantAnnotationGroups()) with any annotation command line directives specified by the user (such as enabling other annotations/groups or disabling default annotations) and returns a collection of all the annotation arguments instantiated.
NOTE: Most tools will not need to override this method, and should only do so in order to provide custom behavior or processing of the final annotations based on other command line input. To change the default annotations used by the tool, override getDefaultVariantAnnotations() instead.
To apply returned annotations to a VariantContext, use a VariantAnnotatorEngine constructed with the discovered annotations.
public ReadTransformer makePreReadFilterTransformer()
Returns the pre-filter read transformer (simple or composite) that will be applied to the reads before filtering. The default implementation returns ReadTransformer.identity().
Default implementation of traverse() calls this method once before iterating over the reads and reuses the transformer object to avoid object allocation.
Subclasses can extend to provide their own transformers (i.e., override and call super). Multiple transformers can be composed by using ReadTransformer composition methods.
public ReadTransformer makePostReadFilterTransformer()
Returns the post-filter read transformer (simple or composite) that will be applied to the reads after filtering. The default implementation returns ReadTransformer.identity().
Default implementation of traverse() calls this method once before iterating over the reads and reuses the transformer object to avoid object allocation.
Subclasses can extend to provide their own transformers (i.e., override and call super). Multiple transformers can be composed by using ReadTransformer composition methods.
protected java.util.stream.Stream<GATKRead> getTransformedReadStream(ReadFilter filter)
Returns a stream over the reads, which are:
1. Transformed with makePreReadFilterTransformer().
2. Filtered with filter.
3. Transformed with makePostReadFilterTransformer().
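The three steps above map naturally onto a java.util.stream pipeline. The following is a minimal sketch using only JDK types; transformedStream is a hypothetical helper, reads are modeled as strings, and the trimming, length, and upper-casing rules are illustrative assumptions rather than anything GATK does.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of a pre-transform, filter, post-transform pipeline; not GATK code. */
final class TransformedStreamSketch {

    /** Applies pre, then keeps items accepted by filter, then applies post. */
    static <T> Stream<T> transformedStream(Stream<T> reads,
                                           UnaryOperator<T> pre,
                                           Predicate<T> filter,
                                           UnaryOperator<T> post) {
        return reads.map(pre).filter(filter).map(post);
    }

    public static void main(String[] args) {
        List<String> out = transformedStream(
                Stream.of(" acgt ", "ac", "ttttt "),
                String::trim,              // pre-filter transformer (illustrative)
                s -> s.length() >= 4,      // filter sees the pre-transformed value
                String::toUpperCase)       // post-filter transformer (illustrative)
            .collect(Collectors.toList());
        System.out.println(out);   // [ACGT, TTTTT]
    }
}
```

Note that the filter operates on the output of the first transformer, so the order of the two transformers relative to the filter matters.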
Note: the filter is passed to keep the state of CountingReadFilter, obtained with makeReadFilter().
public int getDefaultCloudPrefetchBufferSize()
This value is maintained in the GATKConfig file.
public int getDefaultCloudIndexPrefetchBufferSize()
A return value of -1 means to use the value returned by getDefaultCloudPrefetchBufferSize(). The default implementation returns -1.
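As a rough illustration of that fallback, the sketch below resolves the index prefetch buffer size from the two settings. It assumes that any negative value means "unset"; the class and method names are hypothetical and not part of GATK.

```java
/** Hypothetical sketch of the index prefetch buffer fallback; not GATK code. */
final class PrefetchBufferSketch {

    /**
     * Returns the index prefetch buffer size in MB.
     * Assumption: a negative index value means "unset", so the general
     * cloud prefetch buffer size is used instead. 0 still means "disabled".
     */
    static int effectiveIndexPrefetchBuffer(int cloudPrefetchBuffer, int cloudIndexPrefetchBuffer) {
        return cloudIndexPrefetchBuffer < 0 ? cloudPrefetchBuffer : cloudIndexPrefetchBuffer;
    }

    public static void main(String[] args) {
        System.out.println(effectiveIndexPrefetchBuffer(40, -1));  // 40 (falls back)
        System.out.println(effectiveIndexPrefetchBuffer(40, 8));   // 8  (explicit value wins)
        System.out.println(effectiveIndexPrefetchBuffer(40, 0));   // 0  (explicitly disabled)
    }
}
```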
This value is maintained in the GATKConfig file.
public java.lang.String getProgressMeterRecordLabel()
The record label for the progress meter defaults to ProgressMeter.DEFAULT_RECORD_LABEL, but tools may override to provide a more appropriate label (like "reads" or "regions").
protected java.util.List<SimpleInterval> transformTraversalIntervals(java.util.List<SimpleInterval> getIntervals, htsjdk.samtools.SAMSequenceDictionary sequenceDictionary)
protected GenomicsDBOptions getGenomicsDBOptions()
public final boolean hasReference()
public final boolean hasReads()
public final boolean hasFeatures()
public final boolean hasUserSuppliedIntervals()
public boolean requiresReference()
public boolean requiresFeatures()
public boolean requiresReads()
public boolean requiresIntervals()
protected SequenceDictionaryValidationArgumentCollection getSequenceDictionaryValidationArgumentCollection()
Get the SequenceDictionaryValidationArgumentCollection for the tool. Subclasses may override this method in order to customize validation options.
public final htsjdk.samtools.SAMSequenceDictionary getReferenceDictionary()
public final htsjdk.samtools.SAMSequenceDictionary getMasterSequenceDictionary()
onStartup()
public htsjdk.samtools.SAMSequenceDictionary getBestAvailableSequenceDictionary()
Returns the "best available" sequence dictionary or null if there is no single best dictionary.
The algorithm for selecting the best dictionary is as follows:
1) If a master sequence dictionary was specified, use that dictionary.
2) Otherwise, if there is a reference, then the best dictionary is the reference sequence dictionary.
3) Otherwise, if there are reads, then the best dictionary is the sequence dictionary constructed from the reads.
4) Otherwise, if there are features and the feature data source has only one dictionary, then that one is the best dictionary.
5) Otherwise, the result is null.
TODO: check interval file(s) as well for a sequence dictionary
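The selection order above is a simple priority chain. Here is a minimal sketch of it, with dictionaries modeled as plain strings and a null argument standing for "not available"; chooseBest and its parameters are hypothetical names, not the GATK implementation.

```java
import java.util.List;

/** Hypothetical sketch of the dictionary selection order; not GATK code. */
final class BestDictionarySketch {

    /**
     * Picks the first available dictionary in priority order:
     * master, reference, reads, then features (only when exactly one
     * feature dictionary exists). Returns null when there is no single best one.
     */
    static String chooseBest(String master, String reference, String reads,
                             List<String> featureDictionaries) {
        if (master != null) {
            return master;
        }
        if (reference != null) {
            return reference;
        }
        if (reads != null) {
            return reads;
        }
        if (featureDictionaries != null && featureDictionaries.size() == 1) {
            return featureDictionaries.get(0);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(chooseBest(null, "ref.dict", "reads.dict", List.of()));      // ref.dict
        System.out.println(chooseBest(null, null, null, List.of("features.dict")));     // features.dict
        System.out.println(chooseBest(null, null, null, List.of("a.dict", "b.dict")));  // null
    }
}
```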
Subclasses may override if they prefer a different algorithm.
Returns: the best available sequence dictionary, or null if no one dictionary is the best one.
public final htsjdk.samtools.SAMFileHeader getHeaderForReads()
public final <T extends htsjdk.tribble.Feature> java.lang.Object getHeaderForFeatures(FeatureInput<T> featureDescriptor)
Returns the header for the specified source of Features
Type Parameters: T - type of Feature in our FeatureInput
Parameters: featureDescriptor - FeatureInput whose header to retrieve
protected void onStartup()
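Initialize our data sources, make sure that all tool requirements for input data have been satisfied and start the progress meter.
Overrides: onStartup in class CommandLineProgram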
public final SAMFileGATKReadWriter createSAMWriter(java.io.File outputFile, boolean preSorted)
public final SAMFileGATKReadWriter createSAMWriter(java.nio.file.Path outputPath, boolean preSorted)
public htsjdk.variant.variantcontext.writer.VariantContextWriter createVCFWriter(java.io.File outFile)
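Creates a VariantContextWriter whose output file type is determined by outFile's extension, using the best available sequence dictionary for this tool, and default index, leniency and md5 generation settings. Use createVCFWriter(Path) instead.
Parameters: outFile - output File for this writer. May not be null.
public htsjdk.variant.variantcontext.writer.VariantContextWriter createVCFWriter(java.nio.file.Path outPath)
Creates a VariantContextWriter whose output file type is determined by outPath's extension, using the best available sequence dictionary for this tool, and default index, leniency and md5 generation settings.
Parameters: outPath - output Path for this writer. May not be null.
protected htsjdk.samtools.SAMFileHeader getHeaderForSAMWriter()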
Returns the SAM header suitable for writing SAM/BAM/CRAM files produced by this tool. The default implementation calls getHeaderForReads() (and makes an empty header if that call returns null) and optionally adds a program tag to the header with the program version (CommandLineProgram.getVersion()), program name (getToolName()) and command line (CommandLineProgram.getCommandLine()).
Subclasses may override.
Returns: the SAM header, with a program record added when addOutputSAMProgramRecord is true.
protected java.util.Set<htsjdk.variant.vcf.VCFHeaderLine> getDefaultToolVCFHeaderLines()
public FeatureInput<? extends htsjdk.tribble.Feature> addFeatureInputsAfterInitialization(java.lang.String filePath, java.lang.String name, java.lang.Class<? extends htsjdk.tribble.Feature> featureType, int featureQueryLookahead)
A method to allow a user to inject FeatureInputs after initialization that were not specified as command-line arguments.
Parameters:
filePath - path to the Feature file to register
name - what to call the Feature input
featureType - class of features
featureQueryLookahead - look ahead this many bases during queries that produce cache misses
Returns: the FeatureInput used as the key for this data source.
public java.lang.String getToolName()
Returns the name of this tool. The default implementation returns the result of CommandLineProgram.getToolkitShortName() followed by the simple name of the class. Subclasses may override.
public java.util.List<SimpleInterval> getTraversalIntervals()
protected void onShutdown()
Close all data sources on shutdown.
Overrides: onShutdown in class CommandLineProgram
public void onTraversalStart()
public abstract void traverse()
public java.lang.Object onTraversalSuccess()
protected final java.lang.Object doWork()
Description copied from class: CommandLineProgram
Do the work after command line has been parsed.
Overrides: doWork in class CommandLineProgram
public void closeTool()
This method is called by the GATK framework at the end of the doWork() template method. It is called regardless of whether traverse() has succeeded or not. It is called after onTraversalSuccess() has completed (successfully or not) but before the doWork() method returns.
In other words, on successful runs both onTraversalSuccess() and closeTool() will be called (in this order) while on failed runs (when traverse() causes an exception), only closeTool() will be called.
The default implementation does nothing. Subclasses should override this method to close any resources that must be closed regardless of the success of traversal.
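The call ordering described above is what a try/finally template method produces. The sketch below reproduces only that ordering with a hypothetical stand-in class; it is not the GATK doWork() implementation, and the calls list exists purely to make the order observable.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the doWork() template method ordering; not GATK code. */
abstract class LifecycleSketch {
    final List<String> calls = new ArrayList<>();

    abstract void traverse();

    Object onTraversalSuccess() {
        calls.add("onTraversalSuccess");
        return null;
    }

    void closeTool() {
        calls.add("closeTool");
    }

    /** Runs traverse(), then onTraversalSuccess(); closeTool() runs in finally either way. */
    final Object doWork() {
        try {
            calls.add("traverse");
            traverse();
            return onTraversalSuccess();
        } finally {
            closeTool();
        }
    }

    public static void main(String[] args) {
        LifecycleSketch ok = new LifecycleSketch() {
            @Override void traverse() { }
        };
        ok.doWork();
        System.out.println(ok.calls);       // [traverse, onTraversalSuccess, closeTool]

        LifecycleSketch failing = new LifecycleSketch() {
            @Override void traverse() { throw new IllegalStateException("traversal failed"); }
        };
        try {
            failing.doWork();
        } catch (IllegalStateException expected) {
            // the exception still propagates after closeTool() has run
        }
        System.out.println(failing.calls);  // [traverse, closeTool]
    }
}
```

In a real tool, closeTool() is therefore the place for releasing writers and other resources, while onTraversalSuccess() is reserved for work that only makes sense after a complete traversal.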