@DocumentedFeature public class PathSeqBuildReferenceTaxonomy extends CommandLineProgram
The tool reads the list of sequence accessions from the given reference. For each accession, it looks up the NCBI taxonomic ID of the corresponding organism and builds a taxonomic tree containing only organisms that are represented in the reference. The reference should contain only sequences from the NCBI RefSeq and/or GenBank databases.
See argument documentation for information about where to download the archive files.
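The lookup-and-prune idea described above can be illustrated with a minimal sketch. This is not the tool's implementation: the class name, the two maps (standing in for the catalog and the taxonomy dump), and the accession and taxonomic ID values are all hypothetical toy data.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only; names and data are assumptions, not GATK code.
final class TaxonomyPruneSketch {

    /**
     * Returns the taxonomic IDs to keep: every organism referenced by an
     * accession, plus all of its ancestors.
     *
     * @param accessionToTaxId hypothetical stand-in for the catalog lookup
     * @param parentOf         hypothetical child-to-parent map from the taxonomy dump
     */
    static Set<Integer> retainedTaxIds(List<String> referenceAccessions,
                                       Map<String, Integer> accessionToTaxId,
                                       Map<Integer, Integer> parentOf) {
        Set<Integer> retained = new TreeSet<>();
        for (String accession : referenceAccessions) {
            Integer current = accessionToTaxId.get(accession);
            // Accessions absent from the catalog are simply skipped in this sketch.
            // Walk toward the root; stop at a node that is already retained
            // or when no parent entry exists.
            while (current != null && retained.add(current)) {
                current = parentOf.get(current);
            }
        }
        return retained;
    }

    public static void main(String[] args) {
        Map<String, Integer> accessionToTaxId = new HashMap<>();
        accessionToTaxId.put("ACC_1", 4);
        accessionToTaxId.put("ACC_2", 5);

        // Toy tree: 1 is the root; 2 and 3 are its children; 4 and 5 sit under 2; 6 sits under 3.
        Map<Integer, Integer> parentOf = new HashMap<>();
        parentOf.put(2, 1);
        parentOf.put(3, 1);
        parentOf.put(4, 2);
        parentOf.put(5, 2);
        parentOf.put(6, 3);

        Set<Integer> kept = retainedTaxIds(
                Arrays.asList("ACC_1", "ACC_2", "ACC_MISSING"), accessionToTaxId, parentOf);
        System.out.println(kept); // prints [1, 2, 4, 5]
    }
}
```

Nodes 3 and 6 are dropped because no reference accession maps to them or to any of their descendants.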
gatk PathSeqBuildReferenceTaxonomy \
  --reference microbe_reference.fasta \
  --output taxonomy.db \
  --refseq-catalog RefSeq-releaseXX.catalog.gz \
  --tax-dump taxdump.tar.gz \
  --min-non-virus-contig-length 2000
Often there are inconsistencies between the reference sequences, NCBI catalog, and taxonomy archive. To minimize this issue, ensure that the input files are retrieved on the same date.
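One way to spot such inconsistencies before building the taxonomy is to list the reference accessions that have no catalog entry. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, the accessions are toy values, and nothing here reflects how the tool itself reports mismatches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; not part of GATK.
final class CatalogConsistencySketch {

    /** Returns, in input order, the reference accessions that the catalog does not contain. */
    static List<String> accessionsMissingFromCatalog(Collection<String> referenceAccessions,
                                                     Set<String> catalogAccessions) {
        List<String> missing = new ArrayList<>();
        for (String accession : referenceAccessions) {
            if (!catalogAccessions.contains(accession)) {
                missing.add(accession);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> reference = Arrays.asList("ACC_1", "ACC_2", "ACC_3");
        Set<String> catalog = new HashSet<>(Arrays.asList("ACC_1", "ACC_3"));
        System.out.println(accessionsMissingFromCatalog(reference, catalog)); // prints [ACC_2]
    }
}
```

A long list of missing accessions would suggest that the reference and the catalog were downloaded at different times.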
Modifier and Type | Field and Description |
---|---|
static java.lang.String | GENBANK_CATALOG_LONG_NAME |
static java.lang.String | GENBANK_CATALOG_SHORT_NAME |
java.lang.String | genbankCatalogPath: This may be supplied alone or in addition to the RefSeq catalog in the case that sequences from GenBank are present in the reference. |
static java.lang.String | MIN_NON_VIRUS_CONTIG_LENGTH_LONG_NAME |
static java.lang.String | MIN_NON_VIRUS_CONTIG_LENGTH_SHORT_NAME |
int | minNonVirusContigLength: Sequences from non-virus organisms less than this length will be filtered out such that any reads aligning to them will be ignored. |
java.lang.String | outputPath |
protected ReferenceInputArgumentCollection | referenceArguments |
static java.lang.String | REFSEQ_CATALOG_LONG_NAME |
static java.lang.String | REFSEQ_CATALOG_SHORT_NAME |
java.lang.String | refseqCatalogPath |
static java.lang.String | TAX_DUMP_LONG_NAME |
static java.lang.String | TAX_DUMP_SHORT_NAME |
java.lang.String | taxdumpPath |
Fields inherited from class CommandLineProgram: GATK_CONFIG_FILE, logger, NIO_MAX_REOPENS, QUIET, specialArgumentsCollection, TMP_DIR, useJdkDeflater, useJdkInflater, VERBOSITY
Constructor and Description |
---|
PathSeqBuildReferenceTaxonomy() |
Modifier and Type | Method and Description |
---|---|
java.lang.Object | doWork(): Do the work after command line has been parsed. |
Methods inherited from class CommandLineProgram: customCommandLineValidation, getCommandLine, getCommandLineParser, getDefaultHeaders, getMetricsFile, getPluginDescriptors, getSupportInformation, getToolkitName, getToolStatusWarning, getUsage, getVersion, instanceMain, instanceMainPostParseArgs, isBetaFeature, isExperimentalFeature, onShutdown, onStartup, parseArgs, printLibraryVersions, printSettings, printStartupMessage, runTool, setDefaultHeaders, warnOnToolStatus
public static final java.lang.String REFSEQ_CATALOG_LONG_NAME
public static final java.lang.String REFSEQ_CATALOG_SHORT_NAME
public static final java.lang.String GENBANK_CATALOG_LONG_NAME
public static final java.lang.String GENBANK_CATALOG_SHORT_NAME
public static final java.lang.String TAX_DUMP_LONG_NAME
public static final java.lang.String TAX_DUMP_SHORT_NAME
public static final java.lang.String MIN_NON_VIRUS_CONTIG_LENGTH_LONG_NAME
public static final java.lang.String MIN_NON_VIRUS_CONTIG_LENGTH_SHORT_NAME
@ArgumentCollection protected final ReferenceInputArgumentCollection referenceArguments
@Argument(doc="Local path for the output file. By convention, the extension should be \".db\"", shortName="O", fullName="output") public java.lang.String outputPath
@Argument(doc="Local path to catalog file (RefSeq-releaseXX.catalog.gz available at ftp://ftp.ncbi.nlm.nih.gov/refseq/release/release-catalog/)", fullName="refseq-catalog", shortName="RC", optional=true) public java.lang.String refseqCatalogPath
@Argument(doc="Local path to Genbank catalog file (gbXXX.catalog.XXX.txt.gz at ftp://ftp.ncbi.nlm.nih.gov/genbank/catalog/)", fullName="genbank-catalog", shortName="GC", optional=true) public java.lang.String genbankCatalogPath
@Argument(doc="Local path to taxonomy dump tarball (taxdump.tar.gz available at ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/)", fullName="tax-dump", shortName="TD") public java.lang.String taxdumpPath
@Argument(doc="Minimum reference contig length for non-viruses", fullName="min-non-virus-contig-length", shortName="min-non-virus-contig-length", minValue=0.0, minRecommendedValue=500.0, maxRecommendedValue=10000.0) public int minNonVirusContigLength
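The effect of min-non-virus-contig-length, as described for minNonVirusContigLength, can be sketched as a simple filter. This is an illustration under stated assumptions, not the tool's code: the Contig class, its fromVirus flag, and the contig names are hypothetical, and how the tool actually decides that a sequence is viral is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not part of GATK.
final class ContigLengthFilterSketch {

    /** Hypothetical minimal description of a reference contig. */
    static final class Contig {
        final String name;
        final int length;
        final boolean fromVirus;

        Contig(String name, int length, boolean fromVirus) {
            this.name = name;
            this.length = length;
            this.fromVirus = fromVirus;
        }
    }

    /**
     * Keeps every virus contig and every non-virus contig whose length is at
     * least the threshold; shorter non-virus contigs are dropped.
     */
    static List<String> keptContigNames(List<Contig> contigs, int minNonVirusContigLength) {
        List<String> kept = new ArrayList<>();
        for (Contig contig : contigs) {
            if (contig.fromVirus || contig.length >= minNonVirusContigLength) {
                kept.add(contig.name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Contig> contigs = Arrays.asList(
                new Contig("bacterial_long", 5000, false),
                new Contig("bacterial_short", 800, false),
                new Contig("viral_short", 800, true));
        System.out.println(keptContigNames(contigs, 2000)); // prints [bacterial_long, viral_short]
    }
}
```

The short viral contig survives because the threshold applies only to non-virus sequences.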
public java.lang.Object doWork()
Specified by: doWork in class CommandLineProgram