T - The type of Feature returned by this data source
public final class FeatureDataSource<T extends htsjdk.tribble.Feature> extends java.lang.Object implements GATKDataSource<T>, java.lang.AutoCloseable
Two basic operations are available on this data source:
- Iteration over all Features in this data source, optionally restricted to Features overlapping a set of intervals if intervals are provided via setIntervalsForTraversal(List). Traversal by a set of intervals requires the file to have been indexed using the bundled tool IndexFeatureFile. The set of intervals provided MUST be non-overlapping and sorted in increasing order of start position.
- Targeted queries by one interval at a time. This also requires the file to have been indexed using the bundled tool IndexFeatureFile. Targeted queries by one interval at a time are unaffected by any intervals for full traversal set via setIntervalsForTraversal(List).
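The requirement that traversal intervals be non-overlapping and sorted by increasing start position can be checked before the list is handed to the data source. The following is a minimal, self-contained sketch of such a check. The class TraversalIntervalCheck and its nested Span type are hypothetical stand-ins invented for illustration (they are not part of GATK), and the sketch assumes all intervals lie on a single contig with 1-based, inclusive coordinates.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of GATK: validates a list of intervals on ONE contig. */
final class TraversalIntervalCheck {

    /** Minimal stand-in for an interval (1-based, inclusive); not the GATK SimpleInterval class. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            if (start > end) {
                throw new IllegalArgumentException("start must be <= end");
            }
            this.start = start;
            this.end = end;
        }
    }

    /** Returns true if spans are sorted by increasing start and no two spans overlap. */
    static boolean isSortedAndNonOverlapping(List<Span> spans) {
        for (int i = 1; i < spans.size(); i++) {
            Span previous = spans.get(i - 1);
            Span current = spans.get(i);
            // With inclusive coordinates, each span must begin strictly after the
            // previous one ends; this also implies increasing start positions.
            if (current.start <= previous.end) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Span> good = Arrays.asList(new Span(1, 10), new Span(11, 20), new Span(50, 60));
        List<Span> overlapping = Arrays.asList(new Span(1, 10), new Span(10, 20));
        System.out.println("good list valid: " + isSortedAndNonOverlapping(good));
        System.out.println("overlapping list valid: " + isSortedAndNonOverlapping(overlapping));
    }
}
```

A real multi-contig check would also need to compare contigs in sequence-dictionary order, which this sketch deliberately leaves out.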
To improve performance in the case of targeted queries by one interval at a time, this class uses a caching scheme that is optimized for the common access pattern of multiple separate queries over intervals with gradually increasing start positions. It optimizes for this use case by pre-fetching records immediately following each interval during a query and caching them. Performance will suffer if the access pattern is random, involves queries over intervals with DECREASING start positions instead of INCREASING start positions, or involves lots of very large jumps forward on the genome or lots of contig switches. Query caching can be disabled, if desired.
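To make the access-pattern trade-off concrete, here is a simplified, self-contained model of a look-ahead cache. It is a sketch under stated assumptions, not the caching code used by this class: LookaheadCacheSketch, its fields, and its methods are hypothetical names, features are reduced to integer positions on one contig, and the backing "file" is a sorted array. On a cache miss the model reads the requested range plus lookaheadBases extra bases; later queries that fall entirely inside the cached range are served without touching the backing store.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of look-ahead query caching; not the GATK implementation. */
final class LookaheadCacheSketch {
    private final int[] sortedPositions;   // stand-in for an indexed, position-sorted feature file
    private final int lookaheadBases;      // extra bases to read past the query end on a miss
    private List<Integer> cached = new ArrayList<>();
    private int cachedStart = 0;
    private int cachedEnd = -1;            // cachedEnd < cachedStart means "nothing cached"
    private int backingFetches = 0;

    LookaheadCacheSketch(int[] sortedPositions, int lookaheadBases) {
        this.sortedPositions = sortedPositions.clone();
        this.lookaheadBases = lookaheadBases;
    }

    /** Returns all positions in [start, end], refilling the cache only on a miss. */
    List<Integer> query(int start, int end) {
        boolean hit = cachedStart <= start && end <= cachedEnd;
        if (!hit) {
            // Cache miss: read the requested range plus the look-ahead window.
            cachedStart = start;
            cachedEnd = end + lookaheadBases;
            cached = fetchFromBacking(cachedStart, cachedEnd);
            backingFetches++;
        }
        List<Integer> result = new ArrayList<>();
        for (int position : cached) {
            if (position >= start && position <= end) {
                result.add(position);
            }
        }
        return result;
    }

    /** Simulates an indexed read of the backing store (a linear scan keeps the sketch short). */
    private List<Integer> fetchFromBacking(int start, int end) {
        List<Integer> found = new ArrayList<>();
        for (int position : sortedPositions) {
            if (position >= start && position <= end) {
                found.add(position);
            }
        }
        return found;
    }

    int backingFetches() {
        return backingFetches;
    }

    public static void main(String[] args) {
        LookaheadCacheSketch cache =
                new LookaheadCacheSketch(new int[] {5, 50, 120, 400, 5000}, 1000);
        cache.query(1, 100);      // miss: caches [1, 1100]
        cache.query(110, 130);    // served from the cache
        cache.query(300, 500);    // served from the cache
        cache.query(4000, 6000);  // large jump forward: miss
        cache.query(1, 100);      // decreasing start position: miss
        System.out.println("backing fetches: " + cache.backingFetches());
    }
}
```

In this model, queries with gradually increasing start positions mostly stay inside the cached window, while a large forward jump or a move backward forces a fresh read, which mirrors the performance caveats described above.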
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_QUERY_LOOKAHEAD_BASES - Default value for queryLookaheadBases, if none is specified. |
Constructor and Description |
---|
FeatureDataSource(FeatureInput<T> featureInput, int queryLookaheadBases, java.lang.Class<? extends htsjdk.tribble.Feature> targetFeatureType) - Creates a FeatureDataSource backed by the provided FeatureInput. |
FeatureDataSource(FeatureInput<T> featureInput, int queryLookaheadBases, java.lang.Class<? extends htsjdk.tribble.Feature> targetFeatureType, int cloudPrefetchBuffer, int cloudIndexPrefetchBuffer) - Creates a FeatureDataSource backed by the provided FeatureInput. |
FeatureDataSource(FeatureInput<T> featureInput, int queryLookaheadBases, java.lang.Class<? extends htsjdk.tribble.Feature> targetFeatureType, int cloudPrefetchBuffer, int cloudIndexPrefetchBuffer, GenomicsDBOptions genomicsDBOptions) - Creates a FeatureDataSource backed by the provided FeatureInput. |
FeatureDataSource(FeatureInput<T> featureInput, int queryLookaheadBases, java.lang.Class<? extends htsjdk.tribble.Feature> targetFeatureType, int cloudPrefetchBuffer, int cloudIndexPrefetchBuffer, GenomicsDBOptions genomicsDBOptions, boolean setNameOnCodec) - Creates a FeatureDataSource backed by the provided FeatureInput. |
FeatureDataSource(FeatureInput<T> featureInput, int queryLookaheadBases, java.lang.Class<? extends htsjdk.tribble.Feature> targetFeatureType, int cloudPrefetchBuffer, int cloudIndexPrefetchBuffer, java.nio.file.Path reference) - Creates a FeatureDataSource backed by the provided FeatureInput. |
FeatureDataSource(FeatureInput<T> featureInput, int queryLookaheadBases, java.lang.Class<? extends htsjdk.tribble.Feature> targetFeatureType, int cloudPrefetchBuffer, int cloudIndexPrefetchBuffer, java.nio.file.Path reference, boolean setNameOnCodec) - Creates a FeatureDataSource backed by the provided FeatureInput. |
FeatureDataSource(java.io.File featureFile) - Creates a FeatureDataSource backed by the provided File. |
FeatureDataSource(java.io.File featureFile, java.lang.String name) - Creates a FeatureDataSource backed by the provided File and assigns this data source the specified logical name. |
FeatureDataSource(java.io.File featureFile, java.lang.String name, int queryLookaheadBases) - Creates a FeatureDataSource backed by the provided File and assigns this data source the specified logical name. |
FeatureDataSource(java.lang.String featurePath) - Creates a FeatureDataSource backed by the provided path. |
FeatureDataSource(java.lang.String featurePath, java.lang.String name, int queryLookaheadBases, java.lang.Class<? extends htsjdk.tribble.Feature> targetFeatureType) - Creates a FeatureDataSource backed by the resource at the provided path. |
FeatureDataSource(java.lang.String featurePath, java.lang.String name, int queryLookaheadBases, java.lang.Class<? extends htsjdk.tribble.Feature> targetFeatureType, int cloudPrefetchBuffer, int cloudIndexPrefetchBuffer) - Creates a FeatureDataSource backed by the resource at the provided path. |
Modifier and Type | Method and Description |
---|---|
void | close() - Permanently close this data source, invalidating any open iteration over it, and making it invalid for future iterations and queries. |
protected static htsjdk.tribble.FeatureReader<htsjdk.variant.variantcontext.VariantContext> | getGenomicsDBFeatureReader(GATKPath path, java.io.File reference, GenomicsDBOptions genomicsDBOptions) |
java.lang.Object | getHeader() - Gets the header associated with this data source. |
java.lang.String | getName() - Get the logical name of this data source. |
htsjdk.samtools.SAMSequenceDictionary | getSequenceDictionary() - Returns the sequence dictionary for this source of Features. |
java.util.Iterator<T> | iterator() - Gets an iterator over all Features in this data source, restricting traversal to Features overlapping our intervals if intervals were provided via setIntervalsForTraversal(List). |
java.util.Iterator<T> | query(SimpleInterval interval) - Gets an iterator over all Features in this data source that overlap the provided interval. |
java.util.List<T> | queryAndPrefetch(htsjdk.samtools.util.Locatable interval) - Returns a List of all Features in this data source that overlap the provided interval. |
void | setIntervalsForTraversal(java.util.List<SimpleInterval> intervals) - Restricts traversals of this data source via iterator() to only return Features that overlap the provided intervals. |
public static final int DEFAULT_QUERY_LOOKAHEAD_BASES
public FeatureDataSource(java.io.File featureFile)
Creates a FeatureDataSource backed by the provided File. Looks ahead the default number of bases (DEFAULT_QUERY_LOOKAHEAD_BASES) during queries that produce cache misses.
featureFile - file containing Features
public FeatureDataSource(java.lang.String featurePath)
Creates a FeatureDataSource backed by the provided path. Looks ahead the default number of bases (DEFAULT_QUERY_LOOKAHEAD_BASES) during queries that produce cache misses.
featurePath - path or URI to source of Features
public FeatureDataSource(java.io.File featureFile, java.lang.String name)
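Creates a FeatureDataSource backed by the provided File and assigns this data source the specified logical name. Looks ahead the default number of bases (DEFAULT_QUERY_LOOKAHEAD_BASES) during queries that produce cache misses.
featureFile - file containing Features
name - logical name for this data source (may be null)
public FeatureDataSource(java.io.File featureFile, java.lang.String name, int queryLookaheadBases)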
featureFile - file containing Features
name - logical name for this data source (may be null)
queryLookaheadBases - look ahead this many bases during queries that produce cache misses
public FeatureDataSource(java.lang.String featurePath, java.lang.String name, int queryLookaheadBases, java.lang.Class<? extends htsjdk.tribble.Feature> targetFeatureType)
featurePath - path to file or GenomicsDB URL containing features
name - logical name for this data source (may be null)
queryLookaheadBases - look ahead this many bases during queries that produce cache misses
targetFeatureType - When searching for a FeatureCodec for this data source, restrict the search to codecs that produce this type of Feature. May be null, which results in an unrestricted search.
public FeatureDataSource(FeatureInput<T> featureInput, int queryLookaheadBases, java.lang.Class<? extends htsjdk.tribble.Feature> targetFeatureType)
featureInput - a FeatureInput specifying a source of Features
queryLookaheadBases - look ahead this many bases during queries that produce cache misses
targetFeatureType - When searching for a FeatureCodec for this data source, restrict the search to codecs that produce this type of Feature. May be null, which results in an unrestricted search.
public FeatureDataSource(java.lang.String featurePath, java.lang.String name, int queryLookaheadBases, java.lang.Class<? extends htsjdk.tribble.Feature> targetFeatureType, int cloudPrefetchBuffer, int cloudIndexPrefetchBuffer)
featurePath - path to file or GenomicsDB URL containing features
name - logical name for this data source (may be null)
queryLookaheadBases - look ahead this many bases during queries that produce cache misses
targetFeatureType - When searching for a FeatureCodec for this data source, restrict the search to codecs that produce this type of Feature. May be null, which results in an unrestricted search.
cloudPrefetchBuffer - size in MB of the caching/prefetching wrapper for the data, if on Google Cloud (0 to disable).
cloudIndexPrefetchBuffer - size in MB of the caching/prefetching wrapper for the index, if on Google Cloud (0 to disable).
public FeatureDataSource(FeatureInput<T> featureInput, int queryLookaheadBases, java.lang.Class<? extends htsjdk.tribble.Feature> targetFeatureType, int cloudPrefetchBuffer, int cloudIndexPrefetchBuffer)
featureInput - a FeatureInput specifying a source of Features
queryLookaheadBases - look ahead this many bases during queries that produce cache misses
targetFeatureType - When searching for a FeatureCodec for this data source, restrict the search to codecs that produce this type of Feature. May be null, which results in an unrestricted search.
cloudPrefetchBuffer - size in MB of the caching/prefetching wrapper for the data, if on Google Cloud (0 to disable).
cloudIndexPrefetchBuffer - size in MB of the caching/prefetching wrapper for the index, if on Google Cloud (0 to disable).
public FeatureDataSource(FeatureInput<T> featureInput, int queryLookaheadBases, java.lang.Class<? extends htsjdk.tribble.Feature> targetFeatureType, int cloudPrefetchBuffer, int cloudIndexPrefetchBuffer, java.nio.file.Path reference)
featureInput - a FeatureInput specifying a source of Features
queryLookaheadBases - look ahead this many bases during queries that produce cache misses
targetFeatureType - When searching for a FeatureCodec for this data source, restrict the search to codecs that produce this type of Feature. May be null, which results in an unrestricted search.
cloudPrefetchBuffer - size in MB of the caching/prefetching wrapper for the data, if on Google Cloud (0 to disable).
cloudIndexPrefetchBuffer - size in MB of the caching/prefetching wrapper for the index, if on Google Cloud (0 to disable).
reference - the reference genome corresponding to the data to be read
public FeatureDataSource(FeatureInput<T> featureInput, int queryLookaheadBases, java.lang.Class<? extends htsjdk.tribble.Feature> targetFeatureType, int cloudPrefetchBuffer, int cloudIndexPrefetchBuffer, java.nio.file.Path reference, boolean setNameOnCodec)
featureInput - a FeatureInput specifying a source of Features
queryLookaheadBases - look ahead this many bases during queries that produce cache misses
targetFeatureType - When searching for a FeatureCodec for this data source, restrict the search to codecs that produce this type of Feature. May be null, which results in an unrestricted search.
cloudPrefetchBuffer - size in MB of the caching/prefetching wrapper for the data, if on Google Cloud (0 to disable).
cloudIndexPrefetchBuffer - size in MB of the caching/prefetching wrapper for the index, if on Google Cloud (0 to disable).
reference - the reference genome corresponding to the data to be read
setNameOnCodec - If true, and if this FeatureDataSource uses a NameAwareCodec, the name of the FeatureInput will be used to set the codec's name. This exists as a mechanism to store the FeatureInput name in the source field of VariantContexts.
public FeatureDataSource(FeatureInput<T> featureInput, int queryLookaheadBases, java.lang.Class<? extends htsjdk.tribble.Feature> targetFeatureType, int cloudPrefetchBuffer, int cloudIndexPrefetchBuffer, GenomicsDBOptions genomicsDBOptions)
featureInput - a FeatureInput specifying a source of Features
queryLookaheadBases - look ahead this many bases during queries that produce cache misses
targetFeatureType - When searching for a FeatureCodec for this data source, restrict the search to codecs that produce this type of Feature. May be null, which results in an unrestricted search.
cloudPrefetchBuffer - size in MB of the caching/prefetching wrapper for the data, if on Google Cloud (0 to disable).
cloudIndexPrefetchBuffer - size in MB of the caching/prefetching wrapper for the index, if on Google Cloud (0 to disable).
genomicsDBOptions - options and info for reading from a GenomicsDB; may be null
public FeatureDataSource(FeatureInput<T> featureInput, int queryLookaheadBases, java.lang.Class<? extends htsjdk.tribble.Feature> targetFeatureType, int cloudPrefetchBuffer, int cloudIndexPrefetchBuffer, GenomicsDBOptions genomicsDBOptions, boolean setNameOnCodec)
featureInput - a FeatureInput specifying a source of Features
queryLookaheadBases - look ahead this many bases during queries that produce cache misses
targetFeatureType - When searching for a FeatureCodec for this data source, restrict the search to codecs that produce this type of Feature. May be null, which results in an unrestricted search.
cloudPrefetchBuffer - size in MB of the caching/prefetching wrapper for the data, if on Google Cloud (0 to disable).
cloudIndexPrefetchBuffer - size in MB of the caching/prefetching wrapper for the index, if on Google Cloud (0 to disable).
genomicsDBOptions - options and info for reading from a GenomicsDB; may be null
setNameOnCodec - If true, and if this FeatureDataSource uses a NameAwareCodec, the name of the FeatureInput will be used to set the codec's name. This exists as a mechanism to store the FeatureInput name in the source field of VariantContexts.
protected static htsjdk.tribble.FeatureReader<htsjdk.variant.variantcontext.VariantContext> getGenomicsDBFeatureReader(GATKPath path, java.io.File reference, GenomicsDBOptions genomicsDBOptions)
public htsjdk.samtools.SAMSequenceDictionary getSequenceDictionary()
public void setIntervalsForTraversal(java.util.List<SimpleInterval> intervals)
Restricts traversals of this data source via iterator() to only return Features that overlap the provided intervals. Calls to query(SimpleInterval) and/or queryAndPrefetch(Locatable) are not affected by these intervals.
Intervals MUST be non-overlapping and sorted in order of increasing start position; otherwise traversal results will be incorrect.
Passing in a null or empty interval List clears the intervals for traversal, making future iterations over this data source unrestricted by intervals.
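When intervals come from an unordered or possibly overlapping source, one way to satisfy this precondition is to sort and merge them before calling the method. The sketch below shows that normalization step in isolation. IntervalNormalizerSketch and sortAndMerge are hypothetical names, intervals are modeled as {start, end} int pairs with inclusive coordinates, and a single contig is assumed; none of this is GATK API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper, not part of GATK: normalizes intervals on ONE contig. */
final class IntervalNormalizerSketch {

    /**
     * Sorts {start, end} pairs (inclusive coordinates) by start position and merges
     * any pairs that overlap. The input list and its arrays are left unmodified.
     */
    static List<int[]> sortAndMerge(List<int[]> intervals) {
        List<int[]> sorted = new ArrayList<>(intervals);
        sorted.sort(Comparator.comparingInt((int[] interval) -> interval[0]));

        List<int[]> merged = new ArrayList<>();
        for (int[] interval : sorted) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && interval[0] <= last[1]) {
                // Overlaps the previous merged interval: extend it if needed.
                last[1] = Math.max(last[1], interval[1]);
            } else {
                // Starts after the previous interval ends: keep as a new entry (copied).
                merged.add(new int[] {interval[0], interval[1]});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<int[]> raw = new ArrayList<>();
        raw.add(new int[] {50, 60});
        raw.add(new int[] {1, 10});
        raw.add(new int[] {5, 20});
        for (int[] interval : sortAndMerge(raw)) {
            System.out.println(interval[0] + "-" + interval[1]);
        }
    }
}
```

Adjacent but non-overlapping intervals (for example 1-20 followed by 21-30) are kept separate here, since the stated requirement only forbids overlap.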
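intervals - Our next full traversal will return only Features overlapping these intervals
public java.util.Iterator<T> iterator()
Gets an iterator over all Features in this data source, restricting traversal to Features overlapping our intervals if intervals were provided via setIntervalsForTraversal(List).
Calling this method invalidates (closes) any previous iterator obtained from this method.
Specified by: iterator in interface java.lang.Iterable<T extends htsjdk.tribble.Feature>
Returns: an iterator over all Features in this data source, limited to Features overlapping the intervals provided via setIntervalsForTraversal(List) (if intervals were provided)
public java.util.Iterator<T> query(SimpleInterval interval)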
Gets an iterator over all Features in this data source that overlap the provided interval.
This operation is not affected by intervals provided via setIntervalsForTraversal(List).
Requires the backing file to have been indexed using the IndexFeatureFile tool, and to be sorted in increasing order of start position for each contig.
Query results are cached to improve the performance of future queries during typical access patterns. See the class-level notes for a description of the caching strategy.
Calling this method potentially invalidates (closes) any other open iterator obtained from this data source via a call to iterator().
Specified by: query in interface GATKDataSource<T extends htsjdk.tribble.Feature>
interval - retrieve all Features overlapping this interval
public java.util.List<T> queryAndPrefetch(htsjdk.samtools.util.Locatable interval)
Returns a List of all Features in this data source that overlap the provided interval.
This operation is not affected by intervals provided via setIntervalsForTraversal(List).
Requires the backing file to have been indexed using the IndexFeatureFile tool, and to be sorted in increasing order of start position for each contig.
Query results are cached to improve the performance of future queries during typical access patterns. See the class-level notes for a description of the caching strategy.
Calling this method potentially invalidates (closes) any other open iterator obtained from this data source via a call to iterator().
interval - retrieve all Features overlapping this interval
public java.lang.String getName()
public java.lang.Object getHeader()
public void close()
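Permanently close this data source, invalidating any open iteration over it, and making it invalid for future iterations and queries.
Specified by: close in interface java.lang.AutoCloseable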
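Because the class implements java.lang.AutoCloseable, it can be managed with a try-with-resources statement. The sketch below illustrates that pattern with a small stand-in type rather than the real class. ClosableSourceSketch is a hypothetical name, and its choice to throw IllegalStateException when an iterator is used after close() is an assumption made for illustration; this page does not state how the real class signals use after closing.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a closeable, iterable data source; not the GATK class. */
final class ClosableSourceSketch implements AutoCloseable, Iterable<String> {
    private final List<String> records;
    private boolean closed = false;

    ClosableSourceSketch(List<String> records) {
        this.records = records;
    }

    @Override
    public Iterator<String> iterator() {
        checkOpen();
        final Iterator<String> delegate = records.iterator();
        // Wrap the underlying iterator so that it stops working once the source is closed.
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                checkOpen();
                return delegate.hasNext();
            }

            @Override
            public String next() {
                checkOpen();
                return delegate.next();
            }
        };
    }

    /** Assumption for this sketch: use after close() is reported with IllegalStateException. */
    private void checkOpen() {
        if (closed) {
            throw new IllegalStateException("data source is closed");
        }
    }

    @Override
    public void close() {
        closed = true;
    }

    public static void main(String[] args) {
        // try-with-resources calls close() automatically when the block exits.
        try (ClosableSourceSketch source = new ClosableSourceSketch(Arrays.asList("a", "b"))) {
            for (String record : source) {
                System.out.println(record);
            }
        }
    }
}
```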