public final class IntervalUtils
extends java.lang.Object
Modifier and Type | Class and Description |
---|---|
static class |
IntervalUtils.IntervalBreakpointType
An enum that classifies a breakpoint as either the start or the end of a region.
|
Modifier and Type | Field and Description |
---|---|
static java.util.List<java.lang.String> |
INTERVAL_FILE_EXTENSIONS
Recognized extensions for interval files
|
static java.util.Comparator<htsjdk.samtools.util.Locatable> |
LEXICOGRAPHICAL_ORDER_COMPARATOR
Lexicographical (contig) order comparator.
|
Constructor and Description |
---|
IntervalUtils() |
Modifier and Type | Method and Description |
---|---|
static int |
beginOfShard(int shardIndex,
int shardSize)
first offset in this shard (1-based).
|
static <T extends htsjdk.samtools.util.Locatable> |
combineAndSortBreakpoints(java.util.List<T> unsortedLocatables1,
java.util.List<T> unsortedLocatables2,
htsjdk.samtools.SAMSequenceDictionary dictionary)
Combine the breakpoints of multiple intervals and return a list of locatables based on the updated breakpoints.
|
static int |
compareContigs(htsjdk.samtools.util.Locatable first,
htsjdk.samtools.util.Locatable second,
htsjdk.samtools.SAMSequenceDictionary dictionary)
Determines the relative contig ordering of first and second using the provided sequence dictionary
|
static int |
compareLocatables(htsjdk.samtools.util.Locatable first,
htsjdk.samtools.util.Locatable second,
htsjdk.samtools.SAMSequenceDictionary dictionary)
Compare two locations using a
SAMSequenceDictionary sequence ordering |
static java.util.List<SimpleInterval> |
convertGenomeLocsToSimpleIntervals(java.util.List<GenomeLoc> genomeLocIntervals)
Convert a List of intervals in GenomeLoc format into a List of intervals in SimpleInterval format.
|
static htsjdk.samtools.QueryInterval |
convertSimpleIntervalToQueryInterval(SimpleInterval interval,
htsjdk.samtools.SAMSequenceDictionary sequenceDictionary)
Converts an interval in SimpleInterval format into an htsjdk QueryInterval.
|
static <T extends htsjdk.samtools.util.Locatable,U extends htsjdk.samtools.util.Locatable> |
createOverlapMap(java.util.List<T> keys,
java.util.List<U> vals,
htsjdk.samtools.SAMSequenceDictionary dictionary)
Creates a map of which locatables (keys) overlap the other list of locatables (vals).
Input lists will be sorted by the input dictionary.
|
static java.util.List<SimpleInterval> |
cutToShards(java.lang.Iterable<SimpleInterval> intervals,
int shardSize)
Splits the given input intervals into shards of at most the requested size.
|
static int |
endOfShard(int shardIndex,
int shardSize)
last offset in this shard (1-based).
|
static java.lang.String |
equateIntervals(java.util.List<GenomeLoc> masterArg,
java.util.List<GenomeLoc> testArg)
computes whether the test interval list is equivalent to master.
|
static java.util.List<GenomeLoc> |
featureFileToIntervals(GenomeLocParser parser,
java.lang.String featureFile)
Converts a Feature-containing file to a list of intervals
|
static java.util.List<GenomeLoc> |
flattenSplitIntervals(java.util.List<java.util.List<GenomeLoc>> splits) |
static java.util.List<GenomeLoc> |
genomeLocsFromLocatables(GenomeLocParser parser,
java.util.Collection<? extends htsjdk.samtools.util.Locatable> locatables)
Generates a list of
GenomeLoc instances given the appropriate GenomeLocParser factory
and a collection of Locatable instances. |
static java.util.List<SimpleInterval> |
getAllIntervalsForReference(htsjdk.samtools.SAMSequenceDictionary sequenceDictionary)
Builds a list of intervals that cover the whole given sequence.
|
static java.util.Map<java.lang.String,java.lang.Integer> |
getContigSizes(java.nio.file.Path reference)
Returns a map of contig names with their sizes.
|
static java.util.Comparator<htsjdk.samtools.util.Locatable> |
getDictionaryOrderComparator(htsjdk.samtools.SAMSequenceDictionary dictionary)
The order of contigs/sequences in the dictionary is the order of the sorting here.
|
static java.util.List<GenomeLoc> |
getIntervalsWithFlanks(GenomeLocParser parser,
java.util.List<GenomeLoc> locs,
int basePairs)
Returns a list of intervals between the passed-in locs.
|
static java.util.List<SimpleInterval> |
getIntervalsWithFlanks(java.util.List<SimpleInterval> intervals,
int basePairs,
htsjdk.samtools.SAMSequenceDictionary dictionary)
Pads the provided intervals by the specified amount, sorts the resulting intervals, and merges intervals
that are adjacent/overlapping after padding.
|
static java.util.List<SimpleInterval> |
getResolvedIntervals(java.lang.String intervalQueryString,
htsjdk.samtools.SAMSequenceDictionary sequenceDictionary)
Given an interval query string and a sequence dictionary, determine if the query string can be
resolved as a valid interval query against more than one contig in the dictionary, i.e., more than
one of:
prefix
prefix:nnn
prefix:nnn+
prefix:nnn-nnn
and return the list of all possible interpretations (there can never be more than 2).
|
static SimpleInterval |
getSpanningInterval(java.util.List<? extends htsjdk.samtools.util.Locatable> locations)
Returns an interval that covers all of the locations passed in.
|
static java.util.List<SimpleInterval> |
getSpanningIntervals(java.util.List<? extends htsjdk.samtools.util.Locatable> locations,
htsjdk.samtools.SAMSequenceDictionary sequenceDictionary)
Get a single interval per contig that includes all of the specified intervals
(This is used to improve GenomicsDB performance for exomes)
|
static java.util.List<java.util.List<SimpleInterval>> |
groupIntervalsByContig(java.util.List<SimpleInterval> sortedIntervals)
Accepts a sorted List of intervals, and returns a List of Lists of intervals grouped by contig,
one List per contig.
|
static java.util.List<GenomeLoc> |
intervalFileToList(GenomeLocParser glParser,
java.lang.String fileName)
Read a file of genome locations to process.
|
static boolean |
intervalIsOnDictionaryContig(SimpleInterval interval,
htsjdk.samtools.SAMSequenceDictionary dictionary)
Determines whether the provided interval is within the bounds of its assigned contig according to the provided dictionary
|
static long |
intervalSize(java.util.List<GenomeLoc> locs) |
static boolean |
isAfter(htsjdk.samtools.util.Locatable first,
htsjdk.samtools.util.Locatable second,
htsjdk.samtools.SAMSequenceDictionary dictionary)
Tests whether the first Locatable starts after the end of the second Locatable
|
static boolean |
isBefore(htsjdk.samtools.util.Locatable first,
htsjdk.samtools.util.Locatable second,
htsjdk.samtools.SAMSequenceDictionary dictionary)
Tests whether the first Locatable ends before the start of the second Locatable
|
static boolean |
isIntervalFile(java.lang.String str)
Check if string argument was intended as a file.
Accepted file extensions: .bed, .list, .picard, .interval_list, .intervals.
|
static boolean |
isIntervalFile(java.lang.String str,
boolean checkExists)
Check if string argument was intended as a file
Accepted file extensions are defined in
INTERVAL_FILE_EXTENSIONS |
static boolean |
isReciprocalOverlap(SimpleInterval interval1,
SimpleInterval interval2,
double reciprocalOverlapThreshold)
Determine whether the two intervals specified overlap each other by at least the threshold proportion specified.
|
static GenomeLocSortedSet |
loadIntervals(java.util.List<java.lang.String> intervalStrings,
IntervalSetRule intervalSetRule,
IntervalMergingRule intervalMergingRule,
int padding,
GenomeLocParser genomeLocParser) |
static java.lang.String |
locatableToString(htsjdk.samtools.util.Locatable interval) |
static java.util.List<GenomeLoc> |
mergeIntervalLocations(java.util.List<GenomeLoc> raw,
IntervalMergingRule rule)
merge a list of genome locs that may be overlapping, returning the list of unique genomic locations
|
static java.util.List<GenomeLoc> |
mergeListsBySetOperator(java.util.List<GenomeLoc> setOne,
java.util.List<GenomeLoc> setTwo,
IntervalSetRule rule)
merge two interval lists, using an interval set rule
|
static boolean |
overlaps(htsjdk.samtools.util.Locatable left,
htsjdk.samtools.util.Locatable right)
Check whether two locatables overlap.
|
static java.util.List<GenomeLoc> |
parseIntervalArguments(GenomeLocParser parser,
java.util.List<java.lang.String> argList)
Turns a set of strings describing intervals into a parsed set of intervals.
|
static java.util.List<GenomeLoc> |
parseIntervalArguments(GenomeLocParser parser,
java.lang.String arg) |
static void |
scatterContigIntervals(htsjdk.samtools.SAMFileHeader fileHeader,
java.util.List<GenomeLoc> locs,
java.util.List<java.io.File> scatterParts)
Splits an interval list into multiple files.
|
static void |
scatterFixedIntervals(htsjdk.samtools.SAMFileHeader fileHeader,
java.util.List<java.util.List<GenomeLoc>> splits,
java.util.List<java.io.File> scatterParts)
Splits an interval list into multiple files.
|
static int |
shardIndex(int oneBasedOffset,
int shardSize)
number of the shard this offset is in.
|
static GenomeLocSortedSet |
sortAndMergeIntervals(GenomeLocParser parser,
java.util.List<GenomeLoc> intervals,
IntervalMergingRule mergingRule)
Sorts and merges an interval list.
|
static <T extends htsjdk.samtools.util.Locatable> |
sortLocatablesBySequenceDictionary(java.util.Collection<T> locatables,
htsjdk.samtools.SAMSequenceDictionary dictionary)
Sort by the contig then position as specified by the index order in the given sequence dictionary.
|
static java.util.List<java.util.List<GenomeLoc>> |
splitFixedIntervals(java.util.List<GenomeLoc> locs,
int numParts)
Splits the genome locs up by size.
|
static java.util.List<java.util.List<GenomeLoc>> |
splitIntervalsToSubLists(java.util.List<GenomeLoc> locs,
java.util.List<java.lang.Integer> splits)
Splits an interval list into multiple sublists.
|
static java.util.List<java.util.List<GenomeLoc>> |
splitLocusIntervals(java.util.List<GenomeLoc> locs,
int numParts) |
static SimpleInterval |
trimIntervalToContig(java.lang.String contig,
int start,
int stop,
int contigLength)
Create a new interval, bounding start and stop by the start and end of the contig.
This function returns null if start and stop cannot be adjusted in any reasonable way
to be on the contig.
|
static <T extends htsjdk.samtools.util.Locatable> |
validateNoOverlappingIntervals(java.util.List<T> locatables)
Throws Bad Input exception if any overlaps are detected within the list of locatables.
|
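The SimpleInterval overload of getIntervalsWithFlanks is summarized above as pad, sort, then merge adjacent or overlapping intervals. The following is a minimal sketch of that sequence using only the Java standard library. It does not call GATK or htsjdk: the `Loc` class and the `padAndMerge` name are hypothetical, and a map from contig name to length (iterated in insertion order) stands in for the sequence dictionary.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class PadAndMergeSketch {
    /** Hypothetical stand-in for SimpleInterval: 1-based, closed coordinates. */
    static final class Loc {
        final String contig; final int start; final int end;
        Loc(String contig, int start, int end) { this.contig = contig; this.start = start; this.end = end; }
        @Override public String toString() { return contig + ":" + start + "-" + end; }
    }

    /** contigLengths stands in for the dictionary; its iteration order is treated as contig order. */
    static List<Loc> padAndMerge(List<Loc> intervals, int basePairs, Map<String, Integer> contigLengths) {
        List<String> order = new ArrayList<>(contigLengths.keySet());
        // 1. Pad each side and clip to the bounds of the contig.
        List<Loc> padded = new ArrayList<>();
        for (Loc l : intervals) {
            int length = contigLengths.get(l.contig);
            padded.add(new Loc(l.contig, Math.max(1, l.start - basePairs), Math.min(length, l.end + basePairs)));
        }
        // 2. Sort by contig order, then by start.
        padded.sort(Comparator.comparingInt((Loc l) -> order.indexOf(l.contig)).thenComparingInt(l -> l.start));
        // 3. Merge intervals that overlap or are directly adjacent.
        List<Loc> merged = new ArrayList<>();
        for (Loc l : padded) {
            Loc last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && last.contig.equals(l.contig) && l.start <= last.end + 1) {
                merged.set(merged.size() - 1, new Loc(last.contig, last.start, Math.max(last.end, l.end)));
            } else {
                merged.add(l);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> dict = new java.util.LinkedHashMap<>();
        dict.put("chr1", 1000);
        dict.put("chr2", 500);
        List<Loc> in = new ArrayList<>();
        in.add(new Loc("chr2", 490, 495));
        in.add(new Loc("chr1", 100, 200));
        in.add(new Loc("chr1", 215, 300));
        System.out.println(padAndMerge(in, 10, dict));
    }
}
```

In this sketch a contig missing from the map causes a NullPointerException; how the library reports that case is not covered here.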
public static final java.util.List<java.lang.String> INTERVAL_FILE_EXTENSIONS
public static final java.util.Comparator<htsjdk.samtools.util.Locatable> LEXICOGRAPHICAL_ORDER_COMPARATOR
Intervals on different contigs are ordered by contig name in ascending lexicographical order.
Intervals on the same contig are ordered by ascending start position and, in case of a tie, by ascending stop position.
The null contig is supported and comes last.
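The ordering described for LEXICOGRAPHICAL_ORDER_COMPARATOR can be approximated with standard-library comparators. This is a sketch under stated assumptions, not the library's implementation: `Loc` is a hypothetical stand-in for a Locatable, and contig names are compared with plain String ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class LexicographicOrderSketch {
    /** Hypothetical stand-in for a Locatable; the contig may be null. */
    static final class Loc {
        final String contig; final int start; final int end;
        Loc(String contig, int start, int end) { this.contig = contig; this.start = start; this.end = end; }
        @Override public String toString() { return contig + ":" + start + "-" + end; }
    }

    // Contig name ascending with the null contig last, then start, then end.
    static final Comparator<Loc> LEXICOGRAPHICAL =
        Comparator.comparing((Loc l) -> l.contig, Comparator.nullsLast(Comparator.<String>naturalOrder()))
                  .thenComparingInt(l -> l.start)
                  .thenComparingInt(l -> l.end);

    /** Returns a sorted copy and leaves the input untouched. */
    static List<Loc> sorted(List<Loc> input) {
        List<Loc> copy = new ArrayList<>(input);
        copy.sort(LEXICOGRAPHICAL);
        return copy;
    }

    public static void main(String[] args) {
        List<Loc> locs = Arrays.asList(
            new Loc(null, 1, 1), new Loc("chr2", 5, 9), new Loc("chr10", 7, 8),
            new Loc("chr2", 5, 6), new Loc("chr2", 1, 100));
        // prints [chr10:7-8, chr2:1-100, chr2:5-6, chr2:5-9, null:1-1]
        System.out.println(sorted(locs));
    }
}
```

Note that lexicographical order places "chr10" before "chr2", which is the main practical difference from dictionary order.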
public static final int compareLocatables(htsjdk.samtools.util.Locatable first, htsjdk.samtools.util.Locatable second, htsjdk.samtools.SAMSequenceDictionary dictionary)
Compare two locations using a SAMSequenceDictionary sequence ordering
java.lang.IllegalArgumentException - if either first or second contigs could not be found in the dictionary
public static boolean isBefore(htsjdk.samtools.util.Locatable first, htsjdk.samtools.util.Locatable second, htsjdk.samtools.SAMSequenceDictionary dictionary)
first - first Locatable
second - second Locatable
dictionary - sequence dictionary used to determine contig ordering
public static boolean isAfter(htsjdk.samtools.util.Locatable first, htsjdk.samtools.util.Locatable second, htsjdk.samtools.SAMSequenceDictionary dictionary)
first - first Locatable
second - second Locatable
dictionary - sequence dictionary used to determine contig ordering
public static int compareContigs(htsjdk.samtools.util.Locatable first, htsjdk.samtools.util.Locatable second, htsjdk.samtools.SAMSequenceDictionary dictionary)
first - first Locatable
second - second Locatable
dictionary - sequence dictionary used to determine contig ordering
public static SimpleInterval getSpanningInterval(java.util.List<? extends htsjdk.samtools.util.Locatable> locations)
locations - the locations to be spanned (on a single contig)
java.lang.IllegalArgumentException - if the argument is null or if the argument contains any null element or if the locations are not all on the same contig (compared by String.equals)
public static java.util.List<SimpleInterval> convertGenomeLocsToSimpleIntervals(java.util.List<GenomeLoc> genomeLocIntervals)
genomeLocIntervals - list of GenomeLoc intervals to convert
public static htsjdk.samtools.QueryInterval convertSimpleIntervalToQueryInterval(SimpleInterval interval, htsjdk.samtools.SAMSequenceDictionary sequenceDictionary)
interval - interval to convert
sequenceDictionary - sequence dictionary used to perform the conversion
public static GenomeLocSortedSet loadIntervals(java.util.List<java.lang.String> intervalStrings, IntervalSetRule intervalSetRule, IntervalMergingRule intervalMergingRule, int padding, GenomeLocParser genomeLocParser)
public static java.util.List<GenomeLoc> parseIntervalArguments(GenomeLocParser parser, java.util.List<java.lang.String> argList)
parser - Genome loc parser.
argList - A list of strings containing interval data.
public static java.util.List<GenomeLoc> parseIntervalArguments(GenomeLocParser parser, java.lang.String arg)
public static java.util.List<GenomeLoc> featureFileToIntervals(GenomeLocParser parser, java.lang.String featureFile)
parser - GenomeLocParser for creating intervals
featureFile - file containing Features to convert to intervals
UserException.CouldNotReadInputFile - if the provided file is not in a supported Feature file format
public static java.util.List<GenomeLoc> intervalFileToList(GenomeLocParser glParser, java.lang.String fileName)
glParser - GenomeLocParser
fileName - interval file
public static java.util.List<GenomeLoc> mergeListsBySetOperator(java.util.List<GenomeLoc> setOne, java.util.List<GenomeLoc> setTwo, IntervalSetRule rule)
setOne - a list of genomeLocs, in order (cannot be null)
setTwo - a list of genomeLocs, also in order (cannot be null)
rule - the rule to use for merging, e.g. union or intersection
public static GenomeLocSortedSet sortAndMergeIntervals(GenomeLocParser parser, java.util.List<GenomeLoc> intervals, IntervalMergingRule mergingRule)
parser - Genome loc parser for the intervals.
intervals - A collection of intervals to merge.
mergingRule - A descriptor for the type of merging to perform.
public static java.lang.String equateIntervals(java.util.List<GenomeLoc> masterArg, java.util.List<GenomeLoc> testArg)
masterArg - sorted master genome locs
testArg - sorted test genome locs
public static boolean isIntervalFile(java.lang.String str)
str - token to identify as a filename.
public static boolean isIntervalFile(java.lang.String str, boolean checkExists)
Accepted file extensions are defined in INTERVAL_FILE_EXTENSIONS
str - token to identify as a filename.
checkExists - if true, throws an exception when the file has an interval file extension but does not exist
public static java.util.Map<java.lang.String,java.lang.Integer> getContigSizes(java.nio.file.Path reference)
reference - The reference for the intervals.
public static void scatterContigIntervals(htsjdk.samtools.SAMFileHeader fileHeader, java.util.List<GenomeLoc> locs, java.util.List<java.io.File> scatterParts)
fileHeader - The SAM file header.
locs - The genome locs to split.
scatterParts - The output interval lists to write to.
public static java.util.List<java.util.List<GenomeLoc>> splitIntervalsToSubLists(java.util.List<GenomeLoc> locs, java.util.List<java.lang.Integer> splits)
locs - The genome locs to split.
splits - The stop points for the genome locs returned by splitFixedIntervals.
public static void scatterFixedIntervals(htsjdk.samtools.SAMFileHeader fileHeader, java.util.List<java.util.List<GenomeLoc>> splits, java.util.List<java.io.File> scatterParts)
fileHeader - The SAM file header.
splits - Pre-divided genome locs returned by splitFixedIntervals.
scatterParts - The output interval lists to write to.
public static java.util.List<java.util.List<GenomeLoc>> splitFixedIntervals(java.util.List<GenomeLoc> locs, int numParts)
locs - Genome locs to split.
numParts - Number of parts to split the locs into.
public static java.util.List<java.util.List<GenomeLoc>> splitLocusIntervals(java.util.List<GenomeLoc> locs, int numParts)
public static boolean overlaps(htsjdk.samtools.util.Locatable left, htsjdk.samtools.util.Locatable right)
Two locatables overlap if they share the same contig and have at least one base in common based on their start and end positions.
This method returns false if either input Locatable has a null contig.
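The overlap rule stated above reduces to a contig check plus a closed-interval comparison. The sketch below illustrates it with a hypothetical `Loc` class in place of htsjdk's Locatable; it is not the library's code, only an instance of the rule as documented here (null arguments rejected, null contigs never overlap).

```java
final class OverlapSketch {
    /** Hypothetical stand-in for a Locatable: 1-based, closed coordinates, contig may be null. */
    static final class Loc {
        final String contig; final int start; final int end;
        Loc(String contig, int start, int end) { this.contig = contig; this.start = start; this.end = end; }
    }

    static boolean overlaps(Loc left, Loc right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("locatables must not be null");
        }
        if (left.contig == null || right.contig == null) {
            return false; // a null contig never overlaps anything
        }
        // Same contig and at least one shared base.
        return left.contig.equals(right.contig) && left.start <= right.end && right.start <= left.end;
    }

    public static void main(String[] args) {
        System.out.println(overlaps(new Loc("chr1", 10, 20), new Loc("chr1", 20, 30))); // true: base 20 is shared
        System.out.println(overlaps(new Loc("chr1", 10, 20), new Loc("chr1", 21, 30))); // false: adjacent only
    }
}
```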
left - first locatable.
right - second locatable.
true iff there is an overlap between both locatables.
java.lang.IllegalArgumentException - if either left or right locatable is null.
public static java.lang.String locatableToString(htsjdk.samtools.util.Locatable interval)
public static java.util.List<GenomeLoc> flattenSplitIntervals(java.util.List<java.util.List<GenomeLoc>> splits)
public static java.util.List<GenomeLoc> mergeIntervalLocations(java.util.List<GenomeLoc> raw, IntervalMergingRule rule)
raw - the unchecked genome loc list
rule - the merging rule we're using
public static long intervalSize(java.util.List<GenomeLoc> locs)
public static java.util.List<GenomeLoc> getIntervalsWithFlanks(GenomeLocParser parser, java.util.List<GenomeLoc> locs, int basePairs)
parser - A genome loc parser for creating the new intervals
locs - Original genome locs
basePairs - Number of base pairs on each side of loc
public static java.util.List<SimpleInterval> getIntervalsWithFlanks(java.util.List<SimpleInterval> intervals, int basePairs, htsjdk.samtools.SAMSequenceDictionary dictionary)
intervals - intervals to pad
basePairs - number of bases of padding to add to each side of each interval
dictionary - sequence dictionary used to restrict padded intervals to the bounds of their contig
public static java.util.List<java.util.List<SimpleInterval>> groupIntervalsByContig(java.util.List<SimpleInterval> sortedIntervals)
sortedIntervals - sorted List of intervals to group by contig
public static java.util.List<GenomeLoc> genomeLocsFromLocatables(GenomeLocParser parser, java.util.Collection<? extends htsjdk.samtools.util.Locatable> locatables)
Generates a list of GenomeLoc instances given the appropriate GenomeLocParser factory and a collection of Locatable instances.
The order in the result list will correspond to the traversal order in the input collection.
locatables - input locatable collection.
null. The result is an unmodifiable list.
java.lang.IllegalArgumentException - if locatables is null or contains any null.
public static java.util.List<SimpleInterval> getAllIntervalsForReference(htsjdk.samtools.SAMSequenceDictionary sequenceDictionary)
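getAllIntervalsForReference is summarized as building intervals that cover the whole given sequence. A plausible reading, sketched here with the standard library only, is one interval per contig running from position 1 to the contig length. The one-interval-per-contig shape is an assumption, and the map of contig name to length is a hypothetical stand-in for SAMSequenceDictionary.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class WholeReferenceSketch {
    /** Emits "contig:1-length" for each contig, in the map's iteration order. */
    static List<String> allIntervals(Map<String, Integer> contigLengths) {
        List<String> intervals = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : contigLengths.entrySet()) {
            intervals.add(entry.getKey() + ":1-" + entry.getValue());
        }
        return intervals;
    }

    public static void main(String[] args) {
        Map<String, Integer> dict = new LinkedHashMap<>();
        dict.put("chr1", 1000);
        dict.put("chrM", 16569);
        System.out.println(allIntervals(dict)); // prints [chr1:1-1000, chrM:1-16569]
    }
}
```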
public static java.util.List<SimpleInterval> getResolvedIntervals(java.lang.String intervalQueryString, htsjdk.samtools.SAMSequenceDictionary sequenceDictionary)
intervalQueryString -
sequenceDictionary -
sequenceDictionary. If the list is empty, the query doesn't match any contig in the sequence dictionary. If the list contains more than one interval, the query string is ambiguous and should be rejected. If the list contains a single interval, the query is unambiguous and can be safely used to conduct a query.
java.lang.NumberFormatException - if the query only matches a single contig in the dictionary, but the query interval parameters (start, end) cannot be parsed
public static SimpleInterval trimIntervalToContig(java.lang.String contig, int start, int stop, int contigLength)
contig - our contig
start - our start as an arbitrary integer (may be negative, etc)
stop - our stop as an arbitrary integer (may be negative, etc)
contigLength - length of the contig
public static boolean intervalIsOnDictionaryContig(SimpleInterval interval, htsjdk.samtools.SAMSequenceDictionary dictionary)
interval - interval to check
dictionary - dictionary to use to validate contig bounds
public static java.util.List<SimpleInterval> cutToShards(java.lang.Iterable<SimpleInterval> intervals, int shardSize)
ShardedIntervalIterator
public static int shardIndex(int oneBasedOffset, int shardSize)
public static int beginOfShard(int shardIndex, int shardSize)
public static int endOfShard(int shardIndex, int shardSize)
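The three shard helpers and cutToShards are documented only by their summaries (1-based first and last offsets of a shard, and shards of at most the requested size). The sketch below shows one arithmetic consistent with those summaries. It assumes a zero-based shard index and shard boundaries aligned to multiples of shardSize; neither assumption is stated on this page, and the code operates on plain integers rather than SimpleInterval.

```java
import java.util.ArrayList;
import java.util.List;

final class ShardSketch {
    /** Zero-based index of the shard containing a 1-based offset (zero-based indexing is an assumption). */
    static int shardIndex(int oneBasedOffset, int shardSize) {
        return (oneBasedOffset - 1) / shardSize;
    }

    /** First 1-based offset in the shard. */
    static int beginOfShard(int shardIndex, int shardSize) {
        return shardIndex * shardSize + 1;
    }

    /** Last 1-based offset in the shard. */
    static int endOfShard(int shardIndex, int shardSize) {
        return beginOfShard(shardIndex + 1, shardSize) - 1;
    }

    /** Cuts the closed range [start, end] at shard boundaries; each piece is {start, end}. */
    static List<int[]> cutToShards(int start, int end, int shardSize) {
        List<int[]> pieces = new ArrayList<>();
        int lastShard = shardIndex(end, shardSize);
        for (int shard = shardIndex(start, shardSize); shard <= lastShard; shard++) {
            int pieceStart = Math.max(start, beginOfShard(shard, shardSize));
            int pieceEnd = Math.min(end, endOfShard(shard, shardSize));
            pieces.add(new int[] {pieceStart, pieceEnd});
        }
        return pieces;
    }

    public static void main(String[] args) {
        // With shardSize 100, the range 150-420 is cut into 150-200, 201-300, 301-400, 401-420.
        for (int[] piece : cutToShards(150, 420, 100)) {
            System.out.println(piece[0] + "-" + piece[1]);
        }
    }
}
```

Under these assumptions no piece is longer than shardSize, and an interval that already fits inside one shard is returned unchanged.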
public static java.util.List<SimpleInterval> getSpanningIntervals(java.util.List<? extends htsjdk.samtools.util.Locatable> locations, htsjdk.samtools.SAMSequenceDictionary sequenceDictionary)
locations - the intervals to be merged/spanned
sequenceDictionary - for contig sorting
public static <T extends htsjdk.samtools.util.Locatable> java.util.List<htsjdk.samtools.util.Locatable> combineAndSortBreakpoints(java.util.List<T> unsortedLocatables1, java.util.List<T> unsortedLocatables2, htsjdk.samtools.SAMSequenceDictionary dictionary)
List 1:
1 1000 2000
List 2:
1 500 2500
1 2501 3000
1 4000 5000
The result would be:
1 500 999
1 1000 2000
1 2001 2500
1 2501 3000
1 4000 5000
Note that start breakpoints will always appear as starts of the resulting intervals.
Does not alter the input.
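The worked example above can be reproduced with a simple cut-point sweep. This is a sketch of one way to obtain that result, not the library's algorithm: `Loc` is a hypothetical stand-in for a Locatable, a list of contig names stands in for the sequence dictionary, and validation of overlaps within a single input list is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class BreakpointSketch {
    /** Hypothetical stand-in for a Locatable: 1-based, closed coordinates. */
    static final class Loc {
        final String contig; final int start; final int end;
        Loc(String contig, int start, int end) { this.contig = contig; this.start = start; this.end = end; }
        @Override public String toString() { return contig + " " + start + " " + end; }
    }

    static List<Loc> combineBreakpoints(List<Loc> first, List<Loc> second, List<String> contigOrder) {
        List<Loc> all = new ArrayList<>(first); // copy, so the inputs are not altered
        all.addAll(second);
        List<Loc> result = new ArrayList<>();
        for (String contig : contigOrder) {
            // Every start, and every position just past an end, is a cut point.
            TreeSet<Integer> cuts = new TreeSet<>();
            for (Loc l : all) {
                if (l.contig.equals(contig)) { cuts.add(l.start); cuts.add(l.end + 1); }
            }
            Integer previous = null;
            for (Integer cut : cuts) {
                // Keep the segment between consecutive cut points only if some input covers it.
                if (previous != null && isCovered(all, contig, previous, cut - 1)) {
                    result.add(new Loc(contig, previous, cut - 1));
                }
                previous = cut;
            }
        }
        return result;
    }

    static boolean isCovered(List<Loc> all, String contig, int start, int end) {
        for (Loc l : all) {
            if (l.contig.equals(contig) && l.start <= start && l.end >= end) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<Loc> list1 = Arrays.asList(new Loc("1", 1000, 2000));
        List<Loc> list2 = Arrays.asList(new Loc("1", 500, 2500), new Loc("1", 2501, 3000), new Loc("1", 4000, 5000));
        // prints [1 500 999, 1 1000 2000, 1 2001 2500, 1 2501 3000, 1 4000 5000]
        System.out.println(combineBreakpoints(list1, list2, Arrays.asList("1")));
    }
}
```

The gap 3001-3999 is dropped because no input interval covers it, which is why the result has five intervals rather than six.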
If any single input list contains duplicates or overlapping intervals, an exception is thrown. Intervals are assumed to include the start and end bases. This method performs all necessary sorting.
unsortedLocatables1 - list of locatables
unsortedLocatables2 - list of locatables
dictionary - Sequence dictionary on which to base the sort. The order of contigs/sequences in the dictionary is the order of the sorting here.
public static <T extends htsjdk.samtools.util.Locatable> void validateNoOverlappingIntervals(java.util.List<T> locatables)
T - Locatable class
locatables - List of locatables to test. null will never throw an exception.
public static <T extends htsjdk.samtools.util.Locatable> java.util.List<T> sortLocatablesBySequenceDictionary(java.util.Collection<T> locatables, htsjdk.samtools.SAMSequenceDictionary dictionary)
T - Locatable
locatables - list of locatables.
dictionary - Never null
null if locatables is null. Instances in the list are not copies of input.
public static <T extends htsjdk.samtools.util.Locatable,U extends htsjdk.samtools.util.Locatable> java.util.Map<T,java.util.List<U>> createOverlapMap(java.util.List<T> keys, java.util.List<U> vals, htsjdk.samtools.SAMSequenceDictionary dictionary)
keys - the intervals we wish to query. Sorted by interval. No intervals overlap. Never null
vals - the intervals that we wish to map to the keys. Sorted by interval. No intervals overlap. Never null
dictionary - the SAMSequenceDictionary that the intervals (and sorting) derive from. Never null
null
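As an illustration of the mapping that createOverlapMap describes, the sketch below pairs each key interval with the value intervals that overlap it. It uses a straightforward nested loop and a hypothetical `Loc` class, so it ignores the dictionary-based sorting and is not the library's implementation. In this sketch a key with no overlapping values maps to an empty list; how the library treats that case is not stated on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OverlapMapSketch {
    /** Hypothetical stand-in for a Locatable: 1-based, closed coordinates. */
    static final class Loc {
        final String contig; final int start; final int end;
        Loc(String contig, int start, int end) { this.contig = contig; this.start = start; this.end = end; }
        @Override public String toString() { return contig + ":" + start + "-" + end; }
    }

    static boolean overlaps(Loc a, Loc b) {
        return a.contig.equals(b.contig) && a.start <= b.end && b.start <= a.end;
    }

    /** Maps each key to the vals overlapping it, preserving the order of both input lists. */
    static Map<Loc, List<Loc>> createOverlapMap(List<Loc> keys, List<Loc> vals) {
        Map<Loc, List<Loc>> result = new LinkedHashMap<>();
        for (Loc key : keys) {
            List<Loc> hits = new ArrayList<>();
            for (Loc val : vals) {
                if (overlaps(key, val)) hits.add(val);
            }
            result.put(key, hits);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Loc> keys = Arrays.asList(new Loc("chr1", 1, 100), new Loc("chr1", 200, 300));
        List<Loc> vals = Arrays.asList(new Loc("chr1", 50, 60), new Loc("chr1", 90, 210), new Loc("chr2", 1, 10));
        // prints {chr1:1-100=[chr1:50-60, chr1:90-210], chr1:200-300=[chr1:90-210]}
        System.out.println(createOverlapMap(keys, vals));
    }
}
```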
public static java.util.Comparator<htsjdk.samtools.util.Locatable> getDictionaryOrderComparator(htsjdk.samtools.SAMSequenceDictionary dictionary)
dictionary - dictionary to use for the sorting. Intervals with sequences not in this dictionary will cause exceptions to be thrown. Never null.
Comparator<Locatable> for use in sorting of Locatables.
public static boolean isReciprocalOverlap(SimpleInterval interval1, SimpleInterval interval2, double reciprocalOverlapThreshold)
interval1 - Never null
interval2 - Never null
reciprocalOverlapThreshold - proportion of the segments that must overlap. Must be between 0.0 and 1.0 (inclusive).
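A reciprocal overlap test is usually read as: the shared region must make up at least the threshold proportion of each interval, not just one of them. The sketch below implements that reading on plain coordinates. The method signature is hypothetical, and rejecting an out-of-range threshold with IllegalArgumentException is an assumption, since this page only says the value must lie between 0.0 and 1.0.

```java
final class ReciprocalOverlapSketch {
    /** Coordinates are 1-based and closed; both contigs must be non-null. */
    static boolean isReciprocalOverlap(String contig1, int start1, int end1,
                                       String contig2, int start2, int end2,
                                       double threshold) {
        if (threshold < 0.0 || threshold > 1.0) {
            throw new IllegalArgumentException("threshold must be between 0.0 and 1.0"); // assumed behavior
        }
        if (!contig1.equals(contig2) || start1 > end2 || start2 > end1) {
            return false; // no overlap at all
        }
        int intersection = Math.min(end1, end2) - Math.max(start1, start2) + 1;
        int size1 = end1 - start1 + 1;
        int size2 = end2 - start2 + 1;
        // The shared region must cover the threshold proportion of both intervals.
        return intersection >= size1 * threshold && intersection >= size2 * threshold;
    }

    public static void main(String[] args) {
        // chr1:1-100 and chr1:51-150 share 50 bases, half of each interval.
        System.out.println(isReciprocalOverlap("chr1", 1, 100, "chr1", 51, 150, 0.5)); // true
        System.out.println(isReciprocalOverlap("chr1", 1, 100, "chr1", 51, 150, 0.6)); // false
    }
}
```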