Class SequenceDictionaryUtils
java.lang.Object
org.broadinstitute.hellbender.utils.SequenceDictionaryUtils
A collection of utility functions that let the GATK check two sequence dictionaries (from the reference,
from BAMs, or from feature sources) for consistency. Two modes are supported: compareDictionaries returns an enum
state that describes, at a high level, how consistent two dictionaries are, while validateDictionaries
throws a UserException if the dictionaries are too incompatible.
Dictionaries are tested for overlapping contig names, consistent ordering within the overlapping set, and
matching lengths where available.
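The two modes can be pictured as a pure comparison function plus a thin wrapper that throws. The following is a minimal sketch under stated assumptions, not the GATK implementation: dictionaries are modelled as plain name-to-length maps, contig ordering is ignored, the enum constant names are hypothetical (the real constants are not listed in this section), and IllegalArgumentException stands in for UserException.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustrative stand-in only: dictionaries are modelled as name -> length maps,
// and contig ordering is ignored. This is not the GATK implementation.
final class TwoModesSketch {

    /** Hypothetical states; the real enum's constants are not listed here. */
    enum Compatibility { IDENTICAL, COMMON_SUBSET, UNEQUAL_COMMON_CONTIGS, NO_COMMON_CONTIGS }

    /** Mode 1: describe how the two dictionaries relate, without throwing. */
    static Compatibility compare(Map<String, Integer> dict1, Map<String, Integer> dict2) {
        Set<String> common = new LinkedHashSet<>(dict1.keySet());
        common.retainAll(dict2.keySet());
        if (common.isEmpty()) {
            return Compatibility.NO_COMMON_CONTIGS;
        }
        for (String name : common) {
            if (!dict1.get(name).equals(dict2.get(name))) {
                return Compatibility.UNEQUAL_COMMON_CONTIGS;
            }
        }
        return dict1.equals(dict2) ? Compatibility.IDENTICAL : Compatibility.COMMON_SUBSET;
    }

    /** Mode 2: throw if the dictionaries are too incompatible to use together. */
    static void validate(Map<String, Integer> dict1, Map<String, Integer> dict2) {
        Compatibility state = compare(dict1, dict2);
        if (state == Compatibility.NO_COMMON_CONTIGS
                || state == Compatibility.UNEQUAL_COMMON_CONTIGS) {
            // IllegalArgumentException stands in for GATK's UserException.
            throw new IllegalArgumentException("Incompatible sequence dictionaries: " + state);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> reference = new LinkedHashMap<>();
        reference.put("chr1", 1000);
        reference.put("chr2", 2000);
        Map<String, Integer> reads = new LinkedHashMap<>();
        reads.put("chr2", 2000);

        System.out.println(compare(reference, reads)); // COMMON_SUBSET
        validate(reference, reads);                     // does not throw
    }
}
```

Keeping the comparison free of side effects lets callers decide for themselves which states are fatal, while the throwing wrapper covers the common case.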
-
Nested Class Summary
Nested Classes
Modifier and Type    Class
static enum          SequenceDictionaryUtils.SequenceDictionaryCompatibility
-
Field Summary
Fields
Modifier and Type    Field
protected static final htsjdk.samtools.SAMSequenceRecord    CHR1_HG18
protected static final htsjdk.samtools.SAMSequenceRecord    CHR2_HG18
protected static final htsjdk.samtools.SAMSequenceRecord    CHR10_HG18
protected static final htsjdk.samtools.SAMSequenceRecord    CHR1_HG19
protected static final htsjdk.samtools.SAMSequenceRecord    CHR2_HG19
protected static final htsjdk.samtools.SAMSequenceRecord    CHR10_HG19
protected static final htsjdk.samtools.SAMSequenceRecord    CHR1_B36
protected static final htsjdk.samtools.SAMSequenceRecord    CHR2_B36
protected static final htsjdk.samtools.SAMSequenceRecord    CHR10_B36
protected static final htsjdk.samtools.SAMSequenceRecord    CHR1_B37
protected static final htsjdk.samtools.SAMSequenceRecord    CHR2_B37
protected static final htsjdk.samtools.SAMSequenceRecord    CHR10_B37
-
Method Summary
Modifier and Type, Method, Description

static SequenceDictionaryUtils.SequenceDictionaryCompatibility compareDictionaries(htsjdk.samtools.SAMSequenceDictionary dict1, htsjdk.samtools.SAMSequenceDictionary dict2, boolean checkContigOrdering)
Workhorse routine that takes two dictionaries and returns their compatibility.

static Set<String> getCommonContigsByName(htsjdk.samtools.SAMSequenceDictionary dict1, htsjdk.samtools.SAMSequenceDictionary dict2)
Returns the set of contig names found in both dicts.

getContigNames(htsjdk.samtools.SAMSequenceDictionary dict)

getContigNamesList(htsjdk.samtools.SAMSequenceDictionary refSeqDict)

static String getDictionaryAsString(htsjdk.samtools.SAMSequenceDictionary dict)
Returns a compact String representation of the sequence dictionary it's passed. The format of the returned String is: [ contig1Name(length: contig1Length) contig2Name(length: contig2Length) ... ]

static boolean sequenceRecordsAreEquivalent(htsjdk.samtools.SAMSequenceRecord first, htsjdk.samtools.SAMSequenceRecord second)
Helper routine that returns whether two sequence records are equivalent, defined as having the same name and lengths.

static void validateCRAMDictionaryAgainstReference(htsjdk.samtools.SAMSequenceDictionary referenceDictionary, htsjdk.samtools.SAMSequenceDictionary cramDictionary)
Tests for compatibility between a reference dictionary and a CRAM dictionary, using appropriate validation settings.

static void validateDictionaries(String name1, htsjdk.samtools.SAMSequenceDictionary dict1, String name2, htsjdk.samtools.SAMSequenceDictionary dict2)
Tests for compatibility between two sequence dictionaries, using standard validation settings appropriate for the GATK.

static void validateDictionaries(String name1, htsjdk.samtools.SAMSequenceDictionary dict1, String name2, htsjdk.samtools.SAMSequenceDictionary dict2, boolean requireSuperset, boolean checkContigOrdering)
Tests for compatibility between two sequence dictionaries.
-
Field Details
-
CHR1_HG18
protected static final htsjdk.samtools.SAMSequenceRecord CHR1_HG18 -
CHR2_HG18
protected static final htsjdk.samtools.SAMSequenceRecord CHR2_HG18 -
CHR10_HG18
protected static final htsjdk.samtools.SAMSequenceRecord CHR10_HG18 -
CHR1_HG19
protected static final htsjdk.samtools.SAMSequenceRecord CHR1_HG19 -
CHR2_HG19
protected static final htsjdk.samtools.SAMSequenceRecord CHR2_HG19 -
CHR10_HG19
protected static final htsjdk.samtools.SAMSequenceRecord CHR10_HG19 -
CHR1_B36
protected static final htsjdk.samtools.SAMSequenceRecord CHR1_B36 -
CHR2_B36
protected static final htsjdk.samtools.SAMSequenceRecord CHR2_B36 -
CHR10_B36
protected static final htsjdk.samtools.SAMSequenceRecord CHR10_B36 -
CHR1_B37
protected static final htsjdk.samtools.SAMSequenceRecord CHR1_B37 -
CHR2_B37
protected static final htsjdk.samtools.SAMSequenceRecord CHR2_B37 -
CHR10_B37
protected static final htsjdk.samtools.SAMSequenceRecord CHR10_B37
-
-
Method Details
-
validateDictionaries
public static void validateDictionaries(String name1, htsjdk.samtools.SAMSequenceDictionary dict1, String name2, htsjdk.samtools.SAMSequenceDictionary dict2)
Tests for compatibility between two sequence dictionaries, using standard validation settings appropriate for the GATK. If the dictionaries are incompatible, then UserExceptions are thrown with detailed error messages.
The standard validation settings used by this method are:
- Require the dictionaries to share a common subset of equivalent contigs.
- Do not require dict1 to be a superset of dict2.
- Do not perform checks related to contig ordering: don't throw if the common contigs are in different orders with respect to each other, occur at different absolute indices, or are lexicographically sorted human dictionaries. GATK uses contig names rather than contig indices, and so should not be sensitive to contig ordering issues.
For comparing a CRAM dictionary against a reference dictionary, call validateCRAMDictionaryAgainstReference(SAMSequenceDictionary, SAMSequenceDictionary) instead.
Parameters:
name1 - name associated with dict1
dict1 - the sequence dictionary dict1
name2 - name associated with dict2
dict2 - the sequence dictionary dict2
-
validateCRAMDictionaryAgainstReference
public static void validateCRAMDictionaryAgainstReference(htsjdk.samtools.SAMSequenceDictionary referenceDictionary, htsjdk.samtools.SAMSequenceDictionary cramDictionary)
Tests for compatibility between a reference dictionary and a CRAM dictionary, using appropriate validation settings. If the dictionaries are incompatible, then UserExceptions are thrown with detailed error messages.
The standard validation settings used by this method are:
- Require the reference dictionary to be a superset of the CRAM dictionary.
- Do not perform checks related to contig ordering: don't throw if the common contigs are in different orders with respect to each other, occur at different absolute indices, or are lexicographically sorted human dictionaries. GATK uses contig names rather than contig indices, and so should not be sensitive to contig ordering issues.
Parameters:
referenceDictionary - the sequence dictionary for the reference
cramDictionary - sequence dictionary from a CRAM file
-
validateDictionaries
public static void validateDictionaries(String name1, htsjdk.samtools.SAMSequenceDictionary dict1, String name2, htsjdk.samtools.SAMSequenceDictionary dict2, boolean requireSuperset, boolean checkContigOrdering)
Tests for compatibility between two sequence dictionaries. If the dictionaries are incompatible, then UserExceptions are thrown with detailed error messages. Two sequence dictionaries are compatible if they share a common subset of equivalent contigs, where equivalent contigs are defined as having the same name and length.
Parameters:
name1 - name associated with dict1
dict1 - the sequence dictionary dict1
name2 - name associated with dict2
dict2 - the sequence dictionary dict2
requireSuperset - if true, require that dict1 be a superset of dict2, rather than dict1 and dict2 sharing a common subset
checkContigOrdering - if true, require common contigs to be in the same relative order with respect to each other and occur at the same absolute indices, and forbid lexicographically-sorted human dictionaries
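The difference between the two requireSuperset settings can be sketched on contig names alone. This is a hedged illustration rather than the library's code: the class and method names are hypothetical, and the length comparison that the real validation also performs is deliberately left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative stand-in only; contig lengths and ordering are ignored here.
final class SupersetOptionSketch {

    /**
     * With requireSuperset, every contig name of dict2 must also occur in dict1.
     * Without it, sharing at least one contig name is enough.
     */
    static boolean namesAreCompatible(Set<String> names1, Set<String> names2,
                                      boolean requireSuperset) {
        if (requireSuperset) {
            return names1.containsAll(names2);
        }
        return !Collections.disjoint(names1, names2);
    }

    public static void main(String[] args) {
        Set<String> dict1 = new LinkedHashSet<>(Arrays.asList("chr1", "chr2"));
        Set<String> dict2 = new LinkedHashSet<>(Arrays.asList("chr2", "chr3"));

        System.out.println(namesAreCompatible(dict1, dict2, false)); // true: chr2 is shared
        System.out.println(namesAreCompatible(dict1, dict2, true));  // false: chr3 is missing from dict1
    }
}
```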
-
compareDictionaries
public static SequenceDictionaryUtils.SequenceDictionaryCompatibility compareDictionaries(htsjdk.samtools.SAMSequenceDictionary dict1, htsjdk.samtools.SAMSequenceDictionary dict2, boolean checkContigOrdering)
Workhorse routine that takes two dictionaries and returns their compatibility.
Parameters:
dict1 - first sequence dictionary
dict2 - second sequence dictionary
checkContigOrdering - if true, perform checks related to contig ordering: forbid lexicographically-sorted dictionaries, and require common contigs to be in the same relative order and at the same absolute indices
Returns:
A SequenceDictionaryCompatibility enum value describing the compatibility of the two dictionaries
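The two ordering conditions named for checkContigOrdering (same relative order, same absolute indices) are distinct, and a small sketch may make the difference concrete. It assumes each dictionary is reduced to an ordered list of unique contig names; the class and method names are hypothetical and the code is not taken from GATK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in only; assumes contig names are unique within each list.
final class ContigOrderSketch {

    /** True if the shared contigs appear in the same order in both lists. */
    static boolean sameRelativeOrder(List<String> names1, List<String> names2) {
        List<String> common1 = new ArrayList<>(names1);
        common1.retainAll(names2); // shared contigs, in names1 order
        List<String> common2 = new ArrayList<>(names2);
        common2.retainAll(names1); // shared contigs, in names2 order
        return common1.equals(common2);
    }

    /** True if every shared contig sits at the same index in both lists. */
    static boolean sameAbsoluteIndices(List<String> names1, List<String> names2) {
        for (int i = 0; i < names1.size(); i++) {
            int j = names2.indexOf(names1.get(i));
            if (j >= 0 && j != i) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> full = Arrays.asList("chr1", "chr2", "chr3");
        List<String> subset = Arrays.asList("chr2", "chr3");

        System.out.println(sameRelativeOrder(full, subset));   // true
        System.out.println(sameAbsoluteIndices(full, subset)); // false: chr2 is at index 1 vs 0
    }
}
```

The example shows why both checks exist: a dictionary that merely drops a leading contig keeps the relative order but shifts every index.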
-
sequenceRecordsAreEquivalent
public static boolean sequenceRecordsAreEquivalent(htsjdk.samtools.SAMSequenceRecord first, htsjdk.samtools.SAMSequenceRecord second)
Helper routine that returns whether two sequence records are equivalent, defined as having the same name and lengths.
NOTE: we allow the lengths to differ if one or both are UNKNOWN_SEQUENCE_LENGTH
Parameters:
first - first sequence record to compare
second - second sequence record to compare
Returns:
true if first and second have the same names and lengths, otherwise false
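The equivalence rule, including the exception for unknown lengths, can be sketched as follows. This is an assumption-laden stand-in: records are reduced to a name and an int length, and UNKNOWN_LENGTH is a hypothetical sentinel defined locally (htsjdk provides its own constant, SAMSequenceRecord.UNKNOWN_SEQUENCE_LENGTH).

```java
// Illustrative stand-in only; this is not GATK's source code.
final class RecordEquivalenceSketch {

    // Hypothetical sentinel for "length not known", defined here for the sketch.
    static final int UNKNOWN_LENGTH = 0;

    /** Same name, and same length unless either length is unknown. */
    static boolean equivalent(String name1, int length1, String name2, int length2) {
        if (!name1.equals(name2)) {
            return false;
        }
        if (length1 == UNKNOWN_LENGTH || length2 == UNKNOWN_LENGTH) {
            return true; // an unknown length never causes a mismatch
        }
        return length1 == length2;
    }

    public static void main(String[] args) {
        System.out.println(equivalent("chr1", 1000, "chr1", 1000));           // true
        System.out.println(equivalent("chr1", 1000, "chr1", UNKNOWN_LENGTH)); // true
        System.out.println(equivalent("chr1", 1000, "chr1", 2000));           // false
        System.out.println(equivalent("chr1", 1000, "chr2", 1000));           // false
    }
}
```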
-
getCommonContigsByName
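public static Set<String> getCommonContigsByName(htsjdk.samtools.SAMSequenceDictionary dict1, htsjdk.samtools.SAMSequenceDictionary dict2)
Returns the set of contig names found in both dicts.
Parameters:
dict1 - the first sequence dictionary
dict2 - the second sequence dictionary
Returns:
the set of contig names present in both dict1 and dict2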
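Finding the contig names shared by two dictionaries is a set intersection. A minimal sketch, assuming each dictionary has already been reduced to a list of contig names (the class and method names are hypothetical, not part of the documented API):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative stand-in only; operates on plain name lists, not on dictionaries.
final class CommonContigsSketch {

    /** Names present in both lists, kept in the order they appear in names1. */
    static Set<String> commonNames(List<String> names1, List<String> names2) {
        Set<String> common = new LinkedHashSet<>(names1);
        common.retainAll(new HashSet<>(names2));
        return common;
    }

    public static void main(String[] args) {
        List<String> reference = Arrays.asList("chr1", "chr2", "chrM");
        List<String> reads = Arrays.asList("chr2", "chr1", "chrX");
        System.out.println(commonNames(reference, reads)); // [chr1, chr2]
    }
}
```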
-
getContigNames
-
getContigNamesList
-
getDictionaryAsString
public static String getDictionaryAsString(htsjdk.samtools.SAMSequenceDictionary dict)
Returns a compact String representation of the sequence dictionary it's passed. The format of the returned String is: [ contig1Name(length: contig1Length) contig2Name(length: contig2Length) ... ]
Parameters:
dict - a non-null SAMSequenceDictionary
Returns:
A String containing all of the contig names and lengths from the sequence dictionary it's passed
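As an illustration of the documented format, the sketch below renders an ordered name-to-length map in the same shape. The exact spacing is an assumption read off the format string above, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in only; the dictionary is modelled as an ordered name -> length map.
final class DictionaryStringSketch {

    /** Renders entries as "[ name(length: N) name(length: N) ]". */
    static String asString(Map<String, Integer> dict) {
        StringBuilder sb = new StringBuilder("[ ");
        for (Map.Entry<String, Integer> contig : dict.entrySet()) {
            sb.append(contig.getKey())
              .append("(length: ")
              .append(contig.getValue())
              .append(") ");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> dict = new LinkedHashMap<>();
        dict.put("chr1", 1000);
        dict.put("chr2", 2000);
        System.out.println(asString(dict)); // [ chr1(length: 1000) chr2(length: 2000) ]
    }
}
```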
-