Class SequenceDictionaryUtils

java.lang.Object
org.broadinstitute.hellbender.utils.SequenceDictionaryUtils

public final class SequenceDictionaryUtils extends Object
A series of utility functions that enable the GATK to compare two sequence dictionaries (from the reference, from BAMs, or from feature sources) for consistency. The class supports two basic modes: compareDictionaries returns an enum value that describes, at a high level, the consistency between two dictionaries, while validateDictionaries throws a UserException if the dictionaries are too incompatible. Dictionaries are tested for contig name overlap, consistent ordering within the overlapping set, and contig length, if available.
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static enum 
    SequenceDictionaryUtils.SequenceDictionaryCompatibility
     
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected static final htsjdk.samtools.SAMSequenceRecord
    CHR1_HG18
     
    protected static final htsjdk.samtools.SAMSequenceRecord
    CHR2_HG18
     
    protected static final htsjdk.samtools.SAMSequenceRecord
    CHR10_HG18
     
    protected static final htsjdk.samtools.SAMSequenceRecord
    CHR1_HG19
     
    protected static final htsjdk.samtools.SAMSequenceRecord
    CHR2_HG19
     
    protected static final htsjdk.samtools.SAMSequenceRecord
    CHR10_HG19
     
    protected static final htsjdk.samtools.SAMSequenceRecord
    CHR1_B36
     
    protected static final htsjdk.samtools.SAMSequenceRecord
    CHR2_B36
     
    protected static final htsjdk.samtools.SAMSequenceRecord
    CHR10_B36
     
    protected static final htsjdk.samtools.SAMSequenceRecord
    CHR1_B37
     
    protected static final htsjdk.samtools.SAMSequenceRecord
    CHR2_B37
     
    protected static final htsjdk.samtools.SAMSequenceRecord
    CHR10_B37
     
  • Method Summary

    Modifier and Type
    Method
    Description
    static SequenceDictionaryUtils.SequenceDictionaryCompatibility
    compareDictionaries(htsjdk.samtools.SAMSequenceDictionary dict1, htsjdk.samtools.SAMSequenceDictionary dict2, boolean checkContigOrdering)
    Workhorse routine that takes two dictionaries and returns their compatibility.
    static Set<String>
    getCommonContigsByName(htsjdk.samtools.SAMSequenceDictionary dict1, htsjdk.samtools.SAMSequenceDictionary dict2)
    Returns the set of contig names found in both dicts.
    static Set<String>
    getContigNames(htsjdk.samtools.SAMSequenceDictionary dict)
     
    static List<String>
    getContigNamesList(htsjdk.samtools.SAMSequenceDictionary refSeqDict)
     
    static String
    getDictionaryAsString(htsjdk.samtools.SAMSequenceDictionary dict)
    Returns a compact String representation of the sequence dictionary it's passed. The format of the returned String is: [ contig1Name(length: contig1Length) contig2Name(length: contig2Length) ... ]
    static boolean
    sequenceRecordsAreEquivalent(htsjdk.samtools.SAMSequenceRecord first, htsjdk.samtools.SAMSequenceRecord second)
    Helper routine that returns whether two sequence records are equivalent, defined as having the same name and length.
    static void
    validateCRAMDictionaryAgainstReference(htsjdk.samtools.SAMSequenceDictionary referenceDictionary, htsjdk.samtools.SAMSequenceDictionary cramDictionary)
    Tests for compatibility between a reference dictionary and a CRAM dictionary, using appropriate validation settings.
    static void
    validateDictionaries(String name1, htsjdk.samtools.SAMSequenceDictionary dict1, String name2, htsjdk.samtools.SAMSequenceDictionary dict2)
    Tests for compatibility between two sequence dictionaries, using standard validation settings appropriate for the GATK.
    static void
    validateDictionaries(String name1, htsjdk.samtools.SAMSequenceDictionary dict1, String name2, htsjdk.samtools.SAMSequenceDictionary dict2, boolean requireSuperset, boolean checkContigOrdering)
    Tests for compatibility between two sequence dictionaries.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • CHR1_HG18

      protected static final htsjdk.samtools.SAMSequenceRecord CHR1_HG18
    • CHR2_HG18

      protected static final htsjdk.samtools.SAMSequenceRecord CHR2_HG18
    • CHR10_HG18

      protected static final htsjdk.samtools.SAMSequenceRecord CHR10_HG18
    • CHR1_HG19

      protected static final htsjdk.samtools.SAMSequenceRecord CHR1_HG19
    • CHR2_HG19

      protected static final htsjdk.samtools.SAMSequenceRecord CHR2_HG19
    • CHR10_HG19

      protected static final htsjdk.samtools.SAMSequenceRecord CHR10_HG19
    • CHR1_B36

      protected static final htsjdk.samtools.SAMSequenceRecord CHR1_B36
    • CHR2_B36

      protected static final htsjdk.samtools.SAMSequenceRecord CHR2_B36
    • CHR10_B36

      protected static final htsjdk.samtools.SAMSequenceRecord CHR10_B36
    • CHR1_B37

      protected static final htsjdk.samtools.SAMSequenceRecord CHR1_B37
    • CHR2_B37

      protected static final htsjdk.samtools.SAMSequenceRecord CHR2_B37
    • CHR10_B37

      protected static final htsjdk.samtools.SAMSequenceRecord CHR10_B37
  • Method Details

    • validateDictionaries

      public static void validateDictionaries(String name1, htsjdk.samtools.SAMSequenceDictionary dict1, String name2, htsjdk.samtools.SAMSequenceDictionary dict2)
      Tests for compatibility between two sequence dictionaries, using standard validation settings appropriate for the GATK. If the dictionaries are incompatible, then UserExceptions are thrown with detailed error messages.
      The standard validation settings used by this method are:
      - Require the dictionaries to share a common subset of equivalent contigs.
      - Do not require dict1 to be a superset of dict2.
      - Do not perform checks related to contig ordering: don't throw if the common contigs are in different orders with respect to each other, occur at different absolute indices, or are lexicographically sorted human dictionaries. GATK uses contig names rather than contig indices, and so should not be sensitive to contig ordering issues.
      For comparing a CRAM dictionary against a reference dictionary, call validateCRAMDictionaryAgainstReference(SAMSequenceDictionary, SAMSequenceDictionary) instead.
      Parameters:
      name1 - name associated with dict1
      dict1 - the sequence dictionary dict1
      name2 - name associated with dict2
      dict2 - the sequence dictionary dict2
    • validateCRAMDictionaryAgainstReference

      public static void validateCRAMDictionaryAgainstReference(htsjdk.samtools.SAMSequenceDictionary referenceDictionary, htsjdk.samtools.SAMSequenceDictionary cramDictionary)
      Tests for compatibility between a reference dictionary and a CRAM dictionary, using appropriate validation settings. If the dictionaries are incompatible, then UserExceptions are thrown with detailed error messages.
      The standard validation settings used by this method are:
      - Require the reference dictionary to be a superset of the CRAM dictionary.
      - Do not perform checks related to contig ordering: don't throw if the common contigs are in different orders with respect to each other, occur at different absolute indices, or are lexicographically sorted human dictionaries. GATK uses contig names rather than contig indices, and so should not be sensitive to contig ordering issues.
      Parameters:
      referenceDictionary - the sequence dictionary for the reference
      cramDictionary - sequence dictionary from a CRAM file
    • validateDictionaries

      public static void validateDictionaries(String name1, htsjdk.samtools.SAMSequenceDictionary dict1, String name2, htsjdk.samtools.SAMSequenceDictionary dict2, boolean requireSuperset, boolean checkContigOrdering)
      Tests for compatibility between two sequence dictionaries. If the dictionaries are incompatible, then UserExceptions are thrown with detailed error messages. Two sequence dictionaries are compatible if they share a common subset of equivalent contigs, where equivalent contigs are defined as having the same name and length.
      Parameters:
      name1 - name associated with dict1
      dict1 - the sequence dictionary dict1
      name2 - name associated with dict2
      dict2 - the sequence dictionary dict2
      requireSuperset - if true, require that dict1 be a superset of dict2, rather than dict1 and dict2 sharing a common subset
      checkContigOrdering - if true, require common contigs to be in the same relative order with respect to each other and occur at the same absolute indices, and forbid lexicographically-sorted human dictionaries
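      The difference between the common-subset requirement and requireSuperset can be illustrated with a small standalone sketch. It does not use the GATK or htsjdk classes: dictionaries are modeled as name-to-length maps, IllegalStateException stands in for UserException, and the class and method names are hypothetical. Ordering checks and unknown lengths are deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Standalone sketch, NOT the GATK implementation. Dictionaries are modeled as
// name -> length maps; IllegalStateException stands in for UserException.
final class ValidateDictionariesSketch {

    static void validate(String name1, Map<String, Integer> dict1,
                         String name2, Map<String, Integer> dict2,
                         boolean requireSuperset) {
        int commonContigs = 0;
        for (Map.Entry<String, Integer> contig : dict2.entrySet()) {
            Integer lengthInDict1 = dict1.get(contig.getKey());
            if (lengthInDict1 == null) {
                // Contig exists only in dict2: fatal only in superset mode.
                if (requireSuperset) {
                    throw new IllegalStateException(name1 + " is missing contig "
                            + contig.getKey() + " found in " + name2);
                }
                continue;
            }
            // Common contigs must be equivalent: same name and same length.
            if (!lengthInDict1.equals(contig.getValue())) {
                throw new IllegalStateException("Contig " + contig.getKey()
                        + " has different lengths in " + name1 + " and " + name2);
            }
            commonContigs++;
        }
        if (commonContigs == 0) {
            throw new IllegalStateException(name1 + " and " + name2 + " share no contigs");
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> reference = new LinkedHashMap<>();
        reference.put("chr1", 100);
        reference.put("chr2", 200);
        Map<String, Integer> reads = new LinkedHashMap<>();
        reads.put("chr2", 200);
        reads.put("chrM", 16);

        validate("reference", reference, "reads", reads, false); // passes: chr2 is shared
        try {
            validate("reference", reference, "reads", reads, true);
        } catch (IllegalStateException e) {
            System.out.println("Superset check failed: " + e.getMessage());
        }
    }
}
```

      In this model the same pair of dictionaries passes when only a common subset is required and fails when dict1 must be a superset of dict2, because chrM is absent from the reference.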
    • compareDictionaries

      public static SequenceDictionaryUtils.SequenceDictionaryCompatibility compareDictionaries(htsjdk.samtools.SAMSequenceDictionary dict1, htsjdk.samtools.SAMSequenceDictionary dict2, boolean checkContigOrdering)
      Workhorse routine that takes two dictionaries and returns their compatibility.
      Parameters:
      dict1 - first sequence dictionary
      dict2 - second sequence dictionary
      checkContigOrdering - if true, perform checks related to contig ordering: forbid lexicographically-sorted dictionaries, and require common contigs to be in the same relative order and at the same absolute indices
      Returns:
      A SequenceDictionaryCompatibility enum value describing the compatibility of the two dictionaries
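      As a rough illustration of the "return a state instead of throwing" mode, the following standalone sketch classifies two dictionaries. The Compatibility enum here is a simplified, hypothetical stand-in for SequenceDictionaryCompatibility (whose constants are not listed on this page), dictionaries are modeled as insertion-ordered name-to-length maps, and only the relative-order part of checkContigOrdering is modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Standalone sketch, NOT the GATK implementation. The enum below is a
// hypothetical, simplified stand-in for SequenceDictionaryCompatibility.
final class CompareDictionariesSketch {

    enum Compatibility { IDENTICAL, COMMON_SUBSET, NO_COMMON_CONTIGS, UNEQUAL_COMMON_CONTIGS, OUT_OF_ORDER }

    // Both maps are expected to preserve contig order (e.g. LinkedHashMap).
    static Compatibility compare(Map<String, Integer> dict1, Map<String, Integer> dict2,
                                 boolean checkContigOrdering) {
        // Common contig names, in dict1 order.
        List<String> commonInDict1Order = new ArrayList<>(dict1.keySet());
        commonInDict1Order.retainAll(dict2.keySet());
        if (commonInDict1Order.isEmpty()) {
            return Compatibility.NO_COMMON_CONTIGS;
        }
        for (String name : commonInDict1Order) {
            if (!dict1.get(name).equals(dict2.get(name))) {
                return Compatibility.UNEQUAL_COMMON_CONTIGS;
            }
        }
        if (checkContigOrdering) {
            // Common contig names, in dict2 order; must match the dict1 order.
            List<String> commonInDict2Order = new ArrayList<>(dict2.keySet());
            commonInDict2Order.retainAll(dict1.keySet());
            if (!commonInDict1Order.equals(commonInDict2Order)) {
                return Compatibility.OUT_OF_ORDER;
            }
        }
        boolean sameContigsSameOrder =
                new ArrayList<>(dict1.keySet()).equals(new ArrayList<>(dict2.keySet()));
        return sameContigsSameOrder ? Compatibility.IDENTICAL : Compatibility.COMMON_SUBSET;
    }

    public static void main(String[] args) {
        Map<String, Integer> a = new LinkedHashMap<>();
        a.put("chr1", 100);
        a.put("chr2", 200);
        Map<String, Integer> b = new LinkedHashMap<>();
        b.put("chr2", 200);
        b.put("chr1", 100);

        System.out.println(compare(a, a, true));  // IDENTICAL
        System.out.println(compare(a, b, true));  // OUT_OF_ORDER
        System.out.println(compare(a, b, false)); // COMMON_SUBSET
    }
}
```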
    • sequenceRecordsAreEquivalent

      public static boolean sequenceRecordsAreEquivalent(htsjdk.samtools.SAMSequenceRecord first, htsjdk.samtools.SAMSequenceRecord second)
      Helper routine that returns whether two sequence records are equivalent, defined as having the same name and length. NOTE: the lengths are allowed to differ if one or both are UNKNOWN_SEQUENCE_LENGTH.
      Parameters:
      first - first sequence record to compare
      second - second sequence record to compare
      Returns:
      true if first and second have the same names and lengths, otherwise false
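      The equivalence rule, including the unknown-length exception noted above, can be sketched without htsjdk as follows. The class and method names are hypothetical, and UNKNOWN_LENGTH is an assumed stand-in for htsjdk's UNKNOWN_SEQUENCE_LENGTH constant (its value here is an assumption made for the sketch).

```java
// Standalone sketch, NOT the GATK implementation. A sequence record is
// reduced to a (name, length) pair.
final class ContigEquivalenceSketch {

    // Assumption: placeholder for htsjdk's UNKNOWN_SEQUENCE_LENGTH.
    static final int UNKNOWN_LENGTH = 0;

    static boolean equivalent(String name1, int length1, String name2, int length2) {
        if (!name1.equals(name2)) {
            return false;
        }
        // An unknown length on either side is compatible with any length.
        if (length1 == UNKNOWN_LENGTH || length2 == UNKNOWN_LENGTH) {
            return true;
        }
        return length1 == length2;
    }

    public static void main(String[] args) {
        System.out.println(equivalent("chr1", 1000, "chr1", 1000));           // true
        System.out.println(equivalent("chr1", 1000, "chr1", 2000));           // false
        System.out.println(equivalent("chr1", 1000, "chr1", UNKNOWN_LENGTH)); // true
        System.out.println(equivalent("chr1", 1000, "1", 1000));              // false
    }
}
```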
    • getCommonContigsByName

      public static Set<String> getCommonContigsByName(htsjdk.samtools.SAMSequenceDictionary dict1, htsjdk.samtools.SAMSequenceDictionary dict2)
      Returns the set of contig names found in both dicts.
      Parameters:
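      dict1 - first sequence dictionary
      dict2 - second sequence dictionary
      Returns:
      the set of contig names present in both dictionaries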
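      Finding the contigs common to two dictionaries amounts to a set intersection on contig names. A minimal standalone sketch, with dictionaries reduced to lists of contig names and a hypothetical class and method name:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Standalone sketch, NOT the GATK implementation. Each dictionary is
// reduced to its list of contig names.
final class CommonContigsSketch {

    static Set<String> commonContigsByName(List<String> names1, List<String> names2) {
        // Start from the first dictionary's names and keep only those also in the second.
        Set<String> common = new LinkedHashSet<>(names1);
        common.retainAll(new LinkedHashSet<>(names2));
        return common;
    }

    public static void main(String[] args) {
        List<String> reference = Arrays.asList("chr1", "chr2", "chr10");
        List<String> reads = Arrays.asList("chr10", "chr2", "chrM");
        System.out.println(commonContigsByName(reference, reads)); // [chr2, chr10]
    }
}
```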
    • getContigNames

      public static Set<String> getContigNames(htsjdk.samtools.SAMSequenceDictionary dict)
    • getContigNamesList

      public static List<String> getContigNamesList(htsjdk.samtools.SAMSequenceDictionary refSeqDict)
    • getDictionaryAsString

      public static String getDictionaryAsString(htsjdk.samtools.SAMSequenceDictionary dict)
      Returns a compact String representation of the sequence dictionary it's passed. The format of the returned String is: [ contig1Name(length: contig1Length) contig2Name(length: contig2Length) ... ]
      Parameters:
      dict - a non-null SAMSequenceDictionary
      Returns:
      A String containing all of the contig names and lengths from the sequence dictionary it's passed
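      The documented format can be reproduced with a short standalone sketch. The dictionary is modeled as an insertion-ordered name-to-length map, the class and method names are hypothetical, and the exact whitespace produced by the real method is an assumption based on the format string above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Standalone sketch, NOT the GATK implementation. Whitespace follows the
// format shown in the documentation and may differ from the real output.
final class DictionaryStringSketch {

    static String dictionaryAsString(Map<String, Integer> contigLengths) {
        StringBuilder sb = new StringBuilder("[ ");
        for (Map.Entry<String, Integer> contig : contigLengths.entrySet()) {
            sb.append(contig.getKey())
              .append("(length: ")
              .append(contig.getValue())
              .append(") ");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> dict = new LinkedHashMap<>();
        dict.put("chr1", 100);
        dict.put("chr2", 200);
        System.out.println(dictionaryAsString(dict)); // [ chr1(length: 100) chr2(length: 200) ]
    }
}
```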