Class QualityEncodingDetector

java.lang.Object
htsjdk.samtools.util.QualityEncodingDetector

public class QualityEncodingDetector extends Object
Utility for determining the type of quality encoding/format (see FastqQualityFormat) used in a SAM/BAM or FASTQ file.

To use this class, invoke the detect() method with a SamReader or FastqReader, as appropriate. The consumer is responsible for closing readers.
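The reader-ownership rule above can be illustrated without htsjdk. In this minimal sketch, `HypotheticalQualityReader` and `OwnershipSketch.inspect` are invented stand-ins for a `FastqReader`/`SamReader` and for `detect()`. The only point shown is that the analysis step reads from the reader but leaves closing it to the caller, here through try-with-resources.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class OwnershipSketch {
    // Hypothetical stand-in for detect(): consumes records but never closes the reader.
    static int inspect(HypotheticalQualityReader reader) {
        int recordsSeen = 0;
        for (String ignored : reader) {
            recordsSeen++;
        }
        return recordsSeen;
    }

    public static void main(String[] args) {
        HypotheticalQualityReader reader =
                new HypotheticalQualityReader(List.of("IIII", "II#I"));
        // The caller owns the reader, so the caller closes it.
        try (HypotheticalQualityReader r = reader) {
            int n = inspect(r);
            System.out.println(n + " records; closed after inspect: " + r.isClosed());
        }
        System.out.println("closed after try block: " + reader.isClosed());
    }
}

// Hypothetical stand-in for FastqReader/SamReader: yields quality strings
// and records whether close() has been called.
class HypotheticalQualityReader implements AutoCloseable, Iterable<String> {
    private final List<String> qualities;
    private boolean closed = false;

    HypotheticalQualityReader(List<String> qualities) {
        this.qualities = new ArrayList<>(qualities);
    }

    @Override
    public Iterator<String> iterator() {
        if (closed) {
            throw new IllegalStateException("reader is closed");
        }
        return qualities.iterator();
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}
```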

  • Field Details

    • DEFAULT_MAX_RECORDS_TO_ITERATE

      public static final long DEFAULT_MAX_RECORDS_TO_ITERATE
      The default maximum number of records over which the detector iterates before making a determination.
  • Constructor Details

    • QualityEncodingDetector

      public QualityEncodingDetector()
  • Method Details

    • add

      public long add(long maxRecords, FastqReader... readers)
      Adds the provided readers' records to the detector.
      Returns:
      The number of records read
    • add

      public long add(long maxRecords, SamReader reader)
      Adds the provided reader's records to the detector.
      Returns:
      The number of records read
    • add

      public long add(long maxRecords, CloseableIterator<SAMRecord> iterator, boolean useOriginalQualities)
      Adds the provided iterator's records (optionally using the original qualities) to the detector.
      Returns:
      The number of records read
    • add

      public long add(long maxRecords, CloseableIterator<SAMRecord> iterator)
    • add

      public void add(FastqRecord fastqRecord)
      Adds the provided record's qualities to the detector.
    • add

      public void add(SAMRecord samRecord, boolean useOriginalQualities)
      Adds the provided record's qualities to the detector.
    • add

      public void add(SAMRecord samRecord)
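The `maxRecords` cap and the `useOriginalQualities` switch of the add methods above can be pictured with the following dependency-free sketch. `SketchRecord` and `CappedQualityCollector` are hypothetical: the record holds a current quality string plus optional original qualities (in SAM these are conventionally stored in the `OQ` tag), and the collector consumes at most `maxRecords` records and returns how many it read. The sketch treats `maxRecords` as a plain upper bound and does not model any special values the real methods may accept.

```java
import java.util.Iterator;

// Hypothetical minimal stand-in for SAMRecord.
final class SketchRecord {
    final String qualities;          // current quality string
    final String originalQualities;  // original qualities, or null if absent

    SketchRecord(String qualities, String originalQualities) {
        this.qualities = qualities;
        this.originalQualities = originalQualities;
    }
}

// Hypothetical collector mirroring the shape of the add(...) overloads.
final class CappedQualityCollector {
    private final StringBuilder observed = new StringBuilder();

    // Consume at most maxRecords records and report how many were consumed.
    long add(long maxRecords, Iterator<SketchRecord> iterator, boolean useOriginalQualities) {
        long recordCount = 0;
        while (recordCount < maxRecords && iterator.hasNext()) {
            add(iterator.next(), useOriginalQualities);
            recordCount++;
        }
        return recordCount;
    }

    // Prefer the original qualities when requested and available; otherwise use the current ones.
    void add(SketchRecord record, boolean useOriginalQualities) {
        boolean useOriginal = useOriginalQualities && record.originalQualities != null;
        observed.append(useOriginal ? record.originalQualities : record.qualities);
    }

    String observedQualities() {
        return observed.toString();
    }
}
```

With a cap of 2 and `useOriginalQualities` set, a record without original qualities simply contributes its current ones.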
    • isDeterminationAmbiguous

      public boolean isDeterminationAmbiguous()
      Tests whether the determination is ambiguous, i.e., whether more than one quality format remains possible after applying the established exclusion conventions, so that the detector would have to guess.
      Returns:
      True if more than one format is possible after exclusions; false otherwise
    • generateCandidateQualities

      public EnumSet<FastqQualityFormat> generateCandidateQualities(boolean checkExpected)
      Processes collected quality data and applies rules to determine which quality formats are possible.

      Specifically, for each format's known range of possible values (its "quality scheme"), we exclude the format if any observed value falls outside that range. Additionally, we exclude formats for which we expect to see at least one quality in a given range of values but never do. For example, for Phred we expect eventually to see a value below 58; if we never see such a value, we exclude Phred as a possible format, unless the checkExpected flag is set to false, in which case Phred remains a possible quality format.
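The exclusion rules described above can be sketched as follows. `SketchFormat` is a hypothetical mirror of `FastqQualityFormat`, and the ASCII ranges are an assumption taken from the common FASTQ conventions (Phred+33 for Standard, Solexa+64 for Solexa, Phred+64 for Illumina) rather than read from htsjdk's source; the "below 58" expectation follows the description above. `isAmbiguous` applies the rule stated for isDeterminationAmbiguous().

```java
import java.util.EnumSet;

// Hypothetical mirror of FastqQualityFormat's constants.
enum SketchFormat { Solexa, Illumina, Standard }

final class CandidateSketch {
    // Assumed ASCII ranges (common FASTQ conventions, not taken from htsjdk):
    //   Standard (Phred+33):  33..126
    //   Solexa   (Solexa+64): 59..126
    //   Illumina (Phred+64):  64..126
    static EnumSet<SketchFormat> candidates(String observedQualities, boolean checkExpected) {
        EnumSet<SketchFormat> possible = EnumSet.allOf(SketchFormat.class);
        boolean sawBelow58 = false;
        for (char c : observedQualities.toCharArray()) {
            // Range exclusion: a format is dropped once any value falls outside its scheme.
            if (c < 33 || c > 126) possible.remove(SketchFormat.Standard);
            if (c < 59 || c > 126) possible.remove(SketchFormat.Solexa);
            if (c < 64 || c > 126) possible.remove(SketchFormat.Illumina);
            if (c < 58) sawBelow58 = true;
        }
        // Expectation exclusion: Phred data should eventually show a value below 58.
        if (checkExpected && !sawBelow58) possible.remove(SketchFormat.Standard);
        return possible;
    }

    // More than one surviving format means the determination is ambiguous.
    static boolean isAmbiguous(EnumSet<SketchFormat> candidates) {
        return candidates.size() > 1;
    }
}
```

For instance, qualities made only of `h` (ASCII 104) leave Solexa and Illumina when checkExpected is true and all three formats when it is false, while a single `#` (ASCII 35) rules out both +64 encodings.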

    • detect

      public static FastqQualityFormat detect(long maxRecords, FastqReader... readers)
      Reads through the records in the provided FASTQ readers and uses their quality scores to determine the quality format used in the FASTQ files.
      Parameters:
      readers - The fastq readers from which qualities are to be read; at least one must be provided
      maxRecords - The maximum number of records to read from the readers before making a determination (the result is a guess, so more records are better)
      Returns:
      The determined quality format
    • detect

      public static FastqQualityFormat detect(FastqReader... readers)
    • detect

      public static FastqQualityFormat detect(long maxRecords, CloseableIterator<SAMRecord> iterator, boolean useOriginalQualities)
      Reads through the records in the provided SAM record iterator and uses their quality scores to determine the quality format used in the SAM file.
      Parameters:
      iterator - The iterator from which SAM records are to be read
      maxRecords - The maximum number of records to read from the iterator before making a determination (the result is a guess, so more records are better)
      useOriginalQualities - Whether to use the original qualities (if available) rather than the current ones
      Returns:
      The determined quality format
    • detect

      public static FastqQualityFormat detect(long maxRecords, CloseableIterator<SAMRecord> iterator)
    • detect

      public static FastqQualityFormat detect(long maxRecords, SamReader reader)
    • detect

      public static FastqQualityFormat detect(SamReader reader)
    • detect

      public static FastqQualityFormat detect(SamReader reader, FastqQualityFormat expectedQualityFormat)
      Reads through the records in the provided SAM reader and uses their quality scores to sanity-check the expected quality format passed in. If the expected quality format is consistent with the observed qualities, it is returned unchanged; otherwise a SAMException is thrown.
    • generateBestGuess

      public FastqQualityFormat generateBestGuess(QualityEncodingDetector.FileContext context, FastqQualityFormat expectedQuality)
      Makes a best guess at the quality format. If an expected quality is passed in, the observed values are sanity-checked against it (ignoring the expected range), and if they are deemed acceptable, the expected quality is returned. Otherwise, a set of heuristics is used to make the best guess.
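The expected-format sanity check can be sketched as below. `GuessFormat` and `ExpectedFormatCheck.confirm` are hypothetical names; `rangeCandidates` is assumed to be the candidate set computed with the expected-range rule switched off, and `IllegalArgumentException` stands in for htsjdk's `SAMException` so the example has no dependencies. The heuristic branch used when no expected format is given is not modeled, since its rules are not described here.

```java
import java.util.EnumSet;

// Hypothetical mirror of FastqQualityFormat's constants.
enum GuessFormat { Solexa, Illumina, Standard }

final class ExpectedFormatCheck {
    // rangeCandidates: formats still possible after range exclusion only
    // (the "expected range" rule is ignored, as described above).
    static GuessFormat confirm(EnumSet<GuessFormat> rangeCandidates, GuessFormat expected) {
        if (rangeCandidates.contains(expected)) {
            // Observed qualities are consistent with the expectation: hand it back unchanged.
            return expected;
        }
        // Stand-in for SAMException: the data contradict the expected format.
        throw new IllegalArgumentException(
                "Observed qualities are inconsistent with expected format " + expected);
    }
}
```

If every observed quality lies in the +64 range, all three formats survive range exclusion and an expected Illumina is confirmed; if only Standard survives, an expected Illumina is rejected.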