Package htsjdk.samtools.util
Class QualityEncodingDetector
java.lang.Object
htsjdk.samtools.util.QualityEncodingDetector
Utility for determining the type of quality encoding/format (see FastqQualityFormat) used in a SAM/BAM or FASTQ file.
To use this class, invoke the detect() method with a SamReader or FastqReader, as appropriate. The consumer is responsible for closing readers.
-
Nested Class Summary
Nested Classes -
Field Summary
Fields
Modifier and Type, Field, Description
static final long DEFAULT_MAX_RECORDS_TO_ITERATE
    The maximum number of records over which the detector will iterate before making a determination, by default.
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description
long add(long maxRecords, FastqReader... readers)
    Adds the provided reader's records to the detector.
long add
    Adds the provided reader's records to the detector.
long add(long maxRecords, CloseableIterator<SAMRecord> iterator)
long add(long maxRecords, CloseableIterator<SAMRecord> iterator, boolean useOriginalQualities)
    Adds the provided iterator's records (optionally using the original qualities) to the detector.
void add(FastqRecord fastqRecord)
    Adds the provided record's qualities to the detector.
void add
void add
    Adds the provided record's qualities to the detector.
static FastqQualityFormat detect(long maxRecords, FastqReader... readers)
    Reads through the records in the provided fastq reader and uses their quality scores to determine the quality format used in the fastq.
static FastqQualityFormat detect
static FastqQualityFormat detect(long maxRecords, CloseableIterator<SAMRecord> iterator)
static FastqQualityFormat detect(long maxRecords, CloseableIterator<SAMRecord> iterator, boolean useOriginalQualities)
    Reads through the records in the provided SAM reader and uses their quality scores to determine the quality format used in the SAM.
static FastqQualityFormat detect(FastqReader... readers)
static FastqQualityFormat detect
static FastqQualityFormat detect(SamReader reader, FastqQualityFormat expectedQualityFormat)
    Reads through the records in the provided SAM reader and uses their quality scores to sanity check the expected quality passed in.
FastqQualityFormat generateBestGuess(QualityEncodingDetector.FileContext context, FastqQualityFormat expectedQuality)
    Make the best guess at the quality format.
generateCandidateQualities(boolean checkExpected)
    Processes collected quality data and applies rules to determine which quality formats are possible.
boolean isDeterminationAmbiguous()
    Tests whether or not the detector can make a determination without guessing (i.e., if all but one quality format can be excluded using established exclusion conventions).
-
Field Details
-
DEFAULT_MAX_RECORDS_TO_ITERATE
public static final long DEFAULT_MAX_RECORDS_TO_ITERATE
The maximum number of records over which the detector will iterate before making a determination, by default.
-
-
Constructor Details
-
QualityEncodingDetector
public QualityEncodingDetector()
-
-
Method Details
-
add
Adds the provided reader's records to the detector.
Returns:
The number of records read
-
add
Adds the provided reader's records to the detector.
Returns:
The number of records read
-
add
public long add(long maxRecords, CloseableIterator<SAMRecord> iterator, boolean useOriginalQualities)
Adds the provided iterator's records (optionally using the original qualities) to the detector.
Returns:
The number of records read
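The interplay between the maxRecords cap and the returned count can be pictured with a small stand-alone sketch. It uses a plain java.util.Iterator of quality strings rather than htsjdk's CloseableIterator<SAMRecord>, and the names BoundedAddSketch and addUpTo are hypothetical. It illustrates only the documented contract (read at most maxRecords, report how many were read) and is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the detector: collects quality strings
// from at most maxRecords records.
final class BoundedAddSketch {
    private final List<String> collected = new ArrayList<>();

    // Consumes records until the iterator is exhausted or maxRecords
    // have been read, and returns the number of records actually read.
    long addUpTo(long maxRecords, Iterator<String> records) {
        long read = 0;
        while (read < maxRecords && records.hasNext()) {
            collected.add(records.next());
            read++;
        }
        return read;
    }

    // Total number of records collected so far.
    int size() {
        return collected.size();
    }

    public static void main(String[] args) {
        BoundedAddSketch sketch = new BoundedAddSketch();
        Iterator<String> it = Arrays.asList("IIII", "FFFF", "####").iterator();
        System.out.println(sketch.addUpTo(2, it)); // two of the three records are read
        System.out.println(it.hasNext());          // one record is left unread
    }
}
```

Because the cap stops iteration early, records beyond maxRecords never influence the result, which is why a larger cap gives the detector more evidence to work with.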
-
add
-
add
Adds the provided record's qualities to the detector. -
add
Adds the provided record's qualities to the detector. -
add
-
isDeterminationAmbiguous
public boolean isDeterminationAmbiguous()
Tests whether or not the detector can make a determination without guessing (i.e., if all but one quality format can be excluded using established exclusion conventions).
Returns:
True if more than one format is possible after exclusions; false otherwise
-
generateCandidateQualities
Processes collected quality data and applies rules to determine which quality formats are possible. Specifically, for each format's known range of possible values (its "quality scheme"), a format is excluded if any observed value falls outside that range. Formats are also excluded when at least one quality is expected in a given range of values but none is observed. For example, for Phred, a value below 58 is expected to appear eventually; if no such value is ever seen, Phred is excluded as a possible format, unless the checkExpected flag is set to false, in which case Phred remains a possible quality format.
-
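The two exclusion rules described for generateCandidateQualities can be sketched in plain Java. Everything here is hypothetical: the class CandidateSketch, the Scheme enum, and in particular the ASCII ranges (33 to 126 for Phred, 59 to 126 for Solexa, 64 to 126 for Illumina, with an expected Phred value between 33 and 58 inclusive) are assumptions based on commonly cited conventions and should be checked against the htsjdk source. The sketch shows the shape of the rule, not the library's implementation.

```java
import java.util.EnumSet;

final class CandidateSketch {
    // Hypothetical schemes: the ASCII range each format allows, plus an optional
    // range in which at least one value is expected (negative means "none").
    enum Scheme {
        PHRED(33, 126, 33, 58),
        SOLEXA(59, 126, -1, -1),
        ILLUMINA(64, 126, -1, -1);

        final int min;
        final int max;
        final int expectedMin;
        final int expectedMax;

        Scheme(int min, int max, int expectedMin, int expectedMax) {
            this.min = min;
            this.max = max;
            this.expectedMin = expectedMin;
            this.expectedMax = expectedMax;
        }

        boolean hasExpectedRange() {
            return expectedMax >= 0;
        }
    }

    // Keeps a scheme only if every observed value lies inside its allowed range and,
    // when checkExpected is true, at least one value fell inside its expected range.
    static EnumSet<Scheme> candidates(String asciiQualities, boolean checkExpected) {
        EnumSet<Scheme> possible = EnumSet.noneOf(Scheme.class);
        for (Scheme scheme : Scheme.values()) {
            boolean allInRange = true;
            boolean sawExpected = !scheme.hasExpectedRange();
            for (char c : asciiQualities.toCharArray()) {
                if (c < scheme.min || c > scheme.max) {
                    allInRange = false;
                }
                if (scheme.hasExpectedRange() && c >= scheme.expectedMin && c <= scheme.expectedMax) {
                    sawExpected = true;
                }
            }
            if (allInRange && (sawExpected || !checkExpected)) {
                possible.add(scheme);
            }
        }
        return possible;
    }

    // Mirrors the documented meaning of an ambiguous determination:
    // more than one format survives the exclusions.
    static boolean isAmbiguous(EnumSet<Scheme> possible) {
        return possible.size() > 1;
    }

    public static void main(String[] args) {
        System.out.println(candidates("!5I", true));   // only PHRED allows '!' (ASCII 33)
        System.out.println(candidates("hhhh", true));  // PHRED dropped: no low value observed
        System.out.println(candidates("hhhh", false)); // PHRED kept: expectation not checked
    }
}
```

With high-valued qualities only, the checkExpected flag decides whether Phred stays in the candidate set, and the result is ambiguous either way because more than one scheme remains.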
detect
Reads through the records in the provided fastq reader and uses their quality scores to determine the quality format used in the fastq.
Parameters:
readers - The fastq readers from which qualities are to be read; at least one must be provided
maxRecords - The maximum number of records to read from the reader before making a determination (a guess, so more records is better)
Returns:
The determined quality format
-
detect
-
detect
public static FastqQualityFormat detect(long maxRecords, CloseableIterator<SAMRecord> iterator, boolean useOriginalQualities)
Reads through the records in the provided SAM record iterator and uses their quality scores to determine the quality format used in the SAM.
Parameters:
iterator - The iterator from which SAM records are to be read
maxRecords - The maximum number of records to read from the iterator before making a determination (a guess, so more records is better)
useOriginalQualities - Whether to use the original qualities (if available) rather than the current ones
Returns:
The determined quality format
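The "if available" wording for useOriginalQualities suggests a fallback to the current qualities. A minimal sketch of that selection follows, assuming an absent set of original qualities is represented as null; the class QualitySelectionSketch and the method select are hypothetical and do not appear in htsjdk.

```java
final class QualitySelectionSketch {
    // Returns the original qualities when they are requested and present;
    // otherwise falls back to the current qualities.
    static byte[] select(byte[] current, byte[] original, boolean useOriginalQualities) {
        if (useOriginalQualities && original != null) {
            return original;
        }
        return current;
    }

    public static void main(String[] args) {
        byte[] current = {30, 31, 32};
        byte[] original = {20, 21, 22};
        // Original qualities are used only when asked for and available.
        System.out.println(select(current, original, true) == original);
        System.out.println(select(current, null, true) == current);
        System.out.println(select(current, original, false) == current);
    }
}
```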
-
detect
-
detect
-
detect
-
detect
Reads through the records in the provided SAM reader and uses their quality scores to sanity check the expected quality passed in. If the expected quality format is sane, it is handed back; otherwise, a SAMException is thrown.
-
generateBestGuess
public FastqQualityFormat generateBestGuess(QualityEncodingDetector.FileContext context, FastqQualityFormat expectedQuality)
Makes the best guess at the quality format. If an expected quality is passed in, the values are sanity checked (ignoring the expected range) and, if they are deemed acceptable, the expected quality is passed back. Otherwise, a set of heuristics is used to make the best guess.
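The two branches described for generateBestGuess can be sketched as follows. All names (BestGuessSketch, Format, bestGuess) are hypothetical. Two points are assumptions rather than documented behaviour: the sketch throws an exception when an expected format fails the sanity check, borrowing the behaviour described for detect(SamReader, FastqQualityFormat), and the fixed preference order used as a tie-break is a placeholder, since the actual heuristics are not described here.

```java
import java.util.EnumSet;

final class BestGuessSketch {
    enum Format { STANDARD, SOLEXA, ILLUMINA }

    // rangeOnly:    formats whose allowed range covers every observed value
    //               (the expected-range rule is ignored).
    // afterAllRules: the same set after the expected-range rule is also applied.
    static Format bestGuess(EnumSet<Format> rangeOnly,
                            EnumSet<Format> afterAllRules,
                            Format expected) {
        if (expected != null) {
            // Sanity check the caller's expectation against the range rule only.
            if (rangeOnly.contains(expected)) {
                return expected;
            }
            throw new IllegalStateException(
                    "Observed qualities are inconsistent with " + expected);
        }
        // Placeholder heuristic: take the first surviving candidate in a fixed order.
        for (Format f : new Format[] {Format.STANDARD, Format.ILLUMINA, Format.SOLEXA}) {
            if (afterAllRules.contains(f)) {
                return f;
            }
        }
        throw new IllegalStateException("No known format matches the observed qualities");
    }

    public static void main(String[] args) {
        EnumSet<Format> rangeOnly = EnumSet.allOf(Format.class);
        EnumSet<Format> afterAllRules = EnumSet.of(Format.SOLEXA, Format.ILLUMINA);
        // An expected format is accepted as long as the range rule allows it.
        System.out.println(bestGuess(rangeOnly, afterAllRules, Format.STANDARD));
        // Without an expectation, the placeholder order picks among the survivors.
        System.out.println(bestGuess(rangeOnly, afterAllRules, null));
    }
}
```

Passing an expected format relaxes the check: a format that the expected-range rule would have excluded is still accepted if every observed value fits its allowed range.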
-