public final class SAMUtils
extends java.lang.Object
Modifier and Type | Field and Description |
---|---|
static int | MAX_PHRED_SCORE |
Constructor and Description |
---|
SAMUtils() |
Modifier and Type | Method and Description |
---|---|
static java.lang.String | calculateReadGroupRecordChecksum(java.io.File input, java.io.File referenceFasta): Calculate a hash code from identifying information in the RG (read group) records in a SAM file's header. |
static void | chainSAMProgramRecord(SAMFileHeader header, SAMProgramRecord program): Chains program in front of the first "head" item in the list of SAMProgramRecords in header. |
static boolean | cigarMapsNoBasesToRef(Cigar cigar): Determines if a cigar has any element that both consumes read bases and consumes reference bases (e.g. |
static SAMRecord | clipOverlappingAlignedBases(SAMRecord record, boolean noSideEffects): Returns a (possibly new) record that has been clipped if it is mapped and paired and has overlapping bases with its mate. |
static SAMRecord | clipOverlappingAlignedBases(SAMRecord record, int numOverlappingBasesToClip, boolean noSideEffects): Returns a (possibly new) SAMRecord with the given number of bases soft-clipped at the end of the read if it is mapped and paired and has overlapping bases with its mate. |
static int | combineMapqs(int m1, int m2): Hokey algorithm for combining two MAPQs into values that are comparable, being cognizant of the fact that in MAPQ world, 1 > 255 > 0. |
static int | compareMapqs(int mapq1, int mapq2) |
static byte[] | compressedBasesToBytes(int length, byte[] compressedBases, int compressedOffset): Convert from a byte array with bases stored in nybbles, with =, A, C, G, T, N represented as 0, 1, 2, 4, 8, 15, to a byte array containing =AaCcGgTtNn represented as ASCII. |
static void | fastqToPhred(byte[] fastq): Converts printable qualities in Sanger fastq format to binary phred scores. |
static int | fastqToPhred(char ch): Convert a single printable ASCII FASTQ format phred score to binary phred score. |
static byte[] | fastqToPhred(java.lang.String fastq): Convert a string with phred scores in printable ASCII FASTQ format to an array of binary phred scores. |
static long | findVirtualOffsetOfFirstRecordInBam(java.io.File bamFile): Returns the virtual file offset of the first record in a BAM file - i.e. |
static java.util.List<AlignmentBlock> | getAlignmentBlocks(Cigar cigar, int alignmentStart, java.lang.String cigarTypeName): Given a Cigar, returns blocks of the sequence that have been aligned directly to the reference sequence. |
static java.lang.String | getCanonicalRecordName(SAMRecord record): Returns a string that is the read group ID and read name separated by a colon. |
static java.util.List<AlignmentBlock> | getMateAlignmentBlocks(SAMRecord rec) |
static int | getMateAlignmentEnd(SAMRecord rec): This method uses the MateCigar value as determined from the attribute MC. |
static Cigar | getMateCigar(SAMRecord rec): Returns the Mate Cigar or null if there is none. |
static Cigar | getMateCigar(SAMRecord rec, boolean withValidation): Returns the Mate Cigar or null if there is none. |
static int | getMateCigarLength(SAMRecord rec) |
static java.lang.String | getMateCigarString(SAMRecord rec): Returns the Mate Cigar String as stored in the attribute 'MC'. |
static int | getMateUnclippedEnd(SAMRecord rec) |
static int | getMateUnclippedStart(SAMRecord rec) |
static int | getNumOverlappingAlignedBasesToClip(SAMRecord rec): Returns the number of bases that need to be clipped due to overlapping pairs. |
static int | getUnclippedEnd(int alignmentEnd, Cigar cigar) |
static int | getUnclippedStart(int alignmentStart, Cigar cigar) |
static boolean | hasMateCigar(SAMRecord rec): Checks to see if it is valid for this record to have a mate CIGAR (MC) and then if there is a mate CIGAR available. |
static boolean | hasOriginalMappingInformation(SAMRecord rec): See if any tags pertaining to original mapping information have been set. |
static boolean | isValidUnsignedIntegerAttribute(long value): Checks if a long attribute value is within the allowed range of a 32-bit unsigned integer. |
static void | makeReadUnmapped(SAMRecord rec): Strip mapping information from a SAMRecord. |
static void | makeReadUnmappedWithOriginalTags(SAMRecord rec): Strip mapping information from a SAMRecord, but preserve it in the 'O' tags if it isn't already set. |
static java.lang.String | phredToFastq(byte[] data): Convert an array of bytes, in which each byte is a binary phred quality score, to a printable ASCII representation of the quality scores, as in FASTQ format. |
static java.lang.String | phredToFastq(byte[] buffer, int offset, int length): Convert an array of bytes, in which each byte is a binary phred quality score, to a printable ASCII representation of the quality scores, as in FASTQ format. |
static char | phredToFastq(int phredScore): Convert a single binary phred score to its printable ASCII representation, as in FASTQ format. |
static void | processValidationError(SAMValidationError validationError, ValidationStringency validationStringency) |
static void | processValidationErrors(java.util.List<SAMValidationError> validationErrors, long samRecordIndex, ValidationStringency validationStringency): Handle a list of validation errors according to the validation stringency. |
static boolean | recordMapsEntirelyBeyondEndOfReference(SAMRecord record): Tests if the provided record is mapped entirely beyond the end of the reference (i.e., the alignment start is greater than the length of the sequence to which the record is mapped). |
static java.util.List<SAMValidationError> | validateCigar(SAMRecord rec, Cigar cigar, java.lang.Integer referenceIndex, java.util.List<AlignmentBlock> alignmentBlocks, long recordNumber, java.lang.String cigarTypeName): Run all validations of the given CIGAR (the read's or the mate's, as identified by cigarTypeName). |
static java.util.List<SAMValidationError> | validateMateCigar(SAMRecord rec, long recordNumber): Run all validations of the mate's CIGAR. |
public static final int MAX_PHRED_SCORE
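As a rough illustration of the nybble-packed base encoding that compressedBasesToBytes decodes, the sketch below unpacks two bases per byte, high nybble first. The class and method names are hypothetical, the lookup string is the 4-bit code table from the BAM specification, and this is not htsjdk's implementation (which, per the summary above, can also emit lowercase bases).

```java
import java.nio.charset.StandardCharsets;

final class NybbleBasesSketch {
    // BAM 4-bit base codes: index 0 is '=', 1 is 'A', 2 is 'C', 4 is 'G', 8 is 'T', 15 is 'N'.
    private static final String CODES = "=ACMGRSVTWYHKDBN";

    /** Unpacks length bases (not bytes) starting at byte offset; two bases per byte, high nybble first. */
    static byte[] unpack(int length, byte[] packed, int offset) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            int b = packed[offset + i / 2] & 0xFF;               // treat the byte as unsigned
            int code = (i % 2 == 0) ? (b >> 4) : (b & 0x0F);     // even index: high nybble; odd index: low nybble
            out[i] = (byte) CODES.charAt(code);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] packed = {0x12, 0x48, (byte) 0xF0};               // A C | G T | N plus padding
        System.out.println(new String(unpack(5, packed, 0), StandardCharsets.US_ASCII)); // prints ACGTN
    }
}
```

Note that the length argument counts bases, so an odd length simply ignores the low nybble of the last byte.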
public static byte[] compressedBasesToBytes(int length, byte[] compressedBases, int compressedOffset)
length - Number of bases (not bytes) to convert.
compressedBases - Bases represented as nybbles, in BAM binary format.
compressedOffset - Byte offset in compressedBases to start.
public static java.lang.String phredToFastq(byte[] data)
data - Array of bytes in which each byte is a binary phred score.
public static java.lang.String phredToFastq(byte[] buffer, int offset, int length)
buffer - Array of bytes in which each byte is a binary phred score.
offset - Where in buffer to start conversion.
length - How many bytes of buffer to convert.
public static char phredToFastq(int phredScore)
phredScore - binary phred score.
public static byte[] fastqToPhred(java.lang.String fastq)
fastq - Phred scores in FASTQ printable ASCII format.
public static void fastqToPhred(byte[] fastq)
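The phredToFastq and fastqToPhred conversions amount to shifting each score by a fixed ASCII offset. The following is a minimal sketch, assuming the Sanger convention of offset 33 (so '!' encodes quality 0). The class name is hypothetical, and range checking against MAX_PHRED_SCORE is left out, so this should not be read as the library's implementation.

```java
final class PhredFastqSketch {
    private static final int SANGER_OFFSET = 33; // assumption: Sanger (Phred+33) encoding

    static char phredToFastq(int phred) {
        return (char) (phred + SANGER_OFFSET);
    }

    static int fastqToPhred(char ch) {
        return ch - SANGER_OFFSET;
    }

    /** Binary phred scores to a printable quality string. */
    static String phredToFastq(byte[] phred) {
        StringBuilder sb = new StringBuilder(phred.length);
        for (byte q : phred) {
            sb.append(phredToFastq((int) q));
        }
        return sb.toString();
    }

    /** Printable quality string to binary phred scores. */
    static byte[] fastqToPhred(String fastq) {
        byte[] out = new byte[fastq.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) fastqToPhred(fastq.charAt(i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(phredToFastq(new byte[]{0, 20, 40})); // prints !5I
    }
}
```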
public static int fastqToPhred(char ch)
ch - Printable ASCII FASTQ format phred score.
public static void processValidationErrors(java.util.List<SAMValidationError> validationErrors, long samRecordIndex, ValidationStringency validationStringency)
validationErrors - List of errors to report, or null if there are no errors.
samRecordIndex - Record number of the SAMRecord corresponding to the validation errors, or -1 if the record number is not known.
validationStringency - If STRICT, throw a SAMFormatException. If LENIENT, print the validation errors to stderr. If SILENT, do nothing.
public static void processValidationError(SAMValidationError validationError, ValidationStringency validationStringency)
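The three stringency policies described for processValidationErrors can be sketched without htsjdk as follows. Everything here is a stand-in: the Stringency enum mimics ValidationStringency, error messages are plain strings, IllegalStateException takes the place of SAMFormatException, and failing on the first error under STRICT is an assumption.

```java
import java.util.Arrays;
import java.util.List;

final class ValidationStringencySketch {
    enum Stringency { STRICT, LENIENT, SILENT } // hypothetical stand-in for ValidationStringency

    /** Applies the three documented policies to a list of error messages. */
    static void handle(List<String> errors, long recordIndex, Stringency stringency) {
        if (errors == null || errors.isEmpty()) {
            return; // a null list means there is nothing to report
        }
        String where = recordIndex >= 0 ? "record " + recordIndex : "unknown record";
        switch (stringency) {
            case STRICT:
                // Assumption: fail on the first error; IllegalStateException stands in for SAMFormatException.
                throw new IllegalStateException(where + ": " + errors.get(0));
            case LENIENT:
                for (String e : errors) {
                    System.err.println("Ignoring validation error at " + where + ": " + e);
                }
                break;
            case SILENT:
                break; // deliberately do nothing
        }
    }

    public static void main(String[] args) {
        handle(Arrays.asList("CIGAR extends past end of reference"), 42, Stringency.LENIENT);
        handle(Arrays.asList("CIGAR extends past end of reference"), -1, Stringency.SILENT);
    }
}
```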
public static java.lang.String calculateReadGroupRecordChecksum(java.io.File input, java.io.File referenceFasta)
public static void chainSAMProgramRecord(SAMFileHeader header, SAMProgramRecord program)
Chains program in front of the first "head" item in the list of SAMProgramRecords in header. This method should not be used when there are multiple chains of program groups in a header, only when it can safely be assumed that there is only one chain. It correctly handles the case where program has already been added to the header, so it can be used whether creating a SAMProgramRecord with a constructor or when calling SAMFileHeader.createProgramRecord().
public static void makeReadUnmapped(SAMRecord rec)
public static void makeReadUnmappedWithOriginalTags(SAMRecord rec)
public static boolean hasOriginalMappingInformation(SAMRecord rec)
public static boolean cigarMapsNoBasesToRef(Cigar cigar)
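To make the test behind cigarMapsNoBasesToRef concrete: in the SAM specification only the M, = and X operators consume both read bases and reference bases. The sketch below works on a CIGAR string rather than a Cigar object and returns true when no such operator is present, which is what the method name suggests. The class name, the string-based signature, and that return convention are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CigarOpsSketch {
    // One CIGAR element: a length followed by a single operator character.
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Returns true if no element of the CIGAR string consumes both read and reference bases. */
    static boolean mapsNoBasesToRef(String cigar) {
        Matcher m = ELEMENT.matcher(cigar);
        while (m.find()) {
            char op = m.group(2).charAt(0);
            if (op == 'M' || op == '=' || op == 'X') {
                return false; // this element aligns read bases to the reference
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(mapsNoBasesToRef("10S"));    // prints true
        System.out.println(mapsNoBasesToRef("5S20M"));  // prints false
        System.out.println(mapsNoBasesToRef("10S5I"));  // prints true
    }
}
```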
public static boolean recordMapsEntirelyBeyondEndOfReference(SAMRecord record)
record - must not have a null SAMFileHeader
public static int compareMapqs(int mapq1, int mapq2)
public static int combineMapqs(int m1, int m2)
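The ordering 1 > 255 > 0 noted for combineMapqs follows from the SAM convention that a MAPQ of 255 means the mapping quality is unavailable, so it cannot be compared numerically with real scores. One ordering consistent with that statement is sketched below. The class and its rank function are hypothetical and are not claimed to match how compareMapqs or combineMapqs are implemented.

```java
final class MapqOrderSketch {
    /** Maps a MAPQ to a rank so that 0 sorts lowest, 255 ("unavailable") just above it, then 1..254. */
    static int rank(int mapq) {
        if (mapq == 0) {
            return 0;
        }
        if (mapq == 255) {
            return 1;
        }
        return mapq + 1;
    }

    /** Negative, zero, or positive as a is worse than, equal to, or better than b. */
    static int compare(int a, int b) {
        return Integer.compare(rank(a), rank(b));
    }

    public static void main(String[] args) {
        System.out.println(compare(1, 255) > 0);  // prints true
        System.out.println(compare(255, 0) > 0);  // prints true
    }
}
```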
public static long findVirtualOffsetOfFirstRecordInBam(java.io.File bamFile)
public static java.util.List<AlignmentBlock> getAlignmentBlocks(Cigar cigar, int alignmentStart, java.lang.String cigarTypeName)
cigar - The cigar containing the alignment information
alignmentStart - The start (1-based) of the alignment
cigarTypeName - The type of cigar passed - for error logging.
public static int getUnclippedStart(int alignmentStart, Cigar cigar)
alignmentStart - The start (1-based) of the alignment
cigar - The cigar containing the alignment information
public static int getUnclippedEnd(int alignmentEnd, Cigar cigar)
alignmentEnd - The end (1-based) of the alignment
cigar - The cigar containing the alignment information
public static java.lang.String getMateCigarString(SAMRecord rec)
rec - the SAM record
public static Cigar getMateCigar(SAMRecord rec, boolean withValidation)
rec - the SAM record
withValidation - true if we are to validate the mate cigar before returning, false otherwise.
public static Cigar getMateCigar(SAMRecord rec)
rec - the SAM record
public static int getMateCigarLength(SAMRecord rec)
rec - the SAM record
public static int getMateAlignmentEnd(SAMRecord rec)
rec - the SAM record
public static int getMateUnclippedStart(SAMRecord rec)
rec - the SAM record
public static int getMateUnclippedEnd(SAMRecord rec)
rec - the SAM record
public static java.util.List<AlignmentBlock> getMateAlignmentBlocks(SAMRecord rec)
rec - the SAM record
Returns blocks of the mate sequence that have been aligned directly to the reference sequence. Note that clipped portions of the mate and inserted and deleted bases (vs. the reference) are not represented in the alignment blocks.
public static java.util.List<SAMValidationError> validateCigar(SAMRecord rec, Cigar cigar, java.lang.Integer referenceIndex, java.util.List<AlignmentBlock> alignmentBlocks, long recordNumber, java.lang.String cigarTypeName)
rec - the SAM record
cigar - The cigar containing the alignment information
referenceIndex - The reference index
alignmentBlocks - The alignment blocks (parsed from the cigar)
recordNumber - For error reporting. -1 if not known.
cigarTypeName - For error reporting. "Read CIGAR" or "Mate Cigar"
public static java.util.List<SAMValidationError> validateMateCigar(SAMRecord rec, long recordNumber)
rec - the SAM record
recordNumber - For error reporting. -1 if not known.
public static boolean hasMateCigar(SAMRecord rec)
rec -
public static java.lang.String getCanonicalRecordName(SAMRecord record)
record -
public static int getNumOverlappingAlignedBasesToClip(SAMRecord rec)
rec -
public static SAMRecord clipOverlappingAlignedBases(SAMRecord record, boolean noSideEffects)
See getNumOverlappingAlignedBasesToClip(SAMRecord) for how the number of overlapping bases is computed.
NB: this does not properly consider a cigar like: 100M20S10H.
NB: This method assumes that the record's mate is not contained within the given record's alignment.
record - the record from which to clip bases.
noSideEffects - if true a modified clone of the original record is returned, otherwise we modify the record directly.
public static SAMRecord clipOverlappingAlignedBases(SAMRecord record, int numOverlappingBasesToClip, boolean noSideEffects)
record - the record from which to clip bases.
numOverlappingBasesToClip - the number of bases to clip at the end of the read.
noSideEffects - if true a modified clone of the original record is returned, otherwise we modify the record directly.
public static boolean isValidUnsignedIntegerAttribute(long value)
value - a long value to check
Returns true if value is >= 0 and <= BinaryCodec.MAX_UINT, and false otherwise
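A 32-bit unsigned integer spans 0 to 2^32 - 1, so the range check described for isValidUnsignedIntegerAttribute can be sketched as below. The class name is hypothetical, and the local MAX_UINT constant is assumed to equal BinaryCodec.MAX_UINT rather than read from the library.

```java
final class UnsignedIntRangeSketch {
    // Assumption: same value as BinaryCodec.MAX_UINT, i.e. 2^32 - 1 = 4294967295.
    static final long MAX_UINT = 0xFFFFFFFFL;

    /** True if value fits in an unsigned 32-bit integer. */
    static boolean isValidUnsignedInt(long value) {
        return value >= 0 && value <= MAX_UINT;
    }

    public static void main(String[] args) {
        System.out.println(isValidUnsignedInt(4294967295L)); // prints true
        System.out.println(isValidUnsignedInt(4294967296L)); // prints false
        System.out.println(isValidUnsignedInt(-1L));         // prints false
    }
}
```

A long is needed for the argument because the upper half of this range does not fit in Java's signed int.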