Package picard.sam.util
Class ReadNameParser
java.lang.Object
picard.sam.util.ReadNameParser
- All Implemented Interfaces:
Serializable
- Direct Known Subclasses:
OpticalDuplicateFinder
Provides access to the physical location information about a cluster.
All values should default to -1 if unavailable. ReadGroup and Tile should only allow
non-zero positive integers; x and y coordinates may be negative.
-
Field Summary
Fields -
Constructor Summary
Constructors
Constructor, Description
ReadNameParser()
Creates a read name parser using the default read name regex and optical duplicate distance.
ReadNameParser(String readNameRegex)
Creates a read name parser using the given read name regex.
ReadNameParser(String readNameRegex, htsjdk.samtools.util.Log log)
Creates a read name parser using the given read name regex.
-
Method Summary
Modifier and Type, Method, Description
boolean addLocationInformation(String readName, PhysicalLocation loc)
static int getLastThreeFields(String readName, char delim, int[] tokens)
Given a string, splits the string by the delimiter and returns the last three fields parsed as integers.
static int rapidParseInt(String input)
Very specialized method to rapidly parse a sequence of digits from a String up until the first non-digit character.
-
Field Details
-
DEFAULT_READ_NAME_REGEX
The read name regular expression (regex) is used to extract three pieces of information from the read name: tile, x location, and y location. Any read name regex should parse the read name to produce these and only these values. An example regex is: (?:.*:)?([0-9]+)[^:]*:([0-9]+)[^:]*:([0-9]+)[^:]*$ which assumes that fields in the read name are delimited by ':' and that the last three fields correspond to the tile, x and y locations, ignoring any trailing non-digit characters. The default regex is optimized for fast parsing (see getLastThreeFields(String, char, int[])) by searching for the last three fields, ignoring any trailing non-digit characters, assuming the delimiter ':'. This should correctly handle read names with 5 or 7 fields in which the last three fields are tile/x/y, as is the case for the majority of read names produced by Illumina technology.
-
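The example regex quoted for DEFAULT_READ_NAME_REGEX can be tried out with plain java.util.regex, independently of Picard. The following is a minimal sketch: the class ReadNameRegexSketch, its parse method, and the sample read names are hypothetical and are not part of this API; the use of Matcher.matches() (a whole-string match) is likewise an assumption made for the illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of picard.sam.util.
final class ReadNameRegexSketch {
    // The example regex from the DEFAULT_READ_NAME_REGEX description.
    static final Pattern EXAMPLE_REGEX =
            Pattern.compile("(?:.*:)?([0-9]+)[^:]*:([0-9]+)[^:]*:([0-9]+)[^:]*$");

    /** Returns {tile, x, y}, or null if the read name does not match the regex. */
    static int[] parse(String readName) {
        Matcher m = EXAMPLE_REGEX.matcher(readName);
        if (!m.matches()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(1)),  // tile
            Integer.parseInt(m.group(2)),  // x
            Integer.parseInt(m.group(3))   // y
        };
    }

    public static void main(String[] args) {
        // Made-up 7-field read name; the last three fields are tile, x, y.
        System.out.println(Arrays.toString(
                parse("M01234:55:000000000-ABCDE:1:1101:15589:1332")));
        // Made-up 5-field read name with trailing non-digit characters after y.
        System.out.println(Arrays.toString(
                parse("HWUSI-EAS100R:6:73:941:1973#0/1")));
    }
}
```

In both sample names the three capture groups pick up the last three colon-delimited fields, and the trailing "#0/1" is absorbed by the final [^:]* rather than by the y group.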
readNameRegex
-
-
Constructor Details
-
ReadNameParser
public ReadNameParser()
Creates a read name parser using the default read name regex and optical duplicate distance. See DEFAULT_READ_NAME_REGEX for an explanation of how the read name is parsed.
-
ReadNameParser
Creates a read name parser using the given read name regex. See DEFAULT_READ_NAME_REGEX for an explanation of how to format the regular expression (regex) string.
- Parameters:
readNameRegex - the read name regular expression string to parse read names, null to never parse location information.
-
ReadNameParser
Creates a read name parser using the given read name regex. See DEFAULT_READ_NAME_REGEX for an explanation of how to format the regular expression (regex) string.
- Parameters:
readNameRegex - the read name regular expression string to parse read names, null to never parse location information.
log - the log to which to write messages.
-
-
Method Details
-
addLocationInformation
-
getLastThreeFields
public static int getLastThreeFields(String readName, char delim, int[] tokens) throws NumberFormatException
Given a string, splits the string by the delimiter and returns the last three fields parsed as integers. Parsing a field considers only a sequence of digits up until the first non-digit character. The three values are stored in the passed-in array.
- Throws:
NumberFormatException - if any of the tokens that should contain numbers do not start with parsable numbers
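The behaviour described for getLastThreeFields (scan from the end of the read name, take the last three delimited fields, and parse only the leading digits of each) can be approximated as follows. This is a standalone sketch and not Picard's implementation: the class LastThreeFieldsSketch and both of its methods are hypothetical, the boolean return value is a simplification chosen for the example (the meaning of the real method's int return value is not described on this page), and sign handling and overflow checks are omitted for brevity.

```java
// Hypothetical illustration class; not part of picard.sam.util.
final class LastThreeFieldsSketch {

    /** Parses the leading run of digits of a field, ignoring anything after it. */
    static int leadingDigits(String field) {
        int value = 0;
        int i = 0;
        while (i < field.length() && field.charAt(i) >= '0' && field.charAt(i) <= '9') {
            value = value * 10 + (field.charAt(i) - '0');
            i++;
        }
        if (i == 0) {
            throw new NumberFormatException("Field '" + field + "' does not start with a digit");
        }
        return value;
    }

    /**
     * Fills tokens[0..2] with the last three fields of readName, scanning right to left.
     * Returns false if the name has fewer than three fields (tokens may then be
     * partially filled).
     */
    static boolean lastThreeFields(String readName, char delim, int[] tokens) {
        int end = readName.length();
        for (int idx = 2; idx >= 0; idx--) {
            if (end < 0) {
                return false; // ran out of fields before finding three
            }
            int delimPos = readName.lastIndexOf(delim, end - 1);
            tokens[idx] = leadingDigits(readName.substring(delimPos + 1, end));
            end = delimPos; // becomes -1 once the first field has been consumed
        }
        return true;
    }

    public static void main(String[] args) {
        int[] tokens = new int[3];
        // Made-up read name; trailing "#0/1" after the y value is ignored.
        boolean ok = lastThreeFields("HWUSI-EAS100R:6:73:941:1973#0/1", ':', tokens);
        System.out.println(ok + " " + tokens[0] + " " + tokens[1] + " " + tokens[2]);
    }
}
```

Scanning from the right means the cost does not depend on how many leading fields the read name has, which is one plausible reason a dedicated routine can be faster than a general regex match.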
-
rapidParseInt
Very specialized method to rapidly parse a sequence of digits from a String up until the first non-digit character.
- Throws:
NumberFormatException - if the String does not start with an optional '-' followed by at least one digit
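The contract stated for rapidParseInt (an optional '-', then at least one digit, stopping at the first non-digit character) can be sketched in a few lines. This is an illustration only, not the Picard source: the class RapidParseIntSketch and the method parseLeadingInt are hypothetical names, and the sketch does not guard against integer overflow.

```java
// Hypothetical illustration class; not part of picard.sam.util.
final class RapidParseIntSketch {

    /** Parses an optional '-' followed by digits, stopping at the first non-digit. */
    static int parseLeadingInt(String input) {
        int len = input.length();
        int i = 0;
        boolean negative = false;
        if (len > 0 && input.charAt(0) == '-') {
            negative = true;
            i = 1;
        }
        int firstDigit = i;
        int value = 0;
        for (; i < len; i++) {
            char c = input.charAt(i);
            if (c < '0' || c > '9') {
                break; // first non-digit ends the number
            }
            value = value * 10 + (c - '0');
        }
        if (i == firstDigit) {
            // No digit was consumed: empty string, lone '-', or a non-digit start.
            throw new NumberFormatException("String '" + input + "' does not start with a number");
        }
        return negative ? -value : value;
    }

    public static void main(String[] args) {
        System.out.println(parseLeadingInt("1973#0/1")); // prints 1973
        System.out.println(parseLeadingInt("-42abc"));   // prints -42
    }
}
```

Unlike Integer.parseInt, which rejects any string containing non-digit characters, this style of parser accepts trailing text, which suits read name fields such as "1973#0/1".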
-