Package picard.sam.markduplicates
Class MarkDuplicatesWithMateCigar
java.lang.Object
picard.cmdline.CommandLineProgram
picard.sam.markduplicates.util.AbstractOpticalDuplicateFinderCommandLineProgram
picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram
picard.sam.markduplicates.MarkDuplicatesWithMateCigar
@DocumentedFeature
public class MarkDuplicatesWithMateCigar
extends AbstractMarkDuplicatesCommandLineProgram
A duplicate-marking algorithm that handles all cases, including clipped
and gapped alignments.
This tool differs from MarkDuplicates in that it may break ties differently. Furthermore,
because it is a one-pass algorithm, it cannot know in advance which program records in the
file should be chained. It can therefore only examine the header to try to infer the
program group records that have no associated previous program group record. If a read
has no program record, or has one other than those previously identified, the read will
not be updated.
This tool also does not work with alignments that have large gaps or skips, such as those
from RNA-seq data. The tool must buffer small genomic windows to ensure the integrity of
the duplicate marking, and large skips in the alignment records (e.g. skipped introns)
would force that window to become very large, exhausting memory.
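To make the skip limitation concrete, the sketch below scans a SAM CIGAR string for its longest reference skip (the `N` operator, which spliced aligners use for introns). It is a hypothetical pre-check and not part of Picard: the class name `CigarSkipSketch` and the method `longestSkip` are assumptions made for illustration, and the source gives no threshold at which a skip becomes too large.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Picard: reports the longest 'N' (reference skip)
// element in a CIGAR string so that spliced input can be spotted up front.
class CigarSkipSketch {
    // One CIGAR element: a decimal length followed by a SAM operator character.
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Returns the length of the longest N element in the CIGAR, or 0 if there is none. */
    static int longestSkip(String cigar) {
        int longest = 0;
        Matcher m = ELEMENT.matcher(cigar);
        while (m.find()) {
            if (m.group(2).charAt(0) == 'N') {
                longest = Math.max(longest, Integer.parseInt(m.group(1)));
            }
        }
        return longest;
    }

    public static void main(String[] args) {
        System.out.println(longestSkip("50M"));          // unspliced read
        System.out.println(longestSkip("25M4000N25M"));  // read spanning an intron
    }
}
```

A read whose longest skip far exceeds the buffering window would be the kind of record the description above warns about.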
Nested Class Summary
Nested classes/interfaces inherited from class picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram
AbstractMarkDuplicatesCommandLineProgram.SamHeaderAndIterator
Field Summary
Fields inherited from class picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram
ASSUME_SORT_ORDER, ASSUME_SORTED, COMMENT, DUPLICATE_SCORING_STRATEGY, INPUT, METRICS_FILE, OUTPUT, pgIdsSeen, pgTagArgumentCollection, PROGRAM_GROUP_COMMAND_LINE, PROGRAM_GROUP_NAME, PROGRAM_GROUP_VERSION, PROGRAM_RECORD_ID, REMOVE_DUPLICATES
Fields inherited from class picard.sam.markduplicates.util.AbstractOpticalDuplicateFinderCommandLineProgram
LOG, MAX_OPTICAL_DUPLICATE_SET_SIZE, OPTICAL_DUPLICATE_PIXEL_DISTANCE, opticalDuplicateFinder, READ_NAME_REGEX
Fields inherited from class picard.cmdline.CommandLineProgram
COMPRESSION_LEVEL, CREATE_INDEX, CREATE_MD5_FILE, MAX_ALLOWABLE_ONE_LINE_SUMMARY_LENGTH, MAX_RECORDS_IN_RAM, QUIET, REFERENCE_SEQUENCE, referenceSequence, specialArgumentsCollection, SYNTAX_TRANSITION_URL, TMP_DIR, USE_JDK_DEFLATER, USE_JDK_INFLATER, VALIDATION_STRINGENCY, VERBOSITY
Constructor Summary
Constructors
Method Summary
Methods inherited from class picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram
addReadToLibraryMetrics, addSingletonToCount, finalizeAndWriteMetrics, getChainedPgIds, openInputs, trackOpticalDuplicates
Methods inherited from class picard.sam.markduplicates.util.AbstractOpticalDuplicateFinderCommandLineProgram
customCommandLineValidation, setupOpticalDuplicateFinder
Methods inherited from class picard.cmdline.CommandLineProgram
checkRInstallation, getCommandLine, getCommandLineParser, getCommandLineParserForArgs, getDefaultHeaders, getFaqLink, getMetricsFile, getPGRecord, getStandardUsagePreamble, getStandardUsagePreamble, getVersion, hasWebDocumentation, instanceMain, instanceMainWithExit, makeReferenceArgumentCollection, parseArgs, requiresReference, setDefaultHeaders, useLegacyParser
Field Details
MINIMUM_DISTANCE
@Argument(doc="The minimum distance to buffer records to account for clipping on the 5' end of the records. For a given alignment, this parameter controls the width of the window to search for duplicates of that alignment. Due to 5' read clipping, duplicates do not necessarily have the same 5' alignment coordinates, so the algorithm needs to search around the neighborhood. For single end sequencing data, the neighborhood is only determined by the amount of clipping (assuming no split reads), thus setting MINIMUM_DISTANCE to twice the sequencing read length should be sufficient. For paired end sequencing, the neighborhood is also determined by the fragment insert size, so you may want to set MINIMUM_DISTANCE to something like twice the 99.5% percentile of the fragment insert size distribution (see CollectInsertSizeMetrics). Or you can set this number to -1 to use either a) twice the first read's read length, or b) 100, whichever is smaller. Note that the larger the window, the greater the RAM requirements, so you could run into performance limitations if you use a value that is unnecessarily large.", optional=true)
public int MINIMUM_DISTANCE
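The sizing rules in the MINIMUM_DISTANCE description can be sketched as plain arithmetic. This mirrors only the documented wording and is not taken from Picard's source: the class `MinimumDistanceSketch` and its method names are hypothetical, and the 99.5th-percentile insert size is assumed to be supplied by the caller (for example, read from CollectInsertSizeMetrics output).

```java
// Hypothetical sketch of the documented MINIMUM_DISTANCE rules; not Picard's own code.
class MinimumDistanceSketch {

    /** Documented -1 behaviour: the smaller of twice the first read's length and 100. */
    static int resolve(int configured, int firstReadLength) {
        if (configured != -1) {
            return configured; // an explicit value is used as given
        }
        return Math.min(2 * firstReadLength, 100);
    }

    /** Suggested value for single-end data: twice the sequencing read length. */
    static int suggestSingleEnd(int readLength) {
        return 2 * readLength;
    }

    /** Suggested value for paired-end data: twice the 99.5th percentile insert size. */
    static int suggestPairedEnd(int insertSizePercentile995) {
        return 2 * insertSizePercentile995;
    }

    public static void main(String[] args) {
        System.out.println(resolve(-1, 36));       // 2 * 36 = 72, below the cap of 100
        System.out.println(resolve(-1, 150));      // capped at 100
        System.out.println(suggestPairedEnd(600)); // 1200 for an assumed 600 bp percentile
    }
}
```

As the description notes, larger values widen the buffered window and raise memory use, so the paired-end suggestion is a starting point rather than a safe default.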
BLOCK_SIZE
@Argument(doc="The block size for use in the coordinate-sorted record buffer.", optional=true)
public int BLOCK_SIZE
Constructor Details
MarkDuplicatesWithMateCigar
public MarkDuplicatesWithMateCigar()
Method Details
doWork
protected int doWork()
Main work method.
Specified by:
doWork in class CommandLineProgram
Returns:
program exit status.