Package picard.sam.markduplicates
Class MarkDuplicatesWithMateCigar
- java.lang.Object
  - picard.cmdline.CommandLineProgram
    - picard.sam.markduplicates.util.AbstractOpticalDuplicateFinderCommandLineProgram
      - picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram
        - picard.sam.markduplicates.MarkDuplicatesWithMateCigar
@DocumentedFeature public class MarkDuplicatesWithMateCigar extends AbstractMarkDuplicatesCommandLineProgram
An even better duplicate-marking algorithm that handles all cases, including clipped and gapped alignments. This tool differs from MarkDuplicates in that it may break ties differently. Furthermore, because it is a one-pass algorithm, it cannot know in advance which program records in the file should be chained. It can therefore only examine the header to infer which program group records have no associated previous program group record. If a read is encountered without a program record, or with one that was not defined in the header, its program record will not be updated.

This tool will also not work with alignments that have large gaps or skips, such as those from RNA-seq data. To guarantee correct duplicate marking, it must buffer records over small genomic windows, and large skips in the alignment records (e.g. skipped introns) would force that window to become very large, exhausting memory.
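To see why large skips are costly, the following is a toy model of a one-pass, coordinate-sorted buffer. It is a sketch, not Picard's implementation: it assumes each record is reduced to its 5' start position and that a record can only be released once the input has moved more than `window` bases past it, because a clipped or shifted duplicate could still arrive until then. The class and method names are hypothetical. The number of records held at once grows roughly in proportion to the window, which is the memory cost the description refers to.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical toy model (not Picard code) of a one-pass duplicate-marking buffer.
 * Records arrive sorted by 5' start; a record stays buffered until the stream has
 * moved more than `window` bases past it, since a duplicate could still show up.
 */
public class WindowBufferSketch {

    /** Returns the largest number of records held in the buffer at any one time. */
    static int maxBuffered(int[] sortedStarts, int window) {
        Deque<Integer> buffer = new ArrayDeque<>();
        int max = 0;
        for (int start : sortedStarts) {
            // Release records whose window has closed: every later record starts at
            // or after `start`, so none can fall within `window` bases of them.
            while (!buffer.isEmpty() && start - buffer.peekFirst() > window) {
                buffer.pollFirst(); // in a real tool, duplicate flags would be final here
            }
            buffer.addLast(start);
            max = Math.max(max, buffer.size());
        }
        return max;
    }

    public static void main(String[] args) {
        // Hypothetical data: one read start every 10 bases across 100 kb.
        int[] starts = new int[10_000];
        for (int i = 0; i < starts.length; i++) {
            starts[i] = i * 10;
        }
        System.out.println(maxBuffered(starts, 200));    // prints 21
        System.out.println(maxBuffered(starts, 50_000)); // prints 5001
    }
}
```

With the same read density, widening the window from 200 bases to 50 kb (the scale of a long intron) raises peak occupancy from 21 to 5001 records in this model.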
Nested Class Summary
Nested classes/interfaces inherited from class picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram
AbstractMarkDuplicatesCommandLineProgram.SamHeaderAndIterator
Field Summary
Modifier and Type | Field
int | BLOCK_SIZE
int | MINIMUM_DISTANCE
Fields inherited from class picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram
ASSUME_SORT_ORDER, ASSUME_SORTED, COMMENT, DUPLICATE_SCORING_STRATEGY, INPUT, METRICS_FILE, OUTPUT, pgIdsSeen, pgTagArgumentCollection, PROGRAM_GROUP_COMMAND_LINE, PROGRAM_GROUP_NAME, PROGRAM_GROUP_VERSION, PROGRAM_RECORD_ID, REMOVE_DUPLICATES
Fields inherited from class picard.sam.markduplicates.util.AbstractOpticalDuplicateFinderCommandLineProgram
LOG, MAX_OPTICAL_DUPLICATE_SET_SIZE, OPTICAL_DUPLICATE_PIXEL_DISTANCE, opticalDuplicateFinder, READ_NAME_REGEX
Fields inherited from class picard.cmdline.CommandLineProgram
COMPRESSION_LEVEL, CREATE_INDEX, CREATE_MD5_FILE, GA4GH_CLIENT_SECRETS, MAX_ALLOWABLE_ONE_LINE_SUMMARY_LENGTH, MAX_RECORDS_IN_RAM, QUIET, REFERENCE_SEQUENCE, referenceSequence, specialArgumentsCollection, TMP_DIR, USE_JDK_DEFLATER, USE_JDK_INFLATER, VALIDATION_STRINGENCY, VERBOSITY
Constructor Summary
Constructor
MarkDuplicatesWithMateCigar()
Method Summary
Modifier and Type | Method | Description
protected int | doWork() | Main work method.
Methods inherited from class picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram
addDuplicateReadToMetrics, addReadToLibraryMetrics, addSingletonToCount, finalizeAndWriteMetrics, getChainedPgIds, openInputs, trackOpticalDuplicates
Methods inherited from class picard.sam.markduplicates.util.AbstractOpticalDuplicateFinderCommandLineProgram
customCommandLineValidation, setupOpticalDuplicateFinder
Methods inherited from class picard.cmdline.CommandLineProgram
checkRInstallation, getCommandLine, getCommandLineParser, getCommandLineParserForArgs, getDefaultHeaders, getFaqLink, getMetricsFile, getPGRecord, getStandardUsagePreamble, getStandardUsagePreamble, getVersion, hasWebDocumentation, instanceMain, instanceMainWithExit, makeReferenceArgumentCollection, parseArgs, requiresReference, setDefaultHeaders, useLegacyParser
Field Detail
MINIMUM_DISTANCE
@Argument(doc="The minimum distance to buffer records to account for clipping on the 5' end of the records. For a given alignment, this parameter controls the width of the window to search for duplicates of that alignment. Due to 5' read clipping, duplicates do not necessarily have the same 5' alignment coordinates, so the algorithm needs to search the surrounding neighborhood. For single-end sequencing data, the neighborhood is determined only by the amount of clipping (assuming no split reads), so setting MINIMUM_DISTANCE to twice the sequencing read length should be sufficient. For paired-end sequencing, the neighborhood is also determined by the fragment insert size, so you may want to set MINIMUM_DISTANCE to something like twice the 99.5th percentile of the fragment insert size distribution (see CollectInsertSizeMetrics). Alternatively, set this value to -1 to use either a) twice the first read's length, or b) 100, whichever is smaller. Note that the larger the window, the greater the RAM requirements, so you may run into performance limitations if you use an unnecessarily large value.", optional=true)
public int MINIMUM_DISTANCE
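The sizing guidance above can be written as a small calculation. This is a hypothetical helper, not part of Picard: it assumes you already know the read length (single-end) or have a sample of insert sizes (paired-end), for example taken from CollectInsertSizeMetrics output. It uses a nearest-rank percentile, which is only one of several percentile definitions, so results may differ slightly from other tools.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Picard) that turns the MINIMUM_DISTANCE
 * guidance into numbers: 2 x read length for single-end data, and
 * 2 x the 99.5th percentile of insert sizes for paired-end data.
 */
public class MinimumDistanceSketch {

    /** Nearest-rank percentile, p in (0, 100], of a non-empty sample. */
    static int percentile(int[] values, double p) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty sample");
        }
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        // 1-based rank; multiply before dividing to keep the arithmetic exact here.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Single-end: the window only needs to cover clipping, so 2 x read length. */
    static int suggestSingleEnd(int readLength) {
        return 2 * readLength;
    }

    /** Paired-end: the window must also cover the insert size distribution. */
    static int suggestPairedEnd(int[] insertSizes) {
        return 2 * percentile(insertSizes, 99.5);
    }

    public static void main(String[] args) {
        System.out.println(suggestSingleEnd(150)); // prints 300

        // Hypothetical insert sizes 1..1000; the 99.5th percentile is 995.
        int[] inserts = new int[1000];
        for (int i = 0; i < inserts.length; i++) {
            inserts[i] = i + 1;
        }
        System.out.println(suggestPairedEnd(inserts)); // prints 1990
    }
}
```

The resulting value would then be passed as MINIMUM_DISTANCE; as the documentation notes, a larger value costs more RAM, so there is little benefit in rounding it up generously.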
BLOCK_SIZE
@Argument(doc="The block size for use in the coordinate-sorted record buffer.", optional=true)
public int BLOCK_SIZE
Method Detail
doWork
protected int doWork()
Main work method.
Specified by:
doWork in class CommandLineProgram
Returns:
program exit status.