Package picard.illumina
Class ExtractIlluminaBarcodes
java.lang.Object
  picard.cmdline.CommandLineProgram
    picard.illumina.ExtractIlluminaBarcodes
@DocumentedFeature public class ExtractIlluminaBarcodes extends CommandLineProgram
Determines the barcode for each read in an Illumina lane. For each tile, a file of the form s_<lane>_<tile>_barcode.txt is written to the basecalls directory. The output file contains one line for each read in the tile, aligned with the regular basecall output. Each line has the following tab-separated columns:
- read subsequence at the barcode position
- Y or N, indicating whether there was a barcode match
- matched barcode sequence (empty if the read did not match any of the barcodes). If there is no match but the read is close to the threshold for calling a match, the barcode that would have been matched is output in lower case
- distance to the best-matching barcode, "mismatches" (*)
- distance to the second-best-matching barcode, "mismatchesToSecondBest" (*)
NOTE (*): Due to an optimization, the reported mismatches and mismatchesToSecondBest values may be inaccurate, provided the conclusion (match vs. no-match) is not affected. For example, the reported mismatches and mismatchesToSecondBest may be smaller than their true values if mismatches is truly larger than MAX_MISMATCHES. Also, mismatchesToSecondBest may be smaller than its true value if its true value is greater than mismatches + MIN_MISMATCH_DELTA.
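As an illustration of the column layout described above, the following is a minimal sketch of a parser for one line of a per-tile barcode file. The class name BarcodeLineSketch and its members are hypothetical and are not part of Picard; the column order is taken from the description, and the treatment of an empty third column is an assumption.

```java
/** Hypothetical holder for one line of a per-tile barcode file (not a Picard class). */
final class BarcodeLineSketch {
    final String readSubsequence;      // column 1: read bases at the barcode position
    final boolean matched;             // column 2: "Y" or "N"
    final String barcode;              // column 3: empty if no match, lower case if nearly matched
    final int mismatches;              // column 4: distance to best barcode
    final int mismatchesToSecondBest;  // column 5: distance to second-best barcode

    private BarcodeLineSketch(String readSubsequence, boolean matched, String barcode,
                              int mismatches, int mismatchesToSecondBest) {
        this.readSubsequence = readSubsequence;
        this.matched = matched;
        this.barcode = barcode;
        this.mismatches = mismatches;
        this.mismatchesToSecondBest = mismatchesToSecondBest;
    }

    /** Splits a tab-separated line; the -1 limit keeps an empty barcode column. */
    static BarcodeLineSketch parse(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length != 5) {
            throw new IllegalArgumentException("Expected 5 columns but found " + fields.length);
        }
        return new BarcodeLineSketch(fields[0], "Y".equals(fields[1]), fields[2],
                Integer.parseInt(fields[3]), Integer.parseInt(fields[4]));
    }

    /** True when no match was called but a would-be barcode was still reported. */
    boolean isNearMiss() {
        return !matched && !barcode.isEmpty();
    }
}
```

Keeping the empty third column (rather than letting split drop it) means unmatched reads still parse into five fields.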
Nested Class Summary
Nested Classes: Modifier and Type, Class, Description
static class ExtractIlluminaBarcodes.BarcodeMetric: Metrics produced by the ExtractIlluminaBarcodes program that is used to parse data in the basecalls directory and determine to which barcode each read should be assigned.
static class ExtractIlluminaBarcodes.PerTileBarcodeExtractor: Extracts barcodes and accumulates metrics for an entire tile.
-
Field Summary
Fields: Modifier and Type, Field, Description
List<String> BARCODE
File BARCODE_FILE
static String BARCODE_NAME_COLUMN: Column header for the barcode name.
static String BARCODE_SEQUENCE_1_COLUMN: Column header for the first barcode sequence.
static String BARCODE_SEQUENCE_COLUMN: Column header for the first barcode sequence (preferred).
File BASECALLS_DIR
boolean COMPRESS_OUTPUTS
DistanceMetric DISTANCE_MODE
Integer LANE
static String LIBRARY_NAME_COLUMN: Column header for the library name.
int MAX_MISMATCHES
int MAX_NO_CALLS
File METRICS_FILE
int MIN_MISMATCH_DELTA
int MINIMUM_BASE_QUALITY
int MINIMUM_QUALITY
int NUM_PROCESSORS
File OUTPUT_DIR
String READ_STRUCTURE
-
Fields inherited from class picard.cmdline.CommandLineProgram
COMPRESSION_LEVEL, CREATE_INDEX, CREATE_MD5_FILE, GA4GH_CLIENT_SECRETS, MAX_ALLOWABLE_ONE_LINE_SUMMARY_LENGTH, MAX_RECORDS_IN_RAM, QUIET, REFERENCE_SEQUENCE, referenceSequence, specialArgumentsCollection, TMP_DIR, USE_JDK_DEFLATER, USE_JDK_INFLATER, VALIDATION_STRINGENCY, VERBOSITY
-
-
Constructor Summary
Constructors: Constructor, Description
ExtractIlluminaBarcodes()
-
Method Summary
Methods: Modifier and Type, Method, Description
protected String[] customCommandLineValidation(): Validate that POSITION >= 1, and that all BARCODEs are the same length and unique.
protected int doWork(): Do the work after command line has been parsed.
static void finalizeMetrics(Map<String,ExtractIlluminaBarcodes.BarcodeMetric> barcodeToMetrics, ExtractIlluminaBarcodes.BarcodeMetric noMatchMetric)
-
Methods inherited from class picard.cmdline.CommandLineProgram
getCommandLine, getCommandLineParser, getCommandLineParser, getDefaultHeaders, getFaqLink, getMetricsFile, getStandardUsagePreamble, getStandardUsagePreamble, getVersion, hasWebDocumentation, instanceMain, instanceMainWithExit, makeReferenceArgumentCollection, parseArgs, requiresReference, setDefaultHeaders, useLegacyParser
Field Detail
-
BARCODE_SEQUENCE_COLUMN
public static final String BARCODE_SEQUENCE_COLUMN
Column header for the first barcode sequence (preferred).
See Also: Constant Field Values
-
BARCODE_SEQUENCE_1_COLUMN
public static final String BARCODE_SEQUENCE_1_COLUMN
Column header for the first barcode sequence.
See Also: Constant Field Values
-
BARCODE_NAME_COLUMN
public static final String BARCODE_NAME_COLUMN
Column header for the barcode name.
See Also: Constant Field Values
-
LIBRARY_NAME_COLUMN
public static final String LIBRARY_NAME_COLUMN
Column header for the library name.
See Also: Constant Field Values
-
BASECALLS_DIR
@Argument(doc="The Illumina basecalls directory. ", shortName="B") public File BASECALLS_DIR
-
OUTPUT_DIR
@Argument(doc="Where to write _barcode.txt files. By default, these are written to BASECALLS_DIR.", optional=true) public File OUTPUT_DIR
-
LANE
@Argument(doc="Lane number. ", shortName="L") public Integer LANE
-
READ_STRUCTURE
@Argument(doc="A description of the logical structure of clusters in an Illumina Run, i.e. a description of the structure IlluminaBasecallsToSam assumes the data to be in. It should consist of integer/character pairs describing the number of cycles and the type of those cycles (B for Sample Barcode, M for molecular barcode, T for Template, and S for skip). E.g. If the input data consists of 80 base clusters and we provide a read structure of \"28T8M8B8S28T\" then the sequence may be split up into four reads:\n* read one with 28 cycles (bases) of template\n* read two with 8 cycles (bases) of molecular barcode (ex. unique molecular barcode)\n* read three with 8 cycles (bases) of sample barcode\n* 8 cycles (bases) skipped.\n* read four with 28 cycles (bases) of template\nThe skipped cycles would NOT be included in an output SAM/BAM file or in read groups therein.", shortName="RS") public String READ_STRUCTURE
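To make the read structure notation concrete, here is a minimal sketch that splits a string such as "28T8M8B8S28T" into its integer/character pairs. ReadStructureSketch and Segment are hypothetical names used only for illustration; this is not Picard's own read structure parser, and it accepts only the four cycle types named in the argument description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical read structure splitter (illustration only, not Picard's parser). */
final class ReadStructureSketch {
    /** One integer/character pair: a cycle count and a cycle type (T, B, M or S). */
    static final class Segment {
        final int cycles;
        final char type;

        Segment(int cycles, char type) {
            this.cycles = cycles;
            this.type = type;
        }
    }

    private static final Pattern SEGMENT = Pattern.compile("(\\d+)([TBMS])");

    /** Parses the whole string, rejecting any text that is not a run of valid pairs. */
    static List<Segment> parse(String readStructure) {
        List<Segment> segments = new ArrayList<>();
        Matcher matcher = SEGMENT.matcher(readStructure);
        int consumed = 0;
        while (matcher.find()) {
            if (matcher.start() != consumed) {
                throw new IllegalArgumentException("Unexpected text at position " + consumed);
            }
            segments.add(new Segment(Integer.parseInt(matcher.group(1)), matcher.group(2).charAt(0)));
            consumed = matcher.end();
        }
        if (segments.isEmpty() || consumed != readStructure.length()) {
            throw new IllegalArgumentException("Invalid read structure: " + readStructure);
        }
        return segments;
    }

    /** Sums the cycle counts of all segments, including skipped ones. */
    static int totalCycles(List<Segment> segments) {
        int total = 0;
        for (Segment segment : segments) {
            total += segment.cycles;
        }
        return total;
    }
}
```

For the example in the argument description, ReadStructureSketch.parse("28T8M8B8S28T") yields five segments whose cycle counts add up to the 80 bases of the cluster.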
-
BARCODE
@Argument(doc="Barcode sequence. These must be unique, and all the same length. This cannot be used with reads that have more than one barcode; use BARCODE_FILE in that case. ", mutex="BARCODE_FILE") public List<String> BARCODE
-
BARCODE_FILE
@Argument(doc="Tab-delimited file of barcode sequences, barcode name and, optionally, library name. Barcodes must be unique and all the same length. Column headers must be \'barcode_sequence\' (or \'barcode_sequence_1\'), \'barcode_sequence_2\' (optional), \'barcode_name\', and \'library_name\'.", mutex="BARCODE") public File BARCODE_FILE
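The header rule above (either 'barcode_sequence' or 'barcode_sequence_1' names the first barcode column) can be sketched as follows. BarcodeFileHeaderSketch is a hypothetical helper, not Picard code, and checking 'barcode_sequence' first is an assumption based on that header being described as preferred.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that locates the first-barcode column in a BARCODE_FILE header. */
final class BarcodeFileHeaderSketch {
    /** Returns the zero-based index of the first barcode sequence column. */
    static int sequenceColumnIndex(String headerLine) {
        List<String> columns = Arrays.asList(headerLine.split("\t", -1));
        // Assumption: the preferred header wins if both spellings are present.
        int preferred = columns.indexOf("barcode_sequence");
        if (preferred >= 0) {
            return preferred;
        }
        int alternate = columns.indexOf("barcode_sequence_1");
        if (alternate >= 0) {
            return alternate;
        }
        throw new IllegalArgumentException(
                "Header must contain barcode_sequence or barcode_sequence_1");
    }
}
```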
-
METRICS_FILE
@Argument(doc="Per-barcode and per-lane metrics written to this file.", shortName="M") public File METRICS_FILE
-
MAX_MISMATCHES
@Argument(doc="Maximum mismatches for a barcode to be considered a match.") public int MAX_MISMATCHES
-
MIN_MISMATCH_DELTA
@Argument(doc="Minimum difference between number of mismatches in the best and second best barcodes for a barcode to be considered a match.") public int MIN_MISMATCH_DELTA
-
MAX_NO_CALLS
@Argument(doc="Maximum allowable number of no-calls in a barcode read before it is considered unmatchable.") public int MAX_NO_CALLS
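Read together, MAX_MISMATCHES, MIN_MISMATCH_DELTA and MAX_NO_CALLS suggest a three-part test for calling a match. The sketch below is an interpretation of the argument descriptions, not Picard's implementation: MatchDecisionSketch is a hypothetical class, and treating every threshold as inclusive is an assumption.

```java
/** Hypothetical match rule inferred from the argument descriptions (not Picard code). */
final class MatchDecisionSketch {
    /**
     * Returns true when the read has few enough no-calls, the best barcode is
     * close enough, and the second-best barcode is sufficiently further away.
     */
    static boolean isMatch(int mismatches, int mismatchesToSecondBest, int noCalls,
                           int maxMismatches, int minMismatchDelta, int maxNoCalls) {
        return noCalls <= maxNoCalls
                && mismatches <= maxMismatches
                && mismatchesToSecondBest - mismatches >= minMismatchDelta;
    }
}
```

With illustrative thresholds of 1 mismatch, a delta of 1 and 2 no-calls, a read with 1 mismatch to the best barcode and 3 to the second best would be a match, while one with 1 mismatch to each would not.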
-
MINIMUM_BASE_QUALITY
@Argument(shortName="Q", doc="Minimum base quality. Any barcode bases falling below this quality will be considered a mismatch even if the bases match.") public int MINIMUM_BASE_QUALITY
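The effect of MINIMUM_BASE_QUALITY on the mismatch count can be sketched as a position-by-position comparison in which a low-quality base always counts as a mismatch. QualityAwareMismatchSketch is a hypothetical class; it ignores no-calls and DISTANCE_MODE, so it is only a simplified illustration of the rule stated in the argument description.

```java
/** Hypothetical quality-aware mismatch counter (simplified, not Picard code). */
final class QualityAwareMismatchSketch {
    /** Counts positions where the bases differ or the base quality is below the minimum. */
    static int countMismatches(String barcode, String readBases, int[] qualities,
                               int minimumBaseQuality) {
        if (barcode.length() != readBases.length() || readBases.length() != qualities.length) {
            throw new IllegalArgumentException("Barcode, bases and qualities must have equal length");
        }
        int mismatches = 0;
        for (int i = 0; i < barcode.length(); i++) {
            boolean basesDiffer = barcode.charAt(i) != readBases.charAt(i);
            boolean lowQuality = qualities[i] < minimumBaseQuality;
            // A position is counted once, even if it both differs and has low quality.
            if (basesDiffer || lowQuality) {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```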
-
MINIMUM_QUALITY
@Argument(doc="The minimum quality (after transforming 0s to 1s) expected from reads. If qualities are lower than this value, an error is thrown. The default of 2 is what Illumina\'s spec describes as the minimum, but in practice the value has been observed lower.") public int MINIMUM_QUALITY
-
COMPRESS_OUTPUTS
@Argument(shortName="GZIP", doc="Compress output s_l_t_barcode.txt files using gzip and append a .gz extension to the file names.") public boolean COMPRESS_OUTPUTS
-
NUM_PROCESSORS
@Argument(doc="Run this many PerTileBarcodeExtractors in parallel. If NUM_PROCESSORS = 0, number of cores is automatically set to the number of cores available on the machine. If NUM_PROCESSORS < 0 then the number of cores used will be the number available on the machine less NUM_PROCESSORS.") public int NUM_PROCESSORS
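One way to read the NUM_PROCESSORS rule is sketched below. ProcessorCountSketch is a hypothetical class, not Picard code. It assumes that a negative value reduces the core count by its magnitude, and the floor of one worker is an added safeguard that the description does not mention.

```java
/** Hypothetical resolution of NUM_PROCESSORS into a worker count (not Picard code). */
final class ProcessorCountSketch {
    /**
     * @param numProcessors  the NUM_PROCESSORS argument value
     * @param availableCores e.g. Runtime.getRuntime().availableProcessors()
     */
    static int resolve(int numProcessors, int availableCores) {
        if (numProcessors == 0) {
            return availableCores;
        }
        if (numProcessors < 0) {
            // Assumption: -2 on an 8-core machine means 6 workers; never drop below 1.
            return Math.max(1, availableCores + numProcessors);
        }
        return numProcessors;
    }
}
```

Passing the core count in as a parameter keeps the sketch deterministic and easy to test.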
-
DISTANCE_MODE
@Argument(doc="The distance metric that should be used to compare the barcode-reads and the provided barcodes for finding the best and second-best assignments.") public DistanceMetric DISTANCE_MODE
-
-
Method Detail
-
doWork
protected int doWork()
Description copied from class: CommandLineProgram
Do the work after command line has been parsed. RuntimeExceptions may be thrown by this method and are reported appropriately.
Specified by: doWork in class CommandLineProgram
Returns: program exit status.
-
finalizeMetrics
public static void finalizeMetrics(Map<String,ExtractIlluminaBarcodes.BarcodeMetric> barcodeToMetrics, ExtractIlluminaBarcodes.BarcodeMetric noMatchMetric)
-
customCommandLineValidation
protected String[] customCommandLineValidation()
Validate that POSITION >= 1, and that all BARCODEs are the same length and unique.
Overrides: customCommandLineValidation in class CommandLineProgram
Returns: null if command line is valid. If command line is invalid, returns an array of error messages to be written to the appropriate place.
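The return contract described here (null when valid, otherwise an array of messages) can be illustrated with a minimal sketch of the barcode checks. BarcodeValidationSketch is a hypothetical class and the message wording is invented for illustration; it covers only the same-length and uniqueness checks, not the full validation that Picard performs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical validator following the null-or-errors contract (not Picard code). */
final class BarcodeValidationSketch {
    /** Returns null if all barcodes share one length and are unique, else the error messages. */
    static String[] validate(List<String> barcodes) {
        List<String> errors = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        // The first barcode sets the expected length for all the others.
        int expectedLength = barcodes.isEmpty() ? 0 : barcodes.get(0).length();
        for (String barcode : barcodes) {
            if (barcode.length() != expectedLength) {
                errors.add("Barcode " + barcode + " has length " + barcode.length()
                        + " but " + expectedLength + " was expected");
            }
            if (!seen.add(barcode)) {
                errors.add("Barcode " + barcode + " is specified more than once");
            }
        }
        return errors.isEmpty() ? null : errors.toArray(new String[0]);
    }
}
```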
-
-