Class AlignmentSummaryMetrics


  • @DocumentedFeature(groupName="Metrics",
                       groupSummary="Metrics",
                       summary="Alignment metrics")
    public class AlignmentSummaryMetrics
    extends MultilevelMetrics
    High level metrics about the alignment of reads within a SAM file, produced by the CollectAlignmentSummaryMetrics program and usually stored in a file with the extension ".alignment_summary_metrics".
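As an illustration of consuming such a file without any library, the sketch below reads the metric rows from the text of a ".alignment_summary_metrics" file. The layout it relies on is an assumption, not something stated on this page: lines starting with "#" are headers, the first remaining non-blank line holds tab-separated column names matching the fields below, and a blank line ends the table. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MetricsFileSketch {
    /**
     * Parses the metric rows into maps keyed by column name.
     * Assumed layout: '#' header lines, one tab-separated header row,
     * then data rows, terminated by a blank line or end of text.
     */
    static List<Map<String, String>> parseMetricRows(String fileText) {
        List<Map<String, String>> rows = new ArrayList<>();
        String[] header = null;
        for (String line : fileText.split("\\r?\\n")) {
            if (line.startsWith("#")) {
                continue; // header or section marker, not data
            }
            if (line.trim().isEmpty()) {
                if (header != null) {
                    break; // blank line after the table: stop before any later section
                }
                continue; // blank line before the table
            }
            String[] cells = line.split("\t", -1);
            if (header == null) {
                header = cells; // first data-like line carries the column names
                continue;
            }
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length && i < cells.length; i++) {
                row.put(header[i], cells[i]);
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Each returned map corresponds to one CATEGORY row; values are left as strings so the caller can decide how to convert them.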
    • Field Summary

      Fields 
      Modifier and Type Field Description
      double AVG_POS_3PRIME_SOFTCLIP_LENGTH
      The average length of the soft-clipped bases at the 3' end of reads.
      long BAD_CYCLES
      The number of instrument cycles in which 80% or more of base calls were no-calls.
      AlignmentSummaryMetrics.Category CATEGORY
      One of UNPAIRED (for a fragment run), FIRST_OF_PAIR (metrics for only the first read in a paired run), SECOND_OF_PAIR (metrics for only the second read in a paired run), or PAIR (metrics aggregated over both first and second reads in a pair).
      double MAD_READ_LENGTH
      The median absolute deviation of the distribution of all read lengths.
      double MAX_READ_LENGTH
      The maximum read length.
      double MEAN_READ_LENGTH
      The mean read length of the set of reads examined.
      double MEDIAN_READ_LENGTH
      The median read length.
      double MIN_READ_LENGTH
      The minimum read length.
      double PCT_ADAPTER
      The fraction of PF reads that are unaligned or aligned with MQ0 and match to a known adapter sequence right from the start of the read (indication of adapter-dimer pairs).
      double PCT_CHIMERAS
      The fraction of reads that map outside of a maximum insert size (usually 100kb) or that have the two ends mapping to different chromosomes.
      double PCT_HARDCLIP
      The number of PF bases that are on primary, aligned reads and are hard-clipped, expressed as a fraction of PF_ALIGNED_BASES (even though hard-clipped bases are themselves not aligned).
      double PCT_PF_READS
      The fraction of reads that are PF (PF_READS / TOTAL_READS)
      double PCT_PF_READS_ALIGNED
      The fraction of PF reads that aligned to the reference sequence.
      double PCT_PF_READS_IMPROPER_PAIRS
      The fraction of (primary) reads that are not "properly" aligned in pairs (as per SAM flag 0x2).
      double PCT_READS_ALIGNED_IN_PAIRS
      The fraction of aligned reads whose mate pair was also aligned to the reference.
      double PCT_SOFTCLIP
      The number of PF bases that are on primary, aligned reads and are soft-clipped, expressed as a fraction of PF_ALIGNED_BASES (even though soft-clipped bases are themselves not aligned).
      long PF_ALIGNED_BASES
      The total number of bases, across all mapped PF reads, that are aligned to the reference sequence.
      long PF_HQ_ALIGNED_BASES
      The number of bases aligned to the reference sequence in reads that were mapped at high quality.
      long PF_HQ_ALIGNED_Q20_BASES
      The subset of PF_HQ_ALIGNED_BASES where the base call quality was Q20 or higher.
      long PF_HQ_ALIGNED_READS
      The number of PF reads that were aligned to the reference sequence with a mapping quality of Q20 or higher, signifying that the aligner estimates a 1/100 (or smaller) chance that the alignment is wrong.
      double PF_HQ_ERROR_RATE
      The fraction of bases that mismatch the reference in PF HQ aligned reads.
      double PF_HQ_MEDIAN_MISMATCHES
      The median number of mismatches versus the reference sequence in reads that were aligned to the reference at high quality (i.e. PF_HQ_ALIGNED_READS).
      double PF_INDEL_RATE
      The number of insertion and deletion events per 100 aligned bases.
      double PF_MISMATCH_RATE
      The rate of bases mismatching the reference for all bases aligned to the reference sequence.
      long PF_NOISE_READS
      The number of PF reads that are marked as noise reads.
      long PF_READS
      The number of PF reads, where PF is defined as passing Illumina's filter.
      long PF_READS_ALIGNED
      The number of PF reads that were aligned to the reference sequence.
      long PF_READS_IMPROPER_PAIRS
      The number of (primary) aligned reads that are not "properly" aligned in pairs (as per SAM flag 0x2).
      long READS_ALIGNED_IN_PAIRS
      The number of aligned reads whose mate pair was also aligned to the reference.
      double SD_READ_LENGTH
      The standard deviation of the read lengths.
      double STRAND_BALANCE
      The number of PF reads aligned to the positive strand of the genome divided by the number of PF reads aligned to the genome.
      long TOTAL_READS
      The total number of reads including all PF and non-PF reads.
    • Field Detail

      • CATEGORY

        public AlignmentSummaryMetrics.Category CATEGORY
        One of UNPAIRED (for a fragment run), FIRST_OF_PAIR (metrics for only the first read in a paired run), SECOND_OF_PAIR (metrics for only the second read in a paired run), or PAIR (metrics aggregated over both first and second reads in a pair).
      • TOTAL_READS

        public long TOTAL_READS
        The total number of reads including all PF and non-PF reads. When CATEGORY equals PAIR this value will be 2x the number of clusters.
      • PF_READS

        public long PF_READS
        The number of PF reads, where PF is defined as passing Illumina's filter.
      • PCT_PF_READS

        public double PCT_PF_READS
        The fraction of reads that are PF (PF_READS / TOTAL_READS)
      • PF_NOISE_READS

        public long PF_NOISE_READS
        The number of PF reads that are marked as noise reads. A noise read is one which is composed entirely of A bases and/or N bases. These reads are marked as they are usually artifactual and are of no use in downstream analysis.
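The noise-read definition above can be expressed as a small predicate. This is a minimal sketch of the stated rule only; the class and method names are hypothetical, and treating an empty read as non-noise is an assumption.

```java
final class NoiseReadSketch {
    /** Returns true when every base is 'A' or 'N' (case-insensitive). */
    static boolean isNoiseRead(String bases) {
        if (bases.isEmpty()) {
            return false; // assumption: an empty read is not counted as noise
        }
        for (int i = 0; i < bases.length(); i++) {
            char base = Character.toUpperCase(bases.charAt(i));
            if (base != 'A' && base != 'N') {
                return false; // any other base disqualifies the read
            }
        }
        return true;
    }
}
```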
      • PF_READS_ALIGNED

        public long PF_READS_ALIGNED
        The number of PF reads that were aligned to the reference sequence. This includes reads that aligned with low quality (i.e. their alignments are ambiguous).
      • PCT_PF_READS_ALIGNED

        public double PCT_PF_READS_ALIGNED
        The fraction of PF reads that aligned to the reference sequence, computed as PF_READS_ALIGNED / PF_READS.
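Because the value is the ratio PF_READS_ALIGNED / PF_READS, it lies between 0 and 1 and needs multiplying by 100 if a percentage is wanted for display. A minimal sketch follows; the names are hypothetical, and returning 0 for an empty denominator is an assumption.

```java
final class PfAlignedFractionSketch {
    /** PF_READS_ALIGNED / PF_READS as a value in [0, 1]. */
    static double pctPfReadsAligned(long pfReadsAligned, long pfReads) {
        if (pfReads == 0) {
            return 0.0; // assumption: report 0 when there are no PF reads
        }
        return (double) pfReadsAligned / pfReads;
    }

    /** Converts the stored fraction into a percentage for display. */
    static double asPercentage(double fraction) {
        return fraction * 100.0;
    }
}
```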
      • PF_ALIGNED_BASES

        public long PF_ALIGNED_BASES
        The total number of bases, across all mapped PF reads, that are aligned to the reference sequence.
      • PF_HQ_ALIGNED_READS

        public long PF_HQ_ALIGNED_READS
        The number of PF reads that were aligned to the reference sequence with a mapping quality of Q20 or higher, signifying that the aligner estimates a 1/100 (or smaller) chance that the alignment is wrong.
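The 1/100 figure follows from the Phred scale used for mapping quality, where the probability of a wrong alignment is 10^(-Q/10). A short sketch of that conversion and of the Q20 threshold, with hypothetical class and method names:

```java
final class MapqSketch {
    /** Phred-scaled quality to error probability: 10^(-Q/10). */
    static double errorProbability(int mapq) {
        return Math.pow(10.0, -mapq / 10.0);
    }

    /** Applies the Q20 threshold named in the description. */
    static boolean isHighQuality(int mapq) {
        return mapq >= 20;
    }
}
```

With this conversion, Q20 corresponds to an error probability of 0.01 and Q30 to 0.001.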
      • PF_HQ_ALIGNED_BASES

        public long PF_HQ_ALIGNED_BASES
        The number of bases aligned to the reference sequence in reads that were mapped at high quality. Will usually approximate PF_HQ_ALIGNED_READS * READ_LENGTH but may differ when either mixed read lengths are present or many reads are aligned with gaps.
      • PF_HQ_ALIGNED_Q20_BASES

        public long PF_HQ_ALIGNED_Q20_BASES
        The subset of PF_HQ_ALIGNED_BASES where the base call quality was Q20 or higher.
      • PF_HQ_MEDIAN_MISMATCHES

        public double PF_HQ_MEDIAN_MISMATCHES
        The median number of mismatches versus the reference sequence in reads that were aligned to the reference at high quality (i.e. PF_HQ_ALIGNED_READS).
      • PF_MISMATCH_RATE

        public double PF_MISMATCH_RATE
        The rate of bases mismatching the reference for all bases aligned to the reference sequence.
      • PF_HQ_ERROR_RATE

        public double PF_HQ_ERROR_RATE
        The fraction of bases that mismatch the reference in PF HQ aligned reads.
      • PF_INDEL_RATE

        public double PF_INDEL_RATE
        The number of insertion and deletion events per 100 aligned bases. Uses the number of events as the numerator, not the number of inserted or deleted bases.
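To make the distinction between events and bases concrete, the sketch below counts both from a CIGAR string. It assumes that one indel event corresponds to one I or D operator in the CIGAR; that mapping and the class and method names are illustrative assumptions, not taken from this page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IndelEventSketch {
    // One CIGAR element: a length followed by an operator character.
    private static final Pattern CIGAR_OP = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Counts I and D operators, each once regardless of its length. */
    static int indelEvents(String cigar) {
        int events = 0;
        Matcher m = CIGAR_OP.matcher(cigar);
        while (m.find()) {
            char op = m.group(2).charAt(0);
            if (op == 'I' || op == 'D') {
                events++;
            }
        }
        return events;
    }

    /** Sums the lengths of I and D operators (inserted plus deleted bases). */
    static int indelBases(String cigar) {
        int bases = 0;
        Matcher m = CIGAR_OP.matcher(cigar);
        while (m.find()) {
            char op = m.group(2).charAt(0);
            if (op == 'I' || op == 'D') {
                bases += Integer.parseInt(m.group(1));
            }
        }
        return bases;
    }
}
```

For the CIGAR 10M3I20M2D5M this yields 2 events but 5 indel bases; the metric's numerator is the former.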
      • MEAN_READ_LENGTH

        public double MEAN_READ_LENGTH
        The mean read length of the set of reads examined. When looking at the data for a single lane with equal length reads this number is just the read length. When looking at data for merged lanes with differing read lengths this is the mean read length of all reads. Computed using all read lengths including clipped bases.
      • SD_READ_LENGTH

        public double SD_READ_LENGTH
        The standard deviation of the read lengths. Computed using all read lengths including clipped bases.
      • MEDIAN_READ_LENGTH

        public double MEDIAN_READ_LENGTH
        The median read length. Computed using all read lengths including clipped bases.
      • MAD_READ_LENGTH

        public double MAD_READ_LENGTH
        The median absolute deviation of the distribution of all read lengths. If the distribution is essentially normal then the standard deviation can be estimated as ~1.4826 * MAD. Computed using all read lengths including clipped bases.
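The relationship between the MAD and the standard deviation can be sketched as follows. The helper names are hypothetical, the input is assumed to be non-empty, and the 1.4826 factor is only a reasonable estimate when the distribution is close to normal, as noted above.

```java
import java.util.Arrays;

final class MadSketch {
    /** Median of a non-empty array; the input is not modified. */
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /** Median absolute deviation: median of |x - median(x)|. */
    static double mad(double[] values) {
        double center = median(values);
        double[] deviations = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            deviations[i] = Math.abs(values[i] - center);
        }
        return median(deviations);
    }

    /** Standard deviation estimate for an approximately normal distribution. */
    static double estimateSd(double mad) {
        return 1.4826 * mad;
    }
}
```

For read lengths 100, 101, 99, 150, 100 the median is 100 and the MAD is 1, so the single outlier at 150 barely affects the estimate, unlike the ordinary standard deviation.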
      • MIN_READ_LENGTH

        public double MIN_READ_LENGTH
        The minimum read length. Computed using all read lengths including clipped bases.
      • MAX_READ_LENGTH

        public double MAX_READ_LENGTH
        The maximum read length. Computed using all read lengths including clipped bases.
      • READS_ALIGNED_IN_PAIRS

        public long READS_ALIGNED_IN_PAIRS
        The number of aligned reads whose mate pair was also aligned to the reference.
      • PCT_READS_ALIGNED_IN_PAIRS

        public double PCT_READS_ALIGNED_IN_PAIRS
        The fraction of aligned reads whose mate pair was also aligned to the reference, computed as READS_ALIGNED_IN_PAIRS / PF_READS_ALIGNED.
      • PF_READS_IMPROPER_PAIRS

        public long PF_READS_IMPROPER_PAIRS
        The number of (primary) aligned reads that are not "properly" aligned in pairs (as per SAM flag 0x2).
      • PCT_PF_READS_IMPROPER_PAIRS

        public double PCT_PF_READS_IMPROPER_PAIRS
        The fraction of (primary) reads that are not "properly" aligned in pairs (as per SAM flag 0x2), computed as PF_READS_IMPROPER_PAIRS / PF_READS_ALIGNED.
      • BAD_CYCLES

        public long BAD_CYCLES
        The number of instrument cycles in which 80% or more of base calls were no-calls.
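One way to picture the 80% rule is to tally no-calls per cycle across a set of reads. The sketch below makes several assumptions that are not stated on this page: a no-call is the base 'N', the cycle index equals the position in the read string as sequenced, and the denominator for a cycle is the number of reads long enough to reach it. The names are hypothetical.

```java
import java.util.List;

final class BadCycleSketch {
    /** Counts cycles where at least 80% of the base calls are 'N'. */
    static long countBadCycles(List<String> reads) {
        int maxLength = 0;
        for (String read : reads) {
            maxLength = Math.max(maxLength, read.length());
        }
        long badCycles = 0;
        for (int cycle = 0; cycle < maxLength; cycle++) {
            long calls = 0;
            long noCalls = 0;
            for (String read : reads) {
                if (cycle < read.length()) {
                    calls++;
                    if (Character.toUpperCase(read.charAt(cycle)) == 'N') {
                        noCalls++;
                    }
                }
            }
            // Integer form of noCalls / calls >= 0.8, avoiding floating-point rounding.
            if (calls > 0 && noCalls * 5 >= calls * 4) {
                badCycles++;
            }
        }
        return badCycles;
    }
}
```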
      • STRAND_BALANCE

        public double STRAND_BALANCE
        The number of PF reads aligned to the positive strand of the genome divided by the number of PF reads aligned to the genome.
      • PCT_CHIMERAS

        public double PCT_CHIMERAS
        The fraction of reads that map outside of a maximum insert size (usually 100kb) or that have the two ends mapping to different chromosomes.
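The two conditions in this description can be written as a simple predicate over a read pair. This sketch covers only what is stated above; the actual tool may apply additional criteria. The method name and parameters are hypothetical, and the 100,000 default merely mirrors the "usually 100kb" remark.

```java
final class ChimeraSketch {
    /** Default maximum insert size, mirroring the "usually 100kb" in the description. */
    static final int DEFAULT_MAX_INSERT_SIZE = 100_000;

    /**
     * Returns true when the two ends map to different reference sequences,
     * or when the absolute insert size exceeds the maximum.
     */
    static boolean isChimericPair(String readRef, String mateRef,
                                  int insertSize, int maxInsertSize) {
        if (!readRef.equals(mateRef)) {
            return true; // ends on different chromosomes
        }
        // Widen to long so Math.abs cannot overflow on Integer.MIN_VALUE.
        return Math.abs((long) insertSize) > maxInsertSize;
    }
}
```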
      • PCT_ADAPTER

        public double PCT_ADAPTER
        The fraction of PF reads that are unaligned or aligned with MQ0 and match to a known adapter sequence right from the start of the read (indication of adapter-dimer pairs).
      • PCT_SOFTCLIP

        public double PCT_SOFTCLIP
        The number of PF bases that are on primary, aligned reads and are soft-clipped, expressed as a fraction of PF_ALIGNED_BASES (even though soft-clipped bases are themselves not aligned).
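The parenthetical remark means the numerator is not a subset of the denominator: soft-clipped bases are divided by aligned bases, which exclude them. The sketch below illustrates this from CIGAR strings, assuming that aligned bases are those under the M, = and X operators and soft-clipped bases are those under S. Those mappings and all names are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SoftClipSketch {
    // One CIGAR element: a length followed by an operator character.
    private static final Pattern CIGAR_OP = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Soft-clipped bases divided by aligned bases over a set of CIGARs. */
    static double softClipFraction(String[] cigars) {
        long softClipped = 0;
        long aligned = 0;
        for (String cigar : cigars) {
            Matcher m = CIGAR_OP.matcher(cigar);
            while (m.find()) {
                long length = Long.parseLong(m.group(1));
                char op = m.group(2).charAt(0);
                if (op == 'S') {
                    softClipped += length;
                } else if (op == 'M' || op == '=' || op == 'X') {
                    aligned += length;
                }
            }
        }
        // Assumption: report 0 when nothing is aligned.
        return aligned == 0 ? 0.0 : (double) softClipped / aligned;
    }
}
```

For the CIGARs 5S90M5S and 100M this gives 10 soft-clipped bases over 190 aligned bases, so the denominator is smaller than the total number of bases in the reads.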
      • PCT_HARDCLIP

        public double PCT_HARDCLIP
        The number of PF bases that are on primary, aligned reads and are hard-clipped, expressed as a fraction of PF_ALIGNED_BASES (even though hard-clipped bases are themselves not aligned).
      • AVG_POS_3PRIME_SOFTCLIP_LENGTH

        public double AVG_POS_3PRIME_SOFTCLIP_LENGTH
        The average length of the soft-clipped bases at the 3' end of reads. This could be used as an estimate for the amount by which the insert-size must be increased in order to obtain a significant reduction in bases lost due to reading off the end of the insert.
    • Constructor Detail

      • AlignmentSummaryMetrics

        public AlignmentSummaryMetrics()