Changed Profiling Efficiency to match expected behavior
agraubert committed May 29, 2019
1 parent 0a3c78e commit aa1993f
Showing 5 changed files with 25 additions and 5 deletions.
22 changes: 21 additions & 1 deletion Metrics.md
@@ -12,7 +12,8 @@ of all reads which were not Secondary Alignments or Platform/Vendor QC Failing r
* Base Mismatch: The total number of mismatched bases (as determined by the "NM" tag) of all "Mapped Reads" (as defined above) divided by the total aligned length of all "Mapped Reads".
* End 1 & 2 Mapping Rate: The proportion of Paired reads which were marked as First or Second in the pair, respectively, out of all "Mapped Reads" (above).
* End 1 & 2 Mismatch Rate: The proportion of mismatched bases (as determined by the "NM" tag) belonging to First or Second mates, divided by the total aligned length of all "Mapped" (above) First or Second mates, respectively.
-* Expression Profiling Efficiency: The proportion of "Exonic Reads" (see "Exonic Rate", below) out of all "Mapped Reads" (above).
+* Expression Profiling Efficiency: The proportion of "Exonic Reads" (see "Exonic Rate", below) out of all reads which were not Secondary Alignments or
+Platform/Vendor QC Failing reads.
* High Quality Rate: The proportion of **properly paired** reads with less than 6 mismatched bases and a perfect mapping quality out of all "Mapped Reads" (above).
* Exonic Rate: The proportion of "Mapped Reads" (above) for which all aligned segments unambiguously aligned to exons of the same gene.
* Intronic Rate: The proportion of "Mapped Reads" (above) for which all aligned segments unambiguously aligned to the same gene, but none of which _intersected_ any exons of the gene.
@@ -46,3 +47,22 @@ This file contains the raw counts of the observed insert sizes of the sample. Fr
This file contains coverage data for all genes. Coverage computations are always performed, but this file of per-gene coverage data is not produced unless
the `--coverage` flag is provided. The first column contains the gene ID as given by the input annotation. The next three columns contain the mean, standard deviation, and coefficient of variation of coverage for each gene, respectively. The first and last 500bp of each gene are dropped and not considered when computing coverage. A value of 0 or `nan` may indicate that the gene's coding length was less than 1kb or that the gene had 0 coverage
over its exons.

## Migrating between old and new columns

For users of the legacy tool, several metrics have been renamed, removed, or changed.
The table below maps each previous metric to its new metric name:

Old Metric | New Metric | Notes
-|-|-
Base Mismatch Rate | Base Mismatch |
Duplication Rate of Mapped | Duplicate Rate of Mapped |
End 1/2 % Sense | End 1/2 Sense Rate |
Estimated Library Size | Estimated Library Complexity |
Failed Vendor QC Check | Failed Vendor QC |
Fragment Length Mean | Average Fragment Length | The fragment length metrics have changed significantly
Fragment Length StdDev | Fragment Length Std |
Intragenic Rate | Intragenic Rate | Some reads previously classified as `Intragenic` are now classified as `Ambiguous Alignments`. The equivalent of the old `Intragenic Rate` can be computed by summing `Intragenic Rate` + `Ambiguous Alignment Rate`
Mapped | Mapped Reads |
Mapped Unique | Mapped Unique Reads |
Total Purity Filtered Reads Sequenced | Unique Mapping, Vendor QC Passed Reads | This counts reads without the Secondary or QC Fail flags set. For a true count of total alignments use `Total Reads`
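
The `Intragenic Rate` note in the table can be sketched in code. This is a minimal illustration, not part of the tool: `parseMetrics` and `legacyIntragenicRate` are hypothetical helper names, the input is assumed to be the tab-separated `name<TAB>value` layout that the metrics file uses, and the key `Ambiguous Alignment Rate` is assumed to be spelled this way in the real output (check it against your own metrics file).

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse "name<TAB>value" lines into a name -> value map.
// Lines without a tab, or whose value is not numeric, are skipped.
std::map<std::string, double> parseMetrics(std::istream& in) {
    std::map<std::string, double> metrics;
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type tab = line.find('\t');
        if (tab == std::string::npos) continue;
        try {
            metrics[line.substr(0, tab)] = std::stod(line.substr(tab + 1));
        } catch (const std::exception&) {
            // Non-numeric value (for example a sample name): ignore the line.
        }
    }
    return metrics;
}

// Hypothetical helper: approximate the legacy "Intragenic Rate" by summing
// the two new metrics named in the migration table.
// Throws std::out_of_range if either metric is absent.
double legacyIntragenicRate(const std::map<std::string, double>& metrics) {
    return metrics.at("Intragenic Rate") + metrics.at("Ambiguous Alignment Rate");
}
```

Reading the file once into a map keeps the lookup independent of row order, which matters because the position of a metric within the file is not guaranteed by anything shown here.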
2 changes: 1 addition & 1 deletion src/RNASeQC.cpp
@@ -528,7 +528,7 @@ int main(int argc, char* argv[])
output << "End 2 Mapping Rate\t"<< 2.0 * counter.frac("End 2 Mapped Reads", "Unique Mapping, Vendor QC Passed Reads") << endl;
output << "End 1 Mismatch Rate\t" << counter.frac("End 1 Mismatches", "End 1 Bases") << endl;
output << "End 2 Mismatch Rate\t" << counter.frac("End 2 Mismatches", "End 2 Bases") << endl;
-output << "Expression Profiling Efficiency\t" << counter.frac("Exonic Reads", "Total Reads") << endl;
+output << "Expression Profiling Efficiency\t" << counter.frac("Exonic Reads", "Unique Mapping, Vendor QC Passed Reads") << endl;
output << "High Quality Rate\t" << counter.frac("High Quality Reads", "Mapped Reads") << endl;
output << "Exonic Rate\t" << counter.frac("Exonic Reads", "Mapped Reads") << endl;
output << "Intronic Rate\t" << counter.frac("Intronic Reads", "Mapped Reads") << endl;
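
To illustrate what the changed line does, here is a minimal sketch of a ratio-of-named-counts helper. `MetricCounter` is a hypothetical stand-in, not the tool's actual counter class. Only the call shape `frac(numerator, denominator)` is taken from the diff above, and returning `0.0` for a zero or missing denominator is an assumption made for this sketch.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the counter used in RNASeQC.cpp.
class MetricCounter {
public:
    // Add n to the named count (creates the count at 0 if it is new).
    void add(const std::string& key, unsigned long n = 1) { counts_[key] += n; }

    // Return the named count, or 0 if it was never incremented.
    unsigned long get(const std::string& key) const {
        const auto it = counts_.find(key);
        return it == counts_.end() ? 0UL : it->second;
    }

    // Ratio of two named counts.
    // Assumption: a zero denominator yields 0.0 rather than nan or an error.
    double frac(const std::string& numerator, const std::string& denominator) const {
        const unsigned long d = get(denominator);
        return d == 0 ? 0.0
                      : static_cast<double>(get(numerator)) / static_cast<double>(d);
    }

private:
    std::map<std::string, unsigned long> counts_;
};
```

With illustrative (made-up) counts of 50 exonic reads, 100 total reads, and 80 unique, QC-passing reads, the old denominator gives 0.5 and the new one gives 0.625. The numerator is unchanged, so the metric can only rise or stay the same when the denominator shrinks from all alignments to those without the Secondary or QC Fail flags.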
2 changes: 1 addition & 1 deletion test_data/chr1.output/chr1.bam.metrics.tsv
@@ -7,7 +7,7 @@ End 1 Mapping Rate 1.01474
End 2 Mapping Rate 0.985262
End 1 Mismatch Rate 0.00253608
End 2 Mismatch Rate 0.0170406
-Expression Profiling Efficiency 0.694246
+Expression Profiling Efficiency 0.807719
High Quality Rate 0.884446
Exonic Rate 0.807719
Intronic Rate 0.131935
2 changes: 1 addition & 1 deletion test_data/downsampled.output/downsampled.bam.metrics.tsv
@@ -7,7 +7,7 @@ End 1 Mapping Rate 0.359515
End 2 Mapping Rate 0.349158
End 1 Mismatch Rate 0.00267655
End 2 Mismatch Rate 0.0175762
-Expression Profiling Efficiency 0.24059
+Expression Profiling Efficiency 0.275876
High Quality Rate 0.881642
Exonic Rate 0.778571
Intronic Rate 0.114795
2 changes: 1 addition & 1 deletion test_data/legacy.output/downsampled.bam.metrics.tsv
@@ -7,7 +7,7 @@ End 1 Mapping Rate 0.359421
End 2 Mapping Rate 0.34904
End 1 Mismatch Rate 0.00267648
End 2 Mismatch Rate 0.017572
-Expression Profiling Efficiency 0.238034
+Expression Profiling Efficiency 0.272945
High Quality Rate 0.881406
Exonic Rate 0.770301
Intronic Rate 0.159378