A high duplication rate for WGS #372

Open
MaryGoAround opened this issue Aug 7, 2024 · 0 comments
MaryGoAround commented Aug 7, 2024

Hi, I ran WGS on cell lines. When I use wgs_calling_regions.hg38.bed as the input for making the interval file and set the off-target regions to 130000 for the coverage calculation, I get the values below. Do you think I should be worried about the duplication rate?
Picture 1

  1. mean.coverage.ontarget: This value represents the average coverage of the target regions. In this case, it is approximately 8.77, meaning on average, each base in the target regions was sequenced 8.77 times.
  2. mean.coverage.offtarget: This value represents the average coverage of the off-target regions, which is about 0.16. This is significantly lower than the on-target coverage, as expected in targeted sequencing where the focus is primarily on specific regions of interest.
  3. mean.duplication.ontarget: This indicates the mean duplication rate of the on-target regions, approximately 0.99. A high duplication rate close to 1.0 suggests that almost all on-target reads are duplicates, which can happen if the library complexity is low or if there is an over-amplification during the library preparation.
  4. mean.duplication.offtarget: Similar to the on-target duplication rate, this value represents the mean duplication rate of the off-target regions, also approximately 0.99.
  5. mom.raw.ontarget: This stands for “median of means” of the raw coverage data for the on-target regions. The value 0.991334418167518 suggests high uniformity in coverage across the target regions before any normalization.
  6. mom.raw.offtarget: This is the “median of means” of the raw coverage data for the off-target regions. It is similar to the on-target raw data value, indicating consistency across off-target areas.
  7. mom.post.gc.ontarget: This is the “median of means” after GC bias correction for on-target regions. The correction attempts to account for GC-content bias in sequencing. The value 0.991334418167518 suggests minimal deviation from uniformity post-GC correction.
  8. mom.post.gc.offtarget: This value, similar to the on-target GC correction, applies to the off-target regions. The value shows how the coverage looks after adjusting for GC content across non-targeted regions.
  9. mom.post.reptiming.ontarget: After considering replication timing (the phase of DNA replication when particular sequences are duplicated), this metric shows the coverage median of means for the on-target areas, remaining unchanged in your case.
  10. mom.post.reptiming.offtarget: Like the on-target replication timing metric, this value is for off-target regions. It remains unchanged, indicating consistent coverage and bias corrections.
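
To make the first four metrics concrete, here is a minimal sketch of how on-target and off-target means like these could be computed from per-interval counts. It is only an illustration: it assumes a duplication rate defined as duplicate reads divided by all reads in an interval, which may not be how the tool defines it, and the `summarize` function and the `on_target`, `coverage`, `reads`, and `duplicates` field names are hypothetical.

```python
from statistics import mean


def summarize(intervals):
    """Mean coverage and mean duplication rate, split into on/off target.

    Each interval is a dict with hypothetical keys:
    on_target (bool), coverage (float), reads (int), duplicates (int).
    """
    summary = {}
    for label, flag in (("ontarget", True), ("offtarget", False)):
        rows = [r for r in intervals if r["on_target"] is flag]
        summary[f"mean.coverage.{label}"] = mean(r["coverage"] for r in rows)
        # Assumed definition: duplicate reads / all reads, per interval.
        # Intervals without reads are skipped to avoid dividing by zero.
        summary[f"mean.duplication.{label}"] = mean(
            r["duplicates"] / r["reads"] for r in rows if r["reads"] > 0
        )
    return summary


# Made-up numbers, not taken from the run described above.
example = [
    {"on_target": True, "coverage": 8.0, "reads": 100, "duplicates": 25},
    {"on_target": True, "coverage": 10.0, "reads": 100, "duplicates": 75},
    {"on_target": False, "coverage": 0.5, "reads": 8, "duplicates": 2},
]
print(summarize(example))
```

Under that assumed definition, a value near 0.99 would mean nearly every read in an interval is flagged as a duplicate, which is why it is worth checking how the tool actually defines this column before drawing conclusions.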

Thanks for any ideas.
