Roslin 2.4.2 with a human+mouse merged reference #264

Closed
3 tasks done
ckandoth opened this issue Jun 18, 2019 · 2 comments

ckandoth commented Jun 18, 2019

Xenograft and PDX (patient-derived xenograft) samples are contaminated by mouse DNA, which needs to be filtered out before somatic variant calling. BIC's solution for several years has been to align reads with BWA-MEM against a merged human+mouse reference FASTA, and then remove all reads that align better to mouse chromosomes. We should, in theory, be able to replicate this in Roslin 2.4.2, which already runs GATK FindCoveredIntervals limited to human chromosomes. These intervals are passed to Abra and to the variant callers, so all steps downstream of alignment should already be limited to human chromosomes.
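To make the "remove reads that align better to mouse" step concrete, here is a minimal sketch in plain Python that works on SAM text lines. It assumes the human contigs use GRCh37 names (1-22, X, Y, MT) and that every other contig in the merged reference is mouse. The function names are hypothetical, and this is not BIC's actual implementation. It only looks at each record's own reference name, so unmapped reads are dropped and mate placement is ignored.

```python
# Assumption: GRCh37-style human contig names; any other contig is treated as non-human.
HUMAN_CONTIGS = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}


def is_human_record(sam_line, human_contigs=HUMAN_CONTIGS):
    """Return True if a SAM alignment record is mapped to a human contig."""
    fields = sam_line.rstrip("\n").split("\t")
    rname = fields[2]  # SAM column 3: reference name, "*" when unmapped
    return rname in human_contigs


def filter_human(sam_lines, human_contigs=HUMAN_CONTIGS):
    """Yield header lines unchanged, plus records aligned to human contigs."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines are passed through as-is
        elif is_human_record(line, human_contigs):
            yield line
```

Because both genomes compete for each read during alignment, the contig of the reported alignment serves here as a proxy for "aligns better to human". A production filter would also need to decide what to do with pairs that straddle the two genomes.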

Here is the merged (hybrid) reference that BIC uses. It uses hg19 chromosome names instead of GRCh37 names, so we may need to create or download our own:

/ifs/depot/assemblies/hybrids/H.sapiens_M.musculus/b37_mm10/index/bwa/0.7.12

Tasks:

  • Prep a merged human+mouse reference that uses GRCh37 chrom names for human, and some other names for mouse chromosomes (that won't be picked up by FindCoveredIntervals).
  • Test Roslin 2.4.2 on 06437_F using the merged reference (modify request_to_yaml as needed).
  • Monitor the run, and report bugs as comments on this issue, so we can figure out whether/how to fix them.
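
For the first task, a rough sketch of how the merged FASTA could be assembled: human records are passed through unchanged, and every mouse header gets a prefix so that its name cannot collide with a GRCh37 name. The `mouse_` prefix and the function names are assumptions for illustration, not names chosen in this issue.

```python
def rename_fasta_headers(lines, prefix):
    """Yield FASTA lines, adding `prefix` to the sequence name on each header line."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        else:
            yield line


def merge_references(human_lines, mouse_lines, mouse_prefix="mouse_"):
    """Yield human FASTA lines as-is, followed by mouse lines with prefixed names."""
    yield from human_lines
    yield from rename_fasta_headers(mouse_lines, mouse_prefix)
```

Usage would look something like `out.writelines(merge_references(open(human_fa), open(mouse_fa)))`, where `human_fa` and `mouse_fa` are placeholder paths. The merged file would then need its own BWA index, FASTA index, and sequence dictionary before alignment.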

ckandoth commented

@timosong @allanbolipata this is meant as a short-term solution that uses existing Roslin infrastructure to process and deliver PDX data to investigators. In the meantime, Allan and team will work on a better process that incorporates this optional, sample-specific step into the CWL subworkflows and uses AstraZeneca's disambiguate (https://github.com/AstraZeneca-NGS/disambiguate) to create human-only BAMs for variant calling.

ckandoth changed the title from "Roslin 2.4.2 with a human+mouse merged reference on PDX data" to "Roslin 2.4.2 with a human+mouse merged reference" on Jun 18, 2019

ckandoth commented

Marking this as done. The task to add "fraction of mouse reads" to data_clinical is tracked in #270.
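
One possible way to compute a "fraction of mouse reads" value is from per-contig mapped-read counts, sketched below. It assumes tab-separated input in the layout that `samtools idxstats` prints (name, length, mapped count, unmapped count) and assumes mouse contigs carry a hypothetical `mouse_` prefix. How #270 actually defines or computes the metric is not stated here.

```python
def mouse_read_fraction(idxstats_lines, mouse_prefix="mouse_"):
    """Return mapped mouse reads divided by all mapped reads (0.0 if none mapped)."""
    mouse = 0
    total = 0
    for line in idxstats_lines:
        name, _length, mapped, _unmapped = line.rstrip("\n").split("\t")
        if name == "*":
            continue  # the "*" row only counts reads with no reference at all
        total += int(mapped)
        if name.startswith(mouse_prefix):
            mouse += int(mapped)
    return mouse / total if total else 0.0
```

The denominator here is mapped reads only. Using total reads instead would give a slightly lower fraction for samples with many unmapped reads.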
