Xenograft and PDX (patient-derived xenograft) samples are contaminated by mouse DNA, which needs to be filtered out before somatic variant calling is performed. For several years, BIC's solution has been to align reads with BWA-MEM against a merged human+mouse reference fasta, and then chop out all reads that align better to mouse chromosomes. We should theoretically be able to replicate this in Roslin 2.4.2, which already runs GATK FindCoveredIntervals limited to human chromosomes only. These intervals are passed into Abra and also the variant callers, so all steps downstream of alignment should already limit themselves to human chromosomes.
Here is the merged (aka hybrid) reference that BIC uses, but it uses hg19 chrom names instead of GRCh37, so we may need to create or download our own.
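To make the filtering idea concrete, here is a minimal sketch of dropping mouse-aligned records from SAM text. It is not BIC's or Roslin's actual code. The `mm10_` contig prefix and the function name `drop_mouse_records` are assumptions made up for illustration, and it only looks at each record's own RNAME, so it ignores mates, supplementary alignments, and mapping quality.

```python
# Sketch only: assumes mouse contigs in the merged reference carry a
# hypothetical "mm10_" prefix, so they can be told apart from GRCh37 names.
MOUSE_PREFIX = "mm10_"


def drop_mouse_records(sam_lines, mouse_prefix=MOUSE_PREFIX):
    """Yield SAM lines, skipping alignment records placed on mouse contigs.

    Header lines (starting with '@') pass through unchanged, so the output
    header still lists every contig of the merged reference.
    """
    for raw in sam_lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header line: keep as-is.
            yield line
            continue
        fields = line.split("\t")
        rname = fields[2]  # third SAM column is the reference name
        if rname.startswith(mouse_prefix):
            # BWA-MEM placed this read on a mouse contig: drop it.
            continue
        yield line
```

A real run would more likely do this on BAMs with an existing tool. The point here is only that, once mouse contigs have distinguishable names, the filter reduces to a check on the reference-name column.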
Tasks:
Prep a merged human+mouse reference that uses GRCh37 chrom names for human, and some other names for mouse chromosomes (ones that won't be picked up by FindCoveredIntervals).
Test Roslin 2.4.2 on 06437_F using the merged reference (modify request_to_yaml as needed).
Monitor the run, and report bugs as comments on this issue, so we can figure out whether and how to fix them.
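The reference prep in the first task could be sketched roughly as below. This is an illustration, not an agreed procedure. The `mm10_` prefix, the function names, and the file paths passed to `merge_fasta_files` are all hypothetical. The sketch assumes the human fasta already uses GRCh37 names and that prefixing every mouse contig is enough to keep it out of the human-only intervals.

```python
# Sketch only: concatenate a human and a mouse fasta, prefixing mouse contig
# names so they cannot collide with GRCh37 names such as "1" or "MT".

def contig_name(header):
    """Return the contig name from a fasta header line (text after '>')."""
    parts = header[1:].split()
    return parts[0] if parts else ""


def merge_references(human_lines, mouse_lines, mouse_prefix="mm10_"):
    """Yield merged fasta lines (no trailing newlines), human contigs first.

    Raises ValueError if two contigs end up with the same name.
    """
    seen = set()
    for prefix, lines in (("", human_lines), (mouse_prefix, mouse_lines)):
        for raw in lines:
            line = raw.rstrip("\r\n")
            if not line:
                continue
            if line.startswith(">"):
                # Rename the contig by inserting the prefix after '>'.
                line = ">" + prefix + line[1:]
                name = contig_name(line)
                if name in seen:
                    raise ValueError("duplicate contig name: " + name)
                seen.add(name)
            yield line


def merge_fasta_files(human_path, mouse_path, out_path, mouse_prefix="mm10_"):
    """File-based wrapper; all three paths are supplied by the caller."""
    with open(human_path) as human, open(mouse_path) as mouse, \
            open(out_path, "w") as out:
        for line in merge_references(human, mouse, mouse_prefix):
            out.write(line + "\n")
```

The merged fasta would still need the usual indexes (for example `bwa index` and `samtools faidx`) before Roslin can use it.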
@timosong @allanbolipata this is meant as a short-term solution that uses existing Roslin infrastructure to process and deliver PDX data to investigators. In the meantime, Allan and team will work on a better process that incorporates this optional sample-specific step into CWL subworkflows, and uses AZN's https://github.com/AstraZeneca-NGS/disambiguate to create human-only BAMs for variant calling.
ckandoth changed the title from "Roslin 2.4.2 with a human+mouse merged reference on PDX data" to "Roslin 2.4.2 with a human+mouse merged reference" on Jun 18, 2019