not generating .best and .sing2 output files #69
Comments
Hi @slyahn, did you figure out what the problem was? I'm having the same issue too!

Would Fix #59 fix the memory issue?

Dear @hyunminkang, I am having the same issue as stated above. The software runs up to the step where *.single is generated, but then reports a segmentation fault. The *.sing2 and *.best files are empty. I am not sure how to debug this error; memory does not seem to be the issue (my machine has 386 GB of RAM). Do you have any idea why this would happen? Could you point us in the right direction?
Hi @boxiangliu,
The way it's coded in demuxlet is definitely not the best: it generates huge arrays that are neither necessary nor efficient. For example, there is a line that creates an array gpAB:

double* gpAB = new double[scl.nsnps * nv * nv * 9];

So in my example of 5M SNPs (nsnps) and 50 genotypes (nv), and since a double is 8 bytes, this allocates an array of 5,000,000 × 50 × 50 × 9 × 8 bytes = 900 GB. Do you have 900 GB of RAM? :D
That's why I suggested Fix #59 two years ago, which does not create the array and instead computes the values on the fly without storing them in RAM. But it was never merged into the main branch. You could try Fix #59 to see if it solves your issue.
Hope this helps. Cheers
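As a quick sanity check on that figure, here is a small, self-contained C++ snippet (an editorial illustration, not demuxlet code beyond the allocation dimensions quoted above) that evaluates the size of the gpAB allocation:

#include <cstdio>
#include <cstdint>

int main() {
    // Dimensions from the allocation quoted above: nsnps = 5M, nv = 50,
    // a factor of 9, and sizeof(double) = 8 bytes.
    const std::uint64_t nsnps = 5000000;
    const std::uint64_t nv = 50;
    const std::uint64_t bytes = nsnps * nv * nv * 9 * sizeof(double);
    // Prints 900000000000 bytes, i.e. about 900 GB.
    std::printf("gpAB would need %llu bytes (about %.0f GB)\n",
                static_cast<unsigned long long>(bytes), bytes / 1e9);
    return 0;
}

Compiled and run, it prints roughly 900 GB, matching the estimate above.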
I don't think it is a good idea to use 5M SNPs with the current implementation. I suggest using only common variants in coding regions.

Thanks,
Hyun.

Hyun Min Kang, Ph.D.
Professor of Biostatistics
University of Michigan, Ann Arbor
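As a rough, hypothetical illustration of that suggestion (the smaller site counts below are assumed for the example, not taken from the thread), the same formula shows how the gpAB footprint scales linearly with the number of variants kept:

#include <cstdio>
#include <cstdint>

int main() {
    const std::uint64_t nv = 50;  // number of genotyped samples, as in the example above
    // Assumed site counts: the full 5M panel versus smaller filtered panels.
    const std::uint64_t panels[] = {5000000, 500000, 100000};
    for (std::uint64_t nsnps : panels) {
        const std::uint64_t bytes = nsnps * nv * nv * 9 * sizeof(double);
        std::printf("nsnps = %7llu  ->  %5.0f GB\n",
                    static_cast<unsigned long long>(nsnps), bytes / 1e9);
    }
    return 0;
}

With a few hundred thousand sites and nv = 50, the allocation drops into the tens of gigabytes, which is within the RAM amounts mentioned earlier in the thread.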
Original issue description (from @slyahn):
I am processing 8 multiplexed samples through demuxlet and everything seems to run fine until the very end. Demuxlet generates the .single file but not the .best and .sing2 files. The standard output shows that it finishes processing the droplets ("Finished processing 21976 droplets total") but then reports a segmentation fault (core dumped).
I started with 60 GB of memory and went up to 180 GB, and that did not fix it. The VCF is filtered to include only biallelic SNPs, it is sorted, and the contigs match between the BAM and the VCF. I don't know what could be causing it to fail at the very end, when writing the .best file. Do you have any suggestions?
Edited to add:
I've tried downsampling the BAM to 10% of the original, and I still get the same segmentation fault with only the .single file generated, so I don't think it's a memory issue.
I should note that this experiment is essentially a simulation using real data: we combined fastq files from 8 individual runs to simulate a multiplexed run. The combined fastq was processed with Cell Ranger without error. The genotype VCF was generated by a private company that did low-pass whole-genome sequencing and imputation.