not generating .best and .sing2 output files #69
Comments
Hi @slyahn, did you figure out what the problem was? I'm having the same issue too!

Would Fix #59 fix the memory issue?

Dear @hyunminkang, I am having the same issue as stated above. The software runs up to the step where *.single is generated, but then reports a segmentation fault. The *.sing2 and *.best files are empty. I am not sure how to debug this error; memory does not seem to be the issue (my machine has 386 GB of RAM). Do you have any idea why this would happen? Could you point us in the right direction?
Hi @boxiangliu,
The way it's coded in demuxlet is definitely not the best: it generates huge arrays that are neither necessary nor efficient. For example, there is a line that creates an array gpAB:

double* gpAB = new double[scl.nsnps * nv * nv * 9];

So in my example of 5M SNPs (nsnps) and 50 genotypes (nv), and since a double is 8 bytes, this allocates an array of 5,000,000 × 50 × 50 × 9 × 8 bytes = 900 GB. Do you have 900 GB of RAM? :D
That's why I suggested Fix #59 two years ago, which does not create the array and instead computes the values on the fly without storing them in RAM. But it was never merged into the main branch. You could try Fix #59 to see if it solves your issue.
Hope this helps. Cheers
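As a quick sanity check on that figure, here is a small, self-contained C++ snippet (an editorial illustration, not demuxlet code beyond the allocation dimensions quoted above) that evaluates the size of the gpAB allocation:

#include <cstdio>
#include <cstdint>

int main() {
    // Dimensions from the allocation quoted above: nsnps = 5M, nv = 50,
    // a factor of 9, and sizeof(double) = 8 bytes.
    const std::uint64_t nsnps = 5000000;
    const std::uint64_t nv = 50;
    const std::uint64_t bytes = nsnps * nv * nv * 9 * sizeof(double);
    // Prints 900000000000 bytes, i.e. about 900 GB.
    std::printf("gpAB would need %llu bytes (about %.0f GB)\n",
                static_cast<unsigned long long>(bytes), bytes / 1e9);
    return 0;
}

Compiled and run, it prints roughly 900 GB, matching the estimate above.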
I don't think it is a good idea to use 5M SNPs with the current implementation. I suggest using only common variants in coding regions.

Thanks,
Hyun.

Hyun Min Kang, Ph.D.
Professor of Biostatistics
University of Michigan, Ann Arbor
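As a rough, hypothetical illustration of that suggestion (the smaller site counts below are assumed for the example, not taken from the thread), the same formula shows how the gpAB footprint scales linearly with the number of variants kept:

#include <cstdio>
#include <cstdint>

int main() {
    const std::uint64_t nv = 50;  // number of genotyped samples, as in the example above
    // Assumed site counts: the full 5M panel versus smaller filtered panels.
    const std::uint64_t panels[] = {5000000, 500000, 100000};
    for (std::uint64_t nsnps : panels) {
        const std::uint64_t bytes = nsnps * nv * nv * 9 * sizeof(double);
        std::printf("nsnps = %7llu  ->  %5.0f GB\n",
                    static_cast<unsigned long long>(nsnps), bytes / 1e9);
    }
    return 0;
}

With a few hundred thousand sites and nv = 50, the allocation drops into the tens of gigabytes, which is within the RAM amounts mentioned earlier in the thread.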
Original issue description (from @slyahn):
I am processing 8 multiplexed samples through demuxlet and everything seems to run fine until the very end. Demuxlet generates the .single file but not the .best and .sing2 files. The standard output shows that it finishes processing the droplets ("Finished processing 21976 droplets total") but then reports a segmentation fault (core dumped).
I started with 60 GB of memory and went up to 180 GB, and that did not fix it. The VCF is filtered to include only biallelic SNPs, it is sorted, and the contigs match between the BAM and the VCF. I don't know what could be causing it to fail at the very end, when writing the .best file. Do you have any suggestions?
Edited to add:
I've tried downsampling the BAM to 10% of the original, and I still get the same segmentation fault with only the .single file generated, so I don't think it's a memory issue.
I should note that this experiment is essentially a simulation using real data: we combined fastq files from 8 individual runs to simulate a multiplexed run. The combined fastq was processed with Cell Ranger without error. The genotype VCF was generated by a private company that did low-pass whole-genome sequencing and imputation.