Question: iCount xlsites expected runtime #191

Open
mirax87 opened this issue Mar 27, 2019 · 2 comments

Comments


mirax87 commented Mar 27, 2019

Hi,

in order to process our D. melanogaster iCLIP library, I used Snakemake to chain the iCount steps together and added benchmarking, specifically for iCount xlsites with quantification based on cDNA and reads.

Here, I am observing runtimes of ~1 - 4 days on our cluster system for iCount xlsites. The number of reads per multiplexing barcode is quite variable, which correlates with runtime.

In terms of parameters, I use

  • --group_by start
  • --mapq_th 3

together with the output GTF from iCount segment.

I wonder what, besides the total number of mapped reads, determines the runtime of iCount xlsites, and whether there are useful strategies for pre-filtering the BAM files to speed up the process without losing (too much) sensitivity.

Cheers

@JureZmrzlikar
Contributor

Hi @mirax87 !

Are you using the --segmentation input? If so, this is the main reason that iCount xlsites is taking so long. Please run it without segmentation (AFAIK, this is how most users run it). We should speed up the algorithm for the case where a segmentation is given, but we never found the time to do it properly.
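
For a wrapper script or a Snakemake rule, a call without segmentation could be assembled roughly as below. This is only a sketch: the helper build_xlsites_cmd and all file names are hypothetical, and the positional argument order (input BAM followed by three output files) is an assumption that should be checked against `iCount xlsites -h`. The options --group_by, --mapq_th and --segmentation are the ones discussed in this thread.

```python
import shlex


def build_xlsites_cmd(bam, sites_unique, sites_multi, skipped,
                      group_by="start", mapq_th=3, segmentation=None):
    """Assemble an `iCount xlsites` command line as a list of arguments.

    Assumption: positional arguments are the input BAM followed by three
    output files; verify the order with `iCount xlsites -h`.
    """
    cmd = [
        "iCount", "xlsites",
        bam, sites_unique, sites_multi, skipped,
        "--group_by", group_by,
        "--mapq_th", str(mapq_th),
    ]
    # Only pass --segmentation when explicitly requested; leaving it out
    # is the faster mode recommended in this thread.
    if segmentation is not None:
        cmd += ["--segmentation", segmentation]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_xlsites_cmd("sample.bam", "sites_unique.bed",
                        "sites_multi.bed", "skipped.bam")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True), which avoids shell quoting problems with unusual file names.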

Regarding other factors that could affect runtime:

  • group_by should have no effect on runtime.
  • A higher mapq_th means fewer (poorly mapped) reads are taken into account, so this should speed things up a bit. But if the mapping quality is sufficient, the effect should not be very significant.
  • If you have really high coverage (>10k, 100k), lowering the max_barcodes parameter can speed things up significantly, but this should be used only in such cases.
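
To illustrate what a mapping-quality threshold does, here is a minimal sketch of such a filter applied to SAM-format text: header lines are kept, and alignment records whose MAPQ (fifth column) is below the threshold are dropped. The function name and the example records are hypothetical, keeping reads with MAPQ equal to the threshold is an assumption, and this is not iCount's own implementation. On real BAM files, `samtools view -b -q 3 in.bam` applies a comparable filter.

```python
def filter_sam_by_mapq(lines, min_mapq=3):
    """Return SAM header lines plus records with MAPQ >= min_mapq."""
    kept = []
    for line in lines:
        if line.startswith("@"):
            # Header lines carry no MAPQ and are always kept.
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # Skip malformed records (SAM has 11 mandatory columns).
            continue
        if int(fields[4]) >= min_mapq:
            kept.append(line)
    return kept


# Made-up example records, for illustration only.
sam = [
    "@SQ\tSN:chr2L\tLN:23513712",
    "read1\t0\tchr2L\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t0\tchr2L\t200\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "read3\t16\tchr2L\t300\t3\t4M\t*\t0\t0\tACGT\tIIII",
]
kept = filter_sam_by_mapq(sam, min_mapq=3)
print([l.split("\t")[0] for l in kept if not l.startswith("@")])
# → ['read1', 'read3']
```

Pre-filtering this way only saves time for reads that xlsites would discard anyway at the same mapq_th, so the gain is expected to be modest when most reads map well.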

Author

mirax87 commented Mar 28, 2019

Hi @JureZmrzlikar,

you are right, I am using iCount xlsites --segmentation. I'll try without.

Thanks for the quick feedback.
Cheers
