multithreading not working #15

Open
steffenheyne opened this issue Jun 19, 2018 · 5 comments

@steffenheyne

Hi!

I use, e.g.,
cond1_fit = enrichR(treatment = cond1_bam$name, control = cond1_bam_input$name, genome = genome, countConfig = countConfigPE, procs = 10, verbose = TRUE)

but only two threads ever run at 100%, no matter what I specify for procs.

@your-highness
Owner

Dear @steffenheyne ,

During quantification of the signal (read counts) in the BAM files, the method uses at most 2 threads, i.e. one each for the treatment and the control BAM file. The rationale is the I/O bottleneck.

Only the downstream fitting and enrichment quantification procedures use the specified procs argument.

Best,

@steffenheyne
Author

OK, I see.

... but I think using more threads would still be useful, as bamsignals really profits from it, and with at least 10 threads we don't have any I/O issues on our cluster. It would be nice to leave this decision to the user.

@your-highness
Owner

your-highness commented Jun 19, 2018

Dear @steffenheyne ,

You are right! On clusters, I/O usually scales well with multithreading when the data is cached on a local disk. Unfortunately, bamsignals does not come with multithreading support :(

In fact, normR uses parallel::mcmapply() to count with bamsignals::bamProfile() for treatment and control simultaneously. Since there are only two BAM files to iterate over, this leads to your observation of two working threads:

normR/R/methods.R

Lines 131 to 143 in c5f8d1b

counts <- parallel::mcmapply(
  bamsignals::bamProfile, bampath=c(treatment, control),
  MoreArgs=list(gr=gr, binsize=countConfig@binsize,
                mapq=countConfig@mapq,
                shift=countConfig@shift,
                paired.end=getFilter(countConfig),
                tlenFilter=countConfig@tlenFilter,
                filteredFlag=countConfig@filteredFlag,
                verbose=FALSE),
  mc.cores=procs, SIMPLIFY=FALSE
)
counts[[1]] <- unlist(as.list(counts[[1]]))
counts[[2]] <- unlist(as.list(counts[[2]]))
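The two-worker limit can be reproduced without any BAM files: when there are only two elements to iterate over, at most two forked workers have anything to do, whatever value is passed as mc.cores. Below is a minimal base-R sketch of that behaviour. The vector `files` and the PID-reporting function are illustrative stand-ins, not normR code, and forking is only available on Unix-like systems (hence the single-core fallback on Windows).

```r
library(parallel)

# Two placeholder names stand in for the treatment and control BAM paths.
files <- c("treatment.bam", "control.bam")

# Forked workers are not available on Windows, so fall back to one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 10L

# Each task only reports the ID of the process that executed it.
pids <- mcmapply(
  function(path) Sys.getpid(),
  files,
  mc.cores = cores,   # ask for ten workers where forking is possible
  SIMPLIFY = TRUE
)

# Only as many workers as there are tasks can ever be busy.
n_workers <- length(unique(pids))
print(n_workers)
```

To keep more cores busy, the work has to be split into more than two tasks, for example one task per chromosome.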

Alternatively, I have routinely used a wrapper around bamsignals::bamCount that multithreads over chromosomes, and then fit enrichR directly on the resulting counts:

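# Assumes GenomicRanges is attached (for seqnames()) and bamsignals is installed.
processByChromosome <- function(bam.files, gr, mapq, procs, paired.end = "ignore") {
  # One task per chromosome, each run in its own forked worker
  x <- parallel::mclapply(
    X = as.character(unique(seqnames(gr))),
    FUN = function(chunk) {
      gr.sub <- gr[as.character(seqnames(gr)) == chunk]
      # Count reads in both BAM files for this chromosome's regions
      lapply(bam.files, bamsignals::bamCount, gr = gr.sub, mapq = mapq,
             paired.end = paired.end, verbose = FALSE)
    }, mc.cores = procs)
  # Concatenate per-chromosome counts; the result is grouped by chromosome,
  # so it matches the order of gr only if gr is sorted by chromosome
  invisible(list(
    "treatment" = unlist(lapply(x, "[[", 1)),
    "control" = unlist(lapply(x, "[[", 2))
  ))
}
counts <- processByChromosome(bam.files = c(treatment.bampath, control.bampath), gr = gr, mapq = mapq, procs = procs)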

Note that this code does not cover all the bamCountConfig parameters. I will see what I can do to add this feature to normR in the next few days.
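The chunk-by-chromosome idea itself can be tried out with base R alone. The sketch below is only an illustration under stated assumptions: the data frame `bins` and the function `count_bins()` are hypothetical stand-ins for a GRanges object and a real counting call such as bamsignals::bamCount(). It also shows one way to put the per-chromosome results back into the original bin order when the bins are not sorted by chromosome.

```r
library(parallel)

# Toy bins: one chromosome label per bin, deliberately not sorted by chromosome.
bins <- data.frame(
  chrom = c("chr1", "chr2", "chr1", "chr3", "chr2"),
  start = c(1, 1, 251, 1, 251)
)

# Hypothetical stand-in for a real counting function; returns one value per bin.
count_bins <- function(df) df$start %% 7

# Forked workers are not available on Windows, so fall back to one core there.
procs <- if (.Platform$OS.type == "windows") 1L else 2L

# One task per chromosome, so up to one worker per chromosome can be busy.
chunks <- split(bins, bins$chrom)
per_chrom <- mclapply(chunks, count_bins, mc.cores = procs)

# unsplit() restores the original bin order from the per-chromosome results.
counts <- unsplit(per_chrom, bins$chrom)
print(counts)
```

With real data the number of tasks equals the number of chromosomes, so a procs value above two can actually be used, although the largest chromosome still determines the minimum run time.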

Best,

@your-highness your-highness self-assigned this Jun 19, 2018
@your-highness your-highness added this to the 1.0 milestone Jun 19, 2018
@steffenheyne
Author

Yeah, I see, thanks!
... I somehow remembered that epicseg is quite fast at counting thanks to multithreading, but it uses exactly what you suggest, mclapply().

@your-highness
Owner

Working on this currently... It will be in the normR v1.19 Bioconductor release.
