A quick addition of multiprocessing.Pool for higher throughput by cerebis · Pull Request #2 · LaboratorioBioinformatica/vHULK

cerebis · 2021-08-26T00:30:04Z

This change is not thoroughly tested, nor is it implemented in a way that I would consider complete. I am providing it just as an example, in case a maintainer wants to fettle this into vHULK.

The intent is to remove the significant bottleneck that is Prokka annotation. Since Prokka is pretty lightweight, the viral genomes are small, and the tasks are embarrassingly parallel, it would be far faster to parallelise this step the other way around: run many single-threaded Prokka jobs at once, one per genome, rather than giving all the threads to one genome at a time.

Therefore, I have just quickly hacked in a multiprocessing.Pool and call prokka with a single thread. I have also removed a bit of the print spam and set Prokka's verbose output to /dev/null. Because I seem to love staring at progress bars, I also added tqdm.
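For anyone reading along, the pattern described above could look roughly like the sketch below. This is not the code from the PR: the helper names (`prokka_command`, `run_one`, `run_all`) and the directory layout are hypothetical, and the Prokka flags are a minimal illustrative set rather than the exact command vHULK uses. tqdm is treated as optional so the sketch still runs without it.

```python
import subprocess
from multiprocessing import Pool
from pathlib import Path

try:
    from tqdm import tqdm  # progress bar, as mentioned in the comment above
except ImportError:
    def tqdm(iterable, **kwargs):
        # Fallback: no progress bar, just pass the iterable through.
        return iterable


def prokka_command(fasta, out_root):
    """Build a single-threaded Prokka command for one genome (hypothetical helper)."""
    fasta = Path(fasta)
    return [
        "prokka",
        "--kingdom", "Viruses",
        "--cpus", "1",            # one thread per job; the Pool supplies the parallelism
        "--force",                # allow overwriting an existing output directory
        "--prefix", fasta.stem,
        "--outdir", str(Path(out_root) / fasta.stem),
        str(fasta),
    ]


def run_one(cmd):
    """Run one command with its output discarded; return (cmd, exit code)."""
    try:
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        return cmd, proc.returncode
    except FileNotFoundError:
        # The executable is not on PATH; report it like a shell would.
        return cmd, 127


def run_all(commands, workers):
    """Run all commands across a pool of workers; return the ones that failed."""
    with Pool(processes=workers) as pool:
        results = list(
            tqdm(pool.imap_unordered(run_one, commands), total=len(commands))
        )
    return [cmd for cmd, returncode in results if returncode != 0]
```

A call such as `run_all([prokka_command(f, "prokka_out") for f in fasta_files], workers=50)` would then return the list of failed commands. On platforms where multiprocessing uses the spawn start method, that call should sit under an `if __name__ == "__main__":` guard.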

Instead of what looked to be hours, it now processes my 6200+ viral genomes in under 7 minutes with 50 CPUs.

Now that the job has moved on to the hmmscan step, I can see that it would likely enjoy a similar speed-up from the same strategy.

cerebis · 2021-08-26T00:47:39Z

I've done something similar for hmmscan in commit c189d61.

On a small test set of 10 genomes, the run completed successfully.
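As a rough illustration of what the same strategy might look like for the hmmscan step (again, not the code in commit c189d61): each worker runs hmmscan with `--cpu 1`, sends the human-readable report to the null device, and keeps only the tabular output. The helper names, the `.tbl` naming scheme, and the assumption that the HMM database has already been prepared with `hmmpress` are all mine.

```python
import os
import subprocess
from functools import partial
from multiprocessing import Pool
from pathlib import Path


def hmmscan_command(faa, hmm_db, out_dir):
    """Build a single-threaded hmmscan command for one protein FASTA (hypothetical helper)."""
    faa = Path(faa)
    tbl = Path(out_dir) / (faa.stem + ".tbl")
    return [
        "hmmscan",
        "--cpu", "1",          # one worker thread per job; the Pool does the fan-out
        "--noali",             # skip alignment output
        "-o", os.devnull,      # discard the main report
        "--tblout", str(tbl),  # keep the per-target table
        str(hmm_db),
        str(faa),
    ]


def scan_one(faa, hmm_db, out_dir):
    """Run hmmscan on one file; return (input path, exit code)."""
    cmd = hmmscan_command(faa, hmm_db, out_dir)
    try:
        returncode = subprocess.run(cmd, stderr=subprocess.DEVNULL).returncode
    except FileNotFoundError:
        returncode = 127  # hmmscan is not on PATH
    return str(faa), returncode


def scan_all(faa_files, hmm_db, out_dir, workers):
    """Scan every file in parallel; return a dict of input path -> exit code."""
    worker = partial(scan_one, hmm_db=hmm_db, out_dir=out_dir)
    with Pool(processes=workers) as pool:
        return dict(pool.imap_unordered(worker, faa_files))
```

Returning the exit code per input makes it easy to spot and rerun only the genomes whose scan failed.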

cerebis added 2 commits August 26, 2021 10:22

quick addition of multiprocessing.Pool for higher throughput

8813cfe

added multiprcesssing pool to hmmscan step

c189d61

cerebis wants to merge 2 commits into LaboratorioBioinformatica:master from cerebis:faster_prokka