Description
Hi,
I am posting here to get your opinion before diving into the process.
Previously I had 10 samples to annotate, so I used Funannotate v1.8.17 the conventional way: I installed it via conda, along with all the required licensed programs, and it worked. However, some steps took a very long time, specifically those related to PASA.
Now I have to annotate around 115 more genomes, all of them different. If the complete pipeline takes 4 days per genome, the whole job will take far too long. Even on an HPC cluster where I can book up to 10 nodes, it will still take a lot of time.
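To put a rough number on it, here is a back-of-the-envelope sketch. It assumes every genome takes the full 4 days, that each node works on one genome at a time, and that all 10 nodes are available for the whole period; the function name is just for illustration, and the real numbers will depend on the cluster.

```python
import math


def estimate_wall_days(n_genomes: int, days_per_genome: float, n_nodes: int) -> float:
    """Rough wall-clock estimate, assuming one genome per node at a time."""
    # Genomes are processed in batches of n_nodes; the last batch may be partial.
    batches = math.ceil(n_genomes / n_nodes)
    return batches * days_per_genome


if __name__ == "__main__":
    # Numbers from this post: 115 genomes, about 4 days each, up to 10 nodes.
    print("one node :", estimate_wall_days(115, 4, 1), "days")
    print("ten nodes:", estimate_wall_days(115, 4, 10), "days")
```

So even with all 10 nodes booked, this is well over a month of wall-clock time unless the slow steps get faster.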
Things I am thinking of doing:
- During funannotate train, raw read normalization took around 4 hours for me. As far as I can tell, those files are generated without using the genome FASTA at all, so my guess is that I can reuse them (Funannotate does accept pre-normalized files). Question: would this be a good thing to do, or could it introduce any bias?
- Secondly, the most time-consuming step is PASA, as it uses SQLite, which is single-threaded. The Funannotate documentation does suggest a way to use MySQL so that PASA can work on multiple threads, but MySQL becomes problematic on HPC due to security restrictions.
To get around that, I was thinking of creating my own Docker/Singularity container in which I install the complete Funannotate and set it up to use MySQL instead of SQLite. I am not sure whether the pre-built container from the original developer is based on MySQL.
I just need your thoughts: would this be a good approach, and is it possible to do? My guess is that I would have to install all tools and dependencies manually, without conda.
Or is there already a container set up with MySQL available?
Your input/thoughts will be helpful.
Regards