Neighborhood based paralog splitting does not finish #2

Open
marade opened this issue Jan 16, 2020 · 2 comments

Comments

@marade

marade commented Jan 16, 2020

For ~200 bacterial genomes of ~6 Mb each, the neighborhood-based paralog splitting step alone is taking over 24 hours on a c5.2xlarge EC2 instance, while the previous steps finished in a timely fashion. Notably, CPU usage for the entire period is very low (less than 1%), while memory usage remains fairly constant at 40%, which suggests the step is waiting on something other than the CPU.

@zheminzhou
Owner

Hi, thank you for the report. This is certainly much slower than in my tests. Based on your description, the bottleneck is most likely in the I/O.

PEPPA writes and reads a lot of data on the file system. This was not an issue in my tests, even when I used a mounted network drive, but I have not tested it on an AWS instance yet. I have updated PEPPA slightly to optimize its I/O performance. However, please do not expect too much.
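
One way to check the I/O hypothesis independently of PEPPA is to time a plain write and read in the directory the run uses. The following is a minimal sketch, not part of PEPPA: the function name `probe_disk_throughput` and its parameters are made up for illustration, and the read figure may be inflated because a file that was just written is often served from the page cache.

```python
import os
import tempfile
import time


def probe_disk_throughput(directory, size_mb=64, chunk_mb=4):
    """Write then read a scratch file in `directory`; return MB/s for each.

    Hypothetical helper for illustration only, not a PEPPA function.
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    n_chunks = max(1, size_mb // chunk_mb)
    total_mb = n_chunks * chunk_mb

    fd, path = tempfile.mkstemp(dir=directory, suffix=".ioprobe")
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(n_chunks):
                handle.write(chunk)
            handle.flush()
            # Force the data to the device rather than leaving it in the page cache.
            os.fsync(handle.fileno())
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        bytes_read = 0
        with open(path, "rb") as handle:
            while True:
                block = handle.read(len(chunk))
                if not block:
                    break
                bytes_read += len(block)
        read_s = time.perf_counter() - start
    finally:
        # Always clean up the scratch file, even if the probe fails.
        os.remove(path)

    return {
        "write_mb_s": total_mb / max(write_s, 1e-9),
        "read_mb_s": total_mb / max(read_s, 1e-9),
        "bytes_read": bytes_read,
    }


if __name__ == "__main__":
    print(probe_disk_throughput(".", size_mb=64))
```

Running it once on the volume used for the analysis and once on a local scratch directory gives a rough comparison; a very low write rate on the analysis volume would be consistent with an I/O bottleneck.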

@marade
Author

marade commented Jan 16, 2020

Thanks, I appreciate the prompt support. Perhaps you could add some sort of debugging capability so that the issue can be isolated? I'm not eager to run something for hours and not get an answer.
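
As an illustration of what such a debugging capability could look like, here is a minimal sketch that records wall-clock time and CPU time per step. It is not PEPPA code: `timed_step` and the `report` dictionary are hypothetical names, and the CPU figure only includes child processes that have already been waited for.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def timed_step(name, report):
    """Record wall-clock and CPU seconds for the enclosed block in `report`.

    Hypothetical helper for illustration only, not a PEPPA function.
    """
    wall_start = time.perf_counter()
    # user + system time of this process and of waited-for child processes
    cpu_start = sum(os.times()[:4])
    try:
        yield
    finally:
        wall = time.perf_counter() - wall_start
        cpu = sum(os.times()[:4]) - cpu_start
        report[name] = {
            "wall_s": wall,
            "cpu_s": cpu,
            "cpu_fraction": cpu / wall if wall > 0 else 0.0,
        }


if __name__ == "__main__":
    report = {}
    with timed_step("sleep_only", report):
        time.sleep(0.2)  # stands in for a step that waits instead of computing
    print(report)
```

A step whose `cpu_fraction` stays near zero over a long wall-clock time is waiting rather than computing, which matches the symptom reported above; values above 1 are possible when several cores are busy.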
