CAFE5 run is taking forever #100
-
I've been running CAFE5 on around 12,000 orthogroups from 4 species, using 32 CPUs. It has been a week and it is still searching for lambda parameters. I haven't changed any lambda or gamma settings, because I wanted to see a vanilla CAFE run first. Is that why it is taking so long? Or is there a way to estimate the running time? The docs say 200 lambda values will be run for 10,000 orthogroups, but I have only reached about 70 lambda values in a week. Here is how I ran the code: Thanks in advance!
Replies: 8 comments 9 replies
-
No, it shouldn't take a week. It's clearly locked up somewhere, likely in a thread race condition. Try it with just 8 threads to minimize that possibility. In that case I would allow an hour per 1,000 groups, no more.
-
I see, but that didn't help in my case. I suspect my tree file, since I used the direct output from OrthoFinder without making it ultrametric. I suppose CAFE5 can handle that, right? My tree file looks like this:
Or should I fix the lambda value to 1 with the -l parameter?
-
Your tree should be ultrametric, though the program will not throw an error if it's not.
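Since the program will not complain about a non-ultrametric tree, it may help to check the tree yourself before starting a long run. Below is a minimal sketch, not part of CAFE5, that parses a Newick string in plain Python and compares the root-to-tip distance of every tip. The function names `tip_depths` and `is_ultrametric`, the tolerance, and the example trees are all assumptions made for illustration; the parser expects every tip to carry a branch length and does not handle quoted labels or bracketed comments.

```python
def tip_depths(newick):
    """Return {tip_name: root-to-tip distance} for a Newick string."""
    # Drop whitespace and the trailing semicolon so indexing stays simple.
    s = "".join(newick.split()).rstrip(";")
    pos = 0

    def read_label_and_length():
        """Read an optional label and an optional ':length' at the cursor."""
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        label = s[start:pos]
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return label, length

    def parse_node():
        """Return (tip distances below this node, this node's branch length)."""
        nonlocal pos
        tips = {}
        if pos < len(s) and s[pos] == "(":
            pos += 1  # skip "("
            while True:
                child_tips, child_length = parse_node()
                for name, dist in child_tips.items():
                    tips[name] = dist + child_length
                if s[pos] == ",":
                    pos += 1
                    continue
                pos += 1  # skip ")"
                break
            # Internal nodes may carry a label or support value; it is ignored.
            _, length = read_label_and_length()
        else:
            label, length = read_label_and_length()
            tips[label] = 0.0
        return tips, length

    tips, _ = parse_node()
    return tips


def is_ultrametric(newick, rel_tol=1e-6):
    """True if all root-to-tip distances agree within a relative tolerance."""
    depths = list(tip_depths(newick).values())
    return max(depths) - min(depths) <= rel_tol * max(depths)


if __name__ == "__main__":
    # Hypothetical example trees, not taken from this thread.
    print(is_ultrametric("((A:1,B:1):2,C:3);"))
    print(is_ultrametric("((A:1,B:2):2,C:3);"))
```

If the check fails, the tree needs to be converted to an ultrametric one with a separate tool before it is passed to CAFE5.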
-
Okay, I've made an ultrametric tree and used only 8 cores, but CAFE has still been running for 17 hours with 14,000 orthogroups. Now my ultrametric tree looks like this:
-
Roughly how many lambdas has it generated, and how different are the lambdas near the end? If they aren't that different, there are parameters that would make it stop without trying to reach the high precision that is the default. Alternatively, if you're happy with the lambdas it's producing, you can simply generate a reconstruction by specifying the lambda on the command line.
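One rough way to answer "how different are the lambdas near the end" is to look at the relative spread of the last few values. The sketch below assumes you have already copied the lambda values from the CAFE5 output into a Python list by hand; the function name `relative_spread`, the window size, and the numbers in the example are illustrative assumptions, not CAFE5 output.

```python
def relative_spread(values, last_n=10):
    """(max - min) / |mean| over the last `last_n` values.

    A small result suggests the search has settled near one value;
    a large result suggests it is still wandering.
    """
    tail = values[-last_n:]
    if not tail:
        raise ValueError("no values given")
    mean = sum(tail) / len(tail)
    if mean == 0:
        raise ValueError("mean of the tail is zero; spread is undefined")
    return (max(tail) - min(tail)) / abs(mean)


if __name__ == "__main__":
    # Hypothetical lambda values, made up for illustration only.
    lambdas = [0.5, 0.1, 0.0101, 0.0100, 0.0099]
    print(relative_spread(lambdas, last_n=3))
```

What counts as "small enough" is a judgment call that depends on how much precision you need in lambda.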
-
So far it has generated around 45 lambda values and doesn't seem to be approaching any particular value. Below are the lambda values near the end: Why do you think that is the case? I have one species with a highly duplicated genome, and that might be the reason. The CAFE tutorial mentions that filtering needs to be performed by a script: "Gene families that have large gene copy number variance can cause parameter estimates to be non-informative. You can remove gene families with large variance from your dataset, but we found that putting aside the gene families in which one or more species have ≥ 100 gene copies does the trick." Is this filtering step already included in CAFE5, or do I have to run the script beforehand? Also, I'm not sure which lambda value I should choose. Isn't that supposed to be calculated by the software? Or can I just choose a lambda value by trial and error?
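The rule quoted from the tutorial (set aside families in which one or more species have ≥ 100 gene copies) can be sketched in a few lines. This is not the tutorial's own script. It assumes a tab-delimited count table with a header row, two leading identifier columns, and one integer count column per species; the function name `split_by_max_count`, the column layout, and the example rows are assumptions to adapt to your own file.

```python
import csv
import io


def split_by_max_count(table_text, threshold=100, n_id_columns=2):
    """Split a tab-delimited gene count table into (kept, set_aside) rows.

    A family is set aside when any species has >= `threshold` copies.
    Both returned lists start with the header row.
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    kept, set_aside = [header], [header]
    for row in reader:
        if not row:
            continue  # skip blank lines
        counts = [int(value) for value in row[n_id_columns:]]
        if max(counts) >= threshold:
            set_aside.append(row)
        else:
            kept.append(row)
    return kept, set_aside


if __name__ == "__main__":
    # Hypothetical table; the species names and counts are made up.
    example = (
        "Desc\tFamily ID\tsp1\tsp2\tsp3\tsp4\n"
        "(null)\tOG0000001\t3\t250\t4\t2\n"
        "(null)\tOG0000002\t1\t2\t1\t1\n"
        "(null)\tOG0000003\t99\t5\t7\t100\n"
    )
    kept, set_aside = split_by_max_count(example)
    print(len(kept) - 1, "families kept;", len(set_aside) - 1, "set aside")
```

The kept rows can then be written back out with `csv.writer(..., delimiter="\t")` and used as the input table, while the set-aside families are handled separately.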
-
Yes, the fact that you're getting "inf" likelihoods from the very beginning means that no likelihood can be calculated. You may have to filter your data further before running CAFE.
-
I have the same problem. My run took a week and was still going.