-
Thanks for the tool. I have a tree with 350 species and family sizes up to 100'000. When I run cafe5, I quickly run out of RAM. I started removing the largest families and found that a family size of about 22'000 needs only around 450 MB of RAM. With a family of size 42'000, it already needs about 13 GB. The largest, with 93'000 members, would need more than 120 GB; no clue how much. The scaling looks odd to me... Is this behaving correctly? In any case, it seems it would run for a very long time anyway. What tree and family sizes are still reasonable in your opinion? Best, Marc
-
I don't see how those could actually be gene families. I'm not aware of a species that even has 35,000 genes. Are you using gene families in some different sense? CAFE uses matrices for its calculations that are of size max_family_size^2.
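For a rough sense of the quadratic growth described here, a minimal back-of-the-envelope sketch; the single dense double-precision matrix is an assumption for illustration only, not CAFE5's actual allocation pattern:

```python
# Rough estimate of one dense max_family_size x max_family_size matrix
# of 8-byte doubles. Illustrates quadratic growth only; the single dense
# matrix is an assumption, not CAFE5's actual allocation pattern.
for n in (100, 22_000, 42_000, 93_000):
    gb = n * n * 8 / 1e9
    print(f"max_family_size={n:>6,}: ~{gb:.2f} GB per matrix")
```

A max value near 100 is negligible on this estimate, while values in the tens of thousands quickly reach tens of gigabytes.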
-
Hi Ben, thanks for the swift answer. Well, it's 35'000 genes in 350 species, so just 100 per species. Receptor-like proteins (a trans-membrane domain with another domain fused to it, for example LRRs). With "120 GB, no clue how much" I meant that my 128 GB were not enough to even start calculating; it crashed after filling up everything. OK, if you have max_family_size^2, doubling the family size should just quadruple the memory requirements. However, when I roughly double the size from 20 k to 40 k, I get a ~30-fold increase from 400 MB to 12 GB (and doubling it again does not result in 48 GB but crashes at 128 GB). That's why I was wondering whether there's a mistake in the scaling at this size. On the other hand, it doesn't really matter, because it would probably run way too long anyway. Reducing the number of species would probably be smarter :) Best, Marc
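For reference, a quick check of the ratios reported above, assuming pure max_family_size^2 scaling (the figures are the approximate values from this thread):

```python
# Under pure max_family_size^2 scaling, memory should grow with the
# square of the size ratio. Approximate figures as reported above.
n1, mem1 = 20_000, 0.4   # ~400 MB
n2, mem2 = 40_000, 12.0  # ~12 GB
expected = (n2 / n1) ** 2  # quadratic prediction: 4x
observed = mem2 / mem1     # reported: ~30x
print(f"expected ~{expected:.0f}x, observed ~{observed:.0f}x")
```

The observed factor is roughly an order of magnitude above the quadratic prediction, which is what prompts the question about scaling.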
-
The max_family_size I referred to is the single largest value in the families file; those values are never added together for any purpose. If your max family size is around 100, that should not take up much memory at all. So perhaps it is just the number of species that is taking up the memory.
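If it helps to double-check that value, it can be read straight from the counts file. A minimal sketch, assuming a tab-separated CAFE-style family file with a description column, a family-ID column, and one count column per species (the column layout here is an assumption):

```python
import csv

# Find the single largest per-species count in a CAFE-style family file.
# Assumed layout: header row, then desc, family ID, one count per species.
def max_family_size(path: str) -> int:
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        next(reader)  # skip the header row
        return max(int(c) for row in reader for c in row[2:])

# Hypothetical usage:
# print(max_family_size("families.txt"))
```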
-
I see; based on the single largest value in the family file, I would have expected around a six-fold increase. Thanks, will try with fewer species :)