Aves use case scalability challenge #8

Open
nfranz opened this issue Dec 23, 2015 · 0 comments

nfranz commented Dec 23, 2015

For the current Aves use case, we have single, working input datasets that extend from the root (Class) to the Order level, and also to the Family level. However, at present we seemingly cannot scale to the species level with a single input file using the Euler/X default reasoners. We therefore need to partition the root-to-species level file into two complementary datasets, each of which is consistent and "solvable", provisionally called:

(1) 2015-Pala_Neoa_Grade_Species_Complete.txt and
(2) 2015-Acci_Aust_Clade_Species_Complete.txt

Originally there were three species-level partitions (each of these also completes well):

(A) 2015-Pala_Gall_Grade-Species-Complete.txt => 6 kb
(B) 2015-Neoaves-Part-Species-Complete.txt => 22 kb
(C) 2015-Acci_Aust_Clade-Species-Complete.txt => 23 kb

(2) and (C) above are identical.
(1) above is a merge of (A) and (B), with 174 x 409 (= 71,166) MIR. Running (1) on my laptop with "euler2 align" took 10.5 hours but was successful. However, running a merge of (B) and (C) above - called..

(3) 2015-Neoaves-All-Species-Cannot-Process.txt

..produced an "inconsistent/repair" output, I believe also after more than 8-10 hours (overnight). Assuming that the (3) merge is actually consistent (it should be), this might mean that our scalability limits currently lie in the complexity range between (1) and (3).
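As a rough sanity check on the size of (1), the figures quoted above are consistent with one MIR per pair of concepts drawn from the two input trees, since 174 x 409 = 71,166. The sketch below only reproduces that arithmetic. The function name `mir_count` is hypothetical, and reading 174 and 409 as the concept counts of the two trees is an assumption. The sizes of (3) are not stated here, so they are not computed.

```python
def mir_count(concepts_t1: int, concepts_t2: int) -> int:
    """Return the number of cross-tree concept pairs.

    Assumption: there is exactly one MIR per pair made of one concept
    from the first input tree and one concept from the second.
    """
    return concepts_t1 * concepts_t2


# Figures quoted for dataset (1): 174 x 409 and 71,166 MIR.
pairs_in_1 = mir_count(174, 409)
print(pairs_in_1)  # 71166
```

Because the count is a product, adding concepts to either tree multiplies the number of pairs the reasoner has to resolve, which may be part of why the (B) + (C) merge behaves so differently from (1).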

The aforementioned input files, and the output of the successful 10.5 hour run of (1), are in the following Dropbox folder:

Dropbox/Euler-Runs/BirdPhylogenies/Scalability-Challenge

Issues:

(i) Can others replicate these results?
(ii) Can we overcome the challenge of scaling to the level of complexity of (3), either with conventional or with custom reasoners?
(iii) Note that "no coverage" is used 85 times in (3) to account for differential species-level sampling across the two input trees.
