More info in the final table of text output #3
Hello Nojhan, our output in the console is focused on supervising the execution and has not been optimized for parsing, as you might have noticed. To provide more information about the experiments, configurations, etc., we save an Rdata object that stores all the information. By default the file is named irace.Rdata. Do you want just the training data mean, or are you perhaps interested in test data? You can also provide a set of test instances, so that irace executes the best configurations on those at the end of the run. When that option is active, the test performance matrix is printed to the standard output. Assuming that you are interested in training data and that you can execute Rscript in your pipeline, you can use these lines:
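A minimal sketch, assuming the default layout of irace.Rdata, in which loading the file creates an `iraceResults` object (the field names below come from that layout):

```r
# Load the irace log; this creates the `iraceResults` object.
load("irace.Rdata")

# ID of the elite configuration of the last iteration (the final best).
best.id <- iraceResults$iterationElites[length(iraceResults$iterationElites)]

# The experiments matrix has one row per instance run and one column per
# configuration; columns are named by configuration ID.
costs <- iraceResults$experiments[, as.character(best.id)]

cat("Mean training cost:", mean(costs, na.rm = TRUE), "\n")
```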
Note that this line requires you to provide the right Rdata path to the load function. You can also do more complex analyses if you are willing to spend some time preparing an Rscript.
If it helps, we could also add a helper command-line program to extract from irace.Rdata and dump into CSV files whatever information you think could be useful. As Leslie says, the standard output of irace is not really designed to be parsed, and it is already too verbose (which probably slows irace down in time-sensitive scenarios). Printing a possibly huge table would only make things worse. It would be better to dump the needed information on request, given the log file. The log file contains much more information than we will ever be able to print to standard output. There are scenarios where irace makes hundreds of thousands of runs!
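As a rough idea, such a helper could look like this sketch (it does not exist yet; the script name, the defaults, and the `iraceResults` fields it reads are assumptions based on the default log layout):

```r
#!/usr/bin/env Rscript
# Hypothetical extractor: dump the experiments matrix of an irace log to CSV.
# Usage: Rscript extract-irace.R [logfile] [output.csv]
args <- commandArgs(trailingOnly = TRUE)
logfile <- if (length(args) >= 1) args[1] else "irace.Rdata"
outfile <- if (length(args) >= 2) args[2] else "experiments.csv"
load(logfile)  # creates `iraceResults`
# One row per instance run, one column per configuration ID.
write.csv(iraceResults$experiments, file = outfile, row.names = TRUE)
```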
Thanks for the detailed (and fast) answers! I'm aiming at using irace as a substep of a much larger pipeline, hence I'm looking to maximize performance. I'm targeting large-scale budgets for irace (hundreds of thousands of target runs), with several irace runs and a large number of irace processes running in parallel (possibly on the same computer). Usually, an approach that works well in such a setting is to stay close to classic POSIX KISS CLI interfaces:
I feel that a good compromise in the irace case would be:
Let's say that if I ever need to do something fancier (like using a robust estimator instead of the mean), then it would be worth spawning R on irace.Rdata anyway. In that case, yes, I guess having some example scripts showing how to extract basic data in the (already impressive) docs would be a good addition. I think this would not break the existing interface, while easing large-scale use of irace.
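For example, a robust summary could be computed with a couple of lines like these (a sketch, again assuming the default `iraceResults` layout, not something irace provides):

```r
# Median and trimmed mean (robust) instead of the plain mean over the
# final elite's training costs.
load("irace.Rdata")
best.id <- iraceResults$iterationElites[length(iraceResults$iterationElites)]
costs <- iraceResults$experiments[, as.character(best.id)]
cat("Median training cost:", median(costs, na.rm = TRUE), "\n")
cat("10% trimmed mean:", mean(costs, trim = 0.1, na.rm = TRUE), "\n")
```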
If the code is contributed in the right way, I don't think there is anything in principle against merging it. However, I would suggest asking us early to take a look, as the current codebase is a bit of a mess of styles, but we want to move closer to https://style.tidyverse.org/ We also have several private branches with ongoing work, so it would be better to merge work in chunks rather than having one big merge.
I would say anything that's easily parsed with Python's
That's a fairly good point. However, I don't actually want to test whether irace generalizes well; I want to test whether a higher abstraction (embedding irace) generalizes well. In that sense, I was planning to do cross-validation only at the upper level, and to keep all of irace within the learning bucket. If I start splitting the learning data at irace's level as well, I have the feeling it will be more difficult to track down where generalization is leaking. I'll give it more thought anyway, thanks for the reminder.
I think it's definitely OK to let the user decide for herself, as long as she has good information and warnings.
I honestly don't know when I'll have time to do it, but it's now on my TODO list anyway.
Fork from master, please.
There is now a
I'm using irace as part of a larger pipeline and I need to automatically parse the performance of the selected configurations.
Unfortunately, only the average of the best-so-far configuration is available in the text output.
Moreover, it is not displayed in the final table, which would be the most expected location, but has to be parsed out of the log text.
As a newcomer, I would have expected the following behavior:

- `stderr` for logs and `stdout` for the final table, so that I can easily redirect and parse raw data (without having to grep/tail the whole stream),
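A sketch of that stream separation in R terms (the table columns here are invented for the example):

```r
# Human-oriented progress goes to stderr; the machine-readable final
# table goes to stdout, so `irace ... > results.csv` captures only data.
message("iteration 1 of 5 done, best so far: 42")  # message() writes to stderr
cat("configuration,mean_cost\n")                   # cat() writes to stdout
cat("42,12.34\n")
```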