
More info in the final table of text output #3

Open
nojhan opened this issue Jan 29, 2021 · 7 comments
Labels
enhancement New feature or request waiting for info The reporter must provide additional information

Comments


nojhan commented Jan 29, 2021

I'm using irace as a part of a larger pipeline and I need to automatically parse the performances of the selected configurations.
Unfortunately, only the average performance of the best-so-far configuration is available in the text output.
Moreover, it is not displayed in the final table, which would be the most expected location, but has to be parsed out of the log text.

As a newcomer, I would have expected the following behavior:

  • separate text outputs: stderr for logs and stdout for the final table, so that I can easily redirect and parse the raw data (without having to grep/tail the whole stream),
  • a row of the final table containing the average performance of each returned configuration,
  • even better: all the runs, with the corresponding performance values of the selected configurations (I can compute the average myself, and it would allow more detailed statistics).
leslieperez (Collaborator) commented

Hello Nojhan,

Our output in the console is focused on supervising the execution and has not been optimized for parsing, as you might have noticed. To provide more information about the experiments, configurations, etc., we save an Rdata object that stores all the information. By default the file is named irace.Rdata.

Do you want just the training data mean, or are you also interested in test data? You can also provide a set of test instances so that irace executes the best configurations on them at the end of the execution. When that option is active, the test performance matrix is printed to the standard output.
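
For reference, testing is configured in the scenario file; something like the following should work (option names as I recall them from the user guide, so please double-check them for your irace version):

  # in scenario.txt: evaluate the final elite configuration(s) on a separate test set
  testInstancesDir = "./test-instances"
  testNbElites = 1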

Assuming that you are interested in training data and that you can execute Rscript in your pipeline, you can use these lines:

  • To get the full performance table of elite configurations:
    Rscript -e "load('irace.Rdata');id_elites<-iraceResults\$allElites[[length(iraceResults\$allElites)]];iraceResults\$experiments[,id_elites]"

  • To get the mean of the performance of the elite configurations:
    Rscript -e "load('irace.Rdata');id_elites<-iraceResults\$allElites[[length(iraceResults\$allElites)]];colMeans(iraceResults\$experiments[,id_elites])"

  • If you want only the best configuration:
    Rscript -e "load('irace.Rdata');id<-iraceResults\$iterationElites[length(iraceResults\$iterationElites)];mean(iraceResults\$experiments[,id])"

Note that these lines require you to provide the right Rdata path to the load function.
The elite configurations are ordered, so the best one is always the first.
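
If you prefer a file that you can later parse with standard text tools, the same idea works with write.csv (a sketch along the lines above; the output file name is just an example, and you still need to adjust the Rdata path):

  Rscript -e "load('irace.Rdata');id_elites<-iraceResults\$allElites[[length(iraceResults\$allElites)]];write.csv(iraceResults\$experiments[,id_elites,drop=FALSE],'elite_performance.csv')"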

You can also do more complex analyses if you are willing to spend some time writing an R script.
There is actually a lot of information you can squeeze from the configuration data ;-)
We can give you some pointers about where to find the data inside the Rdata object;
feel free to contact us.

MLopez-Ibanez (Owner) commented

If it helps, we could also add a helper command-line program to extract data from irace.Rdata and dump whatever information you think could be useful into CSV files.
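
For example, the helper could be little more than a short Rscript (just a sketch of the idea, not existing irace functionality; the arguments and output file are placeholders):

  #!/usr/bin/env Rscript
  # sketch: dump the performance matrix of the final elites from an irace log file to CSV
  # usage: ./dump-elites.R irace.Rdata elites.csv
  args <- commandArgs(trailingOnly = TRUE)
  load(args[1])                                  # creates the iraceResults object
  elites <- iraceResults$allElites[[length(iraceResults$allElites)]]
  write.csv(iraceResults$experiments[, elites, drop = FALSE], args[2])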

As Leslie says, the standard output of irace is not really designed to be parsed and it is already too verbose (which probably slows irace down in time-sensitive scenarios). Printing a possibly huge table would only make things worse. It would be better to dump the information needed on request, given the log file. The log file contains much more information than we will ever be able to print to standard output. There are scenarios where irace performs hundreds of thousands of runs!


nojhan commented Jan 31, 2021

Thanks for the detailed (and fast) answers!

I'm aiming to use irace as a substep of a much larger pipeline, hence I'm looking to maximize performance.
In my setting, I don't really want to look at the detailed results from irace (there are too many irace runs), I really just want to get back the best configuration and its estimated performance (I just trust irace). Ideally, I would have a guarantee that the final performance has been estimated on a given number of runs.

I'm targeting large-scale budgets for irace (hundreds of thousands of target runs), with several irace runs and a large number of irace processes running in parallel (possibly on the same computer).
To gain some speed, I usually start by removing any unnecessary disk access, which is classically a bottleneck (and think of the disk space, with so many log files), and by reducing the need to spawn slow software.
Hence, I was looking to avoid large data dumps or calls to R processes.

Usually, an approach that works well in such a setting is to stay close to the classic POSIX KISS CLI interface:

  • Input/output on the standard streams (command-line arguments, standard input, standard/error outputs), not in files.
  • Log output on stderr, data output on stdout (so as to be able to redirect the log to /dev/null and pipe the output to further parsing, without slow intermediates; see the sketch after this list).
  • Ability to (almost) completely disable logs (a "quiet" mode).
  • Data output easily parsed with POSIX text tools (which are damn fast).
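
For instance (a purely hypothetical invocation, since irace does not currently put data on stdout), the whole extraction step would reduce to something like:

  # log discarded, CSV data on stdout piped straight into standard tools
  irace --scenario scenario.txt 2>/dev/null | cut -d',' -f1,2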

I feel that a good compromise in the irace case would be:

  1. Add a switch to disable the text log output (the Rdata having already been taken care of ;).
  2. Output the text log on stderr.
  3. Output (at least) a parsable final best configuration (along with its estimated performance) on stdout.
  4. I have the feeling that there is already a way to ensure that the final configuration has been evaluated on at least a given number of runs?

Let's say that if I ever need to do something fancier (like using a robust estimator instead of the mean), then it would be worth spawning R on irace.Rdata anyway. In that case, yes, I guess having some example scripts in the (already impressive) documentation showing how to extract basic data would be a good addition.

I think this would not break the existing interface, while easing large scale use of irace.
Of course, I can write some of the code, if you feel it can eventually be merged.

MLopez-Ibanez (Owner) commented

  • Having a --quiet mode would be welcome.
  • Separating the log output from the data output would also be welcome, but one has to be careful with the output of R itself. We started adding convenience functions such as irace.error() and irace.warning(), and probably using more of such functions would help with this (see the sketch after this list).
  • We need to decide what "parsable" means. Another reason to NOT output the performance of the found configuration is that we don't want users to use this metric in their papers. We want to encourage a separation between training and testing. Would it be OK to add some text warning about this to the data output, just as we now explain the printing order?
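
To illustrate the point about R's own streams (this is not irace code, just a minimal sketch of the underlying R behaviour): message() and warning() already write to stderr, while cat() and print() write to stdout, so routing all logging through such helpers keeps stdout free for data:

  # minimal sketch, not irace code
  log_msg  <- function(...) message(...)                               # goes to stderr
  emit_csv <- function(d) write.csv(d, stdout(), row.names = FALSE)    # goes to stdout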

If the code is contributed in the right way, I don't think there is anything in principle against merging it. However, I would suggest asking us early to take a look, as the current codebase is a bit of a mess of styles, but we want to move closer to https://style.tidyverse.org/

We also have several private branches with ongoing work, so it would be better to merge the work in chunks rather than having one big merge.


nojhan commented Jan 31, 2021

We need to decide what "parsable" means.

I would say anything that's easily parsed with Python's line.split() or GNU's cut.
IMHO the best option is simple CSV tabular data.
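
Just to make it concrete, something with the following shape (entirely made-up column names and values, only to illustrate the format):

  configuration_id,mean_cost,n_runs
  42,1234.5,25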

Another reason to NOT output the performance of the found configuration is that we don't want users to use this metric in their papers. We want to encourage a separation between training and testing.

That's a fairly good point. However, I don't actually want to test whether irace generalizes well, I want to test whether a higher abstraction (embedding irace) generalizes well. In that sense, I was planning to do cross-validation only at the upper level, and keep all of irace within the learning bucket. If I start splitting the learning data at irace's level as well, I have the feeling it will be more difficult to track down where generalization is leaking.

I'll give it more thought anyway, thanks for the reminder.

Would it be OK to add some text warning about this to the data output like we now explain the printing order?

I think it's definitely OK to let the user decide for herself, as long as she has quality information and warnings.

If the code is contributed in the right way, I don't think there is anything in principle against merging it. However, I would suggest asking us early to take a look, as the current codebase is a bit of a mess of styles […]

I honestly don't know when I'll have time to do it, but it's now on my TODO list anyway.
As for the code review, it's kind of mandatory for a PR, so it should not be a problem.
My only question is which branch I should fork from.

MLopez-Ibanez (Owner) commented

Fork from master please.

@MLopez-Ibanez MLopez-Ibanez added the enhancement New feature or request label Feb 16, 2021
@MLopez-Ibanez MLopez-Ibanez added the waiting for info The reporter must provide additional information label Apr 30, 2022
MLopez-Ibanez (Owner) commented

There is now a --quiet option that disables all output.
