
More info in the final table of text output #3

Open
nojhan opened this issue Jan 29, 2021 · 7 comments
Labels
enhancement New feature or request waiting for info The reporter must provide additional information

Comments


nojhan commented Jan 29, 2021

I'm using irace as a part of a larger pipeline and I need to automatically parse the performances of the selected configurations.
Unfortunately, only the average performance of the best-so-far configuration is available in the text output.
Moreover, it is not displayed in the final table, which would be the most expected location, but has to be parsed out of the log text.

As a newcomer, I would have expected the following behavior:

  • separate text outputs: stderr for logs and stdout for the final table, so that I can easily redirect and parse the raw data (without having to grep/tail the whole stream),
  • a row of the final table containing the average performance of each returned configuration,
  • even better: all the runs, with the corresponding performance values of the selected configurations (I can compute the average myself, and it would allow more detailed statistics).
leslieperez (Collaborator) commented

Hello Nojhan,

Our output in the console is focused on supervising the execution and has not been optimized for parsing, as you might have noticed. To provide more information about the experiments, configurations, etc., we save an Rdata object that stores all the information. By default the file is named irace.Rdata.

Do you want just the training data mean, or are you also interested in test data? You can also provide a set of test instances so that irace executes the best configurations on them at the end of the execution. When that option is active, the test performance matrix is printed to the standard output.
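
For reference, testing is configured in the scenario file; something like the following should work (option names as I recall them from the user guide, so please double-check them for your irace version):

  # in scenario.txt: evaluate the final elite configuration(s) on a separate test set
  testInstancesDir = "./test-instances"
  testNbElites = 1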

Assuming that you are interested in training data and that you can execute Rscript in your pipeline, you can use these lines:

  • To get the full performance table of elite configurations:
    Rscript -e "load('irace.Rdata');id_elites<-iraceResults\$allElites[[length(iraceResults\$allElites)]];iraceResults\$experiments[,id_elites]"

  • To get the mean of the performance of the elite configurations:
    Rscript -e "load('irace.Rdata');id_elites<-iraceResults\$allElites[[length(iraceResults\$allElites)]];colMeans(iraceResults\$experiments[,id_elites])"

  • If you want only the best configuration:
    Rscript -e "load('irace.Rdata');id<-iraceResults\$iterationElites[length(iraceResults\$iterationElites)];mean(iraceResults\$experiments[,id])"

Note that these lines require you to provide the right Rdata path to the load function.
The elite configurations are ordered, so the best one is always the first.
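
If you prefer a file that you can later parse with standard text tools, the same idea works with write.csv (a sketch along the lines above; the output file name is just an example, and you still need to adjust the Rdata path):

  Rscript -e "load('irace.Rdata');id_elites<-iraceResults\$allElites[[length(iraceResults\$allElites)]];write.csv(iraceResults\$experiments[,id_elites,drop=FALSE],'elite_performance.csv')"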

You can also do more complex analyses if you are willing to spend some time writing an R script.
There is actually a lot of information you can squeeze from the configuration data ;-)
We can give you some pointers about where to find the data inside the Rdata object;
feel free to contact us.

MLopez-Ibanez (Owner) commented

If it helps, we could also add a helper command-line program to extract data from irace.Rdata and dump whatever information you think could be useful into CSV files.
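
For example, the helper could be little more than a short Rscript (just a sketch of the idea, not existing irace functionality; the arguments and output file are placeholders):

  #!/usr/bin/env Rscript
  # sketch: dump the performance matrix of the final elites from an irace log file to CSV
  # usage: ./dump-elites.R irace.Rdata elites.csv
  args <- commandArgs(trailingOnly = TRUE)
  load(args[1])                                  # creates the iraceResults object
  elites <- iraceResults$allElites[[length(iraceResults$allElites)]]
  write.csv(iraceResults$experiments[, elites, drop = FALSE], args[2])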

As Leslie says, the standard output of irace is not really designed to be parsed and it is already too verbose (which probably slows irace down in time-sensitive scenarios). Printing a possibly huge table would only make things worse. It would be better to dump the information needed on request, given the log file. The log file contains much more information than we will ever be able to print to standard output. There are scenarios where irace performs hundreds of thousands of runs!


nojhan commented Jan 31, 2021

Thanks for the detailed (and fast) answers!

I'm aiming to use irace as a substep of a much larger pipeline, hence I'm looking to maximize performance.
In my setting, I don't really want to look at the detailed results from irace (there are too many irace runs), I really just want to get back the best configuration and its estimated performance (I just trust irace). Ideally, I would have a guarantee that the final performance has been estimated on a given number of runs.

I'm targeting large-scale budgets for irace (hundreds of thousands of target runs), with several irace runs and a large number of irace processes running in parallel (possibly on the same computer).
To gain some speed, I usually start by removing any unnecessary disk access, which is classically a bottleneck (and think of the disk space, with so many log files), and by reducing the need to spawn slow software.
Hence, I was looking to avoid large data dumps or calls to R processes.

Usually, an approach that works well in such a setting is to stay close to the classic POSIX KISS CLI interface:

  • Input/output on the standard streams (command-line arguments, standard input, standard/error outputs), not in files.
  • Log output on stderr, data output on stdout (so as to be able to redirect the log to /dev/null and pipe the output to further parsing, without slow intermediates; see the sketch after this list).
  • Ability to (almost) completely disable logs (a "quiet" mode).
  • Data output easily parsed with POSIX text tools (which are damn fast).
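
For instance (a purely hypothetical invocation, since irace does not currently put data on stdout), the whole extraction step would reduce to something like:

  # log discarded, CSV data on stdout piped straight into standard tools
  irace --scenario scenario.txt 2>/dev/null | cut -d',' -f1,2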

I feel that a good compromise in the irace case would be:

  1. Add a switch to disable the text log output (the Rdata having already been taken care of ;).
  2. Output the text log on stderr.
  3. Output (at least) a parsable final best configuration (along with its estimated performance) on stdout.
  4. I have the feeling that there is already a way to ensure that the final configuration has been evaluated on at least a given number of runs?

Let's say that if I ever need to do something fancier (like using a robust estimator instead of the mean), then it would be worth spawning R on irace.Rdata anyway. In that case, yes, I guess having some example scripts in the (already impressive) documentation showing how to extract basic data would be a good addition.

I think this would not break the existing interface, while easing large scale use of irace.
Of course, I can write some of the code, if you feel it can eventually be merged.

MLopez-Ibanez (Owner) commented

  • Having a --quiet mode would be welcome.
  • Separating the log output from the data output would also be welcome, but one has to be careful with the output of R itself. We started adding convenience functions such as irace.error() and irace.warning(), and probably using more of such functions would help with this (see the sketch after this list).
  • We need to decide what "parsable" means. Another reason to NOT output the performance of the found configuration is that we don't want users to use this metric in their papers. We want to encourage a separation between training and testing. Would it be OK to add some text warning about this to the data output, just as we now explain the printing order?
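
To illustrate the point about R's own streams (this is not irace code, just a minimal sketch of the underlying R behaviour): message() and warning() already write to stderr, while cat() and print() write to stdout, so routing all logging through such helpers keeps stdout free for data:

  # minimal sketch, not irace code
  log_msg  <- function(...) message(...)                               # goes to stderr
  emit_csv <- function(d) write.csv(d, stdout(), row.names = FALSE)    # goes to stdout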

If the code is contributed in the right way, I don't think there is anything in principle against merging it. However, I would suggest asking us early to take a look, as the current codebase is a bit of a mess of styles, but we want to move closer to https://style.tidyverse.org/

We also have several private branches with ongoing work, so it would be better to merge the work in chunks rather than having one big merge.


nojhan commented Jan 31, 2021

We need to decide what "parsable" means.

I would say anything that's easily parsed with Python's line.split() or GNU's cut.
IMHO the best option is simple CSV tabular data.
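
Just to make it concrete, something with the following shape (entirely made-up column names and values, only to illustrate the format):

  configuration_id,mean_cost,n_runs
  42,1234.5,25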

Another reason to NOT output the performance of the found configuration is that we don't want users to use this metric in their papers. We want to encourage a separation between training and testing.

That's a fairly good point. However, I don't actually want to test whether irace generalizes well, I want to test whether a higher abstraction (embedding irace) generalizes well. In that sense, I was planning to do cross-validation only at the upper level, and keep all of irace within the learning bucket. If I start splitting the learning data at irace's level as well, I have the feeling it will be more difficult to track down where generalization is leaking.

I'll give it more thought anyway, thanks for the reminder.

Would it be OK to add some text warning about this to the data output like we now explain the printing order?

I think it's definitely OK to let the user decide for herself, as long as she has quality information and warnings.

If the code is contributed in the right way, I don't think there is anything in principle against merging it. However, I would suggest asking us early to take a look, as the current codebase is a bit of a mess of styles […]

I honestly don't know when I'll have time to do it, but it's now on my TODO list anyway.
As for the code review, it's kind of mandatory for a PR, so it should not be a problem.
My only question is which branch I should fork from.

MLopez-Ibanez (Owner) commented

Fork from master please.

@MLopez-Ibanez MLopez-Ibanez added the enhancement New feature or request label Feb 16, 2021
@MLopez-Ibanez MLopez-Ibanez added the waiting for info The reporter must provide additional information label Apr 30, 2022
MLopez-Ibanez (Owner) commented

There is now a --quiet option that disables all output.
