Add github workflow tests #716
@cnpetra @thartland This is just a draft. I switched off other pipelines to focus on this one. Some parameters still need to be tuned, but I just want to see if you have any comments about this setup.
src/Drivers/hiopbbpy/BODriver.py
Outdated
mean_obj[prob_type][acq_type] /= num_repeat
print("Mean Opt.Obj for ", prob_type, "-", acq_type, mean_obj[prob_type][acq_type])

r_error = np.abs((mean_obj[prob_type][acq_type] - saved_sol[prob_type][acq_type])/saved_sol[prob_type][acq_type])
To guard against small denominators, the denominator should be `1+saved_sol[prob_type][acq_type]`. The 0.5 threshold seems a bit arbitrary, isn't it?
As an alternative, I would do 1000 runs with a generous amount of iterations to compute `saved_sol` (you mean `saved_obj`?), in fact, its lowest value and its highest value. Then, in the test, I would do 10 runs with the same amount of iterations and declare failure if more than one (out of 10) `saved_obj` falls outside the interval `[lowest_value-d, highest_value+d]`, where `d = max(1e-6, 0.01*(highest_value-lowest_value))`.
We can revise this later depending on how accurate it is. I'm open to other suggestions as well.
Yes, I agree. I need to run a sufficiently large number of iterations to compute `saved_obj`. Currently, I repeat the process 20 times, but that's clearly insufficient --- see how it fails the CI test with a 50% threshold.
Note that all the numbers (50% threshold, 20 BO iterations, and 20 repetitions) are arbitrary. My goal is simply to demonstrate how the process works and how fast a single test runs. From the same link above, you can see that completing 20 runs for these four setups takes over 11 minutes. (SLOW!)
Since I am implementing more test problems from SMT, I'm considering splitting the tests into separate pipelines to run them in parallel. However, this approach differs from the pipeline tests for C++ HiOp, where all tests run sequentially. What do you think?
I suggest that instead of using the extreme values `lowest_value` and `highest_value` to determine `d`, we do 1000 runs with 20 BO steps and save each of the estimated minimum values. This will just require storing 1000 scalar values. We can then estimate the mean and variance of the random approximate minimizer (obtained from 20 BO iterations) from the 1000 runs and use that as saved reference data. I suggest we use the variance to determine `d`, the idea being that `highest_value - lowest_value` will grow indefinitely if we were to increase the number of runs (currently 1000). The variance will not grow indefinitely.
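A minimal sketch of the variance-based tolerance, assuming the 1000 saved minima are available as a NumPy array (the random data below is a stand-in, and the 3-sigma factor is an illustrative choice, not from the PR):

```python
import numpy as np

# Stand-in for the 1000 saved approximate minimizers from 20-step BO runs.
rng = np.random.default_rng(0)
saved_minima = rng.normal(loc=0.5, scale=0.05, size=1000)

mean_ref = saved_minima.mean()
std_ref = saved_minima.std()

# d scales with the spread of the estimator rather than with the extremes,
# so it stays stable as the number of reference runs grows.
d = max(1e-6, 3.0 * std_ref)
lb, ub = mean_ref - d, mean_ref + d
```
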
Seems like we only need to save 4 scalar values, i.e., min, mean, max, and variance.
Yes, 4 scalar values would be the only reference data that would need to be saved. However, we will need to save all 1000 approximate minimizers to then compute the min, mean, max, and variance of the results from 1000 runs of 20 BO iterations.
Yes, that's not a problem. I have already done 1000 runs over the weekend to compute the mean/max/min values.
I just corrected the code, and now it uses Cosmin's suggestion to decide whether a test passes or fails.
Force-pushed from e2f5c0b to c611084
@thartland @cnpetra I saved 1000 optimal y values in a separate file and used the 5th percentile from each side to determine the thresholds. Please have a look at the latest commit. Thanks!
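The percentile thresholds described here could be computed along these lines (the dict layout mirrors the `saved_min_obj` structure below and the random data is a stand-in; only the file name comes from the PR):

```python
# Sketch: percentile-based bounds from 1000 saved optimal objectives.
# In the PR the data comes from np.load("yopt_20iter_1000run.npy", ...).
import numpy as np

saved_yopt = {
    "Branin": {"LCB": np.random.default_rng(1).normal(0.5, 0.05, 1000)}
}

prob_type, acq_type = "Branin", "LCB"
samples = np.asarray(saved_yopt[prob_type][acq_type])
lb = np.percentile(samples, 5)   # lower threshold: 5th percentile
ub = np.percentile(samples, 95)  # upper threshold: 95th percentile
```
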
src/Drivers/hiopbbpy/BODriver.py
Outdated
saved_min_obj = {"LpNorm": {"LCB": 0.0007586314501994839, "EI": 0.002094016049616341}, "Branin": {"LCB": 0.3979820338569908, "EI": 0.39789916461969455}}
saved_mean_obj = {"LpNorm": {"LCB": 0.018774638321851504, "EI": 0.11583915178648867}, "Branin": {"LCB": 0.5079001079219421, "EI": 0.4377466109837465}}
saved_max_obj = {"LpNorm": {"LCB": 0.0755173754382861, "EI": 0.4175676394969743}, "Branin": {"LCB": 1.107240543567082, "EI": 0.7522382699410031}}
saved_yopt = np.load("yopt_20iter_1000run.npy",allow_pickle=True).item()
This file is not saved to this repo, so if I were to clone the repo and try to run this file it would fail due to not being able to find `yopt_20iter_1000run.npy`. Perhaps, though, it doesn't need to be saved. Offline you can determine the 5th and 95th percentiles.
I apologize, I see the file now. Disregard my previous comment about the `.npy` file not being present.
src/Drivers/hiopbbpy/BODriver.py
Outdated
print("Summary:")
for prob_type in prob_type_l:
    for acq_type in acq_type_l:
        allowed_error = max(1e-6, 0.01*(saved_max_obj[prob_type][acq_type]-saved_min_obj[prob_type][acq_type]))
Do we still need `allowed_error`? My opinion is that we do not need it and can just use `lb = left_value` and `ub = right_value`. Also, perhaps we can make the language more uniform, that is, use either `left` or `lower` but not both, and similarly either `upper` or `right` for `ub`.
Yes, `allowed_error` is not required anymore.
src/Drivers/hiopbbpy/BODriver.py
Outdated
is_failed = (y_opt[prob_type][acq_type] < lb) | (y_opt[prob_type][acq_type] > ub)
num_fail = np.sum(is_failed)

if num_fail > 1:
Is `num_fail > 1` that unlikely to occur? The probability of failure is 10% having used the 5th and 95th percentiles. The probability that the first two runs of ten experiments fail is 0.43%, but the probability that any two of ten experiments fail is 19.4%. This may seem unintuitive, but the probability that all succeed is 0.9^10 = 34.9%.
Using a binomial distribution calculation I see that there is a 26.4% likelihood of 2 or more failures out of 10 when the likelihood of failure for a single event is 10%.
I'd suggest we use something like `num_fail > 7`, which has a 3.7e-7 probability, or `num_fail > 8`, which has a 9.1e-9 probability.
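The quoted figures can be reproduced with a small binomial tail computation (a standalone check, not code from the PR):

```python
# P(at least k failures in n independent runs) for a per-run failure
# probability p, via the binomial distribution.
from math import comb

def prob_at_least(k, n=10, p=0.1):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(round(prob_at_least(2), 3))  # 0.264 -> 2 or more failures of 10
print(prob_at_least(8))            # ~3.7e-7 -> num_fail > 7
print(prob_at_least(9))            # ~9.1e-9 -> num_fail > 8
```
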
This is a good point! I also think `num_fail > 1` is too easy to reach, i.e., the probability is 19.4%, and all we can do is rerun the test until it succeeds. Instead of using `num_fail > 7`, I am thinking of using the 1st percentile, where the probability that any two of ten experiments fail is 0.415%.
If a success is being between the 1st and 99th percentiles, then there is a 98% chance of success in a single instance of this 20-BO-iteration run. The chance of 2 or more failures out of 10 is 1.6%. I don't know exactly what target probability we should be aiming for here, but I think it is better to use 3 or more failures out of 10 as the test criterion, as it is 0.086% likely to occur.
I agree: either 3 or more failures against the 1st and 99th percentiles, or 2 or more failures against the 0.5th and 99.5th percentiles (probability is 0.00426). I think the first is safer given you used only 1k samples to compute the percentiles.
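The probabilities for these tighter percentiles can be checked the same way (a standalone binomial computation, not code from the PR):

```python
# Binomial tail P(at least k failures in n runs) at failure probability p.
from math import comb

def prob_at_least(k, n=10, p=0.02):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 1st/99th percentiles -> per-run failure probability p = 0.02:
print(round(prob_at_least(2, p=0.02), 4))  # ~0.016 -> 2+ failures of 10
print(round(prob_at_least(3, p=0.02), 5))  # ~0.00086 -> 3+ failures of 10
# 0.5th/99.5th percentiles -> p = 0.01:
print(round(prob_at_least(2, p=0.01), 4))  # ~0.0043 -> 2+ failures of 10
```
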
@cnpetra @thartland comments addressed
…crowding, obtaining the path of the python __file__ in order that we can load the saved data according to the python file and not from where the file is run from. This enables us to be able to run (from hiop base dir) python src/Drivers/hiopbbpy/BODriver.py and (from hiop/src/Drivers/hiopbbpy) python BODriver.py. Previously there was an error when I tried running the latter case.
Great work @nychiang. I made one final change so that we can run the command `python src/Drivers/hiopbbpy/BODriver.py` from the hiop base directory and `python BODriver.py` from the `src/Drivers/hiopbbpy` directory and not have issues with the path to access the data. I also moved the saved data out of the hiop base directory to avoid overcrowding it with data files.
@thartland Thanks a lot for your help!
Test CI on GitHub
So far I tested the following 4 combinations: