
Add github workflow tests #716

Merged
merged 10 commits into develop from hiopbbpy-v0.03-dev on Mar 30, 2025
Conversation

@nychiang (Collaborator) commented Mar 21, 2025:

Test CI on GitHub.
So far I have tested the following four combinations:

  • LpNorm problem with LCB
  • LpNorm problem with EI
  • Branin problem with LCB
  • Branin problem with EI

@nychiang nychiang requested review from cnpetra and thartland March 21, 2025 20:16
@nychiang (PR author) commented:

@cnpetra @thartland This is just a draft. I switched off other pipelines to focus on this one. Some parameters still need to be tuned, but I just want to see if you have any comments about this setup.

mean_obj[prob_type][acq_type] /= num_repeat
print("Mean Opt.Obj for ", prob_type, "-", acq_type, mean_obj[prob_type][acq_type])

r_error = np.abs((mean_obj[prob_type][acq_type] - saved_sol[prob_type][acq_type])/saved_sol[prob_type][acq_type])
@cnpetra (Collaborator) commented Mar 21, 2025:

To guard against small denominators, the denominator should be 1 + saved_sol[prob_type][acq_type].

The 0.5 threshold seems a bit arbitrary, doesn't it?

As an alternative, I would do 1000 runs with a generous amount of iterations to compute saved_sol (you mean saved_obj?), in fact, its lowest value and its highest value. Then, in the test, I would do 10 runs with the same amount of iterations and declare failure if more than one (out of 10) saved_obj falls outside the interval [lowest_value - d, highest_value + d], where d = max(1e-6, 0.01*(highest_value - lowest_value)); see the sketch after this comment.

We can revise this later depending on how accurate it is.

I'm open to other suggestions as well.
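
A minimal sketch of the proposed interval check (variable names are hypothetical; ref_objs holds the objectives from the 1000 reference runs, test_objs those from the 10 test runs):

import numpy as np

def interval_test_passes(ref_objs, test_objs):
    test_objs = np.asarray(test_objs)
    # Reference interval from the offline runs, padded by d.
    lowest_value, highest_value = np.min(ref_objs), np.max(ref_objs)
    d = max(1e-6, 0.01 * (highest_value - lowest_value))
    # Declare failure if more than one of the 10 test objectives
    # falls outside [lowest_value - d, highest_value + d].
    outside = (test_objs < lowest_value - d) | (test_objs > highest_value + d)
    return np.sum(outside) <= 1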

@nychiang (PR author) replied:

Yes, I agree. I need to run a sufficiently large number of iterations to compute saved_obj. Currently, I repeat the process 20 times, but that's clearly insufficient; see how it fails the CI test with a 50% threshold.
Note that all the numbers (50% threshold, 20 BO iterations, and 20 repetitions) are arbitrary. My goal is simply to demonstrate how the process works and how fast a single test runs. From the same link above, you can see that completing 20 runs for these four setups takes over 11 minutes. (SLOW!)

Since I am implementing more test problems from SMT, I'm considering splitting the tests into separate pipelines to run them in parallel. However, this approach differs from the pipeline tests for C++ HiOp, where all tests run sequentially. What do you think?

@thartland (Collaborator) commented Mar 24, 2025:

I suggest that instead of using the extreme values lowest_value and highest_value to determine d, we do 1000 runs with 20 BO steps and save each of the estimated minimum values. This just requires storing 1000 scalar values. We can then estimate the mean and variance of the random approximate minimizer (obtained from 20 BO iterations) from the 1000 runs and use that as saved reference data. I suggest we use the variance to determine d, the idea being that highest_value - lowest_value would grow indefinitely if we were to increase the number of runs (currently 1000), whereas the variance would not. A sketch follows below.
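
One way this could look (a sketch; the three-sigma width is an illustrative assumption, not part of the suggestion):

import numpy as np

def variance_based_bounds(ref_objs, num_std=3.0):
    # Mean and standard deviation of the 1000 saved approximate minima;
    # unlike min/max, these estimates stabilize as the number of runs grows.
    mu = np.mean(ref_objs)
    sigma = np.sqrt(np.var(ref_objs, ddof=1))
    d = max(1e-6, num_std * sigma)
    return mu - d, mu + d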

@nychiang (PR author) replied:

Seems like we only need to save 4 scalar values, i.e., min, mean, max, and variance.

@thartland (Collaborator) commented Mar 24, 2025:

Yes, 4 scalar values would be the only reference data that needs to be saved. However, we will need to save all 1000 approximate minimizers in order to compute the min, mean, max, and variance of the results from the 1000 runs of 20 BO iterations; for instance, along the lines of the sketch below.
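
For instance (a sketch; the file name and flat array layout are hypothetical):

import numpy as np

yopt_samples = np.load("all_1000_minimizers.npy")  # hypothetical file of 1000 approximate minima
reference = {
    "min":  float(np.min(yopt_samples)),
    "mean": float(np.mean(yopt_samples)),
    "max":  float(np.max(yopt_samples)),
    "var":  float(np.var(yopt_samples, ddof=1)),
}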

@nychiang (PR author) replied:

Yes, that's not a problem. I have already done 1000 runs over the weekend to compute the mean/max/min values.
I just corrected the code; it now uses Cosmin's suggestion to decide whether a test passes or fails.

@nychiang nychiang force-pushed the hiopbbpy-v0.03-dev branch from e2f5c0b to c611084 Compare March 24, 2025 18:31
@nychiang nychiang marked this pull request as ready for review March 26, 2025 17:59
@nychiang (PR author) commented:

@thartland @cnpetra I saved 1000 optimal y values in a separate file and used the 5th percentile from each side to determine the thresholds. Please have a look at the latest commit. Thanks!

saved_min_obj = {"LpNorm": {"LCB": 0.0007586314501994839, "EI": 0.002094016049616341}, "Branin": {"LCB": 0.3979820338569908, "EI": 0.39789916461969455}}
saved_mean_obj = {"LpNorm": {"LCB": 0.018774638321851504, "EI": 0.11583915178648867}, "Branin": {"LCB": 0.5079001079219421, "EI": 0.4377466109837465}}
saved_max_obj = {"LpNorm": {"LCB": 0.0755173754382861, "EI": 0.4175676394969743}, "Branin": {"LCB": 1.107240543567082, "EI": 0.7522382699410031}}
saved_yopt = np.load("yopt_20iter_1000run.npy", allow_pickle=True).item()
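
For reference, the percentile thresholds discussed below can be recovered from the saved array along these lines (a sketch; it assumes saved_yopt mirrors the nested dict layout of saved_min_obj):

import numpy as np

lb, ub = {}, {}
for prob_type in saved_yopt:
    lb[prob_type], ub[prob_type] = {}, {}
    for acq_type in saved_yopt[prob_type]:
        samples = np.asarray(saved_yopt[prob_type][acq_type])   # 1000 optimal y values
        lb[prob_type][acq_type] = np.percentile(samples, 5.0)   # lower threshold
        ub[prob_type][acq_type] = np.percentile(samples, 95.0)  # upper threshold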
A collaborator commented:

This file is not saved to this repo, so if I were to clone the repo and try to run this file, it would fail due to not being able to find yopt_20iter_1000run.npy. Perhaps, though, it doesn't need to be saved: the 5th and 95th percentiles can be determined offline.

The same collaborator followed up:

I apologize, I see the file now. Disregard my previous comment about the .npy file not being present.

print("Summary:")
for prob_type in prob_type_l:
for acq_type in acq_type_l:
allowed_error = max(1e-6, 0.01*(saved_max_obj[prob_type][acq_type]-saved_min_obj[prob_type][acq_type]))
@thartland (Collaborator) commented Mar 26, 2025:

Do we still need allowed_error? My opinion is that we do not need it; we can just use lb = left_value and ub = right_value. Also, perhaps we can make the language more uniform, that is, use either left or lower but not both, and similarly either upper or right for ub.

@nychiang (PR author) replied:

Yes, allowed_error is not required anymore.

is_failed = (y_opt[prob_type][acq_type] < lb) | (y_opt[prob_type][acq_type] > ub)
num_fail = np.sum(is_failed)

if num_fail > 1:
@thartland (Collaborator) commented Mar 26, 2025:

Is num_fail > 1 that unlikely to occur? The probability of failure is 10%, having used the 5th and 95th percentiles. The probability that the first two runs of ten experiments fail (and the rest succeed) is 0.43%, but the probability that any two of ten experiments fail is 19.4%. This may seem unintuitive, but the probability that all ten succeed is only 0.9^10 = 34.9%.

Using a binomial distribution calculation, I see that there is a 26.4% likelihood of 2 or more failures out of 10 when the likelihood of failure for a single event is 10%; see the check below.

I'd suggest we use something like num_fail > 7, which has a 3.7e-7 probability, or num_fail > 8, which has a 9.1e-9 probability.
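
These tail probabilities can be reproduced with a binomial distribution (a quick check; assumes scipy is available):

from scipy.stats import binom

n, p = 10, 0.10                 # 10 runs, 10% failure chance per run
print(binom.pmf(2, n, p))       # exactly 2 failures: ~0.194
print(binom.sf(1, n, p))        # 2 or more failures: ~0.264
print(binom.sf(7, n, p))        # num_fail > 7, i.e. 8 or more: ~3.7e-7
print(binom.sf(8, n, p))        # num_fail > 8, i.e. 9 or more: ~9.1e-9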

@nychiang (PR author) replied:

This is a good point! I also think num_fail > 1 is too easy to reach, i.e., the probability is 19.4%, and all we could do is rerun the test until it succeeds. Instead of using num_fail > 7, I am thinking of using the 1% percentile, where the probability that any two of ten experiments fail is 0.415%.

@thartland (Collaborator) commented Mar 27, 2025:

If a success is being between the 1st and 99th percentiles, then there is a 98% chance of success in a single instance of this 20-BO-iteration run. The chance of 2 or more failures out of 10 is 1.6%. I don't know exactly what target probability we should be aiming for here, but I think it is better to use 3 or more failures out of 10 as the test criterion, as it is only 0.086% likely to occur.

A collaborator replied:

I agree: either 3 or more failures to be within the 1st and 99th percentiles, or 2 or more failures to be within the 0.5th and 99.5th percentiles (probability is 0.00426). I think the first is safer, given that you used only 1k samples to compute the percentiles. Both tail probabilities can be checked as below.
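
The two candidate criteria, checked the same way (a sketch):

from scipy.stats import binom

# 1st/99th percentiles -> 2% failure chance per run; fail on 3+ of 10
print(binom.sf(2, 10, 0.02))    # ~0.00086
# 0.5th/99.5th percentiles -> 1% failure chance per run; fail on 2+ of 10
print(binom.sf(1, 10, 0.01))    # ~0.00426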

@nychiang (PR author) commented:

@cnpetra @thartland comments addressed

@cnpetra cnpetra self-requested a review March 28, 2025 13:35
…crowding, obtaining the path of the python __file__ in order that we can load the saved data according to the python file and not from where the file is run from. This enables us to be able to run (from hiop base dir) python src/Drivers/hiopbbpy/BODriver.py and (from hiop/src/Drivers/hiopbbpy) python BODriver.py. Previously there was an error when I tried running the latter case.
@thartland (Collaborator) left a comment:

Great work @nychiang

I made one final change so that we can run the command python src/Drivers/hiopbbpy/BODriver.py from the hiop base directory and python BODriver.py from the src/Drivers/hiopbbpy directory and not have issues with the path to access the data. I also moved the saved data out of the hiop base directory to avoid overcrowding the hiop base directory with data files.

@nychiang (PR author) commented:

@thartland Thanks a lot for your help!

@cnpetra cnpetra merged commit f2c38f7 into develop Mar 30, 2025
7 checks passed