Add github workflow tests #716
@cnpetra @thartland This is just a draft. I switched off other pipelines to focus on this one. Some parameters still need to be tuned, but I just want to see if you have any comments about this setup.
src/Drivers/hiopbbpy/BODriver.py
Outdated
mean_obj[prob_type][acq_type] /= num_repeat
print("Mean Opt.Obj for ", prob_type, "-", acq_type, mean_obj[prob_type][acq_type])

r_error = np.abs((mean_obj[prob_type][acq_type] - saved_sol[prob_type][acq_type])/saved_sol[prob_type][acq_type])
To guard against small denominators, the denominator should be `1+saved_sol[prob_type][acq_type]`. The 0.5 threshold seems a bit arbitrary, isn't it?
As an alternative, I would do 1000 runs with a generous amount of iterations to compute `saved_sol` (you mean `saved_obj`?), in fact, its lowest value and its highest value. Then, in the test, I would do 10 runs with the same amount of iterations and declare failure if more than one (out of 10) `saved_obj` falls outside the interval `[lowest_value-d, highest_value+d]`, where `d = max(1e-6, 0.01*(highest_value-lowest_value))`.
We can revise this later depending on how accurate it is. I'm open to other suggestions as well.
Yes, I agree. I need to run a sufficiently large number of iterations to compute `saved_obj`. Currently, I repeat the process 20 times, but that's clearly insufficient --- see how it fails the CI test with a 50% threshold.
Note that all the numbers (50% threshold, 20 BO iterations, and 20 repetitions) are arbitrary. My goal is simply to demonstrate how the process works and how fast a single test runs. From the same link above, you can see that completing 20 runs for these four setups takes over 11 minutes. (SLOW!)
Since I am implementing more test problems from SMT, I'm considering splitting the tests into separate pipelines to run them in parallel. However, this approach differs from the pipeline tests for C++ HiOp, where all tests run sequentially. What do you think?
I suggest that instead of using the extreme values `lowest_value` and `highest_value` to determine `d`, we do 1000 runs with 20 BO steps and save each of the estimated minimum values. This will just require storing 1000 scalar values. We can then estimate the mean and variance of the random approximate minimizer (obtained from 20 BO iterations) from the 1000 runs and use that as saved reference data. I suggest we use the variance to determine `d`, the idea being that `highest_value - lowest_value` will grow indefinitely if we were to increase the number of runs (currently 1000). The variance will not grow indefinitely.
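A minimal sketch of the variance-based tolerance, assuming the 1000 saved minima are available as a NumPy array (the random data below is a stand-in, and the 3-sigma factor is an illustrative choice, not from the PR):

```python
import numpy as np

# Stand-in for the 1000 saved approximate minimizers from 20-step BO runs.
rng = np.random.default_rng(0)
saved_minima = rng.normal(loc=0.5, scale=0.05, size=1000)

mean_ref = saved_minima.mean()
std_ref = saved_minima.std()

# d scales with the spread of the estimator rather than with the extremes,
# so it stays stable as the number of reference runs grows.
d = max(1e-6, 3.0 * std_ref)
lb, ub = mean_ref - d, mean_ref + d
```
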
Seems like we only need to save 4 scalar values, i.e., min, mean, max, and variance.
Yes, 4 scalar values would be the only reference data that would need to be saved. However, we will need to save all 1000 approximate minimizers to then compute the min, mean, max, and variance of the results from 1000 runs of 20 BO iterations.
Yes, that's not a problem. I have already done 1000 runs over the weekend to compute the mean/max/min values.
I just corrected the code, and now it uses Cosmin's suggestion to decide whether a test passes or fails.
Force-pushed from e2f5c0b to c611084
@thartland @cnpetra I saved 1000 optimal y values in a separate file and used the 5th percentile from each side to determine the thresholds. Please have a look at the latest commit. Thanks!
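The percentile thresholds described here could be computed along these lines (the dict layout mirrors the `saved_min_obj` structure below and the random data is a stand-in; only the file name comes from the PR):

```python
# Sketch: percentile-based bounds from 1000 saved optimal objectives.
# In the PR the data comes from np.load("yopt_20iter_1000run.npy", ...).
import numpy as np

saved_yopt = {
    "Branin": {"LCB": np.random.default_rng(1).normal(0.5, 0.05, 1000)}
}

prob_type, acq_type = "Branin", "LCB"
samples = np.asarray(saved_yopt[prob_type][acq_type])
lb = np.percentile(samples, 5)   # lower threshold: 5th percentile
ub = np.percentile(samples, 95)  # upper threshold: 95th percentile
```
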
src/Drivers/hiopbbpy/BODriver.py
Outdated
saved_min_obj = {"LpNorm": {"LCB": 0.0007586314501994839, "EI": 0.002094016049616341}, "Branin": {"LCB": 0.3979820338569908, "EI": 0.39789916461969455}}
saved_mean_obj = {"LpNorm": {"LCB": 0.018774638321851504, "EI": 0.11583915178648867}, "Branin": {"LCB": 0.5079001079219421, "EI": 0.4377466109837465}}
saved_max_obj = {"LpNorm": {"LCB": 0.0755173754382861, "EI": 0.4175676394969743}, "Branin": {"LCB": 1.107240543567082, "EI": 0.7522382699410031}}
saved_yopt = np.load("yopt_20iter_1000run.npy",allow_pickle=True).item()
This file is not saved to this repo, so if I were to clone the repo and try to run this file it would fail due to not being able to find `yopt_20iter_1000run.npy`. Perhaps, though, it doesn't need to be saved. Offline you can determine the 5th and 95th percentiles.
I apologize, I see the file now. Disregard my previous comment about the `.npy` file not being present.
src/Drivers/hiopbbpy/BODriver.py
Outdated
print("Summary:")
for prob_type in prob_type_l:
    for acq_type in acq_type_l:
        allowed_error = max(1e-6, 0.01*(saved_max_obj[prob_type][acq_type]-saved_min_obj[prob_type][acq_type]))
Do we still need `allowed_error`? My opinion is that we do not need it and can just use `lb = left_value` and `ub = right_value`. Also, perhaps we can make the language more uniform, that is, use either `left` or `lower` but not both, and similarly either `upper` or `right` for `ub`.
Yes, `allowed_error` is not required anymore.
src/Drivers/hiopbbpy/BODriver.py
Outdated
is_failed = (y_opt[prob_type][acq_type] < lb) | (y_opt[prob_type][acq_type] > ub)
num_fail = np.sum(is_failed)

if num_fail > 1:
Is `num_fail > 1` that unlikely to occur? The probability of failure is 10% having used the 5th and 95th percentiles. The probability that the first two runs of ten experiments fail is 0.43%, but the probability that any two of ten experiments fail is 19.4%. This may seem unintuitive, but the probability that all succeed is 0.9^10 = 34.9%.
Using a binomial distribution calculation I see that there is a 26.4% likelihood of 2 or more failures out of 10 when the likelihood of failure for a single event is 10%.
I'd suggest we use something like `num_fail > 7`, which has a 3.7e-7 probability, or `num_fail > 8`, which has a 9.1e-9 probability.
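The quoted figures can be reproduced with a small binomial tail computation (a standalone check, not code from the PR):

```python
# P(at least k failures in n independent runs) for a per-run failure
# probability p, via the binomial distribution.
from math import comb

def prob_at_least(k, n=10, p=0.1):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(round(prob_at_least(2), 3))  # 0.264 -> 2 or more failures of 10
print(prob_at_least(8))            # ~3.7e-7 -> num_fail > 7
print(prob_at_least(9))            # ~9.1e-9 -> num_fail > 8
```
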
This is a good point! I also think `num_fail > 1` is too easy to reach, i.e., the probability is 19.4%, and all we can do is rerun the test until it succeeds. Instead of using `num_fail > 7`, I am thinking of using the 1st percentile, where the probability that any two of ten experiments fail is 0.415%.
If a success is being between the 1st and 99th percentiles, then there is a 98% chance of success in a single instance of this 20-BO-iteration run. The chance of 2 or more failures out of 10 is 1.6%. I don't know exactly what target probability we should be aiming for here, but I think it is better to use 3 or more failures out of 10 as the test criterion, as it is 0.086% likely to occur.
I agree: either 3 or more failures against the 1st and 99th percentiles, or 2 or more failures against the 0.5th and 99.5th percentiles (probability is 0.00426). I think the first is safer given you used only 1k samples to compute the percentiles.
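The probabilities for these tighter percentiles can be checked the same way (a standalone binomial computation, not code from the PR):

```python
# Binomial tail P(at least k failures in n runs) at failure probability p.
from math import comb

def prob_at_least(k, n=10, p=0.02):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 1st/99th percentiles -> per-run failure probability p = 0.02:
print(round(prob_at_least(2, p=0.02), 4))  # ~0.016 -> 2+ failures of 10
print(round(prob_at_least(3, p=0.02), 5))  # ~0.00086 -> 3+ failures of 10
# 0.5th/99.5th percentiles -> p = 0.01:
print(round(prob_at_least(2, p=0.01), 4))  # ~0.0043 -> 2+ failures of 10
```
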
@cnpetra @thartland comments addressed
…crowding, obtaining the path of the python __file__ in order that we can load the saved data according to the python file and not from where the file is run from. This enables us to be able to run (from hiop base dir) python src/Drivers/hiopbbpy/BODriver.py and (from hiop/src/Drivers/hiopbbpy) python BODriver.py. Previously there was an error when I tried running the latter case.
Great work @nychiang. I made one final change so that we can run the command `python src/Drivers/hiopbbpy/BODriver.py` from the hiop base directory and `python BODriver.py` from the `src/Drivers/hiopbbpy` directory and not have issues with the path to access the data. I also moved the saved data out of the hiop base directory to avoid overcrowding it with data files.
@thartland Thanks a lot for your help!
Test CI on GitHub
So far I tested the following 4 combinations: