-
Notifications
You must be signed in to change notification settings - Fork 34
test larger cpu runner #1170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test larger cpu runner #1170
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1170 +/- ##
==========================================
- Coverage 94.66% 92.51% -2.16%
==========================================
Files 143 143
Lines 10994 11012 +18
==========================================
- Hits 10408 10188 -220
- Misses 586 824 +238
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
"NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running." Good to know! Re-running now |
|
large worked but timed out after 12 hours (which we can set up to 1 week) -- I will try non-intergration tests since AFAIK that is what @IAlibay is trying to run -- just the slow tests. |
Yeah runninng the "integration" tests is probably overkill without a GPU. |
|
large: |
|
xlarge |
|
better than 2x improvement |
|
Last check, going to see if the intel flavor is any faster |
|
@mikemhenry what flags are you using for these CPU runners? --runslow or --integration too? 3h seems wayy too long for just the slow tests. |
|
integration as well -- I wanted to get some benchmarking data on the integration tests without a GPU |
|
I actually turned off integration tests back in 98cec71 |
|
But you right, that is kinda slow for just the slow tests |
|
Now the runners are running out of disk space when installing the env, need to check if there are new deps making the env bigger or something else going on. I can also increase the EBS image size. |
|
Sweet, getting: |
|
timing info btw |
|
@IAlibay how do you invoke the tests? |
Testing right now with the |
|
@mikemhenry it runs in 35 mins for me |
|
While we wait for that, looking at the slowest runs: Should we move some of these tests to integration tests? I was thinking we could print all the test durations and see what it looks like to figure out which ones we should move. We could also mark them as needing a GPU or something. This reminds me of #1133 |
@mikemhenry a lot of these are significantly faster with #1131 - we were meant to use the output of this PR to test out that PR. |
@mikemhenry see above - 35 mins locally. |
Was this with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to block because it's Friday and I'm rather unsure as to what should be happening here.
@mikemhenry I think I just don't understand what we're trying to achieve here. The plan was for this to be a "fast" way to run slow tests. As it stands this AWS CPU runner is slower than running the package install tests (which also run slow tests?) by nearly 3x. Unless I'm missing something, this seems kinda not worth it?
|
No API break detected ✅ |
|
I was using the wrong instance family for this. The T-series have "burst" CPU but are not meant for constant use. If we use a |
|
@IAlibay are we happy with this now? |
|
offline we discussed that we will merge this in and then test irfans PR that speeds up CI, then figure out if we want to optimize this further with a large instance |
|
No API break detected ✅ |
1 similar comment
|
No API break detected ✅ |
| OFE_SLOW_TESTS: "true" | ||
| DUECREDIT_ENABLE: 'yes' | ||
| OFE_INTEGRATION_TESTS: FALSE | ||
| OFE_INTEGRATION_TESTS: TRUE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I think this should be false until we figure out the GPU stuff?
…u_runner' into feat/test_larger_cpu_runner
|
No API break detected ✅ |
|
https://github.com/OpenFreeEnergy/openfe/actions/runs/15592958046 testing it here before we merge it in |
|
No API break detected ✅ |
Checklist
newsentryDevelopers certificate of origin