Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Completely Rebuild our CI System #10683

Merged
merged 80 commits into from
Sep 3, 2024
Merged

Completely Rebuild our CI System #10683

merged 80 commits into from
Sep 3, 2024

Conversation

Myoldmopar
Copy link
Member

@Myoldmopar Myoldmopar commented Aug 21, 2024

Pull request overview

Original Comment

Mac CI acted crazy the last two days, and I can't figure out why. I'm considering just moving it to GHA. As a quick first step, I want to add basic build/test. If that goes in smoothly, I may spend some time getting regressions working on GHA as well, using caching to store develop results for efficiency. If this goes smoothly, this would rapidly eliminate Decent CI...

For now, just a test.

Updated

Alright, so I completely rebuilt our CI system around GitHub Actions, and am going to throw Decent CI aside. I'll give a full walkthrough of my changes here, but here are some highlights:

  • Regressions now show back up on the pull request page, not hidden on the commit
  • No more waiting on me to turn the CI machine back on if it goes down for updates
  • No more having to build and deploy custom machines at NREL
  • Super transparent build/test processes documented directly in the action workflow files
  • Running tests in a fully isolated environment
  • No more waiting on our CI machine to run one build at a time
  • No more Ruby code in my life, and no more managing our fork of Decent CI
  • I added regression testing across all three platforms again
  • No more managing the build results repository and page build issues
  • Nicely handle the standardized [ ci skip ] types of commit message hints.
  • Our testing focuses on open PRs, not just stray branches.
  • You can focus on the "Checks" tab at the top of the PR for full info about allll of our builds and tests
  • Probably lots more.

Anyway, this is very close to being ready. I just pushed a significant change that may need a tweak or two, but still it's close. For what it's worth, this doesn't touch any of our build processes for release packages, just testing.

The one piece I think is missing is the performance metrics. I can add another workflow for that, but I'd love to discuss the best path forward to make it really useful.

@Myoldmopar Myoldmopar added the DoNotPublish Includes changes that shouldn't be reported in the changelog label Aug 21, 2024
@Myoldmopar Myoldmopar self-assigned this Aug 21, 2024
@Myoldmopar
Copy link
Member Author

OK, I just pushed a handful of commits that further clean things up. Key aspects are:

  • Trying RelWithDebInfo for both debug builds, which should drastically clean things up
  • Removing macos-12 from regular PR testing, we don't need to waste those cycles for every commit, and we are still building it for commits to develop and release packages
  • I'm trying to explicitly list the build targets for the debug coverage run, should also help save time
  • I tried to clean up the names of the workflows and jobs. I think it looks a lot better in the list, but I'm open to further refinement!

Things I did not do yet:

  • Add any caching, whether using GHA caching, or ccache itself. I just didn't want to introduce too many changes in one set of GHA runs
  • Dynamically set NPROC, just to again avoid breaking something random with all my other changes.
  • Make any changes to the GITHUB_STEP_SUMMARY or ctest output. Still open to these, whether in this PR or later.

@Myoldmopar
Copy link
Member Author

That naming is pretty nice looking! I'm definitely open to further refinement, but this is great.

image

There is also the workflow.run_name attribute that we could add, but I'm not sure exactly where that is shown on the GitHub UI, so maybe it's not relevant to us.

@Myoldmopar
Copy link
Member Author

I think I actually will bring the debug builds together in the same workflow file at least. I'm not sure if I'll do the artifact upload/download just yet, because I think we may have each compilation split up sufficiently that they are pretty quick now. But either way, putting them in the same workflow will make the UI look that much nicer.

@Myoldmopar
Copy link
Member Author

  • Unit test coverage: 1 hour
  • Integration debug: 38 minutes! (1 test failure due to NaNs in SSC)
  • 3 unit/regressions: all done in less than an hour

Others all done quickly. This is looking super great. I am now pushing one more time to skip that SSC file, and also regroup our related CI builds together. I don't think it should affect anything, but if so I'll fix it up with one more commit and I think we're safe to merge after that. Follow-up work would include fixing up caching, NPROC, parsing the ctest results, and the GITHUB_STEP_SUMMARY stuff. But this doesn't need to wait on those.

@Myoldmopar Myoldmopar marked this pull request as ready for review August 30, 2024 20:28
@Myoldmopar
Copy link
Member Author

So it seems the RelWithDebInfo cmake build type is actually muting debug assertions, because it is still definiing NDEBUG. I think what I'd like to do is just modify our Cmake build commands so that it removes NDEBUG during RelWithDebInfo builds. I think nearly no one ever builds with that type, so it won't affect anything. And regardless it seems reasonable to include assertions if you need debug info...

Thoughts @jmarrec ?

@amirroth
Copy link
Collaborator

I build with it fairly often (certain issues don't reproduce in debug builds), but I also muck around with the Makefiles locally so you can do whatever you need to for CI purposes.

@jmarrec
Copy link
Contributor

jmarrec commented Sep 2, 2024

So it seems the RelWithDebInfo cmake build type is actually muting debug assertions, because it is still definiing NDEBUG. I think what I'd like to do is just modify our Cmake build commands so that it removes NDEBUG during RelWithDebInfo builds. I think nearly no one ever builds with that type, so it won't affect anything. And regardless it seems reasonable to include assertions if you need debug info...

Thoughts @jmarrec ?

https://github.com/Kitware/CMake/blob/9673f61be1573b7e2b701da8a6ad0fb4c8cac4a1/Modules/Compiler/GNU.cmake#L61-L65

Indeed it does. Yeah we could tweak our cmake config to remove it, or I think we could just do -UNDEBUG.

target_compile_options(project_options INTERFACE $<$<CONFIG:RelWithDebInfo>:-UNDEBUG>)

@Myoldmopar
Copy link
Member Author

I think we could just do -UNDEBUG

Thanks @jmarrec, that did the trick. I added it specifically into the GCC compiler flag block so that it wouldn't affect anything else for now. I added an assert(false) into SimulationManager and ran it with Debug, Release, and RelWithDebInfo. It passed on release, of course, but asserted on both of the others. This is perfect. OK, this was the last thing holding this up. I think this can merge at will if it comes back clean this morning.

As an upcoming step, I will add a nice way to report some metrics from the performance testing, but honestly, I'd like to get input from the team on what we'd like to see from that stuff. We can probably do lots better than what we are doing now, and I don't want to waste time simply bringing over existing stuff just to throw it away.

@Myoldmopar
Copy link
Member Author

Another interesting change here. It looks like GitHub is automatically checking out the /merge/ ref, which brings develop into the branch. This eliminates the issue of being behind develop and having false diffs. So that's great. However.....it may just not run if there are conflicts. So that's not great. I can look into this later, but I don't think this should wait any longer. It just completed the entire suite of tests in an hour and is all green. Anyone have any final thoughts here? Note that this PR does bring in a few changes to develop branch testing as well, so I'm totally open to the possibility that I will need to quickly fix something once it merges. But maybe it will all be fine? 🤷

@rraustad
Copy link
Contributor

rraustad commented Sep 3, 2024

This will disable the existing CI system. Might be good to have the Linux CI active for a period of time to compare results?

@Myoldmopar
Copy link
Member Author

I don't hate that idea @rraustad, although I was already comparing diffs on this branch by temporarily including your log1p change.

Decent CI:

image
image

The new CI regression diffs:

image

I think things are working really great. If you feel strongly about it, I'm OK putting those Decent files back in, but if not, I'll lean away from it and see how things go.

@Myoldmopar
Copy link
Member Author

Oh there actually is a difference, but it's only because on the new CI I am skipping the long running Basement files, which includes the KivaBasement. Now that the new CI stuff is working so much better, I could probably just add that back in, which will bring it into agreement with the existing Decent CI diffs.

@Myoldmopar
Copy link
Member Author

Tell you what..... I'll do both for now. I'm going to re-enable Basement tests and also add back in Decent CI's Linux and Windows stuff temporarily.

@Myoldmopar
Copy link
Member Author

Nope, scratch that as well. I've taken out some CMake functionality that would need to go back in. I am happy to help get that stuff added back into your fork when the time comes, but I'm just going to pull the plug on our Decent CI stuff for now. Worst case for me is that this falls apart when I merge and I have to just revert the whole PR :) That's all.

I will take out the ctest argument that skips the Basement files so that our regressions will match exactly what Decent CI was producing. And then I will merge this after lunch.

@Myoldmopar
Copy link
Member Author

OK, all super happy now. Adding the basement files only added a couple minutes on it looks like. I am going to merge this. Let's see how it goes once in the develop branch, and with more branches running.

@Myoldmopar Myoldmopar merged commit 83098d3 into develop Sep 3, 2024
12 checks passed
@Myoldmopar Myoldmopar deleted the MacBuildsOnGHA branch September 3, 2024 17:44
@Myoldmopar Myoldmopar mentioned this pull request Sep 3, 2024
6 tasks
@Myoldmopar
Copy link
Member Author

WELLLLL after all that testing and no issues with queued up runners waiting, I merged it into develop, and immediately it's too much. Geez. OK, well, I'm reverting this to avoid breaking anything, and I'll look back into self-hosted runners. Or whatever.

@Myoldmopar Myoldmopar restored the MacBuildsOnGHA branch September 3, 2024 23:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DoNotPublish Includes changes that shouldn't be reported in the changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants