-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the way compute-coverage finds the hpc dir in dist-newstyle #2116
Conversation
In order to actually check that this fixes the problem, should we create an additional pull request to merge a version bump into this branch and see if the CI works there? (then close it) or is that not worthwhile? I did check that it locates the correct directory in my own checkout, which currently does have two versions' worth of build results in it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to actually check that this fixes the problem, should we create an additional pull request to merge a version bump into this branch and see if the CI works there?
I'd be OK with that. Alternatively, we could just bump the version in a temporary commit in this PR, and if that works as expected, we can undo that commit before landing the PR.
compute-coverage.sh
Outdated
# The hpc dir we want is parallel to the build dir, so drop the end bits off | ||
# and replace them. | ||
SAW=$(cabal list-bin -v0 exe:saw) | ||
HPC_ROOT=$(echo "$SAW" | sed 's,build/saw/saw$,hpc,') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, I asked the folks on the #ghc Matrix channel if this particular subdirectory was liable to change in future cabal
versions. I'm glad I asked, because the location actually changed in cabal-install-3.12.0.0
. In that version, cabal
now puts generated hpc
-related files under a subdirectory that looks something like:
/home/ryanscott/Documents/Hacking/Haskell/saw-script/dist-newstyle/build/x86_64-linux/ghc-9.4.8/saw-script-1.0.0.99/build/extra-compilation-artifacts/hpc/vanilla/mix/saw-script-1.0.0.99-inplace
That is to say, they're now located under saw-script-<ver>/build
rather than saw-script-<ver>/hpc
. It might be worth calling find
on saw-script-<ver>
instead to be forward-compatible with cabal-install-3.12.0.0
or later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yes indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, it now does that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, pending testing.
Hm, it failed after reverting. So you can bump the version up, but not down. Is that good enough? The script appears to be doing what we intended, I'm guessing the problem is that reverting the version doesn't recompile/rerun enough things (whereas bumping it creates a new tree)...? |
Not quite enough information in the logs to determine if it's a |
I will defer to you to decide what to do (my inclination would be to merge what we've got since it seems to be at least a partial step forward)... I don't know enough about how the tool works to have any useful input. |
In a different context (#2118), I realized that we are inadvertently uploading the compiled |
Worth a try, I think |
ok, that worked, now rolling it back again |
Nope, same result. |
Two of the intTests timed out, too, not sure what to make of that |
Something is very wrong here. The fact that the I'm really quite baffled as to how this happened, given that I haven't observed anything like what is going on in this PR happening in any other jobs (the nightly job in particular is passing without issues), and this PR doesn't change anything related to how artifacts are uploaded or downloaded. |
Eeeenteresting, as they say... it must be something about this particular run and the state that's been left behind here, because this is the third run with the same tree, and the fifth if you don't count the version changes as different (and they should not be able to cause "real" problems...) Sometimes I think every test run should both clone a fresh tree and build it and pull/update and rebuild the already-built tree from the previous run, and then diff them and fail if they aren't identical. I am starting to suspect a cabal problem... |
I would be somewhat surprised if state had something to do with this, given that GitHub Actions artifacts are tied to particular workflows. That being said, I am out of other plausible explanations. Perhaps we should try nuking the cache and trying again just to be extra sure. |
Well, we know it's reusing the build dir or we wouldn't have seen the original behavior with two versions at once in dist-newstyle. That's definitely enough state for things to go off the rails if there's a problem in the rebuild logic somewhere. |
I have nuked the cache and rebuilt from the current state of d283404. Let's see what happens when we try a round of version back-and-forth now... |
Another thing that got suggested was to try bumping back to a version that already got built once (that is, 1.2.0.99 -> 1.2.1.50 -> 1.2.0.99 and then back to 1.2.1.50), and see if that gives other behavior that might be illuminating. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CI now passes this round of back-and-forth after nuking the cache. Personally, I'm satisfied with this as evidence that this works, so I'll go ahead and approve this. (If you want to try decrementing the version, that's fine too, although I don't think it's strictly necessary to demonstrate that this works as intended.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI is always so damn fiddly. I agree that this PR seems valid and the cache-based issues are not likely to be a behavioral regression, although we should add the behavior/cache-nuke-resolution to RyanM's documentation in case it is encountered in the future.
I find it puzzling that nuking the cache would make it able to revert a version bump. I think there must be something bad going on, but I think it's likely a cabal bug and unless we want to go chasing after it for upstream's benefit there's little to be gained by mucking around further here. (And if we do, the first thing is to try to replicate the behavior outside the CI environment, which I'm not really yet in a position to do right now.) So I'll merge. |
However, I'm going to squash all the bump/unbump commits since we don't want those on the trunk forever. |
Previously it searched blindly and would get confused if there were build results from multiple saw-script versions. Now we use cabal list-bin to find the proper version and look there.
This brings in the recent CI changes and particularly #2118.
e223310
to
d09ca22
Compare
(I also squashed the original review adjustments, no real reason to keep those separate; but I kept the original branch point and the merge with master as those might be relevant if someone comes back later.) |
Previously it searched blindly and would get confused if there were build results from multiple saw-script versions. Now we use cabal list-bin to find the proper version and look there.
Fixes #2114.