-
Notifications
You must be signed in to change notification settings - Fork 547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build output caching does not key off the current module's source? #395
Comments
Hello, the cached directories by default are the those that are outputs of The cache invalidates if any of the files in the input To invalidate the cache on the changes to files other than Did the answer help? |
Thanks, yes, that confirms what I've gathered from the docs/source. One other problem I just realized is that the cache key also misses dynamic things like build tags, linker flags, Given that behavior, I'd say there are three actual problems: Docs should say the caching is for dependencies, not the current module (by default)And it doesn't really give guidance for caching the current module (like you'd just given me). Even if we added that, I'd say there are a lot of caveats; I as a user probably wouldn't feel confident configuring it, and I bet I'm not alone. The saved cache may include objects for the current module anywayThis bloats the size of the cache bundle, meaning longer download and extraction times. Not generally a huge problem in the grand scheme of things, but for example, our cache download and extract currently adds about 10 seconds, and due to problems mentioned above, largely doesn't help with build times later on. I don't have a good suggestion for this. AFAICT there's no good way to even ask Go which cache objects belong to which packages. Current cache key is probably inadequateEven just for dependencies, I think the current key is probably inadequate for all but the simplest builds. I'm struggling to find a reference that lists all the various inputs to a build that could invalidate the cache, but I bet they're a lot. I'm not sure how to even begin to address this. You'd basically have to implement a large chunk of the internal Just how broken is this?I would say that if you're doing anything much more complicated than vanilla And I'm not sure PS I don't want it to sound like I'm ragging on the team for their efforts in implementing this; it's hard! Caching builds is far more complicated than one would think. |
Hi @nfi-hashicorp, you're more than welcome to describe the missing features. I understand your requirements and the short answer is: the actions/cache is for you. Reasoning For example, "current key is probably inadequate for all but the simplest builds" - this is meant to be a compromise for an average use case: caching rarely changing third-party dependencies. Trying to make it more specific, or to include more than 3rd party dependencies, leads to the cache being invalidated too often (and becoming useless), or growing rapidly and hitting the cache size limit. "The saved cache may still contain items for the current module" - yes, but the structure of the project is not always the same, and it is nearly impossible to determine which folders can be cached without resetting the cache on every build. "Docs should say the caching is for dependencies" - this may or may not be true depending on the value of Summary. |
Yeah, possibly. But that means coming up with a cache key that incorporates every input. Possible, maybe, but certainly untenable in the general case.
Personally I think caching should be off by default. It is wasteful in all but the simplest build scenario. It's probably 100% wasted in a cross-platform build scenario, which I would be willing to bet is not uncommon. Even in the simplest case, I don't know how much of a speed up caching would bring. Do you have data about this? I think one of the bigger UX problems is it's hard to measure the cache hit rate of the actual Go build. Sure, we can see if the |
Hello @nfi-hashicorp , With actions/cache it is possible to create any keys using the context and expressions and cache any paths moreover with Caching was off by default for a long time until few months ago the investigation had been made and according to its results it was decided to turn it on by default. I'd be extremely grateful if you suggested the build configurations you think not cached in the optimal way an we would improve the action. |
In this build, I have two parallel flows, one that builds an AMD64 build twice, and one that builds an ARM64 build twice. I deleted all caches for this repo before starting. You would expect to see the first build in each flow be slow, since it's building from no cache, then the second one should be fast. But that's not what you see Only the AMD64 one gets a speed boost, because the cache key doesn't include (Also, I notice that since the two flows collide on the same cache key, the second flow ends up failing to save the cache tarball, wasting 30s making a tarball.) As another example, this build does something similar, but instead of switching arches, it switches on Similar problem to the first example, the second The pointMy point isn't that these two inputs need to be added to the cache key, it's that the cache key has a lot more inputs than these and it would be a lot of work to enumerate (let alone capture!) them. And a cache miss can be quite expensive, just to download the tarball. So yeah, I'd be willing to bet that in most non-trivial go builds, it is actually a detriment, but I'd need to see the data. (Sidenote: Personally I think the Github action cache approach of uploading tarballs is too naive for most non-trivial builds.) |
I should note that the above really only applies to the build cache ( |
Thank you @nfi-hashicorp , we are starting to investigate the case |
Thanks! I appreciate all the hard work, LMK if I can help |
Hello @nfi-hashicorp , it seems i am getting the idea and tying to reproduce the problem, but it would be helpful to see the builds you've mentioned above. I can not access them now - aren't they are in the private repository? |
@nfi-hashicorp , so finally i understood the problem as interfering the builds which targets assume different dependencies and builds results. Ignoring the GOCACHE does not seem to be moving in the right direction because it just lowers the performance. Setting different GOCACHE(GOMODCACHE) for the specific build but it does not resolve the problem because of the same key Do you think having an input Meanwhile, from that i saw in the comment describing you workflow the workaround for the problem could be to save the target variables in the file and include it into the list of hashed files:
does it help? |
Yes, they were :P Sorry about that. It's public now.
That almost works, except that it doesn't take into account any of the actual source files in my repo (besides
It can help, but in the general case, capturing all possible inputs that could affect the cache is practically a very difficult problem. Really only
Honestly I would be willing to bet that in many use cases, not saving the
I was actually thinking a small win would be to have separate control for those. Your arg |
Another problem that's related: apparently Github action cache can't be overwritten, it must be deleted first. I'm not sure if that's considered a bug or design flaw, or "works as intended". But it certainly makes it hard to deal with the Imagine we fix the cache key so that it incorporates some but not all of the inputs that could invalidate the
This is worse than no cache, because we're incurring the cost of a 100% irrelevant download+extract and compress+failed upload cycle. If we had the ability to overwrite, we could have a |
Hello @nfi-hashicorp
if i understood the problem right, it can be solved with adding the source files to
I suggest to do not take the inputs in the account but instead let the user to add any uniq key to the auto generated key. This allows to modify the cache key per job even the auto generated keys are the same (due to the same underlying files)
|
You also might be interested in the pretty straightforward solution of this problem: |
That could work provided you have no files that are conditionally included. E.g.
Yes, that could work, but requires the user knowing to add keys for those things. And false cache hits are really only recognizable by their timing effects. I'm wondering if instead of a |
glob masking https://github.com/isaacs/node-glob#readme has very expressive syntax that allows to describe virtually any real-life combinations.
yes, i assume some trade off between attempts to make fully automated action and requiring some intellect from the user. I consider full-automation impossible because of too many variants and even having not knowledge about intended build targets. But i expect the users who builds multi-target applications to be savvy guys. So, finally i'd like to get the confirmation " Alternatively i will suggest the workaround with storing the build into the file and include the file into
|
I haven't looked but I'd be willing to bet it cannot take into account Go build constraints. Those are full boolean expressions.
Yes, it can in theory, but in practice specifying all the inputs precisely would be extremely difficult. We still have the problem that the default behaviour is, IMO, incorrect. After the first run, it will fail to overwrite, even if the contents of the relevant source have changed. It will forever after lug around those initial semi-relevant build objects, until garbage collected or |
The only reason this happens is if the relevant sources are not included in the `cache-dependency-path' input. They need to be there to trigger the cache rebuild.
So, you suggestion is to delete |
Yes, by default, I think we should not cache Also, and this is more a matter of opinion, and really needs to be informed by the user, but I don't think caching |
Hello @nfi-hashicorp, sorry for long delay, but it took some time to arrange the scheduled feature, but in turn there's a proposal that could solve most if not all the problems the current implementation of the caching has. There's an early stage of the development, but i want you to be aware of upcoming changes #426. |
IIUC, a job with cache enabled goes roughly like this (with defaults):
go.sum
~/.cache/go-build/
and upload it with that key.The key is:
--- https://github.com/actions/setup-go/blob/main/src/cache-restore.ts#L34
But that doesn't include any inputs from the current module's source (other than
go.sum
)? So really, we're just caching the build outputs of the dependencies?(Or to be more precise, we bundle all of
~/.cache/go-build
, but only whengo.sum
changes. If we did a build of the current module, then the build cache will contain its build objects. But it will never get updated if only the source of the current module changed.)It's unclear if this is intended or an oversight. The
README.md
definitely isn't explicit that the current module's source is not considered WRT to the cache key and whether a save needs to happen.Also, I haven't really thought about it, but I'm not sure how cache test results play into this.
The text was updated successfully, but these errors were encountered: