Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: cache activation environment variables #1832

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

borchero
Copy link
Contributor

Motivation

Resolves #973.

This PR obviously misses some tests, I just wanted to check in early if this goes into the right direction.

@ruben-arts
Copy link
Contributor

Thanks @borchero, cool feature. I'm a little worried about the invalidation as on windows this might not work as well. @baszalmstra can you give your feedback.

@baszalmstra
Copy link
Contributor

I like this a lot! Thanks for the contribution.

The biggest problem is that some of these environment variables are initialized from system environment variables. For instance, when you add the compiler activation scripts on Windows the path to a specific MSVC compiler is added. However, when a new update of the compiler is installed (which happens from time to time) this path is no longer valid. The same goes for the system PATH variable, this can be updated but the changes would not be reflected in our cached entry.

I think it would be good to add a time-to-live to the cache entry so that the cache entry will be recomputed after a period of time (30 minutes?).

Maybe we can also invalidate the cache if certain environment variables no longer match what they were when the cache was created? I think the PATH variable is a prime example.

The entries should also be cleared on pixi clean, or is that already the case?

@borchero
Copy link
Contributor Author

I think it would be good to add a time-to-live to the cache entry so that the cache entry will be recomputed after a period of time (30 minutes?).

I'm not sure this is the best solution to be honest. I think it introduces hard-to-debug issues that are non-deterministic over time 👀

That being said, I think it's important that we identify a set of items that need to be hashed. The lockfile is an obvious one. Using the PATH environment variable also makes sense to me. Eventually, I'm unsure how one can get around non-deterministic activation scripts (such as the compiler activation scripts you mentioned) 👀

@baszalmstra
Copy link
Contributor

Yeah I agree a TTL is very much suboptimal.

Maybe we can start by identifying a larger set of environment variables to use as input? I guess its less anoying when you have a stale cache than an invalid one.

@borchero
Copy link
Contributor Author

Maybe we can start by identifying a larger set of environment variables to use as input

That'd be great 😄 do you have any proposal where to start with this? I personally know very little about Windows and that's usually where most issues arise 🙃

@baszalmstra
Copy link
Contributor

If you start the implementation and have a list somewhere Ill investigate what we need on windows!

@baszalmstra baszalmstra self-assigned this Aug 20, 2024
@ruben-arts
Copy link
Contributor

@borchero are you planning to work on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cache environment activation
3 participants