-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cache per interpreter #62
Conversation
Your issue seems to come from a cache entry computed with a given version of beniget/gast/memestra, and then a usage from another version of gast/beniget/memestra. Theoritically, the interpreter version should not change the semantic computed by memstra, so it shouldn't be part of the key computation. One option here would be to use beniget version in the hash computation (and probably memestra's too?) |
I don't think so. I ran the same exact code with two different interpreters to reproduce.
Yes, but it hides bugs in caching code paths that are different between interpreters such as the ones that are hit when dealing with
I don't think that will help, because it's not a library version issue. |
Do you mean you have the same library version with both interpreters, but still different cache entries for the same file? |
No, the cache isn't different, and that's the issue. The cache is the same, regardless of interpreter. This is problematic because of implementation details that differ in memestra due to different Python versions. This hides code paths that are different because of this different versions. The recent bugs I found in gast and beniget were found after removing the cache and running with Python 3.8. If I hadn't deleted the cache the bug wouldn't have triggered. |
I see nothing that explicitly depends on
My understanding is that if we have a bug in beniget, we get a cache key and we keep it even once the bug in beniget is fixed. Thus my hint toward adding the beniget version in the hash key computation. Note that I'm not saying that you're wrong. I just want to understand the exact origin of the caching issue you're meeting. From an abstract point of view, the cache key is just a hash of the source, and the cache value is (roughly) a list of obsolete functions. This binding should not depend on the interpreter. |
Indeed, but that isn't necessary for a code path that is Python 3.8 specific to be hit while traversing the import graph.
Hm, I don't think I'm explaining the issue clearly enough, because this would not address the issue I'm raising.
Ok, I'm going to write down the exact failure mode, which will hopefully make it clear why I think the interpreter version should be part of the cache structure. You need the previous versions of gast and beniget to reproduce the problem.
The second step here should always fail, even if the source code in the module is byte-for-byte equal for the different interpreters. Hopefully this makes it clear that even if the source code being cached is exactly the same, the code path being taken to generate the cache key isn't necessarily the same. |
This looks like the hash should use as a salt:
Unfortunately, gast and beinget don't provide a |
Cool. Closing this PR then. |
This PR is one implementation of a solution to address #61.
The problem is that when traversing the standard library, memestra can potentially (and does)
cache a standard library module's code from a potentially different version of Python than the
one that created the cache entry. This hides bugs in the cache implementation for different versions
of Python, because the cache will be hit regardless of the version.
The implementation here creates a directory below the top level cache directory that
is equivalent to
sys.executable
with any leading path separators or drives (in the case of Windows)removed.
The effect is that there's a distinct cache per interpreter.
One caveat is that on some systems, an upgrade of the system python, for example
/usr/bin/python3
,can result in a similar bug being hidden on such systems. I'm not sure whether
that case (development environments that use the system Python) is common
enough to warrant addressing in this PR.
Closes #61.