Identical data is loaded into every session wasting memory #3078
Hi @sparrowt, thanks for reaching out. Have you tried sharing a single loader instance across several sessions? For example:

```python
from botocore.loaders import Loader

loader = Loader()
sessions = some_func_that_makes_multiple_sessions()
for session in sessions:
    session.register_component('data_loader', loader)
```

Another option is using a single session to create multiple clients which get passed to the other threads:

```python
session = boto3.session.Session(region_name='us-east-1')
client1 = session.client('s3')
client2 = session.client('someotherservice')

# In one thread
client1.do_something()
# In another thread
client2.do_something()
```
Thanks so much for getting back to me @tim-finnigan. I have not tried that; I assumed
To respond to some of your other points:
Sadly this is not really an option in my case: the app in question is a multi-threaded web server, and it is not possible to predict in advance which boto3 clients any given thread might need. Because Session is not thread safe, each thread has to create its own session in order to create the client(s) it needs. I am already caching that session using
It is 781 KB on disk (only surpassed by a handful of the service definitions); however, loading it into memory in Python results in nearly 6 MB of memory allocation, according to analysis using the Austin profiler, e.g. in the memory allocation profile trace below where I did
Found that each call of

The example was not exactly clear to me, but it did point me in the right direction on what I should be trying, so as a note for others: this got my ~200ms down to ~20ms per call to create an S3 client.
Then, later in your threads, you can use the following for a significantly faster setup time:
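(The exact snippet from this comment is not preserved in this extract; below is a sketch of the kind of setup implied by the surrounding discussion - a process-wide `Loader` shared by per-thread sessions, following the earlier suggestion in this thread. The region and the `get_s3_client` helper are illustrative placeholders, not the author's code.)

```python
import threading

import boto3
import botocore.session
from botocore.loaders import Loader

# One Loader per process: the parsed JSON (endpoints.json, service models, ...)
# is cached inside this single instance.
_SHARED_LOADER = Loader()

_thread_local = threading.local()


def get_s3_client():
    """Return a per-thread S3 client whose session reuses the process-wide Loader."""
    client = getattr(_thread_local, 's3_client', None)
    if client is None:
        # Sessions are not thread safe, so each thread builds its own,
        # but they all share the already-populated loader.
        botocore_session = botocore.session.Session()
        botocore_session.register_component('data_loader', _SHARED_LOADER)
        session = boto3.session.Session(botocore_session=botocore_session)
        client = session.client('s3', region_name='us-east-1')  # placeholder region
        _thread_local.s3_client = client
    return client
```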
By default, a botocore session creates and caches an instance of JSONDecoder which consumes a lot of memory. This issue was reported here boto/botocore#3078. In the context of triggers which use boto sessions, this can result in excessive memory usage and as a result reduced capacity on the triggerer. We can reduce memory footprint by sharing the loader instance across the sessions.
Describe the bug
Each botocore `Session` creates its own instance of a `Loader`, within which JSON content loaded from `botocore/data/` is cached by `@instance_cache`, e.g. on methods like `load_service_model` & `load_data_with_path`.

This caching applies to many things, including loading endpoints.json into an `EndpointResolver`, which happens in every session and results in approx 6 MB of memory allocation (to load details of all the HTTP endpoints for every region/partition for all 300+ AWS services).

The JSON files shipped with botocore presumably do not change on disk at runtime. Nevertheless, if you create several sessions within a process - e.g. in a multi-threaded app, because sessions are not thread safe - this exact same data is loaded into memory multiple times and cached separately in each `Session`'s `Loader` and in its `EndpointResolver`.

It seems therefore like a bug (of the wasteful memory usage variety) that the immutable JSON cache is per-session rather than per-process. In a multi-threaded app in a resource-constrained environment, every 6 MB really adds up.
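A small sketch (not from the original report) showing where the duplication described above appears, using botocore's `get_component` and `Loader.load_data` APIs:

```python
import botocore.session

# Two independent sessions, as a multi-threaded app would create them.
s1 = botocore.session.get_session()
s2 = botocore.session.get_session()

# Each session lazily builds its own Loader component...
loader1 = s1.get_component('data_loader')
loader2 = s2.get_component('data_loader')
assert loader1 is not loader2

# ...so the same endpoints.json is parsed and cached once per session:
endpoints1 = loader1.load_data('endpoints')
endpoints2 = loader2.load_data('endpoints')
assert endpoints1 == endpoints2      # identical content
assert endpoints1 is not endpoints2  # but two separate copies in memory
```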
Expected Behavior
When creating a 2nd (and any subsequent) Session, the data which has already been loaded from endpoints.json should be re-used, quickly and without unnecessary extra memory allocation.

Current Behavior
Instead, each new session actually loads the whole thing in again, resulting in another ~6 MB of memory usage each time, storing it in a new `EndpointResolver` (and `Loader`) with the new `Session`. (The same issue exists for other JSON data such as service definitions, but I'm just focussing on the most common & most impactful example that I observed.)

Reproduction Steps
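The reproduction snippet from the original issue is not preserved in this extract; the following is a minimal sketch of one way to observe the behaviour (the dummy credentials and region are placeholders):

```python
import tracemalloc

import botocore.session

tracemalloc.start()
sessions = []

for i in range(3):
    before, _ = tracemalloc.get_traced_memory()
    session = botocore.session.get_session()
    # Creating a client forces the session's Loader/EndpointResolver to load endpoints.json.
    session.create_client(
        's3',
        region_name='us-east-1',
        aws_access_key_id='dummy',
        aws_secret_access_key='dummy',
    )
    sessions.append(session)  # keep the session (and its caches) alive
    after, _ = tracemalloc.get_traced_memory()
    print(f"session {i}: ~{(after - before) / 1e6:.1f} MB allocated")
```

Per the description above, each iteration allocates roughly the same several MB rather than reusing the data already parsed by the first session.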
Possible Solution
One solution would be to make the `Loader` process-wide, with suitable locking on state as necessary. I imagine the small extra overhead is more than paid for by the memory savings if many sessions/clients are created.

A more radical alternative would be for the pre-processing step that generates botocore/data/ to emit, instead of each JSON file, a Python module (.py file) containing a dict with the same data. Then `Loader` doesn't have to load JSON; it just lazily imports the Python files it needs, and Python's `importlib` gives you the process-wide sharing and thread safety for free (see the sketch below). I imagine this would be a much more difficult change, having seen the existence of things like CUSTOMER_DATA_PATH (~/.aws/models/), so it may not be feasible - but I've included it if nothing else for hypothetical comparison and to illustrate the principle of the problem.
Additional Information/Context
boto/boto3#1670 is very related - this ticket is an attempt at a detailed description of why each session increases memory usage so much and how this might be avoided.
Any
SDK version used
botocore==1.33.1 boto3==1.33.1
Environment details (OS name and version, etc.)
Windows 10 (Ubuntu WSL) / same happens on Amazon Linux 2