Share data loader across asyncio boto sessions #40658

Merged
17 changes: 16 additions & 1 deletion airflow/providers/amazon/aws/hooks/base_aws.py
@@ -70,6 +70,19 @@

     from airflow.models.connection import Connection  # Avoid circular imports.
 
+_loader = botocore.loaders.Loader()
+"""
+botocore data loader to be used with async sessions
+
+By default, a botocore session creates and caches an instance of JSONDecoder which
+consumes a lot of memory. This issue was reported here https://github.com/boto/botocore/issues/3078.
+In the context of triggers which use boto sessions, this can result in excessive
+memory usage and as a result reduced capacity on the triggerer. We can reduce
+memory footprint by sharing the loader instance across the sessions.
+
+:meta private:
+"""
+
 
 class BaseSessionFactory(LoggingMixin):
     """
@@ -155,7 +168,9 @@ def _apply_session_kwargs(self, session):
     def get_async_session(self):
         from aiobotocore.session import get_session as async_get_session
 
-        return async_get_session()
+        session = async_get_session()
+        session.register_component("data_loader", _loader)
+        return session
 
     def create_session(
         self, deferrable: bool = False