Ability to skip JSON parsing for DynamoDB get_item, query, and the like #3238

Closed
1 of 2 tasks
pechersky opened this issue Aug 13, 2024 · 5 comments
Labels
feature-request This issue requests a feature.

Comments

@pechersky

Describe the feature

When calling result = session.client("dynamodb").query(**kwargs), provide an option to skip JSON parsing of the response, to speed up collection of results.

Use Case

We load on the order of thousands of items from DynamoDB using aioboto3/aiobotocore. Each item is ~16KB; we combine and filter the items and forward them to a client. Profiling the retrieval loop with pyinstrument shows that AioJSONParser.parse is the largest contributor to the runtime.
With plain botocore, 1000 DynamoDB queries took 50 s, with 20% of that time spent inside JSONParser.parse:
image

Proposed Solution

One possibility is a keyword argument on client and/or resource calls that turns off parsing, so that the user receives the raw body of the response.

Other Information

botocore:

import json
from botocore.session import get_session

def main():
    session = get_session()
    # kwargs.json holds a list of 1000 keyword-argument dicts for client.query
    with open("kwargs.json") as f:
        query_kwarg_list = json.load(f)
    assert len(query_kwarg_list) == 1000
    client = session.create_client("dynamodb")
    # Run the queries sequentially; each call blocks until its response is parsed
    results = [client.query(**kwargs) for kwargs in query_kwarg_list]
    assert len(results) == 1000


if __name__ == "__main__":
    main()

aiobotocore:

import asyncio
import json
from aiobotocore.session import get_session

async def main():
    session = get_session()
    # kwargs.json holds a list of 1000 keyword-argument dicts for client.query
    with open("kwargs.json") as f:
        query_kwarg_list = json.load(f)
    assert len(query_kwarg_list) == 1000
    async with session.create_client("dynamodb") as client:
        # Create all query coroutines first, then run them concurrently
        tasks = [client.query(**kwargs) for kwargs in query_kwarg_list]
        results = await asyncio.gather(*tasks)
    assert len(results) == 1000


if __name__ == "__main__":
    asyncio.run(main())

Cf: aio-libs/aiobotocore#1132

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

SDK version used

1.34.131 and higher

Environment details (OS name and version, etc.)

python 3.10, 3.11; Ubuntu 20.04

@pechersky pechersky added the feature-request and needs-triage labels Aug 13, 2024
@pechersky
Author

The developer of aiobotocore suggested:

May I advise to raise an issue with botocore to improve performance of JSON coding, e.g. by supporting alternative JSON libraries such as orjson? That would benefit users of both botocore and aiobotocore.
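
On the caller side, the idea of swapping in an alternative JSON library could look roughly like the following. This is a minimal sketch, not botocore code: `fast_loads` is a hypothetical helper name, and orjson is treated as an optional dependency with the standard `json` module as the fallback.

```python
import json

try:
    # orjson is an optional third-party library; its loads() accepts bytes or str.
    import orjson
    _loads = orjson.loads
except ImportError:
    # Fall back to the standard library when orjson is not installed.
    _loads = json.loads


def fast_loads(body):
    """Decode a JSON response body (hypothetical helper, not a botocore API)."""
    return _loads(body)


# A DynamoDB-style body as bytes, the way it arrives over the wire.
raw = b'{"Items": [{"pk": {"S": "user#1"}, "n": {"N": "42"}}], "Count": 1}'
parsed = fast_loads(raw)
```

Both decoders return plain dicts and lists here, so the rest of the caller's code does not need to know which one was used.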

@pechersky
Author

pechersky commented Aug 13, 2024

Further inspection indicates that the time is not even spent in JSON decoding, but rather in the shape parsing:
image
image
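
One way to separate the two costs is to time raw JSON decoding against a per-value walk over the decoded structure. The sketch below is an illustration only: `make_item` and `walk_attribute` are hypothetical names, the walk is a toy stand-in for a shape-driven parser rather than botocore's implementation, and the synthetic item is sized to roughly match the ~16KB items described in the use case.

```python
import json
import time
from decimal import Decimal


def make_item(n_attrs=350):
    # Synthetic DynamoDB-style item: every attribute is a typed value such as {"S": "..."}.
    item = {f"attr{i}": {"S": "x" * 24} for i in range(n_attrs)}
    item["nested"] = {"M": {"count": {"N": "7"}, "tags": {"L": [{"S": "a"}, {"S": "b"}]}}}
    return item


def walk_attribute(value):
    # Toy stand-in for shape parsing: visit every typed value and convert it.
    (type_tag, inner), = value.items()
    if type_tag == "M":
        return {k: walk_attribute(v) for k, v in inner.items()}
    if type_tag == "L":
        return [walk_attribute(v) for v in inner]
    if type_tag == "N":
        return Decimal(inner)
    return inner  # "S" and any other type is passed through unchanged


body = json.dumps({"Items": [make_item()], "Count": 1}).encode()

# Cost of JSON decoding alone.
start = time.perf_counter()
for _ in range(200):
    decoded = json.loads(body)
json_seconds = time.perf_counter() - start

# Cost of walking the already-decoded structure value by value.
start = time.perf_counter()
for _ in range(200):
    items = [{k: walk_attribute(v) for k, v in item.items()} for item in decoded["Items"]]
walk_seconds = time.perf_counter() - start

print(f"json.loads: {json_seconds:.4f}s, per-value walk: {walk_seconds:.4f}s")
```

The absolute numbers depend on the machine and say nothing about botocore itself; the point is only that the two phases can be measured independently before deciding which one to optimize.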

@pechersky
Author

Likely relevant issue: boto/boto3#2928

@tim-finnigan tim-finnigan self-assigned this Aug 20, 2024
@tim-finnigan
Contributor

Thanks for reaching out. I brought this issue up for discussion with the team, and the consensus was that there are no plans to change the current behavior. SDKs like Boto3 rely on the JSON parsing — you would need to call service APIs directly in order to get the raw responses. We can continue tracking the issue in boto/boto3#2928 for now to get more feedback and explore potential optimizations.
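
For context, calling the service API directly means building the DynamoDB JSON request yourself. The sketch below assembles an unsigned Query request with the standard library only. `build_query_request`, the table name, and the key names are hypothetical, the region is an assumption, and the Signature Version 4 signing that a real request needs is deliberately left out.

```python
import json
import urllib.request


def build_query_request(region, query_kwargs):
    # Hypothetical helper: DynamoDB uses the AWS JSON 1.0 protocol, where the
    # operation is selected by the X-Amz-Target header rather than the URL path.
    body = json.dumps(query_kwargs).encode("utf-8")
    return urllib.request.Request(
        url=f"https://dynamodb.{region}.amazonaws.com/",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-amz-json-1.0",
            "X-Amz-Target": "DynamoDB_20120810.Query",
        },
    )


request = build_query_request(
    "us-east-1",  # assumed region
    {
        "TableName": "example-table",  # hypothetical table
        "KeyConditionExpression": "pk = :pk",
        "ExpressionAttributeValues": {":pk": {"S": "user#1"}},
    },
)
# The request is not sent here: without a SigV4 Authorization header the
# service would reject it. Once signed and sent, the response body can be
# kept as raw bytes and decoded (or forwarded) however the caller prefers.
```

This route trades away the SDK's retries, error mapping, and pagination helpers, so it is only worth considering when parsing overhead is the measured bottleneck.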

@tim-finnigan tim-finnigan closed this as not planned Aug 20, 2024
@tim-finnigan tim-finnigan removed the needs-triage label Aug 20, 2024

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
