S3 put_object should accept a block to facilitate chunked writes #3142

Open
1 of 2 tasks
ezekg opened this issue Nov 14, 2024 · 7 comments
Labels
feature-request A feature should be added or improved. needs-major-version Can only be considered for the next major release p2 This is a standard priority issue

Comments

@ezekg

ezekg commented Nov 14, 2024

Describe the feature

After using get_object's chunked read, I assumed put_object similarly supported chunked writing:

client.put_object(bucket: blob.bucket, key: blob.key) do |buffer|
  while chunk = blob.read(16 * 256)
    buffer << chunk
  end
end

For reference, get_object supports this:

client.get_object(bucket: blob.bucket, key: blob.key) do |chunk|
  buffer << chunk
end

But this isn't currently supported and results in an empty object, since the block is ignored.
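The silent failure comes from ordinary Ruby semantics: a method that never yields to or captures its block simply discards it, without raising. A minimal sketch, where `put_without_block` is a hypothetical stand-in and not the SDK's actual `put_object`:

```ruby
# Hypothetical stand-in for a method that, like put_object in this report,
# neither declares nor yields to a block. Not the SDK's implementation.
def put_without_block(body: "")
  body.bytesize
end

called = false
size = put_without_block do |buffer|
  called = true
  buffer << "data"
end

# The block never ran, so nothing was written and no error was raised.
puts "block called: #{called}, bytes written: #{size}"
```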

Use Case

I want to write an IO to S3 while maintaining a low memory footprint and being explicit about how much I read for each chunk. I do not want to rely on S3 internals to choose how large my chunks should be.

Proposed Solution

As with get_object, allow put_object to accept a block that yields the internal request body.

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

SDK version used

1.113.0

Environment details (OS name and version, etc.)

Linux 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

@ezekg ezekg added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Nov 14, 2024
@mullermp
Contributor

@RanVaknin RanVaknin added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Nov 15, 2024

This issue has not received a response in 1 week. If you still think there is a problem, please leave a comment to prevent the issue from closing automatically.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Nov 25, 2024
@ezekg
Author

ezekg commented Nov 25, 2024

> Hi, have you looked at https://github.com/aws/aws-sdk-ruby/blob/version-3/gems/aws-sdk-s3/lib/aws-sdk-s3/customizations/object.rb#L385

Thanks for this. I wasn't aware of that method. I'm curious if we could make put_object with a block delegate to upload_stream?

@mullermp
Contributor

I'm not sure if that's possible without a breaking change within the major version. The block is already reserved to be a response target here: https://github.com/aws/aws-sdk-ruby/blob/version-3/gems/aws-sdk-core/lib/seahorse/client/request.rb#L70. There would be no way to tell whether a block is for reading or writing, and the API would be inconsistent.

@mullermp mullermp added needs-major-version Can only be considered for the next major release and removed closing-soon This issue will automatically close in 4 days unless further comments are made. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. p2 This is a standard priority issue labels Nov 25, 2024
@mullermp
Contributor

I believe you can also pass an IO as the body for put_object and it will be read. I'll leave this as an open feature request but I think the interface would have to be different.

@ezekg
Author

ezekg commented Nov 25, 2024

> I'm not sure if that's possible without a breaking change within the major version. The block is already reserved to be a response target here: https://github.com/aws/aws-sdk-ruby/blob/version-3/gems/aws-sdk-core/lib/seahorse/client/request.rb#L70. There would be no way to tell whether a block is for reading or writing, and the API would be inconsistent.

The put_object method does not currently take a block or pass it along to send_request, so I don't think introducing a block that is used for streaming writes would be a breaking change. I do understand that the internals of put_object would need to be refactored, but I don't see any apparent breaking changes for the public put_object API.

I am currently passing an IO body that streams the data as required (well, as much as I can from the outside). I just thought the block interface would be a nicer and clearer DX, since it would align well with assumptions carried over from get_object.
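One way to get a block-style writer while still handing over an IO is to run the block in a producer thread that writes into a pipe. A minimal sketch under stated assumptions: `block_to_io` is a hypothetical helper, not part of aws-sdk-s3, and a `StringIO` stands in for `blob`. Whether put_object accepts an unsized, non-rewindable pipe as its body is not confirmed in this thread, so the consumer here is a plain `read`.

```ruby
require "stringio"

# Hypothetical helper (not part of aws-sdk-s3): runs the block in a producer
# thread that writes into a pipe, and returns the readable end plus the thread.
def block_to_io(&block)
  reader, writer = IO.pipe
  reader.binmode
  writer.binmode
  producer = Thread.new do
    begin
      block.call(writer)
    ensure
      writer.close # closing the write end signals EOF to the reader
    end
  end
  [reader, producer]
end

blob = StringIO.new("a" * 10_000) # stand-in for the source IO

body, producer = block_to_io do |buffer|
  # The caller decides how much to read per chunk, as in the proposal.
  while (chunk = blob.read(16 * 256))
    buffer << chunk
  end
end

# Stand-in for whatever consumes the body; an uploader would read it here.
received = body.read
producer.join
body.close
puts received.bytesize
```

The pipe keeps memory bounded because the producer blocks once the pipe buffer fills, but the resulting IO has no size and cannot be rewound, which may matter for content length and retries.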

@mullermp
Contributor

mullermp commented Nov 25, 2024

This could be done by checking the streaming input modeling on the operation. However, this could make for an inconsistent API, where some operations use the block to stream requests and others to stream responses. Additionally, writing from the block would be very complex: Net::HTTP's body writing would have to yield to the block, and I believe that would be inefficient. Our current request building would need to differentiate between block types. Currently, the IO body is passed to Net::HTTP's body stream, which uses IO.copy_stream (written in C), so the stream is already read in chunks. I can leave this open as a feature request to consider.
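Given that the body is consumed through IO.copy_stream, chunk size can be influenced from the outside with an IO-like wrapper that caps each read. A minimal sketch, assuming the source is not a real `IO`, in which case IO.copy_stream (as far as I understand it) falls back to calling `read(length, outbuf)` on the object. `CappedReader` is a hypothetical class, not part of the SDK, and a real request body would likely also need `size` and `rewind`.

```ruby
require "stringio"

# Hypothetical wrapper (not part of aws-sdk-s3): caps how many bytes each
# read call may return and records the size of every chunk handed out.
class CappedReader
  attr_reader :read_sizes

  def initialize(io, chunk_size)
    @io = io
    @chunk_size = chunk_size
    @read_sizes = []
  end

  # Honors outbuf, since IO.copy_stream writes out the buffer it passes in.
  def read(length = nil, outbuf = nil)
    capped = length ? [length, @chunk_size].min : nil
    chunk = @io.read(capped, outbuf)
    @read_sizes << chunk.bytesize if chunk && !chunk.empty?
    chunk
  end
end

source = StringIO.new("x" * 10_000)
reader = CappedReader.new(source, 4096)
sink = StringIO.new # stand-in for the socket Net::HTTP would write to

copied = IO.copy_stream(reader, sink)
puts "copied #{copied} bytes in reads of #{reader.read_sizes.inspect}"
```

This only sets an upper bound per read; the consumer can still ask for less than the cap.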

@RanVaknin RanVaknin added the p2 This is a standard priority issue label Nov 25, 2024