S3 put_object should accept a block to facilitate chunked writes #3142

ezekg · 2024-11-14T22:34:45Z

Describe the feature

After using get_object's chunked read, I assumed put_object similarly supported chunked writing:

client.put_object(bucket: blob.bucket, key: blob.key) do |buffer|
  while chunk = blob.read(16 * 256)
    buffer << chunk
  end
end

For reference, get_object supports this:

client.get_object(bucket: blob.bucket, key: blob.key) do |chunk|
  buffer << chunk
end

But this isn't currently supported and results in an empty object, since the block is ignored.

Use Case

I want to write an IO to S3 with a low memory footprint while being explicit about how much I read for each chunk. I do not want to rely on S3 internals to choose how large my chunks should be.

Proposed Solution

As with get_object, allow put_object to accept a block that yields the internal request body.

Other Information

No response

Acknowledgements

I may be able to implement this feature request
This feature might incur a breaking change

SDK version used

1.113.0

Environment details (OS name and version, etc.)

Linux 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

mullermp · 2024-11-14T23:57:49Z

Hi, have you looked at https://github.com/aws/aws-sdk-ruby/blob/version-3/gems/aws-sdk-s3/lib/aws-sdk-s3/customizations/object.rb#L385

github-actions · 2024-11-25T00:00:51Z

This issue has not received a response in 1 week. If you still think there is a problem, please leave a comment to prevent the issue from automatically closing.

ezekg · 2024-11-25T00:08:55Z

> Hi, have you looked at https://github.com/aws/aws-sdk-ruby/blob/version-3/gems/aws-sdk-s3/lib/aws-sdk-s3/customizations/object.rb#L385

Thanks for this. I wasn't aware of that method. I'm curious if we could make put_object with a block delegate to upload_stream?
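
For readers following along, the method discussed here is Aws::S3::Object#upload_stream, which yields a writable stream to its block. Below is a minimal sketch of the chunked write from the original request, assuming `object` is an Aws::S3::Object and `source` is any readable IO. The helper name `stream_to_s3` and its `chunk_size` parameter are made up for illustration and are not part of the SDK.

```ruby
# Hypothetical helper (not part of the SDK): copies `source` to S3 in
# caller-sized chunks via Aws::S3::Object#upload_stream.
#
# `object` is assumed to be an Aws::S3::Object, for example:
#   require "aws-sdk-s3"
#   object = Aws::S3::Object.new(bucket_name: "my-bucket", key: "my-key")
def stream_to_s3(object, source, chunk_size: 16 * 256)
  object.upload_stream do |write_stream|
    # IO#read(n) returns nil at EOF, which ends the loop.
    while (chunk = source.read(chunk_size))
      write_stream << chunk
    end
  end
end
```

Note that upload_stream performs a multipart upload, so `chunk_size` here only controls how much is read from the source at a time. The size of the parts sent to S3 is governed separately (the method accepts a `:part_size` option).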

mullermp · 2024-11-25T18:55:57Z

I'm not sure if that's possible without a breaking change within the major version. The block is already reserved to be a response target here: https://github.com/aws/aws-sdk-ruby/blob/version-3/gems/aws-sdk-core/lib/seahorse/client/request.rb#L70. There would be no way to differentiate that a block is for reading or writing and would be inconsistent.

mullermp · 2024-11-25T18:57:43Z

I believe you can also pass an IO as the body for put_object and it will be read. I'll leave this as an open feature request but I think the interface would have to be different.
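
The IO-body approach mentioned above can be sketched as follows, assuming `client` is an Aws::S3::Client. The helper name `put_io` is hypothetical, and the bucket, key, and path in the usage comment are placeholders.

```ruby
# Hypothetical helper (not part of the SDK): passes an open IO as the
# request body so that, per the comment above, the SDK reads from it
# rather than requiring the whole payload as a String.
def put_io(client, bucket:, key:, io:)
  client.put_object(bucket: bucket, key: key, body: io)
end

# Usage, assuming the aws-sdk-s3 gem and configured credentials:
#   require "aws-sdk-s3"
#   client = Aws::S3::Client.new
#   File.open("/path/to/blob", "rb") do |file|
#     put_io(client, bucket: "my-bucket", key: "my-key", io: file)
#   end
```

With this shape the caller does not choose the chunk size; how much is read per call is left to the SDK and the HTTP layer.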

ezekg · 2024-11-25T19:08:49Z

> I'm not sure if that's possible without a breaking change within the major version. The block is already reserved to be a response target here: https://github.com/aws/aws-sdk-ruby/blob/version-3/gems/aws-sdk-core/lib/seahorse/client/request.rb#L70. There would be no way to differentiate that a block is for reading or writing and would be inconsistent.

The put_object method does not currently take a block or pass it along to send_request, so I don't think introducing a block that is used for streaming writes would be a breaking change. I do understand that the internals of put_object would need to be refactored, but I don't see any apparent breaking changes for the public put_object API.

I am currently passing an IO body that streams the data as required (well, as much as I can from the outside). I just thought the block interface would be a nicer and clearer DX, since it'd align well with assumptions from using get_object.

mullermp · 2024-11-25T19:28:18Z

This could be done by checking the streaming input modeling on the operation. However, this could be an inconsistent API, where some operations use the block to stream requests and others use it to stream responses. Additionally, writing from the block would be very complex: Net::HTTP body writing would have to yield to the block, and I believe that would be inefficient. Our current build request would need to differentiate block types. Currently, the IO body is passed to Net::HTTP's body stream, which uses IO.copy_stream (written in C), and the stream is already read in chunks. I can leave this open as a feature request to consider.
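
To illustrate the last point without touching S3, here is a minimal standalone sketch of IO.copy_stream pulling from an IO-like source in chunks. The `ReadLogger` class is made up for this example, and the sketch makes no claim about the exact chunk size that Ruby or Net::HTTP picks.

```ruby
require "stringio"

# Hypothetical wrapper: delegates to an underlying IO and records how many
# bytes each read call returned, so the chunking becomes visible.
class ReadLogger
  attr_reader :chunk_sizes

  def initialize(io)
    @io = io
    @chunk_sizes = []
  end

  # For a non-IO source that only defines `read`, IO.copy_stream calls
  # read(length, outbuf) repeatedly until it returns nil.
  def read(length = nil, outbuf = nil)
    data = outbuf ? @io.read(length, outbuf) : @io.read(length)
    @chunk_sizes << data.bytesize if data
    data
  end
end

payload = "x" * 100_000
source = ReadLogger.new(StringIO.new(payload))
sink = StringIO.new

# Returns the number of bytes copied.
copied = IO.copy_stream(source, sink)
```

After the copy, `source.chunk_sizes` lists the size of each chunk that was read, which shows that the caller has no say in those sizes when handing over a plain IO body.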

ezekg added the feature-request (A feature should be added or improved.) and needs-triage (This issue or PR still needs to be triaged.) labels on Nov 14, 2024

RanVaknin added the response-requested (Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.) and p2 (This is a standard priority issue) labels and removed the needs-triage (This issue or PR still needs to be triaged.) label on Nov 15, 2024

github-actions bot added the closing-soon (This issue will automatically close in 4 days unless further comments are made.) label on Nov 25, 2024

RanVaknin added the p2 (This is a standard priority issue) label on Nov 25, 2024
