
Switch S3 writes to use streaming I/O #51

Merged: 11 commits into PelicanPlatform:main on Dec 5, 2024

Conversation

bbockelm
Collaborator

We currently buffer the data from the client until enough is in memory to form a full "part" that we send to S3 in one call.

As long as we know the full size of the object, we can simply stream the data as it comes in from the client; there is no need to hold it in a buffer. This PR starts the transition to streaming mode.

Note that if the client doesn't provide the full file size, then we don't know the size of the part we are uploading (will it be a full 100MB? Will it be shorter?); uploading a part of unknown size is a no-no on the AWS side.

Hence, the final solution is unfortunately likely to require both approaches. We can, however, impose a requirement like "if you don't provide the final output data size, we will buffer at most 10MB and you'll have a correspondingly limited object size".

TODO items to finish off this branch:

  1. Run under valgrind, hunt down the occasional segfault (possibly a race condition -- goes away if you run one test at a time or have large volumes of debugging enabled).
  2. Small-object optimization: when the file size is provided and the entire file is provided in a single write call, do a single PUT -- no start / upload / complete of the S3 multipart protocol required.
  3. Add back the buffering for objects of unknown size.

@bbockelm
Collaborator Author

Item 4: We also need to decide when to fail out an in-progress operation.

We currently pause the handle when we're done with the current buffer -- but it's not clear what happens when the client starts some writes and then never returns.

For example, I think the file is automatically closed by XRootD, but our destructor never cancels or fails the workflow. Needs investigation.

@bbockelm bbockelm force-pushed the streaming_io_v2 branch 3 times, most recently from 372cbba to 87f5a7c Compare November 30, 2024 01:55
@bbockelm
Collaborator Author

OK -- all four items outlined above are complete!

  1. Valgrind comes out clean and there haven't been any more intermittent crashes.
  2. Small object optimization is complete. If uploads are done in a single Write call, then no multipart upload is attempted.
  3. Objects of unknown size are now buffered. [XrdHttp] Set oss.asize if object size is known (xrootd/xrootd#2378) sets the upload size for most HTTP use cases.
  4. We now declare transfers as "stalled" if no call to Write occurs within 10 seconds after the last one. Unit test for this case is included.

@jhiemstrawisc jhiemstrawisc self-requested a review December 2, 2024 19:45
src/CurlUtil.cc Outdated
@@ -223,7 +236,7 @@ void CurlWorker::Run() {
 				op->Fail(op->getErrorCode(), op->getErrorMessage());
 			}
 		} catch (...) {
-			m_logger.Log(LogMask::Debug, "CurlWorker",
+			m_logger.Log(LogMask::Debug, "Run",
 				     "Unable to setup the curl handle");
Member
"setup" is usually used as a noun, I think we want "set up" here.

src/CurlUtil.cc (resolved conversation)

// Return whether or not the request has timed out since the last
// call to send more data.
bool Timeout() const { return m_timeout; }
Member
Most of your other setters/getters include a verb and that makes them a little easier to read. Maybe this can be GetTimeout()?

test/s3_unit_tests.cc (resolved conversation)
This provides some minimal test cases for testing the S3 write code.
Most significantly, it switches the debug dump of the libcurl interaction
to use the XRootD logging framework instead of printing directly to stderr.

This refactors the request logic to allow requests to be continued
over multiple calls.  The calling thread will regain control when
the buffer has been completely consumed (even if the full operation
will require additional buffers).

Note this only works if the client provides the full file size.

We want to always unpause a given operation from the same thread that
is handling it.  If a separate thread can pick up the operation, there
is a race condition where both the original one and new one are operating
on the same `CURL *` handle at the same time, resulting in an observed
segfault.

This commit introduces a separate "unpause" queue for each curl worker;
this queue is notified by the parent when there is additional data
available.

If an entire part is a single libcurl operation, a client that starts
writing - but then leaves for an extended period of time - will leave
dangling references inside libcurl (eventually exhausting the number
of allowable transfers).

This commit adds a background thread for S3File that will go through
all pending uploads and check to see if they are still live; if not,
then it'll timeout the operation.

After the notification is done, the request may be deleted by the
owning S3File instance.  Do not call `Notify` from within the curl
result processing function as the request object needs to be alive
to release the curl handle.

If the client doesn't pre-declare the size of the object it will
write, we don't know the size of the last part of the upload.  Hence,
we must switch back to buffer mode in this case.

If the entire object is uploaded during a single `Write` call, then
skip the multipart upload and just do a single non-buffered upload.

The unit test refactoring left copy/pasted code.  This commit splits
the common piece into a single header, allowing `s3_tests.cc` to keep
the unit tests that utilize AWS S3 while `s3_unit_tests.cc` uses the
minio instance started up by ctest.
@jhiemstrawisc (Member) left a comment

LGTM!

@jhiemstrawisc jhiemstrawisc merged commit 87a041b into PelicanPlatform:main Dec 5, 2024
3 checks passed