
Switch S3 writes to use streaming I/O #51

Merged: 11 commits into PelicanPlatform:main on Dec 5, 2024

Conversation

bbockelm
Collaborator

We currently buffer the data from the client until enough is in memory to form a full "part" that we send to S3 in one call.

As long as we know the full size of the object, we can simply stream the data as it comes in from the client; there is no need to hold it in a buffer. This PR starts the transition to streaming mode.

Note that if the client doesn't provide the full file size, then we don't know the size of the part we are uploading (will it be a full 100MB? Will it be shorter?); uploading a part of unknown size is a no-no on the AWS side.

Hence, the final solution is unfortunately likely to require both approaches. We can, however, impose a requirement like "if you don't provide the final output data size, we will buffer at most 10MB and you'll have a correspondingly limited object size".

TODO items to finish off this branch:

  1. Run under valgrind, hunt down the occasional segfault (possibly a race condition -- goes away if you run one test at a time or have large volumes of debugging enabled).
  2. Small-object optimization: when the file size is provided and the entire file is provided in a single write call, do a single PUT -- no start / upload / complete of the S3 multipart protocol required.
  3. Add back the buffering for objects of unknown size.

@bbockelm
Collaborator Author

Item 4: We also need to decide when to fail out an in-progress operation.

We currently pause the handle when we're done with the current buffer -- but it's not clear what happens when the client starts some writes and then never returns.

For example, I think the file is automatically closed by XRootD, but our destructor never cancels or fails the workflow. Needs investigation.

@bbockelm bbockelm force-pushed the streaming_io_v2 branch 3 times, most recently from 372cbba to 87f5a7c Compare November 30, 2024 01:55
@bbockelm
Collaborator Author

OK -- all four items outlined above are complete!

  1. Valgrind comes out clean and there haven't been any more intermittent crashes.
  2. Small object optimization is complete. If uploads are done in a single Write call, then no multipart upload is attempted.
  3. Objects of unknown size are now buffered. [XrdHttp] Set oss.asize if object size is known (xrootd/xrootd#2378) sets the upload size for most HTTP use cases.
  4. We now declare transfers as "stalled" if no call to Write occurs within 10 seconds after the last one. Unit test for this case is included.

@jhiemstrawisc jhiemstrawisc self-requested a review December 2, 2024 19:45
src/CurlUtil.cc Outdated
@@ -223,7 +236,7 @@ void CurlWorker::Run() {
 				op->Fail(op->getErrorCode(), op->getErrorMessage());
 			}
 		} catch (...) {
-			m_logger.Log(LogMask::Debug, "CurlWorker",
+			m_logger.Log(LogMask::Debug, "Run",
 				     "Unable to setup the curl handle");
Member
"setup" is usually used as a noun, I think we want "set up" here.

src/CurlUtil.cc (resolved conversation)

// Return whether or not the request has timed out since the last
// call to send more data.
bool Timeout() const { return m_timeout; }
Member
Most of your other setters/getters include a verb and that makes them a little easier to read. Maybe this can be GetTimeout()?

test/s3_unit_tests.cc (resolved conversation)
This provides some minimal test cases for testing the S3 write code.
Most significantly, it switches the debug dump of the libcurl interaction
to use the XRootD logging framework instead of printing directly to stderr.

This refactors the request logic to allow requests to be continued
over multiple calls.  The calling thread will regain control when
the buffer has been completely consumed (even if the full operation
will require additional buffers).

Note this only works if the client provides the full file size.

We want to always unpause a given operation from the same thread that
is handling it.  If a separate thread can pick up the operation, there
is a race condition where both the original one and new one are operating
on the same `CURL *` handle at the same time, resulting in an observed
segfault.

This commit introduces a separate "unpause" queue for each curl worker;
this queue is notified by the parent when there is additional data
available.

If an entire part is a single libcurl operation, a client that starts
writing - but then leaves for an extended period of time - will leave
dangling references inside libcurl (eventually exhausting the number
of allowable transfers).

This commit adds a background thread for S3File that will go through
all pending uploads and check to see if they are still live; if not,
then it'll timeout the operation.

After the notification is done, the request may be deleted by the
owning S3File instance.  Do not call `Notify` from within the curl
result processing function as the request object needs to be alive
to release the curl handle.

If the client doesn't pre-declare the size of the object it will
write, we don't know the size of the last part of the upload.  Hence,
we must switch back to buffer mode in this case.

If the entire object is uploaded during a single `Write` call, then
skip the multipart upload and just do a single non-buffered upload.

The unit test refactoring left copy/pasted code.  This commit splits
the common piece into a single header, allowing `s3_tests.cc` to keep
the unit tests that utilize AWS S3 while `s3_unit_tests.cc` uses the
minio instance started up by ctest.
@jhiemstrawisc (Member) left a comment

LGTM!

@jhiemstrawisc jhiemstrawisc merged commit 87a041b into PelicanPlatform:main Dec 5, 2024
3 checks passed