Contributor

@L-Applin L-Applin commented Sep 16, 2025

Implement parallel download for multipart GetObject in the S3 Async Client and Transfer Manager.

Modifications

  • Add two new classes (Publisher/Subscriber) to orchestrate the non-linear multipart download: NonLinearMultipartDownloaderSubscriber and FileAsyncResponseTransformerPublisher. Note for reviewer: These two classes are the core of the PR's new functionality, and review should probably start with them.
  • Add support in Transfer-Manager module for Transfer Progress Updater.
    • Note for reviewer: The AsyncResponseTransformer published by FileAsyncResponseTransformerPublisher needs to be wrapped so that it publishes progress to the progress updater. This is done in GenericS3TransferManager and TransferProgressUpdater.
  • New public API, as discussed during design review
    • supportNonSerial on SplitResult
    • ParallelConfiguration, a new config class in MultipartConfiguration for the maxInFlightParts setting
  • New internal API
    • FileAsyncTransformer exposes getters for position, path and FileTransformerConfiguration
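
As a rough illustration of what a maxInFlightParts limit means for a file download, here is a minimal plain-JDK sketch. It is not the code in this PR: BoundedPartDownloader, download and fetchPart are hypothetical names, and fetchPart merely stands in for a per-part GetObject call. It assumes every part except possibly the last has the same size, so each part can be written at partIndex * partSize regardless of arrival order.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.function.IntFunction;

// Hypothetical illustration only; none of these names come from the AWS SDK.
final class BoundedPartDownloader {

    /**
     * Fetches totalParts parts and writes each one at its own file offset,
     * keeping at most maxInFlightParts fetches running at any time.
     * fetchPart stands in for a per-part GetObject call.
     */
    static void download(Path target, int totalParts, int partSize, int maxInFlightParts,
                         IntFunction<byte[]> fetchPart) {
        Semaphore permits = new Semaphore(maxInFlightParts);
        ExecutorService pool = Executors.newCachedThreadPool();
        List<CompletableFuture<Void>> inFlight = new ArrayList<>();
        try (FileChannel channel = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            for (int i = 0; i < totalParts; i++) {
                int partIndex = i;
                // Wait for a free slot before starting another part.
                permits.acquireUninterruptibly();
                inFlight.add(CompletableFuture.runAsync(() -> {
                    try {
                        ByteBuffer data = ByteBuffer.wrap(fetchPart.apply(partIndex));
                        long position = (long) partIndex * partSize;
                        // Positional writes let parts complete in any order.
                        while (data.hasRemaining()) {
                            position += channel.write(data, position);
                        }
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    } finally {
                        permits.release();
                    }
                }, pool));
            }
            // Wait for every part before the channel is closed.
            CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The semaphore is what bounds concurrency here; a reactive implementation would more likely bound it through subscriber demand, which this sketch does not attempt to model.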

Testing

  • Added unit test
  • Added integration test
  • Manual tests using large objects

L-Applin added 28 commits July 22, 2025 18:15
…in the onResponse callback. Keep track of all inflight requests.
Contributor

@zoewangg zoewangg left a comment

Still going through the PR.


@Override
public void onResponse(T response) {
    Optional<String> contentRangeList = response.sdkHttpResponse().firstMatchingHeader("x-amz-content-range");
Contributor

Since we depend on a header sent by the service, will this cause issues for third-party S3-compatible services such as MinIO or GCP? Because we error out when the header is not present, it would be good to know the impact when the client is used with them.
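
To make the dependency on the header concrete, here is a minimal sketch of parsing a content-range value into a write offset. It assumes the value follows the standard HTTP Content-Range form `bytes <start>-<end>/<total>`; whether the header read in this PR uses exactly that form is an assumption, and ContentRange is a hypothetical helper, not an SDK class.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the AWS SDK.
final class ContentRange {
    // Assumed format: "bytes <start>-<end>/<total>", where total may be "*".
    private static final Pattern FORMAT = Pattern.compile("bytes (\\d+)-(\\d+)/(\\d+|\\*)");

    final long start;
    final long end;

    private ContentRange(long start, long end) {
        this.start = start;
        this.end = end;
    }

    /** Returns the parsed range, or empty when the header is missing or malformed. */
    static Optional<ContentRange> parse(Optional<String> headerValue) {
        return headerValue
            .map(String::trim)
            .map(FORMAT::matcher)
            .filter(Matcher::matches)
            .map(m -> new ContentRange(Long.parseLong(m.group(1)), Long.parseLong(m.group(2))));
    }

    /** Number of bytes covered by the range; both ends are inclusive. */
    long length() {
        return end - start + 1;
    }
}
```

Returning an empty Optional rather than throwing leaves the policy to the caller: it could fail the download, or fall back to a serial download when a service does not send the header.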

- renamed EmittingSubscription, mark it ThreadSafe
- Added comments
- some other renaming

@Override
public void onResponse(T response) {
    Optional<String> contentRangeList = response.sdkHttpResponse().firstMatchingHeader("x-amz-content-range");
Contributor

Is this logic specific to S3? Could we possibly apply it to a generic streaming service? In general my guess is no, because other streaming APIs don't necessarily support content-range (at least from a cursory inspection, most of the ones I looked at do not support requests or responses with content-range).

Given that - should we keep this class in S3 instead of core?

@dagnir dagnir self-requested a review September 23, 2025 17:55
Comment on lines +127 to +129
CompletableFuture<T> delegateFuture = delegate.prepare();
CompletableFutureUtils.forwardResultTo(delegateFuture, future);
CompletableFutureUtils.forwardExceptionTo(future, delegateFuture);
Contributor

Seems like this should be inverted so that IndividualTransformer does the exception/result forwarding within prepare. That would clean things up a little

Contributor Author

@L-Applin L-Applin Sep 24, 2025

We can't call delegate.prepare() in the actual prepare() callback of this IndividualFileTransformer, because we need to have received the SdkResponse before we can create the delegate: the offset to write at is only known after reading the content range header. This is why delegate.prepare() is called here, in onResponse.
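
The ordering constraint described here can be sketched with plain CompletableFuture calls. This is a simplified, hypothetical model (DeferredDelegateTransformer is not a class from the PR, and the String result is a placeholder): prepare() hands out a future immediately, and only onResponse creates the delegate's future and wires the two together, with results flowing from the delegate to the outer future and failures or cancellation flowing the other way.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for the per-part transformer discussed above; not SDK code.
final class DeferredDelegateTransformer {
    private final CompletableFuture<String> future = new CompletableFuture<>();
    private CompletableFuture<String> delegateFuture;
    private long offset = -1;

    /** Called first: the delegate cannot exist yet, so return a future we control. */
    CompletableFuture<String> prepare() {
        return future;
    }

    /** Called once the response, and therefore the write offset, is known. */
    void onResponse(long offset) {
        this.offset = offset;
        // Stands in for creating the delegate at this offset and calling delegate.prepare().
        CompletableFuture<String> delegate = new CompletableFuture<>();
        this.delegateFuture = delegate;
        // A successful delegate result completes the outer future.
        delegate.whenComplete((value, error) -> {
            if (error == null) {
                future.complete(value);
            }
        });
        // A failure or cancellation of the outer future is propagated to the delegate.
        future.whenComplete((value, error) -> {
            if (error != null) {
                delegate.completeExceptionally(error);
            }
        });
    }

    CompletableFuture<String> delegateFuture() {
        return delegateFuture;
    }

    long offset() {
        return offset;
    }
}
```

In this model the forwarding cannot move into prepare(), since the delegate's future does not exist until onResponse runs.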

…oposal. Renamed to ParallelMultipartDownloaderSubscriber as per PR comment

- Other PR comment: Removed unused builder parameter for EmittingSubscription
…n/large-object-merge

# Conflicts:
#	services/s3/src/test/java/software/amazon/awssdk/services/s3/internal/multipart/S3MultipartFileDownloadWiremockTest.java
@L-Applin L-Applin changed the title from "Olapplin/large object merge" to "Parallel split for multipart GetObject File Download" Sep 25, 2025
Quality Gate failed

Failed conditions
66.2% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

6 participants