OOM when using S3TransferManager.uploadDirectory() with 2 million files #5023
Comments
Hi @jensvogt, we released a fix in
@zoewangg Hi, will try the latest version.
@zoewangg
Nice, thanks for verifying it. Closing the issue
Describe the bug
When uploading a directory that contains hundreds of thousands of files to S3 using S3TransferManager.uploadDirectory(), the upload fails with an OutOfMemoryError.
Expected Behavior
S3TransferManager should work regardless of how many files are in the directory tree.
Current Behavior
Exception in thread "Thread-11" Exception in thread "Thread-24" Exception in thread "Thread-65" Exception in thread "sdk-async-response-0-24" java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
Reproduction Steps
package org.example;
import org.apache.commons.lang3.RandomStringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.internal.crt.S3CrtAsyncClient;
import software.amazon.awssdk.services.s3.model.CreateBucketRequest;
import software.amazon.awssdk.services.s3.model.HeadBucketRequest;
import software.amazon.awssdk.services.s3.model.NoSuchBucketException;
import software.amazon.awssdk.transfer.s3.S3TransferManager;
import software.amazon.awssdk.transfer.s3.model.CompletedDirectoryUpload;
import software.amazon.awssdk.transfer.s3.model.UploadDirectoryRequest;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
public class Main {
    private static final Logger LOG = LoggerFactory.getLogger(Main.class);

    // NOTE: the original class body was elided in the report; this is a minimal
    // sketch of the reproduction. Bucket name, region, and source path are
    // placeholders — point the source at a directory holding ~2 million small files.
    public static void main(String[] args) {
        S3AsyncClient s3 = S3AsyncClient.crtBuilder()
                .credentialsProvider(ProfileCredentialsProvider.create())
                .region(Region.EU_CENTRAL_1)
                .build();
        try (S3TransferManager tm = S3TransferManager.builder().s3Client(s3).build()) {
            CompletedDirectoryUpload result = tm.uploadDirectory(UploadDirectoryRequest.builder()
                            .source(Path.of("/data/files")) // directory with ~2 million small XML files
                            .bucket("test-bucket")
                            .build())
                    .completionFuture()
                    .join();
            result.failedTransfers().forEach(f -> LOG.error("Failed transfer: {}", f));
        }
    }
}
Possible Solution
Do not collect a List of CompletableFutures: when the directory tree contains millions of files, this results in a huge in-memory collection. In our case the tree is flat (a single directory containing 2 million small XML files).
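To illustrate the suggestion, here is a minimal, self-contained sketch (plain JDK, no AWS classes — the class and method names are made up for illustration) of how per-file futures can be bounded with a Semaphore instead of being accumulated in a list. Only maxInFlight uploads are pending at any moment, so memory use stays constant no matter how many files are streamed:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

public class BoundedUploads {

    // Simulates uploading fileCount files with at most maxInFlight concurrent
    // "uploads". No list of futures is kept: each future releases its permit
    // on completion, so memory stays O(maxInFlight) instead of O(fileCount).
    static long uploadAll(int fileCount, int maxInFlight) throws InterruptedException {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicLong completed = new AtomicLong();
        CountDownLatch allDone = new CountDownLatch(fileCount);
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            for (int i = 0; i < fileCount; i++) {
                permits.acquire(); // blocks once maxInFlight uploads are pending
                CompletableFuture
                        .runAsync(completed::incrementAndGet, pool) // stand-in for one file upload
                        .whenComplete((v, t) -> {
                            permits.release();
                            allDone.countDown();
                        });
            }
            allDone.await(); // wait for the tail of in-flight uploads
        } finally {
            pool.shutdown();
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(uploadAll(100_000, 64)); // prints 100000
    }
}
```

The same back-pressure idea could be applied inside uploadDirectory(): acquire a permit per file as the directory stream is walked, instead of materializing one CompletableFuture per file up front.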
Additional Information/Context
Strictly speaking, this is not a memory leak but a design issue. In v1 the SDK collected the directory tree in an in-memory ArrayList. That was fixed in v2, which uses a directory stream instead. Nevertheless, v2 now builds a huge in-memory list of CompletableFutures.
AWS Java SDK version used
2.25.10
JDK version used
openjdk version "17.0.10" 2024-01-16
OpenJDK Runtime Environment (build 17.0.10+7-Debian-1deb12u1)
OpenJDK 64-Bit Server VM (build 17.0.10+7-Debian-1deb12u1, mixed mode, sharing)
Operating System and version
Debian 12 (bookworm)