S3 PUT http with url-connection-client contentEncoding is always "aws-chunked" #4746

Closed
youngm opened this issue Dec 5, 2023 · 11 comments · Fixed by #4897
Assignees
Labels
bug This issue is a bug. closed-for-staleness p1 This is a high priority issue

Comments

@youngm

youngm commented Dec 5, 2023

Describe the bug

When putting an object to S3 over plain HTTP with url-connection-client, the stored object's contentEncoding is always "aws-chunked".

Expected Behavior

The contentEncoding should be whatever is set in the PutObjectRequest.

Current Behavior

contentEncoding is always aws-chunked when connecting to s3 via http and url-connection-client.

Reproduction Steps

This test creates an object in S3 over HTTP and sets a contentEncoding. Note that the SDK changes the contentEncoding to "aws-chunked".

S3Test.java

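import static org.assertj.core.api.Assertions.assertThat;

import java.net.URI;
import org.junit.jupiter.api.Test;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class S3Test {

  @Test
  public void test() throws Exception {
    // Plain-HTTP endpoint; url-connection-client is the only HTTP client on the classpath (see pom.xml).
    try (var s3Client =
        S3Client.builder()
            .endpointOverride(URI.create("http://s3.amazonaws.com"))
            .build()) {
      var putRequest =
          PutObjectRequest.builder()
              .bucket("bucket")
              .key("file")
              .contentEncoding("bogus")
              .build();

      s3Client.putObject(putRequest, RequestBody.empty());

      // Fails: the contentEncoding comes back as "aws-chunked" instead of "bogus".
      try (var response =
          s3Client.getObject(GetObjectRequest.builder().bucket("bucket").key("file").build())) {
        assertThat(response.response().contentEncoding()).isEqualTo("bogus");
      }
    }
  }
}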

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>3.1.5</version>
	</parent>
	<modelVersion>4.0.0</modelVersion>
	<groupId>s3test</groupId>
	<artifactId>s3test</artifactId>
	<version>1.0</version>
	<properties>
		<java.version>17</java.version>
	</properties>
	<dependencyManagement>
		<dependencies>
			<dependency>
				<groupId>software.amazon.awssdk</groupId>
				<artifactId>bom</artifactId>
				<version>2.21.36</version>
				<type>pom</type>
				<scope>import</scope>
			</dependency>
		</dependencies>
	</dependencyManagement>
	<dependencies>
		<dependency>
			<groupId>software.amazon.awssdk</groupId>
			<artifactId>s3</artifactId>
			<exclusions>
				<exclusion>
					<groupId>software.amazon.awssdk</groupId>
					<artifactId>netty-nio-client</artifactId>
				</exclusion>
				<exclusion>
					<groupId>software.amazon.awssdk</groupId>
					<artifactId>apache-client</artifactId>
				</exclusion>
			</exclusions>
		</dependency>
		<dependency>
			<groupId>software.amazon.awssdk</groupId>
			<artifactId>url-connection-client</artifactId>
		</dependency>
		<dependency>
			<groupId>org.junit.jupiter</groupId>
			<artifactId>junit-jupiter</artifactId>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>org.assertj</groupId>
			<artifactId>assertj-core</artifactId>
			<scope>test</scope>
		</dependency>
	</dependencies>
</project>

Possible Solution

PutObjectRequest should behave the same no matter the client used or http vs https. I don't know what exactly is causing the wrong contentEncoding to be sent in this situation.

Additional Information/Context

This is a re-opening of #4725.

AWS Java SDK version used

2.21.16+

JDK version used

17

Operating System and version

Windows 11

@youngm youngm added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Dec 5, 2023
@youngm
Author

youngm commented Dec 6, 2023

I updated the test case to use AWS s3 via http and not test containers/localstack.

@debora-ito
Member

debora-ito commented Dec 9, 2023

Hi @youngm, I still can't reproduce the issue, even with UrlConnectionHttpClient.

Some steps to try to narrow down the cause:

  1. Are you using a proxy?
    -- Proxies might modify the request headers after the SDK sends them and before they reach the AWS endpoint
  2. Check the verbose wirelogs:
    -- instructions to enable them can be found in the Developer Guide
    -- UrlConnection wirelog spacing is a little odd, but you can still see the content-encoding value:
FINE: sun.net.www.MessageHeader@35c9a23115 pairs: 
    {PUT /duck.txt HTTP/1.1: null}
    {amz-sdk-invocation-id: 58c09078-4a64-dcef-e04e-54176eb69c82}
    {amz-sdk-request: attempt=1; max=4}
    {Authorization: AWS4-HMAC-SHA256 Credential=xxx/20231208/us-east-1/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content-encoding;content-length;content-type;host;x-amz-content-sha256;x-amz-date;x-amz-security-token, Signature=xxx}
    {Content-Encoding: bogus}
    {Content-Type: application/octet-stream}
    {Expect: 100-continue}
    {User-Agent: aws-sdk-java/2.21.39 OpenJDK_64-Bit_Server_VM/11.0.21+9-LTS Java/11.0.21 vendor/Amazon.com_Inc. io/sync http/UrlConnection cfg/retry-mode/legacy ft/s3-transfer}
    {x-amz-content-sha256: UNSIGNED-PAYLOAD}
    {X-Amz-Date: 20231208T233805Z}
    {Accept: */*}
    {Host: my-bucket.s3.amazonaws.com}
    {Connection: keep-alive}
    {Content-Length: 14}

Note {Content-Encoding: bogus} among the PutObject request headers. Checking the GetObject response in the logs is a little trickier, but a visual check will show whether Content-Encoding: bogus is returned.

  3. Check the content-encoding in S3 after the upload:
    -- this is to identify whether the problem is in the PutObject or the GetObject
    -- you can check in the AWS console or use the CLI; here's the AWS CLI command:
$ aws s3api head-object --bucket my-bucket --key duck.txt
{
    "AcceptRanges": "bytes",
    "LastModified": "2023-12-09T00:14:42+00:00",
    "ContentLength": 14,
    "ContentEncoding": "bogus",
    "ContentType": "application/octet-stream",
    "ServerSideEncryption": "AES256",
    "Metadata": {}
}

If you share the verbose wirelogs I can take a look.
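
For longer logs, the visual check described above can be automated. The following is a minimal sketch, assuming the wirelog entries keep the `{Name: value}` shape shown in the sample; the class and method names (`WireLogHeaders`, `contentEncoding`) are hypothetical and not part of the SDK.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the AWS SDK: scans UrlConnection-style
// wirelog text for the Content-Encoding request header.
final class WireLogHeaders {
    // Matches entries such as "{Content-Encoding: bogus}". The leading "{" keeps
    // the pattern from matching "content-encoding" inside the SignedHeaders list.
    private static final Pattern CONTENT_ENCODING =
        Pattern.compile("\\{content-encoding:\\s*([^}]*)\\}", Pattern.CASE_INSENSITIVE);

    private WireLogHeaders() {}

    // Returns the first Content-Encoding value found in the log text, if any.
    static Optional<String> contentEncoding(String wireLog) {
        Matcher matcher = CONTENT_ENCODING.matcher(wireLog);
        return matcher.find() ? Optional.of(matcher.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        String log = String.join("\n",
            "{PUT /duck.txt HTTP/1.1: null}",
            "{Authorization: AWS4-HMAC-SHA256 SignedHeaders=content-encoding;content-length;host}",
            "{Content-Encoding: bogus}",
            "{Content-Length: 14}");
        System.out.println(contentEncoding(log).orElse("<none>")); // prints "bogus"
    }
}
```

If the helper returns "aws-chunked" for a request built with a different contentEncoding, the header was already changed before the request left the client.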

@debora-ito debora-ito self-assigned this Dec 9, 2023
@debora-ito debora-ito added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Dec 9, 2023
@youngm
Author

youngm commented Dec 9, 2023

@debora-ito did you connect to s3 using http? Not https?

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Dec 9, 2023
@debora-ito
Member

@youngm I missed that part 🤦🏻‍♀️ thank you for pointing that out.

I see the "aws-chunked" in the content-encoding header now, when using an HTTP endpoint and UrlConnectionHttpClient. We'll investigate.

Can you use ApacheHttpClient in the meantime?

@youngm
Author

youngm commented Dec 9, 2023

That's great @debora-ito 😅. I have instead switched to https when using localstack. But http is more conventional when using localstack so I hope you are able to get to the bottom of it. Thanks!

@debora-ito debora-ito added p1 This is a high priority issue and removed p2 This is a standard priority issue labels Dec 11, 2023
akheron added a commit to espoon-voltti/evaka that referenced this issue Jan 2, 2024
This works around an issue in aws-sdk-java-v2: aws/aws-sdk-java-v2#4746

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.

@joviegas joviegas reopened this Feb 23, 2024
@debora-ito
Member

@youngm we reverted the change (#4960) because of an unintended impact. We're investigating the root cause and will update here when we know more.

@aza547

aza547 commented Mar 26, 2024

Just to add another voice, we're also experiencing this on version 2.24.12. Let me know if I can provide any details that would help.

@debora-ito
Member

debora-ito commented Mar 27, 2024

The fix was re-introduced via #5043, and released as part of version 2.25.18.

@debora-ito debora-ito added the closing-soon This issue will close in 4 days unless further comments are made. label Mar 27, 2024
@github-actions github-actions bot added closed-for-staleness and removed closing-soon This issue will close in 4 days unless further comments are made. labels Mar 31, 2024
@ngudbhav

ngudbhav commented Aug 8, 2024

@debora-ito, we are seeing this issue again in 2.26.25.

Metadata of a sample gzipped file

{
    "AcceptRanges": "bytes",
    "LastModified": "Fri, 02 Aug 2024 12:59:49 GMT",
    "ContentLength": 3288,
    "ETag": "\"935f6b1d2f8b15a5a2582a3396fb16cc\"",
    "ContentEncoding": "gzip,aws-chunked",
    "ContentType": "text/csv",
    "ServerSideEncryption": "AES256",
    "Metadata": {}
}

@debora-ito
Member

@ngudbhav I think "ContentEncoding": "gzip,aws-chunked" indicates the fix is working. Before the fix, the code didn't handle headers with multiple values.

If you are experiencing issues, please open a new GitHub issue and provide repro code if possible.
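
For callers who need to interpret a stored value such as "gzip,aws-chunked", here is a minimal sketch of splitting a multi-valued Content-Encoding header into its individual tokens. It is not the SDK's implementation; the class and method names (`ContentEncodings`, `parse`, `withoutAwsChunked`) are hypothetical and used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the AWS SDK: treats Content-Encoding as a
// comma-separated list of tokens rather than a single opaque string.
final class ContentEncodings {
    private ContentEncodings() {}

    // Splits the header value on commas, trimming whitespace and skipping empty tokens.
    static List<String> parse(String headerValue) {
        List<String> encodings = new ArrayList<>();
        if (headerValue == null) {
            return encodings;
        }
        for (String token : headerValue.split(",")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                encodings.add(trimmed);
            }
        }
        return encodings;
    }

    // Returns the parsed tokens with any "aws-chunked" entry removed (case-insensitive).
    static List<String> withoutAwsChunked(String headerValue) {
        List<String> encodings = parse(headerValue);
        encodings.removeIf(encoding -> encoding.equalsIgnoreCase("aws-chunked"));
        return encodings;
    }

    public static void main(String[] args) {
        System.out.println(parse("gzip,aws-chunked"));             // [gzip, aws-chunked]
        System.out.println(withoutAwsChunked("gzip,aws-chunked")); // [gzip]
        System.out.println(withoutAwsChunked("aws-chunked"));      // []
    }
}
```

A comparison against the single string "gzip" would miss the value above, while a token-based check such as `parse(value).contains("gzip")` still finds it.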
