Using Generated Presigned URLs with CRC32C checksums results in 400 from S3 #3216
Comments
Thanks for reaching out. In your upload_part request, have you tried setting the `ChecksumAlgorithm` parameter?
Hi @tim-finnigan, thanks for getting back in touch so quickly. We did try the `ChecksumAlgorithm` set to `CRC32C` approach, which then required setting the corresponding checksum value.

Our workflow is this: the CLI calls an Initiate endpoint to initiate an upload. On success the CLI can then call a generate-pre-signed-URLs endpoint, which should take the parts and the checksums and return the part numbers with the pre-signed URLs for those parts (and this is the call which is using `generate_presigned_url`).

With the description above out of the way, I will note that we have run this through successfully by removing the need for the checksums, and it all works. So worst case we could fall back to the historic way of doing this using ContentMD5, but we were hoping to use the same approach that we're using for smaller unitary uploads.
I've done some testing today; here's a table of what I get back from the PUT to S3, covering every combination of the checksum-related parameters and headers.
*1: The request signature we calculated does not match the signature you provided. Check your key and signing method.
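As an aside, that matrix of tests can be enumerated programmatically rather than by hand. A minimal sketch (the four toggle names below are my own labels for the knobs being tested, not anything defined by boto3):

```python
from itertools import product

# Hypothetical toggles: whether each checksum-related knob is supplied.
TOGGLES = (
    'param_ChecksumAlgorithm',               # ChecksumAlgorithm in generate_presigned_url Params
    'param_ChecksumCRC32C',                  # ChecksumCRC32C in generate_presigned_url Params
    'header_x_amz_checksum_crc32c',          # x-amz-checksum-crc32c header on the PUT
    'header_x_amz_sdk_checksum_algorithm',   # x-amz-sdk-checksum-algorithm header on the PUT
)

def combinations():
    """Yield one dict per on/off combination of the toggles above."""
    for bits in product((False, True), repeat=len(TOGGLES)):
        yield dict(zip(TOGGLES, bits))

combos = list(combinations())
print(len(combos))  # 16
```

Each dict can then drive one presign-and-PUT attempt, so every row of the table comes from the same loop.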
Hi @richardnpaul, thanks for following up here. Going back to your original snippet, you are using CRC32 and not CRC32C. For CRC32, I tested this and it works for me:

```python
import boto3
import requests
from zlib import crc32
import base64
import pathlib

bucket_name = 'test-bucket'
object_key = 'test'
s3_client = boto3.client('s3')

response = s3_client.create_multipart_upload(
    Bucket=bucket_name,
    Key=object_key
)
upload_id = response['UploadId']

part_number = 1
chunk_size = 10 * 1024 * 1024  # 10 MB
testfile = pathlib.Path('./11-mb-file.txt').expanduser()
parts = []

with open(testfile, 'rb') as f:
    while True:
        content = f.read(chunk_size)
        if not content:
            break
        checksum_crc32 = base64.b64encode(crc32(content).to_bytes(4, byteorder='big')).decode('utf-8')
        presigned_url = s3_client.generate_presigned_url(
            'upload_part',
            Params={
                'Bucket': bucket_name,
                'Key': object_key,
                'PartNumber': part_number,
                'UploadId': upload_id,
                'ChecksumCRC32': checksum_crc32,
                'ChecksumAlgorithm': 'CRC32',
            },
            ExpiresIn=3600
        )
        response = requests.put(presigned_url, data=content)
        if response.status_code == 200:
            print(f"Part {part_number} uploaded successfully!")
            parts.append({
                'PartNumber': part_number,
                'ETag': response.headers['ETag']
            })
        else:
            print(f"Failed to upload part {part_number}, status: {response.status_code}, response: {response.text}")
            break
        part_number += 1

if len(parts) == part_number - 1:
    s3_client.complete_multipart_upload(
        Bucket=bucket_name,
        Key=object_key,
        UploadId=upload_id,
        MultipartUpload={
            'Parts': parts
        }
    )
    print("Multipart upload completed successfully!")
else:
    s3_client.abort_multipart_upload(
        Bucket=bucket_name,
        Key=object_key,
        UploadId=upload_id
    )
    print("Multipart upload failed and has been aborted.")
```
Hi Tim,

Okay, so yes, as noted in my initial notes, we use the crc32c package, but we're just trying to test that the checksums work, so it doesn't matter which algorithm we use as long as it's valid. I've taken your code and made a couple of changes, and I had the bucket deployed in a different region.

So, at this point I'm not sure if this is a botocore/boto3 issue or an AWS infrastructure issue 🤔 (...or something else)
Just some additional information: adding an explicit v4 `signature_version` via the client `Config` didn't change the outcome.
Thanks for following up and for your patience here.
Does the script work for you if you use `signature_version='v4'`? From the first link:
The second link seems to be for people not using the SDK; we're using botocore/boto3 here. I'm using an administrator role that works for generating the URL and uploading to the URL so long as checksums are not used in the generation. The output for the signature that I see is like this:
What I note from this is that the checksum in the signature is blank, as is the algorithm.

Current Script

```python
#!/usr/bin/env python3
import boto3
import requests
from crc32c import crc32c
import base64
import pathlib
from botocore.config import Config

# boto3.set_stream_logger('')

REGION = 'eu-west-2'
session = boto3.Session(region_name=REGION)
my_config = Config(
    region_name=REGION,
    retries={
        'max_attempts': 10,
        'mode': 'standard'
    },
    signature_version='v4',
)
bucket_name = f'presigned-urls-test-{REGION}'
object_key = 'test-upload'
s3_client = session.client(
    's3',
    config=my_config,
)

response = s3_client.create_multipart_upload(
    Bucket=bucket_name,
    Key=object_key
)
upload_id = response['UploadId']

part_number = 1
chunk_size = 10 * 1024 * 1024  # 10 MB
testfile = pathlib.Path('~/Downloads/bbb_sunflower_2160p_30fps_stereo_abl.mp4').expanduser()
parts = []

with open(testfile, 'rb') as f:
    while True:
        content = f.read(chunk_size)
        if not content:
            break
        checksum_crc32c = base64.b64encode(crc32c(content).to_bytes(4, byteorder='big')).decode('utf-8')
        presigned_url = s3_client.generate_presigned_url(
            'upload_part',
            Params={
                'Bucket': bucket_name,
                'ChecksumAlgorithm': 'CRC32C',
                'ChecksumCRC32C': checksum_crc32c,
                'Key': object_key,
                'PartNumber': part_number,
                'UploadId': upload_id,
            },
            ExpiresIn=3600
        )
        response = requests.put(
            presigned_url,
            data=content,
            # headers={
            #     'x-amz-checksum-crc32c': checksum_crc32c,
            #     'x-amz-sdk-checksum-algorithm': 'CRC32C'
            # }
        )
        if response.status_code == 200:
            print(f"Part {part_number} uploaded successfully!")
            parts.append({
                'PartNumber': part_number,
                'ETag': response.headers['ETag']
            })
        else:
            print(f"Failed to upload part {part_number}, status: {response.status_code}, response: {response.text}")
            break
        part_number += 1

if len(parts) == part_number - 1:
    s3_client.complete_multipart_upload(
        Bucket=bucket_name,
        Key=object_key,
        UploadId=upload_id,
        MultipartUpload={
            'Parts': parts
        }
    )
    print("Multipart upload completed successfully!")
else:
    s3_client.abort_multipart_upload(
        Bucket=bucket_name,
        Key=object_key,
        UploadId=upload_id
    )
    print("Multipart upload failed and has been aborted.")
```
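One way to confirm the "blank checksum in the signature" observation above without hitting S3 at all is to inspect the presigned URL itself: SigV4 lists every header that was folded into the signature in the `X-Amz-SignedHeaders` query parameter, so if `x-amz-checksum-crc32c` is absent there, sending it on the PUT changes the canonical request and the signature no longer matches. A sketch of that check (pure URL parsing, no AWS calls; the example URL is made up):

```python
from urllib.parse import urlparse, parse_qs

def signed_headers(presigned_url):
    """Return the set of header names that SigV4 folded into the signature."""
    query = parse_qs(urlparse(presigned_url).query)
    raw = query.get('X-Amz-SignedHeaders', [''])[0]
    return {h for h in raw.split(';') if h}

# Example with a made-up URL shaped like a real presigned one:
url = ('https://bucket.s3.amazonaws.com/key?X-Amz-Algorithm=AWS4-HMAC-SHA256'
       '&X-Amz-SignedHeaders=host%3Bx-amz-checksum-crc32c')
print(sorted(signed_headers(url)))  # ['host', 'x-amz-checksum-crc32c']
```

Printing `signed_headers(presigned_url)` inside the loop shows at a glance which headers the PUT must (and must not) carry.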
Okay, I got it sorted out. I was pretty sure that it came down to the missing checksum headers on the PUT request, and the initial `create_multipart_upload` call also needed `ChecksumAlgorithm='CRC32C'`.

The final script

```python
#!/usr/bin/env python3
import base64
import pathlib
import boto3
import requests
from crc32c import crc32c
from botocore.config import Config

# boto3.set_stream_logger('')

testfile = pathlib.Path('~/Downloads/bbb_sunflower_2160p_30fps_stereo_abl.mp4').expanduser()
REGION = 'eu-west-2'
session = boto3.Session(region_name=REGION)
my_config = Config(
    region_name=REGION,
    retries={
        'max_attempts': 10,
        'mode': 'standard'
    },
    signature_version='v4',
)
bucket_name = f'presigned-urls-test-{REGION}'
object_key = 'test-upload'
s3_client = session.client(
    's3',
    config=my_config,
)

# def resolve_endpoint_ruleset(method):
#     def wrapper(operation_model, params, context, ignore_signing_region=False):
#         (endpoint_url, additional_headers, properties) = method(
#             operation_model, params, context, ignore_signing_region
#         )  # Call the original method
#         if "ContentType" not in params:
#             additional_headers = {
#                 "Content-Type": "binary/octet-stream",
#                 **additional_headers,
#             }
#         return (endpoint_url, additional_headers, properties)
#     return wrapper
# s3_client._resolve_endpoint_ruleset = resolve_endpoint_ruleset(
#     s3_client._resolve_endpoint_ruleset
# )

upload_id_request = s3_client.create_multipart_upload(
    Bucket=bucket_name,
    Key=object_key,
    ChecksumAlgorithm='CRC32C',
)
upload_id = upload_id_request['UploadId']

part_number = 1
chunk_size = 10 * 1024 * 1024  # 10 MB
parts = []

with open(testfile, 'rb') as f:
    while True:
        content = f.read(chunk_size)
        if not content:
            break
        checksum_crc32c = base64.b64encode(crc32c(content).to_bytes(4, byteorder='big')).decode('utf-8')
        presigned_url = s3_client.generate_presigned_url(
            ClientMethod='upload_part',
            Params={
                'Bucket': bucket_name,
                # 'ContentLength': len(content),
                'ChecksumAlgorithm': 'CRC32C',
                'ChecksumCRC32C': checksum_crc32c,
                'Key': object_key,
                'PartNumber': part_number,
                'UploadId': upload_id,
            },
            ExpiresIn=3600
        )
        response = requests.put(
            presigned_url,
            data=content,
            headers={
                # 'content-type': 'binary/octet-stream',
                'x-amz-sdk-checksum-algorithm': 'CRC32C',
                'x-amz-checksum-crc32c': checksum_crc32c,
            }
        )
        if response.status_code == 200:
            print(f"Part {part_number} uploaded successfully!")
            parts.append({
                'PartNumber': part_number,
                'ETag': response.headers['ETag'],
                'ChecksumCRC32C': checksum_crc32c,
            })
        else:
            print(f"Failed to upload part {part_number}, status: {response.status_code}, response: {response.text}")
            break
        part_number += 1

if len(parts) == part_number - 1:
    s3_client.complete_multipart_upload(
        Bucket=bucket_name,
        Key=object_key,
        UploadId=upload_id,
        MultipartUpload={
            'Parts': parts
        },
    )
    print("Multipart upload completed successfully!")
else:
    s3_client.abort_multipart_upload(
        Bucket=bucket_name,
        Key=object_key,
        UploadId=upload_id
    )
    print("Multipart upload failed and has been aborted.")
```
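For anyone who wants to verify the crc32c package's output independently: CRC-32C (Castagnoli) uses the reflected polynomial 0x82F63B78, and a bitwise pure-Python version is only a few lines (slow, but fine for spot checks):

```python
def crc32c_py(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc >> 1) ^ 0x82F63B78) if crc & 1 else (crc >> 1)
    return crc ^ 0xFFFFFFFF

# Standard CRC-32C check value for "123456789":
print(hex(crc32c_py(b'123456789')))  # 0xe3069283
```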
This issue is now closed. Comments on closed issues are hard for our team to see.
Describe the bug
When trying to upload a large object to S3 using the multipart upload process with presigned URLs and CRC32C checksums, the response from S3 is a 400 error with an error message.
Expected Behavior
I would expect the provided checksum headers to be included in the signing, so that the checksum type in the signature would be the actual algorithm rather than null, and the upload to S3 would succeed.
Current Behavior
The following type of error message is returned instead of success:
Reproduction Steps
Change all the AWS credentials to valid values for your testing and provide a file on the `testfile` assignment line (I was using a path in `~/Downloads/`).

Possible Solution
I feel like the checksum header is not being passed into the signing process, but to be honest I got a bit lost in the library's code and couldn't make head nor tail of it in the end.
Additional Information/Context
Docs page for generating the URLs:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/generate_presigned_url.html

Docs page with acceptable params to be passed to `generate_presigned_url` when using `upload_part` as the `ClientMethod`:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/upload_part.html
SDK version used
1.34.138
Environment details (OS name and version, etc.)
Ubuntu 22.04.4, Python 3.10.12