Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3Client: upload() returns differently encoded ObjectURL, depending on the uploaded file size (e.g. %2F instead of /) #2933

Closed
mfn opened this issue Jun 3, 2024 · 3 comments
Assignees
Labels
bug This issue is a bug.

Comments

@mfn
Copy link

mfn commented Jun 3, 2024

Describe the bug

There's a pretty interesting surprise behaviour in getting back the canonical URL of the just-uploaded data.

If the file size exceeds \Aws\S3\ObjectUploader::DEFAULT_MULTIPART_THRESHOLD, then a different upload strategy is used, which ends up returning a differently encoded ObjectURL: for such files, e.g. / is encoded as %2F, possibly breaking certain assumption on what is supposed to be returned here.

Expected Behavior

The returned ObjectURL should be consistent, i.e. uploading the 17mb file should output:

$ php aws_s3_get_object_url_test.php 17mb.file
https://….s3.….amazonaws.com/test/path.file

Current Behavior

The ObjectURL is encoded differently, based on the file size.

Reproduction Steps

Test script:

<?php declare(strict_types=1);

require_once 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3Client = new S3Client([
    'region' => '',
    'credentials' => [
        'key' => '',
        'secret' => '',
    ]
]);

$result = $s3Client->upload(
    '',
    'test/path.file',
    file_get_contents($argv[1])
);


echo $result->get('ObjectURL'), "\n";

Test files created:

$ dd if=/dev/zero of=15mb.file bs=1024 count=15000
15000+0 records in
15000+0 records out
15360000 bytes (15 MB, 15 MiB) copied, 1.1505 s, 13.4 MB/s

$ dd if=/dev/zero of=17mb.file bs=1024 count=17000
17000+0 records in
17000+0 records out
17408000 bytes (17 MB, 17 MiB) copied, 1.30513 s, 13.3 MB/s

$ ls -lh *.file
-rw-r--r-- 1 user user 15M Jun  3 13:17 15mb.file
-rw-r--r-- 1 user user 17M Jun  3 13:17 17mb.file

Uploading the 15mb file:

$ php aws_s3_get_object_url_test.php 15mb.file
https://….s3.….amazonaws.com/test/path.file

Uploading the 17mb file (notice the %2F in the path‼️):

$ php aws_s3_get_object_url_test.php 17mb.file
https://….s3.….amazonaws.com/test%2Fpath.file

Possible Solution

The returned value should be consistently encoded.

In \Aws\S3\PutObjectUrlMiddleware::__invoke, the ObjectURL is "constructed":

        return $next($command, $request)->then(
            function (ResultInterface $result) use ($command) {
                $name = $command->getName();
                switch ($name) {
                    case 'PutObject':
                    case 'CopyObject':
                        $result['ObjectURL'] = isset($result['@metadata']['effectiveUri'])
                            ? $result['@metadata']['effectiveUri']
                            : null;
                        break;
                    case 'CompleteMultipartUpload':
                        $result['ObjectURL'] = $result['Location'];
                        break;
                }
                return $result;
            }
        );

As can be seen, in case of CompleteMultipartUpload, which is used for data > 16MB, the resulting Location is used, which AFAICS, is from the header response directly.

For regular uploads, ObjectURL is based on $result['@metadata']['effectiveUri'], which itself is created in \Aws\WrappedHttpHandler::parseResponse as

        $metadata = [
            'statusCode'    => $status,
            'effectiveUri'  => (string) $request->getUri(),
            'headers'       => [],
            'transferStats' => [],
        ];

I.e. it's constructed from the internal $request object, instead of the returned server response.

Additional Information/Context

No response

SDK version used

3.308.7

Environment details (Version of PHP (php -v)? OS name and version, etc.)

PHP 8.3.7, Linux

@mfn mfn added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jun 3, 2024
@mfn mfn changed the title S3Client: upload() returns differently encoded ObjectURL, depending on the uploaded file size S3Client: upload() returns differently encoded ObjectURL, depending on the uploaded file size (e.g. %2F instead of /) Jun 3, 2024
@mfn
Copy link
Author

mfn commented Jun 3, 2024

Maybe not a "software bug", but sure an inconsistency well hidden. Maybe this is documented (or: should be)?

@yenfryherrerafeliz yenfryherrerafeliz self-assigned this Jun 6, 2024
@yenfryherrerafeliz yenfryherrerafeliz added queued This issues is on the AWS team's backlog and removed needs-triage This issue or PR still needs to be triaged. labels Jun 6, 2024
@yenfryherrerafeliz
Copy link
Contributor

Hi @mfn, we have merged the following PR that url decodes the Location when the upload is done through a multipart request.

Please feel free of opening a new issues for anything else that we may help with.

Thanks!

@yenfryherrerafeliz yenfryherrerafeliz removed the queued This issues is on the AWS team's backlog label Jul 3, 2024
Copy link

github-actions bot commented Jul 3, 2024

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug.
Projects
None yet
Development

No branches or pull requests

2 participants