
Add a field to limit the size of uploading content #1701

Open
wants to merge 1 commit into base: main from limit-payload-size

Conversation

git-hyagi (Contributor):

closes: #532

@lubosmj (Member) left a comment:

Was the goal to read blob data in smaller chunks and raise an error in case we are blocking the API for too long?

while True:
    subchunk = chunk.read(2000000)
    if not subchunk:
        break
    temp_file.write(subchunk)
    size += len(subchunk)
    for algorithm in Artifact.DIGEST_FIELDS:
        hashers[algorithm].update(subchunk)

I see a threshold defined for config blobs: https://github.com/containers/image/blob/1dbd8fbbe51653e8a304122804431b07a1060d06/internal/image/oci.go#L63-L83. But, I could not find any thresholds for regular blobs. Can we investigate this?

@@ -938,6 +942,10 @@ def put(self, request, path, pk=None):
        repository.pending_blobs.add(blob)
        return BlobResponse(blob, path, 201, request)

    def _verify_payload_size(self, distribution, chunk):
        if distribution.max_payload_size and chunk.reader.length > distribution.max_payload_size:
            raise PayloadTooLarge()
Member:

Is this exception properly rendered to the container's API format?

ref:

def handle_exception(self, exc):

Member:

Some of the exceptions are properly documented at https://docker-docs.uclv.cu/registry/spec/api/#blob-upload.

git-hyagi (Contributor, Author):

Ah... my bad, I didn't know about the spec for errors.
I also found this doc https://github.com/opencontainers/distribution-spec/blob/v1.0.1/spec.md#error-codes that will be helpful.
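
For reference, the error format those documents describe is a JSON object with an errors array. A minimal, illustrative sketch (not the PR's actual code; the handler name and the 413 status are assumptions) of rendering an oversized-payload error in that shape could look like:

# Sketch only: render an error body in the OCI distribution-spec format.
# The handler wiring and the 413 status code are assumptions for illustration.
from rest_framework.response import Response

def render_payload_too_large(max_size):
    body = {
        "errors": [
            {
                "code": "SIZE_INVALID",  # one of the spec's defined error codes
                "message": "payload size exceeded",
                "detail": {"max_allowed_bytes": max_size},
            }
        ]
    }
    return Response(body, status=413)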

@@ -751,6 +751,8 @@ class ContainerDistribution(Distribution, AutoAddObjPermsMixin):
        null=True,
    )

    max_payload_size = models.IntegerField(null=True)
Member:

I do not think we want to make this parameter configurable. I vote for having a constant for this.

@git-hyagi (Contributor, Author):

Was the goal to read blob data in smaller chunks and raise an error in case we are blocking the API for too long?

Hmm... when I was writing this PR, I was thinking about a feature to allow a maximum size for blob layers and to deny the upload if the limit is exceeded.
Did I misunderstand the goal of the issue?

I see a threshold defined for config blobs: https://github.com/containers/image/blob/1dbd8fbbe51653e8a304122804431b07a1060d06/internal/image/oci.go#L63-L83. But, I could not find any thresholds for regular blobs. Can we investigate this?

I also couldn't find a limit for regular blobs. From what I could understand, they limit only the size of buffered "resources" (manifests, config blobs, signatures, etc.) to avoid OOM issues:
containers/image@61096ab
"Restrict the sizes of blobs which are copied into memory such as the
manifest, the config, signatures, etc. This will protect consumers of
c/image from rogue or hijacked registries that return too big blobs in
hope to cause an OOM DOS attack."

@lubosmj (Member) commented Jul 18, 2024:

Okay, I think we can follow that path. Let's then focus on manifests, config blobs, and signatures exclusively.

@git-hyagi (Contributor, Author):

After an internal discussion, here is the conclusion we reached:
Considering that a proper installation of Pulp should have a reverse proxy, and that if an upload request is too big the damage (denial of service) to the reverse proxy is already done, we should define the limits on the reverse proxy, not in Pulp.
Here is a draft of an nginx config to limit the manifest size:

location ~* /v2/.*/manifests/.*$ {
	client_max_body_size 1m;
}

git-hyagi marked this pull request as draft on July 23, 2024 14:11
@ipanova (Member) commented Aug 14, 2024:

This change makes sense to me; however, besides the manifest directive, we also need to cap the extensions/v2/(?P<path>.+)/signatures endpoint. This is an alternative way to upload skopeo-produced atomic signatures.

Config blobs are left out, but I don't see an easy way to set a 4 MB limit specifically on them, given that they fall under location ~* /v2/.*/blobs/.*$. I would be OK with proceeding only with signatures and manifests.

One thing I do not understand: if a malicious user can create a big manifest that leads to high memory consumption, what prevents them from just creating a big blob? There is no limit set on blobs.

@ipanova (Member) commented Aug 14, 2024:

Skimming through the containers/image@61096ab commit description, the changes are mostly targeted at the client side, so that the client is not susceptible to a DDoS attack during a pull operation.
This reads to me as if the registry itself does not have limits on upload. If that's the case, then the only place where we need to introduce a safeguard is the sync operation, because there we act as a client, so it would be good to read only the first 4 MB of the manifests/config blobs/signatures and then raise an error.
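
A minimal sketch of the safeguard described above, assuming an aiohttp response streamed in chunks (the constant and exception names are placeholders, not the PR's actual code):

# Sketch only: stop reading a synced resource once it exceeds a 4 MB cap.
MEGABYTE = 1024 * 1024
MAX_RESOURCE_SIZE = 4 * MEGABYTE  # assumed limit for manifests/config blobs/signatures

class ResourceTooLarge(Exception):
    pass

async def read_limited(response, max_size=MAX_RESOURCE_SIZE):
    data = b""
    async for chunk in response.content.iter_chunked(MEGABYTE):
        data += chunk
        if len(data) > max_size:
            raise ResourceTooLarge(f"resource exceeded {max_size} bytes")
    return data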

@ipanova (Member) commented Aug 14, 2024:

Hm, apparently the changes were applied also on some upload calls like https://github.com/containers/image/blob/61096ab72530cc9216b50d9fc3bee90b53704f68/docker/docker_image_dest.go#L630

git-hyagi force-pushed the limit-payload-size branch 2 times, most recently from 2845307 to 60a6fbc on August 26, 2024 15:27
git-hyagi marked this pull request as ready for review on August 26, 2024 16:28
@ipanova (Member) left a comment:

@@ -0,0 +1,2 @@
Added a limit to the size of manifests and signatures during sync tasks and
Member:

Can you be specific and mention that this is a 4 MB limit?

Member:

Can you also explain why this limit was added?

@@ -545,6 +549,14 @@ async def create_signatures(self, man_dc, signature_source):
                    "Error: {} {}".format(signature_url, exc.status, exc.message)
                )

            if not is_signature_size_valid(signature_download_result.path):
                log.info(
                    "Signature size is not valid, the max allowed size is {}.".format(
Member:

how about 'signature body size exceeded maximum allowed size of X'

                        SIGNATURE_PAYLOAD_MAX_SIZE
                    )
                )
                raise ManifestSignatureInvalid(digest=man_digest_reformatted)
Member:

when used, this exception is raised to the container client

@@ -566,7 +578,12 @@ async def create_signatures(self, man_dc, signature_source):
            # signature extensions endpoint does not like any unnecessary headers to be sent
            await signatures_downloader.run(extra_data={"headers": {}})
            with open(signatures_downloader.path) as signatures_fd:
Member:

You have signatures_downloader.path; why not call 'is_valid_size' before opening/reading it?

Member:

You're missing the check for manifests.

    with open(response.path, "rb") as content_file:


def _is_manifest_size_valid(manifests):
    for manifest in manifests:
        if manifest.get("size") > MANIFEST_PAYLOAD_MAX_SIZE:
Member:

I would not trust what's written in the manifest list JSON file; instead, actually check the manifest file size.
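
A sketch of the alternative suggested here: check the downloaded file's real size instead of trusting the size field declared in the manifest list (the constant name is a placeholder):

import os

MANIFEST_PAYLOAD_MAX_SIZE = 4 * 1024 * 1024  # assumed 4 MB cap

def is_manifest_size_valid(path):
    # Check the actual bytes on disk rather than the size declared in the
    # manifest list JSON, which a rogue registry could falsify.
    return os.path.getsize(path) <= MANIFEST_PAYLOAD_MAX_SIZE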

@@ -38,3 +38,25 @@ location /token/ {
    proxy_redirect off;
    proxy_pass http://pulp-api;
}

location ~* /v2/.*/manifests/.*$ {
Member:

What about an Apache snippet?

@ipanova (Member) commented Aug 27, 2024:

For the sync workflow, we sometimes skip a tag or an entire manifest list when the signature is missing or when one image manifest from the list is unsigned. Do we also want to skip objects that exceed the size limit instead of failing the whole sync?

git-hyagi marked this pull request as draft on August 27, 2024 18:00
@ekohl (Contributor) left a comment:

I'm not familiar with the code base, but @git-hyagi asked about the Apache config. There we couldn't find a config that applied LimitRequestBody to proxied requests.

My suggestion is to enforce the checks in the application itself. Having the proxy (nginx, Apache, or whatever) enforce them as well can be considered an optimization, but Pulp remains responsible for the enforcement.

I have added some inline observations that I think should make it a bit more efficient, but that is all based on familiarity with Django & django-rest-framework that was mostly built up around 10 years ago. I read some relevant information, but it's possible I missed things.


total_size = 0
buffer = b""
async for chunk in response.content.iter_chunked(MEGABYTE):
Contributor:

I'm not familiar with the codebase, but would it make sense to send HTTP 413 Request Entity Too Large before you start reading, by inspecting the Content-Length header (which I already saw was done elsewhere)?

git-hyagi (Contributor, Author):

Hmm... it seems like a good idea, but my concern is that it is easy to modify the header and bypass the verification, so I'm not sure we can trust only the Content-Length value. What do you think?

Contributor:

If you really want to be safe, you can apply both: first check that the header isn't bigger than the maximum size, and then make sure you read at most the Content-Length size. Though I think Content-Length is an optional header, so if it's not present, you should keep the current code anyway.

I'd recommend reading aiohttp's documentation on what you can (and can't) assume.
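
A sketch of the combined approach being discussed, assuming an aiohttp ClientResponse (the limit constant and the exception are illustrative, not the PR's actual code):

# Sketch only: pre-check the declared Content-Length, then still enforce the
# cap while streaming, because the header is optional and not trustworthy.
MEGABYTE = 1024 * 1024
MAX_BODY_SIZE = 4 * MEGABYTE  # assumed limit

class EntityTooLarge(Exception):
    pass

async def read_with_limit(response, max_size=MAX_BODY_SIZE):
    declared = response.content_length  # None when the header is absent
    if declared is not None and declared > max_size:
        raise EntityTooLarge("Content-Length exceeds the allowed size")

    total = 0
    chunks = []
    async for chunk in response.content.iter_chunked(MEGABYTE):
        total += len(chunk)
        if total > max_size:
            raise EntityTooLarge("body exceeded the allowed size")
        chunks.append(chunk)
    return b"".join(chunks)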

@@ -1426,7 +1425,7 @@ def put(self, request, path, pk):
        except models.Manifest.DoesNotExist:
            raise ManifestNotFound(reference=pk)

        signature_payload = request.META["wsgi.input"].read(SIGNATURE_PAYLOAD_MAX_SIZE)
        signature_payload = request.META["wsgi.input"].read()
Contributor:

It surprises me that this uses such a low-level implementation. If I read the code right, request is a rest_framework.request.Request, which has request.stream to get the body as a file-like stream. Isn't that a much more portable interface? Then I think you can actually use signature_dict = json.load(request.stream) to save copying the full response into memory an additional time.

And if you want to guard it, you can inspect the length explicitly prior to loading and send HTTP 413. See _load_stream() for how they read the length:
https://github.com/encode/django-rest-framework/blob/f593f5752c45e06147231bbfd74c02384791e074/rest_framework/request.py#L297-L314

My primary reason to avoid wsgi.input (and this isn't the only occurrence) is that it blocks you from moving to ASGI and eventually more performant application servers.
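
A minimal sketch of this suggestion, assuming a DRF view (the limit constant, the helper name, and the error handling are assumptions, not the PR's actual code):

# Sketch only: read the signature body via DRF's request.stream instead of
# request.META["wsgi.input"], with an explicit Content-Length guard first.
import json

SIGNATURE_PAYLOAD_MAX_SIZE = 4 * 1024 * 1024  # assumed limit

def load_signature_dict(request, max_size=SIGNATURE_PAYLOAD_MAX_SIZE):
    try:
        content_length = int(request.META.get("CONTENT_LENGTH") or 0)
    except ValueError:
        content_length = 0

    if content_length > max_size:
        # the caller can translate this into an HTTP 413 response
        raise ValueError("signature payload exceeds the allowed size")

    # request.stream is file-like, so json.load can read it directly without
    # copying the whole body into a separate bytes object first.
    return json.load(request.stream)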

Member:

@git-hyagi, can you also address this comment, please? It is worth the effort.

git-hyagi force-pushed the limit-payload-size branch 2 times, most recently from 904a3a4 to cedb1d1 on September 11, 2024 21:32
        total_size += len(chunk)
        if max_body_size and total_size > max_body_size:
            await self.finalize()
            raise InvalidRequest(resource_body_size_exceeded_msg(content_type, max_body_size))
Contributor:

Does this result in an HTTP 413 error?

Member:

I wonder about the impact of this change. Does it mean that pulp-to-pulp syncing (or any syncing in general) will not work when someone already uploaded/synced a larger content unit? I am still a bit hesitant to include this change.

git-hyagi (Contributor, Author):

Does this result in an HTTP 413 error?

No 😢 it is following the OCI distribution spec https://github.com/opencontainers/distribution-spec/blob/v1.0.1/spec.md#error-codes (which does not have a specific error for request entity too large).

I wonder about the impact of this change. Does it mean that pulp-to-pulp syncing (or any syncing in general) will not work when someone already uploaded/synced a larger content unit? I am still a bit hesitant to include this change.

I modified the PR to allow defining these limits in settings.py (instead of having them as constants). By default, if these values are not defined, no limit is enforced, so the functionality remains the same as before.
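
A sketch of the optional-limit behavior described here, using the setting names that appear later in the changelog (the helper itself is illustrative, not the PR's actual code):

# Sketch only: no limit is enforced when the setting is not defined.
from django.conf import settings

def check_manifest_size(size_in_bytes):
    limit = getattr(settings, "MANIFEST_PAYLOAD_MAX_SIZE", None)
    if limit is not None and size_in_bytes > limit:
        raise ValueError(
            f"Manifest body size exceeded the maximum allowed size of {limit} bytes."
        )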

Member:

You can also see the API doc here: https://distribution.github.io/distribution/spec/api/. How is this InvalidRequest rendered on the client side when using podman?

git-hyagi (Contributor, Author):

Hmm... this is a sample output that I received while trying to push a manifest with an invalid size:

Writing manifest to image destination
WARN[0010] Failed, retrying in 1s ... (1/3). Error: writing manifest: uploading manifest latest to pulp-container:5001/test: unknown: Manifest body size exceeded the maximum allowed size of 10 bytes. 
...
WARN[0014] Failed, retrying in 1s ... (2/3). Error: writing manifest: uploading manifest latest to pulp-container:5001/test: unknown: Manifest body size exceeded the maximum allowed size of 10 bytes. 
...
WARN[0019] Failed, retrying in 1s ... (3/3). Error: writing manifest: uploading manifest latest to pulp-container:5001/test: unknown: Manifest body size exceeded the maximum allowed size of 10 bytes. 
...
Error: writing manifest: uploading manifest latest to pulp-container:5001/test: unknown: Manifest body size exceeded the maximum allowed size of 10 bytes.

Member:

This is what dockerhub returns in case I am trying to upload a 4MB+ large file.

tr -dc 'a-zA-Z' < /dev/urandom | head -c 5194304 > output.txt
cat output.txt | http --auth-type=jwt --auth=$TOKEN_AUTH PUT https://registry-1.docker.io/v2/lmjachky/pulp/manifests/new
HTTP/1.1 400 Bad Request
content-length: 110
content-type: application/json
date: Thu, 19 Sep 2024 09:25:15 GMT
docker-distribution-api-version: registry/2.0
docker-ratelimit-source: 2a02:8308:b093:8e00:62b:28c5:e440:e0b9
strict-transport-security: max-age=31536000

{
    "errors": [
        {
            "code": "MANIFEST_INVALID",
            "detail": "http: request body too large",
            "message": "manifest invalid"
        }
    ]
}

We should be returning a formatted error response, similarly to our existing MANIFEST_INVALID.

Contributor:

No 😢 it is following the OCI distribution spec https://github.com/opencontainers/distribution-spec/blob/v1.0.1/spec.md#error-codes (which does not have a specific error for request entity too large).

It says 4xx code:

A 4XX response code from the registry MAY return a body in any format.

So I don't think it forbids using code 413. Did you intend to link to https://github.com/opencontainers/distribution-spec/blob/v1.0.1/spec.md#determining-support which does list error codes in the Failure column (and supports what you said)?

git-hyagi (Contributor, Author):

It says 4xx code:

A 4XX response code from the registry MAY return a body in any format.

So I don't think it forbids using code 413.

Hmm... this is a good point.
It seems like this doc is a little bit confusing. I thought that 413 was not included in the spec because the "Error Codes" section states:

The code field MUST be one of the following:

and there isn't an entity too large error in the table.

Contributor:

Perhaps worth raising an issue on the spec to get an opinion?

git-hyagi marked this pull request as ready for review on September 13, 2024 13:59
git-hyagi force-pushed the limit-payload-size branch 3 times, most recently from 19732ef to 884f7df on September 18, 2024 18:30
@lubosmj (Member) commented Sep 19, 2024:

Creating a manifest larger than 4 MB is quite challenging. Currently, registries have adopted this 4 MB limit for uploading. However, it is possible that we may hit the limit frequently in the future. Given this, should we consider removing the restriction if there is a chance it might be lifted later on? Alternatively, would it be better to maintain the current limit and allow administrators to fiddle with this constraint? Are administrators ever going to touch the proposed setting? So many questions... I am sorry, but I am leaning towards closing the PR and the issue completely. Do you mind bringing in more voices?

@git-hyagi (Contributor, Author):

I agree that we still have a lot of questions. With the OCI spec supporting a wide range of different "artifacts", it is possible that we would have trouble fine-tuning a good value for these constraints, and adding this setting (allowing admins to change the limit value) could just mean more code to maintain for something that no one would ever need to change.
So I am OK with closing it.

Comment on lines 1 to 6
Added support to the `MANIFEST_PAYLOAD_MAX_SIZE` and `SIGNATURE_PAYLOAD_MAX_SIZE` settings to define
limits (for the size of Manifests and Signatures) to protect against OOM DoS attacks during synchronization tasks
and image uploads.
Additionally, the Nginx snippet has been updated to enforce the limit for these endpoints.
Modified the internal logic of Blob uploads to read the receiving layers in chunks,
thereby reducing the memory footprint of the process.
Member:

We should refine this changelog to make it shorter, simpler, and more user-friendly. For instance, we can omit details like "Modified the internal logic of Blob uploads to read the receiving layers in chunks, thereby reducing the memory footprint of the process".

@lubosmj (Member) left a comment:

Based on the discussion we had, you should now update the code to start returning 413 in case the processed entity is too large (https://github.com/pulp/pulp_container/pull/1701/files#r1766498485).

More observations:

1. The settings should be defined in the settings.py file. We should not run settings.get("MANIFEST_PAYLOAD_MAX_SIZE", None) in "heavy-duty" situations (see the sketch after this list).
2. Once again, please do not forget to use request.stream in other places as well.
3. It is up to you what you decide to do with the web-server snippets. I think we should drop any changes made to the nginx.conf file and keep the Apache config untouched.
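
A sketch of observation 1: resolve the limits once at import time so request handlers do not call into the settings machinery on every request (the module layout and helper are illustrative):

# Sketch only: read the limits once when the module is imported.
from django.conf import settings

MANIFEST_PAYLOAD_MAX_SIZE = getattr(settings, "MANIFEST_PAYLOAD_MAX_SIZE", None)
SIGNATURE_PAYLOAD_MAX_SIZE = getattr(settings, "SIGNATURE_PAYLOAD_MAX_SIZE", None)

def exceeds_manifest_limit(size_in_bytes):
    return MANIFEST_PAYLOAD_MAX_SIZE is not None and size_in_bytes > MANIFEST_PAYLOAD_MAX_SIZE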

@ipanova (Member) commented Sep 26, 2024:

3. It is up to you what you decide to do with web-server snippets. I think we should drop any changes made to the `nginx.conf` file and keep the apache config untouched.

+1 to revert changes made to the snippets.

@lubosmj (Member) commented Oct 4, 2024:

People are storing PNGs inside configs' labels. 🙄

[screenshot attachment: 2024-10-04 09-56-55]

I bet manifests' annotations will be next.

Adds new settings to limit the size of manifests and signatures as a safeguard
to avoid DDoS attacks during sync and upload operations.
Modifies the blob upload to read the layers in chunks.

closes: pulp#532
Successfully merging this pull request may close these issues.

As a user, there is a limit to the size of content that is accepted to be retrieved via live api