MSC2701: Clarifying Content-Type usage in the media repo #2701

Merged 3 commits on Jan 30, 2024.
`proposals/2701-media-content-type.md` (new file, 59 lines)
# MSC2701: Media and the `Content-Type` relationship

The specification currently does not outline in great detail how `Content-Type` should be handled
with respect to media, particularly around uploads. For example, the [`POST /upload`](https://spec.matrix.org/v1.9/client-server-api/#post_matrixmediav3upload)
and [`PUT /upload/:serverName/:mediaId`](https://spec.matrix.org/v1.9/client-server-api/#put_matrixmediav3uploadservernamemediaid)
endpoints mention that `Content-Type` is a header that can be set, but do not list it as required.
Similarly, `Content-Type` appears to be absent entirely from the description of
[downloads](https://spec.matrix.org/v1.9/client-server-api/#get_matrixmediav3downloadservernamemediaid).

This proposal clarifies how the `Content-Type` header is used on upload and download, in line with
current best practices among server implementations.

## Proposal

For `POST` and `PUT` `/upload`, the `Content-Type` header becomes explicitly *optional*, defaulting
to `application/octet-stream`. [Synapse](https://github.com/element-hq/synapse/blob/742bae3761b7b2c638975f853ab6161527629240/synapse/rest/media/upload_resource.py#L91)
and [MMR](https://github.com/turt2live/matrix-media-repo/blob/fdb434dfd8b7ef7d93401d7b86791610fed72cb6/api/r0/upload_sync.go#L33)
both implement this behaviour. Clients SHOULD nevertheless always supply a `Content-Type` header, as this
default may change in future iterations of the endpoints.
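
As a minimal illustration of the upload rule, the Python sketch below shows a server recording `application/octet-stream` when no `Content-Type` is supplied, alongside a client that always sets the header anyway. The helper names, the homeserver URL, and the access token are hypothetical, and treating an empty header value as absent is an assumption of this sketch rather than something this proposal states.

```python
import urllib.request
from typing import Mapping

# Default applied when an upload carries no usable Content-Type header.
DEFAULT_UPLOAD_CONTENT_TYPE = "application/octet-stream"


def effective_upload_content_type(headers: Mapping[str, str]) -> str:
    """Server side (hypothetical helper): Content-Type to store for POST/PUT /upload.

    Header names are matched case-insensitively, as in HTTP. Treating an
    empty or whitespace-only value as absent is an assumption of this sketch.
    """
    for name, value in headers.items():
        if name.lower() == "content-type" and value.strip():
            return value.strip()
    return DEFAULT_UPLOAD_CONTENT_TYPE


def build_upload_request(
    homeserver: str, access_token: str, data: bytes, content_type: str
) -> urllib.request.Request:
    """Client side (hypothetical helper): build, but do not send, a POST /upload.

    `homeserver` and `access_token` are placeholders. The request always
    carries an explicit Content-Type instead of relying on the server default.
    """
    return urllib.request.Request(
        url=f"{homeserver}/_matrix/media/v3/upload",
        data=data,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": content_type,
        },
    )
```

A client following the SHOULD above would call `build_upload_request(..., content_type="image/png")` for a PNG, so the server never has to fall back to the default.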

**Note**: Synapse's behaviour was changed in October 2021 with [PR #11200](https://github.com/matrix-org/synapse/pull/11200).
Previously, Synapse required the header.

For `GET /download`, the server MUST return a `Content-Type` that is either identical to the one supplied
at upload or reasonably close to it. The bounds of "reasonable" are:
> **Review comment on lines +24 to +25** (Contributor): I know that this MSC was merged a while ago, but would it have made sense to also mandate an `X-Content-Type-Options: nosniff` header as part of the response?
>
> **Reply** (Member): @Kladki Could you open this question as a new issue on https://github.com/matrix-org/matrix-spec/issues? Then we can discuss and track it.

* Adding a `charset` to `text/*` content types.
* Detecting HTML and using `text/html` instead of `text/plain`.
* Using `application/octet-stream` when the server determines the content type is obviously wrong; for
example, an encrypted file claimed as `image/png`.
* Returning `application/octet-stream` when the media has an unknown or unprovided `Content-Type`; for
example, media uploaded before the server tracked content types, or media from a remote server that
non-compliantly omits the header entirely.
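
The bounds above can be sketched as a single function that derives the download `Content-Type` from the value stored at upload. This is a non-authoritative Python sketch: `download_content_type` is a hypothetical name, the `looks_like_html` and `obviously_wrong` flags stand in for whatever content inspection a server actually performs, and the choice of UTF-8 as the added charset is an assumption.

```python
from typing import Optional

OCTET_STREAM = "application/octet-stream"


def download_content_type(
    stored: Optional[str],
    *,
    looks_like_html: bool = False,
    obviously_wrong: bool = False,
    charset: str = "utf-8",
) -> str:
    """Derive the GET /download Content-Type from the value stored at upload.

    `looks_like_html` and `obviously_wrong` are hypothetical flags that a
    server's own content inspection would supply; no sniffing happens here.
    """
    # Unknown or never-recorded type (e.g. legacy uploads, non-compliant remotes).
    if stored is None or not stored.strip():
        return OCTET_STREAM
    # Claimed type is obviously wrong, e.g. an encrypted file sent as image/png.
    if obviously_wrong:
        return OCTET_STREAM

    media_type, _, params = stored.partition(";")
    media_type, params = media_type.strip(), params.strip()
    changed = False

    # HTML detected in something labelled text/plain: serve it as text/html.
    if looks_like_html and media_type.lower() == "text/plain":
        media_type, changed = "text/html", True

    # text/* without a charset parameter: append one (UTF-8 by assumption).
    if media_type.lower().startswith("text/") and "charset=" not in params.lower():
        params = f"{params}; charset={charset}" if params else f"charset={charset}"
        changed = True

    # No adjustment needed: return the uploaded value exactly as stored.
    if not changed:
        return stored
    return f"{media_type}; {params}" if params else media_type
```

When no adjustment applies, the stored string is returned unmodified, which keeps the "exactly the same as the original upload" case as the default path.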

Actions not in the spirit of the above are not considered "reasonable". Existing server implementations
are encouraged to scale back their behaviour to align with this guidance. [Synapse](https://github.com/element-hq/synapse/blob/742bae3761b7b2c638975f853ab6161527629240/synapse/media/_base.py#L154)
already does very minimal post-processing, while [MMR](https://github.com/turt2live/matrix-media-repo/blob/fdb434dfd8b7ef7d93401d7b86791610fed72cb6/api/_routers/98-use-rcontext.go#L110-L139)
actively ignores the uploaded `Content-Type`, which is incorrect under this MSC.

## Potential issues

Some media may have already been uploaded to a server without a content type. Such media items are
returned as `application/octet-stream` under this proposal.

## Alternatives

No significant alternatives.

## Security considerations

No relevant security considerations, though server authors are encouraged to consider the impact of
[MSC2702](https://github.com/matrix-org/matrix-spec-proposals/pull/2702) in their threat model.

## Unstable prefix

This MSC is backwards compatible with the existing specification and requires no unstable prefix.
Servers can already implement this proposal's behaviour without violating the specification.

Additionally, the proposal cites examples of this behaviour already in use in production today.