feat: support for setting UA for URL previews #17968

Open: wants to merge 2 commits into develop
changelog.d/17968.feature: 1 change (1 addition, 0 deletions)
@@ -0,0 +1 @@
Add support for setting User-Agent for URL previewing.
Contributor

@MadLittleMods, Dec 4, 2024

The option itself seems relatively straightforward, and our sane default feels like it should be good enough. I'm going to add this to the To-Discuss board so we can decide with the team on Monday whether we want to introduce it.

The workaround use case seems kinda meh, but I appreciate the context (it means adding yet another option for a problem that will go away or change in the future).

Contributor

Discussed with the team and we're not keen on pretending to be someone else. If someone doesn't want to serve content to Synapse, that's their prerogative.

This might be an XY problem. The goal is to make YouTube URL previews work. Does YouTube have some API or flow that it prefers people to use? Should we be using OpenGraph/oEmbed instead of scraping, etc.? We'd rather have PRs for that than this workaround.
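For comparison, an oEmbed consumer asks the provider's endpoint for metadata instead of scraping the page HTML. Below is a minimal sketch of building such a request URL with the standard library, following the oEmbed convention of `url` and `format` query parameters. The YouTube endpoint URL, the video ID, and the helper name `build_oembed_request_url` are illustrative assumptions; this is not how Synapse wires up its previews.

```python
from urllib.parse import urlencode

# Assumed provider endpoint for YouTube's oEmbed service.
YOUTUBE_OEMBED_ENDPOINT = "https://www.youtube.com/oembed"


def build_oembed_request_url(endpoint: str, page_url: str) -> str:
    """Hypothetical helper: provider endpoint plus the `url` and `format` query params."""
    # urlencode percent-encodes the page URL so it survives as a single query value.
    query = urlencode({"url": page_url, "format": "json"})
    return f"{endpoint}?{query}"


print(build_oembed_request_url(YOUTUBE_OEMBED_ENDPOINT, "https://www.youtube.com/watch?v=abc123"))
```

The provider answers such a request with JSON metadata (title, thumbnail, and so on), which is the kind of data a preview needs without depending on which User-Agent the scraper sends.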

Contributor

> The goal is to make YouTube URL previews work. Do they have some API/flow that they prefer people to use? Should we be using OpenGraph/oEmbed instead of scraping, etc?

See #17462.

docs/usage/configuration/config_documentation.md: 11 changes (11 additions, 0 deletions)
@@ -774,6 +774,17 @@ Example configuration:
```yaml
max_event_delay_duration: 24h
```
---
### `url_preview_user_agent`

Sets the `User-Agent` header that Synapse sends when fetching URLs to generate previews.

Defaults to `Synapse (bot; +https://github.com/matrix-org/synapse)`.

Example configuration:
```yaml
url_preview_user_agent: "Hello Matrix"
```
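To illustrate what the option controls: its value ends up in the `User-Agent` header of the outgoing request that fetches the page to be previewed. A minimal sketch using the standard library's `urllib.request.Request`, not Synapse's own HTTP client; `build_preview_request` is a hypothetical helper, and the request is only constructed here, never sent.

```python
import urllib.request

# Default value quoted in the documentation above.
DEFAULT_UA = "Synapse (bot; +https://github.com/matrix-org/synapse)"


def build_preview_request(url: str, user_agent: str = DEFAULT_UA) -> urllib.request.Request:
    # Hypothetical helper: attach the configured User-Agent to a GET request.
    # Constructing a Request performs no network I/O.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_preview_request("https://example.com/", "Hello Matrix")
# urllib stores header names via str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
```

Because `urllib` normalises header names with `str.capitalize()`, the lookup uses `"User-agent"` rather than `"User-Agent"`.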
Contributor

We should put this next to the other `url_preview_*` options.


## Homeserver blocking
Useful options for Synapse admins.
synapse/config/server.py: 7 changes (7 additions, 0 deletions)
@@ -788,6 +788,13 @@ def read_config(self, config: JsonDict, **kwargs: Any) -> None:
         else:
             self.max_event_delay_ms = None

+        self.url_preview_user_agent: str = (
+            config.get("url_preview_user_agent")
+            or "Synapse (bot; +https://github.com/matrix-org/synapse)"
Comment on lines +792 to +793
Contributor

Suggested change
-            config.get("url_preview_user_agent")
-            or "Synapse (bot; +https://github.com/matrix-org/synapse)"
+            config.get("url_preview_user_agent", "Synapse (bot; +https://github.com/matrix-org/synapse)")
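The two forms are not equivalent. A plain-Python sketch (the helper names are hypothetical) of how each one treats a missing key, an explicit YAML `null` (loaded as `None`), and an empty string:

```python
from typing import Any, Mapping

DEFAULT_UA = "Synapse (bot; +https://github.com/matrix-org/synapse)"


def ua_with_or(config: Mapping[str, Any]) -> Any:
    # Original form: falls back for a missing key AND for any falsy value (None, "").
    return config.get("url_preview_user_agent") or DEFAULT_UA


def ua_with_default(config: Mapping[str, Any]) -> Any:
    # Suggested form: falls back only when the key is absent entirely.
    return config.get("url_preview_user_agent", DEFAULT_UA)


for cfg in ({}, {"url_preview_user_agent": None}, {"url_preview_user_agent": ""}):
    print(cfg, repr(ua_with_or(cfg)), repr(ua_with_default(cfg)))
```

With the suggested form, the existing `.strip()` check still rejects `""`, but an explicit `url_preview_user_agent: null` would reach `.strip()` as `None` and fail with an `AttributeError` instead of a `ConfigError`, so an `isinstance(..., str)` check may be worth adding.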

+        )
+        if len(self.url_preview_user_agent.strip()) == 0:
+            raise ConfigError("The 'url_preview_user_agent' must be a valid User-Agent")
Contributor

Suggested change
-            raise ConfigError("The 'url_preview_user_agent' must be a valid User-Agent")
+            raise ConfigError(
+                "Must be a valid User-Agent string",
+                ("url_preview_user_agent",),
+            )
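A self-contained sketch of the validation using the suggested `(msg, path)` form. `ConfigError` below is a local stand-in shaped like the call in the suggestion, not an import of Synapse's class, and `parse_url_preview_user_agent` is a hypothetical helper. It also rejects non-string values, which goes beyond what the PR does.

```python
from typing import Any, Mapping, Optional, Sequence

DEFAULT_UA = "Synapse (bot; +https://github.com/matrix-org/synapse)"


class ConfigError(Exception):
    """Local stand-in: an error message plus the path of the offending config key."""

    def __init__(self, msg: str, path: Optional[Sequence[str]] = None):
        super().__init__(msg)
        self.msg = msg
        self.path = path


def parse_url_preview_user_agent(config: Mapping[str, Any]) -> str:
    value = config.get("url_preview_user_agent", DEFAULT_UA)
    # Reject non-strings (e.g. YAML null) as well as empty or whitespace-only strings.
    if not isinstance(value, str) or not value.strip():
        raise ConfigError(
            "Must be a valid User-Agent string",
            ("url_preview_user_agent",),
        )
    return value


try:
    parse_url_preview_user_agent({"url_preview_user_agent": "   "})
except ConfigError as e:
    print(e.path, e.msg)
```

Passing the key as a separate `path` lets the error reporter say where the problem is, so the message itself no longer needs to repeat the option name.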


Comment on lines +791 to +797
Contributor

We should do this processing alongside the other `url_preview_*` config in `synapse/config/repository.py`.
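One way the reviewer's suggestion could look, as a hypothetical sketch: the option is parsed in the same place as the other URL-preview settings. The class name `UrlPreviewConfigSketch` is an illustrative assumption, as is parsing the option only when `url_preview_enabled` is true; this is not Synapse's actual `repository.py` code.

```python
from typing import Any, Mapping, Optional

DEFAULT_UA = "Synapse (bot; +https://github.com/matrix-org/synapse)"


class UrlPreviewConfigSketch:
    """Hypothetical stand-in for the URL-preview part of the media repository config."""

    def read_config(self, config: Mapping[str, Any]) -> None:
        self.url_preview_enabled = bool(config.get("url_preview_enabled", False))
        self.url_preview_user_agent: Optional[str] = None
        if not self.url_preview_enabled:
            # Assumption: preview-only options are left unset when previews are disabled.
            return
        user_agent = config.get("url_preview_user_agent", DEFAULT_UA)
        if not isinstance(user_agent, str) or not user_agent.strip():
            # ValueError stands in for Synapse's ConfigError to keep this self-contained.
            raise ValueError("url_preview_user_agent: must be a valid User-Agent string")
        self.url_preview_user_agent = user_agent


cfg = UrlPreviewConfigSketch()
cfg.read_config({"url_preview_enabled": True, "url_preview_user_agent": "Hello Matrix"})
print(cfg.url_preview_user_agent)
```

Keeping every `url_preview_*` option in one section means the previewer can read them all from a single config object.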

     def has_tls_listener(self) -> bool:
         return any(listener.is_tls() for listener in self.listeners)

synapse/media/url_previewer.py: 5 changes (2 additions, 3 deletions)
@@ -168,6 +168,7 @@ def __init__(
     ):
         self.clock = hs.get_clock()
         self.filepaths = media_repo.filepaths
+        self.hs = hs
         self.max_spider_size = hs.config.media.max_spider_size
         self.server_name = hs.hostname
         self.store = hs.get_datastores().main
@@ -464,9 +465,7 @@ async def _download_url(self, url: str, output_stream: BinaryIO) -> DownloadResu
                     # Use a custom user agent for the preview because some sites will only return
                     # Open Graph metadata to crawler user agents. Omit the Synapse version
                     # string to avoid leaking information.
-                    b"User-Agent": [
-                        "Synapse (bot; +https://github.com/matrix-org/synapse)"
-                    ],
+                    b"User-Agent": [self.hs.config.server.url_preview_user_agent],
Contributor

Similar to what we do for `max_spider_size` in `__init__`, we can read `url_preview_user_agent` once there and store it on `self`.

Suggested change
-                    b"User-Agent": [self.hs.config.server.url_preview_user_agent],
+                    b"User-Agent": [self.url_preview_user_agent],

                 },
                 is_allowed_content_type=_is_previewable,
             )
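A minimal sketch of the "read it once in `__init__`" pattern, using hypothetical stand-ins: `UrlPreviewerSketch` is not the real previewer class, the homeserver config is faked with `types.SimpleNamespace`, and the option is assumed to live under `config.media` (following the suggestion to move it next to `max_spider_size` in `repository.py`). In the PR as written it is read from `hs.config.server`.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


class UrlPreviewerSketch:
    """Hypothetical, trimmed-down stand-in for the URL previewer class in the diff."""

    def __init__(self, config: Any):
        # Copy the values out once, as the diff already does for max_spider_size,
        # instead of keeping a reference to the whole homeserver object.
        self.max_spider_size = config.media.max_spider_size
        self.url_preview_user_agent = config.media.url_preview_user_agent

    def request_headers(self) -> Dict[bytes, List[str]]:
        # Same shape as the headers mapping in the diff: bytes keys, list-of-str values.
        return {b"User-Agent": [self.url_preview_user_agent]}


config = SimpleNamespace(
    media=SimpleNamespace(
        max_spider_size=10 * 1024 * 1024,  # arbitrary example value
        url_preview_user_agent="Hello Matrix",
    )
)
previewer = UrlPreviewerSketch(config)
print(previewer.request_headers())
```

Reading the value once in `__init__` would likely also make the `self.hs = hs` line added in the first hunk unnecessary.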