Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a query param participant to admin endpoint /members to filter for participants in a room #18040

Open
wants to merge 7 commits into
base: develop
Choose a base branch
from

Conversation

H-Shay
Copy link
Contributor

@H-Shay H-Shay commented Dec 18, 2024

Adds a query param participant to admin endpoint GET /_synapse/admin/v1/rooms/{room_id}/members which returns a list of all currently joined room members who have also posted to the room.

@H-Shay H-Shay requested a review from a team as a code owner December 18, 2024 20:10
@github-actions github-actions bot deployed to PR Documentation Preview December 18, 2024 20:13 Active
@anoadragon453
Copy link
Member

Thinking out loud, it might make more sense to make this an additional query parameter (?participant=true) on the existing Room Members Admin API.

What do you think @H-Shay?

@H-Shay
Copy link
Contributor Author

H-Shay commented Dec 19, 2024

Oh interesting - just to flesh it out, in your formulation if the query parameter participant=true is provided, then the return of {"members": [members], "total": len(members)} would only include members who have posted? I don't see a problem with this and it does avoid adding a new endpoint.

@anoadragon453
Copy link
Member

@H-Shay that's correct, yep! Would you be able to rework the PR? Apologies for the busy work!

@anoadragon453 anoadragon453 removed the request for review from a team December 20, 2024 11:58
@anoadragon453
Copy link
Member

(Taking this off the review queue in the meantime.)

@github-actions github-actions bot deployed to PR Documentation Preview December 20, 2024 18:39 Active
@H-Shay H-Shay changed the title Add an admin endpoint to fetch the participants in a room Add a query param participant to admin endpoint /members to fetch the participants in a room Dec 20, 2024
@github-actions github-actions bot deployed to PR Documentation Preview December 20, 2024 18:43 Active
@H-Shay H-Shay changed the title Add a query param participant to admin endpoint /members to fetch the participants in a room Add a query param participant to admin endpoint /members to filter for participants in a room Dec 20, 2024
@github-actions github-actions bot deployed to PR Documentation Preview December 20, 2024 19:10 Active
@H-Shay
Copy link
Contributor Author

H-Shay commented Dec 20, 2024

@anoadragon453 I have re-worked the PR but seem to be unable to re-request a review - can you take a look when you get a chance or pop this back into the queue? Thanks so much!

@anoadragon453 anoadragon453 self-requested a review December 23, 2024 10:04
Copy link
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall direction looks good, but we'll need to rework the query due to performance concerns I'm afraid.


The following query parameters are available:

* `participant` - Optional. If provided and set to true, only returns currently joined members who have
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* `participant` - Optional. If provided and set to true, only returns currently joined members who have
* `participant` (bool) - Optional. If provided and set to true, only returns currently joined members who have

Comment on lines +1622 to +1628
SELECT DISTINCT c.state_key
FROM current_state_events AS c
INNER JOIN events AS e USING(room_id)
WHERE room_id = ?
AND c.membership = 'join'
AND e.type = 'm.room.message'
AND c.state_key = e.sender
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this query is currently quite slow. Testing this on matrix.org with Matrix HQ as the room, this query took about 50 seconds to run. It would be much slower on typical homeserver hardware. Postgres' EXPLAIN command measures the cost at cost=438880.86..440110.15 rows=122929 width=29.

Rather than iterating through current_state_events for room membership, the room_memberships table could be used to check which users are currently joined to a given room. We can rewrite the query as follows:

SELECT DISTINCT rm.user_id
FROM room_memberships AS rm
INNER JOIN events AS e USING(room_id)
WHERE room_id = ?
    AND rm.membership = 'join'
    AND e.type = 'm.room.message'
    AND rm.user_id = e.sender;

This cuts our cost down a little bit, but still not much; psql rates the cost as cost=438882.94..440112.23 rows=122929 width=29, and still taking about 30 seconds.

The majority of the cost comes from scanning events for m.room.message-type events from the given room:

               ->  Parallel Bitmap Heap Scan on events e  (cost=4517.63..242635.10 rows=26950 width=58) (actual time=409.882..2154.587 rows=142547 loops=3)
                     Recheck Cond: (room_id = '!OGEhHVWSdvArJzumhm:matrix.org'::text)
                     Rows Removed by Index Recheck: 7928064
                     Filter: (type = 'm.room.message'::text)
                     Rows Removed by Filter: 138493
                     Heap Blocks: exact=31299 lossy=164255
                     ->  Bitmap Index Scan on events_room_stream  (cost=0.00..4501.46 rows=120367 width=0) (actual time=317.490..317.491 rows=843191 loops=1)
                           Index Cond: (room_id = '!OGEhHVWSdvArJzumhm:matrix.org'::text)

One possible way we could move forward is to add another column to room_memberships called participant that starts off as false, and is set to true once a user sends a message in a room. You would need a background job to populate this table from the current database, but going forwards it could be updated trivially, and read from very quickly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a column sounds practical given I don't think there's another way around scanning the events table for evidence of participation - in terms of this PR, I am thinking that it would make the most sense to open a separate PR to add the column and the background job to populate it, and then once that is complete add logic to this PR to read from the column - would that be a sensible approach?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@H-Shay that sounds right to me! #17512 may serve as a good example of adding the background job. I don't believe you'll need to bump SCHEMA_COMPAT_VERSION as we're not removing any columns/tables; only adding.

The background updates documentation is useful here, and remember that boolean columns need to be noted in the port script.

You may find that you want to block the Admin API from returning any results until the background update is done - lest you get incomplete results. Up to you though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is #18068

The following query parameters are available:

* `participant` - Optional. If provided and set to true, only returns currently joined members who have
also posted to the room. Defaults to false.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
also posted to the room. Defaults to false.
also posted to the room. Having "posted" is defined as sending an `m.room.message`
event at some point in the room's history. Defaults to false.

docs/admin_api/rooms.md Show resolved Hide resolved
@lamoboos223
Copy link
Contributor

@H-Shay why only users who posted to the room? why not even the users who joined a room and left it without even sending a message? you know to get the historical room data

@anoadragon453
Copy link
Member

@lamoboos223 the existing endpoint is only for currently-joined members. If you want to get historical members from a room, I suggest pulling the full membership state of the room and filtering by users whose membership value is leave.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants