
Conversation

@Urgau (Member) commented Jan 13, 2026

This PR adds a GitHub issue comments viewer.

It's primarily designed for long or very long issues (not PRs), where GitHub's "Load More" button is really unhelpful.

It can be accessed at /gh-comments/{owner}/{repo}/{issue}.

Technically, it uses a GraphQL query (fetching up to 100 comments at a time) to get the issue details and the comments (including the Markdown body already rendered as HTML). As far as I could see, that query only costs 1 rate-limit point per call.
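The "up to 100 comments at a time" part implies a cursor-based pagination loop. As a minimal sketch (not the actual triagebot code; `Page` and `fetch_all_comments` are made-up names, and `fetch_page` stands in for the real GraphQL call), the loop could look like this:

```rust
// One page of results, as returned by a (hypothetical) GraphQL call:
// the comments plus the `pageInfo.endCursor` to resume from, if any.
struct Page {
    comments: Vec<String>,
    end_cursor: Option<String>,
}

// Repeatedly fetch pages, passing the previous page's end cursor,
// until the API reports there are no more pages.
fn fetch_all_comments(mut fetch_page: impl FnMut(Option<&str>) -> Page) -> Vec<String> {
    let mut all = Vec::new();
    let mut cursor: Option<String> = None;
    loop {
        let page = fetch_page(cursor.as_deref());
        all.extend(page.comments);
        match page.end_cursor {
            Some(next) => cursor = Some(next),
            None => break,
        }
    }
    all
}
```

With this shape, a 150-comment issue costs two sequential API calls, which is consistent with the ~3s load time mentioned below being dominated by GitHub round-trips.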

Regarding the styling of comment bodies, this PR uses the github-markdown-css project (under the MIT license).

For a big issue like #99301, with 150 comments, the loading time is ~3s on my machine.

Example of a not very long issue (but with many comment states): [screenshot]

@Urgau Urgau requested a review from Kobzol January 13, 2026 19:11
@Mark-Simulacrum (Member) commented:

I think this makes sense at a high level (unfortunately...). I'm wondering if there's a good way for us to authenticate users and only allow access to e.g. rust-lang team members. I think we can deploy without that but I worry a little that we'll end up getting hit with e.g. scrapers trying to use us to bypass GitHub's rate limits.

Do we at least restrict to only rust repos?

@Urgau (Member, Author) commented Jan 13, 2026

> I'm wondering if there's a good way for us to authenticate users and only allow access to e.g. rust-lang team members. I think we can deploy without that but I worry a little that we'll end up getting hit with e.g. scrapers trying to use us to bypass GitHub's rate limits.

Our very visible endpoints /gha-logs and /gh-changes-since (we link to them from GitHub comments) haven't been abused by scrapers, so hopefully this one won't be either.

We could use an OAuth App, but that seems like a very heavy tool.

> Do we at least restrict to only rust repos?

Yes. It's the first thing the handler checks.

```rust
if !is_repo_autorized(&ctx, &owner, &repo).await? {
    return Ok((
        StatusCode::UNAUTHORIZED,
        format!("repository `{owner}/{repo}` is not part of the Rust Project team repos"),
    )
        .into_response());
}
```

@Urgau (Member, Author) commented Jan 13, 2026

A simpler way to avoid being spammed might be a rate limiter with a back-off period per IP, maybe a limit of 1 request per IP every X seconds, as well as a global limit of X req/s.
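The per-IP part of that idea can be sketched in a few lines. This is only an illustration of the fixed-window back-off described above, not the implementation that landed in #2252; `IpRateLimiter` and its methods are made-up names:

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

// Hypothetical per-IP rate limiter: each IP may make at most one
// request per `window`; requests inside the window are rejected.
struct IpRateLimiter {
    window: Duration,
    last_seen: HashMap<IpAddr, Instant>,
}

impl IpRateLimiter {
    fn new(window: Duration) -> Self {
        Self { window, last_seen: HashMap::new() }
    }

    /// Returns true if the request is allowed, false if the IP must back off.
    /// `now` is passed in explicitly to keep the logic testable.
    fn check(&mut self, ip: IpAddr, now: Instant) -> bool {
        match self.last_seen.get(&ip) {
            Some(&prev) if now.duration_since(prev) < self.window => false,
            _ => {
                self.last_seen.insert(ip, now);
                true
            }
        }
    }
}
```

A real deployment would also need to cap the map's size (or periodically sweep stale entries) so the limiter itself can't be used to exhaust memory.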

@Urgau (Member, Author) commented Jan 13, 2026

> A simpler thing to avoid us being spammed might be to have a rate limiter with back-off period per IP

I've implemented the rate limiter in #2252.

@SpriteOvO (Member) commented:

Would it also be possible to display reactions?

@Kobzol (Member) left a comment:

This is really awesome work!

I tried http://localhost:8000/gh-comments/rust-lang/rust/112049 for https://github.com/rust-lang/rust/pull/112049 and it ended with this:

```
Something went wrong: unable to fetch the issue and it's comments

Caused by:
    0: failed to fetch the issue with comments
    1: error: Could not resolve to an Issue with the number of 112049.
```

🤔 Does it only work for issues and not PRs?


```rust
let mut data = self
    .graphql_query(
        "
query ($owner: String!, $repo: String!, $issueNumber: Int!, $cursor: String) {
```
Member:

Just out of curiosity, any reason to use GraphQL and not https://docs.github.com/en/rest/issues/comments?apiVersion=2022-11-28#list-issue-comments? Is there some information not provided by the REST API? I vaguely remember cases where the same data fetched through the REST API was several times faster than through GraphQL. It would be nice to check the performance difference on some big PR.

Contributor:

As a datapoint, I stopped adding GraphQL GitHub APIs in triagebot because they're too flaky, the usage reporting is not very reliable, and I don't see GH improving them.

We already have high-level functions to retrieve comments in triagebot, don't we?

Member (Author):

Unfortunately, neither the minimized state nor the minimized reason is present in the REST API, forcing us to use the GraphQL API.

Another reason is that with the REST API it's not possible to retrieve the issue body/title and the comments in the same request; we would have to do at least two requests each time, while a single GraphQL request is enough.

The GraphQL request also consistently costs only 1 (checked with `rateLimit { cost }`).

```html
</div>
<div class="comment-body markdown-body">
{body_html}
```
Member:

I was worried about XSS here, but it seems like GitHub already pre-escapes the HTML body of the comment. I hope that we can trust that...

Member (Author):

Yeah, not only does GitHub do the escaping, they also give us the rendered Markdown as HTML, greatly reducing the complexity here.

@Kobzol (Member) commented Jan 14, 2026

I think that even for the first version, some form of caching would be useful. Also, we should seriously think about bumping the resources once we start adding more and more potentially demanding triagebot functionality - triagebot currently runs with 0.25 of a vCPU and 512 MiB of RAM (!).

@apiraino (Contributor) commented:

> Also, we should seriously think about bumping this once we start adding more and more triagebot functionality that might be a bit demanding - triagebot is now running with a 0.25 of a vCPU and 512 MiB of RAM (!).

I think that using a small instance is a good metric to keep us in check and strive to keep triagebot small and efficient. Adding more power would cover up code inefficiencies (if any; I didn't run an analysis). Data should tell us whether triagebot genuinely needs more resources.

@Urgau (Member, Author) commented Jan 14, 2026

> 🤔 Does it only work for issues and not PRs?

Unfortunately it only works for issues for now. Supporting PRs requires more handling, in particular for review comments and code highlighting, which I didn't want to tackle in this first version.

@Urgau (Member, Author) commented Jan 14, 2026

> I think that using a small instance is a good metric to keep us in check and strive for the triagebot to stay small and efficient. Adding more power would cover up code inefficiencies

I think @Kobzol's comment relates to the caching part mentioned above: we would probably want to cache as much as possible to avoid API calls, so having more RAM at our disposal means more data cached. (Unless we use the database, but that seems like overkill.)

@Urgau (Member, Author) commented Jan 14, 2026

> Would it also be possible to display reactions?

I've found that in most cases they are not that useful, but if people want to see them, I can add them in a follow-up PR.

@Kobzol (Member) commented Jan 14, 2026

> I think that using a small instance is a good metric to keep us in check and strive for the triagebot to stay small and efficient. Adding more power would cover up code inefficiencies

> I think @Kobzol comment is related to the caching part mentioned above, where we would probably want to cache as much as possible to avoid the API calls, and so having more RAM at disposal means more data cached. (Unless we use the database, but that seems overkill)

Yeah, memory for caching would be the limiting factor. But more CPU is also nice in general. I agree that it's cool that we run fine with 0.25 of a vCPU, but if we can make these new endpoints load e.g. 1 second faster, I think it's better to just pay for (slightly) more resources than to spend a lot of optimization time on that.

@Urgau (Member, Author) commented Jan 14, 2026

> I think that even for the first version, some form of caching would be useful

(I forgot to actually reply to this part)

Adding some form of caching is difficult because we need to handle edits and new comments.

One way I could see us handling it would be a handler that prunes the cache for an issue when we receive an event for that issue, but even then I don't think reactions trigger any webhook (not that reactions are that important).

I'm also wondering if the cache will get many hits. Looking at the /gha-logs logs, which I would expect to have more cache hits since multiple people may look at the same failure, I don't see many hits (arguably that cache is also not very big).

@Urgau (Member, Author) commented Jan 14, 2026

I've added a small 35 MB cache with automatic pruning as soon as we get a webhook event for that issue.
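The shape of such a cache can be sketched as follows. This is only an illustration of the idea (a byte-bounded cache keyed by issue, invalidated from the webhook handler), not the code that landed; `PageCache` and its naive eviction policy are made up:

```rust
use std::collections::HashMap;

// Hypothetical byte-bounded cache of rendered comment pages,
// keyed by (owner, repo, issue number).
struct PageCache {
    max_bytes: usize,
    used_bytes: usize,
    entries: HashMap<(String, String, u64), String>,
}

impl PageCache {
    fn new(max_bytes: usize) -> Self {
        Self { max_bytes, used_bytes: 0, entries: HashMap::new() }
    }

    fn get(&self, key: &(String, String, u64)) -> Option<&String> {
        self.entries.get(key)
    }

    fn insert(&mut self, key: (String, String, u64), page: String) {
        // Drop any stale entry for this key so the byte accounting stays correct.
        self.invalidate(&key);
        // Refuse to cache pages that alone exceed the budget.
        if page.len() > self.max_bytes {
            return;
        }
        // Naive eviction: drop arbitrary entries until the new page fits
        // (the real cache would likely evict oldest-first).
        while self.used_bytes + page.len() > self.max_bytes {
            let victim = self.entries.keys().next().cloned().unwrap();
            self.invalidate(&victim);
        }
        self.used_bytes += page.len();
        self.entries.insert(key, page);
    }

    /// Called from the webhook handler when an event touches this issue.
    fn invalidate(&mut self, key: &(String, String, u64)) {
        if let Some(page) = self.entries.remove(key) {
            self.used_bytes -= page.len();
        }
    }
}
```

The webhook-driven invalidation handles edits and new comments; as noted above, reactions would slip through since they don't appear to trigger a webhook.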

@Kobzol (Member) left a comment:

Thank you. Let's try it!


@Urgau Urgau added this pull request to the merge queue Jan 15, 2026
Merged via the queue into rust-lang:master with commit 4481fdb Jan 15, 2026
3 checks passed
@Urgau Urgau deleted the gh-comments branch January 15, 2026 18:24