-
Notifications
You must be signed in to change notification settings - Fork 97
Add GitHub issue comments viewer for long issue #2251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I think this makes sense at a high level (unfortunately...). I'm wondering if there's a good way for us to authenticate users and only allow access to e.g. rust-lang team members. I think we can deploy without that but I worry a little that we'll end up getting hit with e.g. scrapers trying to use us to bypass GitHub's rate limits. Do we at least restrict to only rust repos? |
Our very visible endpoints We could use an OAuth App but that's seems like a very heavy tool.
Yes. It's the first thing the handler checks. Lines 31 to 37 in 2c6a411
|
|
A simpler thing to avoid us being spammed might be to have a rate limiter with back-off period per IP, maybe a limit of 1 req per IP every Xs. As well as a global limit of X req/s. |
I've implemented the rate limiter in #2252. |
|
Would it also be possible to display reactions? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really awesome work!
I tried http://localhost:8000/gh-comments/rust-lang/rust/112049 for https://github.com/rust-lang/rust/pull/112049 and it ended with this:
Something went wrong: unable to fetch the issue and it's comments
Caused by:
0: failed to fetch the issue with comments
1: error: Could not resolve to an Issue with the number of 112049.
🤔 Does it only work for issues and not PRs?
| let mut data = self | ||
| .graphql_query( | ||
| " | ||
| query ($owner: String!, $repo: String!, $issueNumber: Int!, $cursor: String) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity, any reason why use GraphQL and not https://docs.github.com/en/rest/issues/comments?apiVersion=2022-11-28#list-issue-comments? Is there some information not provided by the REST API? I vaguely remember some cases where the same thing being fetched by the REST API was several times faster than through the GraphQL. Would be nice to check the perf. difference on some big PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a datapoint I stopped adding gql github APIs in triagebot because they're too flaky, the usage reporting is not very reliable and I don't see GH improving them.
We already have high-level functions to retrieve comments in triagebot, don't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately neither the minimized state nor the minimized reason are present in the REST API, forcing us to use the GraphQl API.
Another reason is that it's not possible to retrieve the issue body/title and comments in the same request, we would have to do two requests minimum each time, while we only have to do one GraphQl request.
The GraphQl request is also consistently costing only 1 (checked with rateLimit { cost }).
| </div> | ||
| <div class="comment-body markdown-body"> | ||
| {body_html} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was worried about XSS here, but it seems like GitHub already pre-escapes the HTML body of the comment. I hope that we can trust that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, not only does GitHub do the escaping, they also gives us the rendered markdown as HTML, greatly reducing the complexity here.
|
I think that even for the first version, some form of caching would be useful. Also, we should seriously think about bumping this once we start adding more and more triagebot functionality that might be a bit demanding - triagebot is now running with a 0.25 of a vCPU and 512 MiB of RAM (!). |
I think that using a small instance is a good metric to keep us in check and strive for the triagebot to stay small and efficient. Adding more power would cover up code inefficiencies (if any, I didn't run an analysis). Data should tell us if the triagebot genuinely needs more resources. |
Unfortunately it only works for issues for now. Supporting PRs requires more handling, in particular regarding review comments, and code highlighting; which I didn't wanted to handle in this first version. |
I think @Kobzol comment is related to the caching part mentioned above, where we would probably want to cache as much as possible to avoid the API calls, and so having more RAM at disposal means more data cached. (Unless we use the database, but that seems overkill) |
I've found that it most cases they are not that useful, but if people want to see them, I can add them in a follow-up PR. |
Yeah, memory for caching would be the limiting factor. But also more CPU is nice in general. I agree that it's cool that we run fine with 0.25 of a CPU, but if we can make the page load of these new endpoints e.g. a 1sec faster, I think it's better to just pay for (slightly) more resources, than spend a lot optimization time on that. |
(I forgot to actually reply to this part) Adding some form of caching is difficult due to us needing to handle edits and new comments. One way I could see us handling it would be to have a handler that prunes the cache for the an issue when we receive an event for that issue, but even then I don't think reactions trigger any webhook (not that reactions are that important). I'm also wondering if the cache will have many hits. Looking at the |
|
I've added a small 35Mb cache with auto pruning as soon as we get an webhook event from that issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. Let's try it!
This PR adds a GitHub issue comments viewer.
It's primarily designed for long/very long issue (not PR) where GitHub is really unhelpful with it's "Load More".
It can be accessed at
/gh-comments/{owner}/{repo}/{issue}.Technically it's using a GraphQl query (up to 100 comments at a time) to get the issue details and the comments (including the already HTML-ed markdown body). As far as I could see that query only cost 1 per call.
Regarding the styling of comment bodies, this PR uses the github-markdown-css project (under MIT license).
For a big issue like #99301, with 150 comments, the loading time is ~3s on my machine.
Example of a not very long issue (but with many states)