-
Notifications
You must be signed in to change notification settings - Fork 97
Add rate limiter to our most exposed and costly endpoints #2252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Also unsure of the numbers, I think that we could go higher, but let's see how this works. Can we avoid using the governor on localhost though? It would get annoying for developing triagebot :) Maybe by not applying it if debug_assertions is set, or hide it behind a CLI parameter? Or filtering out requests from localhost? (although not sure if the requests on the deployment through Docker aren't also from localhost, we'd have to test that I guess).
src/main.rs
Outdated
| .per_second(15) | ||
| .burst_size(3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who (or what) hits these /gha-logs/* endpoints? Do we have stats about their usage?
3 req/s with 15s cooldown look a bit restrictive - but it really depends on the answer to the above questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The /gha-logs endpoint is the (plain enhanced) link when CI fails1.
It's used to display the raw-enhanced logs from GitHub Actions. It can take ~3/4s to fetch and display our logs.
I'm not expecting anyone to try open 4 of those links in <15s. So I think a 15s cool down is good enough.
Footnotes
I also wanted to exclude localhost, but in my typical workflow I pretty much always rebuild and thus restart triagebot, resetting those limit. I can add an environment variable (like we do for other things) to relax those restrictions. |
|
Ok, fair enough for the rebuild, it still seems a bit annoying and unexpected if someone would work on it locally though. Maybe just disabling it in debug mode would do the trick, to avoid complicating things? |
|
Disabling it when I prefer to follow our existing convention of using an environment variables, so I added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, gating it on compilation options could likely be confusing, you're right.
This PR adds rate limiter to our most exposed and costly endpoints.
I used the tower-governor crate with a burst size of 3 requests per IP and a replenishment of one element every 15 seconds.
I don't know if those parameters are good, but we can adjust them later if we find out that they are too light or too tight.
Those limits are NOT applied to the
/github-hookand/zulip-hookendpoints.Related to #2251 (comment)
cc @Mark-Simulacrum