use a connection pool for remote execution #1213
Conversation
We've been using the connection pool code for a few days, and we've noticed one problem. Previously, the number of actions executing remotely had an implicit limit: since all actions went over a single connection, an HTTP/2 server that limits the number of concurrent streams to, say, 128 effectively capped the number of jobs that could execute remotely. With this connection pool, there is no such limit. I'll probably make another PR to implement an execution semaphore.
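A minimal sketch of what such an execution semaphore could look like, assuming a tokio runtime; `Action`, `execute_remotely`, and `MAX_REMOTE_ACTIONS` are hypothetical stand-ins, not code from this PR:

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Hypothetical stand-ins for the real action type and remote execution call.
struct Action;
async fn execute_remotely(_action: Action) {}

// Cap on in-flight remote actions, mirroring the implicit limit that the
// server's SETTINGS_MAX_CONCURRENT_STREAMS used to impose.
const MAX_REMOTE_ACTIONS: usize = 128;

async fn run_actions(actions: Vec<Action>) {
    let semaphore = Arc::new(Semaphore::new(MAX_REMOTE_ACTIONS));
    let mut handles = Vec::new();
    for action in actions {
        let semaphore = Arc::clone(&semaphore);
        handles.push(tokio::spawn(async move {
            // Parks here once MAX_REMOTE_ACTIONS executions are in flight,
            // restoring the back pressure the single connection provided.
            let _permit = semaphore.acquire_owned().await.expect("semaphore closed");
            execute_remotely(action).await;
            // The permit is released when `_permit` drops.
        }));
    }
    for handle in handles {
        let _ = handle.await;
    }
}
```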
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this in D91266586. (Because this pull request was imported automatically, there will not be any future comments.)
Implement a connection pool for all remote APIs. This can be configured using min_connections, max_connections, and max_concurrency_per_connection. It is roughly a round-robin algorithm, and the connection pool can grow when existing connections are maxed out with requests. Once all connections are maxed out, back pressure is provided.
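A rough sketch of the selection logic that description implies; the struct and field names here are illustrative, not the PR's actual code:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative knobs matching the options named in the description.
struct PoolConfig {
    min_connections: usize,
    max_connections: usize,
    max_concurrency_per_connection: usize,
}

// Stand-in for a pooled HTTP/2 channel plus its in-flight request count.
struct Connection {
    in_flight: AtomicUsize,
}

struct ConnectionPool {
    config: PoolConfig,
    connections: Vec<Connection>,
    next: AtomicUsize,
}

impl ConnectionPool {
    fn new(config: PoolConfig) -> Self {
        let connections = (0..config.min_connections)
            .map(|_| Connection { in_flight: AtomicUsize::new(0) })
            .collect();
        Self { config, connections, next: AtomicUsize::new(0) }
    }

    /// Round-robin over existing connections; grow the pool when every
    /// connection is saturated; return None (back pressure) once the pool
    /// is at max_connections and all connections are maxed out.
    fn pick(&mut self) -> Option<usize> {
        let n = self.connections.len();
        for _ in 0..n {
            let i = self.next.fetch_add(1, Ordering::Relaxed) % n;
            let load = self.connections[i].in_flight.load(Ordering::Relaxed);
            if load < self.config.max_concurrency_per_connection {
                return Some(i);
            }
        }
        if n < self.config.max_connections {
            self.connections.push(Connection { in_flight: AtomicUsize::new(0) });
            return Some(n);
        }
        None
    }
}
```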
Force-pushed 7f3b163 to a6664c6
Multiple connections to the same remote host usually aren't optimal from a congestion control standpoint. Would it make more sense to raise the server's concurrent stream limit? If it accepts multiple connections, it's clearly willing to tolerate larger numbers of streams in practice.
@Ralith I disagree, or maybe I'm not sure what you mean by "from a congestion control standpoint". BTW, in practice, most HTTP/2 servers limit the number of concurrent streams per connection to around 100 or 128. A single stream can be limited by latency, window sizes, and packet loss. Currently, each gRPC client (CAS, action cache, executor, and bytestream) has its own TCP connection, so even multiple large file materializations all go through a single connection, where a single missed packet can stall everything. If you really want to use all of your bandwidth, you need to use multiple streams. This is especially true if you have high latency or any packet loss at all between you and your build servers. I think using multiple streams is fairly standard advice; for example, it is standard advice that if you want to download from S3 at full speed, you should use multiple TCP connections. For another example, Bazel uses a similar growing connection pool, and I've chosen similar defaults. It's configurable, though, so if you really wanted to use a single TCP connection instead of 4, with this PR you can do that.
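To make that last point concrete, here are two hypothetical configurations reusing the illustrative `PoolConfig` from the sketch above; the specific values are assumptions, not this PR's actual defaults:

```rust
// Pin the pool to a single connection, approximating the old behavior.
let single_connection = PoolConfig {
    min_connections: 1,
    max_connections: 1,
    max_concurrency_per_connection: 128,
};

// A growing pool along the lines the PR describes.
let growing_pool = PoolConfig {
    min_connections: 1,
    max_connections: 4,
    max_concurrency_per_connection: 100,
};
```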
I thought your intent was to increase concurrency, but I see that I misunderstood. Mitigating TCP head-of-line blocking makes sense. Supporting HTTP/3 might be a more graceful solution in the long term.
Because each connection's congestion controller operates independently, without knowledge of the others, the connections can be slower to converge, and to re-converge after a disturbance. As you say, though, the tradeoff can still make sense.
Implement a connection pool for all remote APIs. This can be configured using min_connections, max_connections, and max_concurrency_per_connection. It is roughly a round-robin algorithm, and the connection pool can grow when existing connections are maxed out with requests. Once all connections are maxed out, the pool hands out channels that apply back pressure when a caller attempts to use them.
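A hedged sketch of that "back pressure on use" behavior, using a per-connection tokio semaphore; `PooledChannel` and `send_on_connection` are hypothetical names, not this PR's API:

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Handing out a channel always succeeds; the wait happens only when a
// request is actually issued on it.
#[derive(Clone)]
struct PooledChannel {
    // Shared per-connection limiter, sized to max_concurrency_per_connection.
    limiter: Arc<Semaphore>,
}

impl PooledChannel {
    async fn request(&self, payload: &[u8]) -> Vec<u8> {
        // Back pressure: this await parks the caller while the connection
        // already has its maximum number of requests in flight, rather than
        // failing at checkout time.
        let _permit = self.limiter.acquire().await.expect("semaphore closed");
        send_on_connection(payload).await
    }
}

// Placeholder for the actual gRPC call over the pooled connection.
async fn send_on_connection(_payload: &[u8]) -> Vec<u8> {
    Vec::new()
}
```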