Conversation

@bvinc (Contributor) commented Jan 22, 2026

Implement a connection pool for all remote APIs. It can be configured using min_connections, max_connections, and max_concurrency_per_connection. Selection is roughly round robin, and the pool can grow when existing connections are maxed out with requests. Once all connections are maxed out, the pool hands out channels that provide back pressure when they are used.
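
A rough sketch of that selection policy (illustrative names only, not the actual code; the real pool hands out gRPC channels rather than bare counters):

```rust
// Minimal sketch of the policy described above: round robin over existing
// connections, grow the pool when all of them are saturated, and fall back
// to a saturated connection (back pressure) once the pool is at its maximum.
use std::sync::atomic::{AtomicUsize, Ordering};

struct PoolConfig {
    min_connections: usize,
    max_connections: usize,
    max_concurrency_per_connection: usize,
}

struct PooledConnection {
    // In the real client this would wrap a gRPC channel; here it only tracks
    // how many requests are currently in flight on the connection.
    in_flight: AtomicUsize,
}

struct ConnectionPool {
    config: PoolConfig,
    connections: Vec<PooledConnection>,
    next: AtomicUsize,
}

impl ConnectionPool {
    fn new(config: PoolConfig) -> Self {
        let connections = (0..config.min_connections)
            .map(|_| PooledConnection { in_flight: AtomicUsize::new(0) })
            .collect();
        Self { config, connections, next: AtomicUsize::new(0) }
    }

    /// Pick a connection index for the next request.
    fn pick(&mut self) -> usize {
        let n = self.connections.len();
        let start = self.next.fetch_add(1, Ordering::Relaxed);
        // Round robin over existing connections, skipping saturated ones.
        for i in 0..n {
            let idx = (start + i) % n;
            if self.connections[idx].in_flight.load(Ordering::Relaxed)
                < self.config.max_concurrency_per_connection
            {
                return idx;
            }
        }
        if n < self.config.max_connections {
            // Every connection is maxed out with requests, but we can grow.
            self.connections
                .push(PooledConnection { in_flight: AtomicUsize::new(0) });
            return self.connections.len() - 1;
        }
        // Fully maxed out: hand out a saturated connection anyway; using it
        // runs into the server's stream limit, which is the back pressure.
        start % n
    }
}

fn main() {
    // Illustrative values only; the PR's actual defaults may differ.
    let mut pool = ConnectionPool::new(PoolConfig {
        min_connections: 1,
        max_connections: 4,
        max_concurrency_per_connection: 128,
    });
    let idx = pool.pick();
    pool.connections[idx].in_flight.fetch_add(1, Ordering::Relaxed);
    println!("sending request on connection {idx}");
}
```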

meta-cla bot added the CLA Signed label on Jan 22, 2026
@bvinc (Contributor, Author) commented Jan 22, 2026

We've been using this connection pool code for a few days, and there is one problem we've noticed. Previously, the number of actions executing remotely had an implicit limit: since all of the actions went over a single connection, if the HTTP/2 server limits the number of concurrent streams to, say, 128, then that constituted an implicit limit on the number of jobs that could be executed remotely. With this connection pool, there is no such limit.

I'll probably make another PR to implement this execution semaphore.
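
As a rough sketch of what such an execution semaphore could look like (tokio-based, with placeholder names and an illustrative limit of 128, not a final design):

```rust
// Sketch: cap concurrently executing remote actions explicitly, now that a
// single connection's ~128-stream limit no longer does so implicitly.
// Assumes tokio with the rt-multi-thread, macros, sync, and time features.
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Semaphore;

// Stand-in for the real remote-execution RPC.
async fn execute_remotely(id: u32) {
    tokio::time::sleep(Duration::from_millis(10)).await;
    println!("action {id} done");
}

#[tokio::main]
async fn main() {
    // Roughly the limit a single connection used to impose implicitly.
    let permits = Arc::new(Semaphore::new(128));
    let mut tasks = Vec::new();
    for id in 0..1000u32 {
        let permits = Arc::clone(&permits);
        tasks.push(tokio::spawn(async move {
            // Acquire a permit before issuing the RPC; this is the explicit
            // cap on how many actions execute remotely at once.
            let _permit = permits.acquire_owned().await.expect("semaphore closed");
            execute_remotely(id).await;
        }));
    }
    for task in tasks {
        task.await.unwrap();
    }
}
```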

@meta-codesync bot (Contributor) commented Jan 22, 2026

@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this in D91266586. (Because this pull request was imported automatically, there will not be any future comments.)

@Ralith (Contributor) commented Jan 23, 2026

Multiple connections to the same remote host usually isn't optimal from a congestion control standpoint. Would it make more sense to raise the server's concurrent stream limit? If it accepts multiple connections, clearly it's willing to tolerate larger numbers of streams in practice.

@bvinc (Contributor, Author) commented Jan 23, 2026

@Ralith I disagree, or maybe I'm not sure what you mean by "from a congestion control standpoint". BTW, in practice, most HTTP/2 servers limit the number of concurrent streams per connection to around 100 or 128.

A single stream can be limited by latency, window sizes, and packet loss. Currently, each gRPC client (CAS, action cache, executor, and bytestream) has its own TCP connection, so even multiple large file materializations all go through a single connection, where losing a single packet can stall everything. If you really want to use all of your bandwidth, you need to use multiple streams. This is especially true if you have high latency or any packet loss at all between you and your build servers.
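
To put rough numbers on the window/latency point (purely illustrative, not measured from our setup): a single stream can never move more than one flow-control window per round trip, so

$$
\text{throughput} \le \frac{\text{window}}{\text{RTT}},
\qquad
\frac{1\ \text{MiB}}{50\ \text{ms}} \approx 20\ \text{MiB/s per stream}.
$$

Multiple streams or connections let several such windows be in flight at once.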

I think using multiple streams is fairly standard advice. For example, the standard advice for downloading from S3 at full speed is to use multiple TCP streams; the aws s3 command line tool defaults to chunking downloads and using multiple connections.

For another example, bazel uses a similar growing connection pool, and I've chosen similar defaults.

It's configurable though, so if you really want to use a single TCP connection instead of 4, this PR lets you do that.

@Ralith (Contributor) commented Jan 23, 2026

I thought your intent was to increase concurrency, but I see that I misunderstood. Mitigating TCP head-of-line blocking makes sense. Supporting HTTP/3 might be a more graceful solution in the long term.

> I disagree, or maybe I'm not sure what you mean by "from a congestion control standpoint"

Because each connection's congestion controller operates independently, without knowledge of the others, they can be slower to converge, and to re-converge after a disturbance. As you say, though, the tradeoff can still make sense.
