perf: S3 concurrency semaphore and lock-free flush uploads (#111)
Merged
novatechflow merged 3 commits into KafScale:main on Feb 26, 2026
Conversation
Introduce a broker-wide semaphore (`KAFSCALE_S3_CONCURRENCY`, default 64) that caps concurrent S3 operations across all partitions, and align the HTTP transport connection pool with the same limit. Split `flushLocked` into `prepareFlush` (under lock) and `uploadFlush` (lock-free) so that `AppendBatch` and `Read` callers are no longer blocked behind S3 I/O. Concurrent flushes on the same partition are serialized via a `flushing` flag and `sync.Cond`. Prefetch uses `TryAcquire` to avoid blocking critical-path I/O.
Collaborator
Thank you @klaudworks - please add the new switch to the /docs? We use a dedicated branch, gh-pages, for our docs rendering; operations (https://kafscale.io/operations/) would be a good candidate.
Contributor
Author
@novatechflow Sure, I'll look into it later today and find the most fitting place, e.g. operations.
Contributor
Author
@novatechflow added the docs
novatechflow
approved these changes
Feb 26, 2026
Collaborator
novatechflow
left a comment
Thank you @klaudworks !
Status quo
Every `PartitionLog` issues S3 calls independently with no shared concurrency limit. The only bound is the SDK's default `MaxConnsPerHost: 2048`. The total number of concurrent S3 calls is effectively `active_connections * partitions_per_request * 2` for writes, plus `active_connections * ReadAheadSegments` for prefetch reads. `flushLocked` holds `l.mu` for the entire flush cycle: buffer drain, segment build, S3 upload, and metadata commit. Every `AppendBatch` and `Read` on the same partition blocks during the S3 upload.

Shortcomings and fixes
flushLocked blocks reads and writes. `flushLocked` holds `l.mu` during S3 uploads, blocking `AppendBatch` and `Read` on the same partition until the upload completes. Fixed by splitting it into `prepareFlush` (buffer drain + segment build, under `l.mu`) and `uploadFlush` (S3 I/O, no lock held). A `flushing` flag plus a `sync.Cond` serializes concurrent flushes on the same partition.
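The split can be sketched roughly as below. This is a minimal stand-in, not the actual KafScale code: the method names `prepareFlush`/`uploadFlush`, the `flushing` flag, and the `sync.Cond` come from the PR description, while the struct fields and placeholder record type are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// partitionLog is a simplified stand-in for a PartitionLog.
type partitionLog struct {
	mu       sync.Mutex
	cond     *sync.Cond
	flushing bool     // true while an upload is in flight
	buffer   []string // pending records (placeholder type)
	segments []string // committed segment names
}

func newPartitionLog() *partitionLog {
	l := &partitionLog{}
	l.cond = sync.NewCond(&l.mu)
	return l
}

// Flush replaces the old single-phase flushLocked: prepareFlush runs
// under l.mu, uploadFlush performs the S3 I/O with no lock held.
func (l *partitionLog) Flush() {
	seg := l.prepareFlush()
	if seg == "" {
		return // nothing to flush
	}
	l.uploadFlush(seg)
}

func (l *partitionLog) prepareFlush() string {
	l.mu.Lock()
	defer l.mu.Unlock()
	// Serialize concurrent flushes on the same partition.
	for l.flushing {
		l.cond.Wait()
	}
	if len(l.buffer) == 0 {
		return ""
	}
	l.flushing = true
	// Drain the buffer and build the segment while holding the lock.
	seg := fmt.Sprintf("segment(%d records)", len(l.buffer))
	l.buffer = l.buffer[:0]
	return seg
}

func (l *partitionLog) uploadFlush(seg string) {
	// The S3 PutObject would happen here; no lock is held, so
	// AppendBatch and Read on this partition proceed concurrently.
	l.mu.Lock()
	l.segments = append(l.segments, seg) // metadata commit
	l.flushing = false
	l.cond.Broadcast() // wake any flush waiting on the flushing flag
	l.mu.Unlock()
}

func (l *partitionLog) AppendBatch(rec string) {
	l.mu.Lock()
	l.buffer = append(l.buffer, rec)
	l.mu.Unlock()
}

func main() {
	l := newPartitionLog()
	l.AppendBatch("a")
	l.AppendBatch("b")
	l.Flush()
	fmt.Println(len(l.segments)) // 1
}
```

The key property: the only code under `l.mu` is cheap in-memory work, so appends and reads are never queued behind network I/O.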
Unbounded concurrent S3 calls. Each partition issues S3 calls independently with no shared limit. Under load with many partitions, the broker can have hundreds of concurrent S3 requests with no backpressure. Fixed by adding a broker-wide semaphore (KAFSCALE_S3_CONCURRENCY, default 64) that caps concurrent S3 operations. Set to 0 to disable. For slower S3-compatible storages (Hetzner, IONOS, self-hosted MinIO), operators can lower this to match their backend's capacity.
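A broker-wide cap like this can be sketched with a buffered channel; the real implementation may use a different semaphore primitive, and everything here except the `KAFSCALE_S3_CONCURRENCY` variable, the default of 64, the 0-disables behavior, and the `TryAcquire` name is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// s3Gate caps concurrent S3 operations broker-wide, sketched as a
// buffered channel: each in-flight operation holds one token.
type s3Gate chan struct{}

// newS3Gate reads KAFSCALE_S3_CONCURRENCY (default 64; 0 disables the cap).
func newS3Gate() s3Gate {
	n := 64
	if v := os.Getenv("KAFSCALE_S3_CONCURRENCY"); v != "" {
		if parsed, err := strconv.Atoi(v); err == nil {
			n = parsed
		}
	}
	if n <= 0 {
		return nil // nil gate: all operations are no-ops (cap disabled)
	}
	return make(s3Gate, n)
}

// Acquire blocks until a token is free; produce/consume paths use this,
// so S3 pressure turns into backpressure rather than unbounded fan-out.
func (g s3Gate) Acquire() {
	if g != nil {
		g <- struct{}{}
	}
}

func (g s3Gate) Release() {
	if g != nil {
		<-g
	}
}

// TryAcquire is non-blocking: prefetch uses it and simply skips the
// read-ahead when all tokens are taken.
func (g s3Gate) TryAcquire() bool {
	if g == nil {
		return true
	}
	select {
	case g <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	g := newS3Gate()
	g.Acquire()
	defer g.Release()
	fmt.Println("holding 1 of", cap(g), "tokens")
}
```

Operators on slower backends would lower the limit via the environment variable rather than touching code.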
Back-of-envelope for the default of 64: each 4 MB segment on a 10 Gbps link takes 3.2 ms to transfer plus ~15 ms of S3 latency, 18.2 ms total. One connection therefore achieves 4 MB / 18.2 ms ≈ 1.76 Gbps effective throughput, so filling 10 Gbps takes 10 / 1.76 ≈ 6 concurrent requests. On a 50 Gbps link the transfer shrinks to 0.64 ms (15.64 ms total, ~2.05 Gbps per connection), so ~24 requests suffice. A default of 64 covers even high-network instances with margin, and also leaves headroom for lower S3 latency, which would increase the concurrency required to saturate the network. The goal was simply a default that never throttles throughput while still providing backpressure in edge cases.
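The estimate above can be reproduced in a few lines. The 4 MB segment size and ~15 ms latency come from the text; the exact latency on any given backend is of course an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// requiredConns estimates how many concurrent uploads are needed to
// saturate a link, given 4 MB segments and ~15 ms per-request S3 latency.
func requiredConns(linkGbps float64) int {
	const segmentBits = 4 * 1e6 * 8 // 4 MB segment = 32 Mbit
	const s3LatencySec = 0.015      // assumed per-request S3 latency

	transferSec := segmentBits / (linkGbps * 1e9)
	// Effective per-connection throughput in Gbps.
	perConnGbps := segmentBits / 1e9 / (transferSec + s3LatencySec)
	return int(math.Ceil(linkGbps / perConnGbps))
}

func main() {
	fmt.Println(requiredConns(10)) // 6
	fmt.Println(requiredConns(50)) // 25 (ceiling of 24.4; the text rounds to 24)
}
```

Both results sit far below the default of 64, which is the point: the cap provides backpressure without ever being the throughput bottleneck.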
Connection churn. The AWS SDK default keeps only 10 idle connections per host (`MaxIdleConnsPerHost: 10`). Under burst load, most connections are created fresh with a full TCP+TLS handshake. Fixed by setting `MaxConnsPerHost` and `MaxIdleConnsPerHost` to match the semaphore limit, keeping connections warm.

Prefetch competes with the critical path. Prefetch goroutines competed equally with produce/consume for S3 capacity. Fixed by using the non-blocking `TryAcquire`: prefetch is skipped when all tokens are taken. Also narrowed `l.mu` to just the segment-list read, so prefetch I/O doesn't block appends.