TCP teardown errors with high stream counts (128+)

## Description

With high parallel stream counts (`-P 128`), teardown errors still occur despite the fixes in #25. The original fix works for normal stream counts (confirmed by @matttbe with 50 iterations of 1-second runs), but 128 streams trigger two distinct problems.

Reported by @matttbe in https://github.com/lance0/xfr/issues/25#issuecomment-4005897380

## Reproduction

```bash
xfr <host> -P 128 --no-tui
```

Tested in a network namespace with veth pairs at ~19 Mbps.

## Problem 1: Post-test RST cascade

When the server's duration timer fires, it cancels all receive handlers, which drop their `TcpStream`. Dropping a socket with unread data in the kernel buffer sends RST (not FIN) to the client. The client's send tasks are still running — the client doesn't learn the test is over until the `Result` message arrives on the control channel, which happens *after* the server has already torn down all data streams.
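Because the RST comes from closing a socket that still has unread data, one mitigation is to read out pending bytes before dropping the stream (the receive-side drain listed under planned fixes). A minimal sketch of that idea, not xfr's actual code; `drain_before_close` and the `max_bytes` budget are hypothetical names:

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical helper: read and discard pending bytes until EOF, until the
/// reader would block, or until `max_bytes` have been discarded.
/// Returns the number of bytes discarded.
fn drain_before_close<R: Read>(reader: &mut R, max_bytes: usize) -> io::Result<usize> {
    let mut scratch = [0u8; 4096];
    let mut discarded = 0;
    while discarded < max_bytes {
        // Never read past the remaining budget.
        let want = scratch.len().min(max_bytes - discarded);
        match reader.read(&mut scratch[..want]) {
            Ok(0) => break, // EOF: the peer already closed its side
            Ok(n) => discarded += n,
            // Nothing more is buffered right now; stop draining.
            Err(e) if e.kind() == ErrorKind::WouldBlock => break,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(discarded)
}
```

With a real `std::net::TcpStream`, the caller would presumably switch the socket to non-blocking mode first so the loop stops once the kernel buffer is empty. The byte budget keeps a sender that is still writing from holding teardown open indefinitely.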

The client's error suppression checks `cancel` (not yet set), `deadline_reached` (not yet, due to timing asymmetry), and `near_deadline` (250ms grace, not enough). So the errors are treated as fatal and logged as `ERROR`.
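The three checks can be sketched roughly as follows. This is an illustration of the logic described above, not the actual client code; the function name and signature are assumptions, and only the 250ms grace value comes from this issue:

```rust
use std::time::{Duration, Instant};

/// Grace window before the local deadline (the 250ms figure from this issue).
const NEAR_DEADLINE_GRACE: Duration = Duration::from_millis(250);

/// Hypothetical helper: returns true when a send error should be treated as
/// expected teardown noise instead of being logged as ERROR.
fn is_expected_teardown(cancelled: bool, now: Instant, deadline: Instant) -> bool {
    let deadline_reached = now >= deadline;
    // Time left until the local deadline, clamped to zero once it has passed.
    let remaining = deadline.saturating_duration_since(now);
    let near_deadline = remaining <= NEAR_DEADLINE_GRACE;
    cancelled || deadline_reached || near_deadline
}
```

A stream whose local deadline trails the server's by more than the grace window fails all three checks when the RST arrives, which is the case that surfaces as `ERROR`.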

**Timing asymmetry**: The server starts its interval timer before any data streams connect. Each client `send_data` starts its own deadline from when that stream connects. With 128 sequential TCP connects, late-starting streams have deadlines that lag the server's by hundreds of milliseconds.
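To make the asymmetry concrete, here is a simplified model in which streams connect strictly one after another at a fixed cost each. The per-connect cost is an illustrative assumption, not a measured value, and both function names are hypothetical:

```rust
use std::time::Duration;

/// Simplified model: each sequential connect takes `per_connect`, so stream
/// `index` starts its local deadline that much later than stream 0.
fn deadline_lag(index: u32, per_connect: Duration) -> Duration {
    per_connect * index
}

/// Counts the streams (out of `streams`) whose lag exceeds the grace window.
fn streams_outside_grace(streams: u32, per_connect: Duration, grace: Duration) -> u32 {
    (0..streams)
        .filter(|&i| deadline_lag(i, per_connect) > grace)
        .count() as u32
}
```

Under an assumed 3ms per connect, the last of 128 streams lags by 381ms and 44 streams fall outside a 250ms grace window, while 50 streams all stay inside it. The real numbers depend on the connect latency of the setup.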

**Join timeout**: The 2s hardcoded timeout for `join_all` on stream handles isn't enough for 128 tasks.

## Problem 2: Mid-test broken pipe

In one trace, stream 16 gets `Broken pipe` at ~7s into a 10s test while the test continues running (intervals 8 and 9 still arrive). This is well before either side's deadline. The cause is unclear; one possibility is kernel-level resource pressure with 128 TCP connections competing for ~19 Mbps (~150 kbps per stream).

## Planned fixes

- [x] Scale the client join timeout with stream count (`max(2s, streams * 50ms)`) — v0.9.1
- [x] Client stops local data streams at local duration expiry (narrows server/client teardown race) — v0.9.1
- [x] Receive-side cancel drain to reduce RST-on-close bursts — v0.9.1
- ~Server gracefully shuts down data sockets (FIN instead of RST)~ — RST is correct for timed tests; FIN would let bufferbloated send buffers drain past the requested duration
- [ ] Investigate mid-test stream failures under high contention
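For reference, the scaled join timeout from the first item, `max(2s, streams * 50ms)`, works out as in this sketch. The function name is hypothetical; only the formula comes from this issue:

```rust
use std::time::Duration;

/// Join timeout that grows with stream count: max(2s, streams * 50ms).
fn join_timeout(streams: u32) -> Duration {
    let floor = Duration::from_secs(2);
    let scaled = Duration::from_millis(50) * streams;
    floor.max(scaled)
}
```

The 2s floor applies up to 40 streams; at `-P 128` the timeout becomes 6.4s.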

## Related

- #25 — original teardown fix (works for normal stream counts)
- #24 — JoinHandle panic fix
