Description
With high parallel stream counts (-P 128), teardown errors still occur despite the fixes in #25. The original fix works for normal stream counts (confirmed by @matttbe with 50 iterations of 1-second runs), but running 128 streams triggers two distinct problems.
Reported by @matttbe in #25 (comment)
Reproduction
xfr <host> -P 128 --no-tui
Tested in a network namespace with veth pairs at ~19 Mbps.
Problem 1: Post-test RST cascade
When the server's duration timer fires, it cancels all receive handlers, which drop their TcpStream. Dropping a socket with unread data in the kernel buffer sends RST (not FIN) to the client. The client's send tasks are still running — the client doesn't learn the test is over until the Result message arrives on the control channel, which happens after the server has already torn down all data streams.
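The RST-on-close behavior can be reproduced outside xfr with nothing but the standard library. The following is a minimal sketch, not xfr's code: the function name `observe_close_with_unread_data`, the 4096-byte payload, and the 100 ms sleeps are all assumptions chosen for illustration, and the reset it reports is the behavior expected on Linux loopback.

```rust
use std::io::{self, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// Hypothetical demo: the client sends data that the server never reads,
/// the server socket is dropped, and the function returns what the client
/// observes on its next read.
fn observe_close_with_unread_data() -> io::Result<io::Result<usize>> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let mut client = TcpStream::connect(listener.local_addr()?)?;
    let (server_side, _) = listener.accept()?;

    client.write_all(&[0u8; 4096])?;
    // Give the kernel time to queue the bytes in the server's receive buffer.
    thread::sleep(Duration::from_millis(100));

    // Dropping closes the socket while the 4096 bytes are still unread.
    drop(server_side);
    thread::sleep(Duration::from_millis(100));

    // Guard against blocking forever if the platform behaves differently.
    client.set_read_timeout(Some(Duration::from_secs(2)))?;
    let mut buf = [0u8; 16];
    Ok(client.read(&mut buf))
}
```

On Linux the inner result is expected to be an error of kind `ConnectionReset`. A graceful close (FIN) would instead surface as `Ok(0)`, which is why a send task that is still writing sees a hard error rather than a clean end of stream.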
The client's error suppression checks three conditions, and none of them holds yet: cancel (not yet set), deadline_reached (not yet reached, because of the timing asymmetry described below), and near_deadline (the 250ms grace window is too short). The errors are therefore treated as fatal and logged as ERROR.
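The three checks can be pictured as a single predicate. This is a sketch under assumptions, not the client's actual code: `TeardownState`, `is_expected_teardown_error`, and `NEAR_DEADLINE_GRACE` are hypothetical names, and only the 250 ms grace value and the three conditions come from the description above.

```rust
use std::time::{Duration, Instant};

/// Grace window before the local deadline; 250 ms is the value cited above.
const NEAR_DEADLINE_GRACE: Duration = Duration::from_millis(250);

/// Hypothetical snapshot of what a send task knows when a write fails.
struct TeardownState {
    cancelled: bool,
    deadline: Instant,
}

/// Returns true when a write error looks like expected teardown noise
/// rather than a fatal stream failure.
fn is_expected_teardown_error(state: &TeardownState, now: Instant) -> bool {
    let deadline_reached = now >= state.deadline;
    // Zero once the deadline has passed, so this also covers late errors.
    let remaining = state.deadline.saturating_duration_since(now);
    let near_deadline = remaining <= NEAR_DEADLINE_GRACE;
    state.cancelled || deadline_reached || near_deadline
}
```

With a 10 s deadline, an error at 9.8 s is suppressed while one at 9.6 s is not. A stream whose local deadline lags the server's by more than 250 ms therefore sees the server's RST while all three conditions are still false.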
Timing asymmetry: The server starts its interval timer before any data streams connect. Each client send_data starts its own deadline from when that stream connects. With 128 sequential TCP connects, late-starting streams have deadlines that lag the server's by hundreds of milliseconds.
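A rough model makes the size of the lag concrete. The sketch below assumes connects are strictly sequential and each takes the same time; the 3 ms per-connect figure used in the usage note is an illustrative assumption, not a measurement from the trace, and both function names are hypothetical.

```rust
use std::time::Duration;

/// Lag of the `connect_order`-th stream's deadline (1-based) behind a timer
/// that started when connecting began, assuming each connect takes
/// `per_connect` and connects run one after another.
fn deadline_lag(connect_order: u32, per_connect: Duration) -> Duration {
    per_connect * connect_order
}

/// Number of streams whose deadline lags by more than `grace`.
fn streams_past_grace(streams: u32, per_connect: Duration, grace: Duration) -> u32 {
    (1..=streams)
        .filter(|&k| deadline_lag(k, per_connect) > grace)
        .count() as u32
}
```

Under the assumed 3 ms per connect, the 128th stream lags by 384 ms, and 45 of the 128 streams fall outside a 250 ms grace window.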
Join timeout: The 2s hardcoded timeout for join_all on stream handles isn't enough for 128 tasks.
Problem 2: Mid-test broken pipe
In one trace, stream 16 gets Broken pipe at ~7s into a 10s test while the test continues running (intervals 8 and 9 still arrive). This is well before either side's deadline. The cause is unclear; one possibility is kernel-level resource pressure with 128 TCP connections competing for ~19 Mbps (~150 kbps per stream).
Planned fixes
- Scale the client join timeout with stream count: max(2s, streams * 50ms) (v0.9.1)
- Client stops local data streams at local duration expiry, which narrows the server/client teardown race (v0.9.1)
- Receive-side cancel drain to reduce RST-on-close bursts (v0.9.1)
- Rejected: server gracefully shuts down data sockets (FIN instead of RST). RST is correct for timed tests; FIN would let bufferbloated send buffers drain past the requested duration
- Investigate mid-test stream failures under high contention
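The join timeout formula from the first item, max(2s, streams * 50ms), can be written as a small helper. This is a sketch of the stated formula only; the function and constant names are hypothetical and how the value is wired into the join_all call is not shown.

```rust
use std::time::Duration;

/// Floor for the join timeout (the previous hardcoded value).
const MIN_JOIN_TIMEOUT: Duration = Duration::from_secs(2);
/// Extra budget granted per stream.
const PER_STREAM_JOIN_BUDGET: Duration = Duration::from_millis(50);

/// Computes max(2s, streams * 50ms), saturating instead of overflowing.
fn join_timeout(streams: u32) -> Duration {
    MIN_JOIN_TIMEOUT.max(PER_STREAM_JOIN_BUDGET.saturating_mul(streams))
}
```

Up to 40 streams the result stays at the 2 s floor; with -P 128 it grows to 6.4 s.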
Related
- TCP: Broken pipe / Connection reset at teardown #25 — original teardown fix (works for normal stream counts)
- Add MPTCP support #24 — JoinHandle panic fix