When "idle" isn't idle: how a Linux kernel optimization became a QUIC bug
Key Points
- cwnd stuck at minimum after early loss
- send-time idle-adjustment advanced epoch into the future
- fix: don't advance epoch/recovery into the future
Summary
A port of a Linux CUBIC idle-period optimization into Cloudflare's quiche QUIC implementation caused CUBIC's congestion window (cwnd) to become permanently pinned at its minimum after an early heavy-loss phase. The bug reproduces deterministically in an early-loss then idle scenario: every ACK drove bytes_in_flight to zero, a send-time adjustment advanced the CUBIC epoch/recovery boundary into the future, and the algorithm oscillated between recovery and congestion-avoidance once per RTT instead of growing cwnd. The kernel later avoided this by not setting epoch_start into the future; quiche needed the same guard.
Reproduction (practical):
- quiche HTTP/3 client/server on localhost, RTT=10ms
- 10 MB download, CUBIC congestion control
- 30% random loss during first 2s, then no loss; 10s test timeout
- Observed ~60% failures (download times out)
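The loss schedule above can be sketched as a small drop decision that a test harness might apply per packet. This is only an illustration: `should_drop`, `XorShift64`, and the constants are hypothetical names, not part of quiche, and the seeded generator is a stand-in for whatever randomness source the real harness uses.

```rust
use std::time::Duration;

/// Hypothetical harness constants taken from the scenario above.
const LOSS_PHASE: Duration = Duration::from_secs(2);
const LOSS_RATE: f64 = 0.30;

/// Small seeded xorshift64 generator so runs are repeatable.
/// Illustrative stand-in only; the seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    /// Returns a pseudo-random value in [0, 1).
    fn next_f64(&mut self) -> f64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        // Keep the top 53 bits so the result fits an f64 mantissa exactly.
        (x >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Decide whether to drop a packet seen `elapsed` after the test started.
/// `roll` is a value in [0, 1) supplied by the caller's random source.
fn should_drop(elapsed: Duration, roll: f64) -> bool {
    elapsed < LOSS_PHASE && roll < LOSS_RATE
}
```

A harness would call `should_drop(start.elapsed(), rng.next_f64())` for each packet; once two seconds have passed the function always returns `false`, which gives the loss-free tail of the scenario.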
Observable symptoms:
- cwnd pinned at minimum (≈2700 bytes, two packets) after loss
- ~999 recovery/avoidance transitions in ~6.7s (~1 per RTT)
- bytes_in_flight drops to 0 each ACK and sender emits a two-packet burst
- Reno unaffected (100% pass), confirming CUBIC-specific behavior
Root cause (concise):
- Ported idle-time adjustment ran at packet-send time (user-space) and advanced the CUBIC epoch / congestion_recovery_start_time into the future when bytes_in_flight == 0.
- With the epoch in the future, the CUBIC window update (bictcp_update() in kernel terms) computed an inflated target, and the sender flipped between recovery and congestion avoidance once per RTT instead of growing cwnd.
- Kernel follow-up fix: do not set epoch_start in the future; quiche needed the same protection.
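To make the timestamp drift concrete, here is a deliberately simplified model of an unguarded send-time adjustment. It is a sketch under stated assumptions, not quiche's actual code: `SenderModel`, its fields, and `boundary_lead_after` are hypothetical, and the model assumes the idle gap is measured from the previous send and that every ACK drains bytes_in_flight to zero.

```rust
use std::time::{Duration, Instant};

/// Hypothetical, simplified sender state; the names do not mirror quiche.
struct SenderModel {
    bytes_in_flight: usize,
    last_sent_time: Instant,
    recovery_start: Instant,
}

impl SenderModel {
    /// Unguarded adjustment: when nothing is in flight, shift the recovery
    /// boundary forward by the time elapsed since the previous send.
    fn on_packet_sent_unguarded(&mut self, now: Instant, size: usize) {
        if self.bytes_in_flight == 0 {
            let idle = now.saturating_duration_since(self.last_sent_time);
            self.recovery_start += idle;
        }
        self.bytes_in_flight += size;
        self.last_sent_time = now;
    }

    /// Models an ACK that covers everything outstanding.
    fn on_all_acked(&mut self) {
        self.bytes_in_flight = 0;
    }
}

/// Runs `rounds` one-RTT send/ACK rounds and returns how far the recovery
/// boundary sits ahead of the most recent send time.
fn boundary_lead_after(rounds: u32, rtt: Duration) -> Duration {
    let t0 = Instant::now();
    // Assumption: the last send was at t0 and loss detected one RTT later
    // set the recovery boundary.
    let mut s = SenderModel {
        bytes_in_flight: 0,
        last_sent_time: t0,
        recovery_start: t0 + rtt,
    };
    let mut now = t0 + rtt;
    for _ in 0..rounds {
        s.on_packet_sent_unguarded(now, 2700);
        s.on_all_acked();
        now += rtt;
    }
    s.recovery_start.saturating_duration_since(s.last_sent_time)
}
```

In this model the boundary stays a full RTT ahead of every send, because each round counts the previous RTT as "idle" time and adds it again. The sender was never actually idle; it was only waiting for ACKs with a two-packet window.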
Fix and mitigation (practical):
- Apply the kernel-style guard: do not advance epoch/recovery-start into the future (clamp the adjustment or skip it when it would move the timestamp ahead).
- Alternatively, perform the idle-duration shift on ACK processing rather than at send time.
- Add a regression test: early heavy loss for first N seconds followed by no loss; assert cwnd recovers and download completes within timeout.
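The clamp and skip variants of the guard can be sketched as two small helpers. Both function names are hypothetical and the code illustrates only the timestamp arithmetic, not the actual quiche or kernel patch.

```rust
use std::time::{Duration, Instant};

/// Clamp variant: shift the boundary by `idle`, but never past `now`.
/// If `start` is already later than `now`, it is left where it is rather
/// than moved backwards.
fn clamped_shift(start: Instant, idle: Duration, now: Instant) -> Instant {
    let ceiling = start.max(now);
    (start + idle).min(ceiling)
}

/// Skip variant: apply the shift only when the result is not in the future.
fn skip_if_future(start: Instant, idle: Duration, now: Instant) -> Instant {
    let shifted = start + idle;
    if shifted > now {
        start
    } else {
        shifted
    }
}
```

Both helpers leave a genuine idle period untouched (for example, a boundary set at t0, 5 s of idle, and a send at t0 + 8 s yields t0 + 5 s) and differ only in what they do when the shift would overshoot: the clamp pins the boundary to `now`, while the skip keeps the old value.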
Engineering notes:
- The bug requires three simultaneous conditions: a real loss event (recovery boundary set), being in congestion-avoidance, and cwnd collapsed to the two-packet floor.
- Tests should exercise the "idle-after-loss" corner-case not covered by steady-state or slow-start-only suites.
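A regression test may want to confirm that it actually reached the vulnerable state before asserting recovery. The predicate below is a minimal sketch of the three conditions; `CcSnapshot` and its fields are hypothetical, congestion avoidance is approximated as cwnd >= ssthresh, and the 1350-byte datagram size used in the usage note is an assumption inferred from the ≈2700-byte, two-packet floor.

```rust
/// Hypothetical snapshot of congestion-controller state for test assertions.
struct CcSnapshot {
    recovery_start_set: bool,
    cwnd: usize,
    ssthresh: usize,
    max_datagram_size: usize,
}

/// Minimum window in packets, per the two-packet floor described above.
const MIN_WINDOW_PACKETS: usize = 2;

/// True when all three preconditions hold at once: a recovery boundary
/// exists, the sender is in congestion avoidance, and cwnd is at the floor.
fn in_vulnerable_state(s: &CcSnapshot) -> bool {
    let min_cwnd = MIN_WINDOW_PACKETS * s.max_datagram_size;
    let in_congestion_avoidance = s.cwnd >= s.ssthresh;
    s.recovery_start_set && in_congestion_avoidance && s.cwnd <= min_cwnd
}
```

With a 1350-byte datagram size, a snapshot with a recovery boundary set and `cwnd == ssthresh == 2700` satisfies the predicate, while the same snapshot in slow start (`ssthresh == 5400`) or with a larger window does not.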
Recommended action items
- Patch quiche to clamp/skip send-time epoch advancement (one-line guard), or move adjustment to ACK handling.
- Add an automated regression test that reproduces the exact scenario described.
- Audit other user-space CUBIC ports for the same pattern and backport the guard where it is missing.