Fair-Aurora: Comparing Fairness Strategies for Reinforcement Learning-Based Congestion Control in Multi-Flow Environments
Pith reviewed 2026-05-20 04:37 UTC · model grok-4.3
The pith
Modest reward shaping achieves the best fairness for Aurora's RL congestion controller while preserving total throughput.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using a custom shared-bottleneck simulator and Jain's fairness index, the evaluation finds that Strategy A (modest reward shaping) delivers the highest fairness scores while keeping aggregate throughput intact. All three strategies preserve the total bandwidth budget: fairness comes from redistributing capacity among flows, not from reducing it. In the extended scenarios, mixed Aurora-CUBIC competition and dynamic flow entry and exit, Strategy C (loss-sensitivity tuning) proves most compatible with traditional TCP, and Strategy B (observation augmentation) is the most stable as the set of flows changes.
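To make the "redistribution, not reduction" point concrete, here is a minimal sketch of Jain's fairness index, J = (sum x_i)^2 / (n * sum x_i^2). The per-flow rates and the 100 Mbps bottleneck are hypothetical illustrations, not values from the paper.

```python
def jain_index(rates):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), ranging from 1/n to 1."""
    n = len(rates)
    sum_sq = sum(r * r for r in rates)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one non-zero rate")
    total = sum(rates)
    return total * total / (n * sum_sq)


# Hypothetical per-flow throughputs (Mbps) on an assumed 100 Mbps bottleneck.
unfair = [80.0, 20.0]
fair = [50.0, 50.0]

# Both allocations use the same total capacity; only the split differs.
print(sum(unfair), round(jain_index(unfair), 4))
print(sum(fair), round(jain_index(fair), 4))
```

The two allocations have the same aggregate throughput, so any difference in the index reflects only how capacity is divided among flows.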
What carries the argument
Three post-hoc fairness strategies applied to the Aurora RL congestion controller: reward shaping modifies the reward signal to encourage equitable rates, observation augmentation adds fairness-related state inputs, and loss-sensitivity tuning alters the model's response to packet losses.
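The summary does not give the exact form of any of the three strategies, so the following is only a sketch of what each one could look like. Every function name, weight, and the `fair_share` input is an assumption introduced for illustration. In particular, an end host usually does not know its fair share, and how the paper estimates it (if it does) is not stated here.

```python
def base_reward(throughput, latency, loss, w_tput=1.0, w_lat=1.0, w_loss=1.0):
    """Linear throughput/latency/loss reward; the weights are placeholders."""
    return w_tput * throughput - w_lat * latency - w_loss * loss


def shaped_reward(throughput, latency, loss, fair_share, fairness_weight=0.1):
    """Strategy A (assumed form): penalize relative deviation from the fair share."""
    deviation = abs(throughput - fair_share) / fair_share
    return base_reward(throughput, latency, loss) - fairness_weight * deviation


def augmented_observation(observation, throughput, fair_share):
    """Strategy B (assumed form): append a fairness-related feature to the state."""
    return list(observation) + [throughput / fair_share]


def loss_sensitive_reward(throughput, latency, loss, loss_scale=2.0):
    """Strategy C (assumed form): amplify the loss penalty by loss_scale."""
    return base_reward(throughput, latency, loss, w_loss=loss_scale)


# A flow taking 0.8 of a link whose fair share is 0.5 (normalized units).
print(shaped_reward(0.8, 0.0, 0.0, fair_share=0.5))
print(augmented_observation([0.1, 0.2], throughput=0.8, fair_share=0.5))
print(loss_sensitive_reward(0.8, 0.0, 0.05))
```

A small `fairness_weight` corresponds to the "modest" shaping the review describes: the penalty nudges the policy toward equal shares without dominating the throughput term.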
Load-bearing premise
The custom shared-bottleneck simulator and Jain's fairness index accurately reflect real multi-flow network behavior, and the post-hoc strategies leave Aurora's core RL architecture stable.
What would settle it
Deploying the three strategies on physical routers with live internet traffic and observing either much lower fairness scores or a drop in total throughput would falsify the central claim.
Original abstract
Reinforcement learning (RL) has emerged as a promising paradigm for Internet congestion control, achieving higher link utilization than classical heuristics. However, RL-based controllers trained in single-flow environments are not guaranteed to share bandwidth equitably when deployed in multi-flow networks. This paper investigates the fairness properties of Aurora [1], a state-of-the-art deep RL congestion controller, and evaluates three post-hoc fairness strategies that preserve Aurora's RL architecture: reward shaping (Strategy A), observation augmentation (Strategy B), and loss-sensitivity tuning (Strategy C). Using a custom shared-bottleneck simulator and Jain's fairness index as the primary metric, we find that modest reward shaping achieves the best fairness while preserving aggregate throughput. All strategies maintain the total bandwidth budget with fairness being achieved through redistribution, not reduction. Beyond the 2-flow homogeneous setting, an extended evaluation across mixed Aurora-CUBIC competition and dynamic flow entry/exit scenarios shows that Strategy C's loss-sensitivity emerges as the most TCP-friendly mechanism, while Strategy B is the most stable through dynamic flow-set changes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper evaluates three post-hoc strategies—reward shaping (Strategy A), observation augmentation (Strategy B), and loss-sensitivity tuning (Strategy C)—to improve fairness of the Aurora RL congestion controller in multi-flow shared-bottleneck settings. Using a custom simulator and Jain's fairness index, it claims that modest reward shaping yields the highest fairness while preserving aggregate throughput (fairness via redistribution, not reduction). Extended experiments with Aurora-CUBIC competition and dynamic flow arrivals/departures indicate Strategy C is most TCP-friendly and Strategy B most stable.
Significance. If the simulation results prove robust, the work offers a practical route to fairness improvements for RL-based congestion control without modifying the core learned policy or architecture. The finding that throughput is maintained is a notable strength, as it avoids the usual fairness-throughput tradeoff seen in many CC mechanisms. Reproducible simulation code or parameter details would further strengthen the contribution.
major comments (2)
- [Evaluation Methodology] Evaluation Methodology / Simulator Description: The central claims rest on results from a custom shared-bottleneck simulator whose fidelity is not validated against real networks, ns-3, or any standard benchmark. The headline result (Strategy A best for Jain fairness with no throughput loss) and the extended rankings (C most TCP-friendly, B most stable) could be artifacts of idealized queuing, instantaneous sharing, or omitted RTT variation and cross-traffic; without validation or sensitivity analysis this is load-bearing for all empirical conclusions.
- [Abstract and Results] Abstract and Results sections: No details are provided on number of simulation runs, statistical significance tests, error bars, confidence intervals, or exact simulator parameters (bottleneck capacity, buffer sizes, loss rates, flow scheduling). This absence makes it impossible to assess whether reported differences between strategies are reliable or reproducible.
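The first major comment's worry about "idealized queuing" and "instantaneous sharing" can be illustrated with a deliberately simplified fluid model. This is not the paper's simulator, whose internals are not described here. The function name and the proportional-scaling rule are assumptions chosen to show what such a simplification omits.

```python
def share_bottleneck(send_rates, capacity):
    """Idealized instantaneous sharing with no queue and no RTT.

    If the offered load exceeds capacity, every flow is scaled down by the
    same factor and the excess is dropped. Returns (delivered_rates, loss_rate).
    """
    offered = sum(send_rates)
    if offered <= capacity:
        return list(send_rates), 0.0
    scale = capacity / offered
    delivered = [rate * scale for rate in send_rates]
    return delivered, 1.0 - scale


# Two hypothetical flows offering 120 Mbps in total to a 100 Mbps link.
delivered, loss_rate = share_bottleneck([80.0, 40.0], capacity=100.0)
print(delivered, loss_rate)
```

In a model like this every flow sees the same loss rate at the same instant, so there is no buffer occupancy, RTT heterogeneity, or burstiness to separate flows. Rankings obtained under such assumptions may shift once those effects are modeled, which is why the comment asks for validation or a sensitivity analysis.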
minor comments (2)
- [Strategy Definitions] Clarify exactly how each post-hoc strategy is implemented (e.g., the precise reward modification in Strategy A and the observation vector change in Strategy B) so readers can reproduce the modifications without ambiguity.
- [Results] The paper should include a brief comparison table of all three strategies across all evaluated scenarios (2-flow homogeneous, mixed Aurora-CUBIC, dynamic flows) with both fairness and throughput metrics side-by-side.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our evaluation methodology and results presentation. We address each major comment below and describe the revisions planned for the next version of the manuscript.
- Referee: [Evaluation Methodology] Evaluation Methodology / Simulator Description: The central claims rest on results from a custom shared-bottleneck simulator whose fidelity is not validated against real networks, ns-3, or any standard benchmark. The headline result (Strategy A best for Jain fairness with no throughput loss) and the extended rankings (C most TCP-friendly, B most stable) could be artifacts of idealized queuing, instantaneous sharing, or omitted RTT variation and cross-traffic; without validation or sensitivity analysis this is load-bearing for all empirical conclusions.
  Authors: We acknowledge that the manuscript does not include explicit validation of the custom simulator against real networks or ns-3. The simulator was intentionally designed as a controlled environment to isolate fairness effects in multi-flow shared-bottleneck scenarios, following common practice in congestion control studies for precise parameter control. We agree that sensitivity analysis would strengthen the work. In revision we will add a new subsection performing sensitivity analysis on buffer sizes, RTT distributions, and cross-traffic levels, plus explicit discussion of simulator assumptions and limitations. Full hardware or ns-3 validation remains outside the current scope but will be noted as future work.
  Revision: partial
- Referee: [Abstract and Results] Abstract and Results sections: No details are provided on number of simulation runs, statistical significance tests, error bars, confidence intervals, or exact simulator parameters (bottleneck capacity, buffer sizes, loss rates, flow scheduling). This absence makes it impossible to assess whether reported differences between strategies are reliable or reproducible.
  Authors: We agree these details are essential for reproducibility and were omitted in error. The revised manuscript will include a dedicated 'Simulation Setup' subsection specifying: 50 independent runs per scenario with different random seeds, paired t-tests for statistical significance between strategies, error bars showing standard deviation and 95% confidence intervals on all figures, and a parameter table listing bottleneck capacities (50-200 Mbps), buffer sizes (50-200 packets), loss rates (0-1%), and flow scheduling (Poisson arrivals with mean inter-arrival 5 s). These additions will directly address reproducibility concerns.
  Revision: yes
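The statistical protocol promised in the second response (50 seeded runs, paired t-tests, 95% confidence intervals) could be sketched as follows. The scores are synthetic placeholders generated from a seeded random number generator, not results from the paper. The names `strategy_x` and `strategy_y` are hypothetical, and the interval uses a normal approximation instead of the exact t quantile.

```python
import math
import random
import statistics


def paired_t_statistic(a, b):
    """Paired t statistic and degrees of freedom for per-seed score pairs."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_err, n - 1


def mean_ci95(samples):
    """Mean with an approximate 95% interval (normal quantile, about 1.96)."""
    z = statistics.NormalDist().inv_cdf(0.975)
    mean = statistics.mean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, mean - half_width, mean + half_width


# Synthetic placeholder fairness scores, one pair per random seed (50 runs).
rng = random.Random(0)
strategy_x = [min(1.0, rng.gauss(0.95, 0.02)) for _ in range(50)]
strategy_y = [x - abs(rng.gauss(0.01, 0.01)) for x in strategy_x]

t_stat, dof = paired_t_statistic(strategy_x, strategy_y)
print("t =", t_stat, "dof =", dof)
print("mean and 95% CI:", mean_ci95(strategy_x))
```

Pairing by seed matters here: both strategies face the same random arrivals in a given run, so the test compares per-run differences instead of two independent samples.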
Circularity Check
No significant circularity; empirical results are independent of inputs
The paper reports an empirical comparison of three post-hoc fairness strategies applied to Aurora in a custom shared-bottleneck simulator, with outcomes measured directly via Jain's fairness index and aggregate throughput. The headline finding—that modest reward shaping yields the best fairness while preserving total bandwidth through redistribution—is a direct experimental observation rather than any reduction of results to fitted parameters, self-definitions, or self-citation chains. No equations or derivations are presented that could collapse by construction; the evaluation relies on external metrics and is self-contained against the reported simulation benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Jain's fairness index is the appropriate primary metric for evaluating bandwidth-sharing equity
- domain assumption: the custom shared-bottleneck simulator faithfully models real multi-flow interactions
Reference graph
Works this paper leans on
- [1] N. Jay, N. Rotman, B. Godfrey, M. Schapira, and A. Tamar. A deep reinforcement learning perspective on Internet congestion control. In Proc. 36th ICML, 2019.
- [2] X. Liao et al. Towards fair and efficient learning-based congestion control. In Proc. EuroSys 2024, 2024.
- [3] L. S. Brakmo and L. L. Peterson. TCP Vegas: End to end congestion avoidance on a global Internet. IEEE J. Sel. Areas Commun., 13(8):1465–1480, 1995.
- [4] H.-C. Yen et al. On the fairness of Internet congestion control over WiFi with deep reinforcement learning. arXiv preprint, 2024.
- [5] N. Cardwell, Y. Cheng, C. S. Gunn, S. H. Yeganeh, and V. Jacobson. BBR: Congestion-based congestion control. ACM Queue, 14(5), 2016.
- [6] R. Jain, D.-M. Chiu, and W. Hawe. A quantitative measure of fairness and discrimination for resource allocation in shared computer systems. DEC Technical Report TR-301, 1984.
- [7] H. Mao, R. Netravali, and M. Alizadeh. Neural adaptive video streaming with Pensieve. In Proc. ACM SIGCOMM 2017, 2017.
- [8] N. Jay et al. Internet congestion control via deep reinforcement learning (Custard). arXiv:1810.03259, 2018.
- [9] S. Ha, I. Rhee, and L. Xu. CUBIC: A new TCP-friendly high-speed TCP variant. ACM SIGOPS Oper. Syst. Rev., 42(5):64–74, 2008.
- [10] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv:1707.06347, 2017.