Fair-Aurora: Comparing Fairness Strategies for Reinforcement Learning-Based Congestion Control in Multi-Flow Environments

Thomas Mbrice; Yuyu Liu

arxiv: 2605.19909 · v1 · pith:YLOZMCVOnew · submitted 2026-05-19 · 💻 cs.NI

Fair-Aurora: Comparing Fairness Strategies for Reinforcement Learning-Based Congestion Control in Multi-Flow Environments

Thomas Mbrice , Yuyu Liu This is my paper

Pith reviewed 2026-05-20 04:37 UTC · model grok-4.3

classification 💻 cs.NI

keywords reinforcement learningcongestion controlfairnessAuroramulti-flow networksreward shapingJain's fairness indexpost-hoc strategies

0 comments

The pith

Modest reward shaping achieves the best fairness for Aurora's RL congestion controller while preserving total throughput.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests ways to add fairness to Aurora, a reinforcement learning congestion controller originally trained on single flows, so that it shares bandwidth equitably when multiple flows compete. Three post-training adjustments are compared in a shared-bottleneck simulator using Jain's fairness index: reward shaping, observation augmentation, and loss-sensitivity tuning. Modest reward shaping produces the strongest fairness gains without lowering the combined data rate across all flows. Fairness results from redistributing the available bandwidth rather than shrinking the overall capacity used. Extended tests with mixed Aurora and CUBIC flows plus dynamic flow arrivals show that loss-sensitivity tuning competes most evenly with traditional TCP while observation augmentation stays most stable when the set of flows changes.

Core claim

Using a custom shared-bottleneck simulator and Jain's fairness index, the evaluation finds that Strategy A, modest reward shaping, delivers the highest fairness scores while keeping aggregate throughput intact. All three strategies preserve the total bandwidth budget, achieving fairness through redistribution of capacity among flows rather than reduction. In mixed competition with CUBIC and dynamic flow scenarios, loss-sensitivity tuning proves most compatible with traditional TCP, and observation augmentation offers the greatest stability during flow changes.

What carries the argument

Three post-hoc fairness strategies applied to the Aurora RL congestion controller: reward shaping modifies the reward signal to encourage equitable rates, observation augmentation adds fairness-related state inputs, and loss-sensitivity tuning alters the model's response to packet losses.

Load-bearing premise

The custom shared-bottleneck simulator and Jain's fairness index accurately reflect real multi-flow network behavior, and the post-hoc strategies leave Aurora's core RL architecture stable.

What would settle it

Deploying the three strategies on physical routers with live internet traffic and observing either much lower fairness scores or a drop in total throughput would falsify the central claim.

read the original abstract

Reinforcement learning (RL) has emerged as a promising paradigm for Internet congestion control, achieving higher link utilization than classical heuristics. However, RL-based controllers trained in single-flow environments are not guaranteed to share bandwidth equitably when deployed in multi-flow networks. This paper investigates the fairness properties of Aurora~\cite{jay2019aurora}, a state-of-the-art deep RL congestion controller, and evaluates three post-hoc fairness strategies that preserve Aurora's RL architecture: \emph{reward shaping} (Strategy~A), \emph{observation augmentation} (Strategy~B), and \emph{loss-sensitivity tuning} (Strategy~C). Using a custom shared-bottleneck simulator and Jain's fairness index as the primary metric, we find that modest reward shaping achieves the best fairness while preserving aggregate throughput. All strategies maintain the total bandwidth budget with fairness being achieved through redistribution, not reduction. Beyond the 2-flow homogeneous setting, an extended evaluation across mixed Aurora--CUBIC competition and dynamic flow entry/exit scenarios shows that Strategy~C's loss-sensitivity emerges as the most TCP-friendly mechanism, while Strategy~B is the most stable through dynamic flow-set changes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper evaluates three post-hoc strategies—reward shaping (Strategy A), observation augmentation (Strategy B), and loss-sensitivity tuning (Strategy C)—to improve fairness of the Aurora RL congestion controller in multi-flow shared-bottleneck settings. Using a custom simulator and Jain's fairness index, it claims that modest reward shaping yields the highest fairness while preserving aggregate throughput (fairness via redistribution, not reduction). Extended experiments with Aurora-CUBIC competition and dynamic flow arrivals/departures indicate Strategy C is most TCP-friendly and Strategy B most stable.

Significance. If the simulation results prove robust, the work offers a practical route to fairness improvements for RL-based congestion control without modifying the core learned policy or architecture. The finding that throughput is maintained is a notable strength, as it avoids the usual fairness-throughput tradeoff seen in many CC mechanisms. Reproducible simulation code or parameter details would further strengthen the contribution.

major comments (2)

[Evaluation Methodology] Evaluation Methodology / Simulator Description: The central claims rest on results from a custom shared-bottleneck simulator whose fidelity is not validated against real networks, ns-3, or any standard benchmark. The headline result (Strategy A best for Jain fairness with no throughput loss) and the extended rankings (C most TCP-friendly, B most stable) could be artifacts of idealized queuing, instantaneous sharing, or omitted RTT variation and cross-traffic; without validation or sensitivity analysis this is load-bearing for all empirical conclusions.
[Abstract and Results] Abstract and Results sections: No details are provided on number of simulation runs, statistical significance tests, error bars, confidence intervals, or exact simulator parameters (bottleneck capacity, buffer sizes, loss rates, flow scheduling). This absence makes it impossible to assess whether reported differences between strategies are reliable or reproducible.

minor comments (2)

[Strategy Definitions] Clarify exactly how each post-hoc strategy is implemented (e.g., the precise reward modification in Strategy A and the observation vector change in Strategy B) so readers can reproduce the modifications without ambiguity.
[Results] The paper should include a brief comparison table of all three strategies across all evaluated scenarios (2-flow homogeneous, mixed Aurora-CUBIC, dynamic flows) with both fairness and throughput metrics side-by-side.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our evaluation methodology and results presentation. We address each major comment below and describe the revisions planned for the next version of the manuscript.

read point-by-point responses

Referee: [Evaluation Methodology] Evaluation Methodology / Simulator Description: The central claims rest on results from a custom shared-bottleneck simulator whose fidelity is not validated against real networks, ns-3, or any standard benchmark. The headline result (Strategy A best for Jain fairness with no throughput loss) and the extended rankings (C most TCP-friendly, B most stable) could be artifacts of idealized queuing, instantaneous sharing, or omitted RTT variation and cross-traffic; without validation or sensitivity analysis this is load-bearing for all empirical conclusions.

Authors: We acknowledge that the manuscript does not include explicit validation of the custom simulator against real networks or ns-3. The simulator was intentionally designed as a controlled environment to isolate fairness effects in multi-flow shared-bottleneck scenarios, following common practice in congestion control studies for precise parameter control. We agree that sensitivity analysis would strengthen the work. In revision we will add a new subsection performing sensitivity analysis on buffer sizes, RTT distributions, and cross-traffic levels, plus explicit discussion of simulator assumptions and limitations. Full hardware or ns-3 validation remains outside the current scope but will be noted as future work. revision: partial
Referee: [Abstract and Results] Abstract and Results sections: No details are provided on number of simulation runs, statistical significance tests, error bars, confidence intervals, or exact simulator parameters (bottleneck capacity, buffer sizes, loss rates, flow scheduling). This absence makes it impossible to assess whether reported differences between strategies are reliable or reproducible.

Authors: We agree these details are essential for reproducibility and were omitted in error. The revised manuscript will include a dedicated 'Simulation Setup' subsection specifying: 50 independent runs per scenario with different random seeds, paired t-tests for statistical significance between strategies, error bars showing standard deviation and 95% confidence intervals on all figures, and a parameter table listing bottleneck capacities (50-200 Mbps), buffer sizes (50-200 packets), loss rates (0-1%), and flow scheduling (Poisson arrivals with mean inter-arrival 5 s). These additions will directly address reproducibility concerns. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical results are independent of inputs

full rationale

The paper reports an empirical comparison of three post-hoc fairness strategies applied to Aurora in a custom shared-bottleneck simulator, with outcomes measured directly via Jain's fairness index and aggregate throughput. The headline finding—that modest reward shaping yields the best fairness while preserving total bandwidth through redistribution—is a direct experimental observation rather than any reduction of results to fitted parameters, self-definitions, or self-citation chains. No equations or derivations are presented that could collapse by construction; the evaluation relies on external metrics and is self-contained against the reported simulation benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Empirical comparison study with no new mathematical derivations or postulated entities. Relies on standard networking assumptions and simulator fidelity.

axioms (2)

domain assumption Jain's fairness index is the appropriate primary metric for evaluating bandwidth sharing equity
Used as primary metric throughout evaluations
domain assumption Custom shared-bottleneck simulator faithfully models real multi-flow interactions
Basis for all reported results

pith-pipeline@v0.9.0 · 5743 in / 1244 out tokens · 29705 ms · 2026-05-20T04:37:38.274712+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

10 extracted references · 10 canonical work pages · 2 internal anchors

[1]

N. Jay, N. Rotman, B. Godfrey, M. Schapira, and A. Tamar. A deep reinforcement learning perspective on Internet congestion control. In Proc. 36th ICML, 2019

work page 2019
[2]

Liaoet al.Towards fair and efficient learning- based congestion control

X. Liaoet al.Towards fair and efficient learning- based congestion control. InProc. EuroSys 2024, 2024

work page 2024
[3]

L. S. Brakmo and L. L. Peterson. TCP Ve- gas: End to end congestion avoidance on a global Internet.IEEE J. Sel. Areas Commun., 13(8):1465–1480, 1995

work page 1995
[4]

Yenet al.On the fairness of Internet congestion control over WiFi with deep rein- forcement learning.arXiv preprint, 2024

H.-C. Yenet al.On the fairness of Internet congestion control over WiFi with deep rein- forcement learning.arXiv preprint, 2024

work page 2024
[5]

Cardwell, Y

N. Cardwell, Y. Cheng, C. S. Gunn, S. H. Yeganeh, and V. Jacobson. BBR: Congestion- based congestion control.ACM Queue, 14(5), 2016

work page 2016
[6]

Jain, D.-M

R. Jain, D.-M. Chiu, and W. Hawe. A quantita- tive measure of fairness and discrimination for resource allocation in shared computer systems. DEC Technical Report TR-301, 1984

work page 1984
[7]

H. Mao, R. Netravali, and M. Alizadeh. Neural adaptive video streaming with Pensieve. InProc. ACM SIGCOMM 2017, 2017

work page 2017
[8]

Internet Congestion Control via Deep Reinforcement Learning

N. Jayet al.Internet congestion control via deep reinforcement learning (Custard). arXiv:1810.03259, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[9]

S. Ha, I. Rhee, and L. Xu. CUBIC: A new TCP- friendly high-speed TCP variant.ACM SIGOPS Oper. Syst. Rev., 42(5):64–74, 2008

work page 2008
[10]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms.arXiv:1707.06347, 2017. 7

work page internal anchor Pith review Pith/arXiv arXiv 2017

[1] [1]

N. Jay, N. Rotman, B. Godfrey, M. Schapira, and A. Tamar. A deep reinforcement learning perspective on Internet congestion control. In Proc. 36th ICML, 2019

work page 2019

[2] [2]

Liaoet al.Towards fair and efficient learning- based congestion control

X. Liaoet al.Towards fair and efficient learning- based congestion control. InProc. EuroSys 2024, 2024

work page 2024

[3] [3]

L. S. Brakmo and L. L. Peterson. TCP Ve- gas: End to end congestion avoidance on a global Internet.IEEE J. Sel. Areas Commun., 13(8):1465–1480, 1995

work page 1995

[4] [4]

Yenet al.On the fairness of Internet congestion control over WiFi with deep rein- forcement learning.arXiv preprint, 2024

H.-C. Yenet al.On the fairness of Internet congestion control over WiFi with deep rein- forcement learning.arXiv preprint, 2024

work page 2024

[5] [5]

Cardwell, Y

N. Cardwell, Y. Cheng, C. S. Gunn, S. H. Yeganeh, and V. Jacobson. BBR: Congestion- based congestion control.ACM Queue, 14(5), 2016

work page 2016

[6] [6]

Jain, D.-M

R. Jain, D.-M. Chiu, and W. Hawe. A quantita- tive measure of fairness and discrimination for resource allocation in shared computer systems. DEC Technical Report TR-301, 1984

work page 1984

[7] [7]

H. Mao, R. Netravali, and M. Alizadeh. Neural adaptive video streaming with Pensieve. InProc. ACM SIGCOMM 2017, 2017

work page 2017

[8] [8]

Internet Congestion Control via Deep Reinforcement Learning

N. Jayet al.Internet congestion control via deep reinforcement learning (Custard). arXiv:1810.03259, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[9] [9]

S. Ha, I. Rhee, and L. Xu. CUBIC: A new TCP- friendly high-speed TCP variant.ACM SIGOPS Oper. Syst. Rev., 42(5):64–74, 2008

work page 2008

[10] [10]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms.arXiv:1707.06347, 2017. 7

work page internal anchor Pith review Pith/arXiv arXiv 2017