Near-optimal Online Traffic Engineering

Arvin Ghavidel; Nikolai Matni; Pooria Namyar; Ramesh Govindan; Walter Willinger

arxiv: 2605.16187 · v1 · pith:TIGZDVKQnew · submitted 2026-05-15 · 💻 cs.NI

Near-optimal Online Traffic Engineering

Arvin Ghavidel , Pooria Namyar , Nikolai Matni , Walter Willinger , Ramesh Govindan This is my paper

Pith reviewed 2026-05-19 18:16 UTC · model grok-4.3

classification 💻 cs.NI

keywords traffic engineeringWANdistributed optimizationonline algorithmsMLUmax-flownetwork management

0 comments

The pith

OnlineTE uses optimization decomposition for distributed solvers that deliver near-optimal WAN traffic engineering within seconds of changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OnlineTE, a traffic engineering system for wide area networks that reacts immediately when traffic demands shift or failures occur. It breaks the optimization into local subproblems that switches solve themselves, with a coordinator keeping the overall solution on track. This design supports quick re-optimization triggered locally at any switch. A sympathetic reader would care because existing centralized controllers often take minutes to respond, leaving networks inefficient during dynamic periods.

Core claim

OnlineTE builds distributed TE solvers for path-based MLU and max-flow problems through optimization decomposition. Each switch handles part of the computation locally while a central coordinator orchestrates progress, so a switch can start re-optimization as soon as it detects a demand change or failure. The result is near-optimal solutions reached within seconds even on emulated 750-node WAN topologies.

What carries the argument

Optimization decomposition into local subproblems solved at switches with central coordination for path-based traffic engineering.

If this is right

Reacts to demand changes and failures in seconds rather than minutes.
Scales to large 750-node WAN emulations using only modest compute on switches.
Outperforms current centralized methods by up to an order of magnitude in testbed runs.
Enables edge-based TE that can use network resources more efficiently than path-based methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same decomposition idea might extend to other real-time network control tasks beyond traffic engineering.
Widespread use could reduce the hardware demands placed on central controllers in future networks.
Production deployments would show how often measurement noise forces extra coordination rounds.

Load-bearing premise

That local subproblems from the decomposition, when coordinated, remain near-optimal despite real WAN dynamics, failures, and measurement noise.

What would settle it

Running the system on a live WAN during repeated demand changes and failures, then checking if achieved MLU stays within a small gap of the offline optimal.

Figures

Figures reproduced from arXiv: 2605.16187 by Arvin Ghavidel, Nikolai Matni, Pooria Namyar, Ramesh Govindan, Walter Willinger.

**Figure 1.** Figure 1: The classical TE loop (left) runs periodically on a controller. In OnlineTE’s distributed TE approach (right), switches notice demand shifts and failures immediately, and iteratively run a decomposed optimization together with a coordinator. minutes [21, 28, 35]) (a) collects demands from switches and predicts future demand, (b) runs the TE algorithm on predicted demand, and (c) installs forwarding rules… view at source ↗

**Figure 2.** Figure 2: Illustrating optimization decomposition. Demand 1 is routed along paths 1 − 2 − 4, 1 − 4 and 1 − 3 − 4, while Demand 2 is routed along paths 2 − 4, 2 − 1 − 4, and 2 − 1 − 3 − 4. The splits in the above table are for paths in this order. Prices correspond to links or path segments shared between the demands, respectively 2 − 4, 1 − 4, and 1 − 3 − 4. Initial Update Round 1 Round 2 ... Round N Demand 1 (0.25,… view at source ↗

**Figure 3.** Figure 3: When Demand 1 changes to 3.5 and Demand 2 to 2.5, the algorithm can warm-start from the previous solution. by distributing the computation across switches ( [PITH_FULL_IMAGE:figures/full_fig_p002_3.png] view at source ↗

**Figure 4.** Figure 4: ADMM decomposition for a simple Consensus problem. and its decomposition, it may be possible to solve some of the sub-problems in a distributed manner [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: OnlineTE’s nested ADMM converges faster than a single ADMM loop. z-update dual-update zinner-update dualinner-update Coordinator xinner-update Switch 1 ... xinner-update Switch N xinner xinner zinner, dualinner zinner, dualinner [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 8.** Figure 8: Distribution of final MLU objective regret with different network load and demand distribution. height of the bar) and notches representing the median, the 25th percentile and the 75th percentile. At all loads, OnlineTE has non-zero objective regret, for two reasons. First, when a demand change occurs, it incurs regret until it converges, which takes a few seconds. Second, we run ADMM until it reaches 1% o… view at source ↗

**Figure 10.** Figure 10: Distribution of final max-Flow objective regret. medium traffic loads, OnlineTE and DeDe have comparable regrets. For both, we compute solutions within 1% of the optimal, but in theory, DeDe’s objective regret should have been higher since it does not react immediately (as is the case for MLU at light and medium loads, [PITH_FULL_IMAGE:figures/full_fig_p010_10.png] view at source ↗

**Figure 9.** Figure 9: Trace of MLU objective for one of our against DeDe. NCFlow Pop16 DeDe OnlineTE 0 20 40 60 80 Regret Better (Max-Flow) Light Load Model Uniform Gravity Bimodal NCFlow Pop16 DeDe OnlineTE 0 20 40 60 80 (Max-Flow) Medium Load NCFlow (Obj.) Pop16 (Obj.) DeDe (Obj.) OnlineTE (Obj.) NCFlow (Cap.) Pop16 (Cap.) DeDe (Cap.) OnlineTE (Cap.) (Max-Flow) High Load 0 20 40 60 80 Regret [PITH_FULL_IMAGE:figures/full_fig… view at source ↗

**Figure 11.** Figure 11: (lower) shows the behavior of the two approaches during one of our experiments in which we induced 3 failures (at 60s, 180s and 200s). In these experiments, we do not introduce any demand changes, since it would be difficult to dis-entangle OnlineTE’s behavior under failure from its response to demand changes. These experiments use medium load and a uniform traffic matrix [PITH_FULL_IMAGE:figures/full_… view at source ↗

**Figure 14.** Figure 14: shows the execution time of a single invocation of 𝑥𝑖𝑛𝑛𝑒𝑟-update on our VMs as a function of topology size, both for edge-based and path-based approaches. With a topology with 𝑚 nodes and 𝑛 edges, the 𝑥𝑖𝑛𝑛𝑒𝑟-update for edge-based TE handles matrices of size 𝑂(𝑚𝑛). A path-based solver can use a more efficient sparse-matrix representation, since the paths are pre-defined and the allocations can be sparse. A… view at source ↗

**Figure 13.** Figure 13: shows the convergence time of the three approaches, both for cold-start and warm-start. For both, synchronous ADMM has very high latency, almost 40 s for cold-start, since the coordinator needs to wait to hear from all switches. Asynchronous ADMM converges faster, but can incur long convergence due to a large update from a far away switch. As a result, it exhibits high variability (over 10s warm-start at… view at source ↗

read the original abstract

Most deployed WAN Traffic Engineering (TE) systems use a logically centralized controller that periodically gathers traffic demands, runs a TE optimization or heuristic, and then programs the network. At scale, these solutions can be sub-optimal, and can take minutes to react to demand changes or failures. In this paper, we introduce OnlineTE, a system that reacts immediately to demand changes and failures, and delivers near-optimal solutions within seconds of a change. OnlineTE builds on the theory of optimization decomposition to devise scalable, near-optimal, distributed TE solvers for path-based MLU and Max-flow problems. In OnlineTE, each switch solves part of the optimization, and a central coordinator orchestrates the progress of the switches. As such, a switch can trigger a re-optimization as soon as it notices a demand change or failure, enabling high reactivity. OnlineTE scales to large WANs, and its compute requirements are well below the capabilities of modern WAN switches. It also enables a new opportunity, edge-based TE, which can utilize resources more efficiently than today's path-based approaches. On a testbed emulation of a 750-node WAN topology, OnlineTE can outperform the state-of-the-art by up to an order of magnitude.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

OnlineTE applies optimization decomposition to let switches trigger reactive TE solves, with promising large-scale emulation results but open questions on robustness to real WAN asynchrony.

read the letter

OnlineTE shows how to use optimization decomposition for a distributed, reactive traffic engineering system that reacts in seconds rather than minutes. The new element is the design where switches can initiate re-optimization locally when they detect changes, with a coordinator keeping the distributed solves on track for path-based MLU and max-flow problems. This seems absent from earlier work on periodic TE. The paper does well by demonstrating scalability on a large 750-node topology emulation and claiming order-of-magnitude improvements, which suggests the approach has practical potential. The discussion of edge-based TE is also a nice addition. The soft spots are around the assumption that the decomposition stays near-optimal in asynchronous settings with failures and noisy measurements. As the stress-test notes, typical decomposition needs coordination that might not be reliable in WANs, and the abstract lacks details on iteration counts or robustness. If the full paper has experiments addressing this, it would be stronger; otherwise, it's a gap. This paper is for networking systems people interested in improving WAN performance through better TE. Readers who work on distributed optimization or network control would get the most out of it. It deserves a serious referee to dig into the implementation and verify the performance claims under varied conditions. I recommend sending it for peer review.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces OnlineTE, a distributed system for online traffic engineering in WANs that applies optimization decomposition to create scalable solvers for path-based minimum link utilization (MLU) and max-flow problems. Switches detect demand changes or failures and trigger local subproblem solves, coordinated by a central entity, enabling reaction within seconds rather than minutes. The work claims near-optimal solutions, scaling to 750-node topologies in emulation, compute loads suitable for modern switches, and up to an order-of-magnitude improvement over state-of-the-art centralized approaches, while also enabling edge-based TE.

Significance. If the empirical and robustness claims hold, the result would be significant for WAN management by addressing the reactivity gap in centralized TE systems while retaining near-optimality through established decomposition methods. The large-scale 750-node emulation and practical focus on switch compute limits are strengths; the suggestion of edge-based TE opens a new direction for resource efficiency. These elements position the work as a practical bridge between theory and deployable online control.

major comments (3)

[Abstract / Evaluation] Abstract and Evaluation section: the claim of outperforming the state-of-the-art by up to an order of magnitude on a 750-node WAN topology emulation provides no error bars, no explicit quantification of the optimality gap (e.g., relative to centralized optimum), and no discussion of post-hoc tuning or statistical validation. This directly affects assessment of the central performance result.
[Decomposition approach] Decomposition approach (theoretical and algorithmic sections): the distributed solvers for path-based MLU and max-flow are asserted to remain near-optimal when re-triggered by demand changes or failures, yet no convergence bounds, iteration counts under asynchrony, or analysis of partial views after failures and measurement noise are supplied. This is load-bearing for the online reactivity claim, as decomposition typically assumes reliable synchronous coordination.
[Scaling and Implementation] Scaling and reactivity claims: while compute requirements are stated to be well below modern WAN switch capabilities, the manuscript lacks direct timing comparisons to centralized solvers or measurements of end-to-end reaction latency (including coordination overhead) across the 750-node emulation scenarios.

minor comments (2)

[Abstract] The abstract could more explicitly define 'near-optimal' at first mention to set reader expectations before the emulation results.
[System Design] Notation for the central coordinator's orchestration of local subproblems would benefit from a concise pseudocode or diagram to clarify the distributed protocol.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, proposing revisions to improve empirical rigor and clarity while preserving the core contributions of the work.

read point-by-point responses

Referee: [Abstract / Evaluation] Abstract and Evaluation section: the claim of outperforming the state-of-the-art by up to an order of magnitude on a 750-node WAN topology emulation provides no error bars, no explicit quantification of the optimality gap (e.g., relative to centralized optimum), and no discussion of post-hoc tuning or statistical validation. This directly affects assessment of the central performance result.

Authors: We agree that these details strengthen the evaluation. In the revised manuscript we will add error bars derived from multiple independent emulation runs, explicitly quantify the optimality gap by comparing OnlineTE solutions against the centralized optimum for the same demands, and clarify that no post-hoc parameter tuning was applied. The reported improvements reflect the default solver configuration across the tested scenarios. revision: yes
Referee: [Decomposition approach] Decomposition approach (theoretical and algorithmic sections): the distributed solvers for path-based MLU and max-flow are asserted to remain near-optimal when re-triggered by demand changes or failures, yet no convergence bounds, iteration counts under asynchrony, or analysis of partial views after failures and measurement noise are supplied. This is load-bearing for the online reactivity claim, as decomposition typically assumes reliable synchronous coordination.

Authors: The near-optimality follows from the underlying decomposition theory for the synchronous case; re-triggering upon detected changes re-initializes the process from the prior solution. We will add observed iteration counts from the 750-node emulations and a brief discussion of practical convergence behavior. However, we do not provide new theoretical convergence bounds under asynchrony or measurement noise, as developing such bounds would require substantial additional analysis beyond the scope of this systems paper. revision: partial
Referee: [Scaling and Implementation] Scaling and reactivity claims: while compute requirements are stated to be well below modern WAN switch capabilities, the manuscript lacks direct timing comparisons to centralized solvers or measurements of end-to-end reaction latency (including coordination overhead) across the 750-node emulation scenarios.

Authors: We will augment the evaluation section with direct wall-clock timing comparisons between the distributed OnlineTE solves and equivalent centralized solvers on the same hardware. We will also report measured end-to-end reaction latencies from the 750-node emulations, explicitly including the coordination overhead with the central entity. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation applies established decomposition theory to online distributed setting

full rationale

The paper states that OnlineTE builds on the theory of optimization decomposition to create distributed solvers for path-based MLU and Max-flow. This is an application of prior independent theory to a new online, reactive context with switches and a coordinator. No equations, fitted parameters, or self-citations in the abstract reduce the claimed near-optimality or reactivity to a definition or input by construction. The performance claims rest on the decomposition framework's properties under the described coordination, which are externally verifiable and not tautological to the paper's results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the practical applicability of optimization decomposition to TE problems and on the assumption that a central coordinator can maintain consistency without becoming a bottleneck. No free parameters or invented physical entities are introduced in the abstract.

axioms (1)

domain assumption Optimization decomposition yields scalable, near-optimal distributed solvers for the stated TE problems
Invoked to justify the distributed architecture and near-optimality guarantee.

pith-pipeline@v0.9.0 · 5749 in / 1195 out tokens · 53489 ms · 2026-05-19T18:16:02.609354+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

OnlineTE builds on the theory of optimization decomposition to devise scalable, near-optimal, distributed TE solvers for path-based MLU and Max-flow problems... each switch solves part of the optimization, and a central coordinator orchestrates the progress of the switches.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

ADMM provably converges quickly to a near-optimal solution... nested ADMM loop... hierarchical ADMM loops

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 1 internal anchor

[1]

[n. d.]. Arista 7800 Series. https://www.arista.com/assets/data/pdf/ Datasheets/7800R4-Series-AI-Spine-Datasheet.pdf. ([n. d.])

work page
[2]

[n. d.]. Internet Topology Zoo. http://www.topology-zoo.org/. ([n. d.])

work page
[3]

Firas Abuzaid, Srikanth Kandula, Behnaz Arzani, Ishai Menache, Matei Zaharia, and Peter Bailis. 2021. Contracting Wide-area Network Topologies to Solve Flow Problems Quickly. InNSDI

work page 2021
[5]

Rao, Bruno Ribeiro, and Mohit Tawar- malani

Abd AlRhman AlQiam, Yuanjun Yao, Zhaodong Wang, Satyajeet Singh Ahuja, Ying Zhang, Sanjay G. Rao, Bruno Ribeiro, and Mohit Tawar- malani. 2024. Transferable Neural WAN TE for Changing Topolo- gies. InProceedings of the ACM SIGCOMM 2024 Conference (ACM SIG- COMM ’24). Association for Computing Machinery, New York, NY, USA, 86–102. https://doi.org/10.1145/3...

work page doi:10.1145/3651890.3672237 2024
[6]

David Applegate, Mateo Díaz, Oliver Hinder, Haihao Lu, Miles Lubin, Brendan O’Donoghue, and Warren Schudy. 2022. Practical Large- Scale Linear Programming using Primal-Dual Hybrid Gradient. (2022). arXiv:math.OC/2106.04756 https://arxiv.org/abs/2106.04756

work page arXiv 2022
[7]

2015.Parallel and distributed computation: numerical methods

Dimitri Bertsekas and John Tsitsiklis. 2015.Parallel and distributed computation: numerical methods. Athena Scientific

work page 2015
[8]

Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers.Foundations and Trends® in Machine Learning3, 1 (2011), 1–122

work page 2011
[9]

2004.Convex Optimization

Stephen Boyd and Lieven Vandenberghe. 2004.Convex Optimization. Cambridge University Press

work page 2004
[10]

Mung Chiang, Steven H. Low, A. Robert Calderbank, and John C. Doyle. 2007. Layering as Optimization Decomposition: A Mathematical Theory of Network Architectures.Proc. IEEE95, 1 (2007), 255–312. https://doi.org/10.1109/JPROC.2006.887322

work page doi:10.1109/jproc.2006.887322 2007
[11]

Emilie Danna, Avinatan Hassidim, Haim Kaplan, Alok Kumar, Yishay Mansour, Danny Raz, and Michal Segalov. 2012. Upward Max Min Fairness. InINFOCOM

work page 2012
[12]

Emilie Danna, Subhasree Mandal, and Arjun Singh. 2012. A prac- tical algorithm for balancing the max-min fairness and throughput objectives in traffic engineering. InINFOCOM

work page 2012
[13]

Marek Denis, Yuanjun Yao, Ashley Hatch, Qin Zhang, Chiunlin Lim, Shuqiang Zhang, Kyle Sugrue, Henry Kwok, Mikel Jimenez Fernandez, Petr Lapukhov, Sandeep Hebbani, Gaya Nagarajan, Omar Baldonado, Lixin Gao, and Ying Zhang. 2023. EBB: Reliable and Evolvable Express Backbone Network in Meta. InProceedings of the ACM SIGCOMM 2023 Conference, ACM SIGCOMM 2023,...

work page 2023
[14]

Janus: A unified distributed training framework for sparse mixture-of-experts models,

346–359. https://doi.org/10.1145/3603269.3604860

work page doi:10.1145/3603269.3604860
[15]

Low, and Indra Widjaja

Anwar Elwalid, Cheng Jin, Steven H. Low, and Indra Widjaja. 2001. MATE: MPLS Adaptive Traffic Engineering. InINFOCOM

work page 2001
[16]

Jacek Gondzio and Andreas Grothey. 2008. A New Unblocking Technique to Warmstart Interior Point Methods Based on Sensitiv- ity Analysis.Siam journal on optimization19, 3 (2008), 1184–1210. https://doi.org/10.1137/060678129 This paper discusses the theory and implementation of a new warmstarting technique for interior point methods

work page doi:10.1137/060678129 2008
[17]

Zonghao Gu, Edward Rothberg, and Robert Bixby. 2012. Gurobi opti- mizer reference manual, version 5.0.Gurobi Optimization Inc., Houston, USA(2012)

work page 2012
[18]

Fei Gui, Songtao Wang, Dan Li, Li Chen, Kaihui Gao, Congcong Min, and Yi Wang. 2024. RedTE: Mitigating Subsecond Traffic Bursts with Real-Time and Distributed Traffic Engineering. InProceedings of the ACM SIGCOMM 2024 Conference, ACM SIGCOMM 2024, Sydney, NSW, Australia, August 4-8, 2024. ACM, 71–85. https://doi.org/10.1145/ 3651890.3672231

work page arXiv 2024
[19]

Hengquan Guo, Xin Liu, Honghao Wei, and Lei Ying. 2022. Online convex optimization with hard constraints: Towards the best of two worlds and beyond.Advances in Neural Information Processing Systems 35 (2022), 36426–36439

work page 2022
[20]

Jeffrey Haas. 2023. A BGP Cease NOTIFICATION Subcode for Bidi- rectional Forwarding Detection (BFD). RFC 9384. (March 2023). https://doi.org/10.17487/RFC9384

work page doi:10.17487/rfc9384 2023
[21]

Jiayue He, Ma’ayan Bresler, Mung Chiang, and Jennifer Rexford. 2007. Towards Robust Multi-Layer Traffic Engineering: Optimization of Congestion Control and Routing.IEEE Journal on Selected Areas in Communications25, 5 (2007), 868–880. https://doi.org/10.1109/JSAC. 2007.070602

work page doi:10.1109/jsac 2007
[22]

Chi-Yao Hong, Srikanth Kandula, Ratul Mahajan, Ming Zhang, Vijay Gill, Mohan Nanduri, and Roger Wattenhofer. 2013. Achieving high utilization with software-driven WAN. InSIGCOMM

work page 2013
[23]

Chi-Yao Hong, Subhasree Mandal, Mohammad Al-Fares, Min Zhu, Richard Alimi, Chandan Bhagat, Sourabh Jain, Jay Kaimal, Shiyu Liang, Kirill Mendelev, et al. 2018. B4 and after: managing hierarchy, parti- tioning, and asymmetry for availability and scale in Google’s software- defined WAN. InSIGCOMM

work page 2018
[24]

Sushant Jain, Alok Kumar, Subhasree Mandal, Joon Ong, Leon Poutievski, Arjun Singh, Subbaiah Venkata, Jim Wanderer, Junlan Zhou, and Min Zhu. 2013. B4: Experience with a globally-deployed software defined WAN. InSIGCOMM

work page 2013
[25]

Srikanth Kandula, Dina Katabi, Bruce Davie, and Anna Charny. 2005. Walking the tightrope: Responsive yet stable traffic engineering. In SIGCOMM

work page 2005
[26]

Srikanth Kandula, Dina Katabi, Bruce Davie, and Anna Charny. 2005. Walking the Tightrope: Responsive Yet Stable Traffic Engineering. In SIGCOMM

work page 2005
[27]

Alexander Krentsel, Nitika Saran, Bikash Koley, Subhasree Mandal, Ashok Narayanan, Sylvia Ratnasamy, Ali Al-Shabibi, Anees Shaikh, Rob Shakir, Ankit Singla, and Hakim Weatherspoon. 2024. A Decen- tralized SDN Architecture for the WAN. InProceedings of the ACM SIGCOMM 2024 Conference, ACM SIGCOMM 2024, Sydney, NSW, Aus- tralia, August 4-8, 2024. ACM, 938–9...

work page doi:10.1145/3651890 2024
[28]

Umesh Krishnaswamy, Rachee Singh, Nikolaj Bjørner, and Himanshu Raj. 2022. Decentralized cloud wide-area network traffic engineering with BLASTSHIELD. InNSDI

work page 2022
[29]

Bis- sonnette, Nikolaj S

Umesh Krishnaswamy, Rachee Singh, Paul Mattes, Paul-Andre C. Bis- sonnette, Nikolaj S. Bjørner, Zahira Nasrin, Sonal Kothari, Prabhakar Reddy, John Abeln, Srikanth Kandula, Himanshu Raj, Luis Irún-Briz, Jamie Gaudette, and Erica Lan. 2023. OneWAN is better than two: Unifying a split WAN architecture. In20th USENIX Symposium on Networked Systems Design and...

work page 2023
[30]

Ximeng Liu, Shizhen Zhao, Yong Cui, and Xinbing Wang. 2024. FI- GRET: Fine-Grained Robustness-Enhanced Traffic Engineering. In Proceedings of the ACM SIGCOMM 2024 Conference, ACM SIGCOMM 2024, Sydney, NSW, Australia, August 4-8, 2024. ACM, 117–135. https: //doi.org/10.1145/3651890.3672258

work page doi:10.1145/3651890.3672258 2024
[31]

Mehrdad Mahdavi, Rong Jin, and Tianbao Yang. 2012. Trading regret for efficiency: online convex optimization with long term constraints. The Journal of Machine Learning Research13, 1 (2012), 2503–2528. 13

work page 2012
[32]

Congcong Miao, Zhizhen Zhong, Yunming Xiao, Feng Yang, Senkuo Zhang, Yinan Jiang, Zizhuo Bai, Chaodong Lu, Jingyi Geng, Zekun He, Yachen Wang, Xianneng Zou, and Chuanchuan Yang. 2024. MegaTE: Extending WAN Traffic Engineering to Millions of Endpoints in Vir- tualized Cloud. InProceedings of the ACM SIGCOMM 2024 Conference (ACM SIGCOMM ’24). Association fo...

work page doi:10.1145/3651890.3672242 2024
[33]

J Mirkovic, B Kocoloski, and D Balenson. 2024. Enabling reproducibility through the SPHERE research infrastructure. InUsenix ;login Maga- zine

work page 2024
[34]

Pooria Namyar, Behnaz Arzani, Ryan Beckett, Santiago Segarra, Hi- manshu Raj, and Srikanth Kandula. 2022. Minding the gap between fast heuristics and their optimal counterparts. InProceedings of the 21st ACM Workshop on Hot Topics in Networks (HotNets ’22). Asso- ciation for Computing Machinery, New York, NY, USA, 138–144. https://doi.org/10.1145/3563766.3564102

work page doi:10.1145/3563766.3564102 2022
[35]

Pooria Namyar, Behnaz Arzani, Ryan Beckett, Santiago Segarra, Hi- manshu Raj, Umesh Krishnaswamy, Ramesh Govindan, and Srikanth Kandula. 2024. Finding Adversarial Inputs for Heuristics using Multi- level Optimization. In21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). https://arxiv.org/abs/2311.12779

work page arXiv 2024
[36]

Pooria Namyar, Behnaz Arzani, Srikanth Kandula, Santiago Segarra, Daniel Crankshaw, Umesh Krishnaswamy, Ramesh Govindan, and Himanshu Raj. 2024. Solving Max-Min Fair Resource Allocations Quickly on Large Graphs. In21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). https://arxiv.org/abs/ 2310.09699

work page arXiv 2024
[37]

Pooria Namyar, Michael Schapira, Ramesh Govindan, Santiago Segarra, Ryan Beckett, Siva Kesava Reddy Kakarla, and Behnaz Arzani. 2024. End-to-End Performance Analysis of Learning-enabled Systems. In Proceedings of the 23rd ACM Workshop on Hot Topics in Networks (HOT- NETS ’24). Association for Computing Machinery, New York, NY, USA, 86–94. https://doi.org/...

work page doi:10.1145/3696348.3696875 2024
[38]

Deepak Narayanan, Fiodar Kazhamiaka, Firas Abuzaid, Peter Kraft, Akshay Agrawal, Srikanth Kandula, Stephen Boyd, and Matei Zaharia

work page
[39]

Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP. InSOSP

work page
[40]

Neal Parikh and Stephen Boyd. 2014. Proximal Algorithms.Found. Trends Optim.1, 3 (Jan. 2014), 127–239. https://doi.org/10.1561/ 2400000003

work page 2014
[41]

Yarin Perry, Felipe Vieira Frujeri, Chaim Hoch, Srikanth Kandula, Ishai Menache, Michael Schapira, and Aviv Tamar. 2023. DOTE: Rethinking (Predictive) WAN Traffic Engineering. In20th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2023, Boston, MA, April 17-19, 2023. 1557–1581. https://www.usenix.org/conference/ nsdi23/presentation/perry

work page 2023
[42]

Leon Poutievski, Omid Mashayekhi, Joon Ong, Arjun Singh, Muham- mad Mukarram Bin Tariq, Rui Wang, Jianan Zhang, Virginia Beaure- gard, Patrick Conner, Steve D. Gribble, Rishi Kapoor, Stephen Kratzer, Nanfang Li, Hong Liu, Karthik Nagaraj, Jason Ornstein, Samir Sawh- ney, Ryohei Urata, Lorenzo Vicisano, Kevin Yasumura, Shidong Zhang, Junlan Zhou, and Amin ...

work page doi:10.1145/3544216.3544265 2022
[43]

A. M. Rush and M. J. Collins. 2012. A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference in Natural Language Process- ing.Journal of Artificial Intelligence Research45 (Oct. 2012), 305–362. https://doi.org/10.1613/jair.3680

work page doi:10.1613/jair.3680 2012
[44]

Suhas Jayaram Subramanya, Don Kurian Dennis, Virginia Smith, and Gregory R. Ganger. 2025. COpter: Efficient Large-Scale Resource- Allocation via Continual Optimization. InProceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles (SOSP ’25). Association for Computing Machinery, New York, NY, USA, 322–340. https://doi.org/10.1145/3731569.3764846

work page doi:10.1145/3731569.3764846 2025
[45]

Ozdaglar

Ermin Wei and Asuman E. Ozdaglar. 2013. On the O(1=k) convergence of asynchronous distributed alternating Direction Method of Multipli- ers.2013 IEEE Global Conference on Signal and Information Processing (2013), 551–554. https://api.semanticscholar.org/CorpusID:7339114

work page 2013
[46]

Stephen J. Wright. 2015. Coordinate Descent Algorithms. (2015). arXiv:math.OC/1502.04759 https://arxiv.org/abs/1502.04759

work page internal anchor Pith review Pith/arXiv arXiv 2015
[47]

Zhiying Xu, Minlan Yu, and Francis Y. Yan. 2025. Decouple and Decompose: Scaling Resource Allocation with DeDe. (2025). arXiv:cs.DC/2412.11447 https://arxiv.org/abs/2412.11447

work page arXiv 2025
[48]

Yan, Hongzi Zhao, Yan Wang, Zhixiong Liu, Yinghao Li, and Ming Zhang

Francis Y. Yan, Hongzi Zhao, Yan Wang, Zhixiong Liu, Yinghao Li, and Ming Zhang. 2023. Teal: Learning-Accelerated Optimization of WAN Traffic Engineering.SIGCOMM(2023). https://dl.acm.org/doi/10.1145/ 3602593.3623910 Traffic Engineering

work page arXiv 2023
[49]

Ruiliang Zhang and James T Kwok. 2014. Asynchronous distributed ADMM for consensus optimization. InProceedings of the 31st Inter- national Conference on International Conference on Machine Learning (ICML), Vol. 32. 1701–1709. 14 A Formal Definitions of Regret We measure theregretof OnlineTE and the baselines in two ways: • Objective Regretis accumulated s...

work page 2014

[1] [1]

[n. d.]. Arista 7800 Series. https://www.arista.com/assets/data/pdf/ Datasheets/7800R4-Series-AI-Spine-Datasheet.pdf. ([n. d.])

work page

[2] [2]

[n. d.]. Internet Topology Zoo. http://www.topology-zoo.org/. ([n. d.])

work page

[3] [3]

Firas Abuzaid, Srikanth Kandula, Behnaz Arzani, Ishai Menache, Matei Zaharia, and Peter Bailis. 2021. Contracting Wide-area Network Topologies to Solve Flow Problems Quickly. InNSDI

work page 2021

[4] [5]

Rao, Bruno Ribeiro, and Mohit Tawar- malani

Abd AlRhman AlQiam, Yuanjun Yao, Zhaodong Wang, Satyajeet Singh Ahuja, Ying Zhang, Sanjay G. Rao, Bruno Ribeiro, and Mohit Tawar- malani. 2024. Transferable Neural WAN TE for Changing Topolo- gies. InProceedings of the ACM SIGCOMM 2024 Conference (ACM SIG- COMM ’24). Association for Computing Machinery, New York, NY, USA, 86–102. https://doi.org/10.1145/3...

work page doi:10.1145/3651890.3672237 2024

[5] [6]

David Applegate, Mateo Díaz, Oliver Hinder, Haihao Lu, Miles Lubin, Brendan O’Donoghue, and Warren Schudy. 2022. Practical Large- Scale Linear Programming using Primal-Dual Hybrid Gradient. (2022). arXiv:math.OC/2106.04756 https://arxiv.org/abs/2106.04756

work page arXiv 2022

[6] [7]

2015.Parallel and distributed computation: numerical methods

Dimitri Bertsekas and John Tsitsiklis. 2015.Parallel and distributed computation: numerical methods. Athena Scientific

work page 2015

[7] [8]

Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers.Foundations and Trends® in Machine Learning3, 1 (2011), 1–122

work page 2011

[8] [9]

2004.Convex Optimization

Stephen Boyd and Lieven Vandenberghe. 2004.Convex Optimization. Cambridge University Press

work page 2004

[9] [10]

Mung Chiang, Steven H. Low, A. Robert Calderbank, and John C. Doyle. 2007. Layering as Optimization Decomposition: A Mathematical Theory of Network Architectures.Proc. IEEE95, 1 (2007), 255–312. https://doi.org/10.1109/JPROC.2006.887322

work page doi:10.1109/jproc.2006.887322 2007

[10] [11]

Emilie Danna, Avinatan Hassidim, Haim Kaplan, Alok Kumar, Yishay Mansour, Danny Raz, and Michal Segalov. 2012. Upward Max Min Fairness. InINFOCOM

work page 2012

[11] [12]

Emilie Danna, Subhasree Mandal, and Arjun Singh. 2012. A prac- tical algorithm for balancing the max-min fairness and throughput objectives in traffic engineering. InINFOCOM

work page 2012

[12] [13]

Marek Denis, Yuanjun Yao, Ashley Hatch, Qin Zhang, Chiunlin Lim, Shuqiang Zhang, Kyle Sugrue, Henry Kwok, Mikel Jimenez Fernandez, Petr Lapukhov, Sandeep Hebbani, Gaya Nagarajan, Omar Baldonado, Lixin Gao, and Ying Zhang. 2023. EBB: Reliable and Evolvable Express Backbone Network in Meta. InProceedings of the ACM SIGCOMM 2023 Conference, ACM SIGCOMM 2023,...

work page 2023

[13] [14]

Janus: A unified distributed training framework for sparse mixture-of-experts models,

346–359. https://doi.org/10.1145/3603269.3604860

work page doi:10.1145/3603269.3604860

[14] [15]

Low, and Indra Widjaja

Anwar Elwalid, Cheng Jin, Steven H. Low, and Indra Widjaja. 2001. MATE: MPLS Adaptive Traffic Engineering. InINFOCOM

work page 2001

[15] [16]

Jacek Gondzio and Andreas Grothey. 2008. A New Unblocking Technique to Warmstart Interior Point Methods Based on Sensitiv- ity Analysis.Siam journal on optimization19, 3 (2008), 1184–1210. https://doi.org/10.1137/060678129 This paper discusses the theory and implementation of a new warmstarting technique for interior point methods

work page doi:10.1137/060678129 2008

[16] [17]

Zonghao Gu, Edward Rothberg, and Robert Bixby. 2012. Gurobi opti- mizer reference manual, version 5.0.Gurobi Optimization Inc., Houston, USA(2012)

work page 2012

[17] [18]

Fei Gui, Songtao Wang, Dan Li, Li Chen, Kaihui Gao, Congcong Min, and Yi Wang. 2024. RedTE: Mitigating Subsecond Traffic Bursts with Real-Time and Distributed Traffic Engineering. InProceedings of the ACM SIGCOMM 2024 Conference, ACM SIGCOMM 2024, Sydney, NSW, Australia, August 4-8, 2024. ACM, 71–85. https://doi.org/10.1145/ 3651890.3672231

work page arXiv 2024

[18] [19]

Hengquan Guo, Xin Liu, Honghao Wei, and Lei Ying. 2022. Online convex optimization with hard constraints: Towards the best of two worlds and beyond.Advances in Neural Information Processing Systems 35 (2022), 36426–36439

work page 2022

[19] [20]

Jeffrey Haas. 2023. A BGP Cease NOTIFICATION Subcode for Bidi- rectional Forwarding Detection (BFD). RFC 9384. (March 2023). https://doi.org/10.17487/RFC9384

work page doi:10.17487/rfc9384 2023

[20] [21]

Jiayue He, Ma’ayan Bresler, Mung Chiang, and Jennifer Rexford. 2007. Towards Robust Multi-Layer Traffic Engineering: Optimization of Congestion Control and Routing.IEEE Journal on Selected Areas in Communications25, 5 (2007), 868–880. https://doi.org/10.1109/JSAC. 2007.070602

work page doi:10.1109/jsac 2007

[21] [22]

Chi-Yao Hong, Srikanth Kandula, Ratul Mahajan, Ming Zhang, Vijay Gill, Mohan Nanduri, and Roger Wattenhofer. 2013. Achieving high utilization with software-driven WAN. InSIGCOMM

work page 2013

[22] [23]

Chi-Yao Hong, Subhasree Mandal, Mohammad Al-Fares, Min Zhu, Richard Alimi, Chandan Bhagat, Sourabh Jain, Jay Kaimal, Shiyu Liang, Kirill Mendelev, et al. 2018. B4 and after: managing hierarchy, parti- tioning, and asymmetry for availability and scale in Google’s software- defined WAN. InSIGCOMM

work page 2018

[23] [24]

Sushant Jain, Alok Kumar, Subhasree Mandal, Joon Ong, Leon Poutievski, Arjun Singh, Subbaiah Venkata, Jim Wanderer, Junlan Zhou, and Min Zhu. 2013. B4: Experience with a globally-deployed software defined WAN. InSIGCOMM

work page 2013

[24] [25]

Srikanth Kandula, Dina Katabi, Bruce Davie, and Anna Charny. 2005. Walking the tightrope: Responsive yet stable traffic engineering. In SIGCOMM

work page 2005

[25] [26]

Srikanth Kandula, Dina Katabi, Bruce Davie, and Anna Charny. 2005. Walking the Tightrope: Responsive Yet Stable Traffic Engineering. In SIGCOMM

work page 2005

[26] [27]

Alexander Krentsel, Nitika Saran, Bikash Koley, Subhasree Mandal, Ashok Narayanan, Sylvia Ratnasamy, Ali Al-Shabibi, Anees Shaikh, Rob Shakir, Ankit Singla, and Hakim Weatherspoon. 2024. A Decen- tralized SDN Architecture for the WAN. InProceedings of the ACM SIGCOMM 2024 Conference, ACM SIGCOMM 2024, Sydney, NSW, Aus- tralia, August 4-8, 2024. ACM, 938–9...

work page doi:10.1145/3651890 2024

[27] [28]

Umesh Krishnaswamy, Rachee Singh, Nikolaj Bjørner, and Himanshu Raj. 2022. Decentralized cloud wide-area network traffic engineering with BLASTSHIELD. InNSDI

work page 2022

[28] [29]

Bis- sonnette, Nikolaj S

Umesh Krishnaswamy, Rachee Singh, Paul Mattes, Paul-Andre C. Bis- sonnette, Nikolaj S. Bjørner, Zahira Nasrin, Sonal Kothari, Prabhakar Reddy, John Abeln, Srikanth Kandula, Himanshu Raj, Luis Irún-Briz, Jamie Gaudette, and Erica Lan. 2023. OneWAN is better than two: Unifying a split WAN architecture. In20th USENIX Symposium on Networked Systems Design and...

work page 2023

[29] [30]

Ximeng Liu, Shizhen Zhao, Yong Cui, and Xinbing Wang. 2024. FI- GRET: Fine-Grained Robustness-Enhanced Traffic Engineering. In Proceedings of the ACM SIGCOMM 2024 Conference, ACM SIGCOMM 2024, Sydney, NSW, Australia, August 4-8, 2024. ACM, 117–135. https: //doi.org/10.1145/3651890.3672258

work page doi:10.1145/3651890.3672258 2024

[30] [31]

Mehrdad Mahdavi, Rong Jin, and Tianbao Yang. 2012. Trading regret for efficiency: online convex optimization with long term constraints. The Journal of Machine Learning Research13, 1 (2012), 2503–2528. 13

work page 2012

[31] [32]

Congcong Miao, Zhizhen Zhong, Yunming Xiao, Feng Yang, Senkuo Zhang, Yinan Jiang, Zizhuo Bai, Chaodong Lu, Jingyi Geng, Zekun He, Yachen Wang, Xianneng Zou, and Chuanchuan Yang. 2024. MegaTE: Extending WAN Traffic Engineering to Millions of Endpoints in Vir- tualized Cloud. InProceedings of the ACM SIGCOMM 2024 Conference (ACM SIGCOMM ’24). Association fo...

work page doi:10.1145/3651890.3672242 2024

[32] [33]

J Mirkovic, B Kocoloski, and D Balenson. 2024. Enabling reproducibility through the SPHERE research infrastructure. InUsenix ;login Maga- zine

work page 2024

[33] [34]

Pooria Namyar, Behnaz Arzani, Ryan Beckett, Santiago Segarra, Hi- manshu Raj, and Srikanth Kandula. 2022. Minding the gap between fast heuristics and their optimal counterparts. InProceedings of the 21st ACM Workshop on Hot Topics in Networks (HotNets ’22). Asso- ciation for Computing Machinery, New York, NY, USA, 138–144. https://doi.org/10.1145/3563766.3564102

work page doi:10.1145/3563766.3564102 2022

[34] [35]

Pooria Namyar, Behnaz Arzani, Ryan Beckett, Santiago Segarra, Hi- manshu Raj, Umesh Krishnaswamy, Ramesh Govindan, and Srikanth Kandula. 2024. Finding Adversarial Inputs for Heuristics using Multi- level Optimization. In21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). https://arxiv.org/abs/2311.12779

work page arXiv 2024

[35] [36]

Pooria Namyar, Behnaz Arzani, Srikanth Kandula, Santiago Segarra, Daniel Crankshaw, Umesh Krishnaswamy, Ramesh Govindan, and Himanshu Raj. 2024. Solving Max-Min Fair Resource Allocations Quickly on Large Graphs. In21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). https://arxiv.org/abs/ 2310.09699

work page arXiv 2024

[36] [37]

Pooria Namyar, Michael Schapira, Ramesh Govindan, Santiago Segarra, Ryan Beckett, Siva Kesava Reddy Kakarla, and Behnaz Arzani. 2024. End-to-End Performance Analysis of Learning-enabled Systems. In Proceedings of the 23rd ACM Workshop on Hot Topics in Networks (HOT- NETS ’24). Association for Computing Machinery, New York, NY, USA, 86–94. https://doi.org/...

work page doi:10.1145/3696348.3696875 2024

[37] [38]

Deepak Narayanan, Fiodar Kazhamiaka, Firas Abuzaid, Peter Kraft, Akshay Agrawal, Srikanth Kandula, Stephen Boyd, and Matei Zaharia

work page

[38] [39]

Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP. InSOSP

work page

[39] [40]

Neal Parikh and Stephen Boyd. 2014. Proximal Algorithms.Found. Trends Optim.1, 3 (Jan. 2014), 127–239. https://doi.org/10.1561/ 2400000003

work page 2014

[40] [41]

Yarin Perry, Felipe Vieira Frujeri, Chaim Hoch, Srikanth Kandula, Ishai Menache, Michael Schapira, and Aviv Tamar. 2023. DOTE: Rethinking (Predictive) WAN Traffic Engineering. In20th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2023, Boston, MA, April 17-19, 2023. 1557–1581. https://www.usenix.org/conference/ nsdi23/presentation/perry

work page 2023

[41] [42]

Leon Poutievski, Omid Mashayekhi, Joon Ong, Arjun Singh, Muham- mad Mukarram Bin Tariq, Rui Wang, Jianan Zhang, Virginia Beaure- gard, Patrick Conner, Steve D. Gribble, Rishi Kapoor, Stephen Kratzer, Nanfang Li, Hong Liu, Karthik Nagaraj, Jason Ornstein, Samir Sawh- ney, Ryohei Urata, Lorenzo Vicisano, Kevin Yasumura, Shidong Zhang, Junlan Zhou, and Amin ...

work page doi:10.1145/3544216.3544265 2022

[42] [43]

A. M. Rush and M. J. Collins. 2012. A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference in Natural Language Process- ing.Journal of Artificial Intelligence Research45 (Oct. 2012), 305–362. https://doi.org/10.1613/jair.3680

work page doi:10.1613/jair.3680 2012

[43] [44]

Suhas Jayaram Subramanya, Don Kurian Dennis, Virginia Smith, and Gregory R. Ganger. 2025. COpter: Efficient Large-Scale Resource- Allocation via Continual Optimization. InProceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles (SOSP ’25). Association for Computing Machinery, New York, NY, USA, 322–340. https://doi.org/10.1145/3731569.3764846

work page doi:10.1145/3731569.3764846 2025

[44] [45]

Ozdaglar

Ermin Wei and Asuman E. Ozdaglar. 2013. On the O(1=k) convergence of asynchronous distributed alternating Direction Method of Multipli- ers.2013 IEEE Global Conference on Signal and Information Processing (2013), 551–554. https://api.semanticscholar.org/CorpusID:7339114

work page 2013

[45] [46]

Stephen J. Wright. 2015. Coordinate Descent Algorithms. (2015). arXiv:math.OC/1502.04759 https://arxiv.org/abs/1502.04759

work page internal anchor Pith review Pith/arXiv arXiv 2015

[46] [47]

Zhiying Xu, Minlan Yu, and Francis Y. Yan. 2025. Decouple and Decompose: Scaling Resource Allocation with DeDe. (2025). arXiv:cs.DC/2412.11447 https://arxiv.org/abs/2412.11447

work page arXiv 2025

[47] [48]

Yan, Hongzi Zhao, Yan Wang, Zhixiong Liu, Yinghao Li, and Ming Zhang

Francis Y. Yan, Hongzi Zhao, Yan Wang, Zhixiong Liu, Yinghao Li, and Ming Zhang. 2023. Teal: Learning-Accelerated Optimization of WAN Traffic Engineering.SIGCOMM(2023). https://dl.acm.org/doi/10.1145/ 3602593.3623910 Traffic Engineering

work page arXiv 2023

[48] [49]

Ruiliang Zhang and James T Kwok. 2014. Asynchronous distributed ADMM for consensus optimization. InProceedings of the 31st Inter- national Conference on International Conference on Machine Learning (ICML), Vol. 32. 1701–1709. 14 A Formal Definitions of Regret We measure theregretof OnlineTE and the baselines in two ways: • Objective Regretis accumulated s...

work page 2014