pith. machine review for the scientific record.

arxiv: 2604.22146 · v2 · submitted 2026-04-24 · 💻 cs.DC

Recognition: unknown

O(K)-Approximation Coflow Scheduling in K-Core Optical Circuit Switching Networks

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 09:59 UTC · model grok-4.3

classification 💻 cs.DC
keywords coflow scheduling · optical circuit switching · multi-core OCS · approximation algorithm · data center networks · weighted completion time · reconfiguration model

The pith

An algorithm using LP-guided ordering and cross-core allocation schedules coflows in K-core optical networks with an 8K approximation ratio (8K+1 under arbitrary release times).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a scheduling method for coflows, which represent groups of dependent flows in distributed computing jobs, in networks built from multiple independent optical circuit switch cores. The method tackles the difficulties of splitting traffic across cores and then setting up circuits inside each core without stopping everything at once. It combines linear programming to decide the global order of coflows, allocation of flows between cores, and per-core scheduling to respect port limits and switch reconfiguration delays. A sympathetic reader would care because optical switches promise high speed and low energy use in data centers, but poor scheduling wastes that potential and slows down large jobs. If the bounds hold, operators gain a reliable way to expand bandwidth by adding cores without redesigning the entire system.

Core claim

The central claim is that an efficient algorithm integrating LP-guided global coflow ordering, inter-core flow allocation, and intra-core circuit scheduling achieves approximation ratios of 8K for coflows with zero release times and 8K+1 for arbitrary release times in K-core OCS networks under the asynchronous not-all-stop reconfiguration model. The objective is the total weighted coflow completion time, subject to port-exclusivity and reconfiguration-overhead constraints. The same framework also applies to H-core electrical packet switching networks, with ratios 4H and 4H+1.
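The three-stage pipeline can be caricatured in a few lines. This is a toy stand-in, not the paper's algorithm: a weighted-shortest-demand greedy (a Smith-style rule) replaces the LP-guided ordering, each coflow's demand is split evenly across cores, and reconfiguration delays are ignored; the function name and instance format are hypothetical.

```python
def schedule_coflows(coflows, K):
    """coflows: list of (weight, total_demand) pairs; returns total weighted CCT.

    Toy sketch of the three stages: order globally, split across K cores,
    then let every core serve coflows in the same global order.
    """
    # Stage 1 (stand-in for LP-guided global ordering): sort by demand/weight,
    # so heavy-weight, small-demand coflows finish first.
    order = sorted(coflows, key=lambda c: c[1] / c[0])
    # Stage 2 (inter-core allocation): split each coflow's demand evenly, so
    # its bottleneck is the max per-core load rather than the sum.
    per_core = [d / K for _, d in order]
    # Stage 3 (intra-core scheduling): all K cores advance through the same
    # global order in lockstep; completion times accumulate sequentially.
    t, total = 0.0, 0.0
    for (w, _), load in zip(order, per_core):
        t += load          # this coflow finishes when its per-core share drains
        total += w * t     # weighted completion time contribution
    return total
```

Doubling the number of cores halves every per-core share in this toy model, which is the intuition behind expanding bandwidth by adding cores; the real algorithm must additionally pay per-core reconfiguration delays, which this sketch omits.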

What carries the argument

LP-guided global coflow ordering integrated with inter-core flow allocation and intra-core circuit scheduling

Load-bearing premise

The reconfiguration model allows asynchronous operation where not all cores stop at once, and cross-core traffic can be allocated independently while maintaining port exclusivity.

What would settle it

A set of coflow instances with known optimal completion times on which the algorithm produces a schedule whose weighted CCT exceeds 8K times the optimum (zero-release case) would refute the claimed bound; systematic stress tests that never breach it would support the analysis.
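On tiny instances this falsifier can be operationalized by brute force: compare a candidate schedule's weighted CCT against the best over all orderings and flag any instance that breaches 8K. A hedged sketch, again ignoring reconfiguration delay and assuming even splitting across cores; `weighted_cct` and `check_bound` are hypothetical helpers, not from the paper.

```python
from itertools import permutations

def weighted_cct(order, K):
    """Total weighted CCT when coflows run in `order` on K cores (even split)."""
    t, total = 0.0, 0.0
    for w, d in order:
        t += d / K          # even split: per-core bottleneck is d/K
        total += w * t
    return total

def check_bound(coflows, K, algo_order):
    """True iff the candidate order stays within the claimed 8K ratio."""
    algo = weighted_cct(algo_order, K)
    opt = min(weighted_cct(list(p), K) for p in permutations(coflows))
    return algo <= 8 * K * opt   # False on any instance would refute the bound
```

Brute force over permutations is only feasible for a handful of coflows, but that is exactly the regime where a counterexample, if one exists, is easiest to exhibit.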

Figures

Figures reproduced from arXiv: 2604.22146 by Hong Shen, Hui Tian, Xin Wang, Ye Tao.

Figure 1. Heterogeneous Multi-Core DCN Architecture
Figure 2. Circuit Reconfiguration Models in an OCS Core
Figure 3. Normalized Total Weighted CCT and Tail CCT
Figure 4. CDF of Normalized Weighted CCT for K = 3, 4, 5
Figure 6. Approximation Ratio versus Reconfiguration Delay
read the original abstract

Coflow has emerged as a fundamental application-layer abstraction in distributed systems, representing communication dependencies and enabling collaborative management of related flows to enhance job completion efficiency. To meet the increasing bandwidth demands of modern data center networks (DCNs), optical circuit switches are widely deployed due to their high capacity and energy efficiency. Simultaneously, DCN deployments are evolving towards heterogeneous parallel architectures, where multiple independent optical circuit switching (OCS) cores operate concurrently to facilitate bandwidth expansion and incremental upgrades. However, existing research on coflow scheduling in multi-core switching fabrics primarily focuses on electrical packet switching (EPS) networks, with a few known results on OCS networks without or with a poor performance guarantee. This paper studies the coflow scheduling problem in multi-core OCS networks under the not-all-stop (i.e., asynchronous) reconfiguration model, focusing on two major challenges of overcoming cross-core coupling for inter-core traffic allocation and satisfying the constraints of port exclusivity and reconfiguration overhead for intra-core circuit scheduling. To minimize total weighted coflow completion time (CCT), we propose an efficient algorithm by integrating linear programming-guided (LP-guided) global coflow ordering, inter-core flow allocation and intra-core circuit scheduling that achieves approximation ratios of 8K and 8K+1 for zero and arbitrary release times of coflows, respectively, where K is the number of OCS cores. This framework is also applicable to H-core EPS networks, providing approximation guarantees of 4H and 4H+1 for zero-time and arbitrary-time release, respectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper studies coflow scheduling to minimize total weighted completion time in K-core OCS networks under the not-all-stop (asynchronous) reconfiguration model. It proposes an algorithm that combines LP-guided global coflow ordering, inter-core flow allocation, and intra-core circuit scheduling, claiming approximation ratios of 8K (zero release times) and 8K+1 (arbitrary release times). The same framework yields 4H and 4H+1 ratios for H-core EPS networks.

Significance. If the claimed ratios are rigorously established, the result is significant: it supplies the first non-trivial approximation guarantees for coflow scheduling in multi-core OCS fabrics, directly addressing cross-core coupling and asynchronous reconfiguration overheads that prior EPS-focused work does not handle. The extension to EPS networks is a useful byproduct. The constructive LP-plus-scheduling approach could inform practical DCN designs with heterogeneous optical cores.

major comments (1)
  1. [approximation analysis / proof of Theorem for 8K ratio] The charging argument establishing the 8K (resp. 8K+1) factor must be examined in the section presenting the approximation analysis. Under the not-all-stop model, each of the K cores reconfigures independently; port-exclusivity constraints on inter-core traffic can force a coflow to wait for the sum of per-core reconfiguration intervals rather than a single bounded interval. If the analysis charges only one reconfiguration per coflow (or per core) without an explicit argument that the global ordering plus intra-core scheduler prevents accumulation, the worst-case ratio can exceed O(K).
minor comments (1)
  1. [abstract / introduction] The abstract states the ratios; the introduction or abstract should also give a brief proof sketch or high-level outline of the charging argument to improve readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful review and for identifying a point that merits explicit clarification in the approximation analysis. We address the concern regarding the charging argument and potential accumulation of reconfiguration delays under the not-all-stop model below.

read point-by-point responses
  1. Referee: [approximation analysis / proof of Theorem for 8K ratio] The charging argument establishing the 8K (resp. 8K+1) factor must be examined in the section presenting the approximation analysis. Under the not-all-stop model, each of the K cores reconfigures independently; port-exclusivity constraints on inter-core traffic can force a coflow to wait for the sum of per-core reconfiguration intervals rather than a single bounded interval. If the analysis charges only one reconfiguration per coflow (or per core) without an explicit argument that the global ordering plus intra-core scheduler prevents accumulation, the worst-case ratio can exceed O(K).

    Authors: We appreciate the referee raising this subtlety. Our LP-guided global ordering (Section 3.1) produces a total order on coflows that is respected by every core's intra-core scheduler. Because the ordering is global, when a coflow becomes eligible on a given core, all preceding coflows have already completed their transmissions on that core (including any reconfiguration). The inter-core allocation step (Section 3.2) further ensures that each coflow's traffic is partitioned across cores so that its bottleneck is determined by the maximum per-core load rather than the sum. Consequently, the reconfiguration delay experienced by any coflow is charged at most once per core in the analysis, but the global order prevents these per-core reconfigurations from accumulating additively across the K cores for the same coflow; instead, they are overlapped with the transmission of earlier coflows on other cores. The 8K factor arises from (i) an 8-approximation for the single-core OCS subproblem (inherited from prior work) multiplied by (ii) a K-factor that bounds the number of cores any single coflow can touch, with the charging scheme (detailed in the proof of Theorem 4) explicitly using the LP solution to bound the total waiting time. We agree that the current write-up would benefit from a dedicated remark or short lemma that isolates this non-accumulation property. We will add such a clarifying paragraph (and a pointer to the relevant lines in the proof) in the revised version.

    revision: partial
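The non-accumulation property the rebuttal appeals to can be illustrated with a toy calculation (this is not the paper's proof): when a coflow's shares run on all K cores in parallel and each core reconfigures once concurrently, the coflow pays its max per-core load plus one delay; only a pathological serialization across cores pays the sum plus K delays. The function names and numbers below are made up for illustration.

```python
def completion_even_split(demand, K, delta):
    """Coflow finish time when each core carries demand/K and all K cores
    reconfigure once, concurrently: pay one delta, not K of them."""
    return demand / K + delta

def completion_serialized(demand, K, delta):
    """Worst case the referee worries about: shares and reconfigurations
    accumulate additively across the K cores."""
    return demand + K * delta
```

The gap between the two is exactly what the claimed lemma must rule out; if the global order failed to prevent serialization, the per-coflow delay would scale with K·delta and the O(K) ratio analysis would not go through.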

Circularity Check

0 steps flagged

No circularity: standard LP-based approximation analysis

full rationale

The derivation presents a constructive algorithm that solves an LP relaxation for coflow ordering, then performs inter-core allocation and intra-core scheduling under the asynchronous not-all-stop model. The 8K (and 8K+1) ratios are obtained by charging the algorithm's weighted CCT against the LP optimum, which is a standard technique that does not reduce to self-definition, fitted inputs renamed as predictions, or self-citation load-bearing steps. No equations or steps in the provided abstract or description equate the claimed ratio to its own inputs by construction. The approach remains self-contained against external benchmarks for approximation algorithms.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract does not introduce new free parameters or invented entities; it relies on standard domain assumptions about OCS reconfiguration and coflow semantics.

axioms (1)
  • domain assumption Not-all-stop (asynchronous) reconfiguration model for multi-core OCS networks
    Explicitly stated as the model under which the scheduling problem is studied.

pith-pipeline@v0.9.0 · 5579 in / 1301 out tokens · 49224 ms · 2026-05-08T09:59:45.329086+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

38 extracted references · 1 canonical work page · 1 internal anchor

  1. [1]

    Mapreduce: simplified data processing on large clusters,

    J. Dean and S. Ghemawat, “Mapreduce: simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008

  2. [2]

    Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing,

    M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauly, M. J. Franklin, S. Shenker, and I. Stoica, “Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing,” in 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12), 2012, pp. 15–28

  3. [3]

    Dryad: distributed data-parallel programs from sequential building blocks,

    M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, “Dryad: distributed data-parallel programs from sequential building blocks,” in Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007, 2007, pp. 59–72

  4. [4]

    Coflow: A networking abstraction for cluster applications,

    M. Chowdhury and I. Stoica, “Coflow: A networking abstraction for cluster applications,” in Proceedings of the 11th ACM Workshop on Hot Topics in Networks, 2012, pp. 31–36

  5. [5]

    Managing data transfers in computer clusters with orchestra,

    M. Chowdhury, M. Zaharia, J. Ma, M. I. Jordan, and I. Stoica, “Managing data transfers in computer clusters with orchestra,” ACM SIGCOMM Computer Communication Review, vol. 41, no. 4, pp. 98–109, 2011

  6. [6]

    Scheduling coflows with dependency graph,

    M. Shafiee and J. Ghaderi, “Scheduling coflows with dependency graph,” IEEE/ACM Transactions on Networking, vol. 30, no. 1, pp. 450–463, 2021

  7. [7]

    Efficient coflow scheduling with varys,

    M. Chowdhury, Y. Zhong, and I. Stoica, “Efficient coflow scheduling with varys,” in Proceedings of the 2014 ACM conference on SIGCOMM, 2014, pp. 443–454

  8. [8]

    Decentralized task-aware scheduling for data center networks,

    F. R. Dogar, T. Karagiannis, H. Ballani, and A. Rowstron, “Decentralized task-aware scheduling for data center networks,” ACM SIGCOMM Computer Communication Review, vol. 44, no. 4, pp. 431–442, 2014

  9. [9]

    Minimizing average coflow completion time with decentralized scheduling,

    S. Luo, H. Yu, Y. Zhao, B. Wu, S. Wang et al., “Minimizing average coflow completion time with decentralized scheduling,” in 2015 IEEE International Conference on Communications (ICC). IEEE, 2015, pp. 307–312

  10. [10]

    Efficient coflow scheduling without prior knowledge,

    M. Chowdhury and I. Stoica, “Efficient coflow scheduling without prior knowledge,” ACM SIGCOMM Computer Communication Review, vol. 45, no. 4, pp. 393–406, 2015

  11. [11]

    Coda: Toward automatically identifying and scheduling coflows in the dark,

    H. Zhang, L. Chen, B. Yi, K. Chen, M. Chowdhury, and Y. Geng, “Coda: Toward automatically identifying and scheduling coflows in the dark,” in Proceedings of the 2016 ACM SIGCOMM Conference, 2016, pp. 160–173

  12. [12]

    Online scheduling of coflows by attention-empowered scalable deep reinforcement learning,

    X. Wang and H. Shen, “Online scheduling of coflows by attention-empowered scalable deep reinforcement learning,” Future Generation Computer Systems, vol. 146, pp. 195–206, 2023

  13. [13]

    Efficient and fair: Information-agnostic online coflow scheduling by combining limited multiplexing with drl,

    X. Wang, H. Shen, and H. Tian, “Efficient and fair: Information-agnostic online coflow scheduling by combining limited multiplexing with drl,” IEEE Transactions on Network and Service Management, vol. 20, no. 4, pp. 4572–4584, 2023

  14. [14]

    Minimizing the total weighted completion time of coflows in datacenter networks,

    Z. Qiu, C. Stein, and Y. Zhong, “Minimizing the total weighted completion time of coflows in datacenter networks,” in Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures, 2015, pp. 294–303

  15. [15]

    An improved bound for minimizing the total weighted completion time of coflows in datacenters,

    M. Shafiee and J. Ghaderi, “An improved bound for minimizing the total weighted completion time of coflows in datacenters,” IEEE/ACM Transactions on Networking, vol. 26, no. 4, pp. 1674–1687, 2018

  16. [16]

    Efficient scheduling of weighted coflows in data centers,

    Z. Wang, H. Zhang, X. Shi, X. Yin, Y. Li, H. Geng, Q. Wu, and J. Liu, “Efficient scheduling of weighted coflows in data centers,” IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 9, pp. 2003–2017, 2019

  17. [17]

    Omco: Online multiple coflow scheduling in optical circuit switch,

    C. Xu, H. Tan, J. Hou, C. Zhang, and X.-Y. Li, “Omco: Online multiple coflow scheduling in optical circuit switch,” in 2018 IEEE International Conference on Communications (ICC). IEEE, 2018, pp. 1–6

  18. [18]

    Reco: Efficient regularization-based coflow scheduling in optical circuit switches,

    C. Zhang, H. Tan, C. Xu, X.-Y. Li, S. Tang, and Y. Li, “Reco: Efficient regularization-based coflow scheduling in optical circuit switches,” in 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2019, pp. 111–121

  19. [19]

    Regularization-based coflow scheduling in optical circuit switches,

    H. Tan, C. Zhang, C. Xu, Y. Li, Z. Han, and X.-Y. Li, “Regularization-based coflow scheduling in optical circuit switches,” IEEE/ACM Transactions on Networking, vol. 29, no. 3, pp. 1280–1293, 2021

  20. [20]

    Sunflow: Efficient optical circuit scheduling for coflows,

    X. S. Huang, X. S. Sun, and T. E. Ng, “Sunflow: Efficient optical circuit scheduling for coflows,” in Proceedings of the 12th International Conference on emerging Networking EXperiments and Technologies, 2016, pp. 297–311

  21. [21]

    Minimizing coflow completion time in optical circuit switched networks,

    T. Zhang, F. Ren, J. Bao, R. Shu, and W. Cheng, “Minimizing coflow completion time in optical circuit switched networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 2, pp. 457–469, 2020

  22. [22]

    Scheduling coflows in hybrid optical-circuit and electrical-packet switches with performance guarantee,

    X. Wang, H. Shen, and H. Tian, “Scheduling coflows in hybrid optical-circuit and electrical-packet switches with performance guarantee,” IEEE/ACM Transactions on Networking, vol. 32, no. 3, pp. 2299–2314, 2024

  23. [23]

    Co-scheduler: Accelerating data-parallel jobs in datacenter networks with optical circuit switching,

    Z. Li and H. Shen, “Co-scheduler: Accelerating data-parallel jobs in datacenter networks with optical circuit switching,” in 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2019, pp. 186–195

  24. [24]

    Optimal partitioning of traffic demand for coflow scheduling in hybrid switches,

    X. Wang, H. Shen, and H. Tian, “Optimal partitioning of traffic demand for coflow scheduling in hybrid switches,” IEEE Transactions on Network and Service Management, 2025

  25. [25]

    Effective coflow scheduling in hybrid circuit and packet switching networks,

    R. Jiang, T. Zhang, and C. Yi, “Effective coflow scheduling in hybrid circuit and packet switching networks,” in 2023 IEEE Symposium on Computers and Communications (ISCC). IEEE, 2023, pp. 1156–1161

  26. [26]

    The future is 40 gigabit ethernet,

    Cisco White Paper (2016), “The future is 40 gigabit ethernet,” https://www.cisco.com/c/dam/en/us/products/collateral/switches/catalyst-6500-series-switches/white-paper-c11-737238.pdf

  27. [27]

    Cisco global cloud index: Forecast and methodology, 2015–2020,

    Cisco (2016), “Cisco global cloud index: Forecast and methodology, 2015–2020,” https://www.cisco.com/c/dam/en/us/solutions/collateral/service-provider/global-cloud-index-gci/white-paper-c11-738085.pdf

  28. [28]

    Weaver: Efficient coflow scheduling in heterogeneous parallel networks,

    X. S. Huang, Y. Xia, and T. E. Ng, “Weaver: Efficient coflow scheduling in heterogeneous parallel networks,” in 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 2020, pp. 1071–1081

  29. [29]

    Efficient approximation algorithms for scheduling coflows with total weighted completion time in identical parallel networks,

    C.-Y. Chen, “Efficient approximation algorithms for scheduling coflows with total weighted completion time in identical parallel networks,” IEEE Transactions on Cloud Computing, vol. 12, no. 1, pp. 116–129, 2023

  30. [30]

    Jupiter evolving: transforming google’s datacenter network via optical circuit switches and software-defined networking,

    L. Poutievski, O. Mashayekhi, J. Ong, A. Singh, M. Tariq, R. Wang, J. Zhang, V. Beauregard, P. Conner, S. Gribble et al., “Jupiter evolving: transforming google’s datacenter network via optical circuit switches and software-defined networking,” in Proceedings of the ACM SIGCOMM 2022 Conference, 2022, pp. 66–85

  31. [31]

    Scheduling Coflows in Multi-Core OCS Networks with Performance Guarantee

    X. Wang, H. Shen, H. Tian, and D. Wang, “Scheduling coflows in multi-core ocs networks with performance guarantee,” arXiv preprint arXiv:2604.08242, 2026. [Online]. Available: https://doi.org/10.48550/arXiv.2604.08242

  32. [32]

    Brief announcement: Improved approximation algorithms for scheduling co-flows,

    S. Khuller and M. Purohit, “Brief announcement: Improved approximation algorithms for scheduling co-flows,” in Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, 2016, pp. 239–240

  33. [33]

    Scheduling coflows for minimizing the total weighted completion time in heterogeneous parallel networks,

    C.-Y. Chen, “Scheduling coflows for minimizing the total weighted completion time in heterogeneous parallel networks,” Journal of Parallel and Distributed Computing, vol. 182, p. 104752, 2023

  34. [34]

    Fair coflow scheduling without prior knowledge,

    L. Wang and W. Wang, “Fair coflow scheduling without prior knowledge,” in 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2018, pp. 22–32

  35. [35]

    Scheduling dependent coflows to minimize the total weighted job completion time in datacenters,

    B. Tian, C. Tian, B. Wang, B. Li, Z. He, H. Dai, K. Liu, W. Dou, and G. Chen, “Scheduling dependent coflows to minimize the total weighted job completion time in datacenters,” Computer Networks, vol. 158, pp. 193–205, 2019

  36. [36]

    Tres observaciones sobre el algebra lineal,

    G. Birkhoff, “Tres observaciones sobre el algebra lineal,” Univ. Nac. Tucuman, Ser. A, vol. 5, pp. 147–154, 1946

  37. [37]

    Open shop scheduling to minimize finish time,

    T. Gonzalez and S. Sahni, “Open shop scheduling to minimize finish time,” Journal of the ACM (JACM), vol. 23, no. 4, pp. 665–679, 1976

  38. [38]

    Facebook trace,

    “Facebook trace,” https://github.com/coflow/coflow-benchmark, 2019