arxiv: 2604.11013 · v1 · submitted 2026-04-13 · 🪐 quant-ph · cs.DC· cs.ET

QuMod: Parallel Quantum Job Scheduling on Modular QPUs using Circuit Cutting

Vinooth Kulkarni , Aaron Orenstein , Xinpeng Li , Shuai Xu , Daniel Blankenberg , Vipin Chaudhary This is my paper

Pith reviewed 2026-05-10 16:18 UTC · model grok-4.3

classification 🪐 quant-ph cs.DCcs.ET

keywords quantum job schedulingmodular quantum processorscircuit cuttingdynamic circuitsparallel quantum executionquantum teleportationmulti-QPU systems

0 comments

The pith

A scheduler for modular quantum systems runs multiple user jobs in parallel by coordinating circuit cutting, qubit mapping, and inter-module teleportation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a scheduler that treats linked quantum processors as a shared resource pool rather than isolated machines. It jointly decides how to cut large circuits into subcircuits, map qubits across modules, synchronize measurements, and insert teleportation steps so that several independent jobs can execute at the same time. Classical interconnection of QPUs creates new coordination problems that single-device schedulers ignore; solving them could let cloud providers deliver higher job throughput before full error-corrected machines arrive.

Core claim

The central claim is that a single multi-programmable scheduler can simultaneously optimize qubit mapping, parallel circuit execution, measurement synchronization across subcircuits, and Bell-pair teleportation between QPUs using dynamic circuits and circuit cutting.

What carries the argument

The QuMod scheduler, which treats circuit cutting and dynamic-circuit teleportation as first-class scheduling decisions so that multiple independent quantum jobs can run concurrently across connected QPUs.

If this is right

Cloud providers could accept more concurrent quantum jobs on existing modular hardware instead of queuing them serially.
Effective qubit counts available to users increase through modular linking without waiting for monolithic chips.
Job fairness improves because the scheduler can interleave small and large circuits while managing communication costs.
Resource utilization rises as idle modules during one job can be assigned to other users via coordinated cutting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same joint-optimization idea might extend to photonic or neutral-atom interconnects once their latency and fidelity numbers are known.
If overheads prove manageable, hybrid classical-quantum workflows could treat a cluster of small QPUs as one larger accelerator.
Scheduling policies could later incorporate error-mitigation costs as an explicit objective once hardware data on teleportation errors become available.

Load-bearing premise

The added time and error from classical links, Bell-pair teleportation, and cross-subcircuit measurement synchronization stay small enough that parallel execution still produces a net gain in throughput.

What would settle it

Measure total wall-clock time to complete a batch of independent circuits on a modular testbed (two linked QPUs) with and without the joint scheduler; if the parallel version finishes faster by more than the teleportation overhead, the claim holds.

Figures

Figures reproduced from arXiv: 2604.11013 by Aaron Orenstein, Daniel Blankenberg, Shuai Xu, Vinooth Kulkarni, Vipin Chaudhary, Xinpeng Li.

**Figure 2.** Figure 2: MQT-QUEKO Benchmark circuits: Execution schedules using LO and LOCC modes on modular QPUs. Black horizontal lines separate execution on each quantum computer. The numbering and coloring of jobs is consistent across both subfigures. (a) Subcircuits that can be run independently using circuit cutting with only local operations (LO). (b) Upstream subcircuits are scheduled before downstream subcircuits on anot… view at source ↗

**Figure 3.** Figure 3: MQT + Large circuits (Mandatory cut): Execution schedules using LO and LOCC modes on modular QPUs. Black horizontal lines separate execution on each quantum computer. The numbering and coloring of jobs is consistent across both subfigures. (a) Only job 4 with 142 qubits was cut, which does not fit on a QPU. Cutting more circuits in LO mode produces more subcircuits which the scheduler avoided (b) QuMod dyn… view at source ↗

read the original abstract

The quantum computing community is increasingly positioning quantum processors as accelerators within classical HPC workflows, analogous to GPUs and TPUs. However, many real-world applications require scaling to hundreds or thousands of physical qubits to realize logical qubits via error correction. To reach these scales, hardware vendors employing diverse technologies -- such as trapped ions, photonics, neutral atoms, and superconducting circuits -- are moving beyond single, monolithic QPUs toward modular architectures connected via interconnects. For example, IonQ has proposed photonic links for scaling, while IBM has demonstrated a modular QPU architecture by classically linking two 127-qubit devices. Using dynamic circuits, Bell-pair-based teleportation, and circuit cutting, they have shown how to execute a large quantum circuit that cannot fit on a single QPU. As interest in quantum computing grows, cloud providers must ensure fair and efficient resource allocation for multiple users sharing such modular systems. Classical interconnection of QPUs introduces new scheduling challenges, particularly when multiple jobs execute in parallel. In this work, we develop a multi-programmable scheduler for modular quantum systems that jointly considers qubit mapping, parallel circuit execution, measurement synchronization across subcircuits, and teleportation operations between QPUs using dynamic circuits.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

QuMod sketches a joint scheduler for parallel multi-user jobs on modular QPUs but supplies no algorithm details, cost models, or results to show the approach actually reduces overheads.

read the letter

The paper's main contribution is a scheduler that tries to handle qubit mapping, parallel execution, cross-subcircuit measurement sync, and dynamic-circuit teleportation together for multiple jobs on linked QPUs. This matches the direction hardware is taking with modular designs from vendors like IBM and IonQ, and it correctly flags the new coordination problems that arise when classical interconnects are involved in shared environments.

Referee Report

3 major / 1 minor

Summary. The paper proposes QuMod, a scheduler for modular quantum systems that enables parallel execution of multiple quantum jobs across interconnected QPUs. It jointly optimizes qubit mapping, parallel circuit execution, cross-subcircuit measurement synchronization, and Bell-pair teleportation operations implemented via dynamic circuits and circuit cutting, motivated by the need for efficient multi-user resource allocation on scaled hardware such as classically linked IBM QPUs or IonQ photonic interconnects.

Significance. If the joint scheduler demonstrably reduces net overhead from classical interconnects, teleportation, and synchronization relative to independent per-QPU scheduling, the work could improve throughput and fairness in cloud quantum platforms. However, the manuscript supplies no algorithmic specification, cost model, simulation results, or baseline comparisons, so the practical significance cannot yet be assessed.

major comments (3)

The abstract asserts that a scheduler 'was developed' that jointly considers qubit mapping, parallel execution, measurement synchronization, and teleportation, yet the manuscript contains no description of the scheduling algorithm, objective function, optimization method, or pseudocode. Without these, the central claim cannot be evaluated.
No hardware-calibrated cost model, analytic bounds, or simulation results (wall-clock time, fidelity, or throughput) are provided to test whether the joint scheduler keeps teleportation and synchronization overheads below the threshold needed for net gains on realistic modular topologies. This directly undermines the practical-utility claim.
The manuscript offers no comparison against baselines such as independent per-QPU scheduling or existing circuit-cutting schedulers, leaving open whether the proposed joint optimization yields measurable improvements.

minor comments (1)

Notation for subcircuit synchronization and teleportation primitives should be defined explicitly in the methods section rather than left implicit.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback. We agree that the current manuscript is primarily conceptual and lacks the algorithmic details, cost model, simulation results, and baseline comparisons needed to fully evaluate the claims. We will revise the manuscript accordingly to address each point.

read point-by-point responses

Referee: The abstract asserts that a scheduler 'was developed' that jointly considers qubit mapping, parallel execution, measurement synchronization, and teleportation, yet the manuscript contains no description of the scheduling algorithm, objective function, optimization method, or pseudocode. Without these, the central claim cannot be evaluated.

Authors: We acknowledge that the manuscript currently presents QuMod at a high-level conceptual stage without specifying the scheduling algorithm, objective function, optimization technique, or pseudocode. In the revised version, we will add a complete description of the joint scheduler, including the formal objective function that balances qubit mapping, parallel execution, measurement synchronization, and teleportation costs, the chosen optimization method, and pseudocode for the core procedure. revision: yes
Referee: No hardware-calibrated cost model, analytic bounds, or simulation results (wall-clock time, fidelity, or throughput) are provided to test whether the joint scheduler keeps teleportation and synchronization overheads below the threshold needed for net gains on realistic modular topologies. This directly undermines the practical-utility claim.

Authors: We agree that the lack of a calibrated cost model and quantitative results prevents assessment of practical benefits. The revised manuscript will include a hardware-calibrated cost model incorporating interconnect latency, dynamic-circuit teleportation overhead, and synchronization costs, along with analytic bounds where possible and simulation results on realistic modular topologies (e.g., classically linked IBM-style or photonic-interconnect IonQ-style systems) to demonstrate net overhead reduction. revision: yes
Referee: The manuscript offers no comparison against baselines such as independent per-QPU scheduling or existing circuit-cutting schedulers, leaving open whether the proposed joint optimization yields measurable improvements.

Authors: We recognize the necessity of empirical comparisons. The revision will add direct comparisons against independent per-QPU scheduling and relevant existing circuit-cutting or modular scheduling approaches, reporting improvements in throughput, total wall-clock time, and resource utilization under multi-job workloads. revision: yes

Circularity Check

0 steps flagged

No circularity: scheduler design is a self-contained engineering proposal with no equations or self-referential derivations.

full rationale

The manuscript describes the development of a multi-programmable scheduler that jointly optimizes qubit mapping, parallel circuit execution, cross-subcircuit measurement synchronization, and dynamic-circuit teleportation on modular QPUs. No equations, fitted parameters, predictions derived from inputs, or self-citations appear in the abstract or provided text. The central contribution is an algorithmic framework for resource allocation, which stands as an independent design choice rather than a derivation that reduces to its own assumptions by construction. Any performance claims would require external validation through implementation or simulation, but the paper's structure contains no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies no explicit free parameters, axioms, or invented entities. The contribution is described at the level of a scheduling framework without detailing underlying assumptions beyond the background statements about modular architectures and circuit cutting.

pith-pipeline@v0.9.0 · 5534 in / 1133 out tokens · 72129 ms · 2026-05-10T16:18:20.136449+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

[1]

Quantum computing in the nisq era and beyond,

J. Preskill, “Quantum computing in the nisq era and beyond,”Quantum, vol. 2, p. 79, 2018

work page 2018
[2]

doi:10.1038/s41586-024-08404-x

D. Main, P. Drmota, D. P. Nadlinger, E. M. Ainley, A. Agrawal, B. C. Nichol, R. Srinivas, G. Araneda, and D. M. Lucas, “Distributed quantum computing across an optical network link,” Nature, vol. 638, no. 8050, p. 383–388, Feb. 2025. [Online]. Available: http://dx.doi.org/10.1038/s41586-024-08404-x

work page doi:10.1038/s41586-024-08404-x 2025
[3]

Carrera Vazquez, C

A. Carrera Vazquez, C. Tornow, D. Rist `e, S. Woerner, M. Takita, and D. J. Egger, “Combining quantum processors with real-time classical communication,”Nature, vol. 636, no. 8041, p. 75–79, Nov. 2024. [Online]. Available: http://dx.doi.org/10.1038/s41586-024-08178-2

work page doi:10.1038/s41586-024-08178-2 2024
[4]

Qiskit addon: circuit cutting,

A. M. Bra ´nczyk, A. Carrera Vazquez, D. J. Egger, B. Fuller, J. Gacon, J. R. Garrison, J. R. Glick, C. Johnson, S. Joshi, E. Pednault, C. D. Pemmaraju, P. Rivero, I. Shehzad, and S. Woerner, “Qiskit addon: circuit cutting,” https://github.com/Qiskit/qiskit-addon-cutting, 2024

work page 2024
[5]

CutQC: using small quantum computers for large quantum circuit evaluations,

W. Tang, T. Tomesh, M. Suchara, J. Larson, and M. Martonosi, “CutQC: using small quantum computers for large quantum circuit evaluations,” inProceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, apr 2021. [Online]. Available: https://doi.org/10. 1145%2F3445814.3446758

work page arXiv 2021
[6]

Piveteau and D

C. Piveteau and D. Sutter, “Circuit knitting with classical communication,”IEEE Transactions on Information Theory, vol. 70, no. 4, p. 2734–2745, Apr. 2024. [Online]. Available: http://dx.doi.org/10.1109/TIT.2023.3310797

work page doi:10.1109/tit.2023.3310797 2024
[7]

Online detection of golden circuit cutting points,

D. Chen, E. Hansen, X. Li, A. Orenstein, V . Kulkarni, V . Chaudhary, Q. Guan, J. Liu, Y . Zhang, and S. Xu, “Online detection of golden circuit cutting points,” in2023 IEEE International Conference on Quantum Computing and Engineering (QCE). IEEE, 2023

work page 2023
[8]

Quantum circuit cutting with maximum-likelihood tomography,

M. A. Perlin, Z. H. Saleem, M. Suchara, and J. C. Osborn, “Quantum circuit cutting with maximum-likelihood tomography,”npj Quantum Information, vol. 7, no. 1, p. 64, 2021. [Online]. Available: https://doi.org/10.1038/s41534-021-00390-6

work page doi:10.1038/s41534-021-00390-6 2021
[9]

Efficient circuit wire cutting based on commuting groups,

X. Li, V . Kulkarni, D. T. Chen, Q. Guan, W. Jiang, N. Xie, S. Xu, and V . Chaudhary, “Efficient circuit wire cutting based on commuting groups,”arXiv preprint arXiv:2410.20313, 2024

work page arXiv 2024
[10]

Efficient quantum circuit cutting by neglecting basis elements,

D. T. Chen, E. H. Hansen, X. Li, V . Kulkarni, V . Chaudhary, B. Ren, Q. Guan, S. Kuppannagari, J. Liu, and S. Xu, “Efficient quantum circuit cutting by neglecting basis elements,”arXiv preprint arXiv:2304.04093, 2023

work page arXiv 2023
[11]

Qucloud+: A holistic qubit mapping scheme for single/multi-programming on 2d/3d nisq quantum computers,

L. Liu and X. Dou, “Qucloud+: A holistic qubit mapping scheme for single/multi-programming on 2d/3d nisq quantum computers,”ACM Trans. Archit. Code Optim., vol. 21, no. 1, Jan. 2024. [Online]. Available: https://doi.org/10.1145/3631525

work page doi:10.1145/3631525 2024
[12]

Qgroup: Parallel quantum job schedul- ing using dynamic programming,

A. Orenstein and V . Chaudhary, “Qgroup: Parallel quantum job schedul- ing using dynamic programming,” in2024 IEEE International Confer- ence on Quantum Computing and Engineering (QCE), vol. 01, 2024, pp. 990–999

work page 2024
[13]

Quflex: Parallel quantum job scheduling using adaptive circuit cutting,

V . Kulkarni, A. Orenstein, X. Li, S. Xu, D. Blankenberg, and V . Chaud- hary, “Quflex: Parallel quantum job scheduling using adaptive circuit cutting,” inProceedings of Supercomputing India (SCI), 2025, to appear

work page 2025
[14]

Qoncord: A multi-device job scheduling framework for variational quantum algorithms,

M. Wang, P. Das, and P. J. Nair, “Qoncord: A multi-device job scheduling framework for variational quantum algorithms,” in2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2024, pp. 735–749

work page 2024
[15]

Qusplit: Achieving both high fidelity and throughput via job splitting on noisy quantum computers,

J. Li, Y . Song, Y . Liu, J. Pan, L. Yang, T. Humble, and W. Jiang, “Qusplit: Achieving both high fidelity and throughput via job splitting on noisy quantum computers,”arXiv preprint arXiv:2501.12492, 2025

work page arXiv 2025
[16]

Scalable circuit cutting and scheduling in a resource-constrained and distributed quantum system,

S. Kan, Z. Du, M. Palma, S. A. Stein, C. Liu, W. Wei, J. Chen, A. Li, and Y . Mao, “Scalable circuit cutting and scheduling in a resource-constrained and distributed quantum system,” 2024. [Online]. Available: https://arxiv.org/abs/2405.04514

work page arXiv 2024
[17]

A generalized scheme for mapping parallel algorithms,

V . Chaudhary and J. K. Aggarwal, “A generalized scheme for mapping parallel algorithms,”IEEE Trans. Parallel Distrib. Syst., vol. 4, no. 3, p. 328–346, Mar. 1993. [Online]. Available: https: //doi.org/10.1109/71.210815

work page doi:10.1109/71.210815 1993
[18]

doi: 10.7717/peerj-cs.103

T. SimPy, “Simpy: A discrete-event simulation library,”PeerJ Computer Science, vol. 2, p. e103, 2016. [Online]. Available: https://doi.org/10.7717/peerj-cs.103

work page doi:10.7717/peerj-cs.103 2016
[19]

MQT Bench: Bench- marking software and design automation tools for quantum computing,

N. Quetschlich, L. Burgholzer, and R. Wille, “MQT Bench: Bench- marking software and design automation tools for quantum computing,” Quantum, 2023, MQT Bench is available at https://www.cda.cit.tum.de/ mqtbench/

work page 2023