arxiv: 2605.02389 · v1 · submitted 2026-05-04 · 🪐 quant-ph

Rethinking How to Act: Action-Space Engineering for Reinforcement Learning-Based Circuit Routing in Distributed Quantum Systems

Joost Van Veen, Luise Prielinger, Sebastian Feld

Pith reviewed 2026-05-08 18:19 UTC · model grok-4.3

classification 🪐 quant-ph

keywords distributed quantum computing · reinforcement learning · quantum circuit compilation · action space design · circuit routing · remote entanglement

The pith

A new action-space formulation for reinforcement learning agents cuts the modeled execution time of circuits routed across distributed quantum computers by up to 35 percent relative to the prior approach.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Distributed quantum computing spreads qubits across separate modules linked by networks, so compilation must decide when and where to create remote entangled states that let operations cross modules. The paper builds on the reinforcement learning framework of Promponas et al. (2024) by redefining the agent's possible actions and adding masking to block invalid choices during training. The resulting agent shows improved training and inference performance, cutting the modeled execution time of compiled circuits by as much as 35 percent relative to that baseline under different coupling constraints. A reader would care because monolithic quantum processors are hard to scale, so practical ways to coordinate smaller modules could bring large-scale quantum computation closer.

Core claim

By introducing a novel action-space formulation together with effective action-masking strategies, the reinforcement learning agent for distributed quantum circuit routing achieves improved training and inference performance, delivering a relative reduction in modeled execution time of up to 35 percent compared with the earlier approach under varying coupling constraints.

What carries the argument

A re-engineered action space that encodes state-dependent networking decisions for generating shared remote quantum states, paired with action-masking rules that prevent invalid moves and guide policy learning.
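Action masking as described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the Q-value vector, and the validity mask are hypothetical, and the page does not say how the authors build their masks. The sketch only shows the general technique of excluding invalid actions from both greedy selection and random exploration.

```python
import numpy as np

def masked_greedy_action(q_values, valid_mask):
    """Return the highest-valued action among those the mask allows."""
    q_values = np.asarray(q_values, dtype=float)
    valid_mask = np.asarray(valid_mask, dtype=bool)
    if not valid_mask.any():
        raise ValueError("no valid action in this state")
    # Invalid actions are set to -inf so argmax can never select them.
    masked_q = np.where(valid_mask, q_values, -np.inf)
    return int(np.argmax(masked_q))

def masked_epsilon_greedy(q_values, valid_mask, epsilon, rng):
    """Explore uniformly over valid actions only, otherwise act greedily."""
    valid_mask = np.asarray(valid_mask, dtype=bool)
    if rng.random() < epsilon:
        return int(rng.choice(np.flatnonzero(valid_mask)))
    return masked_greedy_action(q_values, valid_mask)

# Hypothetical example: five actions, two of them invalid in this state.
q = [0.2, 1.5, 0.9, 2.0, -0.3]
mask = [True, False, True, False, True]
best = masked_greedy_action(q, mask)  # index 2, the best of the valid actions
```

Masking at selection time means the agent never spends an environment step on a move that cannot be executed, which is one plausible reason masking could help training.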

Load-bearing premise

The simulated execution time model accurately reflects real hardware costs and the learned policy generalizes beyond the circuits and network topologies used in training.

What would settle it

Applying the trained policy to circuits or coupling graphs never seen during training and measuring whether the modeled execution time reduction remains near 35 percent, or running the same circuits on actual hardware and comparing real runtimes to the model's predictions.
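The held-out check described above comes down to computing a relative reduction per circuit and summarizing it. The sketch below assumes the usual definition, (baseline − new) / baseline, which the page does not spell out. The function names and the execution times are hypothetical and are not taken from the paper.

```python
def relative_reduction(baseline_time, new_time):
    """Fraction by which new_time undercuts baseline_time."""
    if baseline_time <= 0:
        raise ValueError("baseline_time must be positive")
    return (baseline_time - new_time) / baseline_time

def summarize_held_out(baseline_times, new_times):
    """Per-circuit relative reductions plus their mean and maximum."""
    if len(baseline_times) != len(new_times) or not baseline_times:
        raise ValueError("need two non-empty lists of equal length")
    reductions = [relative_reduction(b, n)
                  for b, n in zip(baseline_times, new_times)]
    return {
        "per_circuit": reductions,
        "mean": sum(reductions) / len(reductions),
        "max": max(reductions),
    }

# Hypothetical modeled execution times for three unseen circuits.
baseline = [100.0, 80.0, 120.0]
ours = [65.0, 72.0, 96.0]
summary = summarize_held_out(baseline, ours)
```

Reporting the mean alongside the maximum matters here because "up to 35 percent" describes only the best case.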

Figures

Figures reproduced from arXiv: 2605.02389 by Joost Van Veen, Luise Prielinger, Sebastian Feld.

**Figure 1.** An entangled state is generated between two remote

**Figure 2.** DAG of a quantum circuit encoding precedence con

**Figure 3.** Panel (a) shows the qubit mapping: Annotated values

**Figure 1.** The edge set E_n denotes the coupling connections in a module enabling local operations. We assume that all local operations take some constant duration to execute, which we denote t_local. We note that remote entanglement generation over a quantum channel is inherently stochastic: each attempt succeeds with probability p_gen < 1 and otherwise must be repeated. We adopt a standard abstraction [14], [24], [28]…

**Figure 4.** Example of a state vector. c) Reward structure An action’s reward should reflect the agent’s progress toward the overall goal: reducing execution time of the quantum circuit under the given hardware model. The agent in Promponas et al. receives a reward R_score when an action leads to completing a gate, also referred to as “scoring”. Successfully completing all gates provides an additional reward, R_succe…

**Figure 5.** DQC system with two IBM Q Guadalupe QPUs. Node

**Figure 6.** Qubit connectivity of DQC architecture consisting of

**Figure 7.** Comparison of moving average of elapsed execution

**Figure 9.** Training results using our approach over 250 episodes

**Figure 10.** Standard deviation of training results of baseline [24]

**Figure 11.** Boxplots representing execution time comparison,
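The stochastic entanglement-generation model quoted in the excerpts above (each attempt succeeds with a fixed probability below one and is otherwise repeated) can be sketched as follows. The sketch assumes independent attempts, so the number of attempts is geometrically distributed with mean 1 / p_gen. The per-attempt duration `t_attempt` and both function names are hypothetical and do not come from the excerpt.

```python
import random

def attempts_until_success(p_gen, rng):
    """Sample how many attempts one remote entanglement generation needs."""
    if not 0.0 < p_gen <= 1.0:
        raise ValueError("p_gen must lie in (0, 1]")
    attempts = 1
    # Repeat until an attempt succeeds; attempts are assumed independent.
    while rng.random() >= p_gen:
        attempts += 1
    return attempts

def expected_generation_time(p_gen, t_attempt):
    """Mean time to one entangled pair: t_attempt times the mean 1 / p_gen."""
    if not 0.0 < p_gen <= 1.0:
        raise ValueError("p_gen must lie in (0, 1]")
    return t_attempt / p_gen

# Hypothetical check: the empirical mean should approach 1 / p_gen.
rng = random.Random(0)
samples = [attempts_until_success(0.25, rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)
```

Because generation time is random, an agent that decides when to start generating a remote pair is scheduling under uncertainty, which is what makes the decision state-dependent.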

Original abstract

As it becomes increasingly difficult to monolithically scale a quantum processor, distributed quantum computing (DQC) offers an alternative by distributing qubits across multiple smaller interconnected quantum processor modules. In such an architecture, the challenge of quantum circuit compilation shifts from placing and routing qubits within one module to placing, routing and using the qubits efficiently across modules. In order to optimize circuit execution time, the right state-dependent networking decisions must be found, such as when and where to generate shared remote quantum states to support remote operations. Reinforcement learning (RL) provides a natural framework for this problem, generating a compilation policy that can generalize across different circuits. Building on the framework of Promponas et al. (2024), we introduce an agent that combines a novel action-space formulation with effective action-masking strategies. A comprehensive numerical comparison of the two approaches under different coupling constraints shows that our agent achieves improved training and inference performance with a relative reduction in the modeled execution time of up to 35%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This paper refines the action space and masking in RL for distributed quantum circuit routing, reporting up to 35% lower simulated execution times over the prior baseline.

The core advance here is a more effective way to structure the RL agent's choices when deciding on remote entanglement generation and routing across quantum modules. By reformulating the action space and layering in masking, the agent trains more effectively and produces policies that cut modeled execution time by as much as 35% under different coupling constraints. That is a direct, usable extension of the Promponas framework rather than a wholesale new method.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a reinforcement learning agent for optimizing qubit placement, routing, and remote entanglement decisions in distributed quantum computing (DQC) architectures. Building on Promponas et al. (2024), the authors introduce a novel action-space formulation together with action-masking strategies. Numerical experiments under varying inter-module coupling constraints report improved training/inference performance and a relative reduction of up to 35% in modeled execution time compared with the prior method.

Significance. If the reported gains are robust, the work offers a concrete methodological improvement in applying RL to state-dependent networking in modular quantum systems. The emphasis on action-space engineering is a focused contribution that could aid generalization across circuits and topologies. However, the absence of hardware calibration for the underlying cost model substantially tempers the immediate practical significance.

major comments (2)

[Abstract] Abstract: the headline claim of 'up to 35% relative reduction in the modeled execution time' is presented without any description of the baseline implementation, number of random seeds or circuits, statistical error bars, or the precise evaluation protocol. This information is load-bearing for assessing whether the central performance improvement is reproducible and statistically meaningful.
[Results / Simulation Model] Simulation model (throughout results section): the execution-time metric is generated by an internal simulator whose cost functions (gate durations, remote entanglement generation latency and fidelity, classical communication overhead) are not calibrated against experimental DQC data or subjected to sensitivity analysis. Because the 35% figure is the primary quantitative result, the lack of validation against physical hardware costs is a load-bearing limitation on the claim that the new action-space formulation delivers practical gains.
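The first major comment asks for seeds and error bars. A minimal sketch of that kind of summary is given below: the mean relative reduction across seeds with its standard error. The function name and the per-seed values are hypothetical placeholders, not results from the paper.

```python
import math
import statistics

def mean_and_sem(samples):
    """Mean and standard error of the mean for per-seed results."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two seeds to estimate spread")
    mean = statistics.fmean(samples)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(samples) / math.sqrt(n)
    return mean, sem

# Hypothetical relative reductions from four training seeds.
per_seed_reduction = [0.30, 0.35, 0.28, 0.33]
mean, sem = mean_and_sem(per_seed_reduction)
```

A headline figure reported as mean ± standard error over a stated number of seeds would let readers judge whether a best-case 35% is typical or an outlier.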

minor comments (1)

[References] The citation to Promponas et al. (2024) should be expanded to a full bibliographic entry with title, venue, and arXiv identifier for reader convenience.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful review of our manuscript. Below, we provide point-by-point responses to the major comments, indicating the revisions we will make to address the concerns raised.

Referee: [Abstract] Abstract: the headline claim of 'up to 35% relative reduction in the modeled execution time' is presented without any description of the baseline implementation, number of random seeds or circuits, statistical error bars, or the precise evaluation protocol. This information is load-bearing for assessing whether the central performance improvement is reproducible and statistically meaningful.

Authors: We concur that the abstract would benefit from additional context to support the headline claim. In the revised manuscript, we will update the abstract to specify the baseline implementation (the method from Promponas et al. (2024)), the number of circuits and random seeds employed in our experiments, and indicate that statistical error bars and the detailed evaluation protocol are provided in the results section. This will make the performance claims more transparent without exceeding abstract length constraints.
Revision: yes

Referee: [Results / Simulation Model] Simulation model (throughout results section): the execution-time metric is generated by an internal simulator whose cost functions (gate durations, remote entanglement generation latency and fidelity, classical communication overhead) are not calibrated against experimental DQC data or subjected to sensitivity analysis. Because the 35% figure is the primary quantitative result, the lack of validation against physical hardware costs is a load-bearing limitation on the claim that the new action-space formulation delivers practical gains.

Authors: We recognize the importance of validating the simulation model. Our cost functions are based on established theoretical models and parameters from the DQC literature, as direct experimental data for full DQC systems is not yet widely available. We will add a sensitivity analysis to the results section, testing the impact of variations in gate durations, entanglement latency, fidelity, and classical communication overhead on the reported execution time reductions. This analysis will demonstrate that the improvement of up to 35% holds across a range of plausible parameter values, thereby bolstering the practical implications of our action-space engineering.
Revision: partial

Circularity Check

0 steps flagged

No significant circularity; performance claim is external numerical comparison

The paper's core result is a direct numerical comparison of modeled execution time against the external baseline of Promponas et al. (2024), yielding up to 35% relative reduction under varying coupling constraints. This metric is produced by running the learned policy in the simulator and expressing its modeled execution time as a relative reduction against the baseline's; it is not obtained by fitting parameters to the target quantity, redefining the metric in terms of itself, or invoking a self-citation chain for uniqueness. The novel action-space formulation and masking strategies are presented as engineering choices whose value is validated by the external comparison rather than by construction. No load-bearing self-citations, self-definitional equations, or renaming of known results appear in the derivation of the headline claim. The simulation model itself is an explicit modeling assumption whose fidelity is outside the scope of circularity analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the work relies on standard RL assumptions and the validity of the execution-time model from prior work.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Patterns/EightTick (8-tick period from 2^D, D=3) · AlexanderDuality.alexander_duality_circle_linking · unclear
Paper passage: Score 1, Swap 3, Generate 5, Tele-gate 5, Tele-qubit 5 (Table I)

Cost.FunctionalEquation (J(x)=½(x+x⁻¹)−1, ratio-symmetric cost) · washburn_uniqueness_aczel · unclear
Paper passage: L(θ) = E[(y − Q̃(s,a;θ))²] ; y = r + γ max_a' Q̃(s', a'; θ⁻)
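The loss quoted in the second passage is the standard deep Q-learning objective with a target network. A minimal NumPy sketch of how it is evaluated on a batch follows. The array names and the batch values are hypothetical, and the `done` flag that switches off bootstrapping on terminal steps is a common convention assumed here, not part of the quoted formula.

```python
import numpy as np

def dqn_targets(rewards, next_q_target, gamma, done):
    """y = r + gamma * max_a' Q(s', a'; theta^-), zeroed on terminal steps."""
    max_next = next_q_target.max(axis=1)
    return rewards + gamma * (1.0 - done) * max_next

def dqn_loss(q_online, actions, targets):
    """Mean squared error between targets and Q(s, a; theta) for taken actions."""
    chosen = q_online[np.arange(len(actions)), actions]
    return float(np.mean((targets - chosen) ** 2))

# Hypothetical batch of two transitions with two actions each.
rewards = np.array([1.0, 0.0])
next_q_target = np.array([[0.5, 2.0], [1.0, 0.0]])  # from frozen parameters theta^-
done = np.array([0.0, 1.0])                         # second transition is terminal
q_online = np.array([[2.0, 0.0], [0.0, 1.0]])       # from current parameters theta
actions = np.array([0, 1])

targets = dqn_targets(rewards, next_q_target, gamma=0.9, done=done)
loss = dqn_loss(q_online, actions, targets)
```

In a full training loop the targets are held fixed while the gradient is taken with respect to θ only, which is why the target network uses the separate parameters θ⁻.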

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

[1] Andres-Martinez, P., and Heunen, C. Automated distribution of quantum circuits via hypergraph partitioning. Physical Review A 100, 3 (2019), 032308.

[2] Baker, J. M., Duckering, C., Hoover, A., and Chong, F. T. Time-Sliced Quantum Circuit Partitioning for Modular Architectures. In Proceedings of the 17th ACM International Conference on Computing Frontiers (May 2020), pp. 98–107. arXiv:2005.12259 [quant-ph].

[3] Bandic, M., Prielinger, L., Nüsslein, J., Ovide, A., Rodrigo, S., Abadal, S., van Someren, H., Vardoyan, G., Alarcon, E., Almudever, C. G., et al. Mapping quantum circuits to modular architectures with qubo. In 2023 IEEE International Conference on Quantum Computing and Engineering (QCE) (2023), vol. 1, IEEE, pp. 790–801.

[4] Cao, Y., Romero, J., and Aspuru-Guzik, A. Potential of quantum computing for drug discovery. IBM Journal of Research and Development 62, 6 (2018), 6–1.

[5] Cirac, J. I., Ekert, A., Huelga, S. F., and Macchiavello, C. Distributed quantum computation over noisy channels. Physical Review A 59, 6 (1999), 4249.

[6] Cuomo, D., Caleffi, M., Krsulich, K., Tramonto, F., Agliardi, G., Prati, E., and Cacciapuoti, A. S. Optimized compiler for distributed quantum computing. ACM Transactions on Quantum Computing 4, 2 (Feb. 2023).

[7] de Leon, N. P., Itoh, K. M., Kim, D., Mehta, K. K., Northup, T. E., Paik, H., Palmer, B., Samarth, N., Sangtawesin, S., and Steuerman, D. W. Materials challenges and opportunities for quantum computing hardware. Science 372, 6539 (2021), eabb2823.

[8] Dijkstra, E. A note on two problems in connexion with graphs. Numerische Mathematik 1 (1959), 269–271.

[9] Einstein, A., Podolsky, B., and Rosen, N. Can quantum-mechanical description of physical reality be considered complete? Physical Review 47, 10 (1935), 777.

[10] Eisert, J., and Preskill, J. Mind the gaps: The fraught road to quantum advantage. arXiv preprint arXiv:2510.19928 (2025).

[11] Ferrari, D., Carretta, S., and Amoretti, M. A modular quantum compilation framework for distributed quantum computing. IEEE Transactions on Quantum Engineering 4 (2023), 1–13.

[12] Georgescu, I. M., Ashhab, S., and Nori, F. Quantum simulation. Rev. Mod. Phys. 86, 1 (Mar. 2014), 153–185. Publisher: American Physical Society.

[13] Goodfellow, I., Bengio, Y., and Courville, A. Deep learning, vol. 1. MIT Press, Cambridge, 2016.

[14] Humphreys, P. C., Kalb, N., Morits, J. P. J., Schouten, R. N., Vermeulen, R. F. L., Twitchen, D. J., Markham, M., and Hanson, R. Deterministic delivery of remote entanglement on a quantum network. Nature 558, 7709 (June 2018), 268–273.

[15] Kalb, N., Reiserer, A. A., Humphreys, P. C., Bakermans, J. J., Kamerling, S. J., Nickerson, N. H., Benjamin, S. C., Twitchen, D. J., Markham, M., and Hanson, R. Entanglement distillation between solid-state quantum network nodes. Science 356, 6341 (2017), 928–932.

[16] Kingma, D. P., and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).

[17] Li, G., Ding, Y., and Xie, Y. Tackling the qubit mapping problem for nisq-era quantum devices. In Proceedings of the twenty-fourth international conference on architectural support for programming languages and operating systems (2019), pp. 1001–1014.

[18] Main, D., Drmota, P., Nadlinger, D. P., Ainley, E. M., Agrawal, A., Nichol, B. C., Srinivas, R., Araneda, G., and Lucas, D. M. Distributed quantum computing across an optical network link. Nature 638, 8050 (Feb. 2025), 383–388.

[19] Monroe, C., Raussendorf, R., Ruthven, A., Brown, K. R., Maunz, P., Duan, L., and Kim, J. Large-scale modular quantum-computer architecture with atomic memory and photonic interconnects. Physical Review A 89, 2 (Feb. 2014), 022317. arXiv:1208.0391.

[20] Nickerson, N. H., Li, Y., and Benjamin, S. C. Topological quantum computing with a very noisy network and local error rates approaching one percent. Nature Communications 4, 1 (Apr. 2013). arXiv:1211.2217 [quant-ph].

[21] Nielsen, M. A., and Chuang, I. L. Quantum computation and quantum information, 10th anniversary ed. Cambridge University Press, Cambridge; New York, 2010.

[22] Pozzi, M. G., Herbert, S. J., Sengupta, A., and Mullins, R. D. Using Reinforcement Learning to Perform Qubit Routing in Quantum Compilers. ACM Transactions on Quantum Computing 3, 2 (May 2022). Place: New York, NY, USA. Publisher: Association for Computing Machinery.

[23] Preskill, J. Quantum Computing in the NISQ era and beyond. Quantum 2 (Aug. 2018), 79. Publisher: Verein zur Förderung des Open Access Publizierens in den Quantenwissenschaften.

[24] Promponas, P., Mudvari, A., Della Chiesa, L., Polakos, P., Samuel, L., and Tassiulas, L. Compiler for Distributed Quantum Computing: a Reinforcement Learning Approach, Apr. 2024. arXiv:2404.17077 [quant-ph].

[25] Shor, P. Algorithms for quantum computation: discrete logarithms and factoring. In Proceedings 35th Annual Symposium on Foundations of Computer Science (1994), pp. 124–134.

[26] Sundaram, R., Gupta, H., and Ramakrishnan, C. Dqc-qr: Distributing and routing quantum circuits with minimum execution time. ACM Transactions on Quantum Computing 6, 4 (Sept. 2025).

[27] Sutton, R. S., and Barto, A. G. Reinforcement Learning: An Introduction.

[28] Talsma, L., Iñesta, Á. G., and Wehner, S. Continuously distributing entanglement in quantum networks with regular topologies. Physical Review A 110, 2 (2024), 022429.

[29] Tang, W., Duan, Y., Kharkov, Y., Fakoor, R., Kessler, E., and Shi, Y. Alpharouter: Quantum circuit routing with reinforcement learning and tree search. In 2024 IEEE International Conference on Quantum Computing and Engineering (QCE) (2024), vol. 01, pp. 930–940.

[30] Terhal, B. M. Quantum error correction for quantum memories. Reviews of Modern Physics 87, 2 (2015), 307–346.

[31] Van Veen, J. Compilerdqc. https://github.com/joost-vanveen/CompilerDQC, 2026. GitHub repository, branch: main, accessed 2026-03-16.

[32] Van Veen, J., Prielinger, L., and Feld, S. Data underlying the publication: Reinforcement learning in compi...