Reinforcement learning for ion shuttling on trapped-ion quantum computers
Pith reviewed 2026-05-22 05:37 UTC · model grok-4.3
The pith
Reinforcement learning optimizes ion shuttling on trapped-ion quantum computers and cuts operations by up to 36.3 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We demonstrate the first use of reinforcement learning for optimizing ion shuttling. The RL agent learns a shuttling policy through direct interaction with a simulation of the modular trapped-ion architecture. This policy outperforms existing heuristic techniques and reduces the number of shuttling operations by up to 36.3 percent. The approach applies readily to multiple chip architectures and supplies a tool for evaluating shuttling efficiency while designing future, more complex hardware.
What carries the argument
Reinforcement learning agent that learns a policy for choosing ion transport steps to minimize total shuttling operations in a simulated modular chip.
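To make "learning a policy through interaction with a simulation" concrete, here is a minimal sketch under loud assumptions. It uses a hypothetical one-dimensional toy chip on which a single ion must reach a compute zone at cell 0, and it trains with tabular Q-learning. The paper's reference list points to PPO and generalized advantage estimation, so tabular Q-learning is a swapped-in technique chosen for brevity. Every name here (`step`, `train_q_table`, `greedy_steps`), the cell layout, and the hyperparameters are illustrative and not taken from the paper.

```python
import random


def step(cell, action, n_cells):
    """Toy transport model: action 0 moves toward cell 0, action 1 moves away."""
    if action == 0:
        return cell - 1
    return min(cell + 1, n_cells - 1)  # the far end acts as a wall


def train_q_table(n_cells=6, episodes=500, alpha=0.5, gamma=1.0,
                  epsilon=0.2, max_steps=200, seed=0):
    """Tabular Q-learning on a toy line of cells; cell 0 is the compute zone.

    Every shuttling operation costs -1, so maximizing the return is the
    same as minimizing the number of operations. All values are assumptions.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_cells)]
    for _ in range(episodes):
        cell = rng.randrange(1, n_cells)  # random non-goal start cell
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                action = 0 if q[cell][0] >= q[cell][1] else 1  # exploit
            nxt = step(cell, action, n_cells)
            # The goal cell is terminal, so its value is fixed at 0.
            target = -1.0 + (0.0 if nxt == 0 else gamma * max(q[nxt]))
            q[cell][action] += alpha * (target - q[cell][action])
            cell = nxt
            if cell == 0:
                break
    return q


def greedy_steps(q, start, n_cells=6, max_steps=200):
    """Count the shuttling operations the greedy policy needs from `start`."""
    cell, steps = start, 0
    while cell != 0 and steps < max_steps:
        action = 0 if q[cell][0] >= q[cell][1] else 1
        cell = step(cell, action, n_cells)
        steps += 1
    return steps
```

Calling `greedy_steps(train_q_table(), start=5)` rolls out the learned policy from the farthest cell. The real problem has many ions, zone capacities, and a circuit to execute, which is why the paper needs a learned representation and a neural policy instead of a table.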
If this is right
- Fewer shuttling steps lower the chance of errors during transport, supporting more reliable quantum circuits.
- The method scales to larger ion numbers where manual or heuristic planning becomes impractical.
- Designers can test proposed chip layouts for shuttling cost before fabrication.
- The same RL framework can be reused across different zone arrangements with little extra tuning.
Where Pith is reading between the lines
- If the policy transfers well to hardware, it could shorten the engineering cycle for scaling modular ion traps.
- Analogous RL techniques might later optimize other real-time control tasks such as gate tuning or error correction scheduling.
- Combining the shuttling optimizer with full-circuit simulators would let researchers measure end-to-end speedups on larger algorithms.
Load-bearing premise
The simulation used for training must capture the main physical constraints and noise so that the learned policy works on real hardware without retraining.
What would settle it
Deploy the trained RL policy on a physical trapped-ion processor and count whether it performs fewer shuttling operations than the best current heuristic method on the same circuit.
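The comparison proposed here reduces to one ratio. As a small worked sketch, the relative reduction is the heuristic's operation count minus the RL policy's count, divided by the heuristic's count. The function name and the example counts below are made up for illustration and are not results from the paper.

```python
def relative_reduction(heuristic_ops: int, rl_ops: int) -> float:
    """Fraction of shuttling operations saved relative to the heuristic baseline."""
    if heuristic_ops <= 0:
        raise ValueError("heuristic_ops must be positive")
    return (heuristic_ops - rl_ops) / heuristic_ops


# Hypothetical counts, chosen only so the ratio matches the headline figure.
saved = relative_reduction(heuristic_ops=1000, rl_ops=637)
print(f"{saved:.1%}")
```

A negative return value would mean the RL policy used more operations than the heuristic on that circuit.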
Original abstract
Scalable trapped-ion quantum computing is commonly realized with modular chips that feature distinct zones with specific functionalities, such as storage, state preparation, and gate execution. To execute a quantum circuit, the ions must be transported between these zones. This process is called ion shuttling. To achieve reliable computation results, the shuttling process must be optimized. However, as the number of ions increases, this becomes a high-dimensional optimization problem where optimal solutions cannot be computed efficiently. We demonstrate, to the best of our knowledge, the first use of reinforcement learning (RL) for the optimization of ion shuttling. RL is well-suited for such scenarios, as it enables learning a strategy through direct interaction with the problem. We show that our RL approach outperforms current state-of-the-art heuristic techniques, yielding a reduction in shuttling operations of up to 36.3 %. Furthermore, we show that our method is easily applicable to various chip architectures. Our approach offers a versatile method to study shuttling efficiency during chip design and, therefore, a highly relevant tool for future, more complex architectures.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a reinforcement learning (RL) method for optimizing ion shuttling in modular trapped-ion quantum computing architectures. It claims to be the first application of RL to this problem and reports that the approach reduces the number of shuttling operations by up to 36.3% relative to existing heuristic techniques while remaining applicable across different chip layouts.
Significance. If the simulator faithfully captures the dominant physical constraints and the learned policies transfer to hardware, the work would supply a practical, scalable tool for studying and improving shuttling efficiency during the design of complex trapped-ion chips. The absence of hardware validation and simulation-fidelity metrics currently limits the strength of this assessment.
major comments (2)
- [Abstract] The headline performance claim of a 36.3% reduction in shuttling operations is obtained entirely inside an author-defined simulator. No information is supplied on the motional-heating rates, voltage-noise spectra, inter-zone coupling strengths, or other error channels included in the environment, nor are any closed-loop hardware experiments reported that would close the sim-to-real gap. This directly affects the load-bearing claim that the method is useful for real devices.
- [Abstract] The manuscript states that the RL policy outperforms 'current state-of-the-art heuristic techniques' but provides neither the explicit definitions of those heuristics nor quantitative tables comparing operation counts, fidelity, or runtime across multiple ion numbers and architectures. Without these baselines the magnitude of the reported improvement cannot be independently verified.
minor comments (1)
- [Abstract] The abstract asserts applicability to 'various chip architectures' but does not indicate whether the same reward function and state representation were used without modification or whether architecture-specific retraining was required.
Simulated Author's Rebuttal
We are grateful to the referee for their insightful comments, which have helped us clarify the scope and presentation of our work. We provide point-by-point responses to the major comments below, indicating where revisions will be incorporated.
-
Referee: [Abstract] The headline performance claim of a 36.3% reduction in shuttling operations is obtained entirely inside an author-defined simulator. No information is supplied on the motional-heating rates, voltage-noise spectra, inter-zone coupling strengths, or other error channels included in the environment, nor are any closed-loop hardware experiments reported that would close the sim-to-real gap. This directly affects the load-bearing claim that the method is useful for real devices.
Authors: We agree that additional details on the simulator would improve transparency. Our environment models the discrete zone assignments and transport operations central to modular trapped-ion architectures, with simplified representations of physical constraints to enable scalable RL training. We will revise the methods section to explicitly list the included assumptions (e.g., idealized transport times and basic heating estimates) and any omitted channels. We acknowledge that the work does not include hardware validation or full error-channel fidelity metrics; as a simulation study demonstrating RL feasibility, we will add a limitations paragraph discussing the sim-to-real gap and suggesting future experimental directions, but we cannot report closed-loop hardware results at this stage.
Revision: partial
-
Referee: [Abstract] The manuscript states that the RL policy outperforms 'current state-of-the-art heuristic techniques' but provides neither the explicit definitions of those heuristics nor quantitative tables comparing operation counts, fidelity, or runtime across multiple ion numbers and architectures. Without these baselines the magnitude of the reported improvement cannot be independently verified.
Authors: We will revise the manuscript to provide explicit definitions of the baseline heuristics, including nearest-zone greedy assignment and shortest-path routing methods drawn from prior trapped-ion literature. We will also add quantitative comparison tables and supplementary figures reporting operation counts, estimated fidelities, and wall-clock runtimes for the RL policy versus these heuristics, evaluated across ion numbers from 4 to 20 and at least three distinct chip layouts. These additions will allow direct verification of the maximum 36.3% reduction in shuttling operations.
Revision: yes
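Until those definitions are in the manuscript, a reader can picture a shortest-path routing baseline roughly as follows. This is only a sketch under assumptions: the zone graph, its node names, and `shortest_route` are hypothetical, the layout is a deliberately simplified guess at four registers joined by one X-junction, and zone capacities and blocking by other ions are ignored. It is not the authors' heuristic.

```python
from collections import deque

# Hypothetical zone graph: four registers joined by a single X-junction.
ZONE_GRAPH = {
    "junction": ["register_a", "register_b", "register_c", "register_d"],
    "register_a": ["junction"],
    "register_b": ["junction"],
    "register_c": ["junction"],
    "register_d": ["junction"],
}


def shortest_route(graph, start, goal):
    """Breadth-first search over zones.

    Returns the list of zones from start to goal, or None if unreachable.
    """
    parents = {start: None}
    queue = deque([start])
    while queue:
        zone = queue.popleft()
        if zone == goal:
            route = []
            while zone is not None:  # walk parent links back to the start
                route.append(zone)
                zone = parents[zone]
            return route[::-1]
        for neighbor in graph[zone]:
            if neighbor not in parents:
                parents[neighbor] = zone
                queue.append(neighbor)
    return None


route = shortest_route(ZONE_GRAPH, "register_a", "register_c")
shuttles = len(route) - 1  # one shuttling operation per edge traversed
```

A per-ion shortest path like this ignores interactions between ions, which is presumably where a learned policy can gain ground, though the excerpt available here does not say so directly.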
Circularity Check
No circularity: empirical RL performance comparison in author-defined simulator
The paper reports an empirical result: an RL policy trained inside a custom simulator achieves up to 36.3% fewer shuttling operations than heuristics. No equations, derivations, or uniqueness theorems are presented whose outputs reduce by construction to the inputs or to self-citations. The performance metric is measured directly against the same simulator used for training; this is a standard empirical benchmark, not a definitional or fitted-input circularity. The sim-to-real transfer gap is a validity concern, not a circularity in the reported chain.
Reference graph
Works this paper leans on
-
[1]
Example architecture 1: X-chip. Our first example architecture is the QVLS QROSS chip [23], the first proposal for a QCCD developed by Quantum Valley Lower Saxony (QVLS) [16]. It consists of four registers connected by an X-junction; we refer to it as the “X-chip”. The registers include a compute zone that can hold up to 2 ions, a state preparation and mea...
-
[2]
Example architecture 2: Q-chip. We also study an alternative chip design, the QVLS CIRQLE chip [24], with a more compact storage register, allowing more ions to fit on the same chip size. Here, the storage zone is consolidated in a ring. The compute zone (capacity 2 ions) and the SPAM zone (capacity 1 ion) are connected to this ring via a junction, resulti...
-
[3]
store ions that are used together in proximity. [10] also implement proximity sorting and use the readout zone as temporary storage to make ions needed shortly after easily accessible. Handling traffic blocks: Different heuristics are used to prevent an ion from being blocked because the path it has to take is occupied by other ions [13]. This is do...
-
[4]
propose a probabilistic formula that accounts for several heuristics, such as least movements and handling traffic blocks. With this formula, they can then determine whether an ion should move or stay in the same trap. The mentioned strategies are planning strategies; a shuttling protocol is developed for a given circuit in advance and then executed on the...
-
[5]
Reference 1: Heuristic compiler. In the framework of the QVLS X-chip, a shuttling compiler was developed to address the ion shuttling problem using heuristics derived from observations of the chip’s architecture [10]. The challenge of orchestrating ions across the chip to execute a given quantum circuit was therefore decomposed into several phases, two...
-
[6]
Reference 2: SAT solver. For benchmarking purposes, it is useful to compare obtained trajectories against optimal ones. While finding these optimal trajectories is likely infeasible in the general case, we can at least study small instances to gain some basic insight. In principle, a naive exhaustive search through all shuttling sequences of a fixed ...
-
[7]
Requirements for representations. When employing a neural network as a policy $\pi$, the state space $S$ must usually be transformed to compatible representations, e.g. vectors, using some transformation. For simplicity, in a slight abuse of notation, we use $S$ as the representation space directly. A good representation of the chip state and circuit to be executed shoul...
-
[8]
Proposed representation. The core idea of our proposed representation is abstracting the qubit label and sequence position of a two-qubit gate by encoding a gate through the cell location of the other operand and the depth of the gate in the dependency graph. This is illustrated in Figure 2 for a lookahead of $k_{\text{lookahead}} = 2$. The following steps are performed:
-
[9]
A chip-specific adapter translates the chip state (top left) into a tabular form $K$ (columns “Cell” and “Qubit” on the right). In our case the adapter simply iterates over all zones, starting with the position next to the junction.
-
[10]
If the circuit (bottom left) is given as a list of gates, the directed acyclic graph of the circuit is calculated [text interrupted by the figure]. Figure panels: “Circuit of Two-Qubit Gates” on qubits 1 to 5 (g1: 1, 3; g2: 2, 4; g3: 1, 5; g4: 1, 3); “Directed Circuit Graph” (Depth 0, Depth 1, Depth 2); “Chip State” (Storage, Compute, Spam); “Adapter”; “Encoding M” with rows (1, 5, 6), (1, 10, ⋄), (0, ⋄, ⋄), (0, ⋄, ⋄), (1, 2, ⋄), (1...
-
[11]
The encoding matrix $M$ is computed (right). For each cell, it is encoded whether it is occupied by a qubit. Next, for depths in $\{0, \ldots, k_{\text{lookahead}} - 1\}$, it is checked if a gate at that depth exists with the qubit of the current cell. If it exists, the cell of the other operand is encoded. Otherwise, an empty token ⋄ is encoded. Gates at a depth of k lookahea...
-
[12]
Shaped reward. The basic reward signal for a goal-reaching problem is very sparse, as the agent receives a negative reward at a constant rate $c_r$ until a goal state is reached. If the problem only terminates upon reaching a goal state and the agent has not encountered any goal states yet, the value of every state must be estimated as $V = -c_r \int_0^{\infty} e^{-\beta t}\,dt$ = − c...
-
[13]
Problem generation during training. When training the RL agent, a diverse range of starting states is desirable, such that the entire possible problem space is well covered. A starting state is generated by first drawing the number of ions or qubits on the chip: $z \sim \mathrm{Uniform}(\{2, \ldots, n_{\max}\})$. Here, $n_{\max}$ is the maximum number of ions supported. The qubits a...
-
[14]
C. D. Bruzewicz, J. Chiaverini, R. McConnell, and J. M. Sage, Trapped-ion quantum computing: Progress and challenges, Applied Physics Reviews 6, 021314 (2019)
-
[15]
J. I. Cirac and P. Zoller, Quantum computations with cold trapped ions, Physical Review Letters 74, 4091 (1995)
-
[16]
A. Sørensen and K. Mølmer, Quantum Computation with Ions in Thermal Motion, Physical Review Letters 82, 1971 (1999)
-
[17]
G. Zarantonello, H. Hahn, J. Morgner, M. Schulte, A. Bautista-Salvador, R. F. Werner, K. Hammerer, and C. Ospelkaus, Robust and Resource-Efficient Microwave Near-Field Entangling ⁹Be⁺ Gate, Physical Review Letters 123, 260503 (2019)
-
[18]
D. Kielpinski, C. Monroe, and D. J. Wineland, Architecture for a large-scale ion-trap quantum computer, Nature 417, 709 (2002)
-
[19]
J. M. Pino, J. M. Dreiling, C. Figgatt, J. P. Gaebler, S. A. Moses, M. S. Allman, C. H. Baldwin, M. Foss-Feig, D. Hayes, K. Mayer, C. Ryan-Anderson, and B. Neyenhuis, Demonstration of the trapped-ion quantum CCD computer architecture, Nature 592, 209 (2021)
-
[20]
S. A. Moses, C. H. Baldwin, M. S. Allman, R. Ancona, L. Ascarrunz, C. Barnes, J. Bartolotta, B. Bjork, P. Blanchard, M. Bohn, J. G. Bohnet, N. C. Brown, N. Q. Burdick, W. C. Burton, S. L. Campbell, J. P. Campora, C. Carron, J. Chambers, J. W. Chan, Y. H. Chen, A. Chernoguzov, E. Chertkov, J. Colina, J. P. Curtis, R. Daniel, M. DeCross, D. Deen, C. Delan...
-
[21]
J. Durandau, J. Wagner, F. Mailhot, C.-A. Brunet, F. Schmidt-Kaler, U. Poschinger, and Y. Bérubé-Lauzière, Automated Generation of Shuttling Sequences for a Linear Segmented Ion Trap Quantum Computer, Quantum 7, 1175 (2023)
-
[22]
Helios: A 98-qubit trapped-ion quantum computer
A. Ransford, M. S. Allman, J. Arkinstall, J. P. Campora, S. F. Cooper, R. D. Delaney, J. M. Dreiling, B. Estey, C. Figgatt, A. Hall, A. A. Husain, A. Isanaka, C. J. Kennedy, N. Kotibhaskar, I. S. Madjarov, K. Mayer, A. R. Milne, A. J. Park, A. P. Reed, R. Ancona, M. P. Andersen, P. Andres-Martinez, W. Angenent, L. Argueta, B. Arkin, L. Ascarrunz, W. Bak...
-
[23]
T. Schmale, B. Temesi, A. Baishya, N. Pulido-Mateo, L. Krinner, T. Dubielzig, C. Ospelkaus, H. Weimer, and D. Borcherding, Backend compiler phases for trapped-ion quantum computers, in 2022 IEEE International Conference on Quantum Software (QSW) (2022) pp. 32–37
-
[24]
A. A. Saki, R. O. Topaloglu, and S. Ghosh, Muzzle the Shuttle: Efficient Compilation for Multi-Trap Trapped-Ion Quantum Computers, in 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE) (2022) pp. 322–327
- [25]
- [26]
-
[27]
D. Schoenberger, S. Hillmich, M. Brandl, and R. Wille, Shuttling for Scalable Trapped-Ion Quantum Computers, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 44, 2144 (2025)
-
[28]
W. Dai, K. A. Brown, and T. G. Robertazzi, Advanced Shuttle Strategies for Parallel QCCD Architectures, IEEE Transactions on Quantum Engineering 5, 1 (2024)
-
[29]
Quantum Valley Lower Saxony e. V., https://qvls.de/en/ (2026), accessed: 2026-05-20
-
[30]
R. Wille, L. Berent, T. Forster, J. Kunasaikaran, K. Mato, T. Peham, N. Quetschlich, D. Rovara, A. Sander, L. Schmid, D. Schoenberger, Y. Stade, and L. Burgholzer, The MQT handbook: A summary of design automation tools and software for quantum computing, in IEEE International Conference on Quantum Software (QSW) (2024), arXiv:2405.17543
-
[31]
A. W. Cross, L. S. Bishop, S. Sheldon, P. D. Nation, and J. M. Gambetta, Validating quantum computers using randomized model circuits, Physical Review A 100, 032328 (2019)
-
[32]
Quantum Valley Lower Saxony Q1 - Forschung, www.qvls-q1.de/forschung (2026), accessed: 2026-05-20
-
[33]
D. Schoenberger, S. Hillmich, M. Brandl, and R. Wille, Using Boolean Satisfiability for Exact Shuttling in Trapped-Ion Quantum Computers, in 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC) (2024) pp. 127–133
-
[34]
R. B. Blakestad, Transport of Trapped-Ion Qubits within a Scalable Quantum Processor, Ph.D. thesis, University of Colorado (2010)
-
[35]
D. Hucul, M. Yeo, W. K. Hensinger, J. Rabchuk, S. Olmschenk, and C. Monroe, On the Transport of Atomic Ions in Linear and Multidimensional Ion Trap Arrays (2008), arXiv:quant-ph/0702175
-
[36]
F. Ungerechts, R. Munoz, A. Hoffmann, J. Bätge, M. M. Billah, T. Meiners, B. Kaune, G. Zarantonello, and C. Ospelkaus, Designing a Trapped-Ion Quantum Processor based on Near-Field Microwave Quantum Logic Gates (2026), to be published
-
[37]
F. Ungerechts, J. Bätge, M. M. Billah, L. Krieger, R. Munoz, P. Nuschke, A. Hoffmann, G. Zarantonello, and C. Ospelkaus, CIRQLE: A Comprehensive Register-Based Trapped-Ion Quantum Processor with Near-Field Microwave Control (2026), to be published
- [38]
-
[39]
A. Walther, F. Ziesel, T. Ruster, S. T. Dawkins, K. Ott, M. Hettrich, K. Singer, F. Schmidt-Kaler, and U. Poschinger, Controlling Fast Transport of Cold Trapped Ions, Physical Review Letters 109, 080501 (2012)
-
[40]
X.-J. Lu, A. Ruschhaupt, and J. G. Muga, Fast shuttling of a particle under weak spring-constant noise of the moving trap, Physical Review A 97, 053402 (2018)
-
[41]
V. Kaushal, B. Lekitsch, A. Stahl, J. Hilder, D. Pijn, C. Schmiegelow, A. Bermudez, M. Müller, F. Schmidt-Kaler, and U. Poschinger, Shuttling-based trapped-ion quantum information processing, AVS Quantum Science 2, 014101 (2020)
-
[42]
D. Schoenberger and R. Wille, Orchestrating Multi-Zone Shuttling in Trapped-Ion Quantum Computers, in 2025 IEEE International Conference on Quantum Computing and Engineering (QCE), Vol. 01 (2025) pp. 1069–1075
-
[43]
D. Schoenberger, J. Hilder, F. Schmidt-Kaler, and R. Wille, Shuttling for Trapped-Ion Quantum Computers with Embedded Processing Zones, in 2025 IEEE International Conference on Quantum Software (QSW) (2025) pp. 123–129
-
[44]
T. Schmale, Hybrid quantum-classical computation – from infrastructure to algorithms, Institutionelles Repositorium der Leibniz Universität Hannover 10.15488/20338 (2026)
-
[45]
M. Sipser, Introduction to the theory of computation, ACM Sigact News 27, 27 (1996)
-
[46]
M. Bukov and F. Marquardt, Reinforcement Learning for Quantum Technology (2026), arXiv:2601.18953 [quant-ph]
-
[47]
M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming (John Wiley & Sons, 2014)
-
[48]
R. S. Sutton and A. G. Barto, Reinforcement learning: an introduction, 2nd ed., edited by F. Bach (MIT Press, 2018)
-
[49]
R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, Advances in neural information processing systems 12 (1999)
-
[50]
J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, Proximal policy optimization algorithms, arXiv preprint arXiv:1707.06347 (2017)
-
[51]
J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, High-dimensional continuous control using generalized advantage estimation, arXiv preprint arXiv:1506.02438 (2015)
- [52]
-
[53]
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need, Advances in neural information processing systems 30 (2017)
-
[54]
L. Espeholt, H. Soyer, R. Munos, K. Simonyan, V. Mnih, T. Ward, Y. Doron, V. Firoiu, T. Harley, I. Dunning, et al., Impala: Scalable distributed deep-RL with importance weighted actor-learner architectures, in International conference on machine learning (PMLR, 2018) pp. 1407–1416
-
[55]
H. Lee, D. Hwang, D. Kim, H. Kim, J. J. Tai, K. Subramanian, P. R. Wurman, J. Choo, P. Stone, and T. Seno, Simba: Simplicity bias for scaling up parameters in deep reinforcement learning, in 13th International Conference on Learning Representations, ICLR 2025 (International Conference on Learning Representations, ICLR, 2025) pp. 50050–50082
-
[56]
A. Y. Ng, D. Harada, and S. Russell, Policy invariance under reward transformations: Theory and application to reward shaping, in ICML, Vol. 99 (1999) pp. 278–287
-
[57]
See Supplemental Material at [URL will be inserted by publisher] for all results on MQT circuits and animations of some shuttling sequences
-
[58]
L. De Moura and N. Bjørner, Z3: An efficient SMT solver, in International conference on Tools and Algorithms for the Construction and Analysis of Systems (Springer, 2008) pp. 337–340.
Supplementary Material: Reinforcement learning for ion shuttling on trapped-ion quantum computers. Maximilian Schier ∗ and Bodo Rosenhahn, Institute for Information Processin...
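The encoding steps quoted in excerpts [8] to [11] of the reference graph above can be sketched in a few lines. This is a hedged reconstruction from those truncated excerpts, not the authors' code. The function names, the use of `None` for the empty token ⋄, the depth rule (a gate sits one level below the deepest earlier gate sharing a qubit), and the cell layout are all assumptions. The gate list matches the one recoverable from the figure residue in excerpt [10]. Gates at or beyond the lookahead depth are simply dropped here, because the excerpt breaks off before saying how they are handled.

```python
EMPTY = None  # stands in for the empty token


def gate_depths(gates):
    """Depth of each two-qubit gate in the dependency graph of a gate list."""
    last_depth = {}  # qubit -> depth of the latest gate acting on it
    depths = []
    for a, b in gates:
        depth = max(last_depth.get(a, -1), last_depth.get(b, -1)) + 1
        depths.append(depth)
        last_depth[a] = last_depth[b] = depth
    return depths


def encode(cells, gates, k_lookahead):
    """Build one row per cell: (occupied, partner cell at depth 0, ..., depth k-1).

    `cells` lists the qubit held by each cell, or None for an empty cell.
    Assumes every qubit that appears in a gate is present on the chip.
    """
    cell_of = {q: i for i, q in enumerate(cells) if q is not None}
    depths = gate_depths(gates)
    rows = []
    for qubit in cells:
        if qubit is None:
            rows.append((0,) + (EMPTY,) * k_lookahead)
            continue
        partners = [EMPTY] * k_lookahead
        for (a, b), depth in zip(gates, depths):
            if depth < k_lookahead and qubit in (a, b):
                other = b if qubit == a else a
                partners[depth] = cell_of[other]  # encode the partner's cell
        rows.append((1, *partners))
    return rows


gates = [(1, 3), (2, 4), (1, 5), (1, 3)]  # g1..g4 from the figure residue
cells = [1, 2, None, 3, 4, 5]             # invented layout, cell index = position
matrix = encode(cells, gates, k_lookahead=2)
```

Because each row refers to other cells rather than to qubit labels, relabeling the qubits leaves the matrix unchanged, which appears to be the point of the abstraction described in excerpt [8].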