Reinforcement learning for ion shuttling on trapped-ion quantum computers

Bodo Rosenhahn; Christian Staufenbiel; Daniel Borcherding; Lea Richtmann; Maximilian Schier; Michèle Heurs; Tobias Schmale

arxiv: 2605.22463 · v1 · pith:BLKKWE25new · submitted 2026-05-21 · 🪐 quant-ph · cs.LG


Pith reviewed 2026-05-22 05:37 UTC · model grok-4.3


keywords: reinforcement learning, ion shuttling, trapped-ion quantum computing, modular architectures, quantum hardware optimization, quantum control, scalable quantum computing


The pith

Reinforcement learning optimizes ion shuttling on trapped-ion quantum computers and cuts the number of shuttling operations by up to 36.3 percent relative to heuristic baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Trapped-ion quantum computers rely on modular chips that divide tasks into separate zones for storage, preparation, and gates. Moving ions between these zones is called shuttling, and the number of required moves grows rapidly with more ions, turning it into a hard optimization task. The paper shows that a reinforcement learning agent can discover better shuttling strategies by trial and error inside a simulation of the device. This learned policy reduces the total number of shuttling steps by as much as 36.3 percent compared with standard heuristic rules. The same method works on several different chip layouts, giving hardware designers a practical way to check shuttling performance early in the design process.

Core claim

We demonstrate the first use of reinforcement learning for optimizing ion shuttling. The RL agent learns a shuttling policy through direct interaction with a simulation of the modular trapped-ion architecture. This policy outperforms existing heuristic techniques and reduces the number of shuttling operations by up to 36.3 percent. The approach applies readily to multiple chip architectures and supplies a tool for evaluating shuttling efficiency while designing future, more complex hardware.

What carries the argument

Reinforcement learning agent that learns a policy for choosing ion transport steps to minimize total shuttling operations in a simulated modular chip.

If this is right

Fewer shuttling steps lower the chance of errors during transport, supporting more reliable quantum circuits.
The method scales to larger ion numbers where manual or heuristic planning becomes impractical.
Designers can test proposed chip layouts for shuttling cost before fabrication.
The same RL framework can be reused across different zone arrangements with little extra tuning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the policy transfers well to hardware, it could shorten the engineering cycle for scaling modular ion traps.
Analogous RL techniques might later optimize other real-time control tasks such as gate tuning or error correction scheduling.
Combining the shuttling optimizer with full-circuit simulators would let researchers measure end-to-end speedups on larger algorithms.

Load-bearing premise

The simulation used for training must capture the main physical constraints and noise so that the learned policy works on real hardware without retraining.

What would settle it

Deploy the trained RL policy on a physical trapped-ion processor and count whether it performs fewer shuttling operations than the best current heuristic method on the same circuit.
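The comparison described above comes down to a relative reduction in operation counts. A minimal sketch of that arithmetic follows; the function name and the example counts are hypothetical and chosen only to illustrate the calculation, not taken from the paper.

```python
def relative_reduction(baseline_ops: int, policy_ops: int) -> float:
    """Fractional reduction in shuttling operations relative to a baseline."""
    if baseline_ops <= 0:
        raise ValueError("baseline_ops must be positive")
    return (baseline_ops - policy_ops) / baseline_ops


# Hypothetical counts for one circuit (illustration only, not measured data).
example = relative_reduction(baseline_ops=1000, policy_ops=637)
print(f"{example:.1%}")  # 36.3%
```

A negative return value would indicate that the policy needed more operations than the baseline on that circuit.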

Figures

Figures reproduced from arXiv: 2605.22463 by Bodo Rosenhahn, Christian Staufenbiel, Daniel Borcherding, Lea Richtmann, Maximilian Schier, Michèle Heurs, Tobias Schmale.

**Figure 2.** Proposed representation. The top left shows the chip state. A chip-specific adapter translates the chip state into a ...

**Figure 3.** Comparison of ion shuttling durations using trajecto...

**Figure 4.** Comparison of ion shuttling duration of our pro...

**Figure 5.** Ion shuttling duration for different architectures op...

**Figure 6.** Detailed study on the influence of numeric encodings ...

Original abstract

Scalable trapped-ion quantum computing is commonly realized with modular chips that feature distinct zones with specific functionalities, such as storage, state preparation, and gate execution. To execute a quantum circuit, the ions must be transported between these zones. This process is called ion shuttling. To achieve reliable computation results, the shuttling process must be optimized. However, as the number of ions increases, this becomes a high-dimensional optimization problem where optimal solutions cannot be computed efficiently. We demonstrate, to the best of our knowledge, the first use of reinforcement learning (RL) for the optimization of ion shuttling. RL is well-suited for such scenarios, as it enables learning a strategy through direct interaction with the problem. We show that our RL approach outperforms current state-of-the-art heuristic techniques, yielding a reduction in shuttling operations of up to 36.3 %. Furthermore, we show that our method is easily applicable to various chip architectures. Our approach offers a versatile method to study shuttling efficiency during chip design and, therefore, a highly relevant tool for future, more complex architectures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

RL for ion shuttling reports a 36% reduction over heuristics inside simulation but shows no hardware transfer results.


The main takeaway is that this paper applies reinforcement learning to optimize ion shuttling in modular trapped-ion chips and claims up to a 36.3% drop in operations compared with standard heuristics, while showing the method works on different layouts. They frame it as the first RL treatment of this specific task. That framing holds up against the cited prior work on shuttling heuristics and other quantum control uses of RL.

The approach fits the problem because shuttling paths grow into a high-dimensional search as ion count and zone count increase, and RL can learn a policy through repeated interaction without exhaustive enumeration. Demonstrating applicability across architectures is useful for early chip design studies. The quantitative comparison gives a clear benchmark number to evaluate against existing methods.

The central weakness is that every result stays inside the authors' simulator. No hardware runs close the loop, so we have no data on whether the learned policy survives real motional heating, voltage noise, or inter-zone coupling. Without those tests or even a detailed comparison of simulator fidelity to measured device parameters, the 36% figure remains an in-simulation improvement whose practical size is unknown. Training details, statistical variability, and sensitivity to simulator assumptions are also missing from the abstract-level description.

This work is aimed at trapped-ion hardware teams and control-software groups who already use heuristics for shuttling and want to explore learned alternatives during architecture studies. A reader already running RL on quantum tasks could extract the environment setup and reward design for their own experiments.

The paper is coherent on its own terms and engages the relevant literature, so it shows honest engagement rather than fitting artifacts. I would bring it to a reading group focused on quantum control or applied RL. I would not cite it yet because the evidence is simulation-only. It deserves peer review so referees can examine the simulator construction and ask for at least preliminary hardware checks or transfer experiments.

Referee Report

2 major / 1 minor

Summary. The paper presents a reinforcement learning (RL) method for optimizing ion shuttling in modular trapped-ion quantum computing architectures. It claims to be the first application of RL to this problem and reports that the approach reduces the number of shuttling operations by up to 36.3% relative to existing heuristic techniques while remaining applicable across different chip layouts.

Significance. If the simulator faithfully captures the dominant physical constraints and the learned policies transfer to hardware, the work would supply a practical, scalable tool for studying and improving shuttling efficiency during the design of complex trapped-ion chips. The absence of hardware validation and simulation-fidelity metrics currently limits the strength of this assessment.

major comments (2)

[Abstract] The headline performance claim of a 36.3% reduction in shuttling operations is obtained entirely inside an author-defined simulator. No information is supplied on the motional-heating rates, voltage-noise spectra, inter-zone coupling strengths, or other error channels included in the environment, nor are any closed-loop hardware experiments reported that would close the sim-to-real gap. This directly affects the load-bearing claim that the method is useful for real devices.
[Abstract] The manuscript states that the RL policy outperforms 'current state-of-the-art heuristic techniques' but provides neither the explicit definitions of those heuristics nor quantitative tables comparing operation counts, fidelity, or runtime across multiple ion numbers and architectures. Without these baselines the magnitude of the reported improvement cannot be independently verified.

minor comments (1)

[Abstract] The abstract asserts applicability to 'various chip architectures' but does not indicate whether the same reward function and state representation were used without modification or whether architecture-specific retraining was required.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their insightful comments, which have helped us clarify the scope and presentation of our work. We provide point-by-point responses to the major comments below, indicating where revisions will be incorporated.


Referee: [Abstract] The headline performance claim of a 36.3% reduction in shuttling operations is obtained entirely inside an author-defined simulator. No information is supplied on the motional-heating rates, voltage-noise spectra, inter-zone coupling strengths, or other error channels included in the environment, nor are any closed-loop hardware experiments reported that would close the sim-to-real gap. This directly affects the load-bearing claim that the method is useful for real devices.

Authors: We agree that additional details on the simulator would improve transparency. Our environment models the discrete zone assignments and transport operations central to modular trapped-ion architectures, with simplified representations of physical constraints to enable scalable RL training. We will revise the methods section to explicitly list the included assumptions (e.g., idealized transport times and basic heating estimates) and any omitted channels. We acknowledge that the work does not include hardware validation or full error-channel fidelity metrics; as a simulation study demonstrating RL feasibility, we will add a limitations paragraph discussing the sim-to-real gap and suggesting future experimental directions, but we cannot report closed-loop hardware results at this stage.

Revision: partial

Referee: [Abstract] The manuscript states that the RL policy outperforms 'current state-of-the-art heuristic techniques' but provides neither the explicit definitions of those heuristics nor quantitative tables comparing operation counts, fidelity, or runtime across multiple ion numbers and architectures. Without these baselines the magnitude of the reported improvement cannot be independently verified.

Authors: We will revise the manuscript to provide explicit definitions of the baseline heuristics, including nearest-zone greedy assignment and shortest-path routing methods drawn from prior trapped-ion literature. We will also add quantitative comparison tables and supplementary figures reporting operation counts, estimated fidelities, and wall-clock runtimes for the RL policy versus these heuristics, evaluated across ion numbers from 4 to 20 and at least three distinct chip layouts. These additions will allow direct verification of the maximum 36.3% reduction in shuttling operations.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical RL performance comparison in author-defined simulator


The paper reports an empirical result: an RL policy trained inside a custom simulator achieves up to 36.3% fewer shuttling operations than heuristics. No equations, derivations, or uniqueness theorems are presented whose outputs reduce by construction to the inputs or to self-citations. The performance metric is measured directly against the same simulator used for training; this is a standard empirical benchmark, not a definitional or fitted-input circularity. The sim-to-real transfer gap is a validity concern, not a circularity in the reported chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are described in the provided text.



Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 4 internal anchors

[1] Example architecture 1: X-chip. Our first example architecture is the QVLS QROSS chip [23], the first proposal for a QCCD developed by Quantum Valley Lower Saxony (QVLS) [16]. It consists of four registers connected by an X-junction; we refer to it as the “X-chip”. The registers include a compute zone that can hold up to 2 ions, a state preparation and mea...

[2] Example architecture 2: Q-chip. We also study an alternative chip design, the QVLS CIRQLE chip [24], with a more compact storage register, allowing more ions to fit on the same chip size. Here, the storage zone is consolidated in a ring. The compute zone (capacity 2 ions) and the SPAM zone (capacity 1 ion) are connected to this ring via a junction, resulti...

[3] ... store ions that are used together in proximity. [10] also implement proximity sorting and use the readout zone as temporary storage to make ions needed shortly after easily accessible. Handling traffic blocks: Different heuristics are used to prevent an ion from being blocked because the path it has to take is occupied by other ions [13]. This is do...

[4] ... propose a probabilistic formula that accounts for several heuristics, such as least movements and handling traffic blocks. With this formula, they can then determine whether an ion should move or stay in the same trap. The mentioned strategies are planning strategies; a shuttling protocol is developed for a given circuit in advance and then executed on the...

[5] Reference 1: Heuristic compiler. In the framework of the QVLS X-chip, a shuttling compiler was developed to address the ion shuttling problem using heuristics derived from observations of the chip’s architecture [10]. The challenge of orchestrating ions across the chip to execute a given quantum circuit was therefore decomposed into several phases, two...

[6] Reference 2: SAT solver. For benchmarking purposes, it is useful to compare obtained trajectories against optimal ones. While finding these optimal trajectories is likely infeasible in the general case, we can at least study small instances to gain some basic insight. In principle, a naive exhaustive search through all shuttling sequences of a fixed ...
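The naive exhaustive search mentioned in the SAT-solver passage can be illustrated on a toy instance. The sketch below is a plain breadth-first search, not the authors' SAT encoding; the chip model (one ion per cell, one ion moving to an adjacent empty cell per step), the function `min_moves`, and the example layout are all assumptions made for illustration.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Tuple

State = Tuple[int, ...]  # State[i] is the index of the cell holding ion i


def min_moves(adjacency: Dict[int, List[int]], start: State,
              gate: Tuple[int, int], compute_cells: FrozenSet[int]) -> int:
    """Fewest single-ion moves that bring both gate operands into the
    compute cells, found by breadth-first search. Returns -1 if unreachable."""
    def is_goal(state: State) -> bool:
        return state[gate[0]] in compute_cells and state[gate[1]] in compute_cells

    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        state, depth = queue.popleft()
        if is_goal(state):
            return depth
        occupied = set(state)
        for ion, cell in enumerate(state):
            for nxt in adjacency[cell]:
                if nxt in occupied:
                    continue  # toy assumption: each cell holds at most one ion
                new_state = state[:ion] + (nxt,) + state[ion + 1:]
                if new_state not in seen:
                    seen.add(new_state)
                    queue.append((new_state, depth + 1))
    return -1


# Hypothetical linear chip: cells 0-1-2-3-4, with cells 3 and 4 as the compute zone.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
result = min_moves(line, start=(0, 1, 2), gate=(1, 2), compute_cells=frozenset({3, 4}))
print(result)  # 4
```

The number of states grows combinatorially with the number of ions and cells, which is why such a search is only practical for the small instances the passage refers to.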
[7] Requirements for representations. When employing a neural network as a policy $\pi$, the state space $S$ must usually be transformed to compatible representations, e.g. vectors, using some transformation. For simplicity, in a slight abuse of notation, we use $S$ as the representation space directly. A good representation of the chip state and circuit to be executed shoul...

[8] Proposed representation. The core idea of our proposed representation is abstracting the qubit label and sequence position of a two-qubit gate by encoding a gate through the cell-location of the other operand and the depth of the gate in the dependency graph. This is illustrated in Figure 2 for a lookahead of $k_\text{lookahead} = 2$. The following steps are performed:

[9] A chip-specific adapter translates the chip state (top left) into a tabular form $K$ (columns “Cell” and “Qubit” on the right). In our case the adapter simply iterates over all zones starting with the position next to the junction.

[10] If the circuit (bottom left) is given as a list of gates, the directed acyclic graph of the circuit is calcu...

[Figure 2 content, recovered from extraction residue: panels “Circuit of Two-Qubit Gates” (g1 on qubits 1 and 3, g2 on 2 and 4, g3 on 1 and 5, g4 on 1 and 3), “Directed Circuit Graph” (Depth 0, Depth 1, Depth 2), “Chip State” (Storage, Compute, Spam), “Adapter”, and “Encoding M” with rows such as (1, 5, 6), (1, 10, ⋄), (0, ⋄, ⋄), (0, ⋄, ⋄), (1, 2, ⋄), (1...]
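The gate depths used by the representation can be computed in a single pass over the gate list. The sketch below assumes that a gate's depth is the length of the longest chain of earlier gates it depends on through shared qubits; the function name `gate_depths` is hypothetical, and the example circuit is a reading of the garbled Figure 2 text (g1 on qubits 1 and 3, g2 on 2 and 4, g3 on 1 and 5, g4 on 1 and 3).

```python
from typing import Dict, List, Tuple


def gate_depths(gates: List[Tuple[int, int]]) -> List[int]:
    """Depth of each two-qubit gate in the circuit's dependency graph."""
    last_depth: Dict[int, int] = {}  # qubit -> depth of the latest gate acting on it
    depths: List[int] = []
    for a, b in gates:
        # A gate must wait for the latest earlier gate on either operand.
        d = max(last_depth.get(a, -1), last_depth.get(b, -1)) + 1
        depths.append(d)
        last_depth[a] = last_depth[b] = d
    return depths


circuit = [(1, 3), (2, 4), (1, 5), (1, 3)]  # g1..g4 as read from Figure 2
print(gate_depths(circuit))  # [0, 0, 1, 2]
```

The result matches the "Depth 0, Depth 1, Depth 2" layering that survives in the figure text: g1 and g2 are independent, g3 waits for g1, and g4 waits for g3.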
[11] The encoding matrix $M$ is computed (right). For each cell, it is encoded whether it is occupied by a qubit. Next, for depths in $\{0, \ldots, k_\text{lookahead} - 1\}$, it is checked if a gate at that depth exists with the qubit of the current cell. If it exists, the cell of the other operand is encoded. Otherwise, an empty token ⋄ is encoded. Gates at a depth of k lookahea...
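The encoding steps described above can be sketched as follows. This is an illustrative reading of the truncated text, not the authors' implementation: the function `encode`, the use of `None` for the empty token ⋄, the 0-based cell indices, and the example chip layout are assumptions, and gates at depth `k_lookahead` or deeper are simply ignored here because the source sentence about them is cut off.

```python
from typing import List, Optional, Tuple

EMPTY = None  # stands in for the empty token


def encode(cells: List[Optional[int]], gates: List[Tuple[int, int]],
           depths: List[int], k_lookahead: int) -> List[tuple]:
    """One row per cell: (occupied flag, partner cell at depth 0, ..., depth k-1)."""
    cell_of = {q: c for c, q in enumerate(cells) if q is not None}
    rows = []
    for qubit in cells:
        if qubit is None:
            rows.append((0,) + (EMPTY,) * k_lookahead)
            continue
        partners = [EMPTY] * k_lookahead
        for (a, b), d in zip(gates, depths):
            if d < k_lookahead and qubit in (a, b):
                other = b if qubit == a else a
                partners[d] = cell_of[other]  # cell of the other operand
        rows.append((1, *partners))
    return rows


# Hypothetical table K: cell index -> qubit label (None marks an empty cell).
cells = [1, 3, None, 2, 5, 4]
gates = [(1, 3), (2, 4), (1, 5), (1, 3)]  # circuit as read from Figure 2
depths = [0, 0, 1, 2]                     # dependency-graph depth of each gate
M = encode(cells, gates, depths, k_lookahead=2)
for row in M:
    print(row)
```

Because two gates that share a qubit always sit at different depths, each (cell, depth) slot receives at most one partner cell, so no entries are overwritten.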
[12] Shaped reward. The basic reward signal for a goal-reaching problem is very sparse, as the agent receives a negative reward at a constant rate $c_r$ until a goal state is reached. If the problem only terminates upon reaching a goal state and the agent has not encountered any goal states yet, the value of every state must be estimated as $V = -c_r \int_0^\infty e^{-\beta t}\,dt$ = −c...
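For reference, the integral in the shaped-reward passage has a simple closed form. The worked step below assumes $\beta > 0$ is a continuous-time discount rate and $c_r > 0$ the constant cost rate; the truncated source text does not show the remainder of the authors' derivation.

```latex
\begin{aligned}
V &= -c_r \int_0^\infty e^{-\beta t}\,dt
   = -c_r \left[ -\frac{e^{-\beta t}}{\beta} \right]_0^\infty
   = -\frac{c_r}{\beta}.
\end{aligned}
```

Under these assumptions every state from which no goal has yet been reached gets the same value estimate, so the agent sees no difference between states that are close to a goal and states that are far from it. This is the sparsity the passage describes.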
[13] Problem generation during training. When training the RL agent, a diverse range of starting states is desirable, such that the entire possible problem space is well covered. A starting state is generated by first drawing the number of ions or qubits on the chip: $z \sim \mathrm{Uniform}(\{2, \ldots, n_\text{max}\})$. Here, $n_\text{max}$ is the maximum number of ions supported. The qubits a...
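The first step of the problem generation, drawing the ion count uniformly from {2, ..., n_max}, can be sketched as below. Everything after that draw is an assumption, since the source text is truncated: the uniformly random placement of qubits on cells, the random gate pairs, and the names `sample_start_state`, `n_cells`, and `n_gates` are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def sample_start_state(n_max: int, n_cells: int, n_gates: int,
                       rng: random.Random) -> Tuple[List[Optional[int]], List[Tuple[int, int]]]:
    """Draw a random starting chip state and circuit for one training episode."""
    if not 2 <= n_max <= n_cells:
        raise ValueError("need 2 <= n_max <= n_cells")
    z = rng.randint(2, n_max)  # z ~ Uniform({2, ..., n_max}), both ends inclusive
    cells: List[Optional[int]] = [None] * n_cells
    # Assumed: qubits are placed on distinct cells chosen uniformly at random.
    for qubit, cell in enumerate(rng.sample(range(n_cells), z)):
        cells[cell] = qubit
    # Assumed: the circuit is a list of random pairs of distinct qubits.
    gates = [tuple(rng.sample(range(z), 2)) for _ in range(n_gates)]
    return cells, gates


rng = random.Random(0)  # fixed seed so the sketch is reproducible
cells, gates = sample_start_state(n_max=6, n_cells=10, n_gates=5, rng=rng)
print(cells, gates)
```

Passing the random generator in explicitly keeps episodes reproducible and lets parallel workers use independent seeds.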
[14] C. D. Bruzewicz, J. Chiaverini, R. McConnell, and J. M. Sage, Trapped-ion quantum computing: Progress and challenges, Applied Physics Reviews 6, 021314 (2019)

[15] J. I. Cirac and P. Zoller, Quantum computations with cold trapped ions, Physical Review Letters 74, 4091 (1995)

[16] A. Sørensen and K. Mølmer, Quantum Computation with Ions in Thermal Motion, Physical Review Letters 82, 1971 (1999)

[17] G. Zarantonello, H. Hahn, J. Morgner, M. Schulte, A. Bautista-Salvador, R. F. Werner, K. Hammerer, and C. Ospelkaus, Robust and Resource-Efficient Microwave Near-Field Entangling ⁹Be⁺ Gate, Physical Review Letters 123, 260503 (2019)

[18] D. Kielpinski, C. Monroe, and D. J. Wineland, Architecture for a large-scale ion-trap quantum computer, Nature 417, 709 (2002)

[19] J. M. Pino, J. M. Dreiling, C. Figgatt, J. P. Gaebler, S. A. Moses, M. S. Allman, C. H. Baldwin, M. Foss-Feig, D. Hayes, K. Mayer, C. Ryan-Anderson, and B. Neyenhuis, Demonstration of the trapped-ion quantum CCD computer architecture, Nature 592, 209 (2021)

[20] S. A. Moses, C. H. Baldwin, M. S. Allman, R. Ancona, L. Ascarrunz, C. Barnes, J. Bartolotta, B. Bjork, P. Blanchard, M. Bohn, J. G. Bohnet, N. C. Brown, N. Q. Burdick, W. C. Burton, S. L. Campbell, J. P. Campora, C. Carron, J. Chambers, J. W. Chan, Y. H. Chen, A. Chernoguzov, E. Chertkov, J. Colina, J. P. Curtis, R. Daniel, M. DeCross, D. Deen, C. Delan...

[21] J. Durandau, J. Wagner, F. Mailhot, C.-A. Brunet, F. Schmidt-Kaler, U. Poschinger, and Y. Bérubé-Lauzière, Automated Generation of Shuttling Sequences for a Linear Segmented Ion Trap Quantum Computer, Quantum 7, 1175 (2023)

[22] A. Ransford, M. S. Allman, J. Arkinstall, J. P. Campora, S. F. Cooper, R. D. Delaney, J. M. Dreiling, B. Estey, C. Figgatt, A. Hall, A. A. Husain, A. Isanaka, C. J. Kennedy, N. Kotibhaskar, I. S. Madjarov, K. Mayer, A. R. Milne, A. J. Park, A. P. Reed, R. Ancona, M. P. Andersen, P. Andres-Martinez, W. Angenent, L. Argueta, B. Arkin, L. Ascarrunz, W. Bak... [title: Helios: A 98-qubit trapped-ion quantum computer]

[23] T. Schmale, B. Temesi, A. Baishya, N. Pulido-Mateo, L. Krinner, T. Dubielzig, C. Ospelkaus, H. Weimer, and D. Borcherding, Backend compiler phases for trapped-ion quantum computers, in 2022 IEEE International Conference on Quantum Software (QSW) (2022) pp. 32–37

[24] A. A. Saki, R. O. Topaloglu, and S. Ghosh, Muzzle the Shuttle: Efficient Compilation for Multi-Trap Trapped-Ion Quantum Computers, in 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE) (2022) pp. 322–327

[25] P. Murali, D. M. Debroy, K. R. Brown, and M. Martonosi, Architecting Noisy Intermediate-Scale Trapped Ion Quantum Computers, in 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA) (2020) pp. 529–542
[26] X. Wu, C. Zhu, J. Wang, and X. Wang, MUSS-TI: Multi-level Shuttle Scheduling for Large-Scale Entanglement Module Linked Trapped-Ion (2025), arXiv:2509.25988 [quant-ph]

[27] D. Schoenberger, S. Hillmich, M. Brandl, and R. Wille, Shuttling for Scalable Trapped-Ion Quantum Computers, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 44, 2144 (2025)

[28] W. Dai, K. A. Brown, and T. G. Robertazzi, Advanced Shuttle Strategies for Parallel QCCD Architectures, IEEE Transactions on Quantum Engineering 5, 1 (2024)

[29] Quantum Valley Lower Saxony e. V., https://qvls.de/en/ (2026), accessed: 2026-05-20

[30] R. Wille, L. Berent, T. Forster, J. Kunasaikaran, K. Mato, T. Peham, N. Quetschlich, D. Rovara, A. Sander, L. Schmid, D. Schoenberger, Y. Stade, and L. Burgholzer, The MQT handbook: A summary of design automation tools and software for quantum computing, in IEEE International Conference on Quantum Software (QSW) (2024), arXiv:2405.17543

[31] A. W. Cross, L. S. Bishop, S. Sheldon, P. D. Nation, and J. M. Gambetta, Validating quantum computers using randomized model circuits, Physical Review A 100, 032328 (2019)

[32] Quantum Valley Lower Saxony Q1 - Forschung, www.qvls-q1.de/forschung (2026), accessed: 2026-05-20

[33] D. Schoenberger, S. Hillmich, M. Brandl, and R. Wille, Using Boolean Satisfiability for Exact Shuttling in Trapped-Ion Quantum Computers, in 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC) (2024) pp. 127–133

[34] R. B. Blakestad, Transport of Trapped-Ion Qubits within a Scalable Quantum Processor, Ph.D. thesis, University of Colorado (2010)

[35] D. Hucul, M. Yeo, W. K. Hensinger, J. Rabchuk, S. Olmschenk, and C. Monroe, On the Transport of Atomic Ions in Linear and Multidimensional Ion Trap Arrays (2008), arXiv:quant-ph/0702175

[36] F. Ungerechts, R. Munoz, A. Hoffmann, J. Bätge, M. M. Billah, T. Meiners, B. Kaune, G. Zarantonello, and C. Ospelkaus, Designing a Trapped-Ion Quantum Processor based on Near-Field Microwave Quantum Logic Gates (2026), to be published

[37] F. Ungerechts, J. Bätge, M. M. Billah, L. Krieger, R. Munoz, P. Nuschke, A. Hoffmann, G. Zarantonello, and C. Ospelkaus, CIRQLE: A Comprehensive Register-Based Trapped-Ion Quantum Processor with Near-Field Microwave Control (2026), to be published
[38] R. Bowler, J. Gaebler, Y. Lin, T. R. Tan, D. Hanneke, J. D. Jost, J. P. Home, D. Leibfried, and D. J. Wineland, Coherent Diabatic Ion Transport and Separation in a Multizone Trap Array, Physical Review Letters 109, 080502 (2012)

[39] A. Walther, F. Ziesel, T. Ruster, S. T. Dawkins, K. Ott, M. Hettrich, K. Singer, F. Schmidt-Kaler, and U. Poschinger, Controlling Fast Transport of Cold Trapped Ions, Physical Review Letters 109, 080501 (2012)

[40] X.-J. Lu, A. Ruschhaupt, and J. G. Muga, Fast shuttling of a particle under weak spring-constant noise of the moving trap, Physical Review A 97, 053402 (2018)

[41] V. Kaushal, B. Lekitsch, A. Stahl, J. Hilder, D. Pijn, C. Schmiegelow, A. Bermudez, M. Müller, F. Schmidt-Kaler, and U. Poschinger, Shuttling-based trapped-ion quantum information processing, AVS Quantum Science 2, 014101 (2020)

[42] D. Schoenberger and R. Wille, Orchestrating Multi-Zone Shuttling in Trapped-Ion Quantum Computers, in 2025 IEEE International Conference on Quantum Computing and Engineering (QCE), Vol. 01 (2025) pp. 1069–1075

[43] D. Schoenberger, J. Hilder, F. Schmidt-Kaler, and R. Wille, Shuttling for Trapped-Ion Quantum Computers with Embedded Processing Zones, in 2025 IEEE International Conference on Quantum Software (QSW) (2025) pp. 123–129

[44] T. Schmale, Hybrid quantum-classical computation – from infrastructure to algorithms, Institutionelles Repositorium der Leibniz Universität Hannover, doi:10.15488/20338 (2026)

[45] M. Sipser, Introduction to the theory of computation, ACM Sigact News 27, 27 (1996)

[46] M. Bukov and F. Marquardt, Reinforcement Learning for Quantum Technology (2026), arXiv:2601.18953 [quant-ph]

[47] M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming (John Wiley & Sons, 2014)

[48] R. S. Sutton and A. G. Barto, Reinforcement learning: an introduction, 2nd ed., edited by F. Bach (MIT Press, 2018)

[49] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, Advances in neural information processing systems 12 (1999)
[50] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, Proximal policy optimization algorithms, arXiv preprint arXiv:1707.06347 (2017)

[51] J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, High-dimensional continuous control using generalized advantage estimation, arXiv preprint arXiv:1506.02438 (2015)

[52] R. Givan, T. Dean, and M. Greig, Equivalence notions and model minimization in Markov decision processes, Artificial Intelligence 147, 163 (2003)

[53] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need, Advances in neural information processing systems 30 (2017)

[54] L. Espeholt, H. Soyer, R. Munos, K. Simonyan, V. Mnih, T. Ward, Y. Doron, V. Firoiu, T. Harley, I. Dunning, et al., Impala: Scalable distributed deep-RL with importance weighted actor-learner architectures, in International conference on machine learning (PMLR, 2018) pp. 1407–1416

[55] H. Lee, D. Hwang, D. Kim, H. Kim, J. J. Tai, K. Subramanian, P. R. Wurman, J. Choo, P. Stone, and T. Seno, Simba: Simplicity bias for scaling up parameters in deep reinforcement learning, in 13th International Conference on Learning Representations, ICLR 2025 (International Conference on Learning Representations, ICLR, 2025) pp. 50050–50082

[56] A. Y. Ng, D. Harada, and S. Russell, Policy invariance under reward transformations: Theory and application to reward shaping, in ICML, Vol. 99 (1999) pp. 278–287

[57] See Supplemental Material at [URL will be inserted by publisher] for all results on MQT circuits and animations of some shuttling sequences

[58] L. De Moura and N. Bjørner, Z3: An efficient SMT solver, in International conference on Tools and Algorithms for the Construction and Analysis of Systems (Springer, 2008) pp. 337–340

Supplementary Material: Reinforcement learning for ion shuttling on trapped-ion quantum computers. Maximilian Schier* and Bodo Rosenhahn, Institute for Information Processin...

[1] [1]

It consists of four registers connected by an X-junction, we refer to it as the “X-chip”

Example architecture 1: X-chip Our first example architecture is the QVLS QROSS chip [23], the first proposal for a QCCD developed by Quantum Valley Lower Saxony (QVLS) [16]. It consists of four registers connected by an X-junction, we refer to it as the “X-chip”. The registers include a compute zone that can hold up to 2 ions, a state preparation and mea...

work page

[2] [2]

Here, the storage zone is consolidated in a ring

Example architecture 2: Q-chip We also study an alternative chip design, the QVLS CIRQLE chip [24], with a more compact storage register, allowing more ions to fit on the same chip size. Here, the storage zone is consolidated in a ring. The compute zone (capacity 2 ions) and the SPAM zone (capacity 1 ion) are connected to this ring via a junction, resulti...

work page

[3]

store ions that are used together in proximity. [10] also implement proximity sorting and use the readout zone as temporary storage to keep ions that are needed shortly afterwards easily accessible. Handling traffic blocks: Different heuristics are used to prevent an ion from being blocked because the path it has to take is occupied by other ions [13]. This is do...

[4]

propose a probabilistic formula that accounts for several heuristics, such as least movements and handling traffic blocks. With this formula, they can then determine whether an ion should move or stay in the same trap. The mentioned strategies are planning strategies; a shuttling protocol is developed for a given circuit in advance and then executed on the...

[5]

Reference 1: Heuristic compiler
In the framework of the QVLS X-chip, a shuttling compiler was developed to address the ion shuttling problem using heuristics derived from observations of the chip’s architecture [10]. The challenge of orchestrating ions across the chip to execute a given quantum circuit was therefore decomposed into several phases, two...

[6]

Reference 2: SAT solver
For benchmarking purposes, it is useful to compare obtained trajectories against optimal ones. While finding these optimal trajectories is likely infeasible in the general case, we can at least study small instances to gain some basic insight. In principle, a naive exhaustive search through all shuttling sequences of a fixed ...

[7]

Requirements for representations
When employing a neural network as a policy π, the state space S must usually be transformed to compatible representations, e.g. vectors, using some transformation. For simplicity, in a slight abuse of notation, we use S as the representation space directly. A good representation of the chip state and circuit to be executed shoul...

[8]

Proposed representation
The core idea of our proposed representation is abstracting the qubit label and sequence position of a two-qubit gate by encoding a gate through the cell location of the other operand and the depth of the gate in the dependency graph. This is illustrated in Figure 2 for a lookahead of k_lookahead = 2. The following steps are performed:

[9]

A chip-specific adapter translates the chip state (top left) into a tabular form K (columns “Cell” and “Qubit” on the right). In our case the adapter simply iterates over all zones, starting with the position next to the junction

[10]

If the circuit (bottom left) is given as a list of gates, the directed acyclic graph of the circuit is calculated...

[Figure 2: panels “Circuit of Two-Qubit Gates” (g1 on qubits 1 and 3, g2 on qubits 2 and 4, g3 on qubits 1 and 5, g4 on qubits 1 and 3), “Directed Circuit Graph” (depths 0 to 2), “Chip State” (Storage, Compute, Spam), “Adapter”, and “Encoding M”.]

[11]

The encoding matrix M is computed (right). For each cell, it is encoded whether it is occupied by a qubit. Next, for each depth in {0, …, k_lookahead − 1}, it is checked whether a gate at that depth acts on the qubit of the current cell. If so, the cell of the other operand is encoded; otherwise, an empty token ⋄ is encoded. Gates at a depth of k_lookahea...
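As a rough illustration of the encoding step described above, the following Python sketch builds one row per cell. It is not the authors' implementation: the function names `gate_depths` and `encode_cells`, the 0-based cell indices, the use of `None` for the empty token ⋄, and the choice to skip gates at depth k_lookahead or deeper are all assumptions made for this example. The gate list matches the four two-qubit gates of Figure 2; the six-cell layout is hypothetical.

```python
EMPTY = None  # assumption: stands in for the empty token ⋄


def gate_depths(gates):
    """Depth of each two-qubit gate in the circuit's dependency graph."""
    last_depth = {}  # qubit -> depth of the most recent gate acting on it
    depths = []
    for a, b in gates:
        d = max(last_depth.get(a, -1), last_depth.get(b, -1)) + 1
        last_depth[a] = last_depth[b] = d
        depths.append(d)
    return depths


def encode_cells(cells, gates, k_lookahead):
    """Build one row (occupied, partner cell at depth 0, 1, ...) per cell.

    cells: list indexed by cell; each entry is a qubit label or None.
    """
    cell_of = {q: c for c, q in enumerate(cells) if q is not None}
    depths = gate_depths(gates)
    rows = []
    for q in cells:
        row = [0 if q is None else 1] + [EMPTY] * k_lookahead
        if q is not None:
            for (a, b), d in zip(gates, depths):
                # Assumption: gates beyond the lookahead are ignored.
                if d < k_lookahead and q in (a, b):
                    other = b if q == a else a
                    row[1 + d] = cell_of[other]
        rows.append(tuple(row))
    return rows


gates = [(1, 3), (2, 4), (1, 5), (1, 3)]  # g1..g4 from Figure 2
cells = [1, 2, None, 3, 5, 4]             # hypothetical chip layout
M = encode_cells(cells, gates, k_lookahead=2)
```

Because each row refers to the partner's cell rather than its qubit label, relabeling the qubits consistently in both `cells` and `gates` leaves `M` unchanged in this sketch.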


[12]

Shaped reward
The basic reward signal for a goal-reaching problem is very sparse, as the agent receives a negative reward at a constant rate c_r until a goal state is reached. If the problem only terminates upon reaching a goal state and the agent has not encountered any goal states yet, the value of every state must be estimated as V = −c_r ∫_0^∞ e^{−βt} dt = −c...
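The integral above is the standard exponential integral, which evaluates to −c_r/β. The short Python sketch below checks this numerically and also shows the finite-time case, where the penalty stops once a goal is reached. The function names and the parameter `t_goal` are hypothetical and chosen only for this illustration; the sketch says nothing about the shaping term the authors actually use.

```python
import math


def penalty_value(c_r, beta, t_goal=math.inf):
    """Discounted return of a constant penalty rate c_r paid until t_goal.

    Closed form of -c_r * integral_0^t_goal exp(-beta * t) dt.
    """
    if math.isinf(t_goal):
        return -c_r / beta
    return -c_r * (1.0 - math.exp(-beta * t_goal)) / beta


def midpoint_check(c_r, beta, horizon=200.0, dt=1e-3):
    """Crude midpoint-rule approximation of the same integral up to `horizon`."""
    steps = int(horizon / dt)
    total = sum(math.exp(-beta * (i + 0.5) * dt) for i in range(steps))
    return -c_r * total * dt


v_never = penalty_value(1.0, 0.5)             # goal never reached
v_goal = penalty_value(1.0, 0.5, t_goal=3.0)  # goal reached at t = 3
```

Every state that has not yet led to a goal shares the same estimate `v_never`, which is why the unshaped signal carries so little information early in training.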


[13]

Problem generation during training
When training the RL agent, a diverse range of starting states is desirable, such that the entire possible problem space is well covered. A starting state is generated by first drawing the number of ions or qubits on the chip: z ∼ Uniform({2, …, n_max}). Here, n_max is the maximum number of ions supported. The qubits a...

[14]

C. D. Bruzewicz, J. Chiaverini, R. McConnell, and J. M. Sage, Trapped-ion quantum computing: Progress and challenges, Applied Physics Reviews 6, 021314 (2019)

[15]

J. I. Cirac and P. Zoller, Quantum computations with cold trapped ions, Physical Review Letters 74, 4091 (1995)

[16]

A. Sørensen and K. Mølmer, Quantum Computation with Ions in Thermal Motion, Physical Review Letters 82, 1971 (1999)

[17]

G. Zarantonello, H. Hahn, J. Morgner, M. Schulte, A. Bautista-Salvador, R. F. Werner, K. Hammerer, and C. Ospelkaus, Robust and Resource-Efficient Microwave Near-Field Entangling ⁹Be⁺ Gate, Physical Review Letters 123, 260503 (2019)

[18]

D. Kielpinski, C. Monroe, and D. J. Wineland, Architecture for a large-scale ion-trap quantum computer, Nature 417, 709 (2002)

[19]

J. M. Pino, J. M. Dreiling, C. Figgatt, J. P. Gaebler, S. A. Moses, M. S. Allman, C. H. Baldwin, M. Foss-Feig, D. Hayes, K. Mayer, C. Ryan-Anderson, and B. Neyenhuis, Demonstration of the trapped-ion quantum CCD computer architecture, Nature 592, 209 (2021)

[20]

S. A. Moses, C. H. Baldwin, M. S. Allman, R. Ancona, L. Ascarrunz, C. Barnes, J. Bartolotta, B. Bjork, P. Blanchard, M. Bohn, J. G. Bohnet, N. C. Brown, N. Q. Burdick, W. C. Burton, S. L. Campbell, J. P. Campora, C. Carron, J. Chambers, J. W. Chan, Y. H. Chen, A. Chernoguzov, E. Chertkov, J. Colina, J. P. Curtis, R. Daniel, M. DeCross, D. Deen, C. Delan... (2023)

[21]

J. Durandau, J. Wagner, F. Mailhot, C.-A. Brunet, F. Schmidt-Kaler, U. Poschinger, and Y. Bérubé-Lauzière, Automated Generation of Shuttling Sequences for a Linear Segmented Ion Trap Quantum Computer, Quantum 7, 1175 (2023)

[22]

Helios: A 98-qubit trapped-ion quantum computer (arXiv, 2025)

A. Ransford, M. S. Allman, J. Arkinstall, J. P. Campora, S. F. Cooper, R. D. Delaney, J. M. Dreiling, B. Estey, C. Figgatt, A. Hall, A. A. Husain, A. Isanaka, C. J. Kennedy, N. Kotibhaskar, I. S. Madjarov, K. Mayer, A. R. Milne, A. J. Park, A. P. Reed, R. Ancona, M. P. Andersen, P. Andres-Martinez, W. Angenent, L. Argueta, B. Arkin, L. Ascarrunz, W. Bak...

[23]

T. Schmale, B. Temesi, A. Baishya, N. Pulido-Mateo, L. Krinner, T. Dubielzig, C. Ospelkaus, H. Weimer, and D. Borcherding, Backend compiler phases for trapped-ion quantum computers, in 2022 IEEE International Conference on Quantum Software (QSW) (2022) pp. 32–37

[24]

A. A. Saki, R. O. Topaloglu, and S. Ghosh, Muzzle the Shuttle: Efficient Compilation for Multi-Trap Trapped-Ion Quantum Computers, in 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE) (2022) pp. 322–327

[25]

P. Murali, D. M. Debroy, K. R. Brown, and M. Martonosi, Architecting Noisy Intermediate-Scale Trapped Ion Quantum Computers, in 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA) (2020) pp. 529–542

[26]

X. Wu, C. Zhu, J. Wang, and X. Wang, MUSS-TI: Multi-level Shuttle Scheduling for Large-Scale Entanglement Module Linked Trapped-Ion (2025), arXiv:2509.25988 [quant-ph]

[27]

D. Schoenberger, S. Hillmich, M. Brandl, and R. Wille, Shuttling for Scalable Trapped-Ion Quantum Computers, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 44, 2144 (2025)

[28]

W. Dai, K. A. Brown, and T. G. Robertazzi, Advanced Shuttle Strategies for Parallel QCCD Architectures, IEEE Transactions on Quantum Engineering 5, 1 (2024)

[29]

Quantum Valley Lower Saxony e. V., https://qvls.de/en/ (2026), accessed: 2026-05-20

[30]

R. Wille, L. Berent, T. Forster, J. Kunasaikaran, K. Mato, T. Peham, N. Quetschlich, D. Rovara, A. Sander, L. Schmid, D. Schoenberger, Y. Stade, and L. Burgholzer, The MQT handbook: A summary of design automation tools and software for quantum computing, in IEEE International Conference on Quantum Software (QSW) (2024), arXiv:2405.17543

[31]

A. W. Cross, L. S. Bishop, S. Sheldon, P. D. Nation, and J. M. Gambetta, Validating quantum computers using randomized model circuits, Physical Review A 100, 032328 (2019)

[32]

Quantum Valley Lower Saxony Q1 - Forschung, www.qvls-q1.de/forschung (2026), accessed: 2026-05-20

[33]

D. Schoenberger, S. Hillmich, M. Brandl, and R. Wille, Using Boolean Satisfiability for Exact Shuttling in Trapped-Ion Quantum Computers, in 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC) (2024) pp. 127–133

[34]

R. B. Blakestad, Transport of Trapped-Ion Qubits within a Scalable Quantum Processor, Ph.D. thesis, University of Colorado (2010)

[35]

D. Hucul, M. Yeo, W. K. Hensinger, J. Rabchuk, S. Olmschenk, and C. Monroe, On the Transport of Atomic Ions in Linear and Multidimensional Ion Trap Arrays (2008), arXiv:quant-ph/0702175

[36]

F. Ungerechts, R. Munoz, A. Hoffmann, J. Bätge, M. M. Billah, T. Meiners, B. Kaune, G. Zarantonello, and C. Ospelkaus, Designing a Trapped-Ion Quantum Processor based on Near-Field Microwave Quantum Logic Gates (2026), to be published

[37]

F. Ungerechts, J. Bätge, M. M. Billah, L. Krieger, R. Munoz, P. Nuschke, A. Hoffmann, G. Zarantonello, and C. Ospelkaus, CIRQLE: A Comprehensive Register-Based Trapped-Ion Quantum Processor with Near-Field Microwave Control (2026), to be published

[38]

R. Bowler, J. Gaebler, Y. Lin, T. R. Tan, D. Hanneke, J. D. Jost, J. P. Home, D. Leibfried, and D. J. Wineland, Coherent Diabatic Ion Transport and Separation in a Multizone Trap Array, Physical Review Letters 109, 080502 (2012)

[39]

A. Walther, F. Ziesel, T. Ruster, S. T. Dawkins, K. Ott, M. Hettrich, K. Singer, F. Schmidt-Kaler, and U. Poschinger, Controlling Fast Transport of Cold Trapped Ions, Physical Review Letters 109, 080501 (2012)

[40]

X.-J. Lu, A. Ruschhaupt, and J. G. Muga, Fast shuttling of a particle under weak spring-constant noise of the moving trap, Physical Review A 97, 053402 (2018)

[41]

V. Kaushal, B. Lekitsch, A. Stahl, J. Hilder, D. Pijn, C. Schmiegelow, A. Bermudez, M. Müller, F. Schmidt-Kaler, and U. Poschinger, Shuttling-based trapped-ion quantum information processing, AVS Quantum Science 2, 014101 (2020)

[42]

D. Schoenberger and R. Wille, Orchestrating Multi-Zone Shuttling in Trapped-Ion Quantum Computers, in 2025 IEEE International Conference on Quantum Computing and Engineering (QCE), Vol. 01 (2025) pp. 1069–1075

[43]

D. Schoenberger, J. Hilder, F. Schmidt-Kaler, and R. Wille, Shuttling for Trapped-Ion Quantum Computers with Embedded Processing Zones, in 2025 IEEE International Conference on Quantum Software (QSW) (2025) pp. 123–129

[44]

T. Schmale, Hybrid quantum-classical computation – from infrastructure to algorithms, Institutionelles Repositorium der Leibniz Universität Hannover, doi:10.15488/20338 (2026)

[45]

M. Sipser, Introduction to the theory of computation, ACM Sigact News 27, 27 (1996)

[46]

M. Bukov and F. Marquardt, Reinforcement Learning for Quantum Technology (2026), arXiv:2601.18953 [quant-ph]

[47]

M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming (John Wiley & Sons, 2014)

[48]

R. S. Sutton and A. G. Barto, Reinforcement learning: an introduction, 2nd ed., edited by F. Bach (MIT Press, 2018)

[49]

R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, Advances in Neural Information Processing Systems 12 (1999)

[50]

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, Proximal policy optimization algorithms, arXiv preprint arXiv:1707.06347 (2017)

[51]

J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, High-dimensional continuous control using generalized advantage estimation, arXiv preprint arXiv:1506.02438 (2015)

[52]

R. Givan, T. Dean, and M. Greig, Equivalence notions and model minimization in Markov decision processes, Artificial Intelligence 147, 163 (2003)

[53]

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30 (2017)

[54]

L. Espeholt, H. Soyer, R. Munos, K. Simonyan, V. Mnih, T. Ward, Y. Doron, V. Firoiu, T. Harley, I. Dunning, et al., Impala: Scalable distributed deep-RL with importance weighted actor-learner architectures, in International Conference on Machine Learning (PMLR, 2018) pp. 1407–1416

[55]

H. Lee, D. Hwang, D. Kim, H. Kim, J. J. Tai, K. Subramanian, P. R. Wurman, J. Choo, P. Stone, and T. Seno, Simba: Simplicity bias for scaling up parameters in deep reinforcement learning, in 13th International Conference on Learning Representations, ICLR 2025 (International Conference on Learning Representations, ICLR, 2025) pp. 50050–50082

[56]

A. Y. Ng, D. Harada, and S. Russell, Policy invariance under reward transformations: Theory and application to reward shaping, in ICML, Vol. 99 (1999) pp. 278–287

[57]

See Supplemental Material at [URL will be inserted by publisher] for all results on MQT circuits and animations of some shuttling sequences

[58]

L. De Moura and N. Bjørner, Z3: An efficient SMT solver, in International Conference on Tools and Algorithms for the Construction and Analysis of Systems (Springer, 2008) pp. 337–340.