Molecular Quantum Control Algorithm Design by Reinforcement Learning

Anastasia Pipi; Arianna Wu; David R. Leibrandt; Prineha Narang; Xuecheng Tao

arXiv: 2410.11839 · v4 · submitted 2024-10-15 · quant-ph · physics.atom-ph · physics.chem-ph · physics.optics


Pith reviewed 2026-05-23 18:41 UTC · model grok-4.3

classification quant-ph · physics.atom-ph · physics.chem-ph · physics.optics

keywords reinforcement learning · quantum control · molecular ions · quantum logic spectroscopy · polyatomic molecules · state preparation · H3O+


The pith

Reinforcement learning designs pulse sequences to prepare polyatomic molecular ions in single pure quantum states

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces RL-QLS, where a reinforcement learning agent optimizes sequences of laser pulses each followed by a projective measurement to drive the collapse of a molecular ion's quantum state to a single pure state. This is numerically shown to work for H3O+ despite 130 thermally populated eigenstates and degenerate transitions inside inversion doublets, and for CaH+ when thermal radiation is present. The agent is trained inside a quantum Markov decision process whose reward function encodes the physics of the measurements and transitions. If the approach holds, it removes the need for hand-crafted control sequences for complex polyatomics whose rovibrational structure has so far blocked precision experiments.

Core claim

RL-QLS lets a reinforcement-learning agent find effective sequences of pulses and projective measurements that probabilistically collapse a molecular ion into a single pure quantum state, using a quantum Markov decision process model and a physics-informed reward function; the method succeeds numerically for H3O+ with its 130 states and degeneracies and for CaH+ under thermal radiation.

What carries the argument

Reinforcement learning agent that selects pulse sequences followed by projective measurements inside a quantum Markov decision process whose reward function encodes state purity and transition physics
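The pulse-then-measure step that the agent chooses can be illustrated with a deliberately simplified toy model. The sketch below assumes a classical belief state (a list of populations over eigenstates, with coherences ignored) and a heralded transfer from a source level to a destination level; the function names, the `transfer_prob` parameter, and the update rule are illustrative assumptions, not the paper's QMDP.

```python
import random


def apply_pulse_and_measure(populations, src, dst, transfer_prob=1.0, rng=random):
    """One toy control step on a diagonal (classical) belief state.

    Assumption: a pulse attempts src -> dst and the measurement "clicks"
    only if the transfer happened. A click collapses the state onto dst;
    no click triggers a Bayesian update of the remaining populations.
    """
    p_click = populations[src] * transfer_prob
    if rng.random() < p_click:
        collapsed = [0.0] * len(populations)
        collapsed[dst] = 1.0
        return collapsed, True
    # No click: remove the probability mass that would have produced a click.
    updated = list(populations)
    updated[src] = populations[src] * (1.0 - transfer_prob)
    norm = 1.0 - p_click
    return [x / norm for x in updated], False


def purity(populations):
    """Purity of a diagonal state: the sum of squared populations."""
    return sum(x * x for x in populations)
```

In this toy picture a purity of 1 means the belief has collapsed onto a single eigenstate, which is the stopping condition a purity threshold would test for.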

If this is right

Control sequences become feasible for polyatomic ions whose dense state manifolds previously resisted manual design.
Single pure quantum states can be reached even when many transitions are degenerate within inversion doublets.
The same modeling framework can incorporate quantum-chemistry calculations of energies and transition moments.
The resulting sequences are directly implementable in existing trapped-ion quantum-logic spectroscopy setups.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could be applied to other polyatomic species targeted for tests of symmetry violation or dark-matter searches.
Similar RL agents might automate control design for other high-dimensional quantum systems such as larger molecules or many-body ensembles.
Experimental validation would demonstrate a hybrid quantum-classical loop for state preparation that scales beyond current manual methods.

Load-bearing premise

The quantum Markov decision process model together with its physics-informed reward function correctly reproduces the actual dynamics of pulses, projective measurements, degenerate transitions, and environmental noise.

What would settle it

Apply the RL-derived pulse sequence to a trapped H3O+ ion in the lab and measure whether the final population occupies a single eigenstate to high probability.
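Deciding whether the final population sits in one eigenstate "to high probability" comes down to estimating a success fraction from repeated trials. A minimal sketch, assuming each trial is scored as a binary success or failure and using the standard Wilson score interval; the function name and the default `z` value are choices made here, not anything specified by the paper.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success fraction.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4 * trials * trials)
    ) / denom
    return center - half, center + half
```

For example, `wilson_interval(95, 100)` gives an interval that brackets the observed fraction 0.95 and narrows as the number of trials grows.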

Figures

Figures reproduced from arXiv: 2410.11839 by Anastasia Pipi, Arianna Wu, David R. Leibrandt, Prineha Narang, Xuecheng Tao.

**Figure 2.** [figure not reproduced]

**Figure 3.** [figure not reproduced]

**Figure 4.** [figure not reproduced]

Original abstract

Precision measurements of molecules offer an unparalleled paradigm to probe physics beyond the Standard Model. The rich internal structure within these molecules makes them exquisite sensors for detecting fundamental symmetry violations, local position invariance, and dark matter. While trapping and control of diatomic and a few very simple polyatomic molecules have been experimentally demonstrated, leveraging the complex rovibrational structure of more general polyatomics demands the development of robust and efficient quantum control schemes. In this study, we present reinforcement-learning quantum-logic spectroscopy (RL-QLS), a general, reinforcement-learning-designed, quantum logic approach to prepare molecular ions in single, pure quantum states. The reinforcement learning agent optimizes the pulse sequence, each followed by a projective measurement, and probabilistically manipulates the collapse of the quantum system to a single state. The performance of the control algorithm is numerically demonstrated for the polyatomic molecule H$_3$O$^+$ with 130 thermally populated eigenstates and degenerate transitions within inversion doublets, where quantum Markov decision process modeling and a physics-informed reward function play a key role, as well as for CaH$^+$ under the disturbance of environmental thermal radiation. The developed theoretical framework cohesively integrates techniques from quantum chemistry, AMO physics, and artificial intelligence, and we expect that the results can be readily implemented for quantum control of polyatomic molecular ions with densely populated structures, thereby enabling new experimental tests of fundamental theories.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

RL-QLS gets numerical sequences for H3O+ state prep but the QMDP model's match to real projective measurements and degeneracies is untested.


The paper's main result is that a reinforcement learning agent can design pulse-plus-measurement sequences to collapse a 130-state molecular ion like H3O+ into a single pure state, even with inversion-doublet degeneracies, and can do the same for CaH+ when thermal radiation is added. The framework uses a quantum Markov decision process plus a physics-informed reward to score the probabilistic outcomes of each projective measurement. That is a concrete, domain-specific application rather than a generic RL claim. The two numerical cases are the clearest evidence that the agent finds workable sequences inside the model. The integration of quantum chemistry inputs with the RL loop is also handled cleanly enough to be usable by someone already working on molecular ions.

The soft spot is exactly the one the stress-test note flags: all performance numbers come from running inside the QMDP. How the model resolves multiple degenerate transitions with one measurement, or how thermal coupling populates the manifold, is not checked against lab data or even against an independent simulation. Without that check, or without baselines and error bars on the success rates, it is hard to know how much of the reported performance is real versus an artifact of the reward function and simulation choices. The abstract does not mention shipped code either, so independent reproduction is not straightforward.

This paper is for people already doing quantum control or precision measurements with polyatomic ions. A reader who needs a new tool for dense level structures would find the framework worth looking at. It deserves a serious referee because the target problem is genuine and the RL extension is a legitimate one, even if the validation stays simulation-only for now. I would send it out for review.

Referee Report

2 major / 1 minor

Summary. The paper introduces reinforcement-learning quantum-logic spectroscopy (RL-QLS), a method that uses reinforcement learning to optimize sequences of control pulses followed by projective measurements for preparing molecular ions in single pure quantum states. It models the problem via a quantum Markov decision process with a physics-informed reward function and provides numerical demonstrations for H3O+ (130 thermally populated eigenstates with degenerate inversion-doublet transitions) and CaH+ (under thermal radiation disturbance).

Significance. If the underlying model accurately reproduces laboratory dynamics, the approach could offer a scalable route to quantum control of complex polyatomics, enabling new precision measurements. The integration of RL with quantum-logic spectroscopy is a coherent combination of techniques from quantum chemistry, AMO physics, and AI. However, the entirely in silico results generated inside the proposed QMDP model, without external benchmarks or code, reduce the immediate assessed significance.

major comments (2)

[Abstract / Numerical demonstrations] Abstract and numerical demonstrations: the central claim of a 'general approach' for systems with 130 states and degenerate transitions rests on unshown implementation specifics, including error bars, convergence criteria, comparison baselines, and handling of post-hoc choices in RL training; without these the reported performance cannot be evaluated.
[Quantum Markov decision process modeling] Quantum Markov decision process modeling: the claim that the QMDP plus physics-informed reward accurately captures projective collapse, pulse-driven transitions, and environmental noise for degenerate transitions is load-bearing for transferability, yet all results are generated inside this model with no validation against independent dynamics or real-device behavior.

minor comments (1)

The abstract would be clearer if it explicitly stated the success metric (e.g., probability of reaching the target state) and the number of training episodes or convergence threshold used.
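The reporting items the referee asks for (a convergence criterion and error bars over independent runs) are straightforward to state in code. A minimal sketch, assuming per-episode rewards are stored in a list and that each training seed yields one final success rate; the window size, tolerance, and function names are hypothetical choices, not values taken from the paper.

```python
import math
from statistics import mean, stdev


def has_converged(rewards, window=50, tol=1e-2):
    """True when the mean reward of the last two windows differs by less than tol."""
    if len(rewards) < 2 * window:
        return False
    recent = mean(rewards[-window:])
    previous = mean(rewards[-2 * window:-window])
    return abs(recent - previous) < tol


def mean_and_stderr(values):
    """Mean and standard error of the mean across independent training seeds."""
    m = mean(values)
    se = stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
    return m, se
```

Reporting the pair returned by `mean_and_stderr` for each configuration, together with the episode at which `has_converged` first returns true, would cover both requests in one table.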

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below with clarifications and indicate where revisions will be made to strengthen the presentation.


Referee: [Abstract / Numerical demonstrations] Abstract and numerical demonstrations: the central claim of a 'general approach' for systems with 130 states and degenerate transitions rests on unshown implementation specifics, including error bars, convergence criteria, comparison baselines, and handling of post-hoc choices in RL training; without these the reported performance cannot be evaluated.

Authors: We agree that additional implementation details are required for rigorous evaluation of the numerical results. The full manuscript describes the RL agent, QMDP formulation, and reward function in the Methods and supplementary sections, but we will revise the main text and figures to explicitly report: (i) error bars obtained from multiple independent training runs with different random seeds, (ii) convergence criteria (e.g., stabilization of average cumulative reward over a sliding window of episodes together with the maximum episode count), (iii) performance baselines including random pulse sequences and non-physics-informed RL variants, and (iv) documentation of hyperparameter selection and any post-hoc analysis choices. These additions will be incorporated in the revised manuscript.
Revision: yes
Referee: [Quantum Markov decision process modeling] Quantum Markov decision process modeling: the claim that the QMDP plus physics-informed reward accurately captures projective collapse, pulse-driven transitions, and environmental noise for degenerate transitions is load-bearing for transferability, yet all results are generated inside this model with no validation against independent dynamics or real-device behavior.

Authors: The QMDP is constructed from standard quantum-optical models of coherent pulse driving, projective quantum-logic measurements, and thermal radiation noise, with transition rates and degeneracies taken from ab initio quantum-chemistry calculations and known spectroscopic constants for H3O+ and CaH+. The physics-informed reward explicitly accounts for the target-state fidelity while penalizing population leakage into degenerate inversion-doublet states. We acknowledge that all demonstrations remain within this model and that external validation against independent solvers or laboratory data is absent. In revision we will expand the Discussion section to articulate model assumptions, parameter sources, and expected discrepancies with real devices, thereby clarifying the scope of transferability claims. Real-device validation lies beyond the present theoretical study.
Revision: partial
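The reward described in the second response (target-state fidelity minus a penalty for leakage into doublet partners) can be written in a few lines. This is a hedged sketch of that idea only: the linear form, the `leak_weight` parameter, and the function name are assumptions made for illustration, and the paper's actual reward may differ.

```python
def shaped_reward(populations, target, doublet_partners, leak_weight=0.5):
    """Toy reward: population in the target state minus a weighted leakage term.

    populations      -- list of eigenstate populations (diagonal of the state)
    target           -- index of the state to prepare
    doublet_partners -- indices of degenerate partner states to penalize
    leak_weight      -- hypothetical penalty strength
    """
    fidelity = populations[target]
    leakage = sum(populations[k] for k in doublet_partners)
    return fidelity - leak_weight * leakage
```

Setting `leak_weight` to zero recovers a plain fidelity reward, which is one natural candidate for the "non-physics-informed" baseline the first response promises.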

standing simulated objections not resolved

Validation of the QMDP model against independent dynamics solvers or real experimental/device behavior, which is outside the scope of this purely numerical theoretical proposal.

Circularity Check

0 steps flagged

No significant circularity in RL-QLS numerical demonstration


The paper presents a reinforcement-learning method (RL-QLS) to optimize pulse sequences for preparing molecular ions in pure states, using a quantum Markov decision process model and physics-informed reward. Numerical results for H3O+ (130 states) and CaH+ are direct outputs of running the RL agent inside this explicitly defined model. No load-bearing step reduces by the paper's own equations or self-citation to its inputs by construction; the reward function and QMDP are model definitions, not fitted parameters renamed as predictions. The framework combines standard quantum mechanics, AMO techniques, and RL optimization without invoking uniqueness theorems or ansatzes from prior self-work. This is a self-contained simulation-based method paper whose claims are falsifiable against external laboratory benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach assumes standard quantum mechanics for state evolution and projective measurements, plus the validity of the quantum MDP formulation; no new entities are introduced.

axioms (2)

Domain assumption: Molecular energy levels and transition matrix elements can be accurately computed from quantum chemistry methods.
Required to define the state space and pulse effects for H3O+ and CaH+.
Domain assumption: Projective measurements can be modeled as instantaneous collapses in the quantum Markov decision process.
Central to the RL control loop described in the abstract.
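For reference, the second assumption corresponds to the standard textbook rule for projective measurement (the Lüders rule). The block below states that general rule in conventional notation; the symbols $\rho$ (state before measurement), $P_m$ (projector for outcome $m$), and $p_m$ (outcome probability) are generic and not taken from the paper.

```latex
% Projective measurement with outcome m and projector P_m (textbook form)
p_m = \operatorname{Tr}\!\left(P_m \rho\right), \qquad
\rho \;\longmapsto\; \rho_m = \frac{P_m \rho P_m}{p_m}, \qquad
\text{purity}(\rho_m) = \operatorname{Tr}\!\left(\rho_m^{2}\right)
```

A state is pure exactly when its purity equals 1, so a purity threshold close to 1 is a natural way to express "prepared in a single quantum state".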


[1] [1]

(b) Mean number of steps to prepare a pure molecular state for diﬀerent purity thresholds, with the eﬀective BBR temperatures at 10 and 100 K. Fig. 2 presents the usage of the RL-QLS approach189 for state preparation. For illustration purposes, ini-190 tially we consider only the J f 2 manifolds of CaH +191 to match the NIST experiments [ 18, 19, 21] (Fig...

work page

[2] [2]

Safronova, D

M. Safronova, D. Budker, D. DeMille, D. F. J. Kim-388 ball, A. Derevianko, and C. W. Clark, Search for389 new physics with atoms and molecules, Reviews of390 Modern Physics 90, 025008 (2018) .391

work page 2018

[3] [3]

DeMille, N

D. DeMille, N. R. Hutzler, A. M. Rey, and392 T. Zelevinsky, Quantum sensing and metrology for393 fundamental physics with molecules, Nature Physics394 20, 741 (2024) .395

work page 2024

[4] [4]

Kozlov and S

M. Kozlov and S. Levshakov, Sensitivity of the396 H3O+ inversion–rotational spectrum to changes in397 the electron-to-proton mass ratio, The Astrophysi-398 cal Journal 726, 65 (2010) .399

work page 2010

[5] [5]

Letokhov, On diﬀerence of energy levels of400 left and right molecules due to weak interactions,401 Physics Letters A 53, 275 (1975) .402

V. Letokhov, On diﬀerence of energy levels of400 left and right molecules due to weak interactions,401 Physics Letters A 53, 275 (1975) .402

work page 1975

[6] [6]

Quack, G

M. Quack, G. Seyfang, and G. Wichmann, Perspec-403 tives on parity violation in chiral molecules: the-404 ory, spectroscopic experiment and biomolecular ho-405 mochirality, Chemical Science 13, 10598 (2022) .406

work page 2022

[7] [7]

Landau, E

A. Landau, E. Eduardus, D. Behar, E. R. Wal-407 lach, L. F. Pašteka, S. Faraji, A. Borschevsky, and408 Y. Shagam, Chiral molecule candidates for trapped409 ion spectroscopy by ab initio calculations: From410 state preparation to parity violation, J. Chem. Phys.411 159, 114307 (2023) .412

work page 2023

[8] [8]

Mitra, K

D. Mitra, K. H. Leung, and T. Zelevinsky, Quan-413 tum control of molecules for fundamental physics,414 Physical Review A 105, 040101 (2022) .415

work page 2022

[9] [9]

Patterson, Method for preparation and readout416 of polyatomic molecules in single quantum states,417 Physical Review A 97, 033403 (2018) .418

D. Patterson, Method for preparation and readout416 of polyatomic molecules in single quantum states,417 Physical Review A 97, 033403 (2018) .418

work page 2018

[10] [10]

E. R. Hudson, Sympathetic cooling of molecular ions419 with ultracold atoms, EPJ Techniques and Instru-420 mentation 3, 1 (2016) .421

work page 2016

[11] [11]

McCarron, M

D. McCarron, M. Steinecker, Y. Zhu, and D. De-422 Mille, Magnetic trapping of an ultracold gas of polar423 molecules, Phys. Rev. Lett. 121, 013202 (2018) .424

work page 2018

[12] [12]

Ospelkaus, K.-K

S. Ospelkaus, K.-K. Ni, G. Quéméner, B. Neyenhuis,425 D. Wang, M. H. G. de Miranda, J. L. Bohn, J. Ye,426 and D. S. Jin, Controlling the hyperﬁne state of427 rovibronic ground-state polar molecules, Phys. Rev.428 Lett. 104, 030402 (2010) .429

work page 2010

[13] [13]

B. L. Augenbraun, J. M. Doyle, T. Zelevinsky, and430 I. Kozyryev, Molecular asymmetry and optical cy-431 cling: laser cooling asymmetric top molecules, Phys.432 Rev. X 10, 031022 (2020) .433

work page 2020

[14] [14]

Y. Zeng, A. Jadbabaie, A. N. Patel, P. Yu, T. C.434 Steimle, and N. R. Hutzler, Optical cycling in poly-435 atomic molecules with complex hyperﬁne structure,436 Phys. Rev. A 108, 012813 (2023) .437

work page 2023

[15] [15]

C. E. Dickerson, A. N. Alexandrova, P. Narang,438 and J. P. Philbin, Single molecule superra-439 diance for optical cycling, arXiv:2310.01534440 10.48550/arXiv.2310.01534 (2023).441

work page doi:10.48550/arxiv.2310.01534 2023

[16] [16]

P. O. Schmidt, T. Rosenband, C. Langer, W. M.442 Itano, J. C. Bergquist, and D. J. Wineland, Spec-443 troscopy using quantum logic, Science 309, 749444 (2005).445

work page 2005

[17] [17]

Leibfried, Quantum state preparation and con-446 trol of single molecular ions, New Journal of Physics447 14, 023029 (2012) .448

D. Leibfried, Quantum state preparation and con-446 trol of single molecular ions, New Journal of Physics447 14, 023029 (2012) .448

work page 2012

[18] [18]

Ding and D

S. Ding and D. Matsukevich, Quantum logic for the449 control and manipulation of molecular ions using a450 frequency comb, New Journal of Physics 14, 023028451 (2012).452


[19] C.-w. Chou, C. Kurz, D. B. Hume, P. N. Plessow, D. R. Leibrandt, and D. Leibfried, Preparation and coherent manipulation of pure quantum states of a single molecular ion, Nature 545, 203 (2017).


[20] Y. Lin, D. R. Leibrandt, D. Leibfried, and C.-w. Chou, Quantum entanglement between an atom and a molecule, Nature 581, 273 (2020).


[21] C.-w. Chou, A. L. Collopy, C. Kurz, Y. Lin, M. E. Harding, P. N. Plessow, T. Fortier, S. Diddams, D. Leibfried, and D. R. Leibrandt, Frequency-comb spectroscopy on pure quantum states of a single molecular ion, Science 367, 1458 (2020).


[22] Y. Liu, J. Schmidt, Z. Liu, D. R. Leibrandt, D. Leibfried, and C.-w. Chou, Quantum state tracking and control of a single molecular ion in a thermal environment, Science 385, 790 (2024).


[23] D. Holzapfel, F. Schmid, N. Schwegler, O. Stadler, M. Stadler, A. Ferk, J. P. Home, and D. Kienzler, Quantum control of a single H2+ molecular ion, arXiv:2409.06495, doi:10.48550/arXiv.2409.06495 (2024).


[24] M. Sinhal, Z. Meir, K. Najafian, G. Hegi, and S. Willitsch, Quantum-nondemolition state detection and spectroscopy of single trapped molecules, Science 367, 1213 (2020).


[25] F. Wolf, Y. Wan, J. C. Heip, F. Gebert, C. Shi, and P. O. Schmidt, Non-destructive state detection for quantum logic spectroscopy of molecular ions, Nature 530, 457 (2016).


[26] X.-M. Zhang, Z. Wei, R. Asad, X.-C. Yang, and X. Wang, When does reinforcement learning stand out in quantum control? A comparative study on state preparation, npj Quantum Information 5, 85 (2019).


[27] Z. An, H.-J. Song, Q.-K. He, and D. Zhou, Quantum optimal control of multilevel dissipative quantum systems with reinforcement learning, Physical Review A 103, 012404 (2021).


[28] J. Mackeprang, D. B. R. Dasari, and J. Wrachtrup, A reinforcement learning approach for quantum state engineering, Quantum Machine Intelligence 2, 1 (2020).


[29] I. Paparelle, L. Moro, and E. Prati, Digitally stimulated Raman passage by deep reinforcement learning, Physics Letters A 384, 126266 (2020).


[30] M. Y. Niu, S. Boixo, V. N. Smelyanskiy, and H. Neven, Universal quantum control through deep reinforcement learning, npj Quantum Information 5, 33 (2019).


[31] F. Preti, M. Schilling, S. Jerbi, L. M. Trenkwalder, H. P. Nautrup, F. Motzoi, and H. J. Briegel, Hybrid discrete-continuous compilation of trapped-ion quantum circuits with deep reinforcement learning, Quantum 8, 1343 (2024).


[32] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction (MIT Press, 2018).


[34] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., Human-level control through deep reinforcement learning, Nature 518, 529 (2015).


[35] C. J. Watkins, Learning from delayed rewards, Ph.D. thesis, King’s College, Cambridge, United Kingdom (1989).


[36] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., PyTorch: An imperative style, high-performance deep learning library, arXiv:1912.01703, doi:10.48550/arXiv.1912.01703 (2019).


[37] D. Chaffee, B. Margulis, A. Sheffield, J. Schmidt, A. Reisenfeld, D. R. Leibrandt, D. Leibfried, and C.-W. Chou, High-fidelity quantum state control of a polar molecular ion in a cryogenic environment, arXiv:2506.14740, doi:10.48550/arXiv.2506.14740 (2025).


[38] A. Wu et al., Prospects of local position invariance measurement with quantum logic spectroscopy of a hydronium ion, in preparation.


[39] J. Barry, D. T. Barry, and S. Aaronson, Quantum partially observable Markov decision processes, Phys. Rev. A 90, 032311 (2014).


[40] M. L. Littman, A. R. Cassandra, and L. P. Kaelbling, Learning policies for partially observable environments: Scaling up, in Machine Learning Proceedings (Elsevier, 1995) pp. 362–370.


[41] M. A. Nielsen and I. L. Chuang, Quantum computation and quantum information (Cambridge University Press, 2010).

Supplementary Material for “Molecular Quantum Control Algorithm Design by Reinforcement Learning”

Anastasia Pipi, Xuecheng Tao, Arianna Wu, Prineha Narang, and David R. Leibrandt. Contact author: xuechengtao@gmail.com. C...


[42] C.-w. Chou, C. Kurz, D. B. Hume, P. N. Plessow, D. R. Leibrandt, and D. Leibfried, Nature 545, 203 (2017).


[43] J. R. Johansson, P. D. Nation, and F. Nori, Computer Physics Communications 183, 1760 (2012).


[44] J. Johansson, P. Nation, and F. Nori, Computer Physics Communications 184, 1234 (2013).


[45] Y. Liu, J. Schmidt, Z. Liu, D. R. Leibrandt, D. Leibfried, and C.-w. Chou, Science 385, 790 (2024).


[46] A. L. Collopy, J. Schmidt, D. Leibfried, D. R. Leibrandt, and C.-W. Chou, Physical Review Letters 130, 223201 (2023).


[47] C. H. Townes and A. L. Schawlow, Microwave spectroscopy (McGraw-Hill Book Company, 1955).


[48] A. Wu et al., in preparation.


[49] A. Corney, Atomic and laser spectroscopy (Clarendon Press, 1977).


[50] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, ...


[51] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, arXiv preprint arXiv:1707.06347, doi:10.48550/arXiv.1707.06347 (2017).


[52] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, arXiv:1312.5602, doi:10.48550/arXiv.1312.5602 (2013).


[53] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., Nature 518, 529 (2015).


[54] C. J. Watkins, Learning from delayed rewards, Ph.D. thesis, King’s College, Cambridge, United Kingdom (1989).


[55] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., arXiv:1912.01703, doi:10.48550/arXiv.1912.01703 (2019).


[56] J. Huerta-Cepas, F. Serra, and P. Bork, Molecular Biology and Evolution 33, 1635 (2016).

Sec. SE. Supplementary Figures and Tables

[Figure: Rabi Oscillations J=1 (no damping). Axes: Time (ms), Probability. Legend: |1,1.5,+⟩, |1,0.5,+⟩, Experimental data.]
