Molecular Quantum Control Algorithm Design by Reinforcement Learning
Pith reviewed 2026-05-23 18:41 UTC · model grok-4.3
The pith
Reinforcement learning designs pulse sequences to prepare polyatomic molecular ions in single pure quantum states
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RL-QLS lets a reinforcement-learning agent find effective sequences of pulses, each followed by a projective measurement, that probabilistically collapse a molecular ion into a single pure quantum state. The approach relies on a quantum Markov decision process (QMDP) model and a physics-informed reward function. It succeeds numerically for H3O+, with 130 thermally populated eigenstates and degenerate transitions within inversion doublets, and for CaH+ under environmental thermal radiation.
What carries the argument
A reinforcement-learning agent that selects pulses, each followed by a projective measurement, inside a quantum Markov decision process whose reward function encodes state purity and transition physics.
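As a rough illustration of the pulse-then-measure loop, the sketch below treats the molecule as a classical probability distribution over eigenstates and updates it by Bayes' rule after each yes/no measurement outcome. It is a toy stand-in, not the paper's QMDP: the function names, the `signal_prob` vector, and the four-state example are all assumptions made for illustration.

```python
import numpy as np


def pulse_and_measure(populations, signal_prob, rng):
    """One control step: apply a pulse, then a yes/no projective measurement.

    populations[i] : current probability that the molecule is in eigenstate i.
    signal_prob[i] : ASSUMED probability that the pulse yields a detectable
                     signal when the molecule is in eigenstate i.
    Returns the updated populations and the sampled measurement outcome.
    """
    populations = np.asarray(populations, dtype=float)
    signal_prob = np.asarray(signal_prob, dtype=float)

    # Total probability of observing a signal, averaged over the current belief.
    p_signal = float(populations @ signal_prob)
    signal = bool(rng.random() < p_signal)

    # Bayes' rule: weight each state by the likelihood of the observed outcome.
    likelihood = signal_prob if signal else 1.0 - signal_prob
    posterior = populations * likelihood
    posterior /= posterior.sum()
    return posterior, signal


def purity(populations):
    """Purity of a state that is diagonal in the eigenbasis: sum of p_i squared."""
    p = np.asarray(populations, dtype=float)
    return float(np.sum(p ** 2))


# Hypothetical example: four equally populated states, pulse addresses state 0 only.
rng = np.random.default_rng(seed=0)
belief = np.full(4, 0.25)
signal_prob = np.array([1.0, 0.0, 0.0, 0.0])
belief, signal = pulse_and_measure(belief, signal_prob, rng)
print(signal, belief, purity(belief))
```

In this example a signal collapses the belief onto state 0, while no signal spreads it evenly over the other three states; in both cases the purity rises above its starting value of 0.25.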
If this is right
- Control sequences become feasible for polyatomic ions whose dense state manifolds previously resisted manual design.
- Single pure quantum states can be reached even when many transitions are degenerate within inversion doublets.
- The same modeling framework can incorporate quantum-chemistry calculations of energies and transition moments.
- The resulting sequences are directly implementable in existing trapped-ion quantum-logic spectroscopy setups.
Where Pith is reading between the lines
- The method could be applied to other polyatomic species targeted for tests of symmetry violation or dark-matter searches.
- Similar RL agents might automate control design for other high-dimensional quantum systems such as larger molecules or many-body ensembles.
- Experimental validation would demonstrate a hybrid quantum-classical loop for state preparation that scales beyond current manual methods.
Load-bearing premise
The quantum Markov decision process model together with its physics-informed reward function correctly reproduces the actual dynamics of pulses, projective measurements, degenerate transitions, and environmental noise.
What would settle it
Apply the RL-derived pulse sequence to a trapped H3O+ ion in the lab and measure whether the final population occupies a single eigenstate with high probability.
Original abstract
Precision measurements of molecules offer an unparalleled paradigm to probe physics beyond the Standard Model. The rich internal structure within these molecules makes them exquisite sensors for detecting fundamental symmetry violations, local position invariance, and dark matter. While trapping and control of diatomic and a few very simple polyatomic molecules have been experimentally demonstrated, leveraging the complex rovibrational structure of more general polyatomics demands the development of robust and efficient quantum control schemes. In this study, we present reinforcement-learning quantum-logic spectroscopy (RL-QLS), a general, reinforcement-learning-designed, quantum logic approach to prepare molecular ions in single, pure quantum states. The reinforcement learning agent optimizes the pulse sequence, each followed by a projective measurement, and probabilistically manipulates the collapse of the quantum system to a single state. The performance of the control algorithm is numerically demonstrated for the polyatomic molecule H$_3$O$^+$ with 130 thermally populated eigenstates and degenerate transitions within inversion doublets, where quantum Markov decision process modeling and a physics-informed reward function play a key role, as well as for CaH$^+$ under the disturbance of environmental thermal radiation. The developed theoretical framework cohesively integrates techniques from quantum chemistry, AMO physics, and artificial intelligence, and we expect that the results can be readily implemented for quantum control of polyatomic molecular ions with densely populated structures, thereby enabling new experimental tests of fundamental theories.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces reinforcement-learning quantum-logic spectroscopy (RL-QLS), a method that uses reinforcement learning to optimize sequences of control pulses, each followed by a projective measurement, for preparing molecular ions in single pure quantum states. It models the problem as a quantum Markov decision process (QMDP) with a physics-informed reward function and provides numerical demonstrations for H3O+ (130 thermally populated eigenstates with degenerate inversion-doublet transitions) and CaH+ (under thermal radiation disturbance).
Significance. If the underlying model accurately reproduces laboratory dynamics, the approach could offer a scalable route to quantum control of complex polyatomics, enabling new precision measurements. The integration of RL with quantum-logic spectroscopy is a coherent combination of techniques from quantum chemistry, AMO physics, and AI. However, all results are generated in silico inside the proposed QMDP model, with no external benchmarks or released code, which limits the significance that can be assessed at present.
major comments (2)
- [Abstract / Numerical demonstrations] The central claim of a 'general approach' for systems with 130 states and degenerate transitions rests on implementation specifics that are not shown, including error bars, convergence criteria, comparison baselines, and the handling of post-hoc choices in RL training; without these, the reported performance cannot be evaluated.
- [Quantum Markov decision process modeling] The claim that the QMDP plus physics-informed reward accurately captures projective collapse, pulse-driven transitions, and environmental noise for degenerate transitions is load-bearing for transferability, yet all results are generated inside this model, with no validation against independent dynamics or real-device behavior.
minor comments (1)
- The abstract would be clearer if it explicitly stated the success metric (e.g., probability of reaching the target state) and the number of training episodes or convergence threshold used.
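To make the requested reporting concrete, the sketch below shows one possible way to state a convergence criterion and to attach error bars to a success metric across random seeds. Nothing here is taken from the paper: the window length, the tolerance, and both function names are hypothetical choices for illustration.

```python
import numpy as np


def has_converged(episode_rewards, window=100, tol=0.01):
    """Sliding-window check: the mean reward of the last `window` episodes
    differs from the mean of the preceding `window` episodes by less than `tol`.
    Both `window` and `tol` are ASSUMED values, not the paper's settings.
    """
    rewards = np.asarray(episode_rewards, dtype=float)
    if rewards.size < 2 * window:
        return False  # not enough history to compare two full windows
    recent = rewards[-window:].mean()
    previous = rewards[-2 * window:-window].mean()
    return bool(abs(recent - previous) < tol)


def success_rate_with_error(successes_per_seed):
    """Mean success rate and its standard error over independent training seeds.

    successes_per_seed: one sequence of 0/1 outcomes per seed (needs >= 2 seeds).
    """
    rates = np.array([np.mean(s) for s in successes_per_seed], dtype=float)
    sem = rates.std(ddof=1) / np.sqrt(rates.size)
    return float(rates.mean()), float(sem)


# Hypothetical usage with made-up outcomes from two seeds.
mean_rate, sem = success_rate_with_error([[1, 1, 0, 0], [1, 1, 1, 1]])
print(mean_rate, sem)
```

Reporting the window, the tolerance, and the number of seeds alongside the headline success probability would let a reader judge how stable the trained policy is.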
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below with clarifications and indicate where revisions will be made to strengthen the presentation.
- Referee: [Abstract / Numerical demonstrations] The central claim of a 'general approach' for systems with 130 states and degenerate transitions rests on implementation specifics that are not shown, including error bars, convergence criteria, comparison baselines, and the handling of post-hoc choices in RL training; without these, the reported performance cannot be evaluated.
  Authors: We agree that additional implementation details are required for rigorous evaluation of the numerical results. The full manuscript describes the RL agent, QMDP formulation, and reward function in the Methods and supplementary sections, but we will revise the main text and figures to explicitly report: (i) error bars obtained from multiple independent training runs with different random seeds, (ii) convergence criteria (e.g., stabilization of the average cumulative reward over a sliding window of episodes, together with the maximum episode count), (iii) performance baselines including random pulse sequences and non-physics-informed RL variants, and (iv) documentation of hyperparameter selection and any post-hoc analysis choices. These additions will be incorporated in the revised manuscript.
  Revision: yes
- Referee: [Quantum Markov decision process modeling] The claim that the QMDP plus physics-informed reward accurately captures projective collapse, pulse-driven transitions, and environmental noise for degenerate transitions is load-bearing for transferability, yet all results are generated inside this model, with no validation against independent dynamics or real-device behavior.
  Authors: The QMDP is constructed from standard quantum-optical models of coherent pulse driving, projective quantum-logic measurements, and thermal radiation noise, with transition rates and degeneracies taken from ab initio quantum-chemistry calculations and known spectroscopic constants for H3O+ and CaH+. The physics-informed reward explicitly accounts for the target-state fidelity while penalizing population leakage into degenerate inversion-doublet states. We acknowledge that all demonstrations remain within this model and that external validation against independent solvers or laboratory data is absent. In revision we will expand the Discussion section to articulate model assumptions, parameter sources, and expected discrepancies with real devices, thereby clarifying the scope of transferability claims. Real-device validation lies beyond the present theoretical study.
  Revision: partial
- Validation of the QMDP model against independent dynamics solvers or real experimental/device behavior, which is outside the scope of this purely numerical theoretical proposal.
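The authors' second response describes a reward that credits target-state fidelity and penalizes leakage into degenerate inversion-doublet states. One minimal way such a reward could look is sketched below for a state that is diagonal in the eigenbasis, where the fidelity with the target is just the target population. The functional form, the `penalty` weight, and the function name are assumptions for illustration only; the paper's actual reward is not reproduced here.

```python
import numpy as np


def leakage_penalized_reward(populations, target, leak_states, penalty=0.5):
    """Toy reward: target population minus a weighted leakage term.

    populations : probability of each eigenstate (diagonal state assumed).
    target      : index of the state to be prepared.
    leak_states : indices of states counted as leakage (e.g. the degenerate
                  partner in an inversion doublet); a HYPOTHETICAL labeling.
    penalty     : ASSUMED weight on the leaked population.
    """
    p = np.asarray(populations, dtype=float)
    leaked = p[list(leak_states)].sum()
    return float(p[target] - penalty * leaked)


# Hypothetical three-state example: state 0 is the target, state 1 its doublet partner.
print(leakage_penalized_reward([0.7, 0.2, 0.1], target=0, leak_states=[1]))
```

A form like this makes the trade-off explicit: population parked in the doublet partner costs reward even when the target population is unchanged, which is one way a reward could discourage sequences that cannot resolve degenerate transitions.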
Circularity Check
No significant circularity in RL-QLS numerical demonstration
full rationale
The paper presents a reinforcement-learning method (RL-QLS) to optimize pulse sequences for preparing molecular ions in pure states, using a quantum Markov decision process (QMDP) model and a physics-informed reward. Numerical results for H3O+ (130 states) and CaH+ are direct outputs of running the RL agent inside this explicitly defined model. No load-bearing step reduces to its inputs by construction, whether through the paper's own equations or through self-citation; the reward function and QMDP are model definitions, not fitted parameters renamed as predictions. The framework combines standard quantum mechanics, AMO techniques, and RL optimization without invoking uniqueness theorems or ansätze from the authors' prior work. This is a self-contained, simulation-based method paper whose claims are falsifiable against external laboratory benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Molecular energy levels and transition matrix elements can be accurately computed from quantum chemistry methods.
- Domain assumption: Projective measurements can be modeled as instantaneous collapses in the quantum Markov decision process.
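For reference, the instantaneous-collapse assumption corresponds to the textbook projective-measurement postulate, written here together with the purity that serves as a natural figure of merit. The symbols are generic and chosen for this note, not taken from the paper: $\rho$ is the density operator of the molecule, $\{P_m\}$ is a complete set of orthogonal projectors with $\sum_m P_m = I$, and $\gamma$ denotes purity.

```latex
p_m = \mathrm{Tr}\!\left(P_m \rho\right), \qquad
\rho \;\longmapsto\; \rho_m = \frac{P_m \, \rho \, P_m}{\mathrm{Tr}\!\left(P_m \rho\right)}, \qquad
\gamma(\rho) = \mathrm{Tr}\!\left(\rho^{2}\right) \le 1 .
```

Outcome $m$ occurs with probability $p_m$, the state updates to $\rho_m$ with no intermediate dynamics, and $\gamma(\rho) = 1$ holds exactly when $\rho$ is a pure state. Any finite measurement duration or imperfect detection in a real apparatus falls outside this idealization.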
Supplementary Material for “Molecular Quantum Control Algorithm Design by Reinforcement Learning”: Anastasia Pipi, Xuecheng Tao, Arianna Wu, Prineha Narang, and David R. Leibrandt. Contact author: xuechengtao@gmail.com.