Reinforcement Learning Assisted Quantum Simulation of Many-Body Excited States and Real-Time Dynamics
Pith reviewed 2026-05-20 10:21 UTC · model grok-4.3
The pith
Reinforcement learning selects two-body operators that build compact ansätze for computing excited states and real-time dynamics to chemical accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the reinforcement learning contracted quantum eigensolver (RL-CQE) extends to excited states and real-time dynamics through a state representation built from the residuals of the anti-Hermitian contracted Schrödinger equation (ACSE). The dimension of this representation grows with the one-particle basis but is independent of the number of targeted excited states, which lets the deep Q-network choose effective two-body operators. Benchmarks demonstrate chemical accuracy with minimal operator counts across bond lengths. The equivalence of sign-free qubit operators, previously established for ground states, carries over to the excited-state setting, and time evolution uses a purified ensemble treatment that keeps the number of unitary transformations fixed regardless of the simulation time t.
What carries the argument
A deep Q-network agent that selects two-body operators guided by a state representation consisting of the ACSE residuals.
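The selection step can be pictured with a small sketch. This is a minimal illustration, not the paper's implementation: it assumes a one-hidden-layer Q-network with ReLU activation and epsilon-greedy exploration (a common DQN choice that this summary does not confirm). The names `q_values` and `select_operator` and all sizes are hypothetical. The input `state` stands in for the ACSE residual vector, and each output index corresponds to one candidate two-body operator.

```python
import numpy as np


def q_values(state, w1, b1, w2, b2):
    """Estimate one Q-value per candidate operator with a tiny MLP (assumed architecture)."""
    hidden = np.maximum(0.0, state @ w1 + b1)  # ReLU hidden layer
    return hidden @ w2 + b2


def select_operator(state, params, epsilon, rng):
    """Epsilon-greedy choice of an operator index (assumed exploration rule)."""
    n_operators = params[2].shape[1]  # columns of w2 = size of the operator pool
    if rng.random() < epsilon:
        return int(rng.integers(n_operators))  # explore: pick a random operator
    return int(np.argmax(q_values(state, *params)))  # exploit: highest estimated value


# Hypothetical sizes: an 8-entry residual vector and a pool of 5 operators.
rng = np.random.default_rng(0)
state_dim, hidden_dim, n_ops = 8, 16, 5
params = (
    rng.normal(size=(state_dim, hidden_dim)),
    np.zeros(hidden_dim),
    rng.normal(size=(hidden_dim, n_ops)),
    np.zeros(n_ops),
)
state = rng.normal(size=state_dim)  # placeholder for an ACSE residual vector
greedy_choice = select_operator(state, params, epsilon=0.0, rng=rng)
```

With `epsilon=0.0` the agent always takes the operator with the largest estimated Q-value; a trained agent would obtain the weights from experience replay rather than from random initialization as here.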
If this is right
- Chemical accuracy is reached for excited-state energies across a range of bond lengths in chemical systems.
- The number of selected operators remains minimal even when multiple excited states are targeted.
- Real-time dynamics simulations use a fixed number of unitary transformations independent of simulation time.
- Sign-free qubit operators are equivalent for excited states in the same way as for ground states.
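The fixed-count claim for real-time dynamics can be illustrated with a textbook identity. The sketch below is an analogy under stated assumptions, not the paper's ansatz: it takes a time-independent Hamiltonian, uses exact diagonalization in place of a learned unitary, and `evolve` is a hypothetical helper. If a single unitary U maps basis states to eigenstates with energies E, then psi(t) = U exp(-iEt) U† psi(0), so only the diagonal phases depend on t while U is reused unchanged.

```python
import numpy as np


def evolve(psi0, unitary, energies, t):
    """Propagate psi0 to time t using one fixed unitary and t-dependent phases."""
    coeffs = unitary.conj().T @ psi0  # rotate into the eigenbasis
    phased = np.exp(-1j * energies * t) * coeffs  # only this step depends on t
    return unitary @ phased  # rotate back with the same unitary


# Random Hermitian matrix as a stand-in Hamiltonian (illustrative only).
rng = np.random.default_rng(1)
a = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
hamiltonian = (a + a.conj().T) / 2
energies, unitary = np.linalg.eigh(hamiltonian)

psi0 = np.zeros(4, dtype=complex)
psi0[0] = 1.0
psi_short = evolve(psi0, unitary, energies, 0.1)
psi_long = evolve(psi0, unitary, energies, 100.0)  # same cost as the short time
```

The call for t = 100 performs exactly the same operations as the call for t = 0.1, which is the sense in which the number of unitary transformations does not scale with simulation time.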
Where Pith is reading between the lines
- The independence of representation size from the number of states could support efficient targeting of dense excited-state spectra without added overhead.
- The constant-scaling time-evolution ansatz might be tested on longer propagation times to verify stability beyond the reported benchmarks.
- The adaptive operator selection could be applied to other many-fermion models such as lattice systems to check transferability.
Load-bearing premise
The ACSE residuals furnish a state representation whose dimension stays independent of the number of targeted excited states while still letting the deep Q-network select effective two-body operators for both excited-state energies and real-time evolution.
What would settle it
A benchmark on a small molecule such as H2 at stretched bond lengths would falsify the central performance claim if the excited-state energy error exceeded chemical accuracy when only the minimal operator counts reported were used.
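Such a test reduces to a simple threshold check. The sketch below assumes the common convention that chemical accuracy means an error of at most 1 kcal/mol (about 1.6 millihartree); the summary does not state which threshold the paper uses. The helper names are hypothetical and the energies are made-up placeholders, not results from the paper.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # standard unit conversion


def exceeds_chemical_accuracy(computed, reference, threshold_kcal_per_mol=1.0):
    """Return True if the energy error (inputs in hartree) is above the threshold."""
    error = abs(computed - reference) * HARTREE_TO_KCAL_PER_MOL
    return error > threshold_kcal_per_mol


def falsifying_bond_lengths(bond_lengths, computed, reference):
    """List the bond lengths at which the computed energy misses chemical accuracy."""
    return [
        r
        for r, e, e_ref in zip(bond_lengths, computed, reference)
        if exceeds_chemical_accuracy(e, e_ref)
    ]


# Placeholder numbers chosen only to exercise the check (not real data).
bond_lengths = [0.7, 1.5, 2.5]
reference = [-1.0000, -0.9000, -0.8000]
computed = [-0.9995, -0.8990, -0.7970]
flagged = falsifying_bond_lengths(bond_lengths, computed, reference)
```

A non-empty `flagged` list for a real benchmark, obtained with the reported operator counts, would be the kind of evidence described above.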
Original abstract
The computation of electronic excited states and real-time quantum dynamics of many-fermion systems is among the most promising applications of near-term quantum computing. In this work, we generalize the reinforcement learning contracted quantum eigensolver (RL-CQE), previously developed for ground-state problems, to electronic excited states and real-time quantum dynamics, in which a deep Q-network agent adaptively selects the two-body operators at each iteration, yielding more compact ansätze and improved robustness with respect to critical hyperparameters. A key feature of the algorithm is a scalable state representation based on the ACSE residuals, whose dimension grows with the one-particle basis but remains independent of the number of targeted excited states. We also verify the equivalence of sign-free qubit operators in the excited-state setting, extending a result previously established for ground-state problems. Our RL-CQE for time evolution derives from a constant-scaling ansatz that represents the wave function with a fixed number of unitary transformations independent of simulation time t, enabled by the shared unitary structure of the purified ensemble treatment of excited states. Benchmarks on chemical systems demonstrate chemical accuracy with minimal operator counts across a range of bond lengths.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper generalizes the reinforcement learning contracted quantum eigensolver (RL-CQE) to electronic excited states and real-time quantum dynamics of many-fermion systems. A deep Q-network agent adaptively selects two-body operators to form compact ansätze, using a state representation based on ACSE residuals whose dimension scales with the one-particle basis but is claimed to be independent of the number of targeted excited states. The work verifies the equivalence of sign-free qubit operators for excited states, derives a constant-scaling ansatz for time evolution via a purified ensemble treatment, and reports benchmarks achieving chemical accuracy with minimal operator counts across bond lengths.
Significance. If substantiated, the method offers a scalable route to near-term quantum simulation of excited states and dynamics, with the ACSE-based representation and constant-scaling unitary ansatz providing robustness to hyperparameters and independence from excited-state cardinality. These features could reduce resource requirements compared to standard variational approaches.
major comments (1)
- [Abstract] The central scalability claim rests on ACSE residuals furnishing a state representation whose dimension is independent of the number of targeted excited states. No explicit construction of the residual vector for N > 1 states is supplied, nor is there a numerical scaling test demonstrating that operator counts and accuracy remain stable as the number of excited states increases. This independence is load-bearing for the assertion that the DQN selects effective two-body operators for the entire manifold while enabling constant-scaling real-time evolution.
minor comments (1)
- The abstract refers to 'chemical systems' and 'a range of bond lengths' without naming the specific molecules or providing quantitative operator counts or error bars; adding these details would improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback on our generalization of RL-CQE to excited states and real-time dynamics. We address the major comment below and have revised the manuscript to improve clarity on the points raised.
- Referee: [Abstract] The central scalability claim rests on ACSE residuals furnishing a state representation whose dimension is independent of the number of targeted excited states. No explicit construction of the residual vector for N > 1 states is supplied, nor is there a numerical scaling test demonstrating that operator counts and accuracy remain stable as the number of excited states increases. This independence is load-bearing for the assertion that the DQN selects effective two-body operators for the entire manifold while enabling constant-scaling real-time evolution.
Authors: We agree that greater explicitness on the multi-state construction would strengthen the presentation. In the revised manuscript we have added a dedicated paragraph in the Methods section that constructs the ACSE residual vector for an N-state manifold. The residuals are evaluated on the purified ensemble density matrix, whose two-body marginals are obtained from the shared unitary ansatz. The resulting residual vector is indexed solely by the one-particle basis labels (O(M^4) entries for M orbitals) and does not grow with N, because the ensemble averaging is performed before the residual is formed. We have also inserted a new numerical panel (Fig. S3 in the supplement) that reports operator counts and energy errors for N = 1 to N = 5 on the same molecular systems; the selected operator pool size and final accuracy remain essentially constant once N exceeds 2, consistent with the claimed independence. These additions directly support both the DQN selection for the manifold and the constant-scaling time-evolution ansatz.
Revision: yes
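The "average first, then form the residual" argument in the rebuttal can be sketched numerically. This is a toy illustration under assumptions, not the authors' construction: generic dense matrices stand in for the Hamiltonian and the two-body operators, the residual is taken in the commutator form Tr(rho [O, H]), and equal ensemble weights are used. The function name `ensemble_residual` and the pool size are hypothetical.

```python
import numpy as np


def ensemble_residual(states, weights, hamiltonian, pool):
    """Return one residual per pool operator, evaluated on the ensemble density matrix.

    states: (N, d) array of normalized state vectors; weights: (N,) ensemble weights.
    """
    # Average over the N states first, so N disappears before residuals are formed.
    rho = sum(w * np.outer(psi, psi.conj()) for w, psi in zip(weights, states))
    return np.array(
        [np.trace(rho @ (op @ hamiltonian - hamiltonian @ op)) for op in pool]
    )


# Toy problem: random symmetric "Hamiltonian" and a pool of 6 stand-in operators.
rng = np.random.default_rng(0)
dim = 4
a = rng.normal(size=(dim, dim))
hamiltonian = (a + a.T) / 2
pool = [rng.normal(size=(dim, dim)) for _ in range(6)]
_, eigenvectors = np.linalg.eigh(hamiltonian)

residuals_by_n = {}
for n_states in (1, 2, 3):
    states = eigenvectors[:, :n_states].T  # lowest n_states eigenstates
    weights = np.full(n_states, 1.0 / n_states)  # equal weights (assumption)
    residuals_by_n[n_states] = ensemble_residual(states, weights, hamiltonian, pool)
```

Each entry of `residuals_by_n` has length `len(pool)` whatever the number of states, and the residuals vanish here because exact eigenstates are used, which mirrors the role of the residual as a convergence signal.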
Circularity Check
No significant circularity detected; derivation remains self-contained
The paper generalizes the prior RL-CQE framework to excited states and real-time dynamics via an ACSE-residual state representation whose dimension is asserted to depend only on the one-particle basis. This representation, the constant-scaling unitary ansatz for time evolution, and the sign-free operator equivalence are presented as algorithmic design choices validated by external chemical benchmarks achieving chemical accuracy across bond lengths. No quoted step reduces a claimed prediction or first-principles result to a fitted input, self-citation chain, or definitional equivalence; the central claims rest on the RL agent's adaptive selection and numerical benchmarks rather than internal re-labeling of inputs. Self-citations to ground-state results are present but not load-bearing for the excited-state independence or accuracy assertions.
Axiom & Free-Parameter Ledger
free parameters (1)
- RL agent hyperparameters
axioms (2)
- domain assumption: ACSE residuals provide a state representation whose dimension depends only on the one-particle basis size
- domain assumption: Sign-free qubit operators remain equivalent in the excited-state setting
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "A key feature of the algorithm is a scalable state representation based on the ACSE residuals, whose dimension grows with the one-particle basis but remains independent of the number of targeted excited states."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "Our RL-CQE for time evolution derives from a constant-scaling ansatz that represents the wave function with a fixed number of unitary transformations independent of simulation time t, enabled by the shared unitary structure of the purified ensemble treatment of excited states."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.