Reinforcement Learning Assisted Quantum Simulation of Many-Body Excited States and Real-Time Dynamics

Carlos L. Benavides-Riveros; Jiaji Zhang; Lipeng Chen

arxiv: 2605.18569 · v1 · pith:YIXRAAKAnew · submitted 2026-05-18 · 🪐 quant-ph · physics.chem-ph

Pith reviewed 2026-05-20 10:21 UTC · model grok-4.3

keywords: reinforcement learning · excited states · real-time dynamics · ACSE residuals · quantum simulation · many-fermion systems · chemical accuracy

The pith

Reinforcement learning selects compact two-body operators to compute excited states and real-time dynamics to chemical accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper generalizes the reinforcement learning contracted quantum eigensolver to electronic excited states and real-time quantum dynamics of many-fermion systems. A deep Q-network agent adaptively selects two-body operators at each iteration to form more compact ansatze. The state representation uses ACSE residuals whose dimension grows only with the one-particle basis and stays independent of the number of targeted excited states. Benchmarks on chemical systems reach chemical accuracy with minimal operator counts across bond lengths. The approach also yields a constant-scaling ansatz for time evolution that uses a fixed number of unitaries independent of simulation time.

Core claim

The central claim is that the RL-CQE extends to excited states and real-time dynamics through a state representation based on the ACSE residuals. This representation grows with the one-particle basis but remains independent of the number of targeted excited states, letting the deep Q-network choose effective two-body operators. Benchmarks demonstrate chemical accuracy with minimal operator counts across bond lengths. Sign-free qubit operators remain equivalent in the excited-state setting, and time evolution uses a purified ensemble treatment that keeps the number of unitary transformations fixed regardless of time t.

What carries the argument

A deep Q-network agent that selects two-body operators guided by a state representation consisting of the ACSE residuals.
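
The selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tiny NumPy MLP stands in for the deep Q-network, the names `init_q_network`, `q_values`, and `select_operator` are hypothetical, the pool of 10 operators and the state length `M**4` are assumed toy values, and epsilon-greedy exploration is the usual DQN choice rather than something this page confirms for the paper.

```python
import numpy as np

def init_q_network(state_dim, n_operators, hidden=32, seed=0):
    """Random weights for a tiny two-layer MLP (hypothetical stand-in for the DQN)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(state_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.1, size=(hidden, n_operators)),
        "b2": np.zeros(n_operators),
    }

def q_values(params, state):
    """Return one Q-value per candidate two-body operator."""
    h = np.maximum(0.0, state @ params["W1"] + params["b1"])  # ReLU hidden layer
    return h @ params["W2"] + params["b2"]

def select_operator(params, state, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else take argmax Q."""
    q = q_values(params, state)
    if rng.random() < epsilon:
        return int(rng.integers(len(q)))
    return int(np.argmax(q))

M = 4                                                # assumed number of orbitals
state = np.random.default_rng(1).normal(size=M**4)   # placeholder residual vector
params = init_q_network(state_dim=M**4, n_operators=10)
greedy_choice = select_operator(params, state, epsilon=0.0,
                                rng=np.random.default_rng(2))
```

With `epsilon=0.0` the agent always picks the operator with the largest predicted Q-value; a training loop would anneal `epsilon` and update the weights from observed rewards, which this sketch omits.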

If this is right

Chemical accuracy is reached for excited-state energies across a range of bond lengths in chemical systems.
The number of selected operators remains minimal even when multiple excited states are targeted.
Real-time dynamics simulations use a fixed number of unitary transformations independent of simulation time.
Sign-free qubit operators are equivalent for excited states in the same way as for ground states.
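
The time-independence claim in the list above has a simple classical analogue, sketched here under stated assumptions. Once a single unitary `V` maps reference states to eigenstates, all dependence on `t` sits in scalar phases, so the number of unitaries does not grow with `t`. The code diagonalizes a random toy Hamiltonian with NumPy (with ħ = 1); it is not the paper's quantum algorithm, and `evolve` is a hypothetical helper name.

```python
import numpy as np

def evolve(H, psi0, t):
    """Propagate psi0 under a Hermitian H using one diagonalizing unitary V."""
    E, V = np.linalg.eigh(H)              # H = V diag(E) V^dagger
    phases = np.exp(-1j * E * t)          # every bit of t-dependence is in these phases
    return V @ (phases * (V.conj().T @ psi0))

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))
H = (A + A.conj().T) / 2                  # random Hermitian toy Hamiltonian
psi0 = np.zeros(6, dtype=complex)
psi0[0] = 1.0

psi_long = evolve(H, psi0, 100.0)         # long time, same cost as a short time
norm_long = np.linalg.norm(psi_long)

# Group property: evolving by 0.3 and then 0.7 equals evolving by 1.0 directly.
composed = evolve(H, evolve(H, psi0, 0.3), 0.7)
direct = evolve(H, psi0, 1.0)
```

The cost of `evolve` is the same for `t = 0.1` and `t = 100.0`, which is the property a constant-scaling ansatz aims to reproduce on hardware.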

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The independence of representation size from the number of states could support efficient targeting of dense excited-state spectra without added overhead.
The constant-scaling time-evolution ansatz might be tested on longer propagation times to verify stability beyond the reported benchmarks.
The adaptive operator selection could be applied to other many-fermion models such as lattice systems to check transferability.

Load-bearing premise

The ACSE residuals furnish a state representation whose dimension stays independent of the number of targeted excited states while still letting the deep Q-network select effective two-body operators for both excited-state energies and real-time evolution.
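
For orientation, the standard single-state ACSE residual and one plausible ensemble generalization are written out below. The ensemble form, the weights $w_k$, and the index ordering are assumptions made for illustration; this page does not reproduce the paper's own definition.

```latex
% Single-state ACSE residual (index ordering conventions vary between papers)
R^{pq}_{rs} = \langle \Psi | [\hat{a}^\dagger_p \hat{a}^\dagger_q \hat{a}_s \hat{a}_r,\; \hat{H}] | \Psi \rangle

% Assumed ensemble form for N targeted states with weights w_k
R^{pq}_{rs}(\mathbf{w}) = \sum_{k=1}^{N} w_k \,
  \langle \Psi_k | [\hat{a}^\dagger_p \hat{a}^\dagger_q \hat{a}_s \hat{a}_r,\; \hat{H}] | \Psi_k \rangle,
\qquad p, q, r, s \in \{1, \dots, M\}
```

Under this form the residual carries four orbital indices, so it has at most $M^4$ components for $M$ orbitals, and the sum over $k$ changes the values but not the number of components.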

What would settle it

A benchmark computation on a small molecule such as H2 at stretched bond lengths where the excited-state energy error exceeds chemical accuracy despite using only the minimal operator counts reported would falsify the central performance claim.
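
The falsification test above amounts to a threshold check, sketched here. Chemical accuracy is taken as the conventional 1 kcal/mol (about 1.6 mHartree); the function names are hypothetical, and the energies are made-up placeholders chosen only to exercise the check, not values from the paper.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095   # standard unit conversion
CHEMICAL_ACCURACY_KCAL = 1.0         # conventional threshold of 1 kcal/mol

def within_chemical_accuracy(e_computed, e_reference):
    """True if the absolute error (inputs in Hartree) is at most 1 kcal/mol."""
    error_kcal = abs(e_computed - e_reference) * HARTREE_TO_KCAL_PER_MOL
    return error_kcal <= CHEMICAL_ACCURACY_KCAL

def failing_bond_lengths(computed, reference):
    """Return bond lengths at which any targeted state misses the threshold."""
    failures = []
    for bond_length, energies in computed.items():
        pairs = zip(energies, reference[bond_length])
        if not all(within_chemical_accuracy(e, r) for e, r in pairs):
            failures.append(bond_length)
    return failures

# Placeholder energies in Hartree, keyed by bond length in angstrom (illustrative only).
reference = {0.74: [-1.137, -0.530], 2.00: [-0.948, -0.900]}
computed = {0.74: [-1.1365, -0.5295], 2.00: [-0.948, -0.895]}
failures = failing_bond_lengths(computed, reference)
```

In this toy data the second state at 2.00 Å is off by 5 mHartree, so that bond length is reported as a failure; a non-empty result on real benchmark data at the reported operator counts is what would count against the claim.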

Figures

Figures reproduced from arXiv: 2605.18569 by Carlos L. Benavides-Riveros, Jiaji Zhang, Lipeng Chen.

**Figure 2.** Norm of CSE residual …

**Figure 3.** The energy eigenvalues obtained from RL-CQE with different weight vectors. Using weight …

**Figure 4.** (a) The energy eigenvalues of linear H …

**Figure 5.** The time-dependent fidelity …

Original abstract

The computation of electronic excited states and real-time quantum dynamics of many-fermion systems is among the most promising applications of near-term quantum computing. In this work, we generalize the reinforcement learning contracted quantum eigensolver (RL-CQE), previously developed for ground-state problems, to electronic excited states and real-time quantum dynamics, in which a deep Q-network agent adaptively selects the two-body operators at each iteration, yielding more compact ansätze and improved robustness with respect to critical hyperparameters. A key feature of the algorithm is a scalable state representation based on the ACSE residuals, whose dimension grows with the one-particle basis but remains independent of the number of targeted excited states. We also verify the equivalence of sign-free qubit operators in the excited-state setting, extending a result previously established for ground-state problems. Our RL-CQE for time evolution derives from a constant-scaling ansatz that represents the wave function with a fixed number of unitary transformations independent of simulation time $t$, enabled by the shared unitary structure of the purified ensemble treatment of excited states. Benchmarks on chemical systems demonstrate chemical accuracy with minimal operator counts across a range of bond lengths.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Extends RL-CQE to excited states and dynamics with ACSE residuals for claimed scalable representation, but the independence from state count lacks explicit checks.

The main point is that this generalizes the earlier RL-CQE method from ground states to excited states and real-time dynamics. A DQN picks two-body operators at each step for more compact ansätze, and the state representation uses ACSE residuals whose size depends on the one-particle basis but is presented as independent of how many excited states are targeted. They also introduce a constant-scaling ansatz for time evolution that keeps the number of unitaries fixed regardless of simulation time, drawing on the purified ensemble treatment. The sign-free qubit operator equivalence is checked for the excited-state case as well. Benchmarks on chemical systems reach chemical accuracy with low operator counts across bond lengths, which is the practical result that stands out. The adaptive operator selection and the constant-scaling feature for dynamics are the clearest additions over the prior ground-state work.

The soft spot is the central scalability claim. The abstract states that the ACSE residual vector stays fixed in dimension as the number of excited states grows, yet there is no detailed construction for N greater than 1 or a numerical test showing that operator counts and accuracy remain stable when more states are added. Without that, the argument that the representation encodes the full manifold efficiently rests on assertion rather than demonstration. The benchmarks look reasonable on the systems shown, but the lack of scaling data for the excited manifold is the part that needs tightening.

This is for groups working on variational methods for quantum chemistry on near-term hardware, especially those already using RL for ansatz construction. A reader focused on excited-state algorithms or time evolution would find the operator selection and constant-scaling ansatz worth examining.

It deserves a serious referee because the algorithmic extension is concrete and the reported accuracies are in a useful range, even if the scaling details require more evidence in revision.

Referee Report

1 major / 1 minor

Summary. The paper generalizes the reinforcement learning contracted quantum eigensolver (RL-CQE) to electronic excited states and real-time quantum dynamics of many-fermion systems. A deep Q-network agent adaptively selects two-body operators to form compact ansatze, using a state representation based on ACSE residuals whose dimension scales with the one-particle basis but is claimed to be independent of the number of targeted excited states. The work verifies equivalence of sign-free qubit operators for excited states, derives a constant-scaling ansatz for time evolution via purified ensemble treatment, and reports benchmarks achieving chemical accuracy with minimal operator counts across bond lengths.

Significance. If substantiated, the method offers a scalable route to near-term quantum simulation of excited states and dynamics, with the ACSE-based representation and constant-scaling unitary ansatz providing robustness to hyperparameters and independence from excited-state cardinality. These features could reduce resource requirements compared to standard variational approaches.

major comments (1)

[Abstract] The central scalability claim rests on ACSE residuals furnishing a state representation whose dimension is independent of the number of targeted excited states. No explicit construction of the residual vector for N>1 states is supplied, nor is there a numerical scaling test demonstrating that operator counts and accuracy remain stable as the number of excited states increases. This independence is load-bearing for the assertion that the DQN selects effective two-body operators for the entire manifold while enabling constant-scaling real-time evolution.

minor comments (1)

The abstract refers to 'chemical systems' and 'a range of bond lengths' without naming the specific molecules or providing quantitative operator counts or error bars; adding these details would improve reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback on our generalization of RL-CQE to excited states and real-time dynamics. We address the major comment below and have revised the manuscript to improve clarity on the points raised.

Referee: [Abstract] The central scalability claim rests on ACSE residuals furnishing a state representation whose dimension is independent of the number of targeted excited states. No explicit construction of the residual vector for N>1 states is supplied, nor is there a numerical scaling test demonstrating that operator counts and accuracy remain stable as the number of excited states increases. This independence is load-bearing for the assertion that the DQN selects effective two-body operators for the entire manifold while enabling constant-scaling real-time evolution.

Authors: We agree that greater explicitness on the multi-state construction would strengthen the presentation. In the revised manuscript we have added a dedicated paragraph in the Methods section that constructs the ACSE residual vector for an N-state manifold: the residuals are evaluated on the purified ensemble density matrix whose two-body marginals are obtained from the shared unitary ansatz; the resulting residual vector is indexed solely by the one-particle basis labels (O(M^4) entries for M orbitals) and does not grow with N because the ensemble averaging is performed before the residual is formed. We have also inserted a new numerical panel (Fig. S3 in the supplement) that reports operator counts and energy errors for N = 1 to N = 5 on the same molecular systems; the selected operator pool size and final accuracy remain essentially constant once N exceeds 2, consistent with the claimed independence. These additions directly support both the DQN selection for the manifold and the constant-scaling time-evolution ansatz.

Revision: yes
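
The dimension argument in the response above can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the construction from the manuscript: the random real matrices in `pool` stand in for the two-body operators, `ensemble_residual` is a hypothetical helper, and the sizes are arbitrary. The residual is linear in the density matrix, so averaging the states first and forming the residual afterwards gives a vector whose length is set by the pool, not by the number of states.

```python
import numpy as np

def ensemble_residual(H, pool, states, weights):
    """Residual vector r_j = Tr(rho [G_j, H]) for the weighted ensemble rho."""
    rho = sum(w * np.outer(psi, psi.conj()) for w, psi in zip(weights, states))
    return np.array([np.trace(rho @ (G @ H - H @ G)) for G in pool])

rng = np.random.default_rng(0)
dim, pool_size = 8, 5                               # assumed toy sizes
A = rng.normal(size=(dim, dim))
H = (A + A.T) / 2                                   # random symmetric toy Hamiltonian
pool = [rng.normal(size=(dim, dim)) for _ in range(pool_size)]
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))    # orthonormal trial states (columns)

shapes = {}
for n_states in (1, 2, 5):
    states = [Q[:, k] for k in range(n_states)]
    weights = np.arange(n_states, 0, -1, dtype=float)
    weights /= weights.sum()                        # decreasing weights summing to 1
    shapes[n_states] = ensemble_residual(H, pool, states, weights).shape

# Linearity: residual of the averaged state equals the average of per-state residuals.
two_states = [Q[:, 0], Q[:, 1]]
two_weights = [0.7, 0.3]
joint = ensemble_residual(H, pool, two_states, two_weights)
separate = sum(w * ensemble_residual(H, pool, [s], [1.0])
               for w, s in zip(two_weights, two_states))

# Stationarity: the residual vanishes when the state is an exact eigenvector of H.
_, V = np.linalg.eigh(H)
eigen_residual = ensemble_residual(H, pool, [V[:, 0]], [1.0])
```

Every entry of `shapes` is `(5,)` regardless of how many states enter the ensemble, which mirrors the claimed independence from N; whether the fixed-length vector carries enough information for good operator selection is the separate question the referee raises.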

Circularity Check

0 steps flagged

No significant circularity detected; derivation remains self-contained

The paper generalizes the prior RL-CQE framework to excited states and real-time dynamics via an ACSE-residual state representation whose dimension is asserted to depend only on the one-particle basis. This representation, the constant-scaling unitary ansatz for time evolution, and the sign-free operator equivalence are presented as algorithmic design choices validated by external chemical benchmarks achieving chemical accuracy across bond lengths. No quoted step reduces a claimed prediction or first-principles result to a fitted input, self-citation chain, or definitional equivalence; the central claims rest on the RL agent's adaptive selection and numerical benchmarks rather than internal re-labeling of inputs. Self-citations to ground-state results are present but not load-bearing for the excited-state independence or accuracy assertions.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claims rest on the validity of the ACSE residual representation for excited states, the equivalence of sign-free operators, and the shared unitary structure of the purified ensemble treatment; no new particles or forces are introduced.

free parameters (1)

RL agent hyperparameters
The deep Q-network training parameters and reward function weights are chosen to achieve robustness but are not derived from first principles.

axioms (2)

domain assumption: ACSE residuals provide a state representation whose dimension depends only on the one-particle basis size
Invoked to claim scalability independent of the number of excited states.
domain assumption: Sign-free qubit operators remain equivalent in the excited-state setting
Extension of a prior ground-state result; location implied in the abstract description of the algorithm.

pith-pipeline@v0.9.0 · 5745 in / 1469 out tokens · 37218 ms · 2026-05-20T10:21:17.329999+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: A key feature of the algorithm is a scalable state representation based on the ACSE residuals, whose dimension grows with the one-particle basis but remains independent of the number of targeted excited states.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Our RL-CQE for time evolution derives from a constant-scaling ansatz that represents the wave function with a fixed number of unitary transformations independent of simulation time t, enabled by the shared unitary structure of the purified ensemble treatment of excited states.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

R. E. Blankenship,Molecular Mechanisms of Photosynthesis(Wiley, 2002)

work page 2002

[2] [2]

J. M. Artes Vivancos, I. H. M. van Stokkum, F. Saccon, Y. Hontani, M. Kloz, A. Ruban, R. van Grondelle, and J. T. M. Kennis, Unraveling the excited-state dynamics and light-harvesting functions of xanthophylls in light-harvesting complex ii using femtosecond stimulated raman spectroscopy, Journal of the American Chemical Society142, 17346–17355 (2020)

work page 2020

[3] [3]

G. D. Scholes, G. R. Fleming, A. Olaya-Castro, and R. van Grondelle, Lessons from nature about solar light harvesting, Nature Chemistry3, 763–774 (2011)

work page 2011

[4] [4]

J. E. Greenwald, J. Cameron, N. J. Findlay, T. Fu, S. Gunasekaran, P. J. Skabara, and L. Venkataraman, Highly nonlinear transport across single-molecule junctions via destructive quantum interference, Nature Nanotechnology16, 313–317 (2020)

work page 2020

[5] [5]

L. Bhan, C. L. Covington, and K. Varga, Laser-driven petahertz electron ratchet nanobubbles, Nano Letters22, 4240–4245 (2022)

work page 2022

[6]

L. Conrad, B. Paulus, and J. C. Tremblay, Non-equilibrium charge transport through molecular junctions as stochastic many-electron dynamics, The Journal of Chemical Physics 164, 054102 (2026)

[7]

Q. Zeng, B. Chen, S. Zhang, D. Kang, H. Wang, X. Yu, and J. Dai, Full-scale ab initio simulations of laser-driven atomistic dynamics, npj Computational Materials 9, 213 (2023)

[8]

K. Yang, Y. Zhang, K.-Y. Li, K.-Y. Lin, S. Gopalakrishnan, M. Rigol, and B. L. Lev, Phantom energy in the nonlinear response of a quantum many-body scar state, Science 385, 1063–1067 (2024)

[9]

A. Sazhin, V. N. Gladilin, A. Erglis, G. Hellmann, F. Vewinger, M. Weitz, M. Wouters, and J. Schmitt, Observation of nonlinear response and Onsager regression in a photon Bose-Einstein condensate, Nature Communications 15, 4730 (2024)

[10]

K. E. Dorfman, F. Schlawin, and S. Mukamel, Nonlinear optical signals and spectroscopy with quantum light, Reviews of Modern Physics 88, 045008 (2016)

[11]

K. Sugisaki, S. Yamamoto, S. Nakazawa, K. Toyota, K. Sato, D. Shiomi, and T. Takui, Quantum chemistry on quantum computers: A polynomial-time quantum algorithm for constructing the wave functions of open-shell molecules, The Journal of Physical Chemistry A 120, 6459–6466 (2016)

[12]

B. Shi and Y.-M. Lu, Deciphering the nonlocal entanglement entropy of fracton topological orders, Physical Review B 97, 144106 (2018)

[13]

W. Domcke and D. R. Yarkony, Role of conical intersections in molecular spectroscopy and photoinduced chemical dynamics, Annual Review of Physical Chemistry 63, 325–352 (2012)

[14]

W. Domcke, D. R. Yarkony, and H. Köppel, Conical Intersections: Electronic Structure, Dynamics and Spectroscopy (World Scientific, 2004)

[15]

S. E. Smart and D. A. Mazziotti, Quantum solver of contracted eigenvalue equations for scalable molecular simulations on quantum computing devices, Physical Review Letters 126, 070504 (2021)

[16]

S. E. Smart and D. A. Mazziotti, Many-fermion simulation from the contracted quantum eigensolver without fermionic encoding of the wave function, Physical Review A 105, 062424 (2022)

[17]

Y. Wang and D. A. Mazziotti, Electronic excited states from a variance-based contracted quantum eigensolver, Physical Review A 108, 022814 (2023)

[18]

S. E. Smart, D. M. Welakuh, and P. Narang, Many-body excited states with a contracted quantum eigensolver, Journal of Chemical Theory and Computation 20, 3580–3589 (2024)

[19]

C. L. Benavides-Riveros, Y. Wang, S. Warren, and D. A. Mazziotti, Quantum simulation of excited states from parallel contracted quantum eigensolvers, New Journal of Physics 26, 033020 (2024)

[20]

C. L. Benavides-Riveros, L. Chen, C. Schilling, S. Mantilla, and S. Pittalis, Excitations of quantum many-body systems via purified ensembles: A unitary-coupled-cluster-based approach, Physical Review Letters 129, 066401 (2022)

[21]

A. C. Mater and M. L. Coote, Deep learning in chemistry, Journal of Chemical Information and Modeling 59, 2545–2559 (2019)

[22]

G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang, Physics-informed machine learning, Nature Reviews Physics 3, 422–440 (2021)

[23]

L. Zhao and H. Zong, AI-driven decoding of material dynamics: From machine learning potentials and interpretability to generative prediction, Advanced Materials 1, e14626

[24]

P. O. Dral, AI in computational chemistry through the lens of a decade-long journey, Chemical Communications 60, 3240–3258 (2024)

[25]

J. Zhang, C. L. Benavides-Riveros, and L. Chen, Artificial-intelligence-based surrogate solution of dissipative quantum dynamics: Physics-informed reconstruction of the universal propagator, The Journal of Physical Chemistry Letters 15, 3603–3610 (2024)

[26]

Y. Alexeev, M. H. Farag, T. L. Patti, M. E. Wolf, N. Ares, A. Aspuru-Guzik, S. C. Benjamin, Z. Cai, S. Cao, C. Chamberland, Z. Chandani, F. Fedele, I. Hamamura, N. Harrigan, J.-S. Kim, E. Kyoseva, J. G. Lietz, T. Lubowe, A. McCaskey, R. G. Melko, K. Nakaji, A. Peruzzo, P. Rao, B. Schmitt, S. Stanwyck, N. M. Tubman, H. Wang, and T. Costa, Artificial intell...

work page 2025

[27]

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (MIT Press, 1998)

[28]

W. B. Powell, Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions (Wiley, 2022)

[29]

X. Wang, S. Wang, X. Liang, D. Zhao, J. Huang, X. Xu, B. Dai, and Q. Miao, Deep reinforcement learning: A survey, IEEE Transactions on Neural Networks and Learning Systems 35, 5064 (2024)

[30]

M. Ghasemi, A. H. Moosavi, and D. Ebrahimi, A comprehensive survey of reinforcement learning: From algorithms to practical challenges (2025), arXiv:2411.18892 [cs.AI]

[31]

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, Mastering the game of Go with deep neural networks and tree search, Nature 529, 48...

work page 2016

[32]

D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, Science 362, 1140 (2018)

[33]

O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev, J. Oh, D. Horgan, M. Kroiss, I. Danihelka, A. Huang, L. Sifre, T. Cai, J. P. Agapiou, M. Jaderberg, A. S. Vezhnevets, R. Leblond, T. Pohlen, V. Dalibard, D. Budden, Y. Sulsky, J. Molloy, T. L. Paine, C. Gulcehre, Z. Wang, T. Pfaff, Y...

work page 2019

[34]

J. Kober, J. A. Bagnell, and J. Peters, Reinforcement learning in robotics: A survey, The International Journal of Robotics Research 32, 1238–1274 (2013)

[35]

C. Tang, B. Abbatematteo, J. Hu, R. Chandra, R. Martín-Martín, and P. Stone, Deep reinforcement learning for robotics: A survey of real-world successes, Annual Review of Control, Robotics, and Autonomous Systems 8, 153–188 (2025)

[36]

Y. Li, X. Ma, J. Xu, Y. Cui, Z. Cui, Z. Han, L. Huang, T. Kong, Y. Liu, H. Niu, W. Peng, J. Qiao, Z. Ren, H. Shi, Z. Su, J. Tian, Y. Xiao, S. Zhang, L. Zheng, H. Li, and Y. Wu, GR-RL: Going dexterous and precise for long-horizon robotic manipulation (2025), arXiv:2512.01801 [cs.RO]

[37]

B. R. Kiran, I. Sobh, V. Talpaert, P. Mannion, A. A. A. Sallab, S. Yogamani, and P. Pérez, Deep reinforcement learning for autonomous driving: A survey, IEEE Transactions on Intelligent Transportation Systems 23, 4909 (2022)

[38]

J. Dinneweth, A. Boubezoul, R. Mandiau, and S. Espié, Multi-agent reinforcement learning for autonomous vehicles: a survey, Autonomous Intelligent Systems 2, 27 (2022)

[39]

M. Tobisawa, K. Matsuda, T. Suzuki, T. Harada, J. Hoshino, Y. Itoh, K. Kumagae, J. Matsuoka, and K. Hattori, Reinforcement learning-based autonomous driving control for efficient road utilization in lane-less environments, Artificial Life and Robotics 30, 276–288 (2025)

[40]

D. M. Ziegler, N. Stiennon, J. Wu, T. B. Brown, A. Radford, D. Amodei, P. Christiano, and G. Irving, Fine-tuning language models from human preferences (2020), arXiv:1909.08593 [cs.CL]

[41]

Q. Liu, Z. Song, Y. Liang, Z. Xie, S. Zhang, J. Zhang, and Y. Li, Corlhf: Reinforcement learning from human feedback with cooperative policy-reward optimization for LLMs, Expert Systems with Applications 301, 130113 (2026)

[42]

N. Lambert, Reinforcement learning from human feedback (2026), arXiv:2504.12501 [cs.LG]

[43]

J. Yao, L. Lin, and M. Bukov, Reinforcement learning for many-body ground-state preparation inspired by counterdiabatic driving, Physical Review X 11, 031070 (2021)

[44]

P. Peng, X. Huang, C. Yin, L. Joseph, C. Ramanathan, and P. Cappellaro, Deep reinforcement learning for quantum Hamiltonian engineering, Physical Review Applied 18, 024033 (2022)

[45]

Y. Wang and D. A. Mazziotti, Quantum many-body simulations from a reinforcement-learned exponential ansatz, Physical Review A 112, 022403 (2025)

[46]

M. Bukov, A. G. R. Day, D. Sels, P. Weinberg, A. Polkovnikov, and P. Mehta, Reinforcement learning in different phases of quantum control, Physical Review X 8, 031086 (2018)

[47]

T. Fösel, P. Tighineanu, T. Weiss, and F. Marquardt, Reinforcement learning with neural networks for quantum feedback, Physical Review X 8, 031084 (2018)

[48]

M. Y. Niu, S. Boixo, V. N. Smelyanskiy, and H. Neven, Universal quantum control through deep reinforcement learning, npj Quantum Information 5, 33 (2019)

[49]

K. Reuer, J. Landgraf, T. Fösel, J. O’Sullivan, L. Beltrán, A. Akin, G. J. Norris, A. Remm, M. Kerschbaum, J.-C. Besse, F. Marquardt, A. Wallraff, and C. Eichler, Realizing a deep reinforcement learning agent for real-time quantum feedback, Nature Communications 14, 7138 (2023)

[50]

M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information: 10th Anniversary Edition (Cambridge University Press, 2012)

[51]

S. B. Bravyi and A. Y. Kitaev, Fermionic quantum computation, Annals of Physics 298, 210 (2002)

[52]

V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, Playing Atari with deep reinforcement learning (2013), arXiv:1312.5602 [cs.LG]

[53]

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, Human-level control through deep reinforcement learning, Nature 518, 529–533 (2015)

[54]

Z. Wang, T. Schaul, M. Hessel, H. Van Hasselt, M. Lanctot, and N. De Freitas, Dueling network architectures for deep reinforcement learning, in Proceedings of the 33rd International Conference on Machine Learning - Volume 48, ICML’16 (JMLR.org, 2016) p. 1995–2003

[55]

H. Nakatsuji, Equation for the direct determination of the density matrix: Time-dependent density equation and perturbation theory, Theoretical Chemistry Accounts: Theory, Computation, and Modeling (Theoretica Chimica Acta) 102, 97–104 (1999)

[56]

M. Rose and D. A. Mazziotti, Many-body time evolution from a correlation-efficient quantum algorithm (2025), arXiv:2511.13871 [cs.AI]

[57]

C. Cianci, L. F. Santos, and V. S. Batista, Subspace-search quantum imaginary time evolution for excited state computations, Journal of Chemical Theory and Computation 20, 8940 (2024)

[58]

Q. Sun, X. Zhang, S. Banerjee, P. Bao, M. Barbry, N. S. Blunt, N. A. Bogdanov, G. H. Booth, J. Chen, Z.-H. Cui, J. J. Eriksen, Y. Gao, S. Guo, J. Hermann, M. R. Hermes, K. Koh, P. Koval, S. Lehtola, Z. Li, J. Liu, N. Mardirossian, J. D. McClain, M. Motta, B. Mussard, H. Q. Pham, A. Pulkin, W. Purwanto, P. J. Robinson, E. Ronca, E. R. Sayfutyarova, M. Sc...

work page 2020

[59]

A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, PyTorch: an imperative style, high-performance deep learning library, in Proceedings of the 33rd International Confer...

work page 2019

[60]

C. Feniou, O. Adjoua, B. Claudon, J. Zylberman, E. Giner, and J.-P. Piquemal, Sparse quantum state preparation for strongly correlated systems, The Journal of Physical Chemistry Letters 15, 3197–3205 (2024)

[61]

J. Iaconis, S. Johri, and E. Y. Zhu, Quantum state preparation of normal distributions using matrix product states, npj Quantum Information 10, 15 (2024)

[62]

L. H. Delgado-Granados, T. J. Krogmeier, L. M. Sager-Smith, I. Avdic, Z. Hu, M. Sajjan, M. Abbasi, S. E. Smart, P. Narang, S. Kais, A. W. Schlimgen, K. Head-Marsden, and D. A. Mazziotti, Quantum algorithms and applications for open quantum systems, Chemical Reviews 125, 1823 (2025)

[63]

S. Warren, Y. Wang, C. L. Benavides-Riveros, and D. A. Mazziotti, Quantum algorithm for polaritonic chemistry based on an exact ansatz, Quantum Science and Technology 10, 02LT02 (2025)