AutoQResearch: LLM-Guided Closed-Loop Policy Search for Adaptive Variational Quantum Optimization

Hoong Chuin LAU; Monit Sharma

arxiv: 2604.24283 · v1 · submitted 2026-04-27 · 🪐 quant-ph

AutoQResearch: LLM-Guided Closed-Loop Policy Search for Adaptive Variational Quantum Optimization

Monit Sharma , Hoong Chuin LAU This is my paper

Pith reviewed 2026-05-08 04:09 UTC · model grok-4.3

classification 🪐 quant-ph

keywords variational quantum optimizationLLM-guided searchadaptive policy searchMaximum Independent SetCapacitated Vehicle Routing Problemclosed-loop experimentationstaged evaluation

0 comments

The pith

LLM-guided closed-loop search discovers adaptive policies for variational quantum optimization that outperform static baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework called AutoQResearch that uses a large language model to search for adaptive solver-control policies in variational quantum algorithms. Rather than fixing a single configuration of solver family, ansatz, objective, and optimizer, the system lets the LLM edit policies that respond to runtime diagnostics like feasibility and convergence. Candidates are first tested with cheap scout evaluations and only the best advance to full runs, guarding against unreliable rankings from proxies. On Maximum Independent Set instances from 16 to 64 vertices, the discovered policies beat static baselines and exhibit scale-dependent preferences, with CVaR objectives working at small sizes and qubit compression at larger ones. Similar adaptations in sampling budgets, penalties, and repair protocols produce high-quality solutions for Capacitated Vehicle Routing problems up to 13 customers.

Core claim

AutoQResearch casts variational quantum algorithm configuration for combinatorial optimization as sequential policy search over a design space. An LLM agent iteratively edits a small policy surface conditioned on diagnostics, with a staged evaluation harness that screens candidates via cheap scouts before full confirmation. This process yields policies that adapt to instance scale and problem type, substantially outperforming static baselines on MIS and CVRP benchmarks.

What carries the argument

LLM agent that edits a policy surface under a fixed evaluation harness, with cheap scout evaluations screening candidates before promotion to full confirmation runs.

If this is right

CVaR objectives are effective at small scale for MIS while QRAO-based qubit compression provides better scaling.
Adaptive policies for CVRP adjust sampling budget, penalty design, and hybrid repair protocols to achieve high-quality solutions on training curricula and held-out benchmarks.
Staged confirmation is essential because cheap proxy evaluations can materially misestimate policy quality and invert rankings.
The framework enables autonomous discovery of solver configurations without requiring continuous expert input.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach suggests that adaptive policies conditioned on diagnostics may be necessary as problem sizes grow beyond current scales.
Applying the same closed-loop search to other variational quantum tasks could automate configuration in quantum machine learning or simulation.
Improving the correlation between scout and full evaluations would strengthen the reliability of the selection process.
The observed scale-dependent behavior implies that future quantum optimization may benefit from policies that change with instance size rather than remaining fixed.

Load-bearing premise

Cheap scout evaluations rank candidate policies in the same order as more expensive full confirmation runs would.

What would settle it

Performing full confirmation runs on a larger set of candidates and observing that promoted policies perform worse than some discarded ones or static baselines on the test instances.

Figures

Figures reproduced from arXiv: 2604.24283 by Hoong Chuin LAU, Monit Sharma.

**Figure 1.** Figure 1: Stage-wise MIS search trajectories. Each panel is local to one stage. Green points denote retained scout candidates, gray points denote discarded view at source ↗

**Figure 2.** Figure 2: MIS curriculum overview comparing best scout proxy, best confirmed view at source ↗

**Figure 3.** Figure 3: Stage-wise CVRP search trajectories. The 8-customer stage improves feasibility through increased sampling, while the 10-customer stage improves view at source ↗

**Figure 4.** Figure 4: CVRP curriculum overview showing best scout proxy, best confirmed view at source ↗

read the original abstract

Configuring variational quantum algorithms for combinatorial optimization remains a difficult, expert-driven process requiring coordinated choices over solver family, ansatz, objective, and optimizer. We present AutoQResearch, an LLM-guided closed-loop experimentation framework that casts this task as sequential policy search over a curated design space. Instead of a single static configuration, the framework searches for adaptive solver-control policies that condition future decisions on diagnostics such as feasibility, optimality gap, and convergence stagnation. The system operates through a structured workflow: an LLM agent edits a small policy surface under a fixed evaluation harness, candidate policies are screened using cheap scout evaluations, and only the strongest candidates are promoted to full confirmation. This enables controlled autonomous exploration while guarding against proxy overfitting and unstable selection. We evaluate the framework on Maximum Independent Set (MIS) and the Capacitated Vehicle Routing Problem (CVRP). On MIS instances (16--64 vertices), discovered policies substantially outperform static baselines and reveal scale-dependent behavior: CVaR objectives are effective at small scale, while QRAO-based qubit compression provides the most effective explored scaling path. On CVRP curricula (8--12 customers) and a held-out E-n13-k4 benchmark, the framework discovers adaptations involving sampling budget, penalty design, and hybrid repair protocols, yielding high-quality solutions. Methodologically, we find that staged confirmation is essential: cheap proxy evaluations can materially misestimate policy quality and even invert candidate rankings. Overall, the paper positions AutoQResearch as a benchmarked quantum--GenAI co-design workflow for autonomous solver discovery in variational quantum optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

AutoQResearch gives a practical LLM-guided framework for adaptive variational quantum policies on MIS and CVRP, but the staged scout-confirm process lacks the correlation data needed to support the outperformance claims.

read the letter

The main thing to know is that this paper describes AutoQResearch, an LLM-driven loop that searches for adaptive policies to configure variational quantum solvers instead of locking in one static setup, and it reports scale-dependent behaviors on MIS instances from 16 to 64 vertices plus CVRP curricula up to a held-out benchmark. The workflow has the LLM edit policies over a design space, screens candidates with cheap scout runs, and promotes only the stronger ones to full confirmation runs that use quantum diagnostics such as feasibility and convergence stagnation. This produces policies that switch objectives or adjust sampling and penalties depending on problem size, with CVaR favored at small scales and QRAO-based compression emerging as the better scaling path. On CVRP the system also discovers hybrid repair protocols that yield usable solutions. The combination of LLM editing with staged filtering and quantum-specific conditioning is the concrete new piece; it moves past purely static or black-box tuning by making the policy responsive to runtime signals. The paper does a solid job laying out an implementable workflow that could lower the barrier for experimenting with these solvers on graph and routing tasks. The central soft spot is the validation of the staged filter itself. The description correctly notes that cheap scouts can misestimate quality and invert rankings, yet no correlation figures, inversion rates, or misestimation magnitudes are supplied between scout scores and the full confirmation outcomes that underpin the reported gains. Without those numbers it remains possible that some of the observed advantage traces to selection artifacts rather than genuine policy superiority. The summary also omits specific deltas, instance counts, and error bars, which makes the size of the improvements hard to judge. This work is aimed at quantum optimization researchers who want to automate configuration for combinatorial problems and at hybrid quantum-classical groups open to GenAI assistance. A reader looking for a documented workflow and scale-dependent observations will get value from it. It deserves a serious referee because the core idea is grounded and the target problems are practical; I would send it to review with the expectation that the authors add the missing scout-to-full correlation metrics and the concrete performance numbers.

Referee Report

2 major / 1 minor

Summary. The paper introduces AutoQResearch, an LLM-guided closed-loop policy search framework for adaptive variational quantum optimization. It casts VQA configuration as sequential search over a design space of solver family, ansatz, objective, and optimizer, using an LLM agent to edit policies conditioned on diagnostics such as feasibility and optimality gap. Candidate policies are screened via cheap scout evaluations before full confirmation runs. On MIS instances (16-64 vertices) the framework claims discovered policies substantially outperform static baselines and exhibit scale-dependent behavior (CVaR effective at small scale, QRAO qubit compression for scaling). On CVRP curricula (8-12 customers) plus held-out E-n13-k4 it reports adaptations in sampling budget, penalty design, and hybrid repair that yield high-quality solutions. The abstract emphasizes that staged confirmation is essential because proxies can misestimate quality and invert rankings.

Significance. If the empirical claims are supported by detailed quantitative evidence, the work could meaningfully advance autonomous co-design of variational quantum algorithms by demonstrating that LLM-guided search can discover adaptive, scale-dependent policies that outperform hand-tuned static baselines. The staged scout-plus-confirmation workflow and explicit acknowledgment of proxy limitations address practical reproducibility concerns in quantum optimization. The reported scale-dependent strategy shifts (CVaR vs. QRAO) are potentially valuable for guiding future hardware-aware solver design.

major comments (2)

[Abstract] Abstract: the central claim that discovered policies 'substantially outperform static baselines' on MIS (16--64 vertices) is presented without numerical deltas, error bars, instance counts, or explicit baseline definitions, preventing quantitative assessment of the reported gains or their statistical reliability.
[Abstract] Abstract: the assertion that 'staged confirmation is essential' because 'cheap proxy evaluations can materially misestimate policy quality and even invert candidate rankings' is not accompanied by any quantitative validation (Spearman/Pearson correlation, inversion rate, or misestimation magnitude) between scout scores and the full confirmation runs that underpin the outperformance claims. This validation is load-bearing for the reliability of the promoted policies.

minor comments (1)

[Abstract] The abstract refers to 'a curated design space' and 'small policy surface' without specifying their cardinality or contents; adding a brief enumeration or table would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. We agree that the abstract would benefit from additional quantitative detail to support the summarized claims, and we address each major comment below. The main text already contains the supporting experimental evidence, but we will revise the abstract accordingly.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that discovered policies 'substantially outperform static baselines' on MIS (16--64 vertices) is presented without numerical deltas, error bars, instance counts, or explicit baseline definitions, preventing quantitative assessment of the reported gains or their statistical reliability.

Authors: The manuscript body (Sections 4.1–4.2 and associated tables/figures) reports the requested details: performance deltas with error bars across repeated runs, the exact number of MIS instances evaluated at each scale (16–64 vertices), and explicit definitions of the static baselines (fixed configurations of ansatz, objective, and optimizer). To address the concern and improve self-containment of the abstract, we will revise it to incorporate key quantitative highlights from these results. revision: yes
Referee: [Abstract] Abstract: the assertion that 'staged confirmation is essential' because 'cheap proxy evaluations can materially misestimate policy quality and even invert candidate rankings' is not accompanied by any quantitative validation (Spearman/Pearson correlation, inversion rate, or misestimation magnitude) between scout scores and the full confirmation runs that underpin the outperformance claims. This validation is load-bearing for the reliability of the promoted policies.

Authors: The manuscript provides empirical support for this methodological finding through explicit analysis of scout-versus-confirmation discrepancies in Section 3.3 and the experimental results. We agree that summarizing the quantitative validation (e.g., observed correlations and inversion rates) directly in the abstract will strengthen the presentation. We will revise the abstract to include a concise statement of these metrics. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical policy search framework is self-contained

full rationale

The paper presents AutoQResearch as an LLM-guided experimental workflow for discovering adaptive policies in variational quantum optimization, with results reported from direct evaluations on MIS (16-64 vertices) and CVRP instances. No derivation chain, first-principles prediction, or mathematical identity is claimed; the central claims rest on empirical outperformance of discovered policies versus static baselines, without any reduction to fitted parameters defined inside the paper or self-citation load-bearing steps. The staged scout/confirmation filter is described as a methodological safeguard whose correlation is not quantified, but this is a validation gap rather than circularity. The framework is therefore self-contained as an empirical search procedure.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The workflow implicitly assumes a curated design space exists and that LLM edits remain within valid policy syntax.

axioms (1)

domain assumption LLM agent produces syntactically valid and semantically relevant policy edits within the curated design space
The closed-loop workflow depends on this capability to generate candidate policies.

pith-pipeline@v0.9.0 · 5588 in / 1203 out tokens · 71410 ms · 2026-05-08T04:09:35.252525+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

[1]

Love, Al ´an Aspuru-Guzik, and Jeremy L

Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J. Love, Al ´an Aspuru-Guzik, and Jeremy L. O’Brien. A variational eigenvalue solver on a photonic quantum processor.Nature Communications, 5(1), July 2014

work page 2014
[2]

A quantum approximate optimization algorithm, 2014

Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum approximate optimization algorithm, 2014

work page 2014
[3]

Barkoutsos, Giacomo Nannicini, Anton Robert, Ivano Tavernelli, and Stefan Woerner

Panagiotis Kl. Barkoutsos, Giacomo Nannicini, Anton Robert, Ivano Tavernelli, and Stefan Woerner. Improving variational quantum opti- mization using cvar.Quantum, 4:256, April 2020

work page 2020
[4]

Egger, Jakub Mare ˇcek, and Stefan Woerner

Daniel J. Egger, Jakub Mare ˇcek, and Stefan Woerner. Warm-starting quantum optimization.Quantum, 5:479, June 2021

work page 2021
[5]

Lotshaw, James Ostrowski, Travis S

Rebekah Herrman, Phillip C. Lotshaw, James Ostrowski, Travis S. Hum- ble, and George Siopsis. Multi-angle quantum approximate optimization algorithm, 2021

work page 2021
[6]

Glick, Takashi Imamichi, Toshinari Itoko, Richard J

Bryce Fuller, Charles Hadfield, Jennifer R. Glick, Takashi Imamichi, Toshinari Itoko, Richard J. Thompson, Yang Jiao, Marna M. Kagele, Adriana W. Blom-Schieber, Rudy Raymond, and Antonio Mezzacapo. Approximate solutions of combinatorial problems via quantum relax- ations, 2021

work page 2021
[7]

Patti, Diego Garc ´ıa-Mart´ın, Giancarlo Camilo, Anima Anandkumar, and Leandro Aolita

Marco Sciorilli, Lucas Borges, Taylor L. Patti, Diego Garc ´ıa-Mart´ın, Giancarlo Camilo, Anima Anandkumar, and Leandro Aolita. Towards large-scale quantum optimization solvers with few qubits.Nature Communications, 16(1), January 2025

work page 2025
[8]

Adams, and Nando de Freitas

Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P. Adams, and Nando de Freitas. Taking the human out of the loop: A review of bayesian optimization.Proceedings of the IEEE, 104(1):148–175, 2016

work page 2016
[9]

Boiko, Robert MacKnight, Ben Kline, and Gabe Gomes

Daniil A. Boiko, Robert MacKnight, Ben Kline, and Gabe Gomes. Autonomous chemical research with large language models.Nature, 624(7992):570–578, December 2023

work page 2023
[10]

The ai scientist: Towards fully automated open-ended scientific discovery, 2024

Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The ai scientist: Towards fully automated open-ended scientific discovery, 2024

work page 2024
[11]

Mlagentbench: Evaluating language agents on machine learning experimentation, 2024

Qian Huang, Jian V ora, Percy Liang, and Jure Leskovec. Mlagentbench: Evaluating language agents on machine learning experimentation, 2024

work page 2024
[12]

autoresearch: Ai agents running au- tonomous ml experiments

Andrej Karpathy. autoresearch: Ai agents running au- tonomous ml experiments. GitHub repository. Available at: https://github.com/karpathy/autoresearch, 2026. MIT License. Accessed: 2026-03-23

work page 2026
[13]

Garey and David S

Michael R. Garey and David S. Johnson.Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979

work page 1979
[14]

MOS- SIAM Series on Optimization

Paolo Toth and Daniele Vigo.The Vehicle Routing Problem. MOS- SIAM Series on Optimization. SIAM, Philadelphia, PA, 2002

work page 2002
[15]

Qubit-scalable cvrp via lagrangian knapsack decomposition and noise-aware quantum execution, 2026

Monit Sharma and Hoong Chuin Lau. Qubit-scalable cvrp via lagrangian knapsack decomposition and noise-aware quantum execution, 2026

work page 2026
[16]

Fisher and Ramchandran Jaikumar

Marshall L. Fisher and Ramchandran Jaikumar. A generalized assign- ment heuristic for vehicle routing.Networks, 11(2):109–124, 1981

work page 1981
[17]

A comparative study of quantum optimization techniques for solving combinatorial optimization bench- mark problems, 2025

Monit Sharma and Hoong Chuin Lau. A comparative study of quantum optimization techniques for solving combinatorial optimization bench- mark problems, 2025

work page 2025
[18]

New benchmark instances for the capacitated vehicle routing problem.European Journal of Operational Research, 257(3):845–858, 2017

Eduardo Uchoa, Diego Pecin, Artur Pessoa, Marcus Poggi, Thibaut Vidal, and Anand Subramanian. New benchmark instances for the capacitated vehicle routing problem.European Journal of Operational Research, 257(3):845–858, 2017

work page 2017

[1] [1]

Love, Al ´an Aspuru-Guzik, and Jeremy L

Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J. Love, Al ´an Aspuru-Guzik, and Jeremy L. O’Brien. A variational eigenvalue solver on a photonic quantum processor.Nature Communications, 5(1), July 2014

work page 2014

[2] [2]

A quantum approximate optimization algorithm, 2014

Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum approximate optimization algorithm, 2014

work page 2014

[3] [3]

Barkoutsos, Giacomo Nannicini, Anton Robert, Ivano Tavernelli, and Stefan Woerner

Panagiotis Kl. Barkoutsos, Giacomo Nannicini, Anton Robert, Ivano Tavernelli, and Stefan Woerner. Improving variational quantum opti- mization using cvar.Quantum, 4:256, April 2020

work page 2020

[4] [4]

Egger, Jakub Mare ˇcek, and Stefan Woerner

Daniel J. Egger, Jakub Mare ˇcek, and Stefan Woerner. Warm-starting quantum optimization.Quantum, 5:479, June 2021

work page 2021

[5] [5]

Lotshaw, James Ostrowski, Travis S

Rebekah Herrman, Phillip C. Lotshaw, James Ostrowski, Travis S. Hum- ble, and George Siopsis. Multi-angle quantum approximate optimization algorithm, 2021

work page 2021

[6] [6]

Glick, Takashi Imamichi, Toshinari Itoko, Richard J

Bryce Fuller, Charles Hadfield, Jennifer R. Glick, Takashi Imamichi, Toshinari Itoko, Richard J. Thompson, Yang Jiao, Marna M. Kagele, Adriana W. Blom-Schieber, Rudy Raymond, and Antonio Mezzacapo. Approximate solutions of combinatorial problems via quantum relax- ations, 2021

work page 2021

[7] [7]

Patti, Diego Garc ´ıa-Mart´ın, Giancarlo Camilo, Anima Anandkumar, and Leandro Aolita

Marco Sciorilli, Lucas Borges, Taylor L. Patti, Diego Garc ´ıa-Mart´ın, Giancarlo Camilo, Anima Anandkumar, and Leandro Aolita. Towards large-scale quantum optimization solvers with few qubits.Nature Communications, 16(1), January 2025

work page 2025

[8] [8]

Adams, and Nando de Freitas

Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P. Adams, and Nando de Freitas. Taking the human out of the loop: A review of bayesian optimization.Proceedings of the IEEE, 104(1):148–175, 2016

work page 2016

[9] [9]

Boiko, Robert MacKnight, Ben Kline, and Gabe Gomes

Daniil A. Boiko, Robert MacKnight, Ben Kline, and Gabe Gomes. Autonomous chemical research with large language models.Nature, 624(7992):570–578, December 2023

work page 2023

[10] [10]

The ai scientist: Towards fully automated open-ended scientific discovery, 2024

Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The ai scientist: Towards fully automated open-ended scientific discovery, 2024

work page 2024

[11] [11]

Mlagentbench: Evaluating language agents on machine learning experimentation, 2024

Qian Huang, Jian V ora, Percy Liang, and Jure Leskovec. Mlagentbench: Evaluating language agents on machine learning experimentation, 2024

work page 2024

[12] [12]

autoresearch: Ai agents running au- tonomous ml experiments

Andrej Karpathy. autoresearch: Ai agents running au- tonomous ml experiments. GitHub repository. Available at: https://github.com/karpathy/autoresearch, 2026. MIT License. Accessed: 2026-03-23

work page 2026

[13] [13]

Garey and David S

Michael R. Garey and David S. Johnson.Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979

work page 1979

[14] [14]

MOS- SIAM Series on Optimization

Paolo Toth and Daniele Vigo.The Vehicle Routing Problem. MOS- SIAM Series on Optimization. SIAM, Philadelphia, PA, 2002

work page 2002

[15] [15]

Qubit-scalable cvrp via lagrangian knapsack decomposition and noise-aware quantum execution, 2026

Monit Sharma and Hoong Chuin Lau. Qubit-scalable cvrp via lagrangian knapsack decomposition and noise-aware quantum execution, 2026

work page 2026

[16] [16]

Fisher and Ramchandran Jaikumar

Marshall L. Fisher and Ramchandran Jaikumar. A generalized assign- ment heuristic for vehicle routing.Networks, 11(2):109–124, 1981

work page 1981

[17] [17]

A comparative study of quantum optimization techniques for solving combinatorial optimization bench- mark problems, 2025

Monit Sharma and Hoong Chuin Lau. A comparative study of quantum optimization techniques for solving combinatorial optimization bench- mark problems, 2025

work page 2025

[18] [18]

New benchmark instances for the capacitated vehicle routing problem.European Journal of Operational Research, 257(3):845–858, 2017

Eduardo Uchoa, Diego Pecin, Artur Pessoa, Marcus Poggi, Thibaut Vidal, and Anand Subramanian. New benchmark instances for the capacitated vehicle routing problem.European Journal of Operational Research, 257(3):845–858, 2017

work page 2017