pith.

arxiv: 2607.00939 · v1 · pith:7A6POPENnew · submitted 2026-07-01 · 💻 cs.SE · quant-ph

Leveraging LLM-Based Agentic Systems to Generate Quantum Applications for Test Optimization

Pith reviewed 2026-07-02 08:25 UTC · model grok-4.3

classification 💻 cs.SE quant-ph
keywords: LLM-based agents · quantum applications · test optimization · multi-agent systems · code generation · software engineering · quantum computing

The pith

QPipe uses an LLM-based multi-agent system to convert natural-language requirements into executable quantum applications for test optimization, reaching 100% code compilation and 96.7% execution success on 20 benchmark requirements.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents QPipe, a multi-agent architecture based on large language models (LLMs) that translates natural-language task requirements into quantum computing applications for software test optimization. With agents for parsing, formulation, code generation, review, execution, and verification, the system aims to reduce the need for specialized quantum expertise. An evaluation on 20 requirements drawn from real-world benchmarks reports 100% code compilation and 96.7% execution success, and among the applications that execute successfully, the returned solutions outperform a genetic-algorithm baseline in most cases. Generation costs average 260.1 seconds and 1.89 million tokens per requirement. If these results hold more broadly, the approach could make quantum methods more accessible for optimization problems in software engineering.
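The summary leaves implicit how a test-optimization task becomes something a quantum program can solve. Related work in the reference list, such as QAOA for test case optimization [27] and test case minimization with quantum annealers [28], encodes such tasks as QUBO (quadratic unconstrained binary optimization) problems. As an illustration only, and not as QPipe's actual formulation output, here is a budgeted test-selection QUBO: binary variable x_i selects test i, a hypothetical weight w_i (for example a historical failure rate) rewards selecting it, and a penalty A(sum_i x_i - k)^2 enforces a budget of exactly k tests. Expanding with x_i^2 = x_i gives diagonal terms -w_i + A(1 - 2k), off-diagonal terms 2A, and a constant A k^2. The exhaustive solver is a stand-in for QAOA or annealing and only works for tiny n.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

Qubo = Dict[Tuple[int, int], float]


def build_selection_qubo(weights: Sequence[float], k: int, penalty: float) -> Tuple[Qubo, float]:
    """Upper-triangular QUBO for: maximize sum(w_i x_i) subject to sum(x_i) == k.

    Energy(x) = -sum_i w_i x_i + penalty * (sum_i x_i - k)**2, expanded using x_i**2 == x_i.
    Illustrative formulation only; not taken from the paper.
    """
    n = len(weights)
    q: Qubo = {}
    for i in range(n):
        # Linear part: reward for selecting test i plus the linear piece of the penalty.
        q[(i, i)] = -weights[i] + penalty * (1 - 2 * k)
        for j in range(i + 1, n):
            # Quadratic part of the penalty: 2 * A * x_i * x_j for every pair i < j.
            q[(i, j)] = 2 * penalty
    return q, penalty * k * k  # constant offset from expanding the square


def energy(q: Qubo, offset: float, x: Sequence[int]) -> float:
    """Evaluate the QUBO energy of a 0/1 assignment x."""
    return offset + sum(c * x[i] * x[j] for (i, j), c in q.items())


def brute_force_minimum(q: Qubo, offset: float, n: int) -> Tuple[int, ...]:
    """Exhaustive search over all 2**n assignments (a stand-in for QAOA/annealing)."""
    return min(product((0, 1), repeat=n), key=lambda x: energy(q, offset, x))
```

For weights [0.9, 0.1, 0.5, 0.7], k = 2 and penalty 2.0, the minimum is (1, 0, 0, 1), the two highest-weight tests, with energy -1.6. Choosing the penalty larger than the largest weight keeps budget violations from ever paying off.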

Core claim

QPipe autonomously transforms natural-language requirements into traceable quantum-application workflows via specialized agents for requirement parsing, formulation, code generation, review, execution, and verification. Across 20 NL requirements linked to real-world test-optimization benchmarks, it achieves average rates of 100% for code compilation and 96.7% for application execution and result combination. Among the applications that execute successfully, the returned solutions outperform an offline genetic-algorithm baseline in most cases, at average costs of 260.1 seconds and 1.89 million tokens per requirement. Ablation studies indicate that the system's performance depends on code-generation skills, task knowledge, review feedback, and multi-agent decomposition.

What carries the argument

QPipe, an LLM-based multi-agent architecture with agents specialized in requirement parsing, quantum formulation, code generation, review, execution, and verification.
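The staged architecture is easier to picture as code. Below is a minimal sketch, assuming each agent can be modelled as a plain Python callable over a shared state object; in QPipe the agents are LLM-backed, and the names `Artifact`, `run_pipeline`, and `max_repairs` are hypothetical. It shows the forward pass and the review-and-repair loop that the Figure 1 caption describes; the Execution Sandbox and Combination Agent/Script are folded into a single `execute` step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Artifact:
    """Hypothetical state object handed from agent to agent."""
    requirement: str
    spec: dict = field(default_factory=dict)            # parsed task and formulation
    code: str = ""                                      # generated quantum program
    feedback: List[str] = field(default_factory=list)   # issues raised by the reviewer
    result: Optional[object] = None                     # execution output
    trace: List[str] = field(default_factory=list)      # audit log of completed steps


Step = Callable[[Artifact], Artifact]


def run_pipeline(requirement: str, parse: Step, formulate: Step, generate: Step,
                 review: Callable[[Artifact], List[str]], execute: Step,
                 verify: Callable[[Artifact], bool], max_repairs: int = 3) -> Artifact:
    art = Artifact(requirement)
    # Forward pass: parse -> formulate -> generate, logging each stage for traceability.
    for name, step in (("parse", parse), ("formulate", formulate), ("generate", generate)):
        art = step(art)
        art.trace.append(name)
    # Review loop: an empty issue list passes; otherwise regenerate with the feedback attached.
    for attempt in range(max_repairs + 1):
        art.feedback = review(art)
        art.trace.append("review")
        if not art.feedback:
            break
        if attempt == max_repairs:
            raise RuntimeError(f"review still failing after {max_repairs} repairs: {art.feedback}")
        art = generate(art)
        art.trace.append("repair")
    art = execute(art)
    art.trace.append("execute")
    if not verify(art):
        raise RuntimeError("verification failed")
    art.trace.append("verify")
    return art
```

With stub agents in place of LLM calls, for example a reviewer that rejects the first draft, the trace reads parse, formulate, generate, review, repair, review, execute, verify. Such a log is one simple way to obtain the traceability the paper claims; the paper's own mechanism is not described in this summary.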

If this is right

  • QPipe completes quantum-application generation with 100% code compilation and 96.7% execution success across the 20 requirements.
  • Among the applications that execute successfully, the returned solutions outperform the offline genetic algorithm baseline in most cases.
  • The advantage of QPipe depends on retaining code-generation skills, task knowledge, review feedback, and multi-agent decomposition.
  • Average generation requires 260.1 seconds and 1.89M tokens per requirement.
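The per-requirement outcomes behind these averages are not listed here. Assuming each requirement is run several times and an "average rate" is the mean of per-requirement success fractions (an assumption, since the summary does not define it), the aggregate and the spread that the referee report asks for can be computed as follows. The function name `rate_summary` is hypothetical, and the data in the usage note is a toy example, not the paper's.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def rate_summary(runs_by_requirement: Sequence[Sequence[bool]]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-requirement success rates.

    Each inner sequence holds the pass/fail outcome of every run for one requirement.
    """
    # Success fraction per requirement (True counts as 1, False as 0).
    rates: List[float] = [sum(runs) / len(runs) for runs in runs_by_requirement]
    # The sample standard deviation needs at least two requirements.
    spread = stdev(rates) if len(rates) > 1 else 0.0
    return mean(rates), spread
```

For two toy requirements with outcomes [True, True, True] and [True, True, False], `rate_summary` returns a mean of about 0.833 and a sample standard deviation of about 0.236.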

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The method might generalize to other optimization tasks in software engineering, provided the benchmark requirements are representative of real-world ones.
  • Performance could vary with different quantum hardware back-ends not tested in the current evaluation.
  • Token and time costs might be reduced through improved agent coordination or model choices.
  • This opens the possibility of fully automated pipelines for quantum-assisted software testing in industrial settings.

Load-bearing premise

That the 20 natural-language requirements chosen from existing benchmarks represent the full range of real-world test-optimization tasks, and that the agents produce correct outputs without any human intervention across different quantum hardware.

What would settle it

Running QPipe on a fresh collection of natural language requirements from additional test-optimization benchmarks and observing execution success below 90 percent or baseline outperformance in fewer than half the cases would challenge the reported results.
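The criterion above is a two-condition rule. Written out as a function (the name and signature are hypothetical; "fewer than half" is read as strictly fewer, so exactly half does not count as a challenge):

```python
def would_challenge(executed: int, runs: int, wins: int, comparisons: int,
                    exec_threshold: float = 0.90) -> bool:
    """Return True if a replication meets either challenge condition.

    executed/runs: successful executions on the fresh requirements.
    wins/comparisons: cases where QPipe's solution beat the baseline.
    """
    if runs <= 0 or comparisons <= 0:
        raise ValueError("runs and comparisons must be positive")
    low_execution = executed / runs < exec_threshold
    few_wins = 2 * wins < comparisons  # "fewer than half", kept in integer arithmetic
    return low_execution or few_wins
```

For example, 18 of 20 executions with 6 wins in 10 comparisons does not challenge the results, while 17 of 20 executions does.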

Figures

Figures reproduced from arXiv: 2607.00939 by Aitor Arrieta Marcos, Man Zhang, Ming Tao, Tao Yue, Yuechen Li.

Figure 1. Overview of QPipe. … implementations from the blueprint and encoding. Next, Review Agent inspects the artifacts from previous agents and requests repairs when it detects inconsistencies, execution risks, etc. After review passes, the generated quantum application is executed in Execution Sandbox, and Combination Agent/Script combines decomposed subproblem outputs when needed. In the end, Verification Agent …
Figure 2. Relation between lengths of 20 requirements and RQ1 metrics.
Figure 3. Distribution of QRR across successful generations of 10 TCS and 10 TCM requirements.
Figure 4. Pie charts for six patterns observed in 58 successful runs of …
Figure 5. Ablation results: the left part reports pass@1 for …
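Figure 5 reports pass@1. The exact definition used in the paper is not reproduced in this summary; assuming it follows the unbiased estimator of Chen et al. [8], which appears in the reference list, pass@k for one requirement with n sampled generations of which c succeed is 1 - C(n - c, k) / C(n, k), and pass@1 reduces to c / n. A sketch (the function name is hypothetical):

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k samples,
    drawn without replacement from n generations with c successes, succeeds."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k draw must contain at least one success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For instance, `pass_at_k(3, 2, 1)` gives 2/3 and `pass_at_k(5, 1, 2)` gives 0.4. Exact integer arithmetic in `math.comb` keeps the binomial coefficients exact before the final division.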
Original abstract

Quantum computing is increasingly explored for software engineering (SE) optimization, but translating natural-language (NL) task-level requirements into executable quantum applications still demands substantial quantum and programming expertise. We present QPipe, a large language model (LLM)-based multi-agent architecture that autonomously turns NL requirements into traceable quantum-application workflows through specialized agents for requirement parsing, formulation, code generation, review, execution, and verification. We evaluate QPipe on 20 NL requirements, each associated with a real-world benchmark and a test-optimization problem. QPipe successfully completes the key stages of quantum-application generation across requirements, achieving average rates of 100% for code compilation and 96.7% for application execution and final-result combination, with average generation costs of 260.1 seconds and 1.89M tokens per requirement. Among the generated quantum applications that execute successfully, the returned solutions outperform the offline genetic algorithm baseline in most cases. Ablation results further show that QPipe's advantage depends on retaining code-generation skills, task knowledge, review feedback, and multi-agent decomposition. These results indicate that agentic coordination can support generation of executable quantum applications for tackling test optimization problems from real-world benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces QPipe, an LLM-based multi-agent architecture with specialized agents for parsing, formulation, code generation, review, execution, and verification that autonomously converts natural-language requirements into executable quantum applications for test-optimization problems. It reports evaluation on 20 NL requirements drawn from real-world benchmarks, claiming 100% average code-compilation success, 96.7% average execution and result-combination success, average costs of 260.1 seconds and 1.89M tokens per requirement, and outperformance versus an offline genetic-algorithm baseline in most successful cases, with ablations attributing gains to code-generation skills, task knowledge, review feedback, and multi-agent decomposition.

Significance. If the reported success rates and baseline comparisons hold under broader conditions, the work would demonstrate a practical route to reducing quantum and programming expertise barriers for SE optimization tasks, with potential impact on automated quantum-application pipelines; the ablation results on component contributions add internal validity to the agentic design.

major comments (3)
  1. [Evaluation] Evaluation (20 NL requirements): the reported aggregate success rates (100% compilation, 96.7% execution) lack per-instance error bars, variance measures, or breakdown of the four failure cases, making it impossible to assess stability or identify systematic weaknesses in the pipeline.
  2. [Evaluation] Evaluation (baseline comparison): no description is given of how the offline genetic-algorithm baseline was configured (population size, generations, mutation rates, or termination criteria), preventing reproduction or assessment of whether the reported outperformance is robust.
  3. [Evaluation] Evaluation (generalizability): all 20 requirements are drawn from existing benchmarks; the manuscript provides no experiments on novel NL inputs, different quantum hardware back-ends, or real-world test-optimization tasks outside the benchmark set, leaving the claim that QPipe supports generation "for tackling test optimization problems from real-world benchmarks" without external-validity evidence.
minor comments (2)
  1. [Abstract] The abstract states "outperform the offline genetic algorithm baseline in most cases" but does not quantify how many of the successful executions were compared or report the magnitude of improvement.
  2. [Ablation study] Ablation results are summarized at a high level; a table listing per-component success-rate drops would improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the evaluation section of our manuscript. We address each major comment below and outline the revisions we will incorporate.

  1. Referee: [Evaluation] Evaluation (20 NL requirements): the reported aggregate success rates (100% compilation, 96.7% execution) lack per-instance error bars, variance measures, or breakdown of the four failure cases, making it impossible to assess stability or identify systematic weaknesses in the pipeline.

    Authors: We agree that per-instance details would improve transparency and allow better assessment of stability. In the revised version, we will add a table presenting compilation and execution outcomes for each of the 20 requirements individually, along with standard deviation measures for the aggregate success rates and costs. This will also include a brief analysis of the four failure cases to identify any systematic patterns. revision: yes

  2. Referee: [Evaluation] Evaluation (baseline comparison): no description is given of how the offline genetic-algorithm baseline was configured (population size, generations, mutation rates, or termination criteria), preventing reproduction or assessment of whether the reported outperformance is robust.

    Authors: We acknowledge the omission of the genetic algorithm configuration details. The revised manuscript will include a complete description of the baseline setup, specifying population size, number of generations, mutation and crossover rates, selection method, and termination criteria, to support reproducibility and allow readers to evaluate the robustness of the outperformance claims. revision: yes

  3. Referee: [Evaluation] Evaluation (generalizability): all 20 requirements are drawn from existing benchmarks; the manuscript provides no experiments on novel NL inputs, different quantum hardware back-ends, or real-world test-optimization tasks outside the benchmark set, leaving the claim that QPipe supports generation "for tackling test optimization problems from real-world benchmarks" without external-validity evidence.

    Authors: The 20 requirements were selected from established benchmarks that reflect real-world test-optimization scenarios. While the current study does not include experiments on novel natural-language inputs or additional hardware back-ends, we will revise the manuscript to explicitly qualify the scope of our claims, acknowledge the limitation on external validity, and outline planned extensions for future work on novel inputs and varied back-ends. revision: partial
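The GA configuration itself is not given in this text. As a reference for what such a description has to pin down, here is a generic textbook genetic algorithm for a budgeted test-selection variant that exposes the parameters the rebuttal promises to report: population size, number of generations, crossover and mutation rates, selection method (tournament), and a termination criterion (stagnation). It is not the paper's baseline; the fitness function, the budget constraint, and every default value are assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple


def fitness(x: Sequence[int], weights: Sequence[float], budget: int) -> float:
    """Total weight of selected tests; over-budget selections score negatively."""
    size = sum(x)
    if size > budget:
        return -float(size - budget)
    return float(sum(w for w, bit in zip(weights, x) if bit))


def genetic_algorithm(weights: Sequence[float], budget: int, population_size: int = 20,
                      generations: int = 50, crossover_rate: float = 0.9,
                      mutation_rate: Optional[float] = None, tournament_size: int = 3,
                      patience: int = 10, seed: int = 0) -> Tuple[List[int], List[float]]:
    """Return the best bit vector found and the best fitness after each generation."""
    rng = random.Random(seed)
    n = len(weights)
    p_mut = mutation_rate if mutation_rate is not None else 1.0 / n

    def score(x: Sequence[int]) -> float:
        return fitness(x, weights, budget)

    def select(pop: List[List[int]]) -> List[int]:
        # Tournament selection: best individual of a random subset.
        return max(rng.sample(pop, tournament_size), key=score)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(population_size)]
    best = max(pop, key=score)[:]
    history = [score(best)]
    stale = 0
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best individual over unchanged
        while len(children) < population_size:
            a, b = select(pop), select(pop)
            if n > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n - 1)  # one-point crossover
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            # Bit-flip mutation with per-gene probability p_mut.
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
        candidate = max(pop, key=score)
        if score(candidate) > score(best):
            best, stale = candidate[:], 0
        else:
            stale += 1
        history.append(score(best))
        if stale >= patience:  # terminate after `patience` generations without improvement
            break
    return best, history
```

Because of elitism, the recorded best fitness never decreases, which makes the stagnation-based termination rule well defined.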

Circularity Check

0 steps flagged

No circularity: empirical success rates measured on external benchmarks with independent baseline


The paper presents an LLM multi-agent system (QPipe) and reports measured success rates (100% compilation, 96.7% execution) plus outperformance versus an offline GA baseline on 20 NL requirements drawn from existing benchmarks. No equations, fitted parameters, predictions, or first-principles derivations appear in the provided text. The evaluation uses external benchmarks and an independent baseline; ablation studies address internal components but do not reduce the reported metrics to self-definition or self-citation. The central claims are direct empirical observations rather than derivations that collapse to their inputs by construction. This matches the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations or new physical entities are introduced; the work rests on the empirical performance of existing LLMs and standard quantum simulators.

pith-pipeline@v0.9.1-grok · 5748 in / 1177 out tokens · 25632 ms · 2026-07-02T08:25:51.338249+00:00 · methodology


Reference graph

Works this paper leans on

42 extracted references · 18 canonical work pages · 3 internal anchors

  1. [1]

    Shaukat Ali, Tao Yue, and Rui Abreu. When software engineering meets quantum computing. Communications of the ACM, 65(4):84–88, 2022

  2. [2]

    Anonymous Authors. Qpipe workflow empirical study, July 2026. URL https://doi.org/10.5281/zenodo.21094908

  3. [3]

    Anonymous Authors. Qpipe workflow tool, July 2026. URL https://doi.org/10.5281/zenodo.21094837

  4. [4]

    Anthropic. Claude models overview. https://platform.claude.com/docs/en/about-claude/models/overview, 2026. Accessed: 2026-06-22.

  5. [5]

    Abdul Basit, Minghao Shao, Muhammad Haider Asif, Nouhaila Innan, Muhammad Kashif, Alberto Marchisio, and Muhammad Shafique. Qhackbench: Benchmarking large language models for quantum code generation using pennylane hackathon challenges. In 2025 IEEE International Conference on Quantum Artificial Intelligence (QAI), pages 316–322. IEEE, 2025

  6. [6]

    Martin Beisel, Johanna Barzen, Frank Leymann, Lavinia Stiliadou, Daniel Vietz, and Benjamin Weder. Pattern-based generation and adaptation of quantum workflows. In 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), pages 3072–3084. IEEE, 2025

  7. [7]

    Charlie Campbell, Wayne Luk, Hao Chen, and Hongxiang Fan. Enhancing LLM-based quantum code generation with multi-agent optimization and quantum error correction. In 2025 62nd ACM/IEEE Design Automation Conference, DAC. IEEE, 2025. doi: 10.1109/DAC63849.2025.11133316

  8. [8]

    Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde De Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021

  9. [9]

    Qihong Chen, Rúben Câmara, José Campos, André Souto, and Iftekhar Ahmed. The smelly eight: An empirical study on the prevalence of code smells in quantum computing. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pages 358–370. IEEE, 2023

  10. [10]

    Mirko De Vincentiis, Fabio Cassano, Alessandro Pagano, and Antonio Piccinno. Qai4ase: Quantum artificial intelligence for automotive software engineering. In Proceedings of the 1st International Workshop on Quantum Programming for Software Engineering, pages 19–21, 2022

  11. [11]

    Hyunsook Do, Sebastian Elbaum, and Gregg Rothermel. Supporting controlled experimentation with testing techniques: An infrastructure and its potential impact. Empirical Software Engineering, 10(4):405–435, 2005

  12. [12]

    Zhenxiao Fu, Fan Chen, and Lei Jiang. Qagent: An llm-based multi-agent system for autonomous openqasm programming. arXiv preprint arXiv:2508.20134, 2025. URL https://arxiv.org/abs/2508.20134

  13. [13]

    Google. Google shared dataset of test suite results. https://code.google.com/archive/p/google-shared-dataset-of-test-suite-results, 2011. Accessed: 2026-06-27

  14. [14]

    Xiaoyu Guo, Minggu Wang, and Jianjun Zhao. Quanbench: Benchmarking quantum code generation with large language models. In 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 2657–2669, 2025. doi: 10.1109/ASE63991.2025.00218

  15. [15]

    Mark Harman, S Afshin Mansouri, and Yuanyuan Zhang. Search-based software engineering: Trends, techniques and applications. ACM Computing Surveys (CSUR), 45(1):1–61, 2012

  16. [16]

    Qiang Hu, Yuejun Guo, Xiaofei Xie, Maxime Cordy, Lei Ma, Mike Papadakis, and Yves Le Traon. Test optimization in dnn testing: a survey. ACM Transactions on Software Engineering and Methodology, 33(4):1–42, 2024

  17. [17]

    Yuechen Li, Minqi Shao, Jianjun Zhao, and Qichen Wang. A methodological analysis of empirical studies in quantum software testing. ACM Transactions on Software Engineering and Methodology, 2026

  18. [18]

    Meta. Llama 4: Model Cards and Prompt Formats. https://www.llama.com/docs/model-cards-and-prompt-formats/llama4/, 2025. Accessed: 2026-06-25

  19. [19]

    Juan Manuel Murillo, Jose Garcia-Alonso, Enrique Moguel, Johanna Barzen, Frank Leymann, Shaukat Ali, Tao Yue, Paolo Arcaini, Ricardo Pérez-Castillo, Ignacio García-Rodríguez de Guzmán, et al. Quantum software engineering: Roadmap and challenges ahead. ACM Transactions on Software Engineering and Methodology, 34(5):1–48, 2025

  20. [20]

    Mauro Pezzè, Silvia Abrahão, Birgit Penzenstadler, Denys Poshyvanyk, Abhik Roychoudhury, and Tao Yue. A 2030 roadmap for software engineering. ACM Transactions on Software Engineering and Methodology, 34(5):1–55, 2025

  21. [21]

    An B. B. Pham, Hoa T. Nguyen, and Muhammad Usman. Qbuglm: An agentic benchmarking framework for llm-based quantum software debugging. arXiv preprint arXiv:2606.07314, 2026. URL https://arxiv.org/abs/2606.07314

  22. [22]

    Helge Spieker, Arnaud Gotlieb, Dusica Marijan, and Morten Mossige. Atcs data. https://bitbucket.org/HelgeS/atcs-data/src/master/, 2018. Public dataset release used for continuous integration test-case selection studies; accessed: 2026-06-27.

  23. [23]

    Antonio Trovato, Manuel De Stefano, Fabiano Pecorelli, Dario Di Nucci, and Andrea De Lucia. Reformulating regression test suite optimization using quantum annealing - an empirical study. International Journal on Software Tools for Technology Transfer, 26(6):767–780, 2024

  24. [24]

    Ilya Tyagin, Marwa H Farag, Kyle Sherbert, Karunya Shirali, Yuri Alexeev, and Ilya Safro. Qaoa-gpt: Efficient generation of adaptive and regular quantum approximate optimization algorithm circuits. In 2025 IEEE International Conference on Quantum Computing and Engineering (QCE), volume 1, pages 1505–1515. IEEE, 2025

  25. [25]

    Pablo Valle, Aitor Arrieta, and Maite Arratibel. Applying and extending the delta debugging algorithm for elevator dispatching algorithms (experience paper). Proceedings of the 32nd ACM SIGSOFT international symposium on software testing and analysis, pages 1055–1067, 2023

  26. [26]

    Sanjay Vishwakarma, Francis Harkins, Siddharth Golecha, Vishal Sharathchandra Bajpe, Nicolas Dupuis, Luca Buratti, David Kremer, Ismael Faro, Ruchir Puri, and Juan Cruz-Benito. Qiskit humaneval: An evaluation benchmark for quantum code generative models. In 2024 IEEE International Conference on Quantum Computing and Engineering (QCE), volume 1, pages 1169–...

  27. [27]

    Xinyi Wang, Shaukat Ali, Tao Yue, and Paolo Arcaini. Quantum approximate optimization algorithm for test case optimization. IEEE Transactions on Software Engineering, 50(12):3249–3264, 2024

  28. [28]

    Xinyi Wang, Asmar Muqeet, Tao Yue, Shaukat Ali, and Paolo Arcaini. Test case minimization with quantum annealers. ACM Transactions on Software Engineering and Methodology, 34(1):1–24, 2024

  29. [29]

    Xinyi Wang, Shaukat Ali, and Paolo Arcaini. Quantum artificial intelligence for software engineering: the road ahead. arXiv preprint arXiv:2505.04797, 2025. URL https://arxiv.org/abs/2505.04797

  30. [30]

    Xinyi Wang, Shaukat Ali, Paolo Arcaini, Narasimha Raghavan Veeraragavan, and Jan F Nygård. Quantum neural network classifier for cancer registry system testing: A feasibility study. ACM Transactions on Software Engineering and Methodology, 35(5):1–24, 2026

  31. [31]

    Benjamin Weder, Uwe Breitenbücher, Frank Leymann, and Karoline Wild. Integrating quantum computing into workflow modeling and execution. In 2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC), pages 279–291. IEEE, 2020. doi: 10.1109/UCC48980.2020.00046

  32. [32]

    Anyi Xu, Bangcai Lin, Bing Xue, Bingxuan Wang, Bingzheng Xu, Bochao Wu, Bowei Zhang, Chaofan Lin, Chen Dong, Chenchen Ling, et al. Deepseek-v4: Towards highly efficient million-token context intelligence. arXiv preprint arXiv:2606.19348, 2026. URL https://arxiv.org/abs/2606.19348

  33. [33]

    Yige Yang, Man Zhang, and Tao Yue. Ising-based test optimization and benchmarking. arXiv preprint arXiv:2604.10450, 2026

  34. [34]

    Shin Yoo and Mark Harman. Regression testing minimization, selection and prioritization: a survey. Software testing, verification and reliability, 22(2):67–120, 2012

  35. [35]

    Cong Yu, Valter Uotila, Shilong Deng, Qingyuan Wu, Tuo Shi, Songlin Jiang, Lei You, and Bo Zhao. Quasar: Quantum assembly code generation using tool-augmented llms via agentic rl. arXiv preprint arXiv:2510.00967, 2025. URL https://arxiv.org/abs/2510.00967

  36. [36]

    Cong Yu, Tuo Shi, Valter Uotila, Shilong Deng, Lei You, and Bo Zhao. Vista: Verifier-in-the-loop agentic reinforcement learning for quantum program synthesis. In Proceedings of the ACM Conference on AI and Agentic Systems, CAIS ’26, pages 239–252. Association for Computing Machinery, 2026. doi: 10.1145/3786335.3813148

  37. [37]

    Tao Yue and Man Zhang. Q-ready: Predictive feasibility assessment for hybrid quantum-classical applications. arXiv preprint arXiv:2606.16201, 2026. URL https://arxiv.org/abs/2606.16201

  38. [38]

    Huixiang Zhang, Mahzabeen Emu, and Salimur Choudhury. Llm-qubo: An end-to-end framework for automated qubo transformation from natural language problem descriptions. In Proceedings of the AAAI Symposium Series, volume 7, pages 411–418, 2025

  39. [39]

    Man Zhang, Yuechen Li, Tao Yue, and Kai-Yuan Cai. Empirical studies on quantum optimization for software engineering: A systematic analysis. arXiv preprint arXiv:2510.27113, 2025. URL https://arxiv.org/abs/2510.27113

  40. [40]

    Man Zhang, Yuechen Li, Tao Yue, and Kai-Yuan Cai. Quantum optimization for software engineering: A survey. ACM Trans. Softw. Eng. Methodol., May 2026. ISSN 1049-331X. doi: 10.1145/3816147. URL https://doi.org/10.1145/3816147. Just Accepted.

  41. [42]

    URL https://arxiv.org/abs/2007.07047

  42. [43]

    Jianjun Zhao. Quantum-based software engineering. arXiv preprint arXiv:2505.23674, 2025. URL https://arxiv.org/abs/2505.23674