Large Language Model-Based Agents for Software Engineering: A Survey
Pith reviewed 2026-05-17 12:29 UTC · model grok-4.3
The pith
This survey gathers 124 papers on LLM-based agents for software engineering and sorts them by software engineering tasks and agent structures.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The recent advance in Large Language Models (LLMs) has shaped a new paradigm of AI agents, i.e., LLM-based agents. Compared to standalone LLMs, LLM-based agents substantially extend the versatility and expertise of LLMs by enhancing LLMs with the capabilities of perceiving and utilizing external resources and tools. To date, LLM-based agents have been applied and shown remarkable effectiveness in Software Engineering (SE). The synergy between multiple agents and human interaction brings further promise in tackling complex real-world SE problems. In this work, we present a comprehensive and systematic survey on LLM-based agents for SE. We collect 124 papers and categorize them from two perspectives, i.e., the SE and agent perspectives.
What carries the argument
The two-perspective categorization system that organizes papers according to software engineering tasks on one side and agent architectures and interactions on the other.
If this is right
- Developers gain a structured way to find relevant work on using agents for specific SE activities like coding or testing.
- Researchers gain insight into how agent collaboration and human-in-the-loop setups can address harder problems in software development.
- The survey's open challenges point toward research on improving agent reliability and integration with existing SE tools.
Where Pith is reading between the lines
- Such a categorization might help in creating taxonomies that could be applied to LLM agents in other engineering domains beyond software.
- Future surveys could track how the field evolves by updating the paper list and reapplying the same perspectives.
- The emphasis on external resources and tools suggests potential for agents that integrate with version control systems or testing frameworks in novel ways.
Load-bearing premise
The 124 papers collected represent the main body of work in this area without major omissions and the chosen categorization from SE and agent perspectives covers the key distinctions without significant overlaps or missing categories.
What would settle it
A review of recent publications that reveals many important papers on LLM-based agents in software engineering that were not included in the survey or that do not align well with either the SE or agent perspective categories.
Original abstract
The recent advance in Large Language Models (LLMs) has shaped a new paradigm of AI agents, i.e., LLM-based agents. Compared to standalone LLMs, LLM-based agents substantially extend the versatility and expertise of LLMs by enhancing LLMs with the capabilities of perceiving and utilizing external resources and tools. To date, LLM-based agents have been applied and shown remarkable effectiveness in Software Engineering (SE). The synergy between multiple agents and human interaction brings further promise in tackling complex real-world SE problems. In this work, we present a comprehensive and systematic survey on LLM-based agents for SE. We collect 124 papers and categorize them from two perspectives, i.e., the SE and agent perspectives. In addition, we discuss open challenges and future directions in this critical domain. The repository of this survey is at https://github.com/FudanSELab/Agent4SE-Paper-List.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a survey on LLM-based agents for Software Engineering. It collects 124 papers from the literature and categorizes them using two perspectives: an SE perspective (covering tasks such as requirements, design, coding, testing, and maintenance) and an agent perspective (covering components such as perception, planning, memory, and tool use, along with multi-agent and human-in-the-loop setups). The survey also identifies open challenges and outlines future directions, accompanied by a public GitHub repository listing the papers.
Significance. If the paper collection is shown to be representative and the dual categorization is applied consistently without major gaps or overlaps, the survey would provide a useful map of an emerging interdisciplinary area. The public repository strengthens reproducibility and allows the community to extend the list. However, the overall significance is limited by the absence of a documented, reproducible selection protocol, which is a standard requirement for systematic surveys in this field.
major comments (2)
- [Section 2] Collection methodology (Section 2): The claim of a 'comprehensive and systematic survey' rests on the collection of 124 papers, yet no search strings, databases (arXiv, ACM DL, IEEE Xplore, etc.), date range, or inclusion/exclusion criteria are stated. This omission prevents verification that the sample is representative and free of venue or temporal bias.
- [Section 4] Categorization framework (Section 4): The two-perspective taxonomy is presented as the core organizational device, but the manuscript provides no explicit discussion or examples of how papers that span multiple SE tasks and agent features are assigned, nor any check for category overlap or unclassified work. Without such validation, the taxonomy's completeness and non-redundancy cannot be assessed.
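The overlap and coverage check requested in the second major comment could be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the category labels are taken loosely from the summary above, and the `Paper` record, the `audit_taxonomy` helper, and the placeholder titles are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical labels, loosely following the referee's summary of the two perspectives.
SE_TASKS = {"requirements", "design", "coding", "testing", "maintenance"}
AGENT_COMPONENTS = {"perception", "planning", "memory", "tool_use"}


@dataclass
class Paper:
    """One surveyed paper with the labels assigned under each perspective."""
    title: str
    se_tasks: set = field(default_factory=set)
    agent_components: set = field(default_factory=set)


def audit_taxonomy(papers):
    """Return (unclassified, multi_category) lists of paper titles.

    A paper is unclassified if it has no recognised label under either
    perspective, and multi-category if it has more than one label under
    at least one perspective.
    """
    unclassified, multi_category = [], []
    for paper in papers:
        tasks = paper.se_tasks & SE_TASKS
        components = paper.agent_components & AGENT_COMPONENTS
        if not tasks or not components:
            unclassified.append(paper.title)
        elif len(tasks) > 1 or len(components) > 1:
            multi_category.append(paper.title)
    return unclassified, multi_category


# Placeholder entries for illustration only; these are not papers from the survey.
sample = [
    Paper("paper-A", {"coding"}, {"planning"}),
    Paper("paper-B", {"coding", "testing"}, {"memory"}),
    Paper("paper-C", set(), {"tool_use"}),
]
unclassified, multi_category = audit_taxonomy(sample)
```

Reporting the sizes of both lists for the full collection would give readers a direct measure of how often the taxonomy leaves a paper unplaced or forces a choice between categories.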
minor comments (3)
- [Abstract] The abstract would benefit from a single sentence stating the time window of the literature search.
- [Figure 1] Figure 1 (or the taxonomy diagram) should include a small number of concrete paper examples placed in each leaf category to illustrate classification decisions.
- [Repository] The GitHub repository is a clear asset; adding a last-updated date and a brief description of how new papers will be incorporated would further improve its utility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our survey. We address each major comment below and will revise the manuscript to enhance methodological transparency and taxonomy clarity.
- Referee: [Section 2] Collection methodology (Section 2): The claim of a 'comprehensive and systematic survey' rests on the collection of 124 papers, yet no search strings, databases (arXiv, ACM DL, IEEE Xplore, etc.), date range, or inclusion/exclusion criteria are stated. This omission prevents verification that the sample is representative and free of venue or temporal bias.
  Authors: We acknowledge that the current manuscript does not provide an explicit description of the collection protocol in Section 2. In the revision, we will add a dedicated subsection detailing the search process: databases queried include arXiv, Google Scholar, ACM Digital Library, and IEEE Xplore; search strings combined terms such as 'LLM-based agent' with SE task keywords (e.g., 'requirements engineering', 'code generation', 'testing'); the time range covers January 2022 to August 2024 to capture the post-ChatGPT emergence of the topic; and inclusion criteria require papers to propose, implement, or evaluate LLM agents for concrete SE tasks, while excluding standalone LLM studies without agent or SE focus and non-English publications. This addition will allow independent verification of representativeness. The existing public GitHub repository will be updated with the full search log and paper metadata to support reproducibility.
  Revision: yes
- Referee: [Section 4] Categorization framework (Section 4): The two-perspective taxonomy is presented as the core organizational device, but the manuscript provides no explicit discussion or examples of how papers that span multiple SE tasks and agent features are assigned, nor any check for category overlap or unclassified work. Without such validation, the taxonomy's completeness and non-redundancy cannot be assessed.
  Authors: We agree that the manuscript would benefit from explicit guidance on taxonomy application. We will insert a new paragraph in Section 4 describing the assignment rules: each paper is classified by its primary SE task (determined by the core empirical contribution) and primary agent component (e.g., planning when reasoning chains dominate), with secondary aspects noted via cross-references or table footnotes. We will provide three concrete examples of multi-category papers and explain their placement. We will also state that all 124 collected papers fit within the taxonomy after review, with no unclassified items, and briefly discuss how the hierarchical structure reduces overlap. These additions will allow readers to evaluate completeness and non-redundancy.
  Revision: yes
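The selection protocol described in the first response could be turned into a screening step along these lines. This is only a sketch under stated assumptions: the date window and exclusion reasons follow the simulated rebuttal, while the `Candidate` record, its field names, and the `screen` function are hypothetical, and the two boolean fields stand in for a human screener's judgement.

```python
from dataclasses import dataclass
from datetime import date

# Window taken from the rebuttal text: January 2022 to August 2024.
WINDOW_START = date(2022, 1, 1)
WINDOW_END = date(2024, 8, 31)


@dataclass
class Candidate:
    """A search hit awaiting screening (hypothetical record layout)."""
    title: str
    published: date
    language: str          # ISO 639-1 code, e.g. "en"
    uses_agent: bool       # screener's judgement: proposes, implements, or evaluates an LLM agent
    targets_se_task: bool  # screener's judgement: addresses a concrete SE task


def screen(candidate):
    """Return (included, reason) so that every exclusion is logged with its cause."""
    if not (WINDOW_START <= candidate.published <= WINDOW_END):
        return False, "outside date range"
    if candidate.language != "en":
        return False, "non-English publication"
    if not candidate.uses_agent:
        return False, "standalone LLM study without agent focus"
    if not candidate.targets_se_task:
        return False, "no concrete SE task"
    return True, "included"


# Placeholder entry for illustration only; not a paper from the survey.
decision = screen(Candidate("paper-X", date(2023, 6, 1), "en", True, True))
```

Returning a reason alongside each decision is a deliberate choice here: a log of exclusions by cause is the kind of search record the rebuttal promises to publish in the repository.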
Circularity Check
No circularity: survey reports external literature without derivations or self-referential reductions
full rationale
This is a survey paper that collects 124 external papers from the literature and organizes them under two perspectives (SE and agent). It contains no equations, parameter fittings, predictions, or derivations that could reduce to the paper's own inputs by construction. The central claim of comprehensiveness is a descriptive assertion about the collection process rather than a mathematical or fitted result; no self-citation chain or ansatz is used to justify any quantitative output. Per the guidelines, a self-contained descriptive survey against external benchmarks receives score 0 with no steps.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The 124 papers identified through the authors' search constitute a sufficiently complete and unbiased sample of relevant LLM-agent SE research.
Forward citations
Cited by 17 Pith papers
- Empowering Autonomous Debugging Agents with Efficient Dynamic Analysis
  ADI equips AI debugging agents with function-level interaction via a new execution trace structure, raising SWE-bench Verified resolution to 63.8% at $1.28 per task and delivering 6-18% gains when added to existing agents.
- The Semi-Executable Stack: Agentic Software Engineering and the Expanding Scope of SE
  Software engineering scope expands beyond executable code to semi-executable artifacts best diagnosed by the new six-ring Semi-Executable Stack model.
- ReCodeAgent: A Multi-Agent Workflow for Language-agnostic Translation and Validation of Large-scale Repositories
  ReCodeAgent uses a multi-agent system to translate and validate large code repositories across multiple programming languages, achieving 60.8% higher test pass rates than prior neuro-symbolic and agentic methods on 11...
- Evaluating LLM-Based 0-to-1 Software Generation in End-to-End CLI Tool Scenarios
  A new benchmark for 0-to-1 CLI tool generation shows state-of-the-art LLMs achieve under 43% success rate with black-box equivalence testing against real oracles.
- An End-to-End Approach for Fixing Concurrency Bugs via SHB-Based Context Extractor
  ConFixAgent repairs diverse concurrency bugs end-to-end by using Static Happens-Before graphs to extract relevant code context for LLMs, outperforming prior tools in benchmarks.
- FLARE: Agentic Coverage-Guided Fuzzing for LLM-Based Multi-Agent Systems
  FLARE extracts specifications from multi-agent LLM code and applies coverage-guided fuzzing to achieve 96.9% inter-agent and 91.1% intra-agent coverage while uncovering 56 new failures across 16 applications.
- Beyond Resolution Rates: Behavioral Drivers of Coding Agent Success and Failure
  Large-scale trajectory analysis of 19 coding agents on 500 tasks finds that LLM choice drives outcomes more than framework design and that context-gathering plus validation behaviors improve success beyond task diffic...
- Beyond Code Snippets: Benchmarking LLMs on Repository-Level Question Answering
  StackRepoQA shows LLMs reach only moderate accuracy on multi-file Java QA tasks, with gains from graph-based retrieval but frequent reliance on verbatim answer reproduction.
- Beyond Verifiable Rewards: Rubric-Based GRM for Reinforced Fine-Tuning SWE Agents
  A rubric-based generative reward model improves reinforced fine-tuning of SWE agents by supplying richer behavioral guidance than binary terminal rewards alone.
- Revisiting DAgger in the Era of LLM-Agents
  DAgger-style training with turn-level policy interpolation raises 4B and 8B LLM agents to 27.3% and 29.8% on SWE-bench Verified, beating several larger published systems.
- Local-Splitter: A Measurement Study of Seven Tactics for Reducing Cloud LLM Token Usage on Coding-Agent Workloads
  Combining local routing with prompt compression saves 45-79% cloud tokens on edit and explanation workloads, while a fuller set including draft-review saves 51% on RAG-heavy tasks.
- EvoDev: An Iterative Feature-Driven Framework for End-to-End Software Development with LLM-based Agents
  EvoDev introduces an iterative feature-driven framework with a DAG-based Feature Map for context propagation that improves LLM agent performance on end-to-end software development tasks by 56.8% over the best baseline.
- Agentless: Demystifying LLM-based Software Engineering Agents
  Agentless, a basic three-phase LLM pipeline for bug localization, repair, and validation, outperforms complex open-source agents on SWE-bench Lite with 32% success rate at $0.70 cost.
- From Assistance to Agency: Rethinking Autonomy and Control in CI/CD Pipelines
  The central challenge in AI-augmented CI/CD is designing authority transfer from humans to agents under constraints, as current systems remain limited to bounded data-plane autonomy backed by external governance.
- Bias in the Loop: Auditing LLM-as-a-Judge for Software Engineering
  LLM judges for code tasks show high sensitivity to prompt biases that systematically favor certain options, changing accuracy and model rankings even when code is unchanged.
- An Empirical Study on Influence-Based Pretraining Data Selection for Code Large Language Models
  Data-influence-score filtering using validation-set loss on downstream coding tasks improves Code-LLM performance, with the most beneficial training data varying significantly across different programming tasks.
- LLM-Based Multi-Agent Systems for Code Generation: A Multi-Vocal Literature Review
  A review of 114 studies classifies motivations into nine categories, analyzes common models and benchmarks, synthesizes challenges into six categories with 26 subcategories and solutions, and identifies six future res...
Reference graph
Works this paper leans on
-
[1]
A Survey of Large Language Models
Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. A survey of large language models.CoRR, abs/2303.18223, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[2]
Large language models for software engineering: A systematic literature review.ACM Trans
Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John Grundy, and Haoyu Wang. Large language models for software engineering: A systematic literature review.ACM Trans. Softw. Eng. Methodol., 33(8):220:1– 220:79, 2024
work page 2024
-
[3]
Angela Fan, Beliz Gokkaya, Mark Harman, Mitya Lyubarskiy, Shubho Sengupta, Shin Yoo, and Jie M. Zhang. Large language models for software engineering: Survey and open problems. In IEEE/ACM International Conference on Software Engineering: Future of Software Engineering, ICSE-FoSE 2023, Melbourne, Australia, May 14-20, 2023, pages 31–53. IEEE, 2023
work page 2023
-
[4]
Self-collaboration code generation via chatgpt.ACM Trans
Yihong Dong, Xue Jiang, Zhi Jin, and Ge Li. Self-collaboration code generation via chatgpt.ACM Trans. Softw. Eng. Methodol., 33(7):189:1–189:38, 2024
work page 2024
-
[5]
Burak Yetistiren, Isik ¨Ozsoy, Miray Ayerdem, and Eray T ¨uz ¨un. Evaluating the code quality of ai-assisted code generation tools: An empirical study on github copilot, amazon codewhisperer, and chatgpt.CoRR, abs/2304.10778, 2023
-
[6]
To- wards enhancing in-context learning for code generation.CoRR, abs/2303.17780, 2023
Jia Li, Yunfei Zhao, Yongmin Li, Ge Li, and Zhi Jin. To- wards enhancing in-context learning for code generation.CoRR, abs/2303.17780, 2023
-
[7]
Junwei Liu, Yixuan Chen, Mingwei Liu, Xin Peng, and Yiling Lou. STALL+: boosting llm-based repository-level code comple- tion with static analysis.CoRR, abs/2406.10018, 2024. SEPTEMBER 2024 48
-
[8]
Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors,Advances in Neural Information Processing Systems 36: Annual Conference on Ne...
work page 2023
-
[9]
Yinlin Deng, Chunqiu Steven Xia, Haoran Peng, Chenyuan Yang, and Lingming Zhang. Large language models are zero-shot fuzzers: Fuzzing deep-learning libraries via large language mod- els. In Ren ´e Just and Gordon Fraser, editors,Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2023, Seattle, WA, USA, July...
work page 2023
-
[10]
Software testing with large language models: Survey, landscape, and vision.IEEE Trans
Junjie Wang, Yuchao Huang, Chunyang Chen, Zhe Liu, Song Wang, and Qing Wang. Software testing with large language models: Survey, landscape, and vision.IEEE Trans. Software Eng., 50(4):911–936, 2024
work page 2024
-
[11]
Caroline Lemieux, Jeevana Priya Inala, Shuvendu K. Lahiri, and Siddhartha Sen. Codamosa: Escaping coverage plateaus in test generation with pre-trained large language models. In45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023, pages 919–931. IEEE, 2023
work page 2023
-
[12]
Less training, more repairing please: Revisiting automated program repair via zero- shot learning
Chunqiu Steven Xia and Lingming Zhang. Less training, more repairing please: Revisiting automated program repair via zero- shot learning. In Abhik Roychoudhury, Cristian Cadar, and Miryung Kim, editors,Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Founda- tions of Software Engineering, ESEC/FSE 2022, Singa...
work page 2022
-
[13]
A quantitative and qualitative evaluation of llm-based explainable fault localization
Sungmin Kang, Gabin An, and Shin Yoo. A quantitative and qualitative evaluation of llm-based explainable fault localization. Proc. ACM Softw. Eng., 1(FSE):1424–1446, 2024
work page 2024
-
[14]
Repair is nearly generation: Multilingual program repair with llms
Harshit Joshi, Jos ´e Pablo Cambronero S ´anchez, Sumit Gulwani, Vu Le, Gust Verbruggen, and Ivan Radicek. Repair is nearly generation: Multilingual program repair with llms. In Brian Williams, Yiling Chen, and Jennifer Neville, editors,Thirty- Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Application...
work page 2023
-
[15]
Prompting is all you need: Automated android bug replay with large language models
Sidong Feng and Chunyang Chen. Prompting is all you need: Automated android bug replay with large language models. InProceedings of the 46th IEEE/ACM International Conference on Software Engineering, ICSE 2024, Lisbon, Portugal, April 14-20, 2024, pages 67:1–67:13. ACM, 2024
work page 2024
-
[16]
Auto- mated program repair in the era of large pre-trained language models
Chunqiu Steven Xia, Yuxiang Wei, and Lingming Zhang. Auto- mated program repair in the era of large pre-trained language models. In45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023, pages 1482–1494. IEEE, 2023
work page 2023
-
[17]
Impact of code language models on automated program repair
Nan Jiang, Kevin Liu, Thibaud Lutellier, and Lin Tan. Impact of code language models on automated program repair. In45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023, pages 1430–1442. IEEE, 2023
work page 2023
-
[18]
Benchmarking and enhancing LLM agents in localizing linux kernel bugs.CoRR, abs/2505.19489, 2025
Zhenhao Zhou, Zhuochen Huang, Yike He, Chong Wang, Jiajun Wang, Yijian Wu, Xin Peng, and Yiling Lou. Benchmarking and enhancing LLM agents in localizing linux kernel bugs.CoRR, abs/2505.19489, 2025
-
[20]
Alexander Shypula, Aman Madaan, Yimeng Zeng, Uri Alon, Jacob R. Gardner, Yiming Yang, Milad Hashemi, Graham Neubig, Parthasarathy Ranganathan, Osbert Bastani, and Amir Yazdan- bakhsh. Learning performance-improving code edits. InThe Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024
work page 2024
-
[21]
Ai-assisted coding: Experiments with GPT-4.CoRR, abs/2304.13187, 2023
Russell A Poldrack, Thomas Lu, and Gasper Begus. Ai-assisted coding: Experiments with GPT-4.CoRR, abs/2304.13187, 2023
-
[22]
Llm com- piler: Foundation language models for compiler optimization
Chris Cummins, Volker Seeker, Dejan Grubisic, Baptiste Roziere, Jonas Gehring, Gabriel Synnaeve, and Hugh Leather. Llm com- piler: Foundation language models for compiler optimization. In Proceedings of the 34th ACM SIGPLAN International Conference on Compiler Construction, CC ’25, page 141–153, New York, NY, USA,
-
[23]
Association for Computing Machinery
-
[24]
TransAgent: Enhancing LLM-Based Code Translation via Fine-Grained Execution Alignment
Zhiqiang Yuan, Weitong Chen, Hanlin Wang, Kai Yu, Xin Peng, and Yiling Lou. TRANSAGENT: an llm-based multi-agent sys- tem for code translation.CoRR, abs/2409.19894, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[25]
The rise and potential of large language model based agents: A survey.Sci
Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongxiang Weng, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang, Qi Zhang, and Tao Gui. Th...
work page 2025
-
[26]
Carlos H. C. Ribeiro. Reinforcement learning agents.Artif. Intell. Rev., 17(3):223–250, 2002
work page 2002
-
[27]
Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore. Reinforcement learning: A survey.J. Artif. Intell. Res., 4:237–285, 1996
work page 1996
-
[28]
Steps toward artificial intelligence.Proceedings of the IRE, 49(1):8–30, 1961
Marvin Minsky. Steps toward artificial intelligence.Proceedings of the IRE, 49(1):8–30, 1961
work page 1961
-
[29]
Charles Lee Isbell Jr., Christian R. Shelton, Michael J. Kearns, Satinder Singh, and Peter Stone. A social reinforcement learning agent. In Elisabeth Andr ´e, Sandip Sen, Claude Frasson, and J¨org P . M¨uller, editors,Proceedings of the Fifth International Con- ference on Autonomous Agents, AGENTS 2001, Montreal, Canada, May 28 - June 1, 2001, pages 377–3...
work page 2001
-
[30]
A survey on large language model based autonomous agents.Frontiers Comput
Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, and Jirong Wen. A survey on large language model based autonomous agents.Frontiers Comput. Sci., 18(6):186345, 2024
work page 2024
-
[31]
A survey on the memory mechanism of large language model based agents.ACM Trans
Zeyu Zhang, Quanyu Dai, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Jieming Zhu, Zhenhua Dong, and Ji-Rong Wen. A survey on the memory mechanism of large language model based agents.ACM Trans. Inf. Syst., July 2025. Just Accepted
work page 2025
-
[32]
1968, Brussels, Scientific Affairs Division, NATO
Peter Naur and Brian Randell.Software Engineering: Report of a conference sponsored by the NATO Science Committee, Garmisch, Germany, 7-11 Oct. 1968, Brussels, Scientific Affairs Division, NATO. 1969
work page 1968
-
[33]
Dictionary of Computer Science, Engineering and Technology
Philip A Laplante, Naoufel Werghi, Christopher Lee Kuszmavl, Chris Verhof, Brian Henderson-Sellers, Joseph L Ganley, Ian Sommerville, Amos R Omondi, Ling Guan, Marco Gori, et al. Dictionary of Computer Science, Engineering and Technology. CRC Press, 2017
work page 2017
-
[34]
Barry W. Boehm. A view of 20th and 21st century software engineering. In Leon J. Osterweil, H. Dieter Rombach, and Mary Lou Soffa, editors,28th International Conference on Software Engineering (ICSE 2006), Shanghai, China, May 20-28, 2006, pages 12–29. ACM, 2006
work page 2006
-
[35]
Chawla, Olaf Wiest, and Xiangliang Zhang
Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V . Chawla, Olaf Wiest, and Xiangliang Zhang. Large language model based multi-agents: A survey of progress and challenges. InProceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024, Jeju, South Korea, August 3-9, 2024, pages 8048–8057. ij...
work page 2024
-
[36]
Exploring large language model based intelligent agents: Definitions, methods, and prospects
Yuheng Cheng, Ceyao Zhang, Zhengwen Zhang, Xiangrui Meng, Sirui Hong, Wenhao Li, Zihao Wang, Zekai Wang, Feng Yin, Junhua Zhao, and Xiuqiang He. Exploring large language model based intelligent agents: Definitions, methods, and prospects. CoRR, abs/2401.03428, 2024
-
[37]
Augmented language models: A survey.Trans
Gr ´egoire Mialon, Roberto Dess `ı, Maria Lomeli, Christoforos Nalmpantis, Ramakanth Pasunuru, Roberta Raileanu, Bap- tiste Rozi `ere, Timo Schick, Jane Dwivedi-Yu, Asli Celikyilmaz, Edouard Grave, Yann LeCun, and Thomas Scialom. Augmented language models: A survey.Trans. Mach. Learn. Res., 2023, 2023
work page 2023
-
[38]
Yue Liu, Sin Kit Lo, Qinghua Lu, Liming Zhu, Dehai Zhao, Xiwei Xu, Stefan Harrer, and Jon Whittle. Agent design pattern catalogue: A collection of architectural patterns for foundation model based agents.J. Syst. Softw., 220:112278, 2025
work page 2025
-
[39]
Saikat Barua. Exploring autonomous agents through the lens of large language models: A review.CoRR, abs/2404.04442, 2024
-
[40]
A survey on large language models for code generation
Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. A survey on large language models for code generation. ACM Trans. Softw. Eng. Methodol., July 2025. Just Accepted
work page 2025
-
[41]
Junda He, Christoph Treude, and David Lo. Llm-based multi- agent systems for software engineering: Literature review, vision, SEPTEMBER 2024 49 and the road ahead.ACM Trans. Softw. Eng. Methodol., 34(5), May 2025
work page 2024
-
[42]
Zhang, Max Hort, Mark Harman, and Federica Sarro
Zhenpeng Chen, Jie M. Zhang, Max Hort, Mark Harman, and Federica Sarro. Fairness testing: A comprehensive survey and analysis of trends.ACM Trans. Softw. Eng. Methodol., 33(5):137:1– 137:59, 2024
work page 2024
-
[43]
Junjie Chen, Jibesh Patra, Michael Pradel, Yingfei Xiong, Hongyu Zhang, Dan Hao, and Lu Zhang. A survey of compiler testing. ACM Comput. Surv., 53(1):4:1–4:36, 2021
work page 2021
-
[44]
Find- ing trends in software research.IEEE Trans
George Mathew, Amritanshu Agrawal, and Tim Menzies. Find- ing trends in software research.IEEE Trans. Software Eng., 49(4):1397–1410, 2023
work page 2023
-
[45]
Empirical research in software engineering - A literature survey.J
Li Zhang, Jia-Hao Tian, Jing Jiang, Yi-Jun Liu, Meng-Yuan Pu, and Tao Yue. Empirical research in software engineering - A literature survey.J. Comput. Sci. Technol., 33(5):876–899, 2018
work page 2018
-
[46]
Zhang, Mark Harman, Lei Ma, and Yang Liu
Jie M. Zhang, Mark Harman, Lei Ma, and Yang Liu. Machine learning testing: Survey, landscapes and horizons.IEEE Trans. Software Eng., 48(2):1–36, 2022
work page 2022
- [47]
-
[48]
https://blog.dblp.org/2024/01/01/ 7-million-publications/, 2024
7 million publications. https://blog.dblp.org/2024/01/01/ 7-million-publications/, 2024
work page 2024
- [49]
-
[50]
Opinion mining for software development: A systematic literature review.ACM Trans
Bin Lin, Nathan Cassee, Alexander Serebrenik, Gabriele Bavota, Nicole Novielli, and Michele Lanza. Opinion mining for software development: A systematic literature review.ACM Trans. Softw. Eng. Methodol., 31(3):38:1–38:41, 2022
work page 2022
-
[51]
RWTH, Fachgruppe Informatik Aachen, 1996
Klaus Pohl.Requirements Engineering: An Overview. RWTH, Fachgruppe Informatik Aachen, 1996
work page 1996
-
[52]
Bashar Nuseibeh and Steve M. Easterbrook. Requirements engi- neering: A roadmap. In Anthony Finkelstein, editor,22nd Inter- national Conference on on Software Engineering, Future of Software Engineering Track, ICSE 2000, Limerick Ireland, June 4-11, 2000, pages 35–46. ACM, 2000
work page 2000
-
[53]
Requirements engineering: A survey.Communications on Applied Electronics, 3(5):28–31, 2015
Vivek Shukla, Dhirendra Pandey, and Raj Shree. Requirements engineering: A survey.Communications on Applied Electronics, 3(5):28–31, 2015
work page 2015
-
[54]
The unified modeling language.Unix Review, 14(13):5, 1996
Grady Booch, Ivar Jacobson, James Rumbaugh, et al. The unified modeling language.Unix Review, 14(13):5, 1996
work page 1996
-
[55]
Michael Johnson, Robert Rosebrugh, and RJ Wood. Entity- relationship-attribute designs and sketches.Theory and Applica- tions of Categories, 10(3):94–112, 2002
work page 2002
-
[56]
Alejandro Rago, Claudia A. Marcos, and J. Andr ´es D ´ıaz Pace. Uncovering quality-attribute concerns in use case specifications via early aspect mining.Requir. Eng., 18(1):67–84, 2013
work page 2013
-
[57]
Farhana Nazir, Wasi Haider Butt, Muhammad Waseem Anwar, and Muazzam Ali Khan Khattak. The applications of natural language processing (NLP) for software requirement engineering - A systematic literature review. In Kuinam Kim and Nikolai Joukov, editors,Information Science and Applications 2017 - ICISA 2017, Macau, China, 20-23 March 2017, volume 424 ofLec...
work page 2017
-
[58]
Automatically classifying user requests in crowdsourcing requirements engineering.J
Chuanyi Li, Liguo Huang, Jidong Ge, Bin Luo, and Vincent Ng. Automatically classifying user requests in crowdsourcing requirements engineering.J. Syst. Softw., 138:108–123, 2018
work page 2018
-
[59]
Advances in au- tomated support for requirements engineering: A systematic literature review.Requir
Muhammad Aminu Umar and Kevin Lano. Advances in au- tomated support for requirements engineering: A systematic literature review.Requir. Eng., 29(2):177–207, 2024
work page 2024
-
[60]
PRCBERT: prompt learning for requirement classification using bert-based pretrained language models
Xianchang Luo, Yinxing Xue, Zhenchang Xing, and Jiamou Sun. PRCBERT: prompt learning for requirement classification using bert-based pretrained language models. In37th IEEE/ACM Inter- national Conference on Automated Software Engineering, ASE 2022, Rochester, MI, USA, October 10-14, 2022, pages 75:1–75:13. ACM, 2022
work page 2022
-
[61]
Using llms in software requirements specifications: An empirical evaluation
Madhava Krishna, Bhagesh Gaur, Arsh Verma, and Pankaj Jalote. Using llms in software requirements specifications: An empirical evaluation. In Grischa Liebel, Irit Hadar, and Paola Spoletini, ed- itors,32nd IEEE International Requirements Engineering Conference, RE 2024, Reykjavik, Iceland, June 24-28, 2024, pages 475–483. IEEE, 2024
work page 2024
-
[62]
Jianzhang Zhang, Yiyang Chen, Chuang Liu, Nan Niu, and Yinglin Wang. Empirical evaluation of chatgpt on requirements information retrieval under zero-shot setting. In 2023 International Conference on Intelligent Computing and Next Generation Networks (ICNGN), pages 1–6. IEEE, 2023
-
[63]
Krishna Ronanki, Beatriz Cabrero Daniel, and Christian Berger. Chatgpt as a tool for user story quality evaluation: Trustworthy out of the box? In Philippe Kruchten and Peggy Gregory, editors, Agile Processes in Software Engineering and Extreme Programming - Workshops - XP 2022 Workshops, Copenhagen, Denmark, June 13-17, 2022, and XP 2023 Workshops, Amste...
-
[64]
Dipeeka Luitel, Shabnam Hassani, and Mehrdad Sabetzadeh. Improving requirements completeness: Automated assistance through large language models. Requir. Eng., 29(1):73–95, 2024
-
[65]
Mohammadmehdi Ataei, Hyunmin Cheong, Daniele Grandi, Ye Wang, Nigel Morris, and Alexander Tessier. Elicitron: A large language model agent-based simulation framework for design requirements elicitation. Journal of Computing and Information Science in Engineering, 25(2):021012, 01 2025
-
[66]
Lezhi Ma, Shangqing Liu, Yi Li, Xiaofei Xie, and Lei Bu. Specgen: Automated generation of formal program specifications via large language models. In 47th IEEE/ACM International Conference on Software Engineering, ICSE 2025, Ottawa, ON, Canada, April 26 - May 6, 2025, pages 16–28. IEEE, 2025
-
[67]
Chetan Arora, John Grundy, and Mohamed Abdelrazek. Advancing Requirements Engineering Through Generative AI: Assessing the Role of LLMs, pages 129–148. Springer Nature Switzerland, Cham, 2024
-
[68]
Dongming Jin, Zhi Jin, Xiaohong Chen, and Chunhui Wang. MARE: multi-agents collaboration framework for requirements engineering. CoRR, abs/2405.03256, 2024
-
[69]
David R. Cok. Openjml: JML for java 7 by extending openjdk. In Mihaela Gheorghiu Bobaru, Klaus Havelund, Gerard J. Holzmann, and Rajeev Joshi, editors, NASA Formal Methods - Third International Symposium, NFM 2011, Pasadena, CA, USA, April 18-20, 2011. Proceedings, volume 6617 of Lecture Notes in Computer Science, pages 472–479. Springer, 2011
-
[70]
Cormac Flanagan and K. Rustan M. Leino. Houdini, an annotation assistant for esc/java. In José Nuno Oliveira and Pamela Zave, editors, FME 2001: Formal Methods for Increasing Software Productivity, International Symposium of Formal Methods Europe, Berlin, Germany, March 12-16, 2001, Proceedings, volume 2021 of Lecture Notes in Computer Science, pages 5...
-
[71]
Michael D. Ernst, Jeff H. Perkins, Philip J. Guo, Stephen McCamant, Carlos Pacheco, Matthew S. Tschantz, and Chen Xiao. The daikon system for dynamic detection of likely invariants. Sci. Comput. Program., 69(1-3):35–45, 2007
-
[72]
Yaoqi Guo, Zhenpeng Chen, Jie M. Zhang, Yang Liu, and Yun Ma. Personality-guided code generation using large language models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1068–1080, Vienna, Austria, July 2025. Association for Computational Linguistics
-
[73]
Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, and Shuming Shi. Siren’s song in the AI ocean: A survey on hallucination in large language models. CoRR, abs/2309.01219, 2023
-
[74]
Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors, Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing System...
-
[75]
Vadim Liventsev, Anastasiia Grishina, Aki Härmä, and Leon Moonen. Fully autonomous programming with large language models. In Sara Silva and Luís Paquete, editors, Proceedings of the Genetic and Evolutionary Computation Conference, GECCO 2023, Lisbon, Portugal, July 15-19, 2023, pages 1146–1155. ACM, 2023
-
[76]
Theo X. Olausson, Jeevana Priya Inala, Chenglong Wang, Jianfeng Gao, and Armando Solar-Lezama. Is self-repair a silver bullet for code generation? In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11,
-
[78]
Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, and Chi Wang. Autogen: Enabling next-gen LLM applications via multi-agent conversations. In First Conference on Language Modeling, 2024
-
[79]
Hanbin Wang, Zhenghao Liu, Shuo Wang, Ganqu Cui, Ning Ding, Zhiyuan Liu, and Ge Yu. INTERVENOR: prompting the coding ability of large language models with the interactive chain of repair. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Findings of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand and vi...
-
[80]
Noble Saji Mathews and Meiyappan Nagappan. Test-driven development and llm-based code generation. In Vladimir Filkov, Baishakhi Ray, and Minghui Zhou, editors, Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, ASE 2024, Sacramento, CA, USA, October 27 - November 1, 2024, pages 1583–1594. ACM, 2024
-
[81]
Bin Lei, Yuchen Li, and Qiuwu Chen. Autocoder: Enhancing code large language model with aiev-instruct. CoRR, abs/2405.14906, 2024
-
[82]
Hung Le, Hailin Chen, Amrita Saha, Akash Gokul, Doyen Sahoo, and Shafiq Joty. Codechain: Towards modular code generation through chain of self-revisions with representative sub-modules. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024