
arxiv: 2505.19625 · v3 · submitted 2025-05-26 · 💻 cs.SE · cs.AI

Search-Based Software Engineering and AI Foundation Models: Current Landscape and Future Roadmap

Pith reviewed 2026-05-19 14:43 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords search-based software engineering · foundation models · research roadmap · software engineering lifecycle · AI integration · large language models

The pith

The paper presents a research roadmap for advancing search-based software engineering through its synergy with foundation models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to give a structured overview of how search-based software engineering (SBSE), which applies metaheuristic search to problems across the software lifecycle, interacts with foundation models such as large language models. The motivation is that these models are rapidly changing AI capabilities, and guiding how the two are combined could improve software development practice in multiple domains. The roadmap covers three directions: using foundation models to strengthen search methods, using search to improve foundation models, and integrating the two directly. It is worth a reader's time because it lays out open challenges and future directions while the relationship between the two fields is still unsettled.

Core claim

The authors claim that analyzing the current landscape reveals opportunities for foundation models to enhance search-based software engineering, for search-based methods to advance foundation models, and for integrated approaches, leading to a forward-looking perspective on their combined future in emerging domains.
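To make the first of these directions concrete, here is a minimal sketch, not taken from the paper, of a search loop whose mutation operator is pluggable. The toy `program_under_test`, the `HINTS` list, and `stub_fm_mutation` are all hypothetical: the stub only imitates a foundation model that proposes constants it might have read in the source code, and no real model API is called.

```python
import random
from typing import Callable, Tuple

Mutation = Callable[[int, random.Random], int]


def program_under_test(x: int) -> str:
    """Toy program: the hard-to-reach branch is taken only for x == 4242."""
    if x == 4242:
        return "target"
    return "other"


def branch_distance(x: int) -> int:
    """Fitness to minimise; 0 means the target branch is covered."""
    return abs(x - 4242)


def random_mutation(x: int, rng: random.Random) -> int:
    """Classic search operator: a small random perturbation."""
    return x + rng.randint(-100, 100)


# Assumption: constants a foundation model might suggest after reading the code.
HINTS = [4242]


def stub_fm_mutation(x: int, rng: random.Random) -> int:
    """Hypothetical stand-in for an FM-proposed edit; falls back to random steps."""
    if rng.random() < 0.2:
        return rng.choice(HINTS)
    return random_mutation(x, rng)


def search(mutate: Mutation, seed: int = 0, budget: int = 10_000) -> Tuple[int, int]:
    """(1+1)-style search: keep a candidate if it is no worse than the incumbent."""
    rng = random.Random(seed)
    best = rng.randint(-100_000, 100_000)
    for used in range(budget):
        if branch_distance(best) == 0:
            return best, used
        candidate = mutate(best, rng)
        if branch_distance(candidate) <= branch_distance(best):
            best = candidate
    return best, budget


if __name__ == "__main__":
    for name, op in [("random", random_mutation), ("stub FM", stub_fm_mutation)]:
        best, used = search(op, seed=1)
        print(name, program_under_test(best), "after", used, "iterations")
```

The point of the design is that the search loop and fitness function stay unchanged; only the operator is swapped, which is one plausible reading of "foundation models enhancing SBSE".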

What carries the argument

The research roadmap carries the argument: it organizes the discussion around three core aspects of SBSE and FM interaction to identify challenges and research directions.

If this is right

  • Search techniques in software engineering could be enhanced by incorporating foundation models for better solution generation and optimization.
  • Foundation models could benefit from search-based optimization in areas like model fine-tuning or architecture search.
  • Integrated systems may emerge that apply both to complex software engineering problems in new domains.
  • Future research can focus on specific challenges like scalability and domain adaptation in their synergy.
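The second bullet can be pictured with a small sketch, again not from the paper, of search applied to a fine-tuning configuration. Everything here is an assumption for illustration: the `SPACE` grid, the parameter names, and `stub_validation_score`, which stands in for the expensive step of fine-tuning a model and measuring validation quality.

```python
from typing import Callable, Dict, Iterator

Config = Dict[str, float]

# Hypothetical discrete search space for a fine-tuning run.
SPACE = {
    "learning_rate": [1e-5, 3e-5, 1e-4, 3e-4],
    "batch_size": [8, 16, 32, 64],
    "epochs": [1, 2, 3, 4],
}

# Assumption: the configuration at which the stub score peaks.
STUB_OPTIMUM = {"learning_rate": 1e-4, "batch_size": 16, "epochs": 3}


def stub_validation_score(cfg: Config) -> int:
    """Stand-in for 'fine-tune, then evaluate'; higher is better, 0 is the peak."""
    return -sum(
        abs(SPACE[k].index(cfg[k]) - SPACE[k].index(STUB_OPTIMUM[k])) for k in SPACE
    )


def neighbours(cfg: Config) -> Iterator[Config]:
    """Configurations that differ from cfg by one step along one parameter."""
    for key, values in SPACE.items():
        i = values.index(cfg[key])
        for j in (i - 1, i + 1):
            if 0 <= j < len(values):
                yield {**cfg, key: values[j]}


def hill_climb(start: Config, score: Callable[[Config], int]) -> Config:
    """Steepest-ascent hill climbing; stops when no neighbour improves the score."""
    current = start
    while True:
        best = max(neighbours(current), key=score)
        if score(best) <= score(current):
            return current
        current = best


if __name__ == "__main__":
    start = {"learning_rate": 1e-5, "batch_size": 8, "epochs": 1}
    print(hill_climb(start, stub_validation_score))
```

The stub score has a single peak, so plain hill climbing suffices here; a real evaluation would be noisy and costly, which is where the scalability challenge named in the last bullet would bite.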

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such a roadmap might encourage more interdisciplinary work between AI and software engineering communities.
  • Practical tools could develop that use foundation models to automate search-based testing or repair tasks.
  • This approach could extend to other AI techniques beyond foundation models in the long term.

Load-bearing premise

The assumption that the relationship between search-based software engineering and foundation models is still evolving and can be guided by a timely roadmap.

What would settle it

The roadmap's value would be tested by future publications or implementations that either follow the outlined directions and achieve new results, or show that the identified challenges are not the main barriers.

Figures

Figures reproduced from arXiv: 2505.19625 by Andrea Arcuri, Hassan Sartaj, Paolo Arcaini, Shaukat Ali.

Figure 1: Roadmap overview showing the discussion flow: strengths and weaknesses (Section 3), SBSE–FM synergies
Figure 2: Key aspects of the potential synergy between SBSE and FMs. The abbreviations used are: FMs (Foundation Models), SBSE (Search-Based Software Engineering), and SE (Software Engineering).
Figure 3: Key aspects of employing FMs to enhance SBSE. The abbreviations used are: FMs (Foundation Models), LLMs (Large Language Models), VLMs (Vision-Language Models), MMs (Multimodal Models), SBSE (Search-Based Software Engineering), and SE (Software Engineering).
Figure 4: Key aspects of applying SBSE to enhance FMs. The abbreviations used are: FMs (Foundation Models), SBSE (Search-Based Software Engineering), and SE (Software Engineering).
Figure 5: Integration between FMs and SBSE (two-way interactions). The abbreviations used are: FMs (Foundation Models), SBSE (Search-Based Software Engineering), SE (Software Engineering), SDLC (Software Development Lifecycle), and ADS (Autonomous Driving Systems).
Figure 6: An overview of the tetrad illustrating the disruptive effects of FMs on SBSE.
Original abstract

Search-based software engineering (SBSE), which integrates metaheuristic search techniques with software engineering, has been an active area of research for about 25 years. It has been applied to solve numerous problems across the entire software engineering lifecycle and has demonstrated its versatility in multiple domains. With recent advances in Artificial Intelligence (AI), particularly the emergence of foundation models (FMs) such as large language models (LLMs), the evolution of SBSE alongside these models remains undetermined. In this window of opportunity, we present a research roadmap that articulates the current landscape of SBSE in relation to FMs, identifies open challenges, and outlines potential research directions to advance SBSE through its synergy with FMs. Specifically, we analyze three core aspects: utilizing FMs to enhance SBSE, applying SBSE to advance FMs, and exploring the integration of SBSE and FMs. Furthermore, we present a forward-thinking perspective that envisions the future of SBSE in the era of FMs, highlighting promising research opportunities to address challenges in emerging domains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a research roadmap on Search-Based Software Engineering (SBSE) in relation to AI Foundation Models (FMs). It articulates the current landscape by analyzing three core aspects—FMs enhancing SBSE, SBSE advancing FMs, and their integration—while identifying open challenges and outlining potential research directions to advance SBSE through synergy with FMs, along with a forward-thinking perspective on the future of SBSE in the era of FMs.

Significance. If the literature synthesis is balanced and comprehensive, the roadmap could usefully guide researchers toward productive integrations between metaheuristic search techniques and large-scale AI models, a timely topic given rapid FM progress. The contribution lies in its structured framing of bidirectional opportunities rather than in new empirical results or proofs.

major comments (2)
  1. The central claim that the paper articulates a 'current landscape' and 'research roadmap' rests on the analysis of the three core aspects; however, the manuscript does not describe the literature search strategy, inclusion criteria, or number of papers reviewed per aspect, which makes it difficult to evaluate whether the synthesis is representative or systematically derived.
  2. In the forward-thinking perspective, the outlined research opportunities in emerging domains are presented at a high level; to be load-bearing for the roadmap, they should be explicitly mapped back to the open challenges identified in the three-aspect analysis so readers can see how the proposed directions address specific gaps.
minor comments (2)
  1. Consider adding a summary table that cross-references the identified challenges with the proposed research directions to improve readability and traceability.
  2. Ensure citations in the landscape sections include the most recent 2024–2025 publications on foundation models applied to software engineering tasks.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of our manuscript as a timely research roadmap. We address each major comment below and will incorporate revisions to strengthen the paper.

Point-by-point responses
  1. Referee: The central claim that the paper articulates a 'current landscape' and 'research roadmap' rests on the analysis of the three core aspects; however, the manuscript does not describe the literature search strategy, inclusion criteria, or number of papers reviewed per aspect, which makes it difficult to evaluate whether the synthesis is representative or systematically derived.

    Authors: We appreciate this point on transparency. Our roadmap synthesizes insights from prominent and representative publications in SBSE and foundation models, selected based on relevance to the three core aspects and the authors' domain expertise, rather than a formal systematic literature review with explicit search strings and inclusion/exclusion criteria. To address the comment, we will add a dedicated subsection (likely in the introduction or a new 'Approach' section) describing our literature selection rationale, key sources and venues considered, and how papers were chosen for each aspect. This will clarify the scope without altering the roadmap nature of the work. revision: yes

  2. Referee: In the forward-thinking perspective, the outlined research opportunities in emerging domains are presented at a high level; to be load-bearing for the roadmap, they should be explicitly mapped back to the open challenges identified in the three-aspect analysis so readers can see how the proposed directions address specific gaps.

    Authors: We agree that explicit linkages would improve the coherence and utility of the roadmap. In the revision, we will enhance the forward-thinking perspective section by adding direct mappings: for each proposed research opportunity, we will include inline references or a summary table connecting it to the specific open challenges previously identified in the FMs-enhancing-SBSE, SBSE-advancing-FMs, and integration analyses. This will make clear how the future directions target the gaps. revision: yes

Circularity Check

0 steps flagged

No significant circularity: roadmap synthesis without derivations or reductions

full rationale

The paper is a forward-looking survey and research roadmap that articulates the SBSE-FM landscape, identifies challenges, and outlines directions across three aspects (FMs enhancing SBSE, SBSE advancing FMs, and their integration). No equations, derivations, fitted parameters, or technical predictions exist in the provided abstract and structure. Central claims rest on literature synthesis and author framing rather than any self-referential reduction, self-citation load-bearing premise, or ansatz smuggled via prior work. The content is self-contained as a synthesis; no load-bearing step reduces by construction to inputs. This matches the default expectation for non-circular survey/roadmap papers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on domain background facts about SBSE history and the emergence of foundation models; no free parameters, mathematical axioms, or invented entities are introduced.

axioms (1)
  • domain assumption: SBSE has been an active area of research for about 25 years and has been applied across the software engineering lifecycle.
    Stated directly in the abstract as established context for the roadmap.

pith-pipeline@v0.9.0 · 5717 in / 1059 out tokens · 55177 ms · 2026-05-19T14:43:07.137149+00:00 · methodology



Reference graph

Works this paper leans on

213 extracted references · 213 canonical work pages · 3 internal anchors

  1. [2]

    A systematic review of the application and empirical investigation of search-based test case generation. IEEE Transactions on Software Engineering, 36(6):742–762, 2009

    Shaukat Ali, Lionel C Briand, Hadi Hemmati, and Rajwinder Kaur Panesar-Walawege. A systematic review of the application and empirical investigation of search-based test case generation. IEEE Transactions on Software Engineering, 36(6):742–762, 2009. doi: 10.1109/TSE.2009.52

  2. [3]

    Generating test data from OCL constraints with search techniques. IEEE Transactions on Software Engineering, 39(10):1376–1402, 2013

    Shaukat Ali, Muhammad Zohaib Iqbal, Andrea Arcuri, and Lionel C Briand. Generating test data from OCL constraints with search techniques. IEEE Transactions on Software Engineering, 39(10):1376–1402, 2013. doi: 10.1109/TSE.2013.17

  3. [4]

    Learning how to search: generating effective test cases through adaptive fitness function selection. Empirical Software Engineering, 27(2):38, 2022

    Hussein Almulla and Gregory Gay. Learning how to search: generating effective test cases through adaptive fitness function selection. Empirical Software Engineering, 27(2):38, 2022. doi: 10.1007/s10664-021-10048-8

  4. [5]

    Deploying search based software engineering with Sapienz at Facebook

    Nadia Alshahwan, Xinbo Gao, Mark Harman, Yue Jia, Ke Mao, Alexander Mols, Taijin Tei, and Ilya Zorin. Deploying search based software engineering with Sapienz at Facebook. In International Symposium on Search Based Software Engineering, pages 3–45. Springer, 2018. doi: 10.1007/978-3-319-99241-9_1

  5. [6]

    Targeting patterns of driving characteristics in testing autonomous driving systems

    Paolo Arcaini, Xiao-Yi Zhang, and Fuyuki Ishikawa. Targeting patterns of driving characteristics in testing autonomous driving systems. In 2021 IEEE 14th International Conference on Software Testing, Validation and Verification (ICST), pages 295–305, 2021. doi: 10.1109/ICST49551.2021.00042

  6. [7]

    On the automation of fixing software bugs

    Andrea Arcuri. On the automation of fixing software bugs. In Companion of the 30th International Conference on Software Engineering, ICSE Companion ’08, pages 1003–1006, New York, NY, USA, 2008. Association for Computing Machinery. ISBN 9781605580791. doi: 10.1145/1370175.1370223. URL https://doi.org/10.1145/1370175.1370223

  7. [8]

    Test suite generation with the many independent objective (MIO) algorithm. Information and Software Technology, 104:195–206, 2018

    Andrea Arcuri. Test suite generation with the many independent objective (MIO) algorithm. Information and Software Technology, 104:195–206, 2018. doi: 10.1016/j.infsof.2018.05.003

  8. [9]

    RESTful API automated test case generation with EvoMaster. ACM Transactions on Software Engineering and Methodology (TOSEM), 28(1):1–37, 2019

    Andrea Arcuri. RESTful API automated test case generation with EvoMaster. ACM Transactions on Software Engineering and Methodology (TOSEM), 28(1):1–37, 2019. doi: 10.1145/3293455

  9. [10]

    A hitchhiker’s guide to statistical tests for assessing randomized algorithms in software engineering. Software Testing, Verification and Reliability, 24(3):219–250, 2014

    Andrea Arcuri and Lionel Briand. A hitchhiker’s guide to statistical tests for assessing randomized algorithms in software engineering. Software Testing, Verification and Reliability, 24(3):219–250, 2014. doi: 10.1002/stvr.1486

  10. [11]

    Parameter tuning or default values? an empirical investigation in search-based software engineering. Empirical Software Engineering, 18(3):594–623, 2013

    Andrea Arcuri and Gordon Fraser. Parameter tuning or default values? an empirical investigation in search-based software engineering. Empirical Software Engineering, 18(3):594–623, 2013. doi: 10.1007/s10664-013-9249-9

  11. [12]

    Theoretical runtime analyses of search algorithms on the test data generation for the triangle classification problem

    Andrea Arcuri, Per Kristian Lehre, and Xin Yao. Theoretical runtime analyses of search algorithms on the test data generation for the triangle classification problem. In 2008 IEEE International Conference on Software Testing Verification and Validation Workshop, pages 161–169. IEEE, 2008. doi: 10.1109/ICSTW.2008.48

  12. [13]

    Widening The Adoption of Web API Fuzzing: Docker, GitHub Action and Python Support for EvoMaster

    Andrea Arcuri, Philip Garrett, Juan Pablo Galeotti, and Man Zhang. Widening The Adoption of Web API Fuzzing: Docker, GitHub Action and Python Support for EvoMaster. In Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering, FSE Companion ’25, pages 1084–1088, New York, NY, USA, 2025. Association for Computing Machi...

  13. [14]

    Introducing black-box fuzz testing for REST APIs in industry: Challenges and solutions

    Andrea Arcuri, Alexander Poth, and Olsi Rrjolli. Introducing black-box fuzz testing for REST APIs in industry: Challenges and solutions. In 2025 IEEE Conference on Software Testing, Verification and Validation (ICST), pages 382–393. IEEE, 2025. doi: 10.1109/ICST62969.2025.10988923

  14. [15]

    RESTler: Stateful REST API Fuzzing

    Vaggelis Atlidakis, Patrice Godefroid, and Marina Polishchuk. RESTler: Stateful REST API Fuzzing. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pages 748–758. IEEE, 2019. doi: 10.1109/ICSE.2019.00083

  15. [16]

    Search-based DNN testing and retraining with GAN-enhanced simulations. IEEE Transactions on Software Engineering, 51(4):1086–1103, 2025

    Mohammed Oualid Attaoui, Fabrizio Pastore, and Lionel C Briand. Search-based DNN testing and retraining with GAN-enhanced simulations. IEEE Transactions on Software Engineering, 51(4):1086–1103, 2025. doi: 10.1109/TSE.2025.3540549

  16. [17]

    StableYolo: Optimizing Image Generation for Large Language Models

    Harel Berger, Aidan Dakhama, Zishuo Ding, Karine Even-Mendoza, David Kelly, Hector Menendez, Rebecca Moussa, and Federica Sarro. StableYolo: Optimizing Image Generation for Large Language Models. In International Symposium on Search Based Software Engineering, pages 133–139. Springer, 2023. doi: 10.1007/978-3-031-48796-5_10

  17. [19]

    LLM fault localisation within evolutionary computation based automated program repair

    Sardar Bin Murtaza, Aidan Mccoy, Zhiyuan Ren, Aidan Murphy, and Wolfgang Banzhaf. LLM fault localisation within evolutionary computation based automated program repair. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO ’24 Companion, pages 1824–1829, New York, NY, USA, 2024. Association for Computing Machinery. ISBN 9798400704956. doi: 10.1145/3638530.3664174. URL https://doi.org/10.1145/3638530.3664174

  19. [21]

    Pymoo: Multi-objective optimization in Python. IEEE Access, 8:89497–89509, 2020

    Julian Blank and Kalyanmoy Deb. Pymoo: Multi-objective optimization in Python. IEEE Access, 8:89497–89509, 2020. doi: 10.1109/ACCESS.2020.2990567

  21. [23]

    On the Opportunities and Risks of Foundation Models

    Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ B. Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri S. Chatterji, Annie S. Chen, Kathleen Creel, Jared Quincy Davis, Dorottya Demszky, Chris Donahue, Moussa Doumbouya, Esin...

  22. [24]

    LLM-assisted crossover in genetic improvement of software

    Dimitrios Stamatios Bouras, Sergey Mechtaev, and Justyna Petke. LLM-assisted crossover in genetic improvement of software. In 2025 IEEE/ACM International Workshop on Genetic Improvement (GI), pages 19–26. IEEE, 2025. doi: 10.1109/GI66624.2025.00012

  24. [26]

    A survey on search-based model-driven engineering

    Ilhem Boussaïd, Patrick Siarry, and Mohamed Ahmed-Nacer. A survey on search-based model-driven engineering. Automated Software Engineering, 24(2):233–294, 2017. doi: 10.1007/s10515-017-0215-4

  25. [27]

    Large language model based mutations in genetic improvement. Automated Software Engineering, 32(1):15, 2025

    Alexander EI Brownlee, James Callan, Karine Even-Mendoza, Alina Geiger, Carol Hanna, Justyna Petke, Federica Sarro, and Dominik Sobania. Large language model based mutations in genetic improvement. Automated Software Engineering, 32(1):15, 2025. doi: 10.1007/s10515-024-00473-6

  26. [28]

    Web application tests with Selenium. IEEE Software, 26(5):88–91, 2009

    Andreas Bruns, Andreas Kornstadt, and Dennis Wichmann. Web application tests with Selenium. IEEE Software, 26(5):88–91, 2009. doi: 10.1109/MS.2009.144

  27. [29]

    Automatic generation of atomic multiplicity-preserving search operators for search-based model engineering. Software and Systems Modeling, 20(6):1857–1887, 2021

    Alexandru Burdusel, Steffen Zschaler, and Stefan John. Automatic generation of atomic multiplicity-preserving search operators for search-based model engineering. Software and Systems Modeling, 20(6):1857–1887, 2021. doi: 10.1007/s10270-021-00914-w

  28. [30]

    Generating avoidable collision scenarios for testing autonomous driving systems

    Alessandro Calò, Paolo Arcaini, Shaukat Ali, Florian Hauer, and Fuyuki Ishikawa. Generating avoidable collision scenarios for testing autonomous driving systems. In 2020 IEEE 13th International Conference on Software Testing, Validation and Verification (ICST), pages 375–386, 2020. doi: 10.1109/ICST46399.2020.00045

  29. [31]

    Simultaneously searching and solving multiple avoidable collisions for testing autonomous driving systems

    Alessandro Calò, Paolo Arcaini, Shaukat Ali, Florian Hauer, and Fuyuki Ishikawa. Simultaneously searching and solving multiple avoidable collisions for testing autonomous driving systems. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference, GECCO ’20, pages 1055–1063, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450371285. doi: 10.1145/3377930.3389827. URL https://doi.org/10.1145/3377930.3389827

  31. [33]

    Continuous test generation: Enhancing continuous integration with automated test generation

    José Campos, Andrea Arcuri, Gordon Fraser, and Rui Abreu. Continuous test generation: Enhancing continuous integration with automated test generation. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, ASE ’14, pages 55–66, New York, NY, USA, 2014. Association for Computing Machinery. doi: 10.1145/2642937.2643...

  32. [34]

    Rodrigo Casamayor, Carlos Cetina, Oscar Pastor, and Francisca Pérez. Studying the influence and distribution of the human effort in a hybrid fitness function for search-based model-driven engineering. IEEE Transactions on Software Engineering, 49(12):5189–5202, 2023. doi: 10.1109/TSE.2023.3329730

  33. [35]

    Hallucination detection in foundation models for decision-making: A flexible definition and review of the state of the art. ACM Computing Surveys, 57(7):1–35, 2025

    Neeloy Chakraborty, Melkior Ornik, and Katherine Driggs-Campbell. Hallucination detection in foundation models for decision-making: A flexible definition and review of the state of the art. ACM Computing Surveys, 57(7):1–35, 2025. doi: 10.1145/3716846

  34. [36]

    Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavaria...

  35. [38]

    Iterative refactoring of real-world open-source programs with large language models

    Jinsu Choi, Gabin An, and Shin Yoo. Iterative refactoring of real-world open-source programs with large language models. In International Symposium on Search Based Software Engineering, pages 49–55. Springer,

  36. [40]

    The symposium on search-based software engineering: Past, present and future. Information and Software Technology, 127:106372, 2020

    Thelma Elita Colanzi, Wesley KG Assunção, Silvia R Vergilio, Paulo Roberto Farah, and Giovani Guizzo. The symposium on search-based software engineering: Past, present and future. Information and Software Technology, 127:106372, 2020. doi: 10.1016/j.infsof.2020.106372

  37. [41]

    Replication and comparison of computational experiments in applied evolutionary computing: common pitfalls and guidelines to avoid them. Applied Soft Computing, 19:161–170, 2014

    Matej Črepinšek, Shih-Hsi Liu, and Marjan Mernik. Replication and comparison of computational experiments in applied evolutionary computing: common pitfalls and guidelines to avoid them. Applied Soft Computing, 19:161–170, 2014. doi: 10.1016/j.asoc.2014.02.009

  38. [42]

    SEE: Strategic exploration and exploitation for cohesive in-context prompt optimization

    Wendi Cui, Jiaxin Zhang, Zhuohang Li, Hao Sun, Damien Lopez, Kamalika Das, Bradley A. Malin, and Sricharan Kumar. SEE: Strategic exploration and exploitation for cohesive in-context prompt optimization. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors, Proceedings of the 63rd Annual Meeting of the Association for Comp...

  39. [44]

    Enhancing search-based testing with LLMs for finding bugs in system simulators. Automated Software Engineering, 32(2):1–45, 2025

    Aidan Dakhama, Karine Even-Mendoza, William B Langdon, Héctor D Menéndez, and Justyna Petke. Enhancing search-based testing with LLMs for finding bugs in system simulators. Automated Software Engineering, 32(2):1–45, 2025. doi: 10.1007/s10515-025-00531-7

  40. [46]

    An adaptive re-evaluation method for evolution strategy under additive noise

    Catalin-Viorel Dinu, Yash J Patel, Xavier Bonet-Monroig, and Hao Wang. An adaptive re-evaluation method for evolution strategy under additive noise. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 710–718, New York, NY, USA, 2025. Association for Computing Machinery. doi: 10.1145/3712256.3726352. URL https://doi.org/10.1145/37...

  41. [47]

    Ant colony optimization: a new meta-heuristic

    Marco Dorigo and Gianni Di Caro. Ant colony optimization: a new meta-heuristic. In Proceedings of the 1999 congress on evolutionary computation-CEC99 (Cat. No. 99TH8406), volume 2, pages 1470–1477. IEEE, 1999. doi: 10.1109/CEC.1999.782657

  42. [48]

    What to blame? on the granularity of fault localization for deep neural networks

    Matias Duran, Xiao-Yi Zhang, Paolo Arcaini, and Fuyuki Ishikawa. What to blame? on the granularity of fault localization for deep neural networks. In 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), pages 264–275, 2021. doi: 10.1109/ISSRE52982.2021.00037

  43. [49]

    jMetal: A Java framework for multi-objective optimization

    Juan J. Durillo and Antonio J. Nebro. jMetal: A Java framework for multi-objective optimization. Adv. Eng. Softw., 42(10):760–771, October 2011. ISSN 0965-9978. doi: 10.1016/j.advengsoft.2011.05.014. URL https://doi.org/10.1016/j.advengsoft.2011.05.014

  44. [50]

    DeepFault: Fault localization for deep neural networks

    Hasan Ferit Eniser, Simos Gerasimou, and Alper Sen. DeepFault: Fault localization for deep neural networks. In Reiner Hähnle and Wil van der Aalst, editors, Fundamental Approaches to Software Engineering, pages 171–191, Cham, 2019. Springer International Publishing. ISBN 978-3-030-16722-6. doi: 10.1007/978-3-030-16722-6_10

  45. [52]

    Large Language Models for Software Engineering: Survey and Open Problems

    Angela Fan, Beliz Gokkaya, Mark Harman, Mitya Lyubarskiy, Shubho Sengupta, Shin Yoo, and Jie M Zhang. Large Language Models for Software Engineering: Survey and Open Problems. In 2023 IEEE/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE), pages 31–53. IEEE, 2023. doi: 10.1109/ICSE-FoSE59343.2023.00008

  46. [53]

    Putting the smarts into robot bodies. Communications of the ACM, 68(3):6–8, 2025

    Wang Fan and Shaoshan Liu. Putting the smarts into robot bodies. Communications of the ACM, 68(3):6–8, 2025. doi: 10.1145/3703761. URL https://doi.org/10.1145/3703761

  48. [55]

    Foundation models in robotics: Applications, challenges, and the future. The International Journal of Robotics Research, 44(5):701–739, 2025

    Roya Firoozi, Johnathan Tucker, Stephen Tian, Anirudha Majumdar, Jiankai Sun, Weiyu Liu, Yuke Zhu, Shuran Song, Ashish Kapoor, Karol Hausman, Brian Ichter, Danny Driess, Jiajun Wu, Cewu Lu, and Mac Schwager. Foundation models in robotics: Applications, challenges, and the future. The International Journal of Robotics Research, 44(5):701–739, 2025. doi: 10....

  49. [56]

    Search-based software testing driven by automatically generated and manually defined fitness functions. ACM Transactions on Software Engineering and Methodology, 33(2):1–37, 2023

    Federico Formica, Tony Fan, and Claudio Menghi. Search-based software testing driven by automatically generated and manually defined fitness functions. ACM Transactions on Software Engineering and Methodology, 33(2):1–37, 2023. doi: 10.1145/3624745

  50. [57]

    EvoSuite: automatic test suite generation for object-oriented software

    Gordon Fraser and Andrea Arcuri. EvoSuite: automatic test suite generation for object-oriented software. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, ESEC/FSE ’11, pages 416–419, New York, NY , USA, 2011. Association for Computing Machinery. ISBN 9781450304436. doi: 10.1145/20251...

  51. [58]

    Whole test suite generation.IEEE Trans

    Gordon Fraser and Andrea Arcuri. Whole test suite generation.IEEE Trans. Softw. Eng., 39(2):276–291, February

  52. [59]

    doi: 10.1109/TSE.2012.14

    ISSN 0098-5589. doi: 10.1109/TSE.2012.14. URLhttps://doi.org/10.1109/TSE.2012.14

  53. [60]

    A large-scale evaluation of automated unit test generation using EvoSuite

    Gordon Fraser and Andrea Arcuri. A large-scale evaluation of automated unit test generation using EvoSuite. ACM Trans. Softw. Eng. Methodol., 24(2), December 2014. ISSN 1049-331X. doi: 10.1145/2685612. URL https://doi.org/10.1145/2685612

  54. [61]

    A Retrospective on Whole Test Suite Generation: On the Role of SBST in the Age of LLMs. IEEE Transactions on Software Engineering, pages 1–5, 2025

    Gordon Fraser and Andrea Arcuri. A Retrospective on Whole Test Suite Generation: On the Role of SBST in the Age of LLMs. IEEE Transactions on Software Engineering, pages 1–5, 2025. doi: 10.1109/TSE.2025.3539458

  55. [62]

    Large language model-based suggestion of objective functions for search-based product line architecture design

    Willian M Freire, Murilo Boccardo, Daniel Nouchi, Aline MMM Amaral, Silvia R Vergilio, Thiago Ferreira, and Thelma E Colanzi. Large language model-based suggestion of objective functions for search-based product line architecture design. In Simpósio Brasileiro de Componentes, Arquiteturas e Reutilização de Software (SBCARS), pages 21–30. SBC, 2024. do...

  56. [63]

    Automatically testing self-driving cars with search-based procedural content generation

    Alessio Gambi, Marc Mueller, and Gordon Fraser. Automatically testing self-driving cars with search-based procedural content generation. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2019, pages 318–328, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450362245. doi: 10.1145/3...

  57. [64]

    Search-Based LLMs for Code Optimization

    Shuzheng Gao, Cuiyun Gao, Wenchao Gu, and Michael Lyu. Search-Based LLMs for Code Optimization. In 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), pages 254–266, Los Alamitos, CA, USA, 2024. IEEE Computer Society. doi: 10.1109/ICSE55347.2025.00021

  58. [66]

    Differential regression testing for REST APIs

    Patrice Godefroid, Daniel Lehmann, and Marina Polishchuk. Differential regression testing for REST APIs. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2020, pages 312–323, New York, NY, USA, 2020. Association for Computing Machinery. doi: 10.1145/3395363.3397374

  59. [67]

    Testing RESTful APIs: A survey. ACM Trans

    Amid Golmohammadi, Man Zhang, and Andrea Arcuri. Testing RESTful APIs: A survey. ACM Trans. Softw. Eng. Methodol., 33(1), November 2023. ISSN 1049-331X. doi: 10.1145/3617175. URL https://doi.org/10.1145/3617175

  60. [69]

    Can LLMs make robots smarter? Communications of the ACM, 68(2):11–13, 2025

    Samuel Greengard. Can LLMs make robots smarter? Communications of the ACM, 68(2):11–13, 2025. doi: 10.1145/3701227

  61. [72]

    Connecting large language models with evolutionary algorithms yields powerful prompt optimizers

    Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, and Yujiu Yang. Connecting large language models with evolutionary algorithms yields powerful prompt optimizers. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024. URL https://openrevi...

  62. [73]

    Reinforcement learning for mutation operator selection in automated program repair. Automated Software Engineering, 32(2):1–33, 2025

    Carol Hanna, Aymeric Blot, and Justyna Petke. Reinforcement learning for mutation operator selection in automated program repair. Automated Software Engineering, 32(2):1–33, 2025. doi: 10.1007/s10515-025-00501-z

  63. [74]

    Search-based software engineering. Information and Software Technology, 43(14):833–839, 2001

    Mark Harman and Bryan F Jones. Search-based software engineering. Information and Software Technology, 43(14):833–839, 2001. doi: 10.1016/S0950-5849(01)00189-6

  64. [75]

    A theoretical and empirical study of search-based testing: Local, global, and hybrid search. IEEE Transactions on Software Engineering, 36(2):226–247, 2009

    Mark Harman and Phil McMinn. A theoretical and empirical study of search-based testing: Local, global, and hybrid search. IEEE Transactions on Software Engineering, 36(2):226–247, 2009. doi: 10.1109/TSE.2009.71

  65. [76]

    Search-based software engineering: Trends, techniques and applications. ACM Computing Surveys (CSUR), 45(1):1–61, 2012

    Mark Harman, S Afshin Mansouri, and Yuanyuan Zhang. Search-based software engineering: Trends, techniques and applications. ACM Computing Surveys (CSUR), 45(1):1–61, 2012. doi: 10.1145/2379776.2379787

  66. [77]

    Achievements, open problems and challenges for search based software testing

    Mark Harman, Yue Jia, and Yuanyuan Zhang. Achievements, open problems and challenges for search based software testing. In 2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST), pages 1–12, 2015. doi: 10.1109/ICST.2015.7102580

  67. [78]

    Ahmed E. Hassan, Dayi Lin, Gopi Krishnan Rajbahadur, Keheliya Gallaba, Filipe Roseiro Cogo, Boyuan Chen, Haoxiang Zhang, Kishanthan Thangarajah, Gustavo Oliva, Jiahuei (Justina) Lin, Wali Mohammad Abdullah, and Zhen Ming (Jack) Jiang. Rethinking software engineering in the era of foundation models: A curated catalogue of challenges in the development of t...

  68. [79]

    Genetic algorithms. Scientific American, 267(1):66–73, 1992

    John H Holland. Genetic algorithms. Scientific American, 267(1):66–73, 1992

  69. [80]

    Evolving paradigms in automated program repair: Taxonomy, challenges, and opportunities. ACM Computing Surveys, 57(2):1–43,

    Kai Huang, Zhengzi Xu, Su Yang, Hongyu Sun, Xuejun Li, Zheng Yan, and Yuqing Zhang. Evolving paradigms in automated program repair: Taxonomy, challenges, and opportunities. ACM Computing Surveys, 57(2):1–43,

  70. [81]

    doi: 10.1145/3696450

  71. [82]

    Cost Reduction on Testing Evolving Cancer Registry System

    Erblin Isaku, Hassan Sartaj, Christoph Laaber, Tao Yue, Shaukat Ali, Thomas Schwitalla, and Jan F Nygård. Cost Reduction on Testing Evolving Cancer Registry System. In 2023 IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 508–518. IEEE, 2023. doi: 10.1109/ICSME58846.2023.00065

  72. [83]

    LLMs in the Heart of Differential Testing: A Case Study on a Medical Rule Engine

    Erblin Isaku, Christoph Laaber, Hassan Sartaj, Shaukat Ali, Thomas Schwitalla, and Jan F Nygård. LLMs in the Heart of Differential Testing: A Case Study on a Medical Rule Engine. In 2025 IEEE Conference on Software Testing, Verification and Validation (ICST), pages 429–440. IEEE, 2025. doi: 10.1109/ICST62969.2025.10989025

  73. [84]

    A survey on large language models for code generation. ACM Transactions on Software Engineering and Methodology, 35(2):1–72, 2026

    Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. A survey on large language models for code generation. ACM Trans. Softw. Eng. Methodol., July 2025. ISSN 1049-331X. doi: 10.1145/3747588. URL https://doi.org/10.1145/3747588

  74. [85]

    An automated search-based test model generation approach for structural testing of model transformations. Journal of Software: Evolution and Process, 34(11):e2461, 2022

    Atif Aftab Jilani, Muhammad Uzair Khan, Muhammad Zohaib Iqbal, and Muhammad Usman. An automated search-based test model generation approach for structural testing of model transformations. Journal of Software: Evolution and Process, 34(11):e2461, 2022. doi: 10.1002/smr.2461

  75. [86]

    Towards objective-tailored genetic improvement through large language models

    Sungmin Kang and Shin Yoo. Towards objective-tailored genetic improvement through large language models. In 2023 IEEE/ACM International Workshop on Genetic Improvement (GI), pages 19–20. IEEE, 2023. doi: 10.1109/GI59320.2023.00013

  76. [87]

    Deceiving humans and machines alike: Search-based test input generation for DNNs using variational autoencoders. ACM Trans

    Sungmin Kang, Robert Feldt, and Shin Yoo. Deceiving humans and machines alike: Search-based test input generation for DNNs using variational autoencoders. ACM Trans. Softw. Eng. Methodol., 33(4), April 2024. ISSN 1049-331X. doi: 10.1145/3635706. URL https://doi.org/10.1145/3635706

  77. [88]

    Evaluating diverse large language models for automatic and general bug reproduction. IEEE Transactions on Software Engineering, 50(10):2677–2694,

    Sungmin Kang, Juyeon Yoon, Nargiz Askarbekkyzy, and Shin Yoo. Evaluating diverse large language models for automatic and general bug reproduction. IEEE Transactions on Software Engineering, 50(10):2677–2694,

  78. [89]

    doi: 10.1109/TSE.2024.3450837

  79. [90]

    Real-world robot applications of foundation models: A review. Advanced Robotics, 38(18):1232–1254, 2024

    Kento Kawaharazuka, Tatsuya Matsushima, Andrew Gambardella, Jiaxian Guo, Chris Paxton, and Andy Zeng. Real-world robot applications of foundation models: A review. Advanced Robotics, 38(18):1232–1254, 2024. doi: 10.1080/01691864.2024.2408593

  80. [91]

    The vision of autonomic computing

    Jeffrey O. Kephart and David M. Chess. The vision of autonomic computing. Computer, 36(1):41–50, January

Showing first 80 references.