pith. machine review for the scientific record.

arxiv: 2604.13103 · v1 · submitted 2026-04-10 · 💻 cs.SE · cs.MA

Recognition: 1 theorem link · Lean Theorem

Fairness in Multi-Agent Systems for Software Engineering: An SDLC-Oriented Rapid Review

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 17:07 UTC · model grok-4.3

classification 💻 cs.SE cs.MA
keywords fairness · multi-agent systems · software engineering · software development lifecycle · large language models · bias · rapid review

The pith

Fairness research on multi-agent systems for software engineering remains too fragmented and limited to support reliable fair systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper conducts a rapid review of studies on fairness in multi-agent systems that use large language models within software development. It shows that fairness gets defined through bias reduction and group interaction rules, yet measurements differ across papers and often rely on narrow test cases rather than full workflows. The authors map reported harms to stages of software creation and find few tested fixes that fit real development processes. A reader would care because these systems already shape code writing, review, and release, so unresolved fairness issues risk producing biased tools and products. The review ends by urging more consistent benchmarks and governance that cover the entire development cycle.

Core claim

Screening 350 papers to 18 relevant studies reveals that fairness in LLM-enabled multi-agent systems combines trustworthy AI principles, bias reduction across groups, and interactional dynamics in collectives, with evaluation using accuracy metrics, demographic disparities, and notions like conformity and bias amplification. Reported harms span representational, quality-of-service, security and privacy, and governance failures, yet the field shows fragmented evaluation practices, limited generalization from simplified environments, and scarce mitigation mechanisms aligned to actual software workflows.
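
To make that evaluation vocabulary concrete, here is a minimal illustrative sketch, not drawn from the paper, of two of the measures named above: a demographic parity gap over task outcomes, and a bias-amplification delta comparing a multi-agent collective against a lone agent. All function names and data are hypothetical.

```python
from collections import defaultdict

def demographic_parity_gap(outcomes):
    """Max spread in positive-outcome rate across demographic groups.

    outcomes: list of (group, passed) pairs, e.g. from running the same
    coding task over prompts that vary only a demographic attribute.
    """
    totals, positives = defaultdict(int), defaultdict(int)
    for group, passed in outcomes:
        totals[group] += 1
        positives[group] += int(passed)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)

def bias_amplification(single_agent_outcomes, multi_agent_outcomes):
    """Positive values mean the collective widened the disparity seen for
    a lone agent (one common reading of 'bias amplification')."""
    return (demographic_parity_gap(multi_agent_outcomes)
            - demographic_parity_gap(single_agent_outcomes))

# Made-up data: (group, task_passed).
single = [("A", True), ("A", True), ("B", True), ("B", False)]
multi = [("A", True), ("A", True), ("B", False), ("B", False)]
print(bias_amplification(single, multi))  # 0.5: the collective doubled the gap
```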

What carries the argument

The rapid review's synthesis of three gaps—fragmented evaluation that blocks comparison, limited generalization from narrow setups, and underdeveloped mitigation and governance tied to real software processes—drawn from the 18 analyzed studies.

If this is right

  • MAS-aware benchmarks would allow direct comparison of fairness results across different agent systems and settings (a minimal harness along these lines is sketched after this list).
  • Standardized evaluation protocols would replace the current mix of accuracy checks and disparity measures.
  • Governance approaches that run through all stages of software creation would address the current scarcity of practical fixes.
  • Specific attention to harms such as bias amplification in agent groups and privacy failures would become part of future software engineering work.
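
A hedged sketch of what such an MAS-aware benchmark could minimally look like: the same tasks run across agent-system configurations and matched demographic prompt variants, so each configuration yields one directly comparable disparity number. `run_system`, the configuration labels, and `benchmark_fairness` are invented here, not an interface the paper defines; the helper reuses `demographic_parity_gap` from the sketch above.

```python
from itertools import product

def benchmark_fairness(run_system, configs, tasks, variants):
    """Cross (agent-system config x task x demographic variant), then
    report one disparity number per configuration.

    run_system(config, task, prompt) -> bool: True when the generated
    artifact passes the task's checks for that demographic variant.
    Reuses demographic_parity_gap from the earlier sketch.
    """
    report = {}
    for config in configs:
        outcomes = [(group, run_system(config, task, prompt))
                    for task, (group, prompt) in product(tasks, variants)]
        report[config] = demographic_parity_gap(outcomes)
    return report  # e.g. {"2-agent-debate": 0.10, "5-agent-vote": 0.25}
```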

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Without progress on these gaps, teams adopting multi-agent systems for code tasks may embed undetected biases into production software.
  • Early fairness checks built into agent-based tools could limit the spread of quality-of-service harms to later development stages.
  • Closer ties between fairness evaluation methods and standard software testing practices would make mitigation easier to apply in daily workflows (one possible shape for such a check is sketched below).
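
One concrete shape for that tie-in, offered as a speculative sketch rather than anything the paper proposes: a metamorphic fairness check written as an ordinary pytest test, so a demographic-dependent verdict fails CI like any other regression. `generate_code_review` is a stand-in stub for the agent tool under test; the profiles are invented.

```python
import pytest

def generate_code_review(author_profile, diff):
    # Stub: a real harness would invoke the multi-agent review tool here.
    return {"approved": True, "comments": []}

# Matched author profiles differing only in a demographic cue.
PROFILE_PAIRS = [({"name": "Alex"}, {"name": "Amara"}),
                 ({"name": "John"}, {"name": "Juan"})]

@pytest.mark.parametrize("diff", ["fix: null check", "feat: add cache"])
@pytest.mark.parametrize("profile_a,profile_b", PROFILE_PAIRS)
def test_review_outcome_parity(profile_a, profile_b, diff):
    """Identical diffs from demographically different but otherwise
    matched authors should receive the same verdict."""
    verdict_a = generate_code_review(profile_a, diff)["approved"]
    verdict_b = generate_code_review(profile_b, diff)["approved"]
    assert verdict_a == verdict_b, "verdict depends on author demographics"
```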

Load-bearing premise

The screening of 350 papers down to 18 and the qualitative reading of their content accurately reflect the main patterns and shortfalls in current fairness work on multi-agent systems for software engineering.

What would settle it

A set of studies that apply the same fairness measures and tested fixes across multiple real software development stages in multi-agent systems would challenge the finding that the research cannot yet support deployable fair systems.

Figures

Figures reproduced from arXiv: 2604.13103 by Ahmad Abdellatif, Corey Yang-Smith, Ronnie de Souza Santos.

Figure 1. Search strings used across all databases in the primary search.
Figure 2. Search string used across Google Scholar.
Figure 3. Study selection flow for the rapid review.
original abstract

Transformer-based large language models (LLMs) and multi-agent systems (MAS) are increasingly embedded across the software development lifecycle (SDLC), yet their fairness implications for developer-facing tools remain underexplored despite their growing role in shaping what code is written, reviewed, and released. We present a rapid review of recent work on fairness in MAS, emphasizing LLM-enabled settings and relevance to software engineering. Starting from an initial set of 350 papers, we screened and filtered the corpus for relevance, retaining 18 studies for final analysis. Across these 18 studies, fairness is framed as a combination of trustworthy AI principles, bias reduction across groups, and interactional dynamics in collectives, while evaluation spans accuracy metrics on bias benchmarks, demographic disparity measures, and emergent MAS-specific notions such as conformity and bias amplification. Reported harms include representational, quality-of-service, security and privacy, and governance failures, which we relate to SDLC stages where evidence is most and least developed. We identify three persistent gaps: (1) fragmented, rarely MAS-specific evaluation practices that limit comparability, (2) limited generalization due to simplified environments and narrow attribute coverage, and (3) scarce, weakly evaluated mitigation and governance mechanisms aligned to real software workflows. These findings suggest MAS fairness research is not yet ready to support deployable, fairness-assured software systems, motivating MAS-aware benchmarks, consistent protocols, and lifecycle-spanning governance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript presents a rapid review of fairness in multi-agent systems (MAS), with emphasis on LLM-enabled agents and relevance to the software development lifecycle (SDLC). From an initial corpus of 350 papers, 18 studies are retained after screening. The review synthesizes fairness framings (trustworthy AI principles, group bias reduction, and collective interaction dynamics), evaluation approaches (accuracy on bias benchmarks, demographic disparity metrics, and MAS-specific notions such as conformity and bias amplification), reported harms (representational, quality-of-service, security/privacy, and governance failures), and their mapping to SDLC stages. Three gaps are identified: fragmented and rarely MAS-specific evaluation practices, limited generalization from simplified environments and narrow attribute coverage, and scarce weakly-evaluated mitigation and governance mechanisms. The central claim is that MAS fairness research is not yet ready to support deployable, fairness-assured software systems, motivating MAS-aware benchmarks, consistent protocols, and lifecycle-spanning governance.

Significance. If the identified gaps prove representative, the work is significant as a timely structured synthesis at the intersection of AI fairness, multi-agent systems, and software engineering. It usefully relates harms to specific SDLC stages and articulates concrete, actionable directions for future research. The review's contribution is strengthened by its focus on LLM-enabled MAS, an area of growing practical importance, though its overall impact hinges on the transparency and completeness of the underlying literature selection.

major comments (1)
  1. [Methods / rapid review description] The rapid review methodology (abstract and methods section): the process that reduces 350 papers to 18 is stated at a high level but provides no details on search strings, databases, date ranges, inclusion/exclusion criteria, or inter-rater agreement metrics. This is load-bearing for the central claim because the three gaps and the conclusion that the field is 'not yet ready' rest on the assumption that the selected studies accurately represent the state of LLM-enabled MAS fairness work; without a documented protocol it is impossible to rule out systematic under-sampling of mitigation or benchmark papers.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our rapid review. We address the single major comment below and will revise the manuscript to improve methodological transparency.

point-by-point responses
  1. Referee: [Methods / rapid review description] The rapid review methodology (abstract and methods section): the process that reduces 350 papers to 18 is stated at a high level but provides no details on search strings, databases, date ranges, inclusion/exclusion criteria, or inter-rater agreement metrics. This is load-bearing for the central claim because the three gaps and the conclusion that the field is 'not yet ready' rest on the assumption that the selected studies accurately represent the state of LLM-enabled MAS fairness work; without a documented protocol it is impossible to rule out systematic under-sampling of mitigation or benchmark papers.

    Authors: We agree that the current manuscript describes the reduction from 350 papers to 18 at a high level without sufficient protocol details. As this is a rapid review, the main text was kept concise, but we recognize that this limits assessment of representativeness and supports the referee's point that it is load-bearing for our conclusions. In the revised version we will expand the Methods section with a dedicated protocol subsection that specifies the search strings, databases queried, date ranges, full inclusion/exclusion criteria, and any inter-rater agreement statistics. We will also add a PRISMA-style flow diagram and make the complete search protocol available as supplementary material. These changes will directly address the concern and allow readers to evaluate potential sampling biases. revision: yes
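
On the promised inter-rater agreement statistics: Cohen's kappa over the two screeners' include/exclude decisions is the standard measure in review protocols. A minimal self-contained computation for the binary case, using made-up labels rather than the authors' screening data:

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' binary include/exclude labels (1/0)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal inclusion rate.
    pa, pb = sum(rater_a) / n, sum(rater_b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    return (observed - expected) / (1 - expected)

# Illustrative screening decisions (1 = include), not the paper's data.
a = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
b = [1, 0, 0, 0, 0, 0, 0, 1, 0, 1]
print(round(cohens_kappa(a, b), 2))  # 0.52 on these made-up labels
```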

Circularity Check

0 steps flagged

Descriptive literature review exhibits no circularity

full rationale

This rapid review paper contains no mathematical derivations, equations, predictions, fitted parameters, or ansatzes. Its central synthesis of gaps (fragmented evaluation, limited generalization, scarce mitigation) is derived from qualitative analysis of 18 externally sourced studies screened from an initial corpus of 350 papers. No self-citation chains, self-definitional loops, or renaming of known results are present in the provided text or abstract; the methodology and conclusions remain independent of the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a literature review with no new mathematical models, empirical measurements, or postulates. No free parameters, axioms, or invented entities are introduced; the central claim rests entirely on the authors' interpretation and selection of existing studies.

pith-pipeline@v0.9.0 · 5558 in / 1156 out tokens · 54572 ms · 2026-05-10T17:07:29.958168+00:00 · methodology

