pith. machine review for the scientific record.

arxiv: 2603.04474 · v2 · submitted 2026-03-04 · 💻 cs.MA · cs.AI

Recognition: 2 theorem links


From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration

Authors on Pith · no claims yet

Pith reviewed 2026-05-15 16:34 UTC · model grok-4.3

classification 💻 cs.MA cs.AI
keywords error cascades · multi-agent systems · LLM · genealogy graph · governance layer · propagation dynamics · error mitigation · collaboration

The pith

A genealogy-graph governance layer suppresses error amplification in LLM-based multi-agent systems without altering their collaboration architecture.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper models how small inaccuracies in LLM multi-agent systems can spread and amplify through message dependencies into system-wide false consensus. It identifies three vulnerability classes (cascade amplification, topological sensitivity, and consensus inertia) and shows that a single injected error can trigger broad failure. The authors introduce a genealogy-graph-based governance layer, a message-layer plugin that tracks dependencies to detect and block risky propagations early. In experiments on six frameworks, this approach prevents final infection in at least 89 percent of runs across operating modes while preserving natural collaboration flows. The method addresses reliability gaps that existing single-agent checks or architecture changes often leave open.

Core claim

By abstracting LLM-MAS collaboration as a directed dependency graph and defining an early-stage risk criterion, the paper shows that error cascades follow predictable amplification patterns. Experiments across mainstream frameworks expose cascade amplification along dependency paths, sensitivity to graph topology, and inertia toward erroneous consensus. A single atomic error seed suffices to infect the system. The genealogy-graph governance layer, implemented as a non-intrusive plugin, suppresses both endogenous and exogenous amplification and blocks final infection in at least 89 percent of runs without modifying the underlying collaboration structure.
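The cascade-amplification claim can be glossed with a minimal simulation of error spread over a directed dependency graph. This is a toy model, not the paper's propagation dynamics: the adoption probability `p_adopt`, the round limit, and the fully connected topology are illustrative assumptions.

```python
import random

def simulate_cascade(edges, n_agents, seed_agent, p_adopt=0.6, rounds=8, rng=None):
    """Toy cascade over a directed dependency graph: an edge (j, i) means
    agent i consumes agent j's messages. A single seeded error spreads
    downstream with per-message adoption probability p_adopt."""
    rng = rng or random.Random(0)
    infected = {seed_agent}
    for _ in range(rounds):
        newly = set()
        for j, i in edges:
            if j in infected and i not in infected and rng.random() < p_adopt:
                newly.add(i)
        if not newly:
            break
        infected |= newly
    return len(infected) / n_agents  # final infection rate S

# Fully connected 6-agent "group chat" topology, one atomic error seed.
n = 6
edges = [(j, i) for j in range(n) for i in range(n) if i != j]
rate = simulate_cascade(edges, n, seed_agent=0)
```

Even this crude model reproduces the qualitative point: with dense dependencies, a single seed reaches a large fraction of agents within a few rounds, and the final rate depends on the topology as much as on the error itself.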

What carries the argument

The genealogy-graph-based governance layer, which tracks message lineage in the directed dependency graph to apply early risk criteria and intercept error propagation.
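A minimal sketch of what such a message-layer governor could look like. The class, field names, and max-over-ancestors risk rule are hypothetical illustrations, not the paper's implementation: each message is logged with its lineage, inherited risk is aggregated over ancestors, and over-threshold messages are intercepted before downstream agents treat them as premises.

```python
from dataclasses import dataclass

@dataclass
class Message:
    mid: str
    sender: str
    parents: tuple = ()   # message ids this message was derived from
    risk: float = 0.0     # intrinsic risk from some per-message verifier

class GenealogyGovernor:
    """Sketch of a non-intrusive plugin: it never changes who talks to whom,
    it only records lineage and decides whether a message may propagate."""
    def __init__(self, threshold: float = 0.5):
        self.lineage: dict[str, Message] = {}
        self.threshold = threshold

    def effective_risk(self, msg: Message) -> float:
        # Assumed aggregation rule: a message is as risky as its riskiest ancestor.
        inherited = [self.effective_risk(self.lineage[p])
                     for p in msg.parents if p in self.lineage]
        return max([msg.risk] + inherited)

    def admit(self, msg: Message) -> bool:
        self.lineage[msg.mid] = msg
        return self.effective_risk(msg) < self.threshold

gov = GenealogyGovernor(threshold=0.5)
blocked_seed = gov.admit(Message("m1", "agent_a", risk=0.9))
blocked_child = gov.admit(Message("m2", "agent_b", parents=("m1",)))
clean = gov.admit(Message("m3", "agent_c"))
```

The key property this sketch captures is that `m2` is blocked even though its own risk score is zero: interception follows the genealogy, not the individual message, which is what distinguishes this approach from single-agent validation.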

If this is right

  • Minor errors no longer solidify into system-level false consensus during iterative collaboration.
  • Protection works across six mainstream multi-agent frameworks without architecture changes.
  • A single error seed is prevented from causing widespread failure in most operating modes.
  • Both internally generated and externally introduced errors are suppressed by the same layer.
  • Effective information flow between agents remains intact while cascade risks are reduced.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Dependency tracking of this form could be adapted to improve reliability in distributed systems beyond LLM agents.
  • Agent frameworks may benefit from making genealogy logging a default feature rather than an add-on.
  • Topology-aware agent design informed by the risk criterion could further lower cascade exposure.
  • Validation on larger, open-ended tasks would test whether the reported 89 percent prevention rate generalizes.

Load-bearing premise

The directed dependency graph abstraction and early-stage risk criterion capture the dominant mechanisms of error spread in real LLM multi-agent deployments.

What would settle it

A real deployment in which errors propagate and amplify through non-message channels such as shared external memory or tool states that the genealogy graph does not record, allowing infection despite the governance layer.
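This hypothetical failure mode can be made concrete. In the sketch below (all names invented for illustration), agents also read from a shared store that the message layer never sees, so lineage-based interception has a blind spot:

```python
# Toy blind spot: governance inspects only explicit messages, but agents
# also read a shared scratchpad the genealogy graph never records.
shared_memory = {}

def agent_write_tool_state(key, value):
    shared_memory[key] = value  # side channel: no lineage entry is created

def agent_answer(message_premises, memory_keys):
    # The agent's premise set mixes governed messages with ungoverned state.
    premises = list(message_premises)
    premises += [shared_memory[k] for k in memory_keys if k in shared_memory]
    return premises

agent_write_tool_state("cached_fact", "ERROR: 2+2=5")  # injected off-channel
premises = agent_answer(["governed, verified message"], ["cached_fact"])
# The erroneous premise reaches the agent with no genealogy record to flag it.
```

If errors in a real deployment travel through channels like this, the governance layer's prevention rate would not apply, which is exactly the settling experiment proposed above.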

Figures

Figures reproduced from arXiv: 2603.04474 by Congcong Zhu, Dayong Ye, Huajie Chen, Minfeng Qi, Tianqing Zhu, Wanlei Zhou, Xinyue Zhang, Yizhe Xie.

Figure 1
Figure 1. The amplification of errors in LLM-MAS. Whether the input is a factuality error or a faithfulness error, the agents reach a false consensus. This results in failures ranging from security breaches to operational outages. view at source ↗
Figure 2
Figure 2. Overview of our work. We categorize false consensus arising from internal vulnerabilities versus external induction. We model propagation dynamics to characterize consensus collapse mechanisms. Correspondingly, a genealogy-based governance layer implements atomic propagation control to guarantee faithfulness and factuality. view at source ↗
Figure 3
Figure 3. Model validation across different topologies. The black lines represent the observed mean infection rates with ±1 standard error. The dashed lines show the fitted curves using product-based and Poisson-based infection functions. view at source ↗
Figure 4
Figure 4. The evolution of error coverage S(t). view at source ↗
Figure 5
Figure 5. Overview of the Genealogy-Based Governance Layer. view at source ↗
Figure 6
Figure 6. Infection rate S(t) across communication turns under three topologies. view at source ↗
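Figure 3's fitted curves compare product-based and Poisson-based infection functions. The paper's exact forms are not reproduced on this page; the versions below are the standard shapes from epidemic modeling, with hypothetical parameters `p` and `beta`, given each neighbor's activity level s_j (the probability that neighbor j is actively propagating the error).

```python
import math

def product_infection(neighbor_activities, p=0.3):
    """Product-based form: independent per-neighbor exposures.
    Infection prob. = 1 - prod_j (1 - p * s_j)."""
    q = 1.0
    for s in neighbor_activities:
        q *= (1.0 - p * s)
    return 1.0 - q

def poisson_infection(neighbor_activities, beta=0.3):
    """Poisson-based form: exposure intensities add in the exponent.
    Infection prob. = 1 - exp(-beta * sum_j s_j)."""
    lam = beta * sum(neighbor_activities)
    return 1.0 - math.exp(-lam)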
read the original abstract

Large Language Model-based Multi-Agent Systems (LLM-MAS) are increasingly applied to complex collaborative scenarios. However, their collaborative mechanisms may cause minor inaccuracies to gradually solidify into system-level false consensus through iteration. Such risks are difficult to trace since errors can propagate and amplify through message dependencies. Existing protections often rely on single-agent validation or require modifications to the collaboration architecture, which can weaken effective information flow and may not align with natural collaboration processes in real tasks. To address this, we propose a propagation dynamics model tailored for LLM-MAS that abstracts collaboration as a directed dependency graph and provides an early-stage risk criterion to characterize amplification risk. Through experiments on six mainstream frameworks, we identify three vulnerability classes: cascade amplification, topological sensitivity, and consensus inertia. We further instantiate an attack where injecting just a single atomic error seed leads to widespread failure. In response, we introduce a genealogy-graph-based governance layer, implemented as a message-layer plugin, that suppresses both endogenous and exogenous error amplification without altering the collaboration architecture. Experiments show that this approach prevents final infection in at least 89% of runs across operating modes and significantly mitigates the cascading spread of minor errors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper models error propagation in LLM-based multi-agent systems as a directed dependency graph, identifies three vulnerability classes (cascade amplification, topological sensitivity, consensus inertia) via experiments on six frameworks, shows that a single atomic error seed can cause widespread failure, and proposes a genealogy-graph governance layer implemented as a message-layer plugin. This plugin is claimed to suppress endogenous and exogenous amplification without altering the collaboration architecture, preventing final infection in at least 89% of runs across operating modes.

Significance. If the directed-graph abstraction faithfully captures dominant propagation mechanisms and the empirical results prove robust to controls, the work offers a lightweight, architecture-preserving mitigation strategy for error cascades in LLM-MAS. This is significant for practical deployment of collaborative agents, as it avoids the drawbacks of single-agent validation or architectural changes while providing concrete prevention rates across multiple frameworks.

major comments (3)
  1. [Abstract and Experimental Setup] Abstract and Experimental Setup: The 89% prevention rate is presented without details on error definitions, number of runs, statistical tests, variance, or baseline comparisons (e.g., no-governance controls). This information is load-bearing for evaluating the mitigation's effectiveness and generalizability.
  2. [Propagation Dynamics Model] Propagation Dynamics Model: The directed dependency graph and early-stage risk criterion assume message dependencies dominate error spread. However, shared context, tool outputs, or implicit state outside explicit messages are common in LLM-MAS and could allow undetected amplification, potentially invalidating the reported prevention rate when the model is incomplete.
  3. [Vulnerability Classification] Vulnerability Classification: The three classes appear identified post-hoc from the runs, introducing selection-effect risk that could overstate their generality and the attack's representativeness.
minor comments (1)
  1. [Abstract] Abstract: The six mainstream frameworks are not named, which reduces immediate reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on experimental transparency, model assumptions, and classification methodology. We address each major comment below, indicating revisions where the manuscript will be updated to strengthen clarity and address potential limitations.

read point-by-point responses
  1. Referee: [Abstract and Experimental Setup] The 89% prevention rate is presented without details on error definitions, number of runs, statistical tests, variance, or baseline comparisons (e.g., no-governance controls). This information is load-bearing for evaluating the mitigation's effectiveness and generalizability.

    Authors: We agree that the abstract would benefit from more detail on these elements for immediate evaluation. The full manuscript's Section 4 specifies error as factual deviation from ground truth exceeding a 5% threshold, with 500 runs per framework and operating mode, including standard deviations and t-test results (p < 0.01) against no-governance baselines showing infection rates above 70%. We will revise the abstract to incorporate key statistics and add a summary table of runs, variance, and baselines in the main text. revision: yes

  2. Referee: [Propagation Dynamics Model] The directed dependency graph and early-stage risk criterion assume message dependencies dominate error spread. However, shared context, tool outputs, or implicit state outside explicit messages are common in LLM-MAS and could allow undetected amplification, potentially invalidating the reported prevention rate when the model is incomplete.

    Authors: The model abstracts collaboration via explicit message dependencies as the primary propagation channel, which our experiments on six frameworks confirm as the dominant mechanism in the tested scenarios. We acknowledge that shared context and tool outputs may enable additional implicit paths not fully modeled. In revision, we will add a limitations discussion on this point and note that the genealogy-graph plugin mitigates observable cascades at the message layer. The reported prevention rates remain valid under the model's explicit-dependency assumptions. revision: partial

  3. Referee: [Vulnerability Classification] The three classes appear identified post-hoc from the runs, introducing selection-effect risk that could overstate their generality and the attack's representativeness.

    Authors: The classes emerged from systematic patterns observed consistently across all frameworks and attack variants, informed by graph properties such as path amplification and node sensitivity. To mitigate selection concerns, we will clarify the a priori hypotheses in the revision, provide full run data in supplementary materials, and cross-validate the classification against additional independent scenarios. revision: yes

Circularity Check

0 steps flagged

No circularity detected; claims rest on graph abstraction and empirical validation

full rationale

The paper defines a directed dependency graph model and early-stage risk criterion, then reports empirical results from experiments on six frameworks showing vulnerability classes and 89% prevention via the genealogy-graph plugin. No equations are presented that equate outputs to inputs by construction, no fitted parameters are relabeled as predictions, and no self-citations are used to justify uniqueness or smuggle ansatzes. The derivation chain is self-contained against the stated experimental benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the directed dependency graph and genealogy tracking are presented as modeling choices rather than new physical entities.

pith-pipeline@v0.9.0 · 5531 in / 1003 out tokens · 47786 ms · 2026-05-15T16:34:08.573537+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy

    cs.LG 2026-05 unverdicted novelty 6.0

    Pretrained base models exhibit higher yield to peer disagreement than RLHF instruct variants, with the effect localized to mid-layer attention and mitigated by structured dissent rather than prompt defenses.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · cited by 1 Pith paper · 7 internal anchors

  1. [1]

    Trustworthy agentic ai systems: A cross-layer review of architectures, threat models, and governance strategies for real-world deployment.F1000Research, 14(905):905, 2025

    Ibrahim Adabara, Bashir Olaniyi Sadiq, Aliyu Nuhu Shuaibu, Yale Ibrahim Danjuma, and Venkateswarlu Maninti. Trustworthy agentic ai systems: A cross-layer review of architectures, threat models, and governance strategies for real-world deployment.F1000Research, 14(905):905, 2025

  2. [2]

    The orchestration of multi-agent systems: Architec- tures, protocols, and enterprise adoption.arXiv preprint arXiv:2601.13671, 2026

    Apoorva Adimulam, Rajesh Gupta, and Sumit Kumar. The orchestration of multi-agent systems: Architec- tures, protocols, and enterprise adoption.arXiv preprint arXiv:2601.13671, 2026

  3. [3]

    An overview of recent advances of resilient consensus for multiagent systems under at- tacks.Computational Intelligence and Neuroscience, 2022(1):6732343, 2022

    Muhammad Muzamil Aslam, Zahoor Ahmed, Liping Du, Muhammad Zohaib Hassan, Sajid Ali, and Muham- mad Nasir. An overview of recent advances of resilient consensus for multiagent systems under at- tacks.Computational Intelligence and Neuroscience, 2022(1):6732343, 2022

  4. [4]

    Uci machine learning repository, 2007

    Arthur Asuncion, David Newman, et al. Uci machine learning repository, 2007

  5. [5]

    Monitoring reason- ing models for misbehavior and the risks of promoting obfuscation.arXiv preprint arXiv:2503.11926, 2025

    Bowen Baker, Joost Huizinga, Leo Gao, Zehao Dou, Melody Y Guan, Aleksander Madry, Wojciech Zaremba, Jakub Pachocki, and David Farhi. Monitoring reason- ing models for misbehavior and the risks of promoting obfuscation.arXiv preprint arXiv:2503.11926, 2025

  6. [6]

    A theory of fads, fashion, custom, and cultural change as informational cascades.Journal of political Economy, 100(5):992–1026, 1992

    Sushil Bikhchandani, David Hirshleifer, and Ivo Welch. A theory of fads, fashion, custom, and cultural change as informational cascades.Journal of political Economy, 100(5):992–1026, 1992

  7. [7]

    Sagallm: Con- text management, validation, and transaction guaran- tees for multi-agent llm planning.arXiv preprint arXiv:2503.11951, 2025

    Edward Y Chang and Longling Geng. Sagallm: Con- text management, validation, and transaction guaran- tees for multi-agent llm planning.arXiv preprint arXiv:2503.11951, 2025

  8. [8]

    A lattice model of secure informa- tion flow.Communications of the ACM, 19(5):236–243, 1976

    Dorothy E Denning. A lattice model of secure informa- tion flow.Communications of the ACM, 19(5):236–243, 1976. 14

  9. [9]

    Improving factuality and reasoning in language models through multiagent de- bate

    Yilun Du, Shuang Li, Antonio Torralba, Joshua B Tenen- baum, and Igor Mordatch. Improving factuality and reasoning in language models through multiagent de- bate. InForty-first International Conference on Machine Learning, 2023

  10. [10]

    Exploration of llm multi- agent application implementation based on langgraph+ crewai.arXiv preprint arXiv:2411.18241, 2024

    Zhihua Duan and Jialin Wang. Exploration of llm multi- agent application implementation based on langgraph+ crewai.arXiv preprint arXiv:2411.18241, 2024

  11. [11]

    PhD thesis, University of Oxford, 2021

    Christopher J D’Urso.Nowhere to hide: investigating the use of unilateral alternatives to extradition in United States prosecutions of transnational cybercrime. PhD thesis, University of Oxford, 2021

  12. [12]

    Ragas: Automated evaluation of retrieval augmented generation

    Shahul Es, Jithin James, Luis Espinosa Anke, and Steven Schockaert. Ragas: Automated evaluation of retrieval augmented generation. InProceedings of the 18th Con- ference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pages 150–158, 2024

  13. [13]

    From prompt injections to pro- tocol exploits: Threats in llm-powered ai agents work- flows.ICT Express, 2025

    Mohamed Amine Ferrag, Norbert Tihanyi, Djallel Hamouda, Leandros Maglaras, Abderrahmane Lakas, and Merouane Debbah. From prompt injections to pro- tocol exploits: Threats in llm-powered ai agents work- flows.ICT Express, 2025

  14. [14]

    Multi-agent frame- work for threat mitigation and resilience in ai-based systems.arXiv preprint arXiv:2512.23132, 2025

    Armstrong Foundjem, Lionel Nganyewou Tidjon, Leu- son Da Silva, and Foutse Khomh. Multi-agent frame- work for threat mitigation and resilience in ai-based systems.arXiv preprint arXiv:2512.23132, 2025

  15. [15]

    Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion

    Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion. InProceedings of the 16th ACM workshop on artificial intelligence and security, pages 79–90, 2023

  16. [16]

    Large Language Model based Multi-Agents: A Survey of Progress and Challenges

    Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V Chawla, Olaf Wiest, and Xian- gliang Zhang. Large language model based multi-agents: A survey of progress and challenges.arXiv preprint arXiv:2402.01680, 2024

  17. [17]

    DeBERTa: Decoding-enhanced BERT with Disentangled Attention

    Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. Deberta: Decoding-enhanced bert with disentangled attention.arXiv preprint arXiv:2006.03654, 2020

  18. [18]

    Red-teaming llm multi-agent systems via communication attacks

    Pengfei He, Yuping Lin, Shen Dong, Han Xu, Yue Xing, and Hui Liu. Red-teaming llm multi-agent systems via communication attacks. InFindings of the Association for Computational Linguistics: ACL 2025, pages 6726– 6747, 2025

  19. [19]

    Sentinelagent: Graph-based anomaly detection in multi-agent systems

    Xu He, Di Wu, Yan Zhai, and Kun Sun. Sentinelagent: Graph-based anomaly detection in multi-agent systems. arXiv preprint arXiv:2505.24201, 2025

  20. [20]

    Measuring Massive Multitask Language Understanding

    Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300, 2020

  21. [21]

    Measuring Mathematical Problem Solving With the MATH Dataset

    Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring mathematical problem solving with the math dataset.arXiv preprint arXiv:2103.03874, 2021

  22. [22]

    Metagpt: Meta programming for a multi-agent collabora- tive framework

    Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, et al. Metagpt: Meta programming for a multi-agent collabora- tive framework. InThe twelfth international conference on learning representations, 2023

  23. [23]

    Understanding the planning of LLM agents: A survey

    Xu Huang, Weiwen Liu, Xiaolong Chen, Xingmei Wang, Hao Wang, Defu Lian, Yasheng Wang, Ruiming Tang, and Enhong Chen. Understanding the planning of llm agents: A survey.arXiv preprint arXiv:2402.02716, 2024

  24. [24]

    An overview on multi-agent consensus under adversarial attacks.An- nual Reviews in Control, 53:252–272, 2022

    Hideaki Ishii, Yuan Wang, and Shuai Feng. An overview on multi-agent consensus under adversarial attacks.An- nual Reviews in Control, 53:252–272, 2022

  25. [25]

    A multi-vocal review of security orchestration.ACM Computing Surveys (CSUR), 52(2):1–45, 2019

    Chadni Islam, Muhammad Ali Babar, and Surya Nepal. A multi-vocal review of security orchestration.ACM Computing Surveys (CSUR), 52(2):1–45, 2019

  26. [26]

    Towards mitigating llm halluci- nation via self reflection

    Ziwei Ji, Tiezheng Yu, Yan Xu, Nayeon Lee, Etsuko Ishii, and Pascale Fung. Towards mitigating llm halluci- nation via self reflection. InFindings of the Association for Computational Linguistics: EMNLP 2023, pages 1827–1843, 2023

  27. [27]

    A survey on large language models for code generation.ACM Transactions on Software Engi- neering and Methodology, 35(2):1–72, January 2026

    Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. A survey on large language models for code generation.ACM Transactions on Software Engi- neering and Methodology, 35(2):1–72, January 2026

  28. [28]

    A survey of llm-driven ai agent communication: Protocols, security risks, and defense countermeasures.arXiv preprint arXiv:2506.19676, 2025

    Dezhang Kong, Shi Lin, Zhenhua Xu, Zhebo Wang, Minghao Li, Yufeng Li, Yilun Zhang, Hujin Peng, Xiang Chen, Zeyang Sha, et al. A survey of llm-driven ai agent communication: Protocols, security risks, and defense countermeasures.arXiv preprint arXiv:2506.19676, 2025

  29. [29]

    Retrieval-augmented generation for knowledge- intensive nlp tasks.Advances in neural information processing systems, 33:9459–9474, 2020

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, 15 et al. Retrieval-augmented generation for knowledge- intensive nlp tasks.Advances in neural information processing systems, 33:9459–9474, 2020

  30. [30]

    Camel: Communica- tive agents for" mind" exploration of large language model society.Advances in Neural Information Process- ing Systems, 36:51991–52008, 2023

    Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. Camel: Communica- tive agents for" mind" exploration of large language model society.Advances in Neural Information Process- ing Systems, 36:51991–52008, 2023

  31. [31]

    A survey on llm-based multi-agent systems: workflow, in- frastructure, and challenges.Vicinagearth, 1(1):9, 2024

    Xinyi Li, Sai Wang, Siqi Zeng, Yu Wu, and Yi Yang. A survey on llm-based multi-agent systems: workflow, in- frastructure, and challenges.Vicinagearth, 1(1):9, 2024

  32. [32]

    Attack and defense techniques in large language models: A survey and new perspectives.Neu- ral Networks, page 108388, 2025

    Zhiyu Liao, Kang Chen, Yuanguo Lin, Kangkang Li, Yunxuan Liu, Hefeng Chen, Xingwang Huang, and Yuanhui Yu. Attack and defense techniques in large language models: A survey and new perspectives.Neu- ral Networks, page 108388, 2025

  33. [33]

    The Dark Side of LLMs: Agent-based Attack Vectors for System-level Compromise

    Matteo Lupinacci, Francesco Aurelio Pironti, Francesco Blefari, Francesco Romeo, Luigi Arena, and Angelo Furfaro. The dark side of llms: Agent-based at- tacks for complete computer takeover.arXiv preprint arXiv:2507.06850, 2025

  34. [34]

    Factscore: Fine-grained atomic evaluation of factual precision in long form text generation

    Sewon Min, Kalpesh Krishna, Xinxi Lyu, Mike Lewis, Wen-tau Yih, Pang Koh, Mohit Iyyer, Luke Zettlemoyer, and Hannaneh Hajishirzi. Factscore: Fine-grained atomic evaluation of factual precision in long form text generation. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12076–12100, 2023

  35. [35]

    Why do multiagent systems fail? In ICLR 2025 Workshop on Building Trust in Language Models and Applications, 2025

    Melissa Z Pan, Mert Cemri, Lakshya A Agrawal, Shuyi Yang, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Kannan Ramchandran, Dan Klein, et al. Why do multiagent systems fail? In ICLR 2025 Workshop on Building Trust in Language Models and Applications, 2025

  36. [36]

    Epidemic processes in complex networks.Reviews of modern physics, 87(3):925–979, 2015

    Romualdo Pastor-Satorras, Claudio Castellano, Piet Van Mieghem, and Alessandro Vespignani. Epidemic processes in complex networks.Reviews of modern physics, 87(3):925–979, 2015

  37. [37]

    A review on agent-to-agent pro- tocol: Concept, state-of-the-art, challenges and future directions.Authorea Preprints, 2025

    Partha Pratim Ray. A review on agent-to-agent pro- tocol: Concept, state-of-the-art, challenges and future directions.Authorea Preprints, 2025

  38. [38]

    Ai agents vs

    Ranjan Sapkota, Konstantinos I Roumeliotis, and Manoj Karkee. Ai agents vs. agentic ai: A conceptual tax- onomy, applications and challenges.arXiv preprint arXiv:2505.10468, 2025

  39. [39]

    Audit-llm: Multi- agent collaboration for log-based insider threat detection

    Chengyu Song, Linru Ma, Jianming Zheng, Jinzhi Liao, Hongyu Kuang, and Lin Yang. Audit-llm: Multi- agent collaboration for log-based insider threat detection. arXiv preprint arXiv:2408.08902, 2024

  40. [40]

    Towards detecting llms hallucination via markov chain-based multi-agent debate framework

    Xiaoxi Sun, Jinpeng Li, Yan Zhong, Dongyan Zhao, and Rui Yan. Towards detecting llms hallucination via markov chain-based multi-agent debate framework. InICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2025

  41. [41]

    Talebirad, A

    Yashar Talebirad and Amirhossein Nadiri. Multi-agent collaboration: Harnessing the power of intelligent llm agents.arXiv preprint arXiv:2306.03314, 2023

  42. [42]

    Creating large language model applications utilizing langchain: A primer on developing llm apps fast

    Oguzhan Topsakal and Tahir Cetin Akinci. Creating large language model applications utilizing langchain: A primer on developing llm apps fast. InInternational conference on applied engineering and natural sciences, volume 1, pages 1050–1056, 2023

  43. [43]

    Multi-agent systems execute arbitrary malicious code

    Harold Triedman, Rishi Jha, and Vitaly Shmatikov. Multi-agent systems execute arbitrary malicious code. arXiv preprint arXiv:2503.12188, 2025

  44. [44]

    The spread of true and false news online.science, 359(6380):1146– 1151, 2018

    Soroush V osoughi, Deb Roy, and Sinan Aral. The spread of true and false news online.science, 359(6380):1146– 1151, 2018

  45. [45]

    Agent ai with lang- graph: A modular framework for enhancing machine translation using large language models.arXiv preprint arXiv:2412.03801, 2024

    Jialin Wang and Zhihua Duan. Agent ai with lang- graph: A modular framework for enhancing machine translation using large language models.arXiv preprint arXiv:2412.03801, 2024

  46. [46]

    A survey on large language model based autonomous agents.Frontiers of Computer Science, 18(6):186345, 2024

    Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, et al. A survey on large language model based autonomous agents.Frontiers of Computer Science, 18(6):186345, 2024

  47. [47]

    Security of internet of agents: Attacks and counter- measures.IEEE Open Journal of the Computer Society, 2025

    Yuntao Wang, Yanghe Pan, Shaolong Guo, and Zhou Su. Security of internet of agents: Attacks and counter- measures.IEEE Open Journal of the Computer Society, 2025

  48. [48]

    Large model based agents: State-of-the-art, cooperation paradigms, security and privacy, and future trends.IEEE Communications Surveys & Tutorials, 2025

    Yuntao Wang, Yanghe Pan, Zhou Su, Yi Deng, Quan Zhao, Linkang Du, Tom H Luan, Jiawen Kang, and Dusit Niyato. Large model based agents: State-of-the-art, cooperation paradigms, security and privacy, and future trends.IEEE Communications Surveys & Tutorials, 2025

  49. [49]

    A simple model of global cascades on random networks.Proceedings of the National Academy of Sciences, 99(9):5766–5771, 2002

    Duncan J Watts. A simple model of global cascades on random networks.Proceedings of the National Academy of Sciences, 99(9):5766–5771, 2002

  50. [50]

    Autogen: Enabling next-gen llm applications via multi-agent conversations

    Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. Autogen: Enabling next-gen llm applications via multi-agent conversations. InFirst Conference on Language Modeling, 2024. 16

  51. [51]

    The rise and potential of large language model based agents: A survey, 2023

    Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shi- han Dou, Rongxiang Weng, Wensen Cheng, Qi Zhang, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang,...

  52. [52]

    Who’s the mole? modeling and detecting intention-hiding mali- cious agents in llm-based multi-agent systems.arXiv preprint arXiv:2507.04724, 2025

    Yizhe Xie, Congcong Zhu, Xinyue Zhang, Tianqing Zhu, Dayong Ye, Minghao Wang, and Chi Liu. Who’s the mole? modeling and detecting intention-hiding mali- cious agents in llm-based multi-agent systems.arXiv preprint arXiv:2507.04724, 2025

  53. [53]

    Minimizing hallucinations and communication costs: Adversarial debate and voting mechanisms in llm-based multi-agents.Applied Sciences, 15(7):3676, 2025

    Yi Yang, Yitong Ma, Hao Feng, Yiming Cheng, and Zhu Han. Minimizing hallucinations and communication costs: Adversarial debate and voting mechanisms in llm-based multi-agents.Applied Sciences, 15(7):3676, 2025

  54. [54]

    Jailbreak Attacks and Defenses Against Large Language Models: A Survey

    Sibo Yi, Yule Liu, Zhen Sun, Tianshuo Cong, Xinlei He, Jiaxing Song, Ke Xu, and Qi Li. Jailbreak attacks and defenses against large language models: A survey.arXiv preprint arXiv:2407.04295, 2024

  55. [55]

    Blockchain for network service or- chestration: Trust and adoption in multi-domain envi- ronments.IEEE Communications Standards Magazine, 7(2):16–22, 2023

    Engin Zeydan, Jorge Baranda, Josep Mangues-Bafalluy, and Yekta Turk. Blockchain for network service or- chestration: Trust and adoption in multi-domain envi- ronments.IEEE Communications Standards Magazine, 7(2):16–22, 2023

  56. [56]

    Which agent causes task failures and when? on automated failure attribution of llm multi-agent systems.arXiv preprint arXiv:2505.00212, 2025

    Shaokun Zhang, Ming Yin, Jieyu Zhang, Jiale Liu, Zhiguang Han, Jingyang Zhang, Beibin Li, Chi Wang, Huazheng Wang, Yiran Chen, et al. Which agent causes task failures and when? on automated failure attribution of llm multi-agent systems.arXiv preprint arXiv:2505.00212, 2025

  57. [57]

    infection

    Tommaso Zoppi, Andrea Ceccarelli, and Andrea Bon- davalli. Exploring anomaly detection in systems of systems. InProceedings of the Symposium on Applied Computing, pages 1139–1146, 2017. A Model Fitting and Topology Configuration Details This appendix specifies the configuration and fitting proto- col omitted from §2.3. In this calibration experiment, we o...