GAMMAF: A Common Framework for Graph-Based Anomaly Monitoring Benchmarking in LLM Multi-Agent Systems

Alfonso Sánchez-Macián; Pablo Mateo-Torrejón

arXiv: 2604.24477 · v1 · submitted 2026-04-27 · cs.CR · cs.AI · cs.MA

Pith reviewed 2026-05-08 02:38 UTC · model grok-4.3

keywords: graph-based anomaly detection, LLM multi-agent systems, benchmarking framework, synthetic datasets, prompt infection, attack remediation, operational costs, network topologies

The pith

Gammaf is a benchmarking framework that generates synthetic graphs of LLM agent debates to evaluate anomaly detectors, and it shows that remediation cuts operational costs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Gammaf as an open-source platform to generate synthetic multi-agent interaction datasets modeled as attributed graphs and to benchmark graph-based anomaly detection methods for LLM multi-agent systems. It operates through a training data generation pipeline that simulates debates across varied network topologies and a defense benchmarking pipeline that evaluates models by isolating flagged adversarial nodes during live inference. Experiments with baselines such as XG-Guard and BlindGuard on tasks including MMLU-Pro and GSM8K demonstrate the framework's scalability and efficiency. If the central claims hold, researchers gain a reproducible way to train and compare protections against attacks like prompt infection, leading to more secure and lower-cost collaborative LLM systems.

Core claim

Gammaf is not a new defense but a comprehensive evaluation architecture with two interdependent pipelines: a Training Data Generation stage that simulates debates to capture interactions as robust attributed graphs, and a Defense System Benchmarking stage that actively evaluates models by dynamically isolating flagged adversarial nodes during live inference rounds. Rigorous evaluation across multiple knowledge tasks and network topologies confirms high utility, topological scalability, and execution efficiency. The results further establish that equipping an LLM-MAS with effective attack remediation recovers system integrity while substantially reducing operational costs by facilitating early consensus and cutting off the extensive token generation typical of adversarial agents.

What carries the argument

The Gammaf framework's two pipelines for generating synthetic attributed graphs from simulated multi-agent debates and for benchmarking defenses by isolating flagged nodes in live rounds.

If this is right

Researchers gain a standardized, reproducible environment to train graph-based anomaly detectors for LLM multi-agent systems.
Defense models can be tested dynamically by isolating adversarial nodes during ongoing agent interactions.
Effective remediation leads to early consensus and lower overall token consumption in LLM multi-agent operations.
The framework scales across different network topologies and standard knowledge tasks while maintaining execution efficiency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the synthetic data proves representative, Gammaf could accelerate practical deployment of secure LLM multi-agent applications beyond lab settings.
The observed cost reductions suggest that integrating graph-based monitoring may yield economic benefits for large-scale LLM systems.
Extending the framework to additional attack vectors could help address vulnerabilities not fully captured in the current debate simulations.

Load-bearing premise

The synthetic multi-agent interaction datasets generated by simulating debates across varied network topologies accurately represent real-world adversarial behaviors and vulnerabilities such as prompt infection in actual LLM-MAS deployments.

What would settle it

Running the same defense models on Gammaf-generated data versus real deployment logs of LLM multi-agent systems containing documented prompt infections and measuring whether detection accuracy and cost savings match.

Figures

Figures reproduced from arXiv: 2604.24477 by Alfonso Sánchez-Macián, Pablo Mateo-Torrejón.

**Figure 1.** Example of debate setup for collaboration in an LLM-MAS. Agents exchange natural language discourse to reach a consensus on a specific task. The diagram illustrates how the communication structure constrains information flow, requiring agents to synthesize the logical reasoning of their neighbors to update their internal context.

**Figure 2.** Overview of the proposed framework for multi-agent adversarial synthesis and defense benchmarking. The pipeline is divided into (Left) a training data generation phase using varied network topologies (Chain, Tree, Random) to produce text and numerical embeddings, and (Right) a defense evaluation phase where models are benchmarked through iterative rounds of live debate and malicious agent pruning. The comp…
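
As a rough illustration of the three topology families named in the caption, the following Python sketch builds Chain, Tree, and Random adjacency matrices. The function names, the undirected edges, the branching factor, and the edge probability are all assumptions made for this example; this summary does not say how GAMMAF constructs its graphs.

```python
import random


def chain_topology(n):
    """Undirected chain 0-1-2-...-(n-1) as an n x n adjacency matrix."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        adj[i][i + 1] = adj[i + 1][i] = 1
    return adj


def tree_topology(n, branching=2):
    """Undirected tree in which node i (i > 0) hangs off node (i - 1) // branching."""
    adj = [[0] * n for _ in range(n)]
    for child in range(1, n):
        parent = (child - 1) // branching
        adj[parent][child] = adj[child][parent] = 1
    return adj


def random_topology(n, edge_prob=0.3, seed=0):
    """Undirected random graph: each pair is linked with probability edge_prob.

    edge_prob and seed are hypothetical parameters chosen for illustration.
    """
    rng = random.Random(seed)  # local generator keeps runs reproducible
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < edge_prob:
                adj[i][j] = adj[j][i] = 1
    return adj


def edge_count(adj):
    """Number of undirected edges in a symmetric adjacency matrix."""
    return sum(sum(row) for row in adj) // 2
```

A chain or tree over n agents always has n - 1 edges, whereas the random graph's density depends on `edge_prob`; fixing the seed keeps generated datasets repeatable.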

**Figure 3.** Flow of a debate cycle in a defense-enabled LLM-MAS collaboration execution. During the Inference Phase, agents (benign in blue and adversarial in red) generate initial responses. In the Anomaly Evaluation phase, the Defender model marks the suspicious nodes (indicated with flags). The Pruning Phase isolates those agents identified by the Defender by removing all their incoming and outgoing communication e…

**Figure 4.** Architectural overview of the GAMMAF evaluation pipeline. The process begins within the Evaluation Space, where tasks and network topologies are paired. The resulting agent discourse and adjacency matrix are processed by the Defender, which generates anomaly scores and assigns flags to suspicious nodes. Finally, the system isolates flagged agents by pruning their edges, updating the topology for the subseq…
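
The flag-then-prune step described in Figures 3 and 4 can be sketched as follows. This is a minimal illustration, not the GAMMAF implementation: the fixed score threshold, the function names, and the list-of-lists adjacency matrix are assumptions.

```python
def flag_nodes(scores, threshold=0.5):
    """Return the indices of agents whose anomaly score exceeds the threshold.

    The fixed threshold is a hypothetical decision rule; a real defender
    may flag agents in a different way.
    """
    return {i for i, score in enumerate(scores) if score > threshold}


def prune(adj, flagged):
    """Return a copy of adj with every edge into or out of a flagged node removed."""
    n = len(adj)
    return [
        [0 if (i in flagged or j in flagged) else adj[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy example: three fully connected agents, agent 1 looks anomalous.
adjacency = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
anomaly_scores = [0.1, 0.9, 0.2]

flagged = flag_nodes(anomaly_scores)
next_round_adjacency = prune(adjacency, flagged)
```

Returning a new matrix rather than editing in place keeps the pre-pruning topology available for logging, which is convenient when the same graph is replayed against several defenders.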

**Figure 5.** Evolution of the integrity of the system over three dialogue turns, measured in terms of Attack Success Rate (left), Un-Flagged Attack Success Rate (center) and Attack Infection Rate (right). Results shown correspond to the evaluation stage over the MMLU-Pro dataset. Agents that fail to provide the correct answer at the end of each round are considered under attack.
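
The caption's rule (an agent that gives a wrong answer at the end of a round counts as under attack) suggests a simple per-round ratio. The sketch below is only one plausible reading: the paper's exact formulas for the three metrics are not given in this summary, and the `exclude` argument, used here to restrict the ratio to un-flagged agents, is a hypothetical device.

```python
def attacked_fraction(answers, correct, exclude=frozenset()):
    """Share of considered agents whose final answer differs from the correct one.

    answers: one final answer per agent for a single round.
    exclude: agent indices left out of the ratio (for example, flagged agents).
    Returns 0.0 when no agent remains to be considered.
    """
    considered = [i for i in range(len(answers)) if i not in exclude]
    if not considered:
        return 0.0
    wrong = sum(1 for i in considered if answers[i] != correct)
    return wrong / len(considered)
```

Calling it once per debate round, with and without the flagged set excluded, gives two curves that can be compared across rounds in the manner of the left and center panels.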

**Figure 6.** Inference metrics fetched from vLLM during the execution of GAMMAF for random MAS setups with different numbers of agents. During the evaluation stage the number of attacker agents was set right below 50% of the size of the network. The upper plot shows the sum of inference time, where the time taken to process concurrent requests gets aggregated. The lower plot shows the total token usage, aggregating prom…

**Figure 7.** Total number of requests to vLLM during the execution of GAMMAF for random MAS setups with different numbers of agents. During the data generation stage no malicious agents are present in the MAS, and during the evaluation stage the number of attacking agents is set to right below 50% of the total.

**Figure 8.** Execution efficiency statistics across varying concurrency levels. The concurrency level denotes the maximum number of simultaneous requests permitted to the LLM API (vLLM). (Left): Comparative evolution of total vLLM inference time and the measured duration of the evaluation stage as concurrency increases. The gray bars represent the KV prefill memory token share recorded for each experiment. Dashed line …

**Figure 9.** LLM inference cost measured in total inference tokens at vLLM for each of the defense methods benchmarked during the evaluation stage. The different subplots show the results for networks with a growing number of attacker agents. The different bars show the evolution of the inference cost share over three debate rounds. All the tests were performed over random agent networks with a total of 10 members. The…

Original abstract

The rapid integration of Large Language Models (LLMs) into Multi-Agent Systems (MAS) has significantly enhanced their collaborative problem-solving capabilities, but it has also expanded their attack surfaces, exposing them to vulnerabilities such as prompt infection and compromised inter-agent communication. While emerging graph-based anomaly detection methods show promise in protecting these networks, the field currently lacks a standardized, reproducible environment to train these models and evaluate their efficacy. To address this gap, we introduce Gammaf (Graph-based Anomaly Monitoring for LLM Multi-Agent systems Framework), an open-source benchmarking platform. Gammaf is not a novel defense mechanism itself, but rather a comprehensive evaluation architecture designed to generate synthetic multi-agent interaction datasets and benchmark the performance of existing and future defense models. The proposed framework operates through two interdependent pipelines: a Training Data Generation stage, which simulates debates across varied network topologies to capture interactions as robust attributed graphs, and a Defense System Benchmarking stage, which actively evaluates defense models by dynamically isolating flagged adversarial nodes during live inference rounds. Through rigorous evaluation using established defense baselines (XG-Guard and BlindGuard) across multiple knowledge tasks (such as MMLU-Pro and GSM8K), we demonstrate Gammaf's high utility, topological scalability, and execution efficiency. Furthermore, our experimental results reveal that equipping an LLM-MAS with effective attack remediation not only recovers system integrity but also substantially reduces overall operational costs by facilitating early consensus and cutting off the extensive token generation typical of adversarial agents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

GAMMAF supplies a synthetic graph generation and benchmarking platform for anomaly detection in LLM multi-agent systems, but its cost-saving claims rest on uncalibrated simulations.

Your colleague should know that this paper introduces GAMMAF as an open-source evaluation architecture rather than a new defense. It includes two pipelines—one that simulates multi-agent debates over varied topologies and turns them into attributed graphs for training data, and another that runs live benchmarking by isolating flagged nodes with existing detectors like XG-Guard and BlindGuard on tasks such as MMLU-Pro and GSM8K. The work reports that the setup scales, runs efficiently, and that remediation cuts operational costs by enabling early consensus and limiting adversarial token output.

Referee Report

3 major / 2 minor

Summary. The paper introduces GAMMAF, an open-source benchmarking framework for graph-based anomaly monitoring in LLM multi-agent systems. It consists of two pipelines: a training data generation stage that simulates debates across varied network topologies to produce attributed graphs capturing agent interactions, and a defense benchmarking stage that evaluates existing anomaly detection models (e.g., XG-Guard, BlindGuard) by dynamically isolating flagged adversarial nodes during live inference on tasks such as MMLU-Pro and GSM8K. The work claims the framework demonstrates high utility, topological scalability, and execution efficiency, and that effective attack remediation recovers system integrity while substantially reducing operational costs via early consensus and curtailed token generation by adversarial agents.

Significance. If the synthetic data generation accurately captures real adversarial dynamics, GAMMAF could fill a needed gap by providing a reproducible platform for training and evaluating graph-based defenses in LLM-MAS, an area with growing practical importance. The open-source release and focus on both data synthesis and dynamic benchmarking are strengths that could accelerate progress; the reported cost-reduction observation, if substantiated, would add practical value by linking anomaly remediation to efficiency gains.

major comments (3)

[Abstract] Abstract and evaluation description: the central claim that remediation 'substantially reduces overall operational costs' by 'facilitating early consensus and cutting off the extensive token generation' is load-bearing for the paper's utility argument, yet no quantitative details are supplied on cost measurement (e.g., token counts, latency, or monetary proxies), percentage savings, or statistical tests comparing remediated vs. unremediated runs.
[Training Data Generation pipeline] Training Data Generation pipeline: the framework's value for benchmarking rests on the assumption that simulated debates and adversarial behaviors (excessive token output, consensus disruption) faithfully model real-world prompt infection and inter-agent compromise, but the manuscript contains no calibration against actual LLM-MAS attack traces, production logs, or sensitivity analysis on simulation parameters.
[Defense System Benchmarking stage] Defense System Benchmarking stage and results: positive outcomes are asserted for baselines on MMLU-Pro and GSM8K, but the text provides no concrete metrics (precision/recall/F1 for anomaly detection, post-remediation task accuracy, runtime overhead), variance across topologies or random seeds, or ablation on isolation mechanisms, preventing assessment of the claimed scalability and efficiency.
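
For reference, the detection metrics requested in the third major comment are conventionally computed from the set of flagged agents and the set of truly adversarial agents. The sketch below is an illustration with hypothetical names and toy inputs; it is not code or data from the paper.

```python
def detection_metrics(flagged, adversarial):
    """Precision, recall, and F1 for a set of flagged agents.

    flagged: indices the defender marked as suspicious.
    adversarial: indices of the agents that really are adversarial.
    Each ratio falls back to 0.0 when its denominator is zero.
    """
    true_pos = len(flagged & adversarial)
    false_pos = len(flagged - adversarial)
    false_neg = len(adversarial - flagged)

    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1
```

Reporting these per topology and per random seed, alongside their spread, would address the variance point raised in the same comment.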

minor comments (2)

[Abstract] The expansion of the GAMMAF acronym is given in the title but could be restated explicitly on first use in the abstract for standalone readability.
[Figures/Tables] Figure and table captions (if present in the full manuscript) should explicitly state the number of simulation runs and topology variants used to support claims of topological scalability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review of our manuscript on GAMMAF. The feedback highlights important areas for strengthening the presentation of our claims, particularly around quantitative evidence and validation of the simulation approach. We address each major comment below and commit to revisions that will enhance the rigor and clarity of the work without altering its core contributions.

Referee: [Abstract] Abstract and evaluation description: the central claim that remediation 'substantially reduces overall operational costs' by 'facilitating early consensus and cutting off the extensive token generation' is load-bearing for the paper's utility argument, yet no quantitative details are supplied on cost measurement (e.g., token counts, latency, or monetary proxies), percentage savings, or statistical tests comparing remediated vs. unremediated runs.

Authors: We agree that the cost-reduction claim requires stronger quantitative backing to support the utility argument. While the full experimental results section includes comparative efficiency observations from the benchmarking pipeline, we will revise both the abstract and the results to incorporate explicit measurements: average token counts per agent and per run, latency reductions, monetary cost proxies based on standard LLM API pricing, percentage savings, and statistical significance tests (e.g., paired t-tests) across remediated and unremediated conditions. These additions will be presented in tables for direct comparison.

Revision: yes

Referee: [Training Data Generation pipeline] Training Data Generation pipeline: the framework's value for benchmarking rests on the assumption that simulated debates and adversarial behaviors (excessive token output, consensus disruption) faithfully model real-world prompt infection and inter-agent compromise, but the manuscript contains no calibration against actual LLM-MAS attack traces, production logs, or sensitivity analysis on simulation parameters.

Authors: The simulation parameters draw from documented adversarial patterns in the LLM-MAS security literature, such as excessive token generation and consensus disruption. Direct calibration against proprietary production logs or real attack traces is not feasible due to limited public availability of such data. However, we will add a dedicated sensitivity analysis section varying key parameters (e.g., adversarial injection rates, topology density, and debate length) to demonstrate dataset robustness. We will also explicitly discuss the limitations of synthetic data and outline pathways for future validation with real traces.

Revision: partial

Referee: [Defense System Benchmarking stage] Defense System Benchmarking stage and results: positive outcomes are asserted for baselines on MMLU-Pro and GSM8K, but the text provides no concrete metrics (precision/recall/F1 for anomaly detection, post-remediation task accuracy, runtime overhead), variance across topologies or random seeds, or ablation on isolation mechanisms, preventing assessment of the claimed scalability and efficiency.

Authors: We acknowledge that the current results presentation could be more granular to allow full assessment of the claimed scalability and efficiency. The manuscript reports overall positive outcomes for XG-Guard and BlindGuard, but we will expand the evaluation section with concrete metrics: precision, recall, and F1 scores for anomaly detection; post-remediation task accuracy on MMLU-Pro and GSM8K; runtime overhead; variance (standard deviations) across multiple random seeds and network topologies; and an ablation study on the dynamic isolation mechanism. These will be added as tables and figures to substantiate the claims.

Revision: yes
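
The remediated-versus-unremediated comparison promised in the first response (percentage savings plus a paired t-test) could be computed along the following lines. This is a sketch using only the Python standard library; the function names are hypothetical and any token counts passed in are placeholders, not measurements from the paper.

```python
import math
from statistics import mean, stdev


def percent_saving(baseline_tokens, remediated_tokens):
    """Total token saving of the remediated runs, as a percentage of the baseline."""
    total_baseline = sum(baseline_tokens)
    return 100.0 * (total_baseline - sum(remediated_tokens)) / total_baseline


def paired_t_statistic(baseline_tokens, remediated_tokens):
    """Paired t statistic over per-run differences (baseline minus remediated).

    Both lists must describe the same runs in the same order (same task,
    topology, and seed). The degrees of freedom are len(baseline_tokens) - 1.
    """
    diffs = [b - r for b, r in zip(baseline_tokens, remediated_tokens)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))
```

Pairing runs by task, topology, and seed matters here: an unpaired test would let topology-to-topology variation in token usage mask the effect of remediation.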

Circularity Check

0 steps flagged

No circularity: Gammaf is a self-contained benchmarking framework with independent synthetic pipelines and external baselines.

The paper introduces Gammaf as an evaluation architecture with two pipelines (synthetic debate graph generation across topologies, then live benchmarking of existing defenses like XG-Guard and BlindGuard on tasks such as MMLU-Pro and GSM8K). No equations, fitted parameters, or derivations appear that reduce by construction to inputs; the cost-reduction observation is an empirical outcome measured inside the simulations rather than a renamed prediction. Baselines are cited as established external methods without load-bearing self-citation chains or uniqueness theorems imported from the authors' prior work. The framework does not define its own success metrics in terms of its outputs, making the derivation chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The framework rests on standard assumptions about graph representations of agent interactions and the utility of synthetic data for benchmarking; no free parameters, new axioms, or invented entities are introduced in the abstract description.

pith-pipeline@v0.9.0 · 5577 in / 1280 out tokens · 21843 ms · 2026-05-08T02:38:14.388705+00:00 · methodology

Reference graph

Works this paper leans on

41 extracted references · 5 canonical work pages · 1 internal anchor

[1]

Anthropic PBC

Anthropic (2024a).Model Context Protocol (MCP) Specification. Anthropic PBC. Accessed: 2026-01-20. Anthropic (2024b). Multi-agent research system.https://www.anthropic.com/research

2026
[2]

Oprea, A. (2024). Phantom: General trigger attacks on retrieval augmented language generation.arXiv preprint arXiv:2405.20485

work page arXiv 2024
[3]

Oprea, A. (2025). Phantom: General backdoor attacks on retrieval augmented language generation

2025
[4]

Chen, W., You, Z., Li, R., Guan, Y ., Qian, C., Zhao, C., Yang, C., Xie, R., Liu, Z., and Sun, M. (2024). Internet of agents: Weaving a web of heterogeneous agents for collaborative intelligence

2024
[5]

Hesse, C., and Schulman, J. (2021). Training verifiers to solve math word problems

2021
[6]

K., and Kumar, S

Ehtesham, A., Singh, A., Gupta, G. K., and Kumar, S. (2025). A survey of agent interoperability protocols: Model context protocol (mcp), agent communication protocol (acp), agent-to-agent protocol (a2a), and agent network protocol (anp)

2025
[7]

Gu, X., Zheng, X., Pang, T., Du, C., Liu, Q., Wang, Y ., Jiang, J., and Lin, M. (2024). Agent smith: A single image can jailbreak one million multimodal llm agents exponentially fast

2024
[8]

He, P., Dai, Z., Tang, X., Xing, Y ., Liu, H., Zeng, J., Peng, Q., Agrawal, S., Varshney, S., Wang, S., et al. (2025a). Attention knows whom to trust: Attention-based trust management for llm multi-agent systems.arXiv preprint arXiv:2506.02546

work page internal anchor Pith review Pith/arXiv arXiv
[9]

He, P., Lin, Y ., Dong, S., Xu, H., Xing, Y ., and Liu, H. (2025c). Red-teaming llm multi-agent systems via communication attacks. InFindings of the Association for Computational Linguistics: ACL 2025, pages 6726–6747. Association for Computational Linguistics

2025
[10]

Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., and Steinhardt, J. (2021). Measuring massive multitask language understanding

2021
[11]

Inan, H., Upasani, K., Chi, J., Rungta, R., Iyer, K., Mao, Y ., Tontchev, M., Hu, Q., Fuller, B., Testuggine, D., et al. (2023). Llama guard: Llm-based input-output safeguard for human-ai conversations

2023
[12]

Ju, T., Wang, Y ., Ma, X., Cheng, P., Zhao, H., Wang, Y ., Liu, L., Xie, J., Zhang, Z., and Liu, G. (2024). Flooding spread of manipulated knowledge in llm-based multi-agent communities.arXiv preprint arXiv:2407.07791

work page arXiv 2024
[13]

Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., Gasser, U., Groh, G., Günnemann, S., Hüllermeier, E., et al. (2023). Chatgpt for good? on opportunities and challenges of large language models for education.Learning and individual differences, 103:102274

2023
[14]

H., Gonzalez, J

Kwon, W., Li, Z., Zhuang, S., Sheng, Y ., Zheng, L., Yu, C. H., Gonzalez, J. E., Zhang, H., and Stoica, I. (2023). Efficient memory management for large language model serving with pagedattention

2023
[15]

and Tiwari, M

Lee, D. and Tiwari, M. (2024). Prompt infection: Llm-to-llm prompt injection within multi-agent systems

2024
[16]

Li, G., Hammoud, H. A. A. K., Itani, H., Khizbullin, D., and Ghanem, B. (2023). Camel: Communicative agents for "mind" exploration of large language model society. InAdvances in Neural Information Processing Systems

2023
[17]

Mialon, G., Fourrier, C., Wolf, T., LeCun, Y ., and Scialom, T. (2023). Gaia: a benchmark for general ai assistants. In The Twelfth International Conference on Learning Representations

2023
[18]

Miao, R., Liu, Y ., Wang, Y ., Shen, X., Tan, Y ., Dai, Y ., Pan, S., and Wang, X. (2025). Blindguard: Safeguarding llm-based multi-agent systems under unknown attacks. Microsoft (2023). Microsoft copilot.https://www.microsoft.com/en-us/microsoft-copilot. OpenAI (2025). gpt-oss-120b & gpt-oss-20b model card

2025
[19]

Pan, J., Liu, Y ., Miao, R., Ding, K., Zheng, Y ., Nguyen, Q. V . H., Liew, A. W.-C., and Pan, S. (2025). Explainable and fine-grained safeguarding of llm multi-agent systems via bi-level graph anomaly detection

2025
[20]

Qian, C., Cong, X., Yang, C., Chen, W., Su, Y ., Xu, J., Liu, Z., and Sun, M. (2023). Communicative agents for software development. 17 A Common Framework for Graph-Based Anomaly Detection on LLM-based Multi-Agent Systems

2023
[21]

Qian, C., Xie, Z., Wang, Y ., Liu, W., Zhu, K., Xia, H., Dang, Y ., Du, Z., Chen, W., Yang, C., Liu, Z., and Sun, M. (2025). Scaling large language model-based multi-agent collaboration. InThe Thirteenth International Conference on Learning Representations

2025
[22]

N., Parisien, C., and Cohen, J

Rebedea, T., Dinu, R., Sreedhar, M. N., Parisien, C., and Cohen, J. (2023). Nemo guardrails: A toolkit for controllable and safe llm applications with programmable rails. InProceedings of the 2023 conference on empirical methods in natural language processing: system demonstrations, pages 431–445

2023
[23]

and Nadiri, A

Talebirad, Y . and Nadiri, A. (2023). Multi-agent collaboration: Harnessing the power of intelligent llm agents

2023
[24]

Talmor, A., Herzig, J., Lourie, N., and Berant, J. (2019). Commonsenseqa: A question answering challenge targeting commonsense knowledge. InProceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4143–4158

2019
[25]

Wang, S., Zhang, G., Yu, M., Wan, G., Meng, F., Guo, C., Wang, K., and Wang, Y . (2025). G-safeguard: A topology- guided security lens and treatment on llm-based multi-agent systems. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics

2025
[26]

Wang, K., Zhuang, A., Fan, R., Yue, X., and Chen, W. (2024). Mmlu-pro: A more robust and challenging multi-task language understanding benchmark. InAdvances in Neural Information Processing Systems

2024
[27]

H., White, R

Wu, Q., Bansal, G., Zhang, J., Wu, Y ., Li, B., Zhu, E., Jiang, L., Zhang, X., Zhang, S., Liu, J., Awadallah, A. H., White, R. W., Burger, D., and Wang, C. (2023). Autogen: Enabling next-gen llm applications via multi-agent conversation

2023
[28]

Xi, Z., Chen, W., Guo, X., He, W., Ding, Y ., Zhang, B., Liao, Y ., Shang, C., Cui, J., Xu, Y ., Wen, X., Zheng, T., Zhou, W., Zhao, H., Gui, T., Zhang, Q., and Huang, X. (2025). The rise and potential of large language model based agents: A survey.Science China Information Sciences, 68(1):121101

2025
[29]

D., et al

Xiang, Z., Zheng, L., Li, Y ., Hong, J., Li, Q., Xie, H., Zhang, J., Xiong, Z., Xie, C., Bastian, N. D., et al. (2025). Guardagent: safeguard llm agents via knowledge-enabled reasoning. InICML 2025 workshop on computer use agents

2025
[30]

Xie, Y ., Zhu, C., Zhang, X., Zhu, T., Ye, D., Wang, M., and Liu, C. (2025). Who’s the mole? modeling and detecting intention-hiding malicious agents in llm-based multi-agent systems

2025
[31]

Yan, B., Zhou, Z., Zhang, X., Li, C., Zeng, R., Qi, Y ., Wang, T., and Zhang, L. (2025). Attack the messages, not the agents: A multi-round adaptive stealthy tampering framework for llm-mas.arXiv preprint arXiv:2508.03125

work page arXiv 2025
[32]

F., Lu, W., Thirunavukarasu, A

Yang, R., Tan, T. F., Lu, W., Thirunavukarasu, A. J., Ting, D. S. W., and Liu, N. (2023). Large language models in health care: Development, applications, and challenges.Health Care Science, 2(4):255–263

2023
[33]

R., and Cao, Y

Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K. R., and Cao, Y . (2022). React: Synergizing reasoning and acting in language models. InThe eleventh international conference on learning representations

2022
[34]

Yu, M., Meng, F., Zhou, X., Wang, S., Mao, J., Pan, L., Chen, T., Wang, K., Li, X., Zhang, Y ., et al. (2025). A survey on trustworthy llm agents: Threats and countermeasures. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V . 2, pages 6216–6226

2025
[35]

Yu, M., Wang, S., Zhang, G., Mao, J., Yin, C., Liu, Q., Wen, Q., Wang, K., and Wang, Y . (2024). NetSafe: Exploring the Topological Safety of Multi-agent Networks

2024
[36]

Zeng, Y ., Wu, Y ., Zhang, X., Wang, H., and Wu, Q. (2024). Autodefense: Multi-agent llm defense against jailbreak attacks.arXiv preprint arXiv:2403.04783

work page arXiv 2024
[37]

Zhang, B., Tan, Y ., Shen, Y ., Salem, A., Backes, M., Zannettou, S., and Zhang, Y . (2025a). Breaking agents: Compromising autonomous llm agents through malfunction amplification. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 34952–34964

2025
[38]

Zhang, R., Wang, H., Wang, J., Li, M., Huang, Y ., Wang, D., and Wang, Q. (2025b). From allies to adversaries: Manipulating llm tool-calling through adversarial injection. InProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages ...

2025
[39]

Zhang, R., Wang, H., Wang, J., Li, M., Huang, Y ., Wang, D., and Wang, Q. (2025c). From allies to adversaries: Manipulating llm tool-calling through adversarial injection. InProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), page 2...

2025
[40]

Zhao, Y ., Xiang, Z., Yin, S., Pang, X., Wang, Y ., and Chen, S. (2024). Made: Malicious agent detection for robust multi-agent collaborative perception. In2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 13817–13823. IEEE

2024
[41]

Zhuge, M., Wang, W., Kirsch, L., Faccio, F., Khizbullin, D., and Schmidhuber, J. (2024). Gptswarm: Language agents as optimizable graphs. InForty-first International Conference on Machine Learning. 19 A Common Framework for Graph-Based Anomaly Detection on LLM-based Multi-Agent Systems A Test results for MMLU and CSQA Dataset Method Topology ASR(↓)UnFlagA...

2024

[1] [1]

Anthropic PBC

Anthropic (2024a).Model Context Protocol (MCP) Specification. Anthropic PBC. Accessed: 2026-01-20. Anthropic (2024b). Multi-agent research system.https://www.anthropic.com/research

2026

[2] [2]

Oprea, A. (2024). Phantom: General trigger attacks on retrieval augmented language generation.arXiv preprint arXiv:2405.20485

work page arXiv 2024

[3] [3]

Oprea, A. (2025). Phantom: General backdoor attacks on retrieval augmented language generation

2025

[4] [4]

Chen, W., You, Z., Li, R., Guan, Y ., Qian, C., Zhao, C., Yang, C., Xie, R., Liu, Z., and Sun, M. (2024). Internet of agents: Weaving a web of heterogeneous agents for collaborative intelligence

2024

[5] [5]

Hesse, C., and Schulman, J. (2021). Training verifiers to solve math word problems

2021

[6] [6]

K., and Kumar, S

Ehtesham, A., Singh, A., Gupta, G. K., and Kumar, S. (2025). A survey of agent interoperability protocols: Model context protocol (mcp), agent communication protocol (acp), agent-to-agent protocol (a2a), and agent network protocol (anp)

2025

[7] [7]

Gu, X., Zheng, X., Pang, T., Du, C., Liu, Q., Wang, Y ., Jiang, J., and Lin, M. (2024). Agent smith: A single image can jailbreak one million multimodal llm agents exponentially fast

2024

[8] [8]

He, P., Dai, Z., Tang, X., Xing, Y ., Liu, H., Zeng, J., Peng, Q., Agrawal, S., Varshney, S., Wang, S., et al. (2025a). Attention knows whom to trust: Attention-based trust management for llm multi-agent systems.arXiv preprint arXiv:2506.02546

work page internal anchor Pith review Pith/arXiv arXiv

[9] [9]

He, P., Lin, Y ., Dong, S., Xu, H., Xing, Y ., and Liu, H. (2025c). Red-teaming llm multi-agent systems via communication attacks. InFindings of the Association for Computational Linguistics: ACL 2025, pages 6726–6747. Association for Computational Linguistics

2025

[10] [10]

Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., and Steinhardt, J. (2021). Measuring massive multitask language understanding

2021

[11]

Inan, H., Upasani, K., Chi, J., Rungta, R., Iyer, K., Mao, Y., Tontchev, M., Hu, Q., Fuller, B., Testuggine, D., et al. (2023). Llama guard: Llm-based input-output safeguard for human-ai conversations.

[12]

Ju, T., Wang, Y., Ma, X., Cheng, P., Zhao, H., Wang, Y., Liu, L., Xie, J., Zhang, Z., and Liu, G. (2024). Flooding spread of manipulated knowledge in llm-based multi-agent communities. arXiv preprint arXiv:2407.07791.

[13]

Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., Gasser, U., Groh, G., Günnemann, S., Hüllermeier, E., et al. (2023). Chatgpt for good? on opportunities and challenges of large language models for education. Learning and individual differences, 103:102274.

[14]

Kwon, W., Li, Z., Zhuang, S., Sheng, Y., Zheng, L., Yu, C. H., Gonzalez, J. E., Zhang, H., and Stoica, I. (2023). Efficient memory management for large language model serving with pagedattention.

[15]

Lee, D. and Tiwari, M. (2024). Prompt infection: Llm-to-llm prompt injection within multi-agent systems.

[16]

Li, G., Hammoud, H. A. A. K., Itani, H., Khizbullin, D., and Ghanem, B. (2023). Camel: Communicative agents for "mind" exploration of large language model society. In Advances in Neural Information Processing Systems.

[17]

Mialon, G., Fourrier, C., Wolf, T., LeCun, Y., and Scialom, T. (2023). Gaia: a benchmark for general ai assistants. In The Twelfth International Conference on Learning Representations.

[18]

Miao, R., Liu, Y., Wang, Y., Shen, X., Tan, Y., Dai, Y., Pan, S., and Wang, X. (2025). Blindguard: Safeguarding llm-based multi-agent systems under unknown attacks.

Microsoft (2023). Microsoft copilot. https://www.microsoft.com/en-us/microsoft-copilot.

OpenAI (2025). gpt-oss-120b & gpt-oss-20b model card.

[19]

Pan, J., Liu, Y., Miao, R., Ding, K., Zheng, Y., Nguyen, Q. V. H., Liew, A. W.-C., and Pan, S. (2025). Explainable and fine-grained safeguarding of llm multi-agent systems via bi-level graph anomaly detection.

[20]

Qian, C., Cong, X., Yang, C., Chen, W., Su, Y., Xu, J., Liu, Z., and Sun, M. (2023). Communicative agents for software development.

[21]

Qian, C., Xie, Z., Wang, Y., Liu, W., Zhu, K., Xia, H., Dang, Y., Du, Z., Chen, W., Yang, C., Liu, Z., and Sun, M. (2025). Scaling large language model-based multi-agent collaboration. In The Thirteenth International Conference on Learning Representations.

[22]

Rebedea, T., Dinu, R., Sreedhar, M. N., Parisien, C., and Cohen, J. (2023). Nemo guardrails: A toolkit for controllable and safe llm applications with programmable rails. In Proceedings of the 2023 conference on empirical methods in natural language processing: system demonstrations, pages 431–445.

[23]

Talebirad, Y. and Nadiri, A. (2023). Multi-agent collaboration: Harnessing the power of intelligent llm agents.

[24]

Talmor, A., Herzig, J., Lourie, N., and Berant, J. (2019). Commonsenseqa: A question answering challenge targeting commonsense knowledge. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4143–4158.

[25]

Wang, S., Zhang, G., Yu, M., Wan, G., Meng, F., Guo, C., Wang, K., and Wang, Y. (2025). G-safeguard: A topology-guided security lens and treatment on llm-based multi-agent systems. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics.

[26]

Wang, K., Zhuang, A., Fan, R., Yue, X., and Chen, W. (2024). Mmlu-pro: A more robust and challenging multi-task language understanding benchmark. In Advances in Neural Information Processing Systems.

[27]

Wu, Q., Bansal, G., Zhang, J., Wu, Y., Li, B., Zhu, E., Jiang, L., Zhang, X., Zhang, S., Liu, J., Awadallah, A. H., White, R. W., Burger, D., and Wang, C. (2023). Autogen: Enabling next-gen llm applications via multi-agent conversation.

[28]

Xi, Z., Chen, W., Guo, X., He, W., Ding, Y., Zhang, B., Liao, Y., Shang, C., Cui, J., Xu, Y., Wen, X., Zheng, T., Zhou, W., Zhao, H., Gui, T., Zhang, Q., and Huang, X. (2025). The rise and potential of large language model based agents: A survey. Science China Information Sciences, 68(1):121101.

[29]

Xiang, Z., Zheng, L., Li, Y., Hong, J., Li, Q., Xie, H., Zhang, J., Xiong, Z., Xie, C., Bastian, N. D., et al. (2025). Guardagent: safeguard llm agents via knowledge-enabled reasoning. In ICML 2025 workshop on computer use agents.

[30]

Xie, Y., Zhu, C., Zhang, X., Zhu, T., Ye, D., Wang, M., and Liu, C. (2025). Who’s the mole? modeling and detecting intention-hiding malicious agents in llm-based multi-agent systems.

[31]

Yan, B., Zhou, Z., Zhang, X., Li, C., Zeng, R., Qi, Y., Wang, T., and Zhang, L. (2025). Attack the messages, not the agents: A multi-round adaptive stealthy tampering framework for llm-mas. arXiv preprint arXiv:2508.03125.

[32]

Yang, R., Tan, T. F., Lu, W., Thirunavukarasu, A. J., Ting, D. S. W., and Liu, N. (2023). Large language models in health care: Development, applications, and challenges. Health Care Science, 2(4):255–263.

[33]

Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K. R., and Cao, Y. (2022). React: Synergizing reasoning and acting in language models. In The eleventh international conference on learning representations.

[34]

Yu, M., Meng, F., Zhou, X., Wang, S., Mao, J., Pan, L., Chen, T., Wang, K., Li, X., Zhang, Y., et al. (2025). A survey on trustworthy llm agents: Threats and countermeasures. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, pages 6216–6226.

[35]

Yu, M., Wang, S., Zhang, G., Mao, J., Yin, C., Liu, Q., Wen, Q., Wang, K., and Wang, Y. (2024). NetSafe: Exploring the Topological Safety of Multi-agent Networks.

[36]

Zeng, Y., Wu, Y., Zhang, X., Wang, H., and Wu, Q. (2024). Autodefense: Multi-agent llm defense against jailbreak attacks. arXiv preprint arXiv:2403.04783.

[37]

Zhang, B., Tan, Y., Shen, Y., Salem, A., Backes, M., Zannettou, S., and Zhang, Y. (2025a). Breaking agents: Compromising autonomous llm agents through malfunction amplification. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 34952–34964.

[38]

Zhang, R., Wang, H., Wang, J., Li, M., Huang, Y., Wang, D., and Wang, Q. (2025b). From allies to adversaries: Manipulating llm tool-calling through adversarial injection. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages ...

[39]

Zhang, R., Wang, H., Wang, J., Li, M., Huang, Y., Wang, D., and Wang, Q. (2025c). From allies to adversaries: Manipulating llm tool-calling through adversarial injection. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), page 2...

[40]

Zhao, Y., Xiang, Z., Yin, S., Pang, X., Wang, Y., and Chen, S. (2024). Made: Malicious agent detection for robust multi-agent collaborative perception. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 13817–13823. IEEE.

[41]

Zhuge, M., Wang, W., Kirsch, L., Faccio, F., Khizbullin, D., and Schmidhuber, J. (2024). Gptswarm: Language agents as optimizable graphs. In Forty-first International Conference on Machine Learning.

A Test results for MMLU and CSQA

Dataset Method Topology ASR(↓)UnFlagA...