BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks

Rui Miao; Shirui Pan; Xin Wang; Xu Shen; Yili Wang; Yiwei Dai; Yixin Liu; Yue Tan

arxiv: 2508.08127 · v2 · submitted 2025-08-11 · 💻 cs.AI

Pith reviewed 2026-05-18 23:34 UTC · model grok-4.3

classification 💻 cs.AI

keywords multi-agent LLM systems · unsupervised defense · prompt injection · memory poisoning · tool attack · contrastive learning · hierarchical encoder · attack detection

The pith

An unsupervised method can safeguard multi-agent LLM systems against unknown attacks by learning solely from normal agent interactions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The security of systems where multiple large language models collaborate is at risk because a single compromised agent can influence the entire group's decisions through shared messages. Traditional defenses require examples of attacks to train detectors, but new or unknown attacks make this approach unworkable in practice. The paper introduces an unsupervised alternative that builds a model using only data from safe operations. It does this with a layered encoder that examines each agent's own actions, its neighbors' messages, and the overall system state, plus a training process that corrupts normal data and uses contrast to learn distinctions. Experiments indicate this setup identifies several kinds of attacks in different system layouts more reliably than methods that depend on attack labels.

Core claim

The paper claims that a hierarchical agent encoder, which captures individual, neighborhood, and global interaction patterns, combined with a corruption-guided detector built from directional noise injection and contrastive learning, allows a detection model to be trained solely on normal agent behaviors. The resulting model is claimed to detect malicious agents across diverse attack types and communication patterns, and to generalize better than supervised baselines.

What carries the argument

The hierarchical agent encoder and corruption-guided detector with directional noise injection and contrastive learning that together enable learning a detection model from normal behaviors alone.
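The three-level view can be illustrated with a minimal sketch. The page does not describe the paper's actual encoder, so everything here is an assumption made for illustration: the function names, the use of plain mean pooling for the neighborhood and global views, and the toy feature vectors are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def hierarchical_embedding(
    features: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
) -> Dict[str, Vector]:
    """Concatenate individual, neighborhood, and global views per agent.

    Assumption: mean pooling stands in for whatever aggregation the
    paper actually uses.
    """
    global_view = mean_vector(list(features.values()))
    embeddings: Dict[str, Vector] = {}
    for agent, own in features.items():
        nbrs = neighbors.get(agent, [])
        # An agent with no neighbors falls back to its own features.
        nbr_view = mean_vector([features[n] for n in nbrs]) if nbrs else list(own)
        embeddings[agent] = list(own) + nbr_view + global_view
    return embeddings


# Toy features for three agents (hypothetical values).
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
embeddings = hierarchical_embedding(features, neighbors)
```

Each agent ends up with a vector three times the size of its own features, so a downstream detector can compare what the agent says with what its neighbors and the whole system say.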

If this is right

Detects prompt injection, memory poisoning, and tool attacks in multi-agent LLM systems.
Applies effectively to systems with various communication patterns.
Provides better generalizability than supervised detection methods.
Functions without any attack-specific labels or prior knowledge.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This method could enable ongoing security in evolving threat environments without the need to gather new labeled attack data for each emerging threat.
The contrastive approach on corrupted normal data may generalize to anomaly detection in other collaborative or distributed AI setups.

Load-bearing premise

Patterns learned from normal agent behaviors using directional noise injection and contrastive learning will reliably separate unknown malicious behaviors without exposure to attack examples.
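A rough sketch of what "directional noise injection plus contrastive learning" could look like, under stated assumptions. The page gives no formulas, so the InfoNCE-style loss, the temperature value, the noise scale, and all function names below are hypothetical stand-ins rather than the paper's method.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def cosine(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


def inject_directional_noise(v: Vector, direction: Vector, scale: float) -> List[float]:
    """Shift v by `scale` along the unit-normalised `direction`."""
    length = norm(direction)
    return [x + scale * d / length for x, d in zip(v, direction)]


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector],
             temperature: float = 0.5) -> float:
    """InfoNCE-style loss: small when the anchor is closer to the
    positive than to the negatives (assumed loss form)."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical embeddings: a normal agent and its (normal) context.
normal = [1.0, 0.2, 0.1]
context = [0.9, 0.3, 0.1]
# Corrupted copy of the normal embedding, used as a pseudo-anomaly.
corrupted = inject_directional_noise(normal, direction=[-1.0, 1.0, 0.0], scale=2.0)

loss_good = info_nce(context, normal, [corrupted])  # normal treated as positive
loss_bad = info_nce(context, corrupted, [normal])   # roles swapped
```

The premise above amounts to betting that real attacks land near the corrupted samples. If they do, minimising a loss like `loss_good` on clean data alone yields a usable boundary; if they do not, the boundary may miss them.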

What would settle it

The claim would be undermined if a new attack type or communication pattern, absent from the normal training data, caused the detector to identify malicious agents no better than chance or no better than supervised alternatives.
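"Better than chance" can be made concrete with the AUC, a metric the paper's Figure 6 reports. The sketch below computes it as the probability that a randomly chosen malicious agent receives a higher anomaly score than a randomly chosen normal one; 0.5 corresponds to chance. The scores are made-up placeholders, not results from the paper.

```python
from typing import Sequence


def auc(malicious_scores: Sequence[float], normal_scores: Sequence[float]) -> float:
    """Pairwise AUC: fraction of (malicious, normal) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for m in malicious_scores:
        for n in normal_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(malicious_scores) * len(normal_scores))


# Placeholder anomaly scores (hypothetical, for illustration only).
malicious = [0.9, 0.4]
normal = [0.1, 0.3, 0.5]
score = auc(malicious, normal)  # 5 of 6 pairs are ranked correctly
```

A detector whose AUC on a previously unseen attack stays near 0.5 would be the failure case described above.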

Figures

Figures reproduced from arXiv: 2508.08127 by Rui Miao, Shirui Pan, Xin Wang, Xu Shen, Yili Wang, Yiwei Dai, Yixin Liu, Yue Tan.

**Figure 2.** The designing workflow of our proposed BlindGuard.

**Figure 3.** The overall performance of MAS on the CSQA

**Figure 4.** ASR@3 with DeepSeek-V3 and Qwen3-30B-A3B as backbone LLMs on the CSQA and PoisonRAG datasets.

**Figure 5.** Ablation study on PoisonRAG. NL and GL denote

**Figure 6.** The AUC with DeepSeek-V3 and Qwen3-30B-A3B as backbone LLMs on the CSQA and PoisonRAG datasets.

**Figure 7.** The overall performance of MAS on the CSQA

Original abstract

The security of LLM-based multi-agent systems (MAS) is critically threatened by propagation vulnerability, where malicious agents can distort collective decision-making through inter-agent message interactions. While existing supervised defense methods demonstrate promising performance, they may be impractical in real-world scenarios due to their heavy reliance on labeled malicious agents to train a supervised malicious detection model. To enable practical and generalizable MAS defenses, in this paper, we propose BlindGuard, an unsupervised defense method that learns without requiring any attack-specific labels or prior knowledge of malicious behaviors. To this end, we establish a hierarchical agent encoder to capture individual, neighborhood, and global interaction patterns of each agent, providing a comprehensive understanding for malicious agent detection. Meanwhile, we design a corruption-guided detector that consists of directional noise injection and contrastive learning, allowing effective detection model training solely on normal agent behaviors. Extensive experiments show that BlindGuard effectively detects diverse attack types (i.e., prompt injection, memory poisoning, and tool attack) across MAS with various communication patterns while maintaining superior generalizability compared to supervised baselines. The code is available at: https://github.com/MR9812/BlindGuard.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

BlindGuard gives a practical unsupervised route to MAS attack detection by training only on normal behaviors, but the noise-based separation needs real verification against actual attacks.

The main takeaway is that this paper offers an unsupervised defense for LLM multi-agent systems that avoids the labeled attack data problem. They build a hierarchical encoder to model each agent at individual, neighborhood, and global levels, then train a detector with directional noise injection and contrastive learning on clean behaviors only. That framing is new for this security setting, where prior work cited in the abstract depends on supervised labels that are often unavailable at deployment. The code release helps anyone who wants to reproduce or extend it. Experiments are said to cover prompt injection, memory poisoning, and tool attacks across different communication patterns, with claims of better generalizability than supervised baselines. That addresses a clear practical gap.

The soft spot is the core assumption that injected directional noise will reliably push embeddings toward the directions real attacks take. If an attack alters message patterns in ways that sit outside those corrupted directions, the contrastive boundary trained only on normals could miss it. The abstract reports strong results but gives little on exact baseline implementations, statistical tests, or ablation details, so those performance numbers need checking before the generalizability claim lands.

This work is aimed at people building or hardening multi-agent LLM applications where attack data is scarce or changing. A reader focused on practical AI safety would get usable ideas from the method and the released code. It deserves peer review because the unsupervised angle is timely and the setup is described clearly enough to evaluate, even if the experiments will require closer scrutiny on controls and edge cases.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes BlindGuard, an unsupervised defense for LLM-based multi-agent systems (MAS) against unknown attacks including prompt injection, memory poisoning, and tool attacks. It introduces a hierarchical agent encoder capturing individual, neighborhood, and global interaction patterns, paired with a corruption-guided detector that applies directional noise injection and contrastive learning trained solely on normal agent behaviors. Experiments are reported to show effective detection across MAS with varied communication patterns and superior generalizability relative to supervised baselines, with code released.

Significance. If the central results hold, the work offers a practical unsupervised alternative to label-dependent supervised defenses, addressing propagation vulnerabilities in MAS where attack-specific data is unavailable. The emphasis on generalizability and the public code release are positive for reproducibility and real-world applicability.

major comments (2)

[corruption-guided detector and hierarchical encoder sections] The soundness of the central claim rests on the corruption-guided detector (directional noise injection + contrastive learning) producing a decision boundary that separates real unknown attacks. This requires that attack-induced deviations in the hierarchical embeddings align with the directions and magnitudes of the injected noise; if attacks produce orthogonal or higher-order changes in inter-agent patterns, the unsupervised objective trained only on corrupted normals will not generalize. The manuscript provides no explicit analysis (e.g., embedding visualizations, cosine similarity distributions, or ablation on noise directionality) to verify this alignment for the three attack types.
[Experiments section] The abstract and experimental claims of 'superior generalizability' and effective detection across communication patterns require stronger statistical grounding. Details on the number of independent runs, variance reporting, significance tests against baselines, and whether baselines were re-implemented with the same hierarchical encoder are needed to rule out confounding from hyperparameter tuning or dataset-specific effects.
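The alignment question in the first major comment can be stated as a small check: take the deviation an attack induces in an agent's embedding and measure its cosine similarity with the injected noise direction. The sketch below is illustrative only; the vectors are hypothetical and the function names are not from the paper or its code release.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def alignment(normal: Vector, attacked: Vector, noise_direction: Vector) -> float:
    """Cosine between the attack-induced deviation and the noise direction."""
    deviation = [a - n for a, n in zip(attacked, normal)]
    return cosine(deviation, noise_direction)


# Hypothetical embeddings for one agent before and after an attack.
normal = [1.0, 0.0, 0.0]
noise_direction = [0.0, 1.0, 0.0]
aligned_attack = [1.0, 0.8, 0.0]      # deviates along the noise direction
orthogonal_attack = [1.0, 0.0, 0.8]   # deviates in a direction the noise never covers

aligned = alignment(normal, aligned_attack, noise_direction)
orthogonal = alignment(normal, orthogonal_attack, noise_direction)
```

A distribution of such values concentrated near zero for some attack type would support the referee's concern that the corrupted samples do not cover that attack.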

minor comments (2)

[Abstract] The abstract states results for 'MAS with various communication patterns' but does not enumerate or define those patterns; adding a brief taxonomy or reference to the experimental setup would improve clarity.
[Method] Notation for the hierarchical representations (individual/neighborhood/global) should be introduced with explicit equations or a diagram early in the method section to aid readers in following the contrastive loss formulation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the presentation of our work. We address each major comment below.

Referee: [corruption-guided detector and hierarchical encoder sections] The soundness of the central claim rests on the corruption-guided detector (directional noise injection + contrastive learning) producing a decision boundary that separates real unknown attacks. This requires that attack-induced deviations in the hierarchical embeddings align with the directions and magnitudes of the injected noise; if attacks produce orthogonal or higher-order changes in inter-agent patterns, the unsupervised objective trained only on corrupted normals will not generalize. The manuscript provides no explicit analysis (e.g., embedding visualizations, cosine similarity distributions, or ablation on noise directionality) to verify this alignment for the three attack types.

Authors: We agree that explicit verification of the alignment between attack-induced embedding deviations and the injected noise directions would strengthen the central claim. Although the manuscript demonstrates effective detection across prompt injection, memory poisoning, and tool attacks through extensive experiments, it does not include supporting analyses such as visualizations or ablations. In the revised manuscript we will add t-SNE visualizations of hierarchical embeddings for normal versus attacked agents, cosine similarity distributions comparing corrupted normal samples to real attacks, and an ablation on noise directionality to show that the contrastive objective captures the relevant deviations for all three attack types. revision: yes
Referee: [Experiments section] The abstract and experimental claims of 'superior generalizability' and effective detection across communication patterns require stronger statistical grounding. Details on the number of independent runs, variance reporting, significance tests against baselines, and whether baselines were re-implemented with the same hierarchical encoder are needed to rule out confounding from hyperparameter tuning or dataset-specific effects.

Authors: We appreciate the call for stronger statistical grounding. Our original experiments used 5 independent runs per setting with results reported as means, but we did not include formal significance tests or explicit clarification on baseline re-implementations. In the revision we will report the exact number of runs and variances, add paired t-tests against baselines to establish statistical significance, and confirm that supervised baselines were re-implemented with the same hierarchical encoder to ensure fair comparison and rule out confounding from hyperparameter or dataset effects. revision: yes
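The paired t-test promised in the second response can be sketched with the standard library alone. The per-run scores below are invented placeholders chosen to illustrate the computation over 5 runs; they are not numbers from the paper.

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(method: Sequence[float], baseline: Sequence[float]) -> float:
    """t statistic for paired samples: mean(d) / (stdev(d) / sqrt(n))."""
    if len(method) != len(baseline) or len(method) < 2:
        raise ValueError("need two equally sized samples with at least 2 runs")
    diffs = [m - b for m, b in zip(method, baseline)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))


# Placeholder AUC values for 5 paired runs (hypothetical, not from the paper).
method_runs = [0.90, 0.92, 0.91, 0.93, 0.89]
baseline_runs = [0.88, 0.89, 0.90, 0.90, 0.88]
t_stat = paired_t_statistic(method_runs, baseline_runs)
```

With 5 runs there are 4 degrees of freedom, for which the two-sided 5% critical value of Student's t is about 2.776. Pairing only makes sense if run i of both methods shares the same seed, data split, and topology.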

Circularity Check

0 steps flagged

No circularity: unsupervised contrastive pipeline is self-contained and does not reduce to fitted inputs or self-citations

The paper presents BlindGuard as an unsupervised defense that trains a hierarchical agent encoder on normal behaviors and applies a corruption-guided detector consisting of directional noise injection plus contrastive learning. No equations are shown that define the detection decision boundary or anomaly score as a direct function of parameters fitted from attack data; the method is described as learning representations exclusively from normal agent interactions augmented with noise, then flagging deviations at inference time. The central claim of generalizability to unknown attacks (prompt injection, memory poisoning, tool attacks) is framed as an empirical outcome verified through experiments across communication patterns, not as a mathematical identity or self-referential fit. Any self-citations to prior supervised work are peripheral and not invoked as a uniqueness theorem or load-bearing justification for the unsupervised premise. The derivation chain therefore remains independent of the target malicious behaviors and does not collapse by construction.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard unsupervised contrastive learning assumptions plus the modeling choice that directional noise on normal trajectories will proxy unknown attacks. No new physical or mathematical entities are introduced.

free parameters (1)

noise injection parameters
Directional noise scale and contrastive temperature are chosen to shape the detector; exact values are not stated in the abstract.

axioms (1)

domain assumption: Normal agent interaction patterns contain sufficient statistical structure to separate unknown malicious deviations via contrastive learning
Invoked in the description of the corruption-guided detector training on normal behaviors only.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking

Relation between the paper passage and the cited Recognition theorem: unclear

Paper passage: "hierarchical agent encoder to capture individual, neighborhood, and global interaction patterns"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

CASPIAN: Online Detection and Attribution of Cascade Attacks in LLM Multi-Agent Systems via Cross-Channel Causal Monitoring
cs.MA 2026-05 unverdicted novelty 7.0

CASPIAN introduces unified cross-channel causal monitoring via late-interaction conditional transfer entropy to detect cascade onset and attribute origin, bridge, and amplifier agents in LLM multi-agent systems.

FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems
cs.CR 2026-05 unverdicted novelty 7.0

FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.

PropGuard: Safeguarding LLM-MAS via Propagation-Aware Exploration and Remediation
cs.LG 2026-05 unverdicted novelty 7.0

PropGuard is a propagation-aware framework for LLM-MAS that constructs dual-view spatio-temporal graphs, employs a GE-GRPO inspector to recover suspicious subgraphs, and applies source-guided remediation to lower atta...

When Embedding-Based Defenses Fail: Rethinking Safety in LLM-Based Multi-Agent Systems
cs.CR 2026-05 unverdicted novelty 6.0

Embedding-based defenses fail against attacks that align malicious message embeddings with benign ones in LLM multi-agent systems, but token-level confidence scores improve robustness by enabling better pruning of sus...

