A survey on trustworthy llm agents: Threats and countermeasures

Miao Yu, Fanci Meng, Xinyun Zhou, Shilong Wang, Junyuan Mao, Linsey Pang, Tianlong Chen, Kun Wang, Xinfeng Li, Yongfeng Zhang, Bo An, Qingsong Wen · 2025 · arXiv 2503.09648

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents

cs.CY · 2026-04-11 · accept · novelty 8.0

This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks, finding broad but shallow risk coverage, no ranking concordance across evaluations, and that benchmark choice systematically alters reported safety.

Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions

cs.CR · 2025-03-30 · unverdicted · novelty 7.0

MCP lifecycle is defined with four phases and 16 activities; a threat taxonomy of 16 scenarios is constructed, validated via case studies, and paired with phase-specific safeguards.

Capability Advertisement as a Market for Lemons: A Trust Layer for Heterogeneous Agent Networks

cs.MA · 2026-06-02 · unverdicted · novelty 6.0

Proposes a Trust Layer on top of existing agent protocols that adds probabilistic capability descriptors, screening, and reputation to enable separating equilibria and bound delegation reliability.

CHAL: Council of Hierarchical Agentic Language

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

CHAL is a multi-agent dialectic system that performs structured belief optimization over defeasible domains using Bayesian-inspired graph representations and configurable meta-cognitive value system hyperparameters.

Unsafe by Flow: Uncovering Bidirectional Data-Flow Risks in MCP Ecosystem

cs.SE · 2026-05-08 · unverdicted · novelty 6.0

MCP-BiFlow detects 93.8% of known bidirectional data-flow vulnerabilities in MCP servers and identifies 118 confirmed issues across 87 real-world servers from a scan of 15,452 repositories.

BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks

cs.AI · 2025-08-11 · unverdicted · novelty 6.0

BlindGuard introduces an unsupervised hierarchical agent encoder plus corruption-guided contrastive detector that identifies malicious agents in LLM-based multi-agent systems without any attack labels or prior knowledge of malicious behaviors.

Safety in Self-Evolving LLM Agent Systems: Threats, Amplification, and Case Studies

cs.CR · 2026-06-22 · unverdicted · novelty 5.0

Self-evolving LLM agents introduce persistent, amplifying security threats that static defenses cannot address, as shown by analysis of 25 attack surface cells and case studies.

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

cs.AI · 2025-03-31 · unverdicted · novelty 2.0

This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.

citing papers explorer

Showing 7 of 7 citing papers after filters.

Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions cs.CR · 2025-03-30 · unverdicted · none · ref 78
MCP lifecycle is defined with four phases and 16 activities; a threat taxonomy of 16 scenarios is constructed, validated via case studies, and paired with phase-specific safeguards.
Capability Advertisement as a Market for Lemons: A Trust Layer for Heterogeneous Agent Networks cs.MA · 2026-06-02 · unverdicted · none · ref 27
Proposes a Trust Layer on top of existing agent protocols that adds probabilistic capability descriptors, screening, and reputation to enable separating equilibria and bound delegation reliability.
CHAL: Council of Hierarchical Agentic Language cs.AI · 2026-05-12 · unverdicted · none · ref 178
CHAL is a multi-agent dialectic system that performs structured belief optimization over defeasible domains using Bayesian-inspired graph representations and configurable meta-cognitive value system hyperparameters.
Unsafe by Flow: Uncovering Bidirectional Data-Flow Risks in MCP Ecosystem cs.SE · 2026-05-08 · unverdicted · none · ref 64
MCP-BiFlow detects 93.8% of known bidirectional data-flow vulnerabilities in MCP servers and identifies 118 confirmed issues across 87 real-world servers from a scan of 15,452 repositories.
BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks cs.AI · 2025-08-11 · unverdicted · none · ref 23
BlindGuard introduces an unsupervised hierarchical agent encoder plus corruption-guided contrastive detector that identifies malicious agents in LLM-based multi-agent systems without any attack labels or prior knowledge of malicious behaviors.
Safety in Self-Evolving LLM Agent Systems: Threats, Amplification, and Case Studies cs.CR · 2026-06-22 · unverdicted · none · ref 101
Self-evolving LLM agents introduce persistent, amplifying security threats that static defenses cannot address, as shown by analysis of 25 attack surface cells and case studies.
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems cs.AI · 2025-03-31 · unverdicted · none · ref 68
This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.

A survey on trustworthy llm agents: Threats and countermeasures

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer