Architecture Matters for Multi-Agent Security
Pith reviewed 2026-05-08 06:47 UTC · model grok-4.3
The pith
Multi-agent AI architectures create attack vulnerabilities not present in single-agent systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across three agentic environments and thirteen architectural configurations, multi-agent systems are more vulnerable than standalone agents in the majority of cases, with attack success rates varying by up to 3.8x at comparable or higher benign accuracy, and no single design is universally safer.
What carries the argument
Stagewise attack evaluations that track planning refusal, execution interception, partial harmful execution, and successful completion, applied while varying agent roles, communication topology, and memory.
If this is right
- Security evaluations of AI agents must include full system architecture rather than testing agents in isolation.
- Attack resistance depends on specific choices for roles, topology, and memory, not just individual agent strength.
- Developers cannot rely on any one multi-agent design as safe across different tasks or threats.
- Further evaluations are needed that move beyond single-agent security properties.
Where Pith is reading between the lines
- Production systems using multiple agents may require architecture-specific testing before deployment to avoid hidden attack surfaces.
- The results raise the question of whether adding more agents tends to widen the security gap unless topology and memory are deliberately restricted.
- Similar stagewise measurements could be applied to other coordination mechanisms, such as tool access or external memory stores, to map additional risks.
Load-bearing premise
The thirteen tested configurations and the attack scenarios chosen for the three environments stand in for the full range of security-relevant designs and real-world threats that multi-agent systems will encounter.
What would settle it
An experiment that applies the same stagewise attack measurements to a broader set of environments or attack methods and finds that single-agent systems are not less vulnerable overall, or that architecture choices no longer produce large differences in success rates, would falsify the central claim.
Figures
read the original abstract
Multi-agent systems (MAS), composed of networks of two or more autonomous AI agents, have become increasingly popular in production deployments, yet introduce security risks that do not arise in single-agent settings. Even if individual agents exhibit robust security, architectural decisions governing their coordination can create attack surfaces that have not been systematically characterized. In this work, we present an empirical study of how MAS design decisions shape the tradeoff between task performance and attack resistance. Across three agentic environments (browser, desktop, and code) and 13 architectural configurations, we use stagewise evaluations that distinguish planning refusal, execution-stage interception, partial harmful execution, and successful attack completion to study three key design choices: (i) agent roles, which determine how authority and responsibility are allocated; (ii) communication topology, which shapes how and when agents interact; and (iii) memory, which determines the context and state visibility accessible to each agent. We find that multi-agent architectures are more vulnerable than standalone agents in the majority of configurations, with attack success rates varying by up to 3.8x at comparable or higher benign accuracy, and that no single design is universally safer. These results motivate the development of further evaluations that move beyond the security properties of a single agent.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper conducts an empirical study of security in multi-agent systems (MAS) versus single-agent baselines across three environments (browser, desktop, code) and 13 architectural configurations that vary agent roles, communication topology, and memory. Using stagewise metrics that separate planning refusal, execution interception, partial harmful execution, and full attack completion, the authors report that MAS designs are more vulnerable than standalone agents in the majority of tested configurations, with attack success rates (ASR) differing by up to 3.8x while maintaining comparable or higher benign accuracy, and that no single architecture is universally safer.
Significance. If the empirical scope is representative, the work is significant for establishing that architectural choices in MAS create distinct attack surfaces not captured by single-agent evaluations, thereby motivating architecture-aware security assessments. The use of multiple environments and stagewise metrics strengthens internal validity and provides concrete, falsifiable measurements of the performance-security tradeoff. The absence of machine-checked proofs or parameter-free derivations is expected for an empirical study, but the reproducible experimental design across 13 configs is a strength.
major comments (2)
- [Experimental design] Experimental design section: The claim that MAS are more vulnerable 'in the majority of configurations' and that ASR varies by up to 3.8x is load-bearing for the central thesis, yet the manuscript provides no explicit coverage argument or sampling rationale for the 13 architectural variants (roles, topology, memory) or the chosen attack scenarios. Without this, it is unclear whether the majority finding and variation factor would persist under other common MAS patterns such as hierarchical planning or external tool integration.
- [Results] Results and statistical analysis: The 3.8x ASR variation and 'majority' vulnerability statements require supporting details on trial counts, confidence intervals, exact baseline comparisons, and controls for post-hoc configuration selection; these are not fully reported, which directly affects verifiability of the quantitative claims.
minor comments (2)
- [Evaluation methodology] The stagewise evaluation framework is clearly described but would benefit from an explicit diagram or table mapping each stage to the corresponding success/failure criteria.
- [Attack scenarios] Some environment-specific attack descriptions could be expanded with pseudocode or example prompts to improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our empirical study of multi-agent system security. We address each major comment below and will incorporate revisions to improve the clarity and verifiability of our claims.
read point-by-point responses
-
Referee: [Experimental design] Experimental design section: The claim that MAS are more vulnerable 'in the majority of configurations' and that ASR varies by up to 3.8x is load-bearing for the central thesis, yet the manuscript provides no explicit coverage argument or sampling rationale for the 13 architectural variants (roles, topology, memory) or the chosen attack scenarios. Without this, it is unclear whether the majority finding and variation factor would persist under other common MAS patterns such as hierarchical planning or external tool integration.
Authors: We selected the 13 configurations to systematically vary the three core dimensions (roles, topology, and memory) across representative patterns drawn from existing MAS literature and production systems, including specialist/generalist role allocations, star/chain/fully-connected topologies, and shared/private memory setups. However, we acknowledge that an explicit coverage argument and sampling rationale were not detailed in the manuscript. In the revision, we will add a dedicated subsection to the Experimental Design section that justifies the choice of variants, maps them to common MAS patterns (including why hierarchical planning and external tool integration are partially covered via our role and topology variations), and discusses the attack scenarios as standard benchmarks from each environment. This will clarify the scope and support the generalizability of the majority-vulnerability and 3.8x variation findings. revision: yes
-
Referee: [Results] Results and statistical analysis: The 3.8x ASR variation and 'majority' vulnerability statements require supporting details on trial counts, confidence intervals, exact baseline comparisons, and controls for post-hoc configuration selection; these are not fully reported, which directly affects verifiability of the quantitative claims.
Authors: We agree that fuller statistical reporting is needed to support the quantitative claims. Our experiments used 100 independent trials per configuration-environment pair, with single-agent baselines run under identical conditions. In the revised manuscript, we will expand the Results section and add a supplementary table to report: (i) exact trial counts, (ii) bootstrap 95% confidence intervals for all ASR values, (iii) direct per-configuration comparisons to the single-agent baselines, and (iv) confirmation that the 13 configurations were pre-specified based on the three design axes rather than selected post-hoc. These additions will make the 3.8x variation and majority-vulnerability statements fully verifiable. revision: yes
Circularity Check
No circularity: purely empirical measurements of attack rates
full rationale
The paper conducts a direct empirical evaluation across 13 fixed architectural configurations and three environments, reporting measured attack success rates and benign accuracies as experimental outcomes. No equations, fitted parameters, predictions derived from prior author-defined quantities, or load-bearing self-citations appear in the derivation of the central claims. Results are obtained by running the described stagewise evaluations on the chosen setups; they do not reduce to any input by construction or via self-referential justification.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The stagewise attack scenarios used represent realistic threats that would arise in deployed multi-agent systems.
Reference graph
Works this paper leans on
-
[1]
NeurIPS 2024 Workshop on Safe Generative AI. Motwani, S. R., Baranchuk, M., Strohmeier, M., Bolina, V ., Torr, P. H. S., Hammond, L., and de Witt, C. S. Secret collusion among ai agents: Multi-agent deception via steganography, 2025. URL https://arxiv.org/ abs/2402.07510. Nguyen, T., Ndebugre, M., and Arremsetty, D. Security considerations for multi-agent...
-
[2]
Local” exposes the agent’s own past reasoning; “Shared
URL https://arxiv.org/abs/2505.0 2077. Shang, Z. and Wei, W. Evolving security in llms: A study of jailbreak attacks and defenses.arXiv preprint arXiv:2504.02080, 2025. UK AI Security Institute. Inspect AI: Framework for Large Language Model Evaluations, 2024. URLhttps://gi thub.com/UKGovernmentBEIS/inspect_ai. Weckbecker, M., M ¨uller, J., Hagag, B., and...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.