MaMa: A Game-Theoretic Approach for Designing Safe Agentic Systems

Adish Singla; Goran Radanovic; Jonathan Nöther

arxiv: 2602.04431 · v2 · pith:ED66SJSDnew · submitted 2026-02-04 · 💻 cs.LG · cs.GT

keywords: systems, agentic, agents, mama, safe, safety, approach, attacks

LLM-based multi-agent systems have demonstrated impressive capabilities, but they also introduce significant safety risks when individual agents fail or behave adversarially. In this work, we study the automated design of agentic systems that remain safe even when a subset of agents is compromised. Inspired by Stackelberg security games, we formalize this problem as a game between a system designer (the Meta-Agent) and a best-responding Meta-Adversary that selects and compromises a subset of agents to minimize safety. We propose Meta-Adversary-Meta-Agent (MaMa), a novel algorithm inspired by this formalization for automatically designing safe agentic systems. Our approach uses LLM-based adversarial search, where the Meta-Agent iteratively proposes system designs and receives feedback based on the strongest attacks discovered by the Meta-Adversary. Empirical evaluations across diverse environments show that systems designed with MaMa consistently defend against worst-case attacks while maintaining performance comparable to systems optimized solely for task success. Moreover, the resulting systems generalize to stronger adversaries, as well as ones with different attack objectives or underlying LLMs, demonstrating robust safety beyond the training setting.
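
The leader-follower structure described in the abstract can be illustrated with a small toy sketch. This is not the authors' MaMa algorithm: MaMa uses LLM-based adversarial search, whereas the sketch below swaps in exhaustive enumeration over a fixed list of candidate designs. The names `safety`, `best_response`, `stackelberg_design`, the `checks` review structure, and the two example designs are all hypothetical, chosen only to show the max-min idea of a designer committing to a system while a best-responding adversary compromises a subset of agents.

```python
from itertools import combinations

def safety(design, compromised):
    """Toy safety score (an assumption, not from the paper): the system is
    unsafe (0.0) if some compromised agent has all of its reviewers
    compromised too (or has no reviewers at all), otherwise safe (1.0)."""
    for agent in compromised:
        reviewers = design["checks"].get(agent, set())
        if reviewers <= compromised:
            return 0.0
    return 1.0

def best_response(design, budget):
    """Stand-in for the Meta-Adversary: enumerate every subset of at most
    `budget` agents and return the one that minimizes safety."""
    attacks = [frozenset(c)
               for k in range(budget + 1)
               for c in combinations(design["agents"], k)]
    worst = min(attacks, key=lambda a: safety(design, a))
    return worst, safety(design, worst)

def stackelberg_design(candidates, budget):
    """Stand-in for the Meta-Agent: commit to the candidate design whose
    worst-case (best-response) safety is highest."""
    best = None
    for design in candidates:
        attack, value = best_response(design, budget)
        if best is None or value > best[2]:
            best = (design, attack, value)
    return best

# Two hypothetical designs: a plain pipeline and one with a reviewing critic.
pipeline = {"name": "pipeline", "agents": ["planner", "coder"], "checks": {}}
reviewed = {
    "name": "reviewed",
    "agents": ["planner", "coder", "critic"],
    "checks": {"planner": {"critic"}, "coder": {"critic"}, "critic": {"planner"}},
}

chosen, attack, value = stackelberg_design([pipeline, reviewed], budget=1)
```

With a budget of one compromised agent, the unreviewed pipeline is broken by any single compromise, while the reviewed design survives every single-agent attack in this toy model, so the designer selects it. Raising the budget to two lets the adversary compromise an agent together with its reviewer, which is the kind of stronger adversary the abstract says the resulting systems are evaluated against.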

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Stable Agentic Control: Tool-Mediated LLM Architecture for Autonomous Cyber Defense
cs.AI · 2026-05 · unverdicted · novelty 6.0 · partial

Tool-mediated LLM agents with deterministic tools and a machine-checked Lyapunov certificate achieve stable control in cyber defense, reducing attacker game value by 59% on real attack graphs.