A malicious agent in multi-agent LLM consensus systems can be trained via a surrogate world model and RL to reduce consensus rates and prolong disagreement more effectively than direct prompt attacks.
Wu and Aviv Tamar and Jean Harb and Pieter Abbeel and Igor Mordatch , title =
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.MA 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Insider Attacks in Multi-Agent LLM Consensus Systems
A malicious agent in multi-agent LLM consensus systems can be trained via a surrogate world model and RL to reduce consensus rates and prolong disagreement more effectively than direct prompt attacks.