CASPIAN: Online Detection and Attribution of Cascade Attacks in LLM Multi-Agent Systems via Cross-Channel Causal Monitoring
Pith reviewed 2026-05-20 03:07 UTC · model grok-4.3
The pith
CASPIAN detects cascade attacks in LLM multi-agent systems by online monitoring of cross-channel causal influence propagation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CASPIAN is the first framework that provides a unified, cross-channel causal analysis of cascade behavior in LLM-MAS through online monitoring of dynamic influence propagation across agents. It models multi-agent interactions using a unified, dynamic causal influence matrix across channels, estimated efficiently via a late-interaction conditional transfer entropy (LI-CTE) formulation, thereby enabling the detection of cascade onset from emergent system-level structure rather than isolated anomalies. It further performs online causal attribution, identifying the origin, bridge, and amplifier agents driving the cascade and reconstructing its principal propagation pathways.
What carries the argument
The unified dynamic causal influence matrix estimated via late-interaction conditional transfer entropy (LI-CTE), which tracks changes in influence across channels to detect cascades and attribute responsibility.
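For orientation, the standard conditional transfer entropy that the name LI-CTE refers to can be written as below. This is the textbook definition (Schreiber's transfer entropy with an added conditioning set), not the paper's late-interaction estimator, whose exact form is not given in this summary. The symbols $X$ (source agent signal), $Y$ (target agent signal), and $Z$ (remaining agents or channels) are illustrative assumptions.

```latex
T_{X \to Y \mid Z}
  = I\bigl(Y_t ;\, X_{<t} \,\big|\, Y_{<t},\, Z_{<t}\bigr)
  = H\bigl(Y_t \mid Y_{<t}, Z_{<t}\bigr) - H\bigl(Y_t \mid Y_{<t}, Z_{<t}, X_{<t}\bigr)
```

A positive value means the source's past reduces uncertainty about the target's next state beyond what the target's own past and the conditioning set already explain.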
If this is right
- CASPIAN outperforms semantic guardrails, LLM-based judges, and graph-based anomaly detectors in both detection accuracy and early cascade identification.
- It identifies the origin, bridge, and amplifier agents and reconstructs principal propagation pathways.
- The framework operates with sub-1% relative overhead latency across diverse multi-agent frameworks and benchmarks.
- Unified cross-channel causal modeling is essential for reliably detecting and understanding cascade failures in LLM multi-agent systems.
Where Pith is reading between the lines
- The emphasis on system-level causal structure over local anomalies could extend to detecting other emergent coordination failures in agent teams.
- If the causal matrix approach holds, it points toward treating multi-agent security as a dynamic network problem rather than a collection of isolated checks.
Load-bearing premise
That multi-agent interactions can be accurately modeled as a unified dynamic causal influence matrix whose changes reliably indicate adversarial cascade propagation, and that late-interaction conditional transfer entropy can estimate this matrix efficiently without major loss of causal signal.
What would settle it
Introduce a known cross-channel cascade attack into a controlled LLM multi-agent system and observe whether CASPIAN fails to detect the onset from system-level structure or misidentifies the origin, bridge, or amplifier agents.
Original abstract
Cascade attacks in LLM multi-agent systems (MAS) arise when adversarial influence propagates across agents and leads to escalated system-level failures through complex agent interactions. Detecting such cascades is challenging, as their signals are distributed, tightly coupled across interaction channels, and often appear plausibly benign locally but may unfold quickly either within a single turn or gradually across multiple turns. Existing defenses, being largely local and text-centric, fail to capture such cross-channel, temporally coordinated dynamics of cascade propagation. Therefore, we propose CASPIAN, the first framework that provides a unified, cross-channel causal analysis of cascade behavior in LLM-MAS through online monitoring of dynamic influence propagation across agents. CASPIAN models multi-agent interactions using a unified, dynamic causal influence matrix across channels, estimated efficiently via a late-interaction conditional transfer entropy (LI-CTE) formulation, thereby enabling the detection of cascade onset from emergent system-level structure rather than isolated anomalies. It further performs online causal attribution, identifying the origin, bridge, and amplifier agents driving the cascade and reconstructing its principal propagation pathways, capabilities not supported by existing methods. Across diverse multi-agent frameworks and benchmarks, CASPIAN consistently outperforms semantic guardrails, LLM-based judges, and graph-based anomaly detectors in both detection accuracy and early cascade identification while operating with sub-1% relative overhead latency. These results demonstrate that unified cross-channel causal modeling is essential for reliably detecting and understanding cascade failures in LLM multi-agent systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes CASPIAN as the first framework for online detection and attribution of cascade attacks in LLM multi-agent systems. It models agent interactions via a unified dynamic causal influence matrix estimated using a late-interaction conditional transfer entropy (LI-CTE) formulation, enabling detection of cascade onset from emergent system-level structure and online attribution of origin, bridge, and amplifier agents. The work claims consistent outperformance over semantic guardrails, LLM-based judges, and graph-based anomaly detectors in detection accuracy and early identification, with sub-1% relative overhead latency across diverse multi-agent frameworks and benchmarks.
Significance. If the LI-CTE estimator reliably recovers directed causal propagation paths from discrete LLM prompt-response cycles, the approach would represent a meaningful advance over local or graph-based defenses by providing system-level causal monitoring and attribution of coordinated adversarial cascades. The claimed low overhead would further support practical deployment in multi-agent setups.
major comments (2)
- §3.2 (LI-CTE Formulation): The central claim that changes in the estimated causal influence matrix detect cascade onset and enable attribution of origin/bridge/amplifier agents rests on LI-CTE recovering true directed influence rather than correlational co-occurrence. However, the late-interaction conditioning on aggregated embeddings in discrete, high-dimensional text-based interactions lacks established consistency guarantees (unlike continuous or count-based time series), risking that the matrix reflects co-occurrence patterns instead of causal propagation and thereby undermining both detection and attribution.
- §4 (Experimental Evaluation): The abstract asserts 'consistent outperformance' and 'first-framework status' across benchmarks, yet the provided description supplies no quantitative metrics, error bars, dataset sizes, baseline implementations, or statistical tests. Without these, the superiority over local text-centric methods and the practical significance of the sub-1% overhead cannot be assessed.
minor comments (1)
- [Notation] The notation for 'channels' versus 'agents' and the precise definition of 'late-interaction' could be illustrated with a short example trace of a multi-turn interaction to improve clarity for readers unfamiliar with LLM-MAS.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below, providing clarifications and indicating planned revisions where appropriate.
- Referee: §3.2 (LI-CTE Formulation): The central claim that changes in the estimated causal influence matrix detect cascade onset and enable attribution of origin/bridge/amplifier agents rests on LI-CTE recovering true directed influence rather than correlational co-occurrence. However, the late-interaction conditioning on aggregated embeddings in discrete, high-dimensional text-based interactions lacks established consistency guarantees (unlike continuous or count-based time series), risking that the matrix reflects co-occurrence patterns instead of causal propagation and thereby undermining both detection and attribution.
  Authors: We acknowledge that formal consistency guarantees for LI-CTE in discrete, high-dimensional text settings remain an open theoretical question and are not established in the current manuscript. The late-interaction conditioning is intended to isolate directed temporal dependencies by operating on aggregated embeddings rather than raw co-occurrences, extending ideas from transfer entropy literature. Our empirical results across multiple MAS frameworks demonstrate that the estimated matrix supports more accurate detection and attribution than non-causal baselines. In revision we will expand §3.2 to include explicit discussion of modeling assumptions, related discrete-domain transfer entropy work, and additional ablation studies on the conditioning step.
  Revision: yes
- Referee: §4 (Experimental Evaluation): The abstract asserts 'consistent outperformance' and 'first-framework status' across benchmarks, yet the provided description supplies no quantitative metrics, error bars, dataset sizes, baseline implementations, or statistical tests. Without these, the superiority over local text-centric methods and the practical significance of the sub-1% overhead cannot be assessed.
  Authors: Section 4 of the full manuscript contains the requested details: detection and attribution accuracies with standard deviations from repeated runs, benchmark sizes and interaction counts, descriptions of baseline implementations drawn from cited repositories, and statistical comparisons. The sub-1% overhead figures are reported relative to end-to-end latency on the evaluated frameworks. We will revise the manuscript to add a summary table consolidating these quantitative results and to make the experimental protocol more prominent for easier assessment.
  Revision: yes
Circularity Check
Minor self-citation in causal method but central LI-CTE derivation remains independent
full rationale
The paper's core contribution is the introduction of a late-interaction conditional transfer entropy (LI-CTE) estimator to construct a unified dynamic causal influence matrix from LLM-MAS interactions, followed by monitoring changes in that matrix for cascade detection and attribution. This chain does not reduce by construction to a fitted parameter renamed as a prediction, nor does it rely on a load-bearing self-citation whose content is unverified or equivalent to the target result. The abstract and description present LI-CTE as a novel formulation tailored to discrete prompt-response cycles, with detection emerging from system-level structure in the estimated matrix rather than being presupposed. No equations or steps in the provided material exhibit self-definitional equivalence or smuggling of an ansatz via prior author work. The derivation therefore retains independent content and is self-contained against external causal-inference benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Multi-agent interactions admit representation as a unified dynamic causal influence matrix across channels whose changes indicate cascade propagation.
invented entities (1)
- Late-interaction conditional transfer entropy (LI-CTE): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "CASPIAN models multi-agent interactions using a unified, dynamic causal influence matrix across channels, estimated efficiently via a late-interaction conditional transfer entropy (LI-CTE) formulation"
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · tag: unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "We define the dominant spectral energy $E_t = \lambda_1(t) + \lambda_2(t)$ and its turn-over-turn growth as $A^{\mathrm{amp}}_t = E_t / (E_{t-1} + \epsilon)$"
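The passage quoted above defines a dominant spectral energy and its turn-over-turn growth. A minimal sketch of that computation follows, under stated assumptions: the influence matrix is a square NumPy array, λ1 and λ2 are read as the two largest eigenvalue magnitudes, and ϵ is taken to sit in the denominator as a stabilizer. None of these choices is confirmed by the excerpt, and the function names are hypothetical.

```python
import numpy as np

def dominant_spectral_energy(influence: np.ndarray) -> float:
    """E_t as the sum of the two largest eigenvalue magnitudes (assumed reading)."""
    magnitudes = np.sort(np.abs(np.linalg.eigvals(influence)))[::-1]
    return float(magnitudes[:2].sum())

def amplification_ratio(e_t: float, e_prev: float, eps: float = 1e-8) -> float:
    """Turn-over-turn growth E_t / (E_{t-1} + eps); eps placement is an assumption."""
    return e_t / (e_prev + eps)

# Illustrative matrices only: a quiet turn followed by a turn with stronger influence.
quiet_turn = np.diag([0.2, 0.1, 0.05])
burst_turn = np.diag([0.5, 0.3, 0.1])
growth = amplification_ratio(
    dominant_spectral_energy(burst_turn),
    dominant_spectral_energy(quiet_turn),
)
```

A ratio well above 1 would indicate that influence is concentrating in the leading spectral modes from one turn to the next; any detection threshold on it would have to come from the paper itself.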
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
1. Collect events: gather all normalized events from source $a_i$ to target $a_j$ through channel $c$.
2. Encode payloads: convert each payload into a compact channel vector. Textual channels use embedding-based representations; execution channels use lightweight numeric runtime features.
3. Aggregate within turn: average multiple events on the same $(i, j, c)$ triplet into one source-side and one target-side vector.
4. Retrieve target history: load the previous EMA history $h_j^{(c)}(t-1)$ for the target agent and channel.
5. Compute residual dependence: estimate how much the source vector explains the target residual after conditioning on $h_j^{(c)}(t-1)$.
6. Update histories: after scoring, update the target channel history and the streaming state used for future edge-channel estimates.

In our implementation, the dependence score is computed using a lightweight streaming covariance estimator over compact channel vectors. Concretely, the compact source, target, and history vectors are concatenated, marginally r...
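The steps listed above can be sketched as a small per-edge, per-channel scorer. This is a hedged illustration, not the paper's estimator: it assumes payloads are already collected and encoded as fixed-length vectors, uses a Gaussian conditional-dependence score computed from running first and second moments as a stand-in for the (truncated) description of the streaming covariance estimator, and adds a ridge term for numerical stability. The class name `EdgeChannelScorer` and the parameters `ema_decay` and `ridge` are hypothetical.

```python
import numpy as np

class EdgeChannelScorer:
    """Hypothetical scorer for one (source i, target j, channel c) triplet."""

    def __init__(self, dim: int, ema_decay: float = 0.8, ridge: float = 1e-3):
        self.dim, self.decay, self.ridge = dim, ema_decay, ridge
        self.history = np.zeros(dim)                 # EMA history of target vectors
        self.n = 0                                   # number of turns seen
        self.sum_z = np.zeros(3 * dim)               # running sum of [source, target, history]
        self.sum_zz = np.zeros((3 * dim, 3 * dim))   # running sum of outer products

    @staticmethod
    def _cond_logdet(cov, target, given):
        # log det of Cov(target | given) via the Schur complement.
        c_tt = cov[np.ix_(target, target)]
        c_tg = cov[np.ix_(target, given)]
        c_gg = cov[np.ix_(given, given)]
        cond = c_tt - c_tg @ np.linalg.solve(c_gg, c_tg.T)
        return np.linalg.slogdet(cond)[1]

    def score_turn(self, source_events, target_events) -> float:
        # Aggregate within turn: average events into one source and one target vector.
        src = np.mean(np.asarray(source_events, dtype=float), axis=0)
        tgt = np.mean(np.asarray(target_events, dtype=float), axis=0)
        # Retrieve target history and fold the concatenated sample into the running moments
        # (the current turn is included before scoring; a choice made for this sketch).
        z = np.concatenate([src, tgt, self.history])
        self.n += 1
        self.sum_z += z
        self.sum_zz += np.outer(z, z)
        score = 0.0
        if self.n > 1:
            mean = self.sum_z / self.n
            cov = self.sum_zz / self.n - np.outer(mean, mean)
            cov += self.ridge * np.eye(3 * self.dim)
            d = self.dim
            s, t, h = np.arange(d), np.arange(d, 2 * d), np.arange(2 * d, 3 * d)
            # Residual dependence: Gaussian conditional mutual information
            # I(target; source | history), i.e. how much the source explains beyond history.
            score = 0.5 * (self._cond_logdet(cov, t, h)
                           - self._cond_logdet(cov, t, np.concatenate([s, h])))
        # Update histories: exponential moving average of the target vector.
        self.history = self.decay * self.history + (1 - self.decay) * tgt
        return float(max(score, 0.0))
```

Calling `score_turn` once per turn with the encoded source-side and target-side event vectors returns a non-negative score that grows when the source explains target behavior beyond what the target's own history already accounts for. Keeping only running sums means the per-turn cost does not depend on how many turns have elapsed.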