When Latent Agents Lie: KV-Cache Integrity in Multi-Agent LLM Collaboration
Pith reviewed 2026-06-30 08:24 UTC · model grok-4.3
The pith
Tampering with hidden KV-cache state can degrade multi-agent LLM answers even when visible commitments look plausible.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Specialists each see part of the evidence, send a short commitment, and pass their full KV-cache state to a coordinator. In clean runs this latent collaboration outperforms a matched text-only version, reaching EM/F1 of 0.338/0.486 versus 0.231/0.369 on transformed HiddenBench with Qwen3-4B. When one specialist is malicious, altering its hidden KV state can collapse performance even when the visible commitment still looks plausible. A verifier that checks only text misses this failure mode. Simple magnitude checks catch some corruptions, but adaptive attacks evade them. The most reliable fix is an HMAC-SHA256 manifest that binds the specialist, session, model, visible commitment, tensor metadata, and payload digest.
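EM and F1 are quoted here without a definition. The sketch below assumes the common SQuAD-style definitions: exact match after normalizing case, punctuation, articles, and whitespace, and token-overlap F1 averaged over questions. This page does not say which variant the paper uses, so treat the function names and normalization as illustrative assumptions.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(gold))


def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def corpus_scores(pairs):
    """Mean EM and mean F1 over (prediction, gold) pairs."""
    ems = [exact_match(p, g) for p, g in pairs]
    f1s = [token_f1(p, g) for p, g in pairs]
    return sum(ems) / len(ems), sum(f1s) / len(f1s)
```

For example, `corpus_scores([("The Eiffel Tower", "eiffel tower"), ("Paris, France", "Paris")])` gives EM 0.5 and F1 5/6: the first answer matches exactly after normalization, and the second earns partial token credit.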
What carries the argument
An HMAC-SHA256 manifest that binds specialist identity, session, model, visible commitment, tensor metadata, and payload digest, protecting the KV-cache during transport.
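A minimal sketch of such a manifest, assuming a pre-shared key per specialist and canonical JSON as the signed encoding. The function names, field names, and metadata layout are hypothetical and are not taken from the paper's implementation. The receiver rebuilds every field from what it actually received (expected session and model, the visible commitment as delivered, metadata measured from the arriving tensors, and a fresh digest of the payload), so a change to any bound item breaks the tag.

```python
import hashlib
import hmac
import json


def _canonical(fields: dict) -> bytes:
    # Sorted keys and fixed separators so sender and receiver MAC identical bytes.
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_manifest(key: bytes, specialist: str, session: str, model: str,
                   commitment: str, tensor_meta: dict, payload: bytes) -> dict:
    """Sender side: bind every field plus the payload digest under one HMAC-SHA256 tag."""
    fields = {
        "specialist": specialist,
        "session": session,
        "model": model,
        "commitment": commitment,
        "tensor_meta": tensor_meta,
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
    }
    return {**fields, "tag": hmac.new(key, _canonical(fields), hashlib.sha256).hexdigest()}


def verify_manifest(key: bytes, manifest: dict, commitment: str, tensor_meta: dict,
                    payload: bytes, session: str, model: str) -> bool:
    """Receiver side: rebuild the fields from what arrived and compare tags in constant time.

    In a real deployment `key` would be looked up from the claimed specialist identity.
    """
    fields = {
        "specialist": manifest.get("specialist"),
        "session": session,          # session the coordinator expects, not the claimed one
        "model": model,              # model the coordinator expects
        "commitment": commitment,    # visible text exactly as received
        "tensor_meta": tensor_meta,  # metadata measured from the received tensors
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
    }
    expected = hmac.new(key, _canonical(fields), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, str(manifest.get("tag", "")))
```

With this construction, an untouched message verifies. Flipping a payload byte, editing the commitment, replaying the manifest under another session id, changing tensor metadata, or verifying with a different key each makes `verify_manifest` return `False`.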
If this is right
- Full-KV latent memory can improve multi-agent collaboration but must be treated as a security-sensitive object.
- Visible text commitments alone cannot verify the integrity of shared hidden state.
- Adaptive attacks can evade magnitude-based checks on KV tensors while still damaging answers.
- Cryptographic binding of KV state to visible commitments preserves performance gains from latent sharing.
- KV-cache exchanged between agents should be protected in transport rather than inspected after receipt.
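The bullets above separate three ideas: checking only the visible text, checking the received state, and protecting the state in transport. The toy comparison below, a sketch with a hypothetical shared key and stand-in payload bytes, shows why transport protection needs a keyed tag. An attacker on the channel keeps the commitment, alters the hidden bytes, and recomputes an unkeyed SHA-256 digest. The text-only check and the unkeyed digest both accept the tampered message; only the HMAC check rejects it.

```python
import hashlib
import hmac

SHARED_KEY = b"hypothetical-specialist-coordinator-key"  # assumed pre-shared secret


def encode(commitment: str, payload: bytes) -> bytes:
    # Length-prefix the commitment so different (commitment, payload) splits never collide.
    c = commitment.encode("utf-8")
    return len(c).to_bytes(4, "big") + c + payload


def text_only_check(commitment: str, payload: bytes, tag: str) -> bool:
    # Reads only the visible text; payload and tag are ignored entirely.
    return len(commitment.strip()) > 0


def digest_check(commitment: str, payload: bytes, tag: str) -> bool:
    # Unkeyed SHA-256: anyone who can rewrite the payload can also rewrite this tag.
    expected = hashlib.sha256(encode(commitment, payload)).hexdigest()
    return hmac.compare_digest(expected, tag)


def hmac_check(commitment: str, payload: bytes, tag: str) -> bool:
    # Keyed tag: producing a valid one requires SHARED_KEY.
    expected = hmac.new(SHARED_KEY, encode(commitment, payload), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)


commitment = "Evidence B supports answer X."
honest_kv = bytes(range(64))                              # stand-in for serialized KV state
tampered_kv = honest_kv[:10] + b"\xff" + honest_kv[11:]  # same commitment, altered hidden state

honest_tag = hmac.new(SHARED_KEY, encode(commitment, honest_kv), hashlib.sha256).hexdigest()
# An in-transit attacker without the key can recompute an unkeyed digest but not an HMAC.
forged_digest = hashlib.sha256(encode(commitment, tampered_kv)).hexdigest()

results = {
    "text_only": text_only_check(commitment, tampered_kv, forged_digest),  # True: tampering unseen
    "digest": digest_check(commitment, tampered_kv, forged_digest),        # True: digest recomputed
    "hmac": hmac_check(commitment, tampered_kv, honest_tag),               # False: tag mismatch
}
```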
Where Pith is reading between the lines
- Similar integrity mechanisms could apply to other forms of hidden state exchange in distributed LLM systems.
- The measured performance lift from latent sharing suggests that secure KV protocols may be worth adopting in production agent frameworks.
- Standardized manifests for model state could reduce the attack surface when multiple models exchange internal representations.
Load-bearing premise
The 295 recorded tampered payloads and the adaptive attacks tested represent realistic threats that could be mounted against deployed multi-agent LLM systems.
What would settle it
An adaptive attack that modifies KV-cache content, changes the coordinator's answer, and still produces a payload accepted by the HMAC manifest.
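An empirical search can only probe a tiny slice of possible edits; under the standard assumption that HMAC-SHA256 is unforgeable without the key, such a counterexample would require the key or a break of HMAC-SHA256 itself. The sketch below is a sanity check in that spirit, not a proof: it flips every single bit of a random stand-in payload and counts how many tampered copies still match the original tag. The helper names are hypothetical.

```python
import hashlib
import hmac
import os


def mac(key: bytes, payload: bytes) -> bytes:
    return hmac.new(key, payload, hashlib.sha256).digest()


def accepted_single_bit_flips(key: bytes, payload: bytes) -> int:
    """Flip each bit of `payload` in turn and count tampered copies that still verify."""
    reference = mac(key, payload)
    data = bytearray(payload)
    accepted = 0
    for i in range(len(data)):
        for bit in range(8):
            data[i] ^= 1 << bit  # tamper with one bit
            if hmac.compare_digest(mac(key, bytes(data)), reference):
                accepted += 1
            data[i] ^= 1 << bit  # restore the original byte
    return accepted


key = os.urandom(32)
payload = os.urandom(1024)  # stand-in for a serialized KV block
```

Calling `accepted_single_bit_flips(key, payload)` checks 8,192 tampered variants and is expected to return 0.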
Original abstract
LLM agents can share more than text. In some systems, an agent can send a short visible message while also passing its full KV-cache state to another model. This hidden state can help the final model combine evidence from several agents, but it is also hard to inspect. A visible message may look harmless even if the hidden state has been changed. We study this problem in a multi-agent question-answering setup. Specialists each see part of the evidence, send a short commitment, and pass full KV-cache state to a coordinator. In clean runs, this latent collaboration improves over a matched text-only version. On transformed HiddenBench with Qwen3-4B, it reaches EM/F1 of 0.338/0.486, compared with 0.231/0.369 for text collaboration. Qwen3-8B and HotPotQA runs show the same direction of improvement. The problem appears when one specialist is malicious. Some false visible commitments can steer answers. More seriously, changing the hidden KV state can collapse performance even when the visible commitment still looks plausible. A verifier that checks only text misses this failure mode. Simple magnitude checks catch some obvious corruptions, but adaptive attacks can evade them while still damaging the final answer. The most reliable fix we find is not to guess whether hidden state looks normal, but to protect it in transport. We implement an HMAC-SHA256 manifest that binds the specialist, session, model, visible commitment, tensor metadata, and payload digest. It accepts all 774 honest replayed payloads and rejects all 295 recorded tampered payloads. The main lesson is that full-KV latent memory can be useful, but it should be treated as a security-sensitive object, not as ordinary internal model state.
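The abstract does not specify which magnitude checks or adaptive attacks were used, so the following is a hypothetical illustration. The detector accepts a KV block whose L2 norm stays within 5% of a reference. A crude scaling attack trips it, while a norm-matched attack, which replaces the block with a random direction rescaled to the original norm, passes the check even though the content is essentially uncorrelated with the clean block.

```python
import math
import random


def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (l2_norm(a) * l2_norm(b))


def magnitude_check(values, reference_norm, tolerance=0.05):
    """Hypothetical detector: accept if the L2 norm is within `tolerance` (relative) of a reference."""
    return abs(l2_norm(values) - reference_norm) <= tolerance * reference_norm


def scaling_attack(values, scale=3.0):
    # Obvious corruption: inflate every entry.
    return [v * scale for v in values]


def norm_matched_attack(values, rng):
    # Adaptive corruption: random direction, rescaled to the original norm.
    noise = [rng.gauss(0.0, 1.0) for _ in values]
    factor = l2_norm(values) / l2_norm(noise)
    return [n * factor for n in noise]


rng = random.Random(0)
clean = [rng.gauss(0.0, 1.0) for _ in range(4096)]  # stand-in for one flattened K or V block
reference = l2_norm(clean)

scaled = scaling_attack(clean)
adaptive = norm_matched_attack(clean, rng)

print(magnitude_check(clean, reference))     # True
print(magnitude_check(scaled, reference))    # False: norm is 3x the reference
print(magnitude_check(adaptive, reference))  # True, although the content is unrelated
print(cosine(clean, adaptive))               # near 0: direction no longer matches the clean block
```

The point of the toy is structural: any statistic the detector checks is a statistic an adaptive attacker can match, which is why the abstract favors protecting the payload in transport over judging whether it looks normal.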
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper examines security risks in multi-agent LLM systems where agents exchange full KV-cache states alongside short visible commitments in a question-answering setup. It reports that latent KV sharing improves exact match (EM) and F1 over text-only collaboration (e.g., 0.338/0.486 vs. 0.231/0.369 on transformed HiddenBench with Qwen3-4B). It then shows that tampering with the hidden KV state can degrade coordinator performance even when the visible commitment appears plausible, that adaptive attacks can evade magnitude-based checks, and that an HMAC-SHA256 manifest binding specialist, session, model, commitment, tensor metadata, and payload digest perfectly separates 774 honest replayed payloads from 295 recorded tampered ones. The central recommendation is to treat KV-cache payloads as security-sensitive objects requiring cryptographic transport protection rather than relying on post-hoc inspection.
Significance. If the empirical separation holds under a representative threat model, the work is significant for highlighting an under-explored attack vector in latent multi-agent collaboration and for supplying a concrete, implementable cryptographic countermeasure. The exact acceptance/rejection counts and performance deltas provide clear, falsifiable evidence; the absence of free parameters or fitted models in the HMAC construction is a strength. The result bears on the design of any system passing internal model state between agents.
major comments (1)
- [Abstract, attack evaluation paragraph] The claim that the HMAC-SHA256 manifest is the most reliable fix rests on its acceptance of all 774 honest payloads and rejection of all 295 recorded tampered payloads. The manuscript provides no evidence that these 295 examples adequately sample sophisticated adaptive KV-cache modifications that preserve the visible commitment while still damaging answers, leaving the superiority over inspection methods dependent on an unverified representativeness assumption.
minor comments (1)
- [Abstract] The reported EM/F1 improvements are given without error bars, dataset transformation details, or statistical tests. Supplying these would make the utility claim more robust, even though it is not central to the security argument.
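One standard way to supply the requested error bars is a paired bootstrap over questions. The sketch below assumes per-question scores are available for both systems on the same questions; the page reports only aggregates, so the function name and the toy inputs in the usage note are hypothetical and are not the paper's data.

```python
import random


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile CI for mean(scores_a - scores_b), resampling questions with replacement.

    Returns (point_estimate, lower, upper).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("scores must be non-empty and paired per question")
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    k = int(round(alpha / 2 * n_resamples))  # number of resamples trimmed from each tail
    return sum(diffs) / n, means[k], means[n_resamples - 1 - k]
```

For example, `paired_bootstrap_ci(latent_em, text_em)` on two per-question 0/1 EM lists gives the mean EM gain with a 95% interval; an interval that excludes 0 would support the claimed improvement. The same call works for per-question F1.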
Simulated Author's Rebuttal
We thank the referee for their careful reading and valuable feedback on our manuscript. We address the major comment point by point below.
- Referee: [Abstract, attack evaluation paragraph] The claim that the HMAC-SHA256 manifest is the most reliable fix rests on its acceptance of all 774 honest payloads and rejection of all 295 recorded tampered payloads. The manuscript provides no evidence that these 295 examples adequately sample sophisticated adaptive KV-cache modifications that preserve the visible commitment while still damaging answers, leaving the superiority over inspection methods dependent on an unverified representativeness assumption.
Authors: We appreciate the referee pointing out this limitation in our evaluation. The 295 tampered payloads were produced by the adaptive attacks we implemented, which evaded magnitude-based detection while degrading coordinator performance. We acknowledge that this finite set does not cover all conceivable sophisticated modifications that could preserve the visible commitment. However, the strength of the HMAC-SHA256 manifest lies in its cryptographic properties rather than in empirical coverage: the manifest binds the payload digest, so any change to the KV-cache bytes changes the digest (barring a SHA-256 collision) and invalidates the HMAC tag, provided the key remains secret. The detection capability therefore does not depend on the representativeness of the 295 examples. We will revise the abstract and the relevant evaluation paragraph to clarify this point, explicitly distinguishing the cryptographic guarantee from the empirical results and stating the scope of the tested attacks. Revision: yes.
Circularity Check
No circularity; results are direct empirical measurements on recorded payloads
The paper reports an empirical evaluation: an HMAC-SHA256 manifest is implemented and tested on 774 honest replayed payloads (all accepted) and 295 recorded tampered payloads (all rejected). No derivation chain, equations, fitted parameters, or self-citations are present that reduce the central claim to its own inputs by construction. The representativeness of the 295 tampered examples is a validity question outside the scope of circularity analysis.
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] HMAC-SHA256 provides integrity when the key remains secret.