
arxiv: 2605.22786 · v1 · pith:U4SEEZ5Lnew · submitted 2026-05-21 · 💻 cs.AI · cs.ET · cs.LG · cs.MA

LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems

Pith reviewed 2026-05-22 04:54 UTC · model grok-4.3

classification 💻 cs.AI cs.ET cs.LG cs.MA
keywords latent communication · KV cache sharing · multi-agent systems · information leakage · adversarial training · LLM safety · representation learning

The pith

LCGuard applies learned transformations to KV caches to reduce sensitive information leakage in multi-agent LLM communication.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to make latent communication safer in multi-agent LLM systems by protecting the KV caches that agents share. Instead of sending raw caches, which may leak details about inputs or intermediate reasoning, LCGuard applies learned transformations that keep task-relevant information while suppressing sensitive content. It treats a shared artifact as unsafe if an adversary can reconstruct the original sensitive content from it, and it trains the transformations against such an adversary to reduce that leakage. This matters because it could let systems keep efficient, information-rich latent sharing while limiting the leakage risk that raw cache sharing carries.

Core claim

By modeling leakage operationally as reconstructability, LCGuard introduces transformations applied to KV caches prior to sharing. These transformations are optimized via an adversarial process to minimize what an attacker can recover about sensitive agent-specific information, all while retaining the semantic content needed for the multi-agent task. Evaluations on various models and benchmarks confirm lower reconstruction success and attack rates with little impact on overall performance.

What carries the argument

The adversarial training formulation where LCGuard learns to transform KV caches to thwart reconstruction of sensitive inputs by an adversary.
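
One way to write down the kind of objective described above is sketched here. The notation is an assumption made for illustration and is not taken from the paper: c is an agent's KV cache, x its sensitive input, T_θ the learned transformation, D_φ the adversarial decoder, and β a weight on the privacy term (a role suggested by the Figure 4 caption, where larger β improves privacy at some cost in utility).

```latex
% Hypothetical notation; the paper's exact losses are not given in this text.
\min_{\theta} \; \max_{\phi} \;
  \mathcal{L}_{\mathrm{task}}\bigl(T_\theta(c)\bigr)
  \;-\; \beta \, \mathcal{L}_{\mathrm{rec}}\bigl(D_\phi(T_\theta(c)),\, x\bigr)
```

Under this reading, the inner maximization over φ is the adversary minimizing its reconstruction loss, and the outer minimization over θ trades task loss against how well the best adversary can reconstruct x.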

If this is right

  • Agents can exchange richer latent states without increasing explicit disclosure of private data.
  • The approach maintains task performance levels similar to unprotected KV sharing.
  • Reconstruction attacks become less effective, lowering the success rate of attempts to extract hidden information.
  • This provides a practical way to operationalize safety in latent communication channels.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If effective, it could lead to standardized safety layers for any intermediate representation sharing in AI systems.
  • Testing in more diverse real-world scenarios might reveal if the protection holds against adaptive adversaries.
  • Connections to other privacy techniques could be explored for combined defenses.

Load-bearing premise

The premise that an inability to reconstruct sensitive inputs via a learned decoder means the information is truly protected and not leaked through other means.

What would settle it

Observing high reconstruction accuracy by an independently trained adversary on LCGuard-transformed caches from a held-out sensitive dataset would disprove the safety claims.
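
The check described above can be sketched on toy data. Everything in this example is an assumption made for illustration: the two-coordinate "artifact", the hand-written guard that swaps the leaky coordinate for fresh noise (a stand-in, not a learned LCGuard transformation), the linear decoder, and held-out R² as the leakage score. It shows only the shape of the protocol: train an independent adversary on one split, then score it on a held-out split.

```python
import random


def sample_artifacts(rng, n, guarded):
    """Toy stand-in for shared caches: each artifact is (task_signal, second_coord)."""
    artifacts, secrets = [], []
    for _ in range(n):
        task = rng.gauss(0.0, 1.0)    # task-relevant signal
        secret = rng.gauss(0.0, 1.0)  # sensitive input the adversary wants
        if guarded:
            # Hypothetical guard: replace the leaky coordinate with fresh noise.
            second = rng.gauss(0.0, 1.0)
        else:
            # Raw sharing: the second coordinate is a noisy copy of the secret.
            second = secret + 0.1 * rng.gauss(0.0, 1.0)
        artifacts.append((task, second))
        secrets.append(secret)
    return artifacts, secrets


def train_decoder(artifacts, secrets, lr=0.1, steps=500):
    """Fit a linear adversary by full-batch gradient descent on mean squared error."""
    w0 = w1 = b = 0.0
    n = len(secrets)
    for _ in range(steps):
        g0 = g1 = gb = 0.0
        for (x0, x1), y in zip(artifacts, secrets):
            err = w0 * x0 + w1 * x1 + b - y
            g0 += err * x0
            g1 += err * x1
            gb += err
        w0 -= lr * 2.0 * g0 / n
        w1 -= lr * 2.0 * g1 / n
        b -= lr * 2.0 * gb / n
    return w0, w1, b


def heldout_r2(decoder, artifacts, secrets):
    """Coefficient of determination of the adversary's reconstructions."""
    w0, w1, b = decoder
    mean = sum(secrets) / len(secrets)
    ss_res = sum((w0 * x0 + w1 * x1 + b - y) ** 2
                 for (x0, x1), y in zip(artifacts, secrets))
    ss_tot = sum((y - mean) ** 2 for y in secrets)
    return 1.0 - ss_res / ss_tot


def leakage_r2(guarded, train_seed=0, test_seed=1):
    """Train the adversary on one split and score it on a disjoint held-out split."""
    train = sample_artifacts(random.Random(train_seed), 400, guarded)
    test = sample_artifacts(random.Random(test_seed), 200, guarded)
    return heldout_r2(train_decoder(*train), *test)


raw_leak = leakage_r2(guarded=False)
guarded_leak = leakage_r2(guarded=True)
print(f"raw R^2 = {raw_leak:.3f}, guarded R^2 = {guarded_leak:.3f}")
```

A high held-out score on guarded artifacts would be the disconfirming observation. A low score only shows that this particular adversary failed, which is the limitation raised under the load-bearing premise.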

Figures

Figures reproduced from arXiv: 2605.22786 by Karthikeyan Natesan Ramamurthy, Mohammad Mohammadi Amiri, Momin Abbas, Prasanna Sattigeri, Sadia Asif.

Figure 1: Multi-agent communication topologies: sequen…
Figure 2: ASR-helpfulness tradeoff across sequential (…
Figure 3: Per-agent reconstruction difficulty across methods in a…
Figure 4: Effect of β on the privacy-utility tradeoff for Full-System LCGuard on Qwen3-4B under sequential communication (4 agents) on the PrivacyLens benchmark. Increasing β monotonically improves Privacy while reducing Leak and ASR, but leads to gradual degradation in Task Accuracy and Helpfulness. The intermediate range β ∈ [0.25, 0.50] provides the best tradeoff between strong privacy and high utility.
Figure 5: Average reconstruction difficulty, helpfulness, and ASR across methods. LCGuard achieves…
Figure 6: General graph topology with 5 agents. Multiple communication paths enable information propagation and aggregation across the network.

Hierarchical architectures (N ∈ {3, 4}). Agents are organized into two levels: multiple leaf agents feeding into a higher-level aggregator.
  • Leaf agents (e.g., a1, a2 or a1, a2, a3): independently process different portions of the task input and generate latent representati…

Original abstract

Large language model (LLM)-based multi-agent systems increasingly rely on intermediate communication to coordinate complex tasks. While most existing systems communicate through natural language, recent work shows that latent communication, particularly through transformer key-value (KV) caches, can improve efficiency and preserve richer task-relevant information. However, KV caches also encode contextual inputs, intermediate reasoning states, and agent-specific information, creating an opaque channel through which sensitive content may propagate across agents without explicit textual disclosure. To address this, we introduce LCGuard (Latent Communication Guard), a framework for safe KV-based latent communication in multi-agent LLM systems. LCGuard treats shared KV caches as latent working memory and learns representation-level transformations before cache artifacts are transmitted across agents. We formalize representation-level sensitive information leakage operationally through reconstruction: a shared cache artifact is unsafe if an adversarial decoder can recover agent-specific sensitive inputs from it. This leads to an adversarial training formulation in which the adversary learns to reconstruct sensitive inputs, while LCGuard learns transformations that preserve task-relevant semantics and reduce reconstructable information. Empirical evaluations across multiple model families and multi-agent benchmarks show that LCGuard consistently reduces reconstruction-based leakage and attack success rates while maintaining competitive task performance compared to standard KV-sharing baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces LCGuard, a framework for safe KV-based latent communication in multi-agent LLM systems. It treats shared KV caches as latent working memory and learns representation-level transformations via adversarial training: an adversary attempts to reconstruct agent-specific sensitive inputs from the shared cache, while LCGuard optimizes transformations that reduce such reconstructability while preserving task-relevant semantics. The central claim is that this approach reduces reconstruction-based leakage and attack success rates across multiple model families and multi-agent benchmarks while maintaining competitive task performance relative to standard KV-sharing baselines.

Significance. If the empirical results hold under broader scrutiny, LCGuard would address a timely and practically relevant risk in emerging multi-agent LLM architectures that rely on efficient latent (rather than textual) communication. The adversarial-training formulation directly targets the identified leakage channel and the breadth of evaluation across model families provides a reasonable starting point for assessing generality. The work also supplies a concrete operationalization of leakage that could be built upon in follow-up studies.

major comments (2)
  1. [Abstract] Abstract: the operational definition of an unsafe cache artifact as one from which an adversarial decoder can recover sensitive inputs leaves open non-reconstructive extraction paths (e.g., direct incorporation of the transformed KV into an adversary’s forward pass or exploitation of correlations that survive the transformation but are not penalized by the reconstruction loss). Because the safety claims rest on this definition and the reported experiments only evaluate the trained decoder, this is load-bearing for the central protection guarantee.
  2. [Empirical evaluations] Empirical evaluations section: the manuscript asserts consistent reductions in reconstruction-based leakage and attack success rates, yet the provided text supplies no quantitative tables, baseline specifications, or controls for factors such as model scale or task type. Without these details the strength of the cross-model and cross-benchmark claims cannot be assessed.
minor comments (2)
  1. [Method] Clarify the precise architecture and parameterization of the learned KV transformations (e.g., whether they are linear projections, small MLPs, or per-layer modules) and any associated hyperparameters.
  2. [Discussion] Add explicit discussion of potential new attack surfaces introduced by the learned transformation itself, particularly if the transformation parameters are shared or observable across agents.
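
To make the parameterization question in minor comment 1 concrete, here is a sketch of one option the referee lists: a separate linear projection per layer, applied to every key and value vector before the cache is shared. This is purely illustrative. The cache layout, the function names, and the weights are hypothetical, and the text reviewed here does not say which parameterization LCGuard actually uses.

```python
def matvec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def transform_cache(cache, layer_maps):
    """Apply a per-layer linear map to every key and value vector.

    cache: list of layers, each {"k": [vec, ...], "v": [vec, ...]} (assumed layout).
    layer_maps: one {"k": matrix, "v": matrix} per layer (hypothetical learned weights).
    """
    if len(cache) != len(layer_maps):
        raise ValueError("need exactly one map per cache layer")
    shared = []
    for layer, maps in zip(cache, layer_maps):
        shared.append({
            "k": [matvec(maps["k"], vec) for vec in layer["k"]],
            "v": [matvec(maps["v"], vec) for vec in layer["v"]],
        })
    return shared


def identity(dim):
    """Identity matrix: the 'no protection' baseline map."""
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]


# One layer, one cached token, head dimension 2.
cache = [{"k": [[1.0, 2.0]], "v": [[3.0, 4.0]]}]
unchanged = transform_cache(cache, [{"k": identity(2), "v": identity(2)}])
swap = [[0.0, 1.0], [1.0, 0.0]]
swapped = transform_cache(cache, [{"k": swap, "v": identity(2)}])
```

With identity maps the shared cache equals the raw cache, which corresponds to the unprotected KV-sharing baseline. A small MLP or a map shared across layers would slot into the same interface.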

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript introducing LCGuard. We address each major comment point by point below, providing clarifications and indicating revisions where appropriate to strengthen the presentation and claims.

  1. Referee: [Abstract] Abstract: the operational definition of an unsafe cache artifact as one from which an adversarial decoder can recover sensitive inputs leaves open non-reconstructive extraction paths (e.g., direct incorporation of the transformed KV into an adversary’s forward pass or exploitation of correlations that survive the transformation but are not penalized by the reconstruction loss). Because the safety claims rest on this definition and the reported experiments only evaluate the trained decoder, this is load-bearing for the central protection guarantee.

    Authors: We appreciate the referee's observation regarding the scope of our operational definition. The reconstruction-based formulation was chosen because it provides a concrete, quantifiable proxy for leakage that can be directly optimized against via adversarial training. We agree that this does not exhaustively cover all possible extraction paths, such as direct forward-pass incorporation or unpenalized correlations. In the revised manuscript, we will expand the abstract and discussion sections to explicitly acknowledge these as limitations of the current threat model and note that reconstruction serves as a strong initial safeguard. No new experiments are added at this stage, but we will outline directions for broader attack evaluations in future work. revision: partial

  2. Referee: [Empirical evaluations] Empirical evaluations section: the manuscript asserts consistent reductions in reconstruction-based leakage and attack success rates, yet the provided text supplies no quantitative tables, baseline specifications, or controls for factors such as model scale or task type. Without these details the strength of the cross-model and cross-benchmark claims cannot be assessed.

    Authors: We regret that the main-text presentation of the empirical results may have lacked sufficient detail for easy assessment. The full manuscript contains quantitative evaluations across model families and benchmarks, including metrics for reconstruction leakage and task performance. To directly address this concern, we will revise the Empirical Evaluations section to include a summary table of key quantitative results, explicitly list the baselines (standard KV sharing without LCGuard transformations), and add discussion of controls for model scale and task type variations. This will make the cross-model and cross-benchmark claims more transparent and verifiable. revision: yes

Circularity Check

0 steps flagged

No circularity: standard adversarial training with operational proxy for leakage


The paper defines sensitive information leakage operationally as reconstructability by an adversarial decoder and trains LCGuard via adversarial loss to minimize this while preserving task semantics. This is a standard formulation with no equations or derivations that reduce the claimed protection to a fitted parameter or input by construction. No self-citations, uniqueness theorems, or ansatzes are invoked in the provided text to justify core choices. Empirical claims rest on benchmark evaluations rather than tautological renaming or self-referential prediction. The approach is self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that reconstruction by an adversarial decoder operationalizes leakage risk, with no free parameters or invented entities explicitly stated in the abstract.

axioms (1)
  • domain assumption: A shared cache artifact is unsafe if an adversarial decoder can recover agent-specific sensitive inputs from it.
    This definition drives the adversarial training objective described in the abstract.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

    Relation between the paper passage and the cited Recognition theorem.

    We formalize representation-level sensitive information leakage operationally through reconstruction: a shared cache artifact is unsafe if an adversarial decoder can recover agent-specific sensitive inputs from it. This leads to an adversarial training formulation...

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

A Appendix

A.1 Training Algorithm

We train LCGuard using a minimax optimization procedure that alternates between an adversarial decoder (att...