See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents

Alvaro Velasquez; Jonathan Tremblay; Meng Wu; Qing Qu; Rene Vidal; Sijia Liu; Siyi Chen; Stan Birchfield; Valts Blukis; Xiaoyan Zhang

arxiv: 2606.13594 · v1 · pith:7HI6GGHAnew · submitted 2026-06-11 · 💻 cs.MA

See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents

Siyi Chen , Xiaoyan Zhang , Meng Wu , Jonathan Tremblay , Valts Blukis , Stan Birchfield , Rene Vidal , Alvaro Velasquez

show 2 more authors

Sijia Liu Qing Qu

This is my paper

Pith reviewed 2026-06-27 04:55 UTC · model grok-4.3

classification 💻 cs.MA

keywords multi-agent communicationKV cacheheterogeneous agentslatent alignmentcontext-aware transferlanguage model agents

0 comments

The pith

Heterogeneous language models can transfer both observations and reasoning by aligning their KV caches with a lightweight transformation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether agents built on different models can communicate by directly passing latent KV caches instead of decoding to text and re-encoding. It finds that context-aware transfer mainly needs sparse reasoning signals while context-unaware transfer, where the receiver sees nothing, demands dense preservation of contextual knowledge. To exploit this structure the authors introduce a cross-model cache transformation trained first for reconstruction and then for generation. Experiments on six transfer directions among Qwen3-4B, 8B and 14B models across in-domain and out-of-domain tasks show the method beats earlier heterogeneous approaches, matches or exceeds text communication at two-to-three times lower compute, and continues to work in the context-unaware regime where baselines collapse.

Core claim

A lightweight cross-model cache transformation trained in two phases enables dense KV-cache communication between heterogeneous agents, allowing them to share both what one agent sees and how it thinks.

What carries the argument

lightweight cross-model cache transformation with two-phase training (reconstruction followed by generation) that aligns KV caches across different model sizes.

If this is right

Multi-agent systems can exchange perceptions and internal states at roughly two-to-three times lower compute than text messaging in context-aware settings.
Context-unaware transfer becomes viable, allowing a receiver with no local input to still benefit from another agent's cache.
The same alignment works across model sizes from 4B to 14B on both in-domain and out-of-domain benchmarks.
Prior heterogeneous methods that rely on shared inputs or steering alone are outperformed by the dense alignment approach.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same lightweight transformation could be tested on model families with larger architectural differences to check how far the duality generalizes.
Deployed agent teams might reduce bandwidth and latency by substituting cache transfers for text in time-critical coordination tasks.
If the reconstruction phase can be made even cheaper, the method might extend to real-time streaming communication among many agents.

Load-bearing premise

That KV-cache states from models of different sizes contain alignable information about both reasoning signals and contextual knowledge.

What would settle it

Failure of the transformed caches to improve downstream task performance when the receiver model is given only the cache and no input tokens.

Figures

Figures reproduced from arXiv: 2606.13594 by Alvaro Velasquez, Jonathan Tremblay, Meng Wu, Qing Qu, Rene Vidal, Sijia Liu, Siyi Chen, Stan Birchfield, Valts Blukis, Xiaoyan Zhang.

**Figure 1.** Figure 1: See what I see, know what I think. We study real latent mind reading across heterogeneous agents, in which one agent can read both what another agent sees and what it thinks. Guided by our latent communication information structure analysis, we learn dense alignment between agents and evaluate context-aware and context-unaware settings. Dense alignment is accurate and efficient in both regimes, surpassing… view at source ↗

**Figure 2.** Figure 2: Sparse vs. dense heterogeneous alignment. Prior sparse methods partially align reasoning (mainly in context-aware transfer) and do not preserve dense context. Our dense alignment maps sender caches into receiver-compatible caches to support both robust reasoning and dense context transfer across context-aware and context-unaware regimes. Contribution of this work. We revisit the information requirement of … view at source ↗

**Figure 3.** Figure 3: Text communication decodes and re-encodes mes [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Compressed-sensing head selection: random ablation masks estimate sender-head impor [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Sparse reasoning signal vs. dense context signal (Qwen3-4B self-communication). Accuracy is plotted against the number of KV groups kept (𝐾 of 288). Solid blue: context-aware CS filtering, where the receiver still sees the input; 𝐾 = 0 is the single-agent receiver baseline. Open blue squares: random KV-group selection. Solid red: context-unaware CS filtering, where the receiver relies only on transmitted K… view at source ↗

**Figure 6.** Figure 6: Overview of dense alignment: transform sender KV caches into receiver-compatible caches [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗

**Figure 7.** Figure 7: PCA of KV cache latents: transformed sender caches directly overlap receiver-native [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗

read the original abstract

Multi-agent systems communicate mostly through text, paying a lossy and expensive decode and re-encode cost. KV-cache communication is a promising alternative, yet most prior work is homogeneous, using duplicate copies of the same model, and avoids the central challenge of cross-model latent alignment; existing heterogeneous methods are also restrictive, typically assuming shared input and using transferred caches mainly for steering. We study a more fundamental question: can heterogeneous agents be aligned well enough to perform real "mind reading" and transfer both what one agent sees and how it thinks? Our information-structure analysis reveals a duality: context-aware transfer is driven by sparse reasoning signals, while context-unaware transfer, where the receiver sees no input, requires dense contextual knowledge preservation. Motivated by this, we propose dense alignment for heterogeneous KV-cache communication via a lightweight cross-model cache transformation and two-phase training: reconstruction followed by generation. Across all six directions of {Qwen3-4B, 8B, 14B} and six in-domain and out-of-domain benchmarks, our method outperforms prior heterogeneous baselines, matches or exceeds text communication in context-aware settings at roughly 2 to 3 times lower compute, and remains effective in context-unaware transfer where prior methods collapse.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a working empirical recipe for aligning KV-caches across different-sized Qwen models so one agent can pass both observations and reasoning to another without text.

read the letter

The central result is that a lightweight cross-model cache transformation, trained first for reconstruction then for generation, lets heterogeneous agents do useful latent transfer. It covers all six directions among the 4B, 8B, and 14B Qwen3 variants and beats prior heterogeneous baselines on six in- and out-of-domain tasks. In context-aware settings it matches or exceeds text communication at 2-3x lower compute; in the harder context-unaware case where the receiver has no input, it still works while earlier methods collapse.

The two-phase training and the explicit handling of the sparse-versus-dense information split are the concrete advances over earlier homogeneous KV-cache work or restrictive heterogeneous steering methods. The experiments are broad enough to show the pattern holds across model sizes and task types.

The main limitation is that all models come from the same Qwen3 family, so the transformation has not been tested on architectures that differ more sharply. The information-structure duality is presented as useful motivation rather than a formal result, and the letter does not include ablations that isolate how much each training phase contributes. Those gaps are real but do not undermine the reported empirical pattern.

This is useful reading for anyone working on multi-agent LLM systems who cares about communication cost. A reader who wants concrete numbers on latent transfer across model scales will get value from the benchmarks. The work is coherent on its own terms and has enough empirical coverage to justify sending it to referees rather than desk-rejecting it.

Referee Report

0 major / 3 minor

Summary. The paper claims that heterogeneous agents can achieve effective 'mind reading' via dense KV-cache communication using a lightweight cross-model cache transformation trained in two phases (reconstruction then generation). Motivated by an information-structure duality (sparse reasoning signals for context-aware transfer vs. dense knowledge preservation for context-unaware), the method is evaluated across all six transfer directions among Qwen3-4B/8B/14B models on six in- and out-of-domain benchmarks. It reportedly outperforms prior heterogeneous baselines, matches or exceeds text communication in context-aware settings at 2-3x lower compute, and succeeds in context-unaware transfer where baselines collapse.

Significance. If the empirical results hold, the work is significant for multi-agent systems: it moves beyond homogeneous KV-cache methods and restrictive heterogeneous approaches (shared input, steering-only) to enable practical cross-model latent alignment. The two-phase training and duality analysis offer a concrete, scalable path to lower-cost communication that preserves both observations and reasoning, with potential impact on efficient agent coordination.

minor comments (3)

The abstract states results 'across all six directions' and 'six benchmarks' but does not name the specific transfer pairs or benchmarks; adding these in §4 or a table would improve reproducibility without altering the central claim.
The information-structure duality is described as revealed by analysis yet presented only motivationally; a brief quantitative illustration (e.g., sparsity metrics or activation statistics) in the methods section would strengthen the motivation without requiring new theorems.
Compute claims ('roughly 2 to 3 times lower') are given relative to text communication; specifying the exact metric (tokens, FLOPs, or wall-clock) and the baseline implementation details would clarify the comparison.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work, the recognition of its significance for multi-agent systems, and the recommendation for minor revision. We are pleased that the empirical results and the proposed two-phase training approach were viewed favorably.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents an empirical method for heterogeneous KV-cache communication using a lightweight cross-model transformation trained in two phases, motivated by an information-structure duality analysis. All central claims rest on reported outperformance across six transfer directions, three model sizes, and six benchmarks, with direct comparisons to baselines and text communication. No equations, fitted parameters renamed as predictions, self-citation load-bearing theorems, or self-definitional reductions appear in the derivation chain; the work is self-contained against external benchmarks and does not reduce its results to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the duality analysis is treated as an empirical observation rather than a formal axiom.

pith-pipeline@v0.9.1-grok · 5783 in / 997 out tokens · 22483 ms · 2026-06-27T04:55:50.209403+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

33 extracted references · 13 linked inside Pith

[1]

First conference on language modeling , year=

Autogen: Enabling next-gen LLM applications via multi-agent conversations , author=. First conference on language modeling , year=
[2]

Advances in Neural Information Processing Systems , volume=

Rema: Learning to meta-think for llms with multi-agent reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=
[3]

International Conference on Learning Representations , volume=

MetaGPT: Meta programming for a multi-agent collaborative framework , author=. International Conference on Learning Representations , volume=
[4]

Findings of the Association for Computational Linguistics: ACL 2025 , pages=

Optima: Optimizing effectiveness and efficiency for llm-based multi-agent system , author=. Findings of the Association for Computational Linguistics: ACL 2025 , pages=

2025
[5]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Agentdropout: Dynamic agent elimination for token-efficient and high-performance llm-based multi-agent collaboration , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[6]

International Conference on Learning Representations , volume=

Cut the crap: An economical communication pipeline for llm-based multi-agent systems , author=. International Conference on Learning Representations , volume=
[7]

Advances in Neural Information Processing Systems , volume=

Thought communication in multiagent collaboration , author=. Advances in Neural Information Processing Systems , volume=
[8]

International Conference on Learning Representations , volume=

Let models speak ciphers: Multiagent debate through embeddings , author=. International Conference on Learning Representations , volume=
[9]

arXiv preprint arXiv:2412.06769 , year=

Training large language models to reason in a continuous latent space , author=. arXiv preprint arXiv:2412.06769 , year=

Pith/arXiv arXiv
[10]

arXiv preprint arXiv:2511.09149 , year=

Enabling agents to communicate entirely in latent space , author=. arXiv preprint arXiv:2511.09149 , year=

Pith/arXiv arXiv
[11]

arXiv preprint arXiv:2501.14082 , year=

Communicating activations between language model agents , author=. arXiv preprint arXiv:2501.14082 , year=

arXiv
[12]

arXiv preprint arXiv:2510.03346 , year=

KVComm: Enabling Efficient LLM Communication through Selective KV Sharing , author=. arXiv preprint arXiv:2510.03346 , year=

arXiv
[13]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

Augmenting Multi-Agent Communication with State Delta Trajectory , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

2025
[14]

arXiv preprint arXiv:2510.03215 , year=

Cache-to-cache: Direct semantic communication between large language models , author=. arXiv preprint arXiv:2510.03215 , year=

arXiv
[15]

arXiv preprint arXiv:2604.13349 , year=

When Less Latent Leads to Better Relay: Information-Preserving Compression for Latent Multi-Agent LLM Collaboration , author=. arXiv preprint arXiv:2604.13349 , year=

Pith/arXiv arXiv
[16]

arXiv preprint arXiv:2604.25917 , year=

Recursive multi-agent systems , author=. arXiv preprint arXiv:2604.25917 , year=

Pith/arXiv arXiv
[17]

arXiv preprint arXiv:2509.21164 , year=

Mixture of thoughts: Learning to aggregate what experts think, not just what they say , author=. arXiv preprint arXiv:2509.21164 , year=

arXiv
[18]

arXiv preprint arXiv:2602.03695 , year=

Agent Primitives: Reusable Latent Building Blocks for Multi-Agent Systems , author=. arXiv preprint arXiv:2602.03695 , year=

Pith/arXiv arXiv
[19]

arXiv preprint arXiv:2602.15382 , year=

The vision wormhole: Latent-space communication in heterogeneous multi-agent systems , author=. arXiv preprint arXiv:2602.15382 , year=

Pith/arXiv arXiv
[20]

arXiv preprint arXiv:2411.02820 , year=

DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving , author=. arXiv preprint arXiv:2411.02820 , year=

arXiv
[21]

arXiv preprint arXiv:2511.20639 , year=

Latent collaboration in multi-agent systems , author=. arXiv preprint arXiv:2511.20639 , year=

Pith/arXiv arXiv
[22]

arXiv preprint arXiv:2603.03335 , year=

Identifying Functional Heads of Large Language Models with Compressed Sensing , author=. arXiv preprint arXiv:2603.03335 , year=

arXiv
[23]

International Conference on Learning Representations , volume=

Aflow: Automating agentic workflow generation , author=. International Conference on Learning Representations , volume=
[24]

arXiv preprint arXiv:2501.06322 , year=

Multi-agent collaboration mechanisms: A survey of llms , author=. arXiv preprint arXiv:2501.06322 , year=

Pith/arXiv arXiv
[25]

Advances in Neural Information Processing Systems , volume=

Sirius: Self-improving multi-agent systems via bootstrapped reasoning , author=. Advances in Neural Information Processing Systems , volume=
[26]

arXiv preprint arXiv:2604.21794 , year=

Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems , author=. arXiv preprint arXiv:2604.21794 , year=

Pith/arXiv arXiv
[27]

arXiv preprint arXiv:2110.14168 , year=

Training verifiers to solve math word problems , author=. arXiv preprint arXiv:2110.14168 , year=

Pith/arXiv arXiv
[28]

arXiv preprint arXiv:2103.03874 , year=

Measuring mathematical problem solving with the math dataset , author=. arXiv preprint arXiv:2103.03874 , year=

Pith/arXiv arXiv
[29]

arXiv preprint arXiv:1803.05457 , year=

Think you have solved question answering? try arc, the ai2 reasoning challenge , author=. arXiv preprint arXiv:1803.05457 , year=

Pith/arXiv arXiv
[30]

arXiv preprint arXiv:2505.09388 , year=

Qwen3 technical report , author=. arXiv preprint arXiv:2505.09388 , year=

Pith/arXiv arXiv
[31]

2025 International Joint Conference on Neural Networks (IJCNN) , pages=

Llm-medqa: Enhancing medical question answering through case studies in large language models , author=. 2025 International Joint Conference on Neural Networks (IJCNN) , pages=. 2025 , organization=

2025
[32]

Are we done with mmlu? , author=. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

2025
[33]

EMNLP , year=

Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering , author=. EMNLP , year=

[1] [1]

First conference on language modeling , year=

Autogen: Enabling next-gen LLM applications via multi-agent conversations , author=. First conference on language modeling , year=

[2] [2]

Advances in Neural Information Processing Systems , volume=

Rema: Learning to meta-think for llms with multi-agent reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=

[3] [3]

International Conference on Learning Representations , volume=

MetaGPT: Meta programming for a multi-agent collaborative framework , author=. International Conference on Learning Representations , volume=

[4] [4]

Findings of the Association for Computational Linguistics: ACL 2025 , pages=

Optima: Optimizing effectiveness and efficiency for llm-based multi-agent system , author=. Findings of the Association for Computational Linguistics: ACL 2025 , pages=

2025

[5] [5]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Agentdropout: Dynamic agent elimination for token-efficient and high-performance llm-based multi-agent collaboration , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[6] [6]

International Conference on Learning Representations , volume=

Cut the crap: An economical communication pipeline for llm-based multi-agent systems , author=. International Conference on Learning Representations , volume=

[7] [7]

Advances in Neural Information Processing Systems , volume=

Thought communication in multiagent collaboration , author=. Advances in Neural Information Processing Systems , volume=

[8] [8]

International Conference on Learning Representations , volume=

Let models speak ciphers: Multiagent debate through embeddings , author=. International Conference on Learning Representations , volume=

[9] [9]

arXiv preprint arXiv:2412.06769 , year=

Training large language models to reason in a continuous latent space , author=. arXiv preprint arXiv:2412.06769 , year=

Pith/arXiv arXiv

[10] [10]

arXiv preprint arXiv:2511.09149 , year=

Enabling agents to communicate entirely in latent space , author=. arXiv preprint arXiv:2511.09149 , year=

Pith/arXiv arXiv

[11] [11]

arXiv preprint arXiv:2501.14082 , year=

Communicating activations between language model agents , author=. arXiv preprint arXiv:2501.14082 , year=

arXiv

[12] [12]

arXiv preprint arXiv:2510.03346 , year=

KVComm: Enabling Efficient LLM Communication through Selective KV Sharing , author=. arXiv preprint arXiv:2510.03346 , year=

arXiv

[13] [13]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

Augmenting Multi-Agent Communication with State Delta Trajectory , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

2025

[14] [14]

arXiv preprint arXiv:2510.03215 , year=

Cache-to-cache: Direct semantic communication between large language models , author=. arXiv preprint arXiv:2510.03215 , year=

arXiv

[15] [15]

arXiv preprint arXiv:2604.13349 , year=

When Less Latent Leads to Better Relay: Information-Preserving Compression for Latent Multi-Agent LLM Collaboration , author=. arXiv preprint arXiv:2604.13349 , year=

Pith/arXiv arXiv

[16] [16]

arXiv preprint arXiv:2604.25917 , year=

Recursive multi-agent systems , author=. arXiv preprint arXiv:2604.25917 , year=

Pith/arXiv arXiv

[17] [17]

arXiv preprint arXiv:2509.21164 , year=

Mixture of thoughts: Learning to aggregate what experts think, not just what they say , author=. arXiv preprint arXiv:2509.21164 , year=

arXiv

[18] [18]

arXiv preprint arXiv:2602.03695 , year=

Agent Primitives: Reusable Latent Building Blocks for Multi-Agent Systems , author=. arXiv preprint arXiv:2602.03695 , year=

Pith/arXiv arXiv

[19] [19]

arXiv preprint arXiv:2602.15382 , year=

The vision wormhole: Latent-space communication in heterogeneous multi-agent systems , author=. arXiv preprint arXiv:2602.15382 , year=

Pith/arXiv arXiv

[20] [20]

arXiv preprint arXiv:2411.02820 , year=

DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving , author=. arXiv preprint arXiv:2411.02820 , year=

arXiv

[21] [21]

arXiv preprint arXiv:2511.20639 , year=

Latent collaboration in multi-agent systems , author=. arXiv preprint arXiv:2511.20639 , year=

Pith/arXiv arXiv

[22] [22]

arXiv preprint arXiv:2603.03335 , year=

Identifying Functional Heads of Large Language Models with Compressed Sensing , author=. arXiv preprint arXiv:2603.03335 , year=

arXiv

[23] [23]

International Conference on Learning Representations , volume=

Aflow: Automating agentic workflow generation , author=. International Conference on Learning Representations , volume=

[24] [24]

arXiv preprint arXiv:2501.06322 , year=

Multi-agent collaboration mechanisms: A survey of llms , author=. arXiv preprint arXiv:2501.06322 , year=

Pith/arXiv arXiv

[25] [25]

Advances in Neural Information Processing Systems , volume=

Sirius: Self-improving multi-agent systems via bootstrapped reasoning , author=. Advances in Neural Information Processing Systems , volume=

[26] [26]

arXiv preprint arXiv:2604.21794 , year=

Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems , author=. arXiv preprint arXiv:2604.21794 , year=

Pith/arXiv arXiv

[27] [27]

arXiv preprint arXiv:2110.14168 , year=

Training verifiers to solve math word problems , author=. arXiv preprint arXiv:2110.14168 , year=

Pith/arXiv arXiv

[28] [28]

arXiv preprint arXiv:2103.03874 , year=

Measuring mathematical problem solving with the math dataset , author=. arXiv preprint arXiv:2103.03874 , year=

Pith/arXiv arXiv

[29] [29]

arXiv preprint arXiv:1803.05457 , year=

Think you have solved question answering? try arc, the ai2 reasoning challenge , author=. arXiv preprint arXiv:1803.05457 , year=

Pith/arXiv arXiv

[30] [30]

arXiv preprint arXiv:2505.09388 , year=

Qwen3 technical report , author=. arXiv preprint arXiv:2505.09388 , year=

Pith/arXiv arXiv

[31] [31]

2025 International Joint Conference on Neural Networks (IJCNN) , pages=

Llm-medqa: Enhancing medical question answering through case studies in large language models , author=. 2025 International Joint Conference on Neural Networks (IJCNN) , pages=. 2025 , organization=

2025

[32] [32]

Are we done with mmlu? , author=. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

2025

[33] [33]

EMNLP , year=

Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering , author=. EMNLP , year=