LACO: Adaptive Latent Communication for Collaborative Driving

Dongman Lee; Tianhao Chen; Yuheng Wu

arxiv: 2605.22504 · v1 · pith:SPCBXVL5new · submitted 2026-05-21 · 💻 cs.AI · cs.CV

Tianhao Chen, Yuheng Wu, Dongman Lee

Pith reviewed 2026-05-22 06:16 UTC · model grok-4.3

classification 💻 cs.AI cs.CV

keywords collaborative driving, latent communication, multi-agent systems, autonomous vehicles, CARLA, knowledge distillation, latency reduction, partial observability

The pith

LACO enables low-latency collaborative driving by communicating refined latent states instead of language tokens.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper seeks to establish that pretrained driving models can be adapted for multi-vehicle collaboration through latent communication rather than high-latency language exchanges. The key insight is that directly fusing latent states from multiple agents leads to confusion over which vehicle owns which decision. LACO counters this with three mechanisms: iterative deliberation to build shared understanding, saliency attribution to pick only important information for sending, and knowledge distillation to anchor decisions to the ego vehicle. A reader would care because safer, more efficient autonomous driving depends on quick coordination among cars with limited views of the road. Closed-loop experiments in the CARLA simulator show reduced communication and inference latency while driving performance stays high.

Core claim

Motivated by the discovery that direct fusion of latent states entangles decision representations across vehicles in multi-agent settings, LACO provides a training-free paradigm for latent communication in collaborative driving. It incorporates Iterative Latent Deliberation for latent reasoning, Cross-Horizon Saliency Attribution for efficient information selection, and Structured Semantic Knowledge Distillation to stabilize ego-centric decision making, leading to notable reductions in communication and inference latency in closed-loop CARLA experiments without compromising collaborative driving performance.

What carries the argument

Agent identity confusion arising from direct fusion of latent states, which the paper identifies as the core problem that Iterative Latent Deliberation, Cross-Horizon Saliency Attribution, and Structured Semantic Knowledge Distillation are designed to solve.

If this is right

Collaborative vehicles achieve coordination with substantially lower communication overhead.
Real-time inference becomes feasible for safety-critical decisions under partial observability.
Pretrained single-agent models can be deployed in multi-agent scenarios without retraining.
Driving performance metrics such as safety and efficiency stay comparable to language-based approaches.
Scalability benefits arise as each vehicle communicates only selected salient latent information.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This latent communication strategy might generalize to other multi-agent perception-action tasks like robot teams navigating shared spaces.
Future work could explore combining LACO with large language models for hybrid latent-language exchanges in edge cases.
Validating the approach on physical testbeds with real V2V hardware would test assumptions about communication channels.
The saliency selection could reduce data transmission costs in bandwidth-constrained networks beyond driving.

Load-bearing premise

Direct fusion of latent states from different vehicles creates an unavoidable entanglement of decision representations known as agent identity confusion.

What would settle it

A controlled experiment in CARLA where a simple direct fusion baseline for latent states matches LACO's latency reduction and collaborative performance levels would challenge the need for the proposed ILD, CHSA, and SSKD components.
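One cheap diagnostic for such a direct-fusion baseline is to concatenate a collaborator's keys after the ego's keys and measure what share of an ego query's attention lands on the collaborator positions. The sketch below is a toy, single-head illustration with made-up vectors. `collaborator_attention_share` is a hypothetical helper and does not come from the paper. Standard scaled dot-product attention is assumed.

```python
import math
from typing import List

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def collaborator_attention_share(query: List[float],
                                 ego_keys: List[List[float]],
                                 collab_keys: List[List[float]]) -> float:
    """Share of one ego query's scaled dot-product attention that lands on
    collaborator keys when they are simply concatenated after the ego keys."""
    scale = math.sqrt(len(query))
    keys = ego_keys + collab_keys
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    return sum(weights[len(ego_keys):])

# Toy 2-D example: the collaborator's keys align more closely with the ego query.
query = [1.0, 0.0]
ego_keys = [[0.2, 0.9], [0.1, 1.0]]
collab_keys = [[2.0, 0.0], [1.8, 0.1]]
share = collaborator_attention_share(query, ego_keys, collab_keys)
print(f"collaborator share: {share:.2f} (token share: 0.50)")
```

Here the collaborator holds half the tokens but receives roughly three quarters of the attention. A share far above the collaborator's token fraction would be one plausible symptom of the over-attention that Figure 2(a) describes. It would not on its own prove that decision representations are entangled.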

Figures

Figures reproduced from arXiv: 2605.22504 by Dongman Lee, Tianhao Chen, Yuheng Wu.

**Figure 2.** (a) Agent Identity Confusion: Under full-depth fusion, the ego vehicle over-attends to collaborator latent states, resulting in policy hijacking and erroneous maneuvers. (b) Attention Sparsity: A small fraction (≈30%) of tokens captures the majority of attention mass, revealing substantial spatial redundancy. (c) Attention Entropy Analysis: Layer-wise entropy exhibits a global-to-local transition, followe…
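Both statistics in panels (b) and (c) can be computed directly from an attention distribution. The sketch below assumes that "attention mass" means a query's softmax weights summed over key tokens. It finds the smallest fraction of tokens whose weights reach a target mass, and it computes the Shannon entropy of the distribution, where lower entropy means more concentrated, local attention. The function names, the 0.6 threshold, and the toy weights are illustrative and are not the paper's data.

```python
import math
from typing import List

def token_fraction_for_mass(weights: List[float], target_mass: float = 0.5) -> float:
    """Smallest fraction of tokens whose largest weights sum to at least target_mass."""
    total = sum(weights)
    ranked = sorted((w / total for w in weights), reverse=True)
    covered = 0.0
    for count, w in enumerate(ranked, start=1):
        covered += w
        if covered >= target_mass:
            return count / len(ranked)
    return 1.0

def attention_entropy(weights: List[float]) -> float:
    """Shannon entropy (nats) of the normalized attention distribution."""
    total = sum(weights)
    return -sum((w / total) * math.log(w / total) for w in weights if w > 0)

# Toy distribution over 10 tokens in which three tokens dominate.
weights = [0.30, 0.20, 0.15, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]
print(token_fraction_for_mass(weights, 0.6))  # 0.3
print(round(attention_entropy(weights), 3), "vs uniform", round(math.log(10), 3))
```

To trace a layer-wise "global-to-local" trend, one would apply `attention_entropy` to each layer's averaged attention rows and compare the values across layers.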

**Figure 3.** Overview of LACO. Each agent performs Iterative Latent Deliberation (ILD) to embed spatial state and intent into its KV cache, selects decision-critical tokens via Cross-Horizon Saliency Attribution (CHSA), and transmits only early-layer representations through Shallow-Stream Knowledge Distillation (SSKD). The receiver integrates the message with its own reasoning stack for collaborative inference.
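The message flow in this caption can be made concrete with a small sketch. It stores the KV cache as one list of (key, value) pairs per layer. Each token is scored by the attention it receives, averaged over several query rows, which stands in for "cross-horizon" saliency because the paper's exact attribution rule is not given here. The sketch then keeps the top fraction of tokens and ships only the first few layers. `build_message`, `keep_ratio`, and `shallow_layers` are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]
LayerCache = List[Tuple[Vector, Vector]]  # one (key, value) pair per token

def token_saliency(attn_rows: List[List[float]]) -> List[float]:
    """Mean attention each token receives over several query rows
    (hypothetically, queries taken from different reasoning horizons)."""
    n_tokens = len(attn_rows[0])
    return [sum(row[t] for row in attn_rows) / len(attn_rows) for t in range(n_tokens)]

def build_message(kv_cache: List[LayerCache], attn_rows: List[List[float]],
                  keep_ratio: float = 0.3, shallow_layers: int = 2) -> Dict:
    """Select the top `keep_ratio` fraction of tokens by saliency and keep
    only the first `shallow_layers` layers of the cache for transmission."""
    scores = token_saliency(attn_rows)
    k = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    keep = sorted(ranked[:k])  # restore positional order for the receiver
    layers = [[layer[t] for t in keep] for layer in kv_cache[:shallow_layers]]
    return {"token_ids": keep, "layers": layers}

# Toy cache: 4 layers x 5 tokens with 1-D keys/values for readability.
kv_cache = [[([float(layer_idx)], [float(tok)]) for tok in range(5)]
            for layer_idx in range(4)]
attn_rows = [[0.1, 0.5, 0.1, 0.2, 0.1],
             [0.0, 0.4, 0.1, 0.4, 0.1]]
msg = build_message(kv_cache, attn_rows, keep_ratio=0.4, shallow_layers=2)
print(msg["token_ids"], len(msg["layers"]))  # [1, 3] 2
```

Sending token indices along with the entries lets a receiver place the selected keys and values at consistent positions. How LACO actually aligns positions on the receiving side is not described in this summary.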

**Figure 4.** Qualitative Visualization in closed-loop settings. (Left)…

**Figure 5.** Inference Speed-up Analysis. LACO bypasses the autoregressive linguistic bottleneck by reusing prefilled latent states. (Embedded plot: Driving Score versus Latent Steps for ORION, SimLingo, and LMDrive.)
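The speed-up argument reduces to simple arithmetic under a deliberately simplified latency model, which is assumed here and not taken from the paper. Language-based exchange pays for decoding every message token autoregressively, and the receiver then prefills that text. Latent exchange reuses the prefilled cache, adds a few latent steps, and must push the selected KV entries over the link. All parameter values below are placeholders, not measurements.

```python
def kv_payload_bytes(n_tokens: int, n_layers: int, hidden_dim: int,
                     bytes_per_elem: int = 2) -> int:
    """Keys + values (2 tensors) for the sent tokens and layers; fp16 by default.
    Ignores grouped-query attention and compression, which would shrink this."""
    return n_tokens * n_layers * 2 * hidden_dim * bytes_per_elem

def language_latency_ms(prefill_ms: float, decode_ms_per_token: float,
                        n_tokens: int, receiver_prefill_ms: float) -> float:
    """Autoregressive decoding dominates; text size on the wire is treated as negligible."""
    return prefill_ms + n_tokens * decode_ms_per_token + receiver_prefill_ms

def latent_latency_ms(prefill_ms: float, step_ms: float, n_latent_steps: int,
                      payload_bytes: int, bytes_per_ms: float) -> float:
    """A few latent steps reuse the prefilled cache; the KV payload crosses the link."""
    return prefill_ms + n_latent_steps * step_ms + payload_bytes / bytes_per_ms

# Placeholder numbers for illustration only.
lang = language_latency_ms(prefill_ms=80, decode_ms_per_token=25, n_tokens=120,
                           receiver_prefill_ms=40)
payload = kv_payload_bytes(n_tokens=60, n_layers=4, hidden_dim=4096)
lat = latent_latency_ms(prefill_ms=80, step_ms=25, n_latent_steps=8,
                        payload_bytes=payload, bytes_per_ms=12_500)  # ~100 Mbit/s
print(lang, payload, round(lat, 1))  # 3120 3932160 594.6
```

With these placeholders the KV transfer, not computation, dominates the latent path. This shows why sending fewer tokens and only early layers matters. Real figures depend on the model, the V2V link, and the hardware.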

**Figure 1.** Overview of the Town05 evaluation map used in the LangCoop benchmark. The map contains multiple predefined routes and diverse urban road structures, including intersections, roundabouts, and multi-lane roads, providing challenging scenarios for evaluating collaborative autonomous driving systems.

Route Completion. The Route Completion (RC) measures the fraction of the predefined route successfully traversed…
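Route Completion is the fraction of the route traversed. The standard CARLA leaderboard convention goes one step further, and this sketch assumes the benchmark follows it. Each route's Driving Score multiplies its RC by an infraction penalty that starts at 1.0 and is multiplied by a fixed coefficient for every infraction, and the reported score is the mean over routes. The coefficients below are the CARLA Leaderboard 1.0 values. Whether LangCoop uses exactly these is an assumption.

```python
from typing import List, Tuple

# Infraction coefficients from the CARLA Leaderboard 1.0 rules (assumed to apply here).
PENALTY = {
    "collision_pedestrian": 0.50,
    "collision_vehicle": 0.60,
    "collision_static": 0.65,
    "red_light": 0.70,
    "stop_sign": 0.80,
}

def route_completion(traversed_m: float, length_m: float) -> float:
    """RC in percent: fraction of the predefined route successfully traversed."""
    return 100.0 * min(traversed_m / length_m, 1.0)

def infraction_penalty(infractions: List[str]) -> float:
    """Multiplicative penalty: start at 1.0 and multiply by each infraction's coefficient."""
    p = 1.0
    for name in infractions:
        p *= PENALTY[name]
    return p

def driving_score(routes: List[Tuple[float, float, List[str]]]) -> float:
    """Mean over routes of RC x IP; each route is (traversed_m, length_m, infractions)."""
    scores = [route_completion(d, length) * infraction_penalty(inf)
              for d, length, inf in routes]
    return sum(scores) / len(scores)

routes = [(900.0, 1000.0, ["red_light"]),  # 90% RC, one red-light infraction
          (1000.0, 1000.0, [])]            # full completion, clean
print(round(driving_score(routes), 2))     # 81.5
```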

**Figure 2.** Camera and LiDAR configuration per method:

| Method | Camera configuration | LiDAR |
|---|---|---|
| Orion | 6 surround-view cameras | – |
| SimLingo | 1 front-view camera | – |
| LMDrive | 4 cameras (multi-view) | 1 LiDAR |

**Figure 2.** Example observations from the six surround-view cameras used in the Orion system. The cameras provide 360-degree perception coverage around the ego vehicle.

Language-based Communication. Following prior work, we adopt a language-based collaboration paradigm where each agent first performs an autoregressive reasoning process similar to chain-of-thought (CoT). During this stage, the model analyzes the dri…

Original abstract

Collaborative driving aims to improve safety and efficiency by enabling connected vehicles to coordinate under partial observability. Recent approaches have evolved from sharing visual features for perception to exchanging language-based reasoning through foundation models for behavioral coordination. Though communicating in language provides intuitive information, it introduces two challenges: high latency caused by autoregressive decoding and information loss caused by compressing rich internal representations into discrete tokens. To address these challenges, we analyze latent communication in collaborative driving under inherent limitations of multi-agent settings. Our analysis reveals agent identity confusion, where direct fusion of latent states entangles decision representations across vehicles. Motivated by this, we propose LACO, a training-free **LA**tent **CO**mmunication paradigm that seamlessly adapts pretrained driving models to collaborative settings. LACO introduces Iterative Latent Deliberation (ILD) for latent reasoning, Cross-Horizon Saliency Attribution (CHSA) for communication-efficient information selection, and Structured Semantic Knowledge Distillation (SSKD) to stabilize ego-centric decision making. Closed-loop experiments in CARLA show that LACO notably reduces communication and inference latency while maintaining strong collaborative driving performance.
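The abstract does not spell out how ILD works internally. One common form of latent reasoning appears in continuous-thought methods such as Coconut. There, the model's last hidden state is fed back as the next input for a fixed number of steps instead of decoding a token, and each step is appended to the cache. The toy sketch below shows that loop with a stand-in `model_step` function. Treating ILD as this loop is an assumption, and nothing here is the authors' implementation.

```python
import math
from typing import List

Vector = List[float]

def model_step(x: Vector, cache: List[Vector]) -> Vector:
    """Stand-in for one transformer forward pass: mixes the current input with
    the mean of everything already in the cache. Purely illustrative."""
    ctx = [sum(col) / len(cache) for col in zip(*cache)]
    return [math.tanh(0.5 * a + 0.5 * b) for a, b in zip(x, ctx)]

def latent_deliberation(prompt: List[Vector], n_steps: int) -> List[Vector]:
    """Prefill the prompt, then run n_steps latent steps without decoding tokens.
    Each new hidden state is appended to the cache and fed back as the next input."""
    cache = list(prompt)          # 'prefilled' states
    h = prompt[-1]
    for _ in range(n_steps):
        h = model_step(h, cache)  # continuous thought: no token sampling
        cache.append(h)
    return cache

cache = latent_deliberation([[0.2, -0.1], [0.4, 0.3]], n_steps=3)
print(len(cache))  # 5
```

The number of latent steps is the knob that trades reasoning depth against latency. No autoregressive text is produced at any point.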

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

LACO offers a training-free, latent-space alternative to language-based coordination in multi-agent driving aimed at cutting latency. However, the asserted problem of agent identity confusion from direct fusion is not isolated with ablations or visualizations.

The core idea is straightforward. Instead of sending language tokens or raw visual features between vehicles, LACO keeps everything in the latent space of pretrained driving models and adds three lightweight modules to make the fusion work. Iterative Latent Deliberation lets agents reason over the shared latents, Cross-Horizon Saliency Attribution decides what is worth sending, and Structured Semantic Knowledge Distillation tries to keep each vehicle's own decisions from getting tangled with the others'. The claim is that this cuts both communication and inference latency while preserving closed-loop performance in CARLA. That combination is new relative to the feature-sharing and language-based approaches the abstract describes. The training-free adaptation is also a practical plus for anyone who already has a working single-agent model. The experiments are closed-loop, which is the right setting for this kind of work.

The soft spot is the motivation. The paper states that direct latent fusion produces "agent identity confusion" that entangles decision representations, and it uses this to justify the three modules. Yet the abstract supplies no representation visualizations, no attribution maps, and no ablation that compares plain fusion against the proposed components while holding everything else fixed. Without that isolation, it is hard to know whether the gains come from fixing the claimed confusion or from other effects such as better compression or simpler selection. The quantitative results are also missing from the abstract, so the size of the latency reduction and the performance margin remain unclear.

This paper is for groups already working on V2V coordination or efficient multi-agent perception. A reader who cares about practical deployment constraints will find the framing useful even if they end up questioning the central assumption.
It is coherent enough, and grounded enough in a real deployment pain point, that it should go to peer review rather than be desk-rejected. The referees can ask for the missing ablations and numbers.

Referee Report

2 major / 2 minor

Summary. The paper proposes LACO, a training-free latent communication paradigm for collaborative driving that adapts pretrained models via Iterative Latent Deliberation (ILD), Cross-Horizon Saliency Attribution (CHSA), and Structured Semantic Knowledge Distillation (SSKD). It claims that direct latent fusion causes agent identity confusion by entangling decision representations, and that the proposed components reduce communication and inference latency while preserving strong closed-loop performance in CARLA experiments.

Significance. If the quantitative results and ablations hold, LACO could offer a practical way to enable low-latency multi-agent coordination in autonomous driving without the overhead of language-based communication or full retraining, building on existing pretrained models.

major comments (2)

[Introduction / Analysis] The core motivation—that direct fusion of latent states produces agent identity confusion entangling decision representations across vehicles—is presented without isolating evidence. No visualizations, attribution maps, or controlled ablation (e.g., direct fusion vs. ILD+CHSA+SSKD) demonstrates this entanglement as the dominant failure mode rather than latency, compression artifacts, or other multi-agent dynamics. This assumption underpins the design of all three components and requires a dedicated subsection with supporting experiments.
[Experiments] The abstract and results claim notable reductions in communication and inference latency with maintained collaborative performance, yet supply no quantitative metrics, baselines, error bars, or ablation tables. Without these, it is impossible to evaluate whether the data support the central performance claim or to compare against simpler fusion baselines.

minor comments (2)

[Method] Clarify the exact pretrained models being adapted and any assumptions about their latent spaces in the method description.
[Method] Add explicit definitions or pseudocode for ILD, CHSA, and SSKD to improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. We agree that the manuscript would benefit from stronger isolating evidence for the agent identity confusion claim and from more detailed quantitative experimental reporting. We address each major comment below and will incorporate the requested additions in the revised version.

Referee: [Introduction / Analysis] The core motivation—that direct fusion of latent states produces agent identity confusion entangling decision representations across vehicles—is presented without isolating evidence. No visualizations, attribution maps, or controlled ablation (e.g., direct fusion vs. ILD+CHSA+SSKD) demonstrates this entanglement as the dominant failure mode rather than latency, compression artifacts, or other multi-agent dynamics. This assumption underpins the design of all three components and requires a dedicated subsection with supporting experiments.

Authors: We acknowledge that the current analysis section motivates agent identity confusion primarily through qualitative description rather than through dedicated isolating experiments. In the revision we will add a new subsection that includes (i) saliency attribution maps contrasting direct latent fusion with the ILD+CHSA pipeline and (ii) controlled ablations that isolate identity confusion from latency and compression effects. These additions will directly test whether entanglement is the dominant failure mode.
Revision: yes
Referee: [Experiments] The abstract and results claim notable reductions in communication and inference latency with maintained collaborative performance, yet supply no quantitative metrics, baselines, error bars, or ablation tables. Without these, it is impossible to evaluate whether the data support the central performance claim or to compare against simpler fusion baselines.

Authors: The manuscript reports closed-loop CARLA results but presents them at a high level without the requested numerical detail. We will expand the experiments section to include explicit latency numbers (communication and inference), direct-fusion baselines, standard-error bars from multiple random seeds, and full ablation tables for ILD, CHSA, and SSKD. This will allow direct comparison and statistical evaluation of the performance claims.
Revision: yes
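The promised error bars are usually the standard error of the mean across seeds, SE = s / sqrt(n), where s is the sample standard deviation and n is the number of seeds. Below is a minimal sketch using Python's `statistics` module. The scores are placeholders, not results from the paper.

```python
import statistics
from typing import List, Tuple

def mean_and_standard_error(values: List[float]) -> Tuple[float, float]:
    """Mean and standard error (sample std / sqrt(n)) across independent seeds."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two seeds for a standard error")
    return statistics.mean(values), statistics.stdev(values) / n ** 0.5

# Hypothetical driving scores from three seeds.
mean, se = mean_and_standard_error([50.0, 52.0, 51.0])
print(round(mean, 2), round(se, 3))  # 51.0 0.577
```

With only three seeds the interval is wide, so reporting n alongside each error bar matters.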

Circularity Check

0 steps flagged

No circularity: the analysis-to-method chain does not reduce to its inputs by construction

The paper's derivation begins with an analysis of latent communication under multi-agent limits, which 'reveals agent identity confusion' from direct fusion, motivating the three components of LACO. No equations, fitted parameters renamed as predictions, or self-citation chains appear that reduce this revelation or the ILD/CHSA/SSKD modules back to the inputs by definition. The approach is explicitly training-free adaptation of existing pretrained models, with empirical validation in closed-loop CARLA experiments providing external grounding. This is a standard self-contained empirical proposal without load-bearing reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

Based solely on the abstract, the central claim rests on the domain assumption that pretrained single-agent driving models contain reusable latent representations suitable for multi-agent coordination, and that the three introduced modules can resolve identity confusion without retraining. No free parameters or invented physical entities are mentioned; the three invented entities listed below are algorithmic modules.

axioms (1)

Domain assumption: Direct fusion of latent states from multiple agents causes agent identity confusion that entangles decision representations.
This finding from the authors' analysis is stated as the motivation for designing LACO's components rather than using simple fusion.

invented entities (3)

Iterative Latent Deliberation (ILD): no independent evidence
purpose: Enable latent reasoning across agents
New module introduced to perform iterative deliberation in latent space.
Cross-Horizon Saliency Attribution (CHSA): no independent evidence
purpose: Select communication-efficient information
New technique for attributing and selecting salient features across time horizons.
Structured Semantic Knowledge Distillation (SSKD): no independent evidence
purpose: Stabilize ego-centric decision making
New distillation approach to maintain per-vehicle focus while incorporating shared latents.

pith-pipeline@v0.9.0 · 5721 in / 1512 out tokens · 77715 ms · 2026-05-22T06:16:29.060338+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

Our analysis reveals agent identity confusion, where direct fusion of latent states entangles decision representations across vehicles. ... Iterative Latent Deliberation (ILD) ... Cross-Horizon Saliency Attribution (CHSA) ... Shallow-Stream Knowledge Distillation (SSKD)
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

Closed-loop experiments in CARLA show that LACO notably reduces communication and inference latency while maintaining strong collaborative driving performance.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 9 internal anchors

[1]

IEEE Transac- tions on Intelligent Transportation Systems23(3), 1852–1864 (2020)

Arnold, E., Dianati, M., De Temple, R., Fallah, S.: Cooperative perception for 3d object detection in driving scenarios using infrastructure sensors. IEEE Transac- tions on Intelligent Transportation Systems23(3), 1852–1864 (2020)

work page 2020
[2]

In: 2019 IEEE 39th International Conference on distributed computing systems (ICDCS)

Chen, Q., Tang, S., Yang, Q., Fu, S.: Cooper: Cooperative perception for connected autonomous vehicles based on 3d point clouds. In: 2019 IEEE 39th International Conference on distributed computing systems (ICDCS). pp. 514–524. IEEE (2019)

work page 2019
[3]

Chiang, W.L., Li, Z., Lin, Z., Sheng, Y., Wu, Z., Zhang, H., Zheng, L., Zhuang, S., Zhuang, Y., Gonzalez, J.E., Stoica, I., Xing, E.P.: Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality (March 2023),https://lmsys.org/ blog/2023-03-30-vicuna/

work page 2023
[4]

arXiv preprint arXiv:2502.09980 (2025)

Chiu, H.k., Hachiuma, R., Wang, C.Y., Smith, S.F., Wang, Y.C.F., Chen, M.H.: V2v-llm:Vehicle-to-vehiclecooperativeautonomousdrivingwithmulti-modallarge language models. arXiv preprint arXiv:2502.09980 (2025)

work page arXiv 2025
[5]

arXiv preprint arXiv:2601.06123 (2026)

Dery, L.M., Yahav, Z., Prior, H., Feng, Q., Shen, J., Szlam, A.: Latent space communication via kv cache alignment. arXiv preprint arXiv:2601.06123 (2026)

work page arXiv 2026
[6]

In: Conference on robot learning

Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., Koltun, V.: Carla: An open urban driving simulator. In: Conference on robot learning. pp. 1–16. PMLR (2017)

work page 2017
[7]

Enabling Agents to Communicate Entirely in Latent Space

Du, Z., Wang, R., Bai, H., Cao, Z., Zhu, X., Cheng, Y., Zheng, B., Chen, W., Ying, H.: Enabling agents to communicate entirely in latent space. arXiv preprint arXiv:2511.09149 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[8]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Fu, H., Zhang, D., Zhao, Z., Cui, J., Liang, D., Zhang, C., Zhang, D., Xie, H., Wang, B., Bai, X.: Orion: A holistic end-to-end autonomous driving framework by vision-language instructed action generation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 24823–24834 (2025)

work page 2025
[9]

arXiv preprint arXiv:2510.03215 (2025)

Fu, T., Min, Z., Zhang, H., Yan, J., Dai, G., Ouyang, W., Wang, Y.: Cache- to-cache: Direct semantic communication between large language models. arXiv preprint arXiv:2510.03215 (2025)

work page arXiv 2025
[10]

In: Proceedings of the Computer Vision and Pattern Recog- nition Conference

Gao, X., Wu, Y., Wang, R., Liu, C., Zhou, Y., Tu, Z.: Langcoop: Collaborative driving with language. In: Proceedings of the Computer Vision and Pattern Recog- nition Conference. pp. 4226–4237 (2025)

work page 2025
[11]

arXiv preprint arXiv:2501.18616 (2025)

Gao, X., Xu, R., Li, J., Wang, Z., Fan, Z., Tu, Z.: Stamp: Scalable task and model- agnostic collaborative perception. arXiv preprint arXiv:2501.18616 (2025)

work page arXiv 2025
[12]

Large Language Model based Multi-Agents: A Survey of Progress and Challenges

Guo, T., Chen, X., Wang, Y., Chang, R., Pei, S., Chawla, N.V., Wiest, O., Zhang, X.: Large language model based multi-agents: A survey of progress and challenges. arXiv preprint arXiv:2402.01680 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[13]

IEEE Intelligent Trans- portation Systems Magazine15(6), 131–151 (2023) 16 T.Chen et al

Han, Y., Zhang, H., Li, H., Jin, Y., Lang, C., Li, Y.: Collaborative perception in autonomous driving: Methods, datasets, and challenges. IEEE Intelligent Trans- portation Systems Magazine15(6), 131–151 (2023) 16 T.Chen et al

work page 2023
[14]

Training Large Language Models to Reason in a Continuous Latent Space

Hao, S., Sukhbaatar, S., Su, D., Li, X., Hu, Z., Weston, J., Tian, Y.: Train- ing large language models to reason in a continuous latent space. arXiv preprint arXiv:2412.06769 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[15]

In: The twelfth international conference on learning rep- resentations (2023)

Hong, S., Zhuge, M., Chen, J., Zheng, X., Cheng, Y., Wang, J., Zhang, C., Wang, Z., Yau, S.K.S., Lin, Z., et al.: Metagpt: Meta programming for a multi-agent collaborative framework. In: The twelfth international conference on learning rep- resentations (2023)

work page 2023
[16]

Advances in neural information processing systems35, 4874–4886 (2022)

Hu, Y., Fang, S., Lei, Z., Zhong, Y., Chen, S.: Where2comm: Communication- efficient collaborative perception via spatial confidence maps. Advances in neural information processing systems35, 4874–4886 (2022)

work page 2022
[17]

EMMA: End-to-End Multimodal Model for Autonomous Driving

Hwang, J.J., Xu, R., Lin, H., Hung, W.C., Ji, J., Choi, K., Huang, D., He, T., Cov- ington, P., Sapp, B., et al.: Emma: End-to-end multimodal model for autonomous driving. arXiv preprint arXiv:2410.23262 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[18]

IEEE Transactions on Intelligent Vehicles (2024)

Jiang, K., Cai, X., Cui, Z., Li, A., Ren, Y., Yu, H., Yang, H., Fu, D., Wen, L., Cai, P.: Koma: Knowledge-driven multi-agent framework for autonomous driving with large language models. IEEE Transactions on Intelligent Vehicles (2024)

work page 2024
[19]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Jiang, S., Huang, Z., Qian, K., Luo, Z., Zhu, T., Zhong, Y., Tang, Y., Kong, M., Wang, Y., Jiao, S., et al.: A survey on vision-language-action models for au- tonomous driving. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4524–4536 (2025)

work page 2025
[20]

arXiv preprint arXiv:2412.11673 (2024)

Karypidis, E., Kakogeorgiou, I., Gidaris, S., Komodakis, N.: Dino-foresight: Look- ing into the future with dino. arXiv preprint arXiv:2412.11673 (2024)

work page arXiv 2024
[21]

In: Proceedings of the 29th symposium on operating systems prin- ciples

Kwon, W., Li, Z., Zhuang, S., Sheng, Y., Zheng, L., Yu, C.H., Gonzalez, J., Zhang, H., Stoica, I.: Efficient memory management for large language model serving with pagedattention. In: Proceedings of the 29th symposium on operating systems prin- ciples. pp. 611–626 (2023)

work page 2023
[22]

ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving

Li, Y., Xiong, K., Guo, X., Li, F., Yan, S., Xu, G., Zhou, L., Chen, L., Sun, H., Wang, B., et al.: Recogdrive: A reinforced cognitive framework for end-to-end autonomous driving. arXiv preprint arXiv:2506.08052 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[23]

Advances in neural information processing systems36, 34892–34916 (2023)

Liu, H., Li, C., Wu, Q., Lee, Y.J.: Visual instruction tuning. Advances in neural information processing systems36, 34892–34916 (2023)

work page 2023
[24]

In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition

Liu, Y.C., Tian, J., Glaser, N., Kira, Z.: When2com: Multi-agent perception via communication graph grouping. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition. pp. 4106–4115 (2020)

work page 2020
[25]

Renz, K., Chen, L., Arani, E., Sinavski, O.: Simlingo: Vision-only closed-loop au- tonomousdrivingwithlanguage-actionalignment.In:ProceedingsoftheComputer Vision and Pattern Recognition Conference. pp. 11993–12003 (2025)

work page 2025
[26]

arXiv preprint arXiv:2502.17416 (2025)

Saunshi, N., Dikkala, N., Li, Z., Kumar, S., Reddi, S.J.: Reasoning with latent thoughts: On the power of looped transformers. arXiv preprint arXiv:2502.17416 (2025)

work page arXiv 2025
[27]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Shao, H., Hu, Y., Wang, L., Song, G., Waslander, S.L., Liu, Y., Li, H.: Lmdrive: Closed-loop end-to-end driving with large language models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 15120– 15130 (2024)

work page 2024
[28]

In: European conference on computer vision

Sima, C., Renz, K., Chitta, K., Chen, L., Zhang, H., Xie, C., Beißwenger, J., Luo, P., Geiger, A., Li, H.: Drivelm: Driving with graph visual question answering. In: European conference on computer vision. pp. 256–274. Springer (2024)

work page 2024
[29]

Betweenunderthinkingandoverthinking: Anempiricalstudyofreasoninglengthandcorrectnessinllms.arXivpreprintarXiv:2505.00127,2025

Su, J., Healey, J., Nakov, P., Cardie, C.: Between underthinking and overthinking: An empirical study of reasoning length and correctness in llms. arXiv preprint arXiv:2505.00127 (2025) LACO 17

work page arXiv 2025
[30]

arXiv preprint arXiv:2505.16552 (2025)

Tan, W., Li, J., Ju, J., Luo, Z., Song, R., Luan, J.: Think silently, think fast: Dy- namic latent compression of llm reasoning chains. arXiv preprint arXiv:2505.16552 (2025)

work page arXiv 2025
[31]

LLaMA: Open and Efficient Foundation Language Models

Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., et al.: Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[32]

Advances in neural information pro- cessing systems30(2017)

Vaswani,A.,Shazeer,N.,Parmar,N.,Uszkoreit,J.,Jones,L.,Gomez,A.N.,Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in neural information pro- cessing systems30(2017)

work page 2017
[33]

In: European conference on computer vision

Wang,T.H.,Manivasagam,S.,Liang,M.,Yang,B.,Zeng,W.,Urtasun,R.:V2vnet: Vehicle-to-vehicle communication for joint perception and prediction. In: European conference on computer vision. pp. 605–621. Springer (2020)

work page 2020
[34]

Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail

Wang, Y., Luo, W., Bai, J., Cao, Y., Che, T., Chen, K., Chen, Y., Diamond, J., Ding, Y., Ding, W., et al.: Alpamayo-r1: Bridging reasoning and action pre- diction for generalizable autonomous driving in the long tail. arXiv preprint arXiv:2511.00088 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[35]

arXiv preprint arXiv:2503.02239 (2025)

Wu, K., Li, P., Zhou, Y., Gan, R., You, J., Cheng, Y., Zhu, J., Parker, S.T., Ran, B., Noyce, D.A., et al.: V2x-llm: Enhancing v2x integration and understanding in connected vehicle corridors. arXiv preprint arXiv:2503.02239 (2025)

work page arXiv 2025
[36]

arXiv preprint arXiv:2510.19250 (2025)

Wu, Y., Gao, X., Tau, Q., Tu, Z., Lee, D.: Background fades, foreground leads: Curriculum-guided background pruning for efficient foreground-centric collabora- tive perception. arXiv preprint arXiv:2510.19250 (2025)

work page arXiv 2025
[37]

In: Proceedings of the Computer Vision and Pattern Recognition Conference

Xu, J., Zhang, Y., Cai, Z., Huang, D.: Cosdh: communication-efficient collabora- tive perception via supply-demand awareness and intermediate-late hybridization. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 6834–6843 (2025)

work page 2025
[38]

In: European confer- ence on computer vision

Xu, R., Xiang, H., Tu, Z., Xia, X., Yang, M.H., Ma, J.: V2x-vit: Vehicle-to- everything cooperative perception with vision transformer. In: European confer- ence on computer vision. pp. 107–124. Springer (2022)

work page 2022
[39]

In: 2022 International Conference on Robotics and Automation (ICRA)

Xu, R., Xiang, H., Xia, X., Han, X., Li, J., Ma, J.: Opv2v: An open benchmark dataset and fusion pipeline for perception with vehicle-to-vehicle communication. In: 2022 International Conference on Robotics and Automation (ICRA). pp. 2583–

work page 2022
[40]

DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving

Yang, Z., Chai, Y., Jia, X., Li, Q., Shao, Y., Zhu, X., Su, H., Yan, J.: Drivemoe: Mixture-of-experts for vision-language-action model in end-to-end autonomous driving. arXiv preprint arXiv:2505.16278 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[41]

arXiv preprint arXiv:2510.12872 (2025)

Ye, H., Gao, Z., Ma, M., Wang, Q., Fu, Y., Chung, M.Y., Lin, Y., Liu, Z., Zhang, J., Zhuo, D., et al.: Kvcomm: Online cross-context kv-cache communication for efficient llm-based multi-agent systems. arXiv preprint arXiv:2510.12872 (2025)

work page arXiv 2025
[42]

Transportation Research Part C: Emerging Technologies183, 105457 (2026)

You, J., Jiang, Z., Huang, Z., Shi, H., Gan, R., Wu, K., Cheng, X., Li, X., Ran, B.: V2x-vlm: End-to-end v2x cooperative autonomous driving through large vision- language models. Transportation Research Part C: Emerging Technologies183, 105457 (2026)

[43] Zhang, Y., Sun, R., Chen, Y., Pfister, T., Zhang, R., Arik, S.: Chain of agents: Large language models collaborating on long-context tasks. Advances in Neural Information Processing Systems 37, 132208–132237 (2024)

[44] Zhu, R.J., Wang, Z., Hua, K., Zhang, T., Li, Z., Que, H., Wei, B., Wen, Z., Yin, F., Xing, H., et al.: Scaling latent reasoning via looped language models. arXiv preprint arXiv:2510.25741 (2025)

[45] Zou, J., Yang, X., Qiu, R., Li, G., Tieu, K., Lu, P., Shen, K., Tong, H., Choi, Y., He, J., et al.: Latent collaboration in multi-agent systems. arXiv preprint arXiv:2511.20639 (2025)

LACO: Adaptive Latent Communication for Collaborative Driving Supplementary Material

This supplementary material provides additional details to support the main paper...

