
arxiv: 2605.22504 · v1 · pith:SPCBXVL5new · submitted 2026-05-21 · 💻 cs.AI · cs.CV

LACO: Adaptive Latent Communication for Collaborative Driving

Pith reviewed 2026-05-22 06:16 UTC · model grok-4.3

classification 💻 cs.AI cs.CV
keywords collaborative driving · latent communication · multi-agent systems · autonomous vehicles · CARLA · knowledge distillation · latency reduction · partial observability

The pith

LACO enables low-latency collaborative driving by communicating refined latent states instead of language tokens.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper seeks to establish that pretrained driving models can be adapted for multi-vehicle collaboration through latent communication rather than high-latency language exchanges. The key insight is that directly fusing latent states from multiple agents leads to confusion over which vehicle owns which decision. LACO counters this with three mechanisms: iterative deliberation to build shared understanding, saliency attribution to select only the most important information to transmit, and knowledge distillation to anchor decisions to the ego vehicle. A reader would care because safer, more efficient autonomous driving depends on quick coordination among cars with limited views of the road. Closed-loop experiments in the CARLA simulator indicate reduced communication and inference latency while collaborative driving performance is maintained.

Core claim

Motivated by the discovery that direct fusion of latent states entangles decision representations across vehicles in multi-agent settings, LACO provides a training-free paradigm for latent communication in collaborative driving. It incorporates Iterative Latent Deliberation for latent reasoning, Cross-Horizon Saliency Attribution for efficient information selection, and Structured Semantic Knowledge Distillation to stabilize ego-centric decision making, leading to notable reductions in communication and inference latency in closed-loop CARLA experiments without compromising collaborative driving performance.
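The idea of sending only salient information can be pictured with a small sketch. This is not the paper's Cross-Horizon Saliency Attribution; it is a generic stand-in that keeps the smallest set of tokens whose attention mass reaches a target fraction. The function name `select_salient_tokens`, the `target_fraction` parameter, and the sample values are all assumptions made for illustration.

```python
def select_salient_tokens(attention_mass, target_fraction=0.9):
    """Return indices of the smallest token set whose mass reaches target_fraction.

    attention_mass: one non-negative score per token (hypothetical input).
    """
    total = sum(attention_mass)
    if total <= 0:
        return []
    # Rank tokens from most to least attended.
    ranked = sorted(range(len(attention_mass)),
                    key=lambda i: attention_mass[i], reverse=True)
    selected, covered = [], 0.0
    for idx in ranked:
        selected.append(idx)
        covered += attention_mass[idx]
        if covered / total >= target_fraction:
            break
    # Restore the original token order for transmission.
    return sorted(selected)


# Illustrative per-token attention mass for ten tokens (made-up values).
mass = [0.30, 0.02, 0.25, 0.03, 0.20, 0.04, 0.05, 0.05, 0.03, 0.03]
kept = select_salient_tokens(mass, target_fraction=0.7)
print(kept, len(kept) / len(mass))
```

With a greedy rule like this, the message size follows the shape of the attention distribution rather than a fixed token budget, which is one plausible way to trade bandwidth against retained information.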

What carries the argument

The argument rests on agent identity confusion: direct fusion of latent states entangles decision representations across vehicles, and the paper identifies this as the core problem that Iterative Latent Deliberation, Cross-Horizon Saliency Attribution, and Structured Semantic Knowledge Distillation are designed to solve.
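A toy calculation may help show why naive fusion can pull an ego vehicle's attention toward another agent's states. The sketch below is not the authors' analysis; it only concatenates two key sets and measures how much softmax attention lands on the collaborator's keys. All names (`collaborator_attention_share`, the 2-D vectors) are hypothetical and chosen to make the effect visible.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def collaborator_attention_share(query, ego_keys, collab_keys):
    """Fraction of the ego query's attention that falls on collaborator keys
    when both key sets are concatenated into one context."""
    scores = [dot(query, k) for k in ego_keys + collab_keys]
    weights = softmax(scores)
    return sum(weights[len(ego_keys):])


# Made-up vectors: the collaborator keys happen to align strongly with the query.
query = [1.0, 0.0]
ego_keys = [[1.0, 0.0], [0.5, 0.5]]
collab_keys = [[2.0, 0.0], [2.0, 0.0], [2.0, 0.0]]
share = collaborator_attention_share(query, ego_keys, collab_keys)
print(round(share, 3))
```

In this contrived setting most of the attention mass moves to the collaborator's tokens, which is the kind of imbalance the summary describes as the ego vehicle over-attending to collaborator latent states.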

If this is right

  • Collaborative vehicles achieve coordination with substantially lower communication overhead.
  • Real-time inference becomes feasible for safety-critical decisions under partial observability.
  • Pretrained single-agent models can be deployed in multi-agent scenarios without retraining.
  • Driving performance metrics such as safety and efficiency stay comparable to language-based approaches.
  • Scalability benefits arise as each vehicle communicates only selected salient latent information.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This latent communication strategy might generalize to other multi-agent perception-action tasks like robot teams navigating shared spaces.
  • Future work could explore combining LACO with large language models for hybrid latent-language exchanges in edge cases.
  • Validating the approach on physical testbeds with real V2V hardware would test assumptions about communication channels.
  • The saliency selection could reduce data transmission costs in bandwidth-constrained networks beyond driving.

Load-bearing premise

Direct fusion of latent states from different vehicles creates an unavoidable entanglement of decision representations known as agent identity confusion.

What would settle it

A controlled experiment in CARLA where a simple direct fusion baseline for latent states matches LACO's latency reduction and collaborative performance levels would challenge the need for the proposed ILD, CHSA, and SSKD components.

Figures

Figures reproduced from arXiv: 2605.22504 by Dongman Lee, Tianhao Chen, Yuheng Wu.

Figure 1. Communication interfaces for multi-agent VLA collaboration. Top: …
Figure 2. (a) Agent Identity Confusion: Under full-depth fusion, the ego vehicle over-attends to collaborator latent states, resulting in policy hijacking and erroneous maneuvers. (b) Attention Sparsity: A small fraction (≈ 30%) of tokens captures the majority of attention mass, revealing substantial spatial redundancy. (c) Attention Entropy Analysis: Layer-wise entropy exhibits a global-to-local transition, followe…
Figure 3. Overview of LACO. Each agent performs Iterative Latent Deliberation (ILD) to embed spatial state and intent into its KV cache, selects decision-critical tokens via Cross-Horizon Saliency Attribution (CHSA), and transmits only early-layer representations through Shallow-Stream Knowledge Distillation (SSKD). The receiver integrates the message with its own reasoning stack for collaborated inference.
Figure 4. Qualitative Visualization in closed-loop settings. (Left) …
Figure 5. Inference Speed-up Analysis. LACO bypasses the autoregressive linguistic bottleneck by reusing prefilled latent states. [Plot: Driving Score (y-axis) against Latent Steps (x-axis) for ORION, SimLingo, and LMDrive.]
Figure 1. Overview of the Town05 evaluation map used in the LangCoop benchmark. The map contains multiple predefined routes and diverse urban road structures including intersections, roundabouts, and multi-lane roads, providing challenging scenarios for evaluating collaborative autonomous driving systems.
Route Completion. The Route Completion (RC) measures the fraction of the predefined route successfully traversed…
Figure 2.
  Method     Camera Configuration       LiDAR
  Orion      6 surround-view cameras    –
  SimLingo   1 front-view camera        –
  LMDrive    4 cameras (multi-view)     1 LiDAR
Figure 2. Example observations from the six surround-view cameras used in the Orion system. The cameras provide a 360-degree perception coverage around the ego vehicle.
Language-based Communication. Following prior work, we adopt a language-based collaboration paradigm where each agent first performs an autoregressive reasoning process similar to chain-of-thought (CoT). During this stage, the model analyzes the dri…
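The layer-wise attention entropy named in the caption of the main paper's Figure 2 (c) is a standard quantity, and a minimal sketch of how it could be computed is given here. It assumes each attention row is already a normalized probability distribution; the function names and the sample rows are hypothetical and do not come from the paper.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention distribution.

    Zero-weight entries contribute nothing, by the usual 0 * log 0 = 0 convention.
    """
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def layerwise_entropy(attention_by_layer):
    """Mean entropy per layer; attention_by_layer[l] is a list of distributions."""
    return [
        sum(attention_entropy(row) for row in layer) / len(layer)
        for layer in attention_by_layer
    ]


# Made-up attention rows for two layers of a four-token context.
layers = [
    [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]],  # diffuse attention
    [[0.97, 0.01, 0.01, 0.01], [1.0, 0.0, 0.0, 0.0]],      # peaked attention
]
entropies = layerwise_entropy(layers)
print(entropies)
```

High entropy corresponds to attention spread broadly across tokens and low entropy to attention concentrated on a few, so a falling curve across layers would be consistent with the global-to-local transition the caption mentions.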
Original abstract

Collaborative driving aims to improve safety and efficiency by enabling connected vehicles to coordinate under partial observability. Recent approaches have evolved from sharing visual features for perception to exchanging language-based reasoning through foundation models for behavioral coordination. Though communicating in language provides intuitive information, it introduces two challenges: high latency caused by autoregressive decoding and information loss caused by compressing rich internal representations into discrete tokens. To address these challenges, we analyze latent communication in collaborative driving under inherent limitations of multi-agent settings. Our analysis reveals agent identity confusion, where direct fusion of latent states entangles decision representations across vehicles. Motivated by this, we propose LACO, a training-free LAtent COmmunication paradigm that seamlessly adapts pretrained driving models to collaborative settings. LACO introduces Iterative Latent Deliberation (ILD) for latent reasoning, Cross-Horizon Saliency Attribution (CHSA) for communication-efficient information selection, and Structured Semantic Knowledge Distillation (SSKD) to stabilize ego-centric decision making. Closed-loop experiments in CARLA show that LACO notably reduces communication and inference latency while maintaining strong collaborative driving performance.
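The latency argument in the abstract can be made concrete with a first-order cost model. The sketch below is an assumption, not a model from the paper: it treats language communication as a prefill followed by one sequential decoding step per generated token, and latent communication as a prefill followed by a fixed number of latent steps. Every function name, parameter, and number is a placeholder rather than a measurement.

```python
def language_latency_ms(prefill_ms, tokens_generated, ms_per_token, transmit_ms):
    """Prefill, then one sequential decoding step per generated token, then send."""
    return prefill_ms + tokens_generated * ms_per_token + transmit_ms


def latent_latency_ms(prefill_ms, latent_steps, ms_per_step, transmit_ms):
    """Prefill, then a fixed number of latent steps, then send the latent payload."""
    return prefill_ms + latent_steps * ms_per_step + transmit_ms


# Placeholder inputs chosen only to show the shape of the trade-off.
lang = language_latency_ms(prefill_ms=100.0, tokens_generated=200,
                           ms_per_token=20.0, transmit_ms=5.0)
lat = latent_latency_ms(prefill_ms=100.0, latent_steps=20,
                        ms_per_step=20.0, transmit_ms=15.0)
print(lang, lat, lang / lat)
```

Under this model the language path grows linearly with message length, while the latent path pays a larger transmission cost but a bounded compute cost. Whether that holds in practice depends on payload size and channel bandwidth, which the sketch does not capture.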

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes LACO, a training-free latent communication paradigm for collaborative driving that adapts pretrained models via Iterative Latent Deliberation (ILD), Cross-Horizon Saliency Attribution (CHSA), and Structured Semantic Knowledge Distillation (SSKD). It claims that direct latent fusion causes agent identity confusion by entangling decision representations, and that the proposed components reduce communication and inference latency while preserving strong closed-loop performance in CARLA experiments.

Significance. If the quantitative results and ablations hold, LACO could offer a practical way to enable low-latency multi-agent coordination in autonomous driving without the overhead of language-based communication or full retraining, building on existing pretrained models.

major comments (2)
  1. [Introduction / Analysis] The core motivation—that direct fusion of latent states produces agent identity confusion entangling decision representations across vehicles—is presented without isolating evidence. No visualizations, attribution maps, or controlled ablation (e.g., direct fusion vs. ILD+CHSA+SSKD) demonstrates this entanglement as the dominant failure mode rather than latency, compression artifacts, or other multi-agent dynamics. This assumption underpins the design of all three components and requires a dedicated subsection with supporting experiments.
  2. [Experiments] The abstract and results claim notable reductions in communication and inference latency with maintained collaborative performance, yet supply no quantitative metrics, baselines, error bars, or ablation tables. Without these, it is impossible to evaluate whether the data support the central performance claim or to compare against simpler fusion baselines.
minor comments (2)
  1. [Method] Clarify the exact pretrained models being adapted and any assumptions about their latent spaces in the method description.
  2. [Method] Add explicit definitions or pseudocode for ILD, CHSA, and SSKD to improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. We agree that the manuscript would benefit from stronger isolating evidence for the agent identity confusion claim and from more detailed quantitative experimental reporting. We address each major comment below and will incorporate the requested additions in the revised version.

  1. Referee: [Introduction / Analysis] The core motivation—that direct fusion of latent states produces agent identity confusion entangling decision representations across vehicles—is presented without isolating evidence. No visualizations, attribution maps, or controlled ablation (e.g., direct fusion vs. ILD+CHSA+SSKD) demonstrates this entanglement as the dominant failure mode rather than latency, compression artifacts, or other multi-agent dynamics. This assumption underpins the design of all three components and requires a dedicated subsection with supporting experiments.

    Authors: We acknowledge that the current analysis section motivates agent identity confusion primarily through qualitative description rather than through dedicated isolating experiments. In the revision we will add a new subsection that includes (i) saliency attribution maps contrasting direct latent fusion with the ILD+CHSA pipeline and (ii) controlled ablations that isolate identity confusion from latency and compression effects. These additions will directly test whether entanglement is the dominant failure mode. revision: yes

  2. Referee: [Experiments] The abstract and results claim notable reductions in communication and inference latency with maintained collaborative performance, yet supply no quantitative metrics, baselines, error bars, or ablation tables. Without these, it is impossible to evaluate whether the data support the central performance claim or to compare against simpler fusion baselines.

    Authors: The manuscript reports closed-loop CARLA results but presents them at a high level without the requested numerical detail. We will expand the experiments section to include explicit latency numbers (communication and inference), direct-fusion baselines, standard-error bars from multiple random seeds, and full ablation tables for ILD, CHSA, and SSKD. This will allow direct comparison and statistical evaluation of the performance claims. revision: yes
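The standard-error bars promised in the rebuttal are a routine computation, sketched here for reference. This is a generic illustration and not the authors' evaluation code; the helper name `mean_and_standard_error` and the per-seed scores are placeholders.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for per-seed results."""
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Placeholder driving scores from five hypothetical seeds, not reported results.
scores = [30.0, 32.0, 31.0, 33.0, 34.0]
mean, sem = mean_and_standard_error(scores)
print(f"{mean:.2f} ± {sem:.2f}")
```

Reporting the mean with its standard error for each ablation row would let a reader judge whether differences between direct fusion and the full pipeline exceed seed-to-seed variation.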

Circularity Check

0 steps flagged

No circularity: analysis-to-method chain is independent of inputs by construction


The paper's derivation begins with an analysis of latent communication under multi-agent limits, which 'reveals agent identity confusion' from direct fusion, motivating the three components of LACO. No equations, fitted parameters renamed as predictions, or self-citation chains appear that reduce this revelation or the ILD/CHSA/SSKD modules back to the inputs by definition. The approach is explicitly training-free adaptation of existing pretrained models, with empirical validation in closed-loop CARLA experiments providing external grounding. This is a standard self-contained empirical proposal without load-bearing reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

Based solely on the abstract, the central claim rests on the domain assumption that pretrained single-agent driving models contain reusable latent representations suitable for multi-agent coordination and that the three introduced modules can resolve identity confusion without retraining. No free parameters or invented physical entities are mentioned.

axioms (1)
  • domain assumption · Direct fusion of latent states from multiple agents causes agent identity confusion that entangles decision representations.
    This finding from the authors' analysis is stated as the motivation for designing LACO's components rather than using simple fusion.
invented entities (3)
  • Iterative Latent Deliberation (ILD) · no independent evidence
    purpose: Enable latent reasoning across agents
    New module introduced to perform iterative deliberation in latent space.
  • Cross-Horizon Saliency Attribution (CHSA) · no independent evidence
    purpose: Select communication-efficient information
    New technique for attributing and selecting salient features across time horizons.
  • Structured Semantic Knowledge Distillation (SSKD) · no independent evidence
    purpose: Stabilize ego-centric decision making
    New distillation approach to maintain per-vehicle focus while incorporating shared latents.



