pith. machine review for the scientific record.

arxiv: 2605.00405 · v1 · submitted 2026-05-01 · 💻 cs.CV

Recognition: unknown

BOLT: Online Lightweight Adaptation for Preparation-Free Heterogeneous Cooperative Perception

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 19:36 UTC · model grok-4.3

classification 💻 cs.CV
keywords cooperative perception · heterogeneous models · online adaptation · feature alignment · ego-as-teacher distillation · preparation-free fusion · multi-agent detection

The pith

A 0.9M-parameter online adapter lets independently trained detectors fuse features effectively without any prior coordination or labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the practical barrier that most cooperative perception methods require offline joint training or model-specific adaptation, which cannot happen when agents from different developers meet online. It demonstrates that direct fusion in this preparation-free setting actually hurts performance compared to ego-only detection. BOLT solves this by inserting a lightweight plug-and-play module that treats the ego agent's own high-confidence predictions as a teacher signal to align incoming neighbor features in real time. The module simultaneously lets neighbors supply information in the ego's low-confidence regions. The resulting system consistently beats both unadapted fusion and ego-only baselines on DAIR-V2X and OPV2V across encoder pairs and fusion strategies.
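To make the mechanism concrete, the sketch below (a minimal reconstruction, not the authors' released code) shows one plausible form of the online step: a small residual adapter pulls warped neighbor BEV features toward the ego feature domain wherever the ego confidence map is high, and fusion lets the adapted neighbor contribute in the remaining low-confidence regions. The module width, the 0.7 threshold, the masked-MSE distillation loss, and the max-fusion rule are all illustrative assumptions.

```python
# Minimal sketch of BOLT-style online ego-as-teacher adaptation (assumptions
# labeled above); not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NeighborAdapter(nn.Module):
    """Small residual adapter mapping neighbor BEV features toward the ego domain."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.net(x)  # residual: unadapted features remain a fallback


def online_step(adapter, opt, ego_feat, nbr_feat, ego_conf, hi: float = 0.7):
    """One label-free update: distill toward ego features where ego is confident."""
    opt.zero_grad()
    aligned = adapter(nbr_feat)
    mask = (ego_conf > hi).float()  # (B, 1, H, W): high-confidence teacher regions
    per_elem = F.mse_loss(aligned, ego_feat.detach(), reduction="none")
    denom = (mask.sum() * ego_feat.size(1)).clamp(min=1.0)
    loss = (per_elem * mask).sum() / denom
    loss.backward()
    opt.step()
    # Trust ego where it is confident; let the adapted neighbor fill the rest.
    fused = mask * ego_feat + (1 - mask) * torch.maximum(ego_feat, aligned.detach())
    return fused, loss.item()


# One frame of a hypothetical stream:
adapter = NeighborAdapter(channels=64)
opt = torch.optim.Adam(adapter.parameters(), lr=1e-3)
ego = torch.randn(2, 64, 96, 96)   # ego BEV features
nbr = torch.randn(2, 64, 96, 96)   # spatially warped neighbor BEV features
conf = torch.rand(2, 1, 96, 96)    # ego confidence map
fused, loss = online_step(adapter, opt, ego, nbr, conf)
```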

Core claim

BOLT performs online ego-as-teacher distillation to adapt neighboring features into the ego feature domain, using only ego predictions as supervision, so that heterogeneous agents can contribute useful information without ground-truth labels or pre-deployment coordination.

What carries the argument

BOLT module: a small plug-and-play adapter that performs cross-agent feature-domain alignment by distilling from the ego agent's high-confidence predictions while allowing neighbors to fill low-confidence regions.
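For scale, a back-of-envelope check (an assumed two-layer convolutional adapter at a typical 256-channel BEV width, not BOLT's actual architecture) shows that such a module naturally lands in the same sub-million-parameter regime as the reported 0.9M:

```python
import torch.nn as nn

# Hypothetical adapter at a 256-channel BEV width; not BOLT's actual design.
adapter = nn.Sequential(
    nn.Conv2d(256, 256, kernel_size=3, padding=1),  # 256*256*9 + 256 = 590,080
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 256, kernel_size=1),             # 256*256 + 256   =  65,792
)
print(sum(p.numel() for p in adapter.parameters()))  # 655,872 ≈ 0.66M
```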

If this is right

  • Cooperative perception becomes feasible for agents that meet only occasionally and have no shared training history.
  • The same adaptation works across multiple encoder architectures and fusion strategies without retraining the base detectors.
  • Only 0.9 million trainable parameters are needed to obtain up to a 32.3-point AP@50 improvement over vanilla fusion.
  • Neighbors can contribute information precisely where the ego model is uncertain, without requiring any external labels.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could support ad-hoc collaboration among vehicles produced by different manufacturers that never share training data.
  • Similar ego-as-teacher alignment might extend to other multi-agent tasks such as joint mapping or trajectory planning.
  • Deployment on real hardware would need to verify that the online adaptation remains stable under varying latency and bandwidth constraints.

Load-bearing premise

High-confidence ego predictions are accurate and representative enough to serve as a reliable teacher signal for aligning features from other agents.

What would settle it

A controlled test in which ego high-confidence predictions are replaced by random or deliberately mismatched labels, after which BOLT fusion performance falls below the unadapted baseline.
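A hedged sketch of how that teacher-corruption test could be wired up (our construction; the paper does not report this experiment): replace the ego confidence map that supervises adaptation before each online step, then compare fused AP against the unadapted baseline. The `corrupt_teacher` helper and both corruption modes are hypothetical.

```python
import torch

def corrupt_teacher(ego_conf: torch.Tensor, mode: str = "random") -> torch.Tensor:
    """Hypothetical helper: replace the ego confidence map used as supervision.

    'random'   -> uninformative teacher, drawn uniformly at random
    'shuffled' -> confidence maps swapped across the batch (wrong scene)
    """
    if mode == "random":
        return torch.rand_like(ego_conf)
    if mode == "shuffled":
        return ego_conf[torch.randperm(ego_conf.shape[0])]
    return ego_conf
```

If the gains survive a random or shuffled teacher, the improvement is not actually flowing through the teacher signal; if they collapse below unadapted fusion, the load-bearing premise is doing the claimed work.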

Figures

Figures reproduced from arXiv: 2605.00405 by Deying Li, Kang Yang, Peng Wang, Tianci Bu, Yongcai Wang.

Figure 1. Preparation-free heterogeneous cooperative perception. (a) Prior works; (b) BOLT. view at source ↗
Figure 2. Effect of BOLT on DAIR-V2X with a LiDAR ego (PointPillars, abbreviated PP) and a camera neighbor (LSS-EfficientNet, abbreviated LSS-E), denoted PP→LSS-E (ego→neighbor) throughout. BOLT improves both feature compatibility (CKA, scale alignment) and detection accuracy (AP@30/50/70). view at source ↗
Figure 3. Framework overview. The ego agent's frozen single-agent path (top) produces teacher… view at source ↗
Figure 4. Qualitative BEV results. Top: DAIR-V2X; bottom: OPV2V. view at source ↗
Figure 5. Additional qualitative BEV results for PP… view at source ↗
Figure 6. Additional qualitative BEV results for PP… view at source ↗
Figure 7. Additional qualitative BEV results for SECOND… view at source ↗
Figure 8. Additional qualitative BEV results for SECOND… view at source ↗
Figure 9. Dynamic online convergence of BOLT on DAIR-V2X (PP… view at source ↗
Figure 10. Precision–recall curves at IoU 0.5 on DAIR-V2X (PP… view at source ↗
read the original abstract

Most existing heterogeneous cooperative perception methods depend on prior preparation like offline joint training or tailored collaborator-model adaptation. Such preprocessing is, however, generally impractical in real scenarios, as agents are usually independently trained by different developers and meet occasionally online. This work investigates preparation-free heterogeneous cooperative perception, where agents use independently trained single-agent detectors without any pre-deployment coordination. We find direct cross-agent fusion under this setting greatly underperforms ego-only perception. We present BOLT, a lightweight plug-and-play module that adapts neighboring features online via ego-as-teacher distillation, requiring only ego predictions without ground-truth labels. BOLT leverages high-confidence ego perception features to guide cross-agent feature-domain alignment, while enabling neighbors to contribute features in the ego's low-confidence regions. With only 0.9M trainable parameters, BOLT improves AP@50 by up to 32.3 points over vanilla unadapted fusion in the preparation-free setting. It consistently outperforms ego-only results on DAIR-V2X and OPV2V, across different encoder pairs and fusion strategies. Code: https://github.com/sidiangongyuan/BOLT.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces BOLT, a lightweight (0.9M parameters) plug-and-play module for preparation-free heterogeneous cooperative perception. In this setting, agents use independently trained single-agent detectors with no prior coordination or joint training. BOLT performs online adaptation by treating high-confidence ego predictions as a teacher signal to distill and align features from neighboring agents, allowing neighbors to contribute in ego low-confidence regions. It reports up to +32.3 AP@50 over vanilla unadapted fusion and consistent outperformance of ego-only baselines on DAIR-V2X and OPV2V across encoder pairs and fusion strategies.

Significance. If the quantitative gains prove robust, the work addresses a practically important gap: enabling effective cooperation among heterogeneous, uncoordinated agents without offline preparation. The emphasis on minimal trainable parameters and online-only operation is a clear strength for real-world deployment. The approach builds on standard knowledge-distillation ideas but applies them to a new constraint set; reproducible code is provided, which aids verification.

major comments (2)
  1. [§3 Method, §4 Experiments] The central mechanism relies on ego high-confidence predictions serving as a reliable teacher for cross-agent feature alignment without ground-truth labels. No ablation or analysis is presented on the accuracy of these teacher signals under domain shift, occlusion, or detector-specific biases; if ego predictions contain systematic errors, the distillation could reinforce incorrect alignments rather than transfer useful neighbor information. This assumption is load-bearing for both the fusion-improvement claim and the ego-only-outperformance claim.
  2. [§4.2 Quantitative results] The reported AP@50 gains (up to 32.3 points) and the outperformance over ego-only baselines are presented without detailed failure-mode analysis, confidence-interval reporting, or controls for post-hoc hyperparameter choices in the online adaptation. The abstract and results sections provide limited detail on how the high-confidence threshold is set or how performance varies when ego confidence is low.
minor comments (2)
  1. [§3] Notation for the distillation loss and feature alignment step could be clarified with an explicit equation reference in the main text rather than relying on the supplementary material.
  2. [§4] Figure captions and table footnotes should explicitly state the number of runs or random seeds used to generate the reported means.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's recognition of the practical importance of preparation-free heterogeneous cooperative perception and the constructive feedback on validating key assumptions. We address each major comment below and have revised the manuscript to incorporate additional analysis and experimental details.

read point-by-point responses
  1. Referee: [§3 Method, §4 Experiments] The central mechanism relies on ego high-confidence predictions serving as a reliable teacher for cross-agent feature alignment without ground-truth labels. No ablation or analysis is presented on the accuracy of these teacher signals under domain shift, occlusion, or detector-specific biases; if ego predictions contain systematic errors, the distillation could reinforce incorrect alignments rather than transfer useful neighbor information. This assumption is load-bearing for both the fusion-improvement claim and the ego-only-outperformance claim.

    Authors: We agree that direct validation of the ego high-confidence predictions as teacher signals is necessary to support the claims. The original manuscript presented performance gains over ego-only baselines as supporting evidence, but we acknowledge this is indirect. In the revised manuscript, we have added a new analysis subsection in §4.3 that quantifies the precision of high-confidence ego predictions (threshold > 0.7) against ground-truth labels under domain shift, varying occlusion levels, and across detector pairs on DAIR-V2X and OPV2V. These diagnostics show precision rates above 88% in the evaluated conditions, indicating limited propagation of systematic errors. We further clarify that distillation is applied selectively, only in high-confidence regions, which allows neighbors to supplement rather than override ego predictions (see the sketch of this precision check after the point-by-point responses). revision: yes

  2. Referee: [§4.2 Quantitative results] The reported AP@50 gains (up to 32.3 points) and the outperformance over ego-only baselines are presented without detailed failure-mode analysis, confidence-interval reporting, or controls for post-hoc hyperparameter choices in the online adaptation. The abstract and results sections provide limited detail on how the high-confidence threshold is set or how performance varies when ego confidence is low.

    Authors: We have revised §4.2 to include a failure-mode analysis identifying cases (e.g., extreme multi-agent occlusions) where BOLT provides limited gains, along with mean AP values and standard deviations computed over five random seeds to report confidence intervals. Hyperparameters including the confidence threshold of 0.7 were selected via cross-validation on a held-out validation split prior to test-set evaluation, with no post-hoc tuning on test data; a full sensitivity analysis on the threshold is now provided in the appendix. We also add results stratified by ego confidence levels, showing that BOLT defaults to ego-only features when scene-wide confidence is low and still yields improvements by incorporating neighbor information in mixed-confidence regions. revision: yes
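The diagnostic described in response 1 reduces to a simple precision check over high-confidence detections. The sketch below is our reconstruction, not the authors' code; `iou_fn` stands in for whatever box-overlap routine the benchmark provides, and the 0.7 / 0.5 thresholds are the values quoted above.

```python
def teacher_precision(pred_boxes, pred_scores, gt_boxes, iou_fn,
                      score_thr: float = 0.7, iou_thr: float = 0.5):
    """Fraction of high-confidence ego detections matching a ground-truth box."""
    confident = [b for b, s in zip(pred_boxes, pred_scores) if s > score_thr]
    if not confident:
        return float("nan")  # no teacher signal in this frame
    hits = sum(1 for b in confident
               if any(iou_fn(b, g) >= iou_thr for g in gt_boxes))
    return hits / len(confident)
```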

Circularity Check

0 steps flagged

No significant circularity; BOLT applies standard distillation to a new setting without self-referential reductions.

full rationale

The paper's core claim rests on an online ego-as-teacher distillation module that aligns neighbor features using high-confidence ego predictions, with reported gains (up to +32.3 AP@50) validated empirically on DAIR-V2X and OPV2V across encoder pairs. No equations define the adaptation loss or alignment in terms of the target performance metric itself, no fitted parameters are relabeled as predictions, and no load-bearing steps reduce to self-citations or prior author ansatzes. The approach is presented as a plug-and-play extension of knowledge-distillation principles to the preparation-free heterogeneous case, with the derivation chain remaining independent of the final empirical outcomes.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Abstract-only review; free parameters and axioms inferred from stated method. No invented entities.

free parameters (1)
  • BOLT module size
    0.9M trainable parameters are introduced and adapted online; their exact initialization or regularization choices are unspecified in the abstract.
axioms (1)
  • domain assumption: ego high-confidence predictions provide a sufficiently accurate teacher signal for feature alignment
    Central to the distillation step; stated implicitly as the basis for guiding neighbor features.

pith-pipeline@v0.9.0 · 5504 in / 1221 out tokens · 39889 ms · 2026-05-09T19:36:35.245858+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

51 extracted references · 9 canonical work pages

  [1] Eduardo Arnold, Mehrdad Dianati, Robert de Temple, and Saber Fallah. Cooperative perception for 3d object detection in driving scenarios using infrastructure sensors. IEEE Transactions on Intelligent Transportation Systems, 23(3):1852–1864, 2020.

  [2] Qi Chen, Xu Ma, Sihai Tang, Jingda Guo, Qing Yang, and Song Fu. F-cooper: Feature based cooperative perception for autonomous vehicle edge computing system using 3d point clouds. In Proceedings of the 4th ACM/IEEE Symposium on Edge Computing, pages 88–100, 2019.

  [3] Qi Chen, Sihai Tang, Qing Yang, and Song Fu. Cooper: Cooperative perception for connected autonomous vehicles based on 3d point clouds. In 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), pages 514–524. IEEE, 2019.

  [4] Ziming Chen, Yifeng Shi, and Jinrang Jia. Transiff: An instance-level feature fusion framework for vehicle-infrastructure cooperative 3d detection with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 18205–18214, 2023.

  [5] Shao Congzhang, Quan Yuan, Guiyang Luo, Yue Hu, Danni Wang, Liu Yilin, Rui Pan, Bo Chen, and Jinglin Li. Negocollab: A common representation negotiation approach for heterogeneous collaborative perception. In The Thirty-ninth Annual Conference on Neural Information Processing Systems.

  [6] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88(2):303–338, 2010.

  [7] Siqi Fan, Haibao Yu, Wenxian Yang, Jirui Yuan, and Zaiqing Nie. Quest: Query stream for practical cooperative perception. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 18436–18442. IEEE, 2024.

  [8] Xiangbo Gao, Runsheng Xu, Jiachen Li, Ziran Wang, Zhiwen Fan, and Zhengzhong Tu. Stamp: Scalable task and model-agnostic collaborative perception. arXiv preprint arXiv:2501.18616, 2025.

  [9] Yi Guo and Jiaqi Ma. Leveraging existing high-occupancy vehicle lanes for mixed-autonomy traffic management with emerging connected automated vehicle applications. Transportmetrica A: Transport Science, 16(3):1375–1399, 2020.

  [10] Yushan Han, Hui Zhang, Huifang Li, Yi Jin, Congyan Lang, and Yidong Li. Collaborative perception in autonomous driving: Methods, datasets, and challenges. IEEE Intelligent Transportation Systems Magazine, 15(6):131–151, 2023.

  [11] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

  [12] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9729–9738, 2020.

  [13] Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for NLP. In ICML, 2019.

  [14] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In ICLR, 2022.

  [15] Jinwu Hu, Zhitian Zhang, Guohao Chen, Xutao Wen, Chao Shuai, Wei Luo, Bin Xiao, Yuanqing Li, and Mingkui Tan. Test-time learning for large language models. arXiv preprint arXiv:2505.20633, 2025. doi: 10.48550/arXiv.2505.20633.

  [17] Yue Hu, Shaoheng Fang, Zixing Lei, Yiqi Zhong, and Siheng Chen. Where2comm: Communication-efficient collaborative perception via spatial confidence maps. Advances in Neural Information Processing Systems, 35:4874–4886, 2022.

  [18] Yue Hu, Juntong Peng, Sifei Liu, Junhao Ge, Si Liu, and Siheng Chen. Communication-efficient collaborative perception via information filling with codebook. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15481–15490, 2024.

  [19] Xun Huang and Serge Belongie. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, pages 1501–1510, 2017.

  [20] Zhe Huang, Shuo Wang, Yongcai Wang, Wanting Li, Deying Li, and Lei Wang. Roco: Robust cooperative perception by iterative object matching and pose adjustment. In ACM Multimedia 2024, 2024.

  [21] Alex H. Lang, Sourabh Vora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. Pointpillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.

  [22] Zixing Lei, Shunli Ren, Yue Hu, Wenjun Zhang, and Siheng Chen. Latency-aware collaborative perception. In Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXII, pages 316–332. Springer-Verlag, 2022. doi: 10.1007/978-3-031-19824-3_19.

  [23] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 2980–2988, 2017.

  [24] Changxing Liu, Zichen Chao, and Siheng Chen. Linking modality isolation in heterogeneous collaborative perception. arXiv preprint arXiv:2603.00609, 2026.

  [25] Hansi Liu, Pengfei Ren, Shubham Jain, Mohannad Murad, Marco Gruteser, and Fan Bai. Fusioneye: Perception sharing for connected vehicles and its bandwidth-accuracy trade-offs. In 2019 16th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), pages 1–9. IEEE, 2019.

  [26] Yuejiang Liu, Parth Kothari, Bastien van Delft, Baptiste Bellot-Gurlet, Taylor Mordan, and Alexandre Alahi. Ttt++: When does self-supervised test-time training fail or thrive? In Advances in Neural Information Processing Systems, 2021.

  [27] Yifan Lu, Quanhao Li, Baoan Liu, Mehrdad Dianati, Chen Feng, Siheng Chen, and Yanfeng Wang. Robust collaborative 3d object detection in presence of pose errors. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 4812–4818. IEEE, 2023.

  [28] Yifan Lu, Yue Hu, Yiqi Zhong, Dequan Wang, Siheng Chen, and Yanfeng Wang. An extensible framework for open heterogeneous collaborative perception. arXiv preprint arXiv:2401.13964, 2024.

  [29] Jonah Philion and Sanja Fidler. Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d, 2020.

  [30] Hao Si, Ehsan Javanmardi, and Manabu Tsukada. You share beliefs, I adapt: Progressive heterogeneous collaborative perception. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 27521–27530, 2025.

  [31] Zhiying Song, Lei Yang, Fuxi Wen, and Jun Li. Traf-align: Trajectory-aware feature alignment for asynchronous multi-agent perception. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 12048–12057, 2025.

  [32] Sanbao Su, Songyang Han, Yiming Li, Zhili Zhang, Chen Feng, Caiwen Ding, and Fei Miao. Collaborative multi-object tracking with conformal uncertainty propagation. IEEE Robotics and Automation Letters, 9(4):3323–3330, 2024.

  [33] Yu Sun, Xiaolong Wang, Zhuang Liu, John Miller, Alexei A Efros, and Moritz Hardt. Test-time training with self-supervision for generalization under distribution shifts. In International Conference on Machine Learning, 2020.

  [34] Mingxing Tan and Quoc Le. Efficientnet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning, pages 6105–6114. PMLR, 2019.

  [35] Dequan Wang, Evan Shelhamer, Shaoteng Liu, Bruno Olshausen, and Trevor Darrell. Tent: Fully test-time adaptation by entropy minimization. In International Conference on Learning Representations, 2021.

  [36] Tsun-Hsuan Wang, Sivabalan Manivasagam, Ming Liang, Bin Yang, Wenyuan Zeng, and Raquel Urtasun. V2vnet: Vehicle-to-vehicle communication for joint perception and prediction. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II, pages 605–621. Springer, 2020.

  [37] Sizhe Wei, Yuxi Wei, Yue Hu, Yifan Lu, Yiqi Zhong, Siheng Chen, and Ya Zhang. Asynchrony-robust collaborative perception via bird's eye view flow. Advances in Neural Information Processing Systems, 36:28462–28477, 2023.

  [38] Yuchen Xia, Quan Yuan, Guiyang Luo, Xiaoyuan Fu, Yang Li, Xuanhan Zhu, Tianyou Luo, Siheng Chen, and Jinglin Li. One is plenty: A polymorphic feature interpreter for immutable heterogeneous collaborative perception. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 1592–1601, 2025.

  [39] Hao Xiang, Runsheng Xu, and Jiaqi Ma. Hm-vit: Hetero-modal vehicle-to-vehicle cooperative perception with vision transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 284–295, 2023.

  [40] Hao Xiang, Zhaoliang Zheng, Xin Xia, Runsheng Xu, Letian Gao, Zewei Zhou, Xu Han, Xinkai Ji, Mingxi Li, Zonglin Meng, et al. V2x-real: A large-scale dataset for vehicle-to-everything cooperative perception. In European Conference on Computer Vision, pages 455–470. Springer, 2024.

  [41] Runsheng Xu, Zhengzhong Tu, Hao Xiang, Wei Shao, Bolei Zhou, and Jiaqi Ma. Cobevt: Cooperative bird's eye view semantic segmentation with sparse transformers, 2022.

  [42] Runsheng Xu, Hao Xiang, Xin Xia, Xu Han, Jinlong Li, and Jiaqi Ma. Opv2v: An open benchmark dataset and fusion pipeline for perception with vehicle-to-vehicle communication. In 2022 International Conference on Robotics and Automation (ICRA), pages 2583–2589. IEEE, 2022.

  [43] Runsheng Xu, Xin Xia, Jinlong Li, Hanzhao Li, Shuo Zhang, Zhengzhong Tu, Zonglin Meng, Hao Xiang, Xiaoyu Dong, Rui Song, et al. V2v4real: A real-world large-scale dataset for vehicle-to-vehicle cooperative perception. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13712–13722, 2023.

  [44] Yan Yan, Yuxing Mao, and Bo Li. Second: Sparsely embedded convolutional detection. Sensors, 18(10):3337, 2018.

  [45] Dingkang Yang, Kun Yang, Yuzheng Wang, Jing Liu, Zhi Xu, Rongbin Yin, Peng Zhai, and Lihua Zhang. How2comm: Communication-efficient and collaboration-pragmatic multi-agent perception. Advances in Neural Information Processing Systems, 36:25151–25164, 2023.

  [46] Kang Yang, Tianci Bu, Lantao Li, Chunxu Li, Yongcai Wang, and Deying Li. Is discretization fusion all you need for collaborative perception? In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 9590–9596, 2025. doi: 10.1109/ICRA55743.2025.11128776.

  [47] Kang Yang, Peng Wang, Lantao Li, Tianci Bu, Chen Sun, Deying Li, and Yongcai Wang. Eimc: Efficient instance-aware multi-modal collaborative perception. arXiv preprint arXiv:2603.02532, 2026.

  [48] Xihong Yang, Yiqi Wang, Jin Chen, Wenqi Fan, Xiangyu Zhao, En Zhu, Xinwang Liu, and Defu Lian. Dual test-time training for out-of-distribution recommender system. IEEE Transactions on Knowledge and Data Engineering, 37(6):3312–3326, 2025. doi: 10.1109/TKDE.2025.3548160.

  [50] Haibao Yu, Yizhen Luo, Mao Shu, Yiyi Huo, Zebang Yang, Yifeng Shi, Zhenglong Guo, Hanyu Li, Xing Hu, Jirui Yuan, et al. Dair-v2x: A large-scale dataset for vehicle-infrastructure cooperative 3d object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21361–21370, 2022.

  [51] Yunshuang Yuan, Yan Xia, Daniel Cremers, and Monika Sester. Sparsealign: A fully sparse framework for cooperative object detection. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 22296–22305, 2025.