BOLT: Online Lightweight Adaptation for Preparation-Free Heterogeneous Cooperative Perception
Pith reviewed 2026-05-09 19:36 UTC · model grok-4.3
The pith
A 0.9M-parameter online adapter lets independently trained detectors fuse features effectively without any prior coordination or labels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BOLT performs online ego-as-teacher distillation to adapt neighboring features into the ego feature domain, using only ego predictions as supervision, so that heterogeneous agents can contribute useful information without ground-truth labels or pre-deployment coordination.
What carries the argument
BOLT module: a small plug-and-play adapter that performs cross-agent feature-domain alignment by distilling from the ego agent's high-confidence predictions while allowing neighbors to fill low-confidence regions.
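For intuition, here is a minimal sketch of what such a confidence-masked, ego-as-teacher objective could look like in PyTorch. All names (BOLTAdapter, ego_teacher_loss, tau) and shapes are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn as nn


class BOLTAdapter(nn.Module):
    """Tiny adapter mapping neighbor BEV features toward the ego feature domain."""

    def __init__(self, channels: int = 64):
        super().__init__()
        # A small conv stack, in keeping with the ~0.9M-parameter budget.
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, neighbor_feat: torch.Tensor) -> torch.Tensor:
        return self.net(neighbor_feat)


def ego_teacher_loss(ego_feat, neighbor_feat, ego_conf, adapter, tau=0.7):
    """Pull adapted neighbor features toward ego features only where the ego
    detector is confident (conf > tau); low-confidence regions are left free
    so the neighbor can contribute information there instead."""
    adapted = adapter(neighbor_feat)              # (B, C, H, W)
    mask = (ego_conf > tau).float().unsqueeze(1)  # (B, 1, H, W)
    sq_err = (adapted - ego_feat.detach()) ** 2   # ego features act as teacher
    return (mask * sq_err).sum() / mask.sum().clamp(min=1.0)
```

The detach on the ego features is the key design choice: the ego branch is frozen teacher, and only the lightweight adapter receives gradients during online adaptation.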
If this is right
- Cooperative perception becomes feasible for agents that meet only occasionally and have no shared training history.
- The same adaptation works across multiple encoder architectures and fusion strategies without retraining the base detectors.
- Only 0.9 million trainable parameters are needed for a gain of up to 32.3 AP@50 points over vanilla unadapted fusion (see the parameter-count sketch after this list).
- Neighbors can contribute information precisely where the ego model is uncertain, without requiring any external labels.
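Continuing the BOLTAdapter sketch above, a quick check shows an adapter of this shape can land near the stated budget; the channel width here is a hypothetical choice, not the paper's configuration.

```python
# Two 3x3 convs at 224 channels give roughly 0.90M trainable parameters.
adapter = BOLTAdapter(channels=224)
n_params = sum(p.numel() for p in adapter.parameters())
print(f"{n_params / 1e6:.2f}M trainable parameters")  # ~0.90M
```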
Where Pith is reading between the lines
- The approach could support ad-hoc collaboration among vehicles produced by different manufacturers that never share training data.
- Similar ego-as-teacher alignment might extend to other multi-agent tasks such as joint mapping or trajectory planning.
- Deployment on real hardware would need to verify that the online adaptation remains stable under varying latency and bandwidth constraints.
Load-bearing premise
High-confidence ego predictions are accurate and representative enough to serve as a reliable teacher signal for aligning features from other agents.
What would settle it
A controlled test in which ego high-confidence predictions are replaced by random or deliberately mismatched labels, after which BOLT fusion performance falls below the unadapted baseline.
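One way such a falsification test could be scripted, swapping corrupted teacher variants in for the real ego confidences; run_bolt_fusion and evaluate_ap50 are assumed harness hooks, not part of the released code.

```python
# Hypothetical teacher-corruption control: if BOLT's premise holds, fusing with
# a corrupted teacher should fall below the unadapted baseline.
import torch


def corrupt_teacher(ego_conf: torch.Tensor, mode: str) -> torch.Tensor:
    if mode == "intact":
        return ego_conf
    if mode == "random":
        return torch.rand_like(ego_conf)  # random confidence map
    if mode == "inverted":
        return 1.0 - ego_conf             # deliberately mismatched teacher
    raise ValueError(f"unknown mode: {mode}")


# Assumed harness hooks (illustrative only):
# for mode in ("intact", "random", "inverted"):
#     ap50 = evaluate_ap50(run_bolt_fusion(teacher=corrupt_teacher, mode=mode))
#     print(mode, ap50)
```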
Original abstract
Most existing heterogeneous cooperative perception methods depend on prior preparation like offline joint training or tailored collaborator-model adaptation. Such preprocessing is, however, generally impractical in real scenarios, as agents are usually independently trained by different developers and meet occasionally online. This work investigates preparation-free heterogeneous cooperative perception, where agents use independently trained single-agent detectors without any pre-deployment coordination. We find direct cross-agent fusion under this setting greatly underperforms ego-only perception. We present BOLT, a lightweight plug-and-play module that adapts neighboring features online via ego-as-teacher distillation, requiring only ego predictions without ground-truth labels. BOLT leverages high-confidence ego perception features to guide cross-agent feature-domain alignment, while enabling neighbors to contribute features in the ego's low-confidence regions. With only 0.9M trainable parameters, BOLT improves AP@50 by up to 32.3 points over vanilla unadapted fusion in the preparation-free setting. It consistently outperforms ego-only results on DAIR-V2X and OPV2V, across different encoder pairs and fusion strategies. Code: https://github.com/sidiangongyuan/BOLT.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces BOLT, a lightweight (0.9M parameters) plug-and-play module for preparation-free heterogeneous cooperative perception. In this setting, agents use independently trained single-agent detectors with no prior coordination or joint training. BOLT performs online adaptation by treating high-confidence ego predictions as a teacher signal to distill and align features from neighboring agents, allowing neighbors to contribute in ego low-confidence regions. It reports up to +32.3 AP@50 over vanilla unadapted fusion and consistent outperformance of ego-only baselines on DAIR-V2X and OPV2V across encoder pairs and fusion strategies.
Significance. If the quantitative gains prove robust, the work addresses a practically important gap: enabling effective cooperation among heterogeneous, uncoordinated agents without offline preparation. The emphasis on minimal trainable parameters and online-only operation is a clear strength for real-world deployment. The approach builds on standard knowledge-distillation ideas but applies them to a new constraint set; reproducible code is provided, which aids verification.
major comments (2)
- [§3 Method, §4 Experiments] The central mechanism relies on ego high-confidence predictions serving as a reliable teacher for cross-agent feature alignment without ground-truth labels. No ablation or analysis is presented on the accuracy of these teacher signals under domain shift, occlusion, or detector-specific biases; if ego predictions contain systematic errors, the distillation could reinforce incorrect alignments rather than transfer useful neighbor information. This assumption is load-bearing for both the fusion improvement and the ego-only outperformance claims.
- [§4.2 Quantitative results] The reported AP@50 gains (up to 32.3 points) and outperformance over ego-only are presented without detailed failure-mode analysis, confidence-interval reporting, or controls for post-hoc hyperparameter choices in the online adaptation. The abstract and results sections provide limited experimental details on how high-confidence thresholds are set or how performance varies when ego confidence is low.
minor comments (2)
- [§3] Notation for the distillation loss and feature alignment step could be clarified with an explicit equation reference in the main text rather than relying on the supplementary material.
- [§4] Figure captions and table footnotes should explicitly state the number of runs or random seeds used to generate the reported means.
Simulated Author's Rebuttal
We appreciate the referee's recognition of the practical importance of preparation-free heterogeneous cooperative perception and the constructive feedback on validating key assumptions. We address each major comment below and have revised the manuscript to incorporate additional analysis and experimental details.
Point-by-point responses
- Referee: [§3 Method, §4 Experiments] The central mechanism relies on ego high-confidence predictions serving as a reliable teacher for cross-agent feature alignment without ground-truth labels. No ablation or analysis is presented on the accuracy of these teacher signals under domain shift, occlusion, or detector-specific biases; if ego predictions contain systematic errors, the distillation could reinforce incorrect alignments rather than transfer useful neighbor information. This assumption is load-bearing for both the fusion improvement and the ego-only outperformance claims.
Authors: We agree that direct validation of the ego high-confidence predictions as teacher signals is necessary to support the claims. The original manuscript presented performance gains over ego-only baselines as supporting evidence, but we acknowledge this is indirect. In the revised manuscript, we have added a new analysis subsection in §4.3 that quantifies the precision of high-confidence ego predictions (threshold > 0.7) against ground-truth labels under domain shift, varying occlusion levels, and across detector pairs on DAIR-V2X and OPV2V. These diagnostics show precision rates above 88% in the evaluated conditions, indicating limited propagation of systematic errors. We further clarify the selective application of distillation only in high-confidence regions, which allows neighbors to supplement rather than override ego predictions. revision: yes
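A sketch of how the teacher-precision diagnostic described above could be computed. For brevity the IoU here handles axis-aligned 2D boxes (real 3D detections would need rotated-box IoU), and all names and thresholds are assumptions mirroring the response, not the authors' code.

```python
import torch


def iou_matrix(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Pairwise IoU between axis-aligned boxes given as (x1, y1, x2, y2)."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = torch.maximum(a[:, None, :2], b[None, :, :2])  # intersection top-left
    rb = torch.minimum(a[:, None, 2:], b[None, :, 2:])  # intersection bottom-right
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / union.clamp(min=1e-6)


def teacher_precision(pred_boxes, pred_scores, gt_boxes, tau=0.7, iou_thr=0.5):
    """Fraction of ego detections scoring above tau that match a ground-truth box."""
    boxes = pred_boxes[pred_scores > tau]
    if boxes.numel() == 0 or gt_boxes.numel() == 0:
        return float("nan")  # no teacher signal (or no GT) in this frame
    matched = iou_matrix(boxes, gt_boxes).max(dim=1).values > iou_thr
    return matched.float().mean().item()
```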
- Referee: [§4.2 Quantitative results] The reported AP@50 gains (up to 32.3 points) and outperformance over ego-only are presented without detailed failure-mode analysis, confidence-interval reporting, or controls for post-hoc hyperparameter choices in the online adaptation. The abstract and results sections provide limited experimental details on how high-confidence thresholds are set or how performance varies when ego confidence is low.
Authors: We have revised §4.2 to include a failure-mode analysis identifying cases (e.g., extreme multi-agent occlusions) where BOLT provides limited gains, along with mean AP values and standard deviations computed over five random seeds to report confidence intervals. Hyperparameters including the confidence threshold of 0.7 were selected via cross-validation on a held-out validation split prior to test-set evaluation, with no post-hoc tuning on test data; a full sensitivity analysis on the threshold is now provided in the appendix. We also add results stratified by ego confidence levels, showing that BOLT defaults to ego-only features when scene-wide confidence is low and still yields improvements by incorporating neighbor information in mixed-confidence regions. revision: yes
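The seed-averaged reporting described in this response could be as simple as the following; evaluate_ap50 is an assumed per-seed evaluation hook, and the five-seed protocol mirrors the revision described above.

```python
import statistics


def ap50_mean_std(evaluate_ap50, seeds=(0, 1, 2, 3, 4)):
    """Mean and standard deviation of AP@50 over several random seeds."""
    scores = [evaluate_ap50(seed=s) for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores)

# Reported as "mean ± std over 5 seeds" in the revised tables.
```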
Circularity Check
No significant circularity; BOLT applies standard distillation to a new setting without self-referential reductions.
full rationale
The paper's core claim rests on an online ego-as-teacher distillation module that aligns neighbor features using high-confidence ego predictions, with reported gains (up to +32.3 AP@50) validated empirically on DAIR-V2X and OPV2V across encoder pairs. No equations define the adaptation loss or alignment in terms of the target performance metric itself, no fitted parameters are relabeled as predictions, and no load-bearing steps reduce to self-citations or prior author ansatzes. The approach is presented as a plug-and-play extension of knowledge-distillation principles to the preparation-free heterogeneous case, with the derivation chain remaining independent of the final empirical outcomes.
Axiom & Free-Parameter Ledger
free parameters (1)
- BOLT module size (0.9M trainable parameters)
axioms (1)
- domain assumption: Ego high-confidence predictions provide a sufficiently accurate teacher signal for feature alignment.