arxiv: 2606.23046 · v1 · pith:UXAJRCK7new · submitted 2026-06-22 · 💻 cs.CV

UECP: Uncertainty-Enhanced Collaborative Perception

Pith reviewed 2026-06-26 09:04 UTC · model grok-4.3

classification 💻 cs.CV
keywords uncertainty map · collaborative perception · LiDAR point density · autonomous driving · feature fusion · perception quality · multi-agent systems

The pith

An uncertainty map supervised by LiDAR point density supplies unbiased physical evidence for weighting each agent's contribution during collaborative fusion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes an uncertainty map as a replacement for confidence maps in multi-agent perception for autonomous driving. The map is supervised directly by real-time LiDAR point density to evaluate perception quality without correlation to detection results. It supplies scenario-aware evidence that weights agent inputs during fusion. The UECP framework then embeds the map into Uncertainty-Aware Pyramid Fusion through uncertainty-weighted downsampling and uncertainty-guided residual fusion steps. This setup allows the fusion process to respond to actual sensor coverage patterns instead of potentially noisy detection scores.
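
As a concrete illustration of density supervision, the sketch below rasterizes a point cloud onto a bird's-eye-view grid and turns per-cell point counts into an uncertainty target. It is a minimal sketch under stated assumptions: the function name `density_uncertainty_target`, the grid extent, the cell size, the `saturation` count, and the log-compressed mapping from density to uncertainty are all hypothetical, since this summary does not say how the paper defines its target.

```python
import math

def density_uncertainty_target(points, x_range=(0.0, 4.0), y_range=(0.0, 4.0),
                               cell=1.0, saturation=8):
    """Rasterize LiDAR points on a BEV grid and map point density to uncertainty.

    Every parameter here is hypothetical. `saturation` is the per-cell point
    count at which a cell is treated as fully observed (uncertainty 0).
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    counts = [[0] * nx for _ in range(ny)]
    for x, y, *_ in points:  # z and intensity are ignored in this sketch
        if x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]:
            ix = min(int((x - x_range[0]) / cell), nx - 1)
            iy = min(int((y - y_range[0]) / cell), ny - 1)
            counts[iy][ix] += 1
    # Log compression (an assumed choice) keeps dense near-range cells from dominating.
    norm = math.log1p(saturation)
    return [[1.0 - min(math.log1p(c) / norm, 1.0) for c in row] for row in counts]

# Toy cloud: eight returns in one cell, a single return in another.
points = [(0.5, 0.5, 0.0)] * 8 + [(2.5, 1.5, 0.2)]
target = density_uncertainty_target(points)
```

Cells with many returns receive a target near 0 and empty cells receive 1, so the target depends only on sensor coverage and never on detector outputs.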

Core claim

The paper establishes that an uncertainty map, directly supervised by LiDAR point density, functions as a physically grounded and unambiguous metric for perception quality that remains decoupled from detection noise. This metric supplies physical scenario-aware evidence for weighting agent contributions. The UECP framework centers on the Uncertainty-Aware Pyramid Fusion module, which applies a coarse-to-fine strategy consisting of Uncertainty-Weighted Downsampling for high-fidelity feature preservation and Uncertainty-Guided Residual Fusion to reinforce ego features while suppressing noise.
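
One plausible reading of Uncertainty-Weighted Downsampling is a pooling step in which each fine cell contributes in proportion to its reliability. The sketch below assumes 2×2 stride-2 pooling, scalar features, and a reliability weight of `1 - uncertainty`; none of these choices is confirmed by the summary above, and the function name is hypothetical.

```python
def uncertainty_weighted_downsample(feat, unc, eps=1e-8):
    """2x2, stride-2 pooling where each cell is weighted by (1 - uncertainty).

    feat[r][c] is a scalar feature, unc[r][c] an uncertainty in [0, 1].
    Returns the pooled features and the pooled (mean) uncertainty.
    """
    rows, cols = len(feat) // 2, len(feat[0]) // 2
    out_f = [[0.0] * cols for _ in range(rows)]
    out_u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cells = [(2 * r + dr, 2 * c + dc) for dr in (0, 1) for dc in (0, 1)]
            weights = [1.0 - unc[i][j] for i, j in cells]
            total = sum(weights) + eps  # eps guards against all-uncertain windows
            out_f[r][c] = sum(w * feat[i][j] for w, (i, j) in zip(weights, cells)) / total
            out_u[r][c] = sum(unc[i][j] for i, j in cells) / 4.0
    return out_f, out_u

# Toy window: one well-observed cell carrying signal, three unobserved cells.
feat = [[10.0, 0.0], [0.0, 0.0]]
unc = [[0.0, 1.0], [1.0, 1.0]]
pooled, pooled_unc = uncertainty_weighted_downsample(feat, unc)
```

In the toy input the only observed cell keeps its value after pooling, whereas plain average pooling would dilute it to a quarter.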

What carries the argument

The uncertainty map is a perception-quality metric supervised directly by LiDAR point density. It provides scenario-aware weighting evidence independent of detection noise.

If this is right

  • Weighting of agent features relies on physical sensor signals rather than co-trained confidence scores.
  • The fusion process follows a coarse-to-fine strategy that preserves high-fidelity features from reliable agents.
  • Noise from agents with low sensor coverage is suppressed through uncertainty guidance during residual fusion.
  • Perception becomes more robust on real-world datasets where sensor density varies.
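
The weighting and noise-suppression behaviour described in these points can be sketched as a per-cell weighted sum followed by a residual update that always retains the ego feature. This is an illustration rather than the paper's UGRF block: the reliability weight `1 - uncertainty`, the scalar `beta`, the stabilizer `eps`, and the omission of any learned layer or spatial smoothing are assumptions.

```python
def uncertainty_guided_fusion(features, uncertainties, ego=0, beta=0.5, eps=1e-8):
    """Fuse per-agent BEV features cell by cell.

    features[n][r][c]      : scalar feature of agent n at cell (r, c)
    uncertainties[n][r][c] : uncertainty in [0, 1] (1 = no sensor evidence)
    Returns the ego features plus beta times the reliability-weighted mean.
    """
    rows, cols = len(features[ego]), len(features[ego][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            reliab = [1.0 - u[r][c] for u in uncertainties]
            total = sum(reliab) + eps  # eps avoids division by zero
            weighted = sum(w / total * f[r][c] for w, f in zip(reliab, features))
            # Residual update: the ego feature is kept, the weighted sum is added.
            fused[r][c] = features[ego][r][c] + beta * weighted
    return fused

# Two agents, one row of two cells. Agent 1 has no coverage in the second cell.
features = [[[1.0, 1.0]], [[3.0, 3.0]]]
uncertainties = [[[0.0, 0.0]], [[0.0, 1.0]]]
fused = uncertainty_guided_fusion(features, uncertainties)
```

In the second cell the collaborator's feature is ignored because its uncertainty is 1, so only the ego feature contributes to the update.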

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar density-based supervision could be tested on other sensors if equivalent physical signals exist.
  • The map might allow systems to maintain performance when detection heads are deliberately simplified.
  • Edge cases with sudden density drops could serve as natural test points for the physical grounding claim.

Load-bearing premise

LiDAR point density supplies an unbiased signal of perception quality that remains independent of detection noise and introduces no new biases when used for weighting.

What would settle it

An experiment that artificially varies LiDAR point density while holding detection outputs fixed and observes no corresponding change in fusion performance or agent weighting.
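
The density-perturbation half of such an experiment could look like the sketch below, which thins a point cloud at several keep ratios. The detector, the fusion model, and the cached detection outputs that would be held fixed are outside the sketch; the function name, the seed handling, and the synthetic cloud are assumptions.

```python
import random

def subsample_points(points, keep_ratio, seed=0):
    """Keep each point independently with probability `keep_ratio`.

    Reusing the seed makes the kept sets nested across ratios, so density
    changes monotonically while the scene geometry stays the same.
    """
    rng = random.Random(seed)
    return [p for p in points if rng.random() < keep_ratio]

# Hypothetical sweep: synthetic points stand in for a real LiDAR frame.
gen = random.Random(42)
cloud = [(gen.uniform(0, 50), gen.uniform(-25, 25), 0.0) for _ in range(1000)]
kept_counts = [len(subsample_points(cloud, r)) for r in (0.0, 0.25, 0.5, 1.0)]
```

Each thinned cloud would then be fed to the uncertainty head while the detection outputs from the unperturbed frame are reused, and the resulting agent weights compared across ratios.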

Figures

Figures reproduced from arXiv: 2606.23046 by Deying Li, Kang Yang, Peng Wang, Tianci Bu, Wen Jie, Yongcai Wang.

Figure 1: The difference between the traditional confidence map and the proposed uncertainty map. The confidence map is co-learned with the classification head, while the uncertainty map is supervised by the LiDAR point density.
Figure 2: Comparison of false positives (FP) and false negatives (FN) between the proposed UECP and HEAL on the DAIR-V2X dataset under different IoU thresholds (0.3/0.5/0.7).
Figure 3: The overall pipeline of our proposed framework, UECP. The framework begins by processing raw sensor data from each agent independently through a shared BEV Encoder to extract initial features. In parallel, an uncertainty head predicts a physically-grounded uncertainty map for each agent. This map serves as the key input to our Uncertainty-Aware Pyramid Fusion (UAPF) module. The resulting deeply-fused feat…
Figure 4: The overall pipeline of UAPF. Within the UAPF module, our Uncertainty-Weighted Downsampling (UWD) mechanism first constructs a high-fidelity feature pyramid. Then, at each scale, an Uncertainty-Guided Residual Fusion (UGRF) block robustly integrates collaborative information.
Figure 5: Uncertainty-Weighted Downsampling module.
Figure 6: A comparative analysis of HEAL's performance guided by confidence versus that guided by uncertainty maps.
Figure 7: Robustness analysis under pose error and latency error on DAIR-V2X and V2V4REAL datasets.
Figure 8: Visualization of collaboration in UECP. Green and red denote ground truth and detection, respectively.
Figure 9: Comparison of FLOPs, FPS, and parameter size across collaborative perception methods. Each circle represents a different method, with its size proportional to the number of parameters (larger circles indicate more parameters). The x-axis denotes computational cost (FLOPs, in billions), and the y-axis shows inference speed (FPS, higher is better). UECP achieves a favorable balance, as highlighted in green.
Figure 10: UECP achieves more accurate detections and fewer false positives. The first row corresponds to samples from the DAIR-V2X dataset, while the second row shows samples from the V2V4REAL dataset. Green and red boxes denote ground truth and detection, respectively.
Original abstract

Collaborative perception serves as a pivotal solution to enhance the perception capability of individual agents in autonomous driving, where a core challenge lies in seeking reliable evidence to quantify and weight the contribution of each participating agent. Existing methods typically rely on a confidence map, which is co-trained with the detection head, but it is inherently correlated with the detection results and thus fails to provide unbiased physical evidence. Furthermore, how to deeply integrate evidence into the cooperative fusion process remains an open question. To address these issues, this paper first proposes an uncertainty map, a physically grounded and unambiguous metric for evaluating perception quality. This map is directly supervised by real-time sensor signals, i.e., LiDAR point density, ensuring decoupling from detection noise and thereby providing physical scenario-aware evidence for weighting agent contribution. Based on this map, we develop the Uncertainty-Enhanced Collaborative Perception (UECP) framework, centered on the Uncertainty-Aware Pyramid Fusion (UAPF) module. UAPF uses a coarse-to-fine strategy, with two key components: Uncertainty-Weighted Downsampling (UWD) for high-fidelity feature preservation, and Uncertainty-Guided Residual Fusion (UGRF) to reinforce ego features, suppressing noise and ensuring robust fusion. Extensive experiments on real-world datasets show UECP outperforms state-of-the-art methods in effectiveness and robustness by embedding the uncertainty map into fusion. Code will be publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims to introduce an uncertainty map for collaborative perception in autonomous driving, supervised directly by LiDAR point density to yield a physically grounded metric decoupled from detection noise and thus providing unbiased evidence for weighting agent contributions. It presents the UECP framework built around an Uncertainty-Aware Pyramid Fusion (UAPF) module that employs a coarse-to-fine strategy via Uncertainty-Weighted Downsampling (UWD) and Uncertainty-Guided Residual Fusion (UGRF), with experiments on real-world datasets asserted to demonstrate outperformance over state-of-the-art methods.

Significance. If the claimed structural independence of the uncertainty map from detection outputs holds and the fusion components deliver measurable gains, the work could strengthen evidence-based weighting in multi-agent perception systems. The planned public code release would support reproducibility.

major comments (2)
  1. [Abstract] Abstract (central claim paragraph): the assertion that direct supervision by LiDAR point density 'ensuring decoupling from detection noise' supplies unbiased physical evidence is load-bearing for the contribution, yet point density is derived from the identical raw point cloud that drives the detector; sparsity simultaneously reduces density and elevates miss/false-positive rates, so the learned map may encode the same scene-dependent difficulty factors rather than remaining orthogonal after conditioning on the input.
  2. [Abstract] Abstract (experiments sentence): the statement that 'extensive experiments on real-world datasets show UECP outperforms state-of-the-art methods' is presented without any quantitative metrics, ablation tables, or error analysis, preventing verification of whether the uncertainty weighting actually improves fusion robustness or merely correlates with baseline performance.
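
The orthogonality concern in the first comment could be probed with a partial correlation between point density and detection error, controlling for a shared difficulty factor. The sketch below is one assumed way to do this, using distance from the sensor as the hypothetical confounder and synthetic numbers in place of measurements.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def partial_corr(x, y, z):
    """Correlation of x and y after linearly removing the effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Synthetic data: both signals are driven by distance plus unrelated noise.
distance = [1, 2, 3, 4, 5, 6, 7, 8]
noise_a = [0.5, -0.5] * 4
noise_b = [0.5, 0.5, -0.5, -0.5] * 2
density = [-d + n for d, n in zip(distance, noise_a)]   # sparser when far away
det_error = [d + n for d, n in zip(distance, noise_b)]  # worse when far away
raw = pearson(density, det_error)
conditioned = partial_corr(density, det_error, distance)
```

A strong raw correlation that shrinks toward 0 after conditioning would suggest that the two signals are driven mainly by the shared factor, which is the scenario the referee describes.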

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and thoughtful comments on our manuscript. Below we respond point by point to the major comments.

  1. Referee: [Abstract] Abstract (central claim paragraph): the assertion that direct supervision by LiDAR point density 'ensuring decoupling from detection noise' supplies unbiased physical evidence is load-bearing for the contribution, yet point density is derived from the identical raw point cloud that drives the detector; sparsity simultaneously reduces density and elevates miss/false-positive rates, so the learned map may encode the same scene-dependent difficulty factors rather than remaining orthogonal after conditioning on the input.

    Authors: We agree that scene sparsity and other input-dependent factors influence both point density and detection performance. However, the uncertainty map is supervised directly on the computed point-density values rather than on detection labels or the output of the detection head. This supervision target is a physical sensor-coverage metric independent of the detector's training objective and its specific errors, unlike co-trained confidence maps. The resulting map therefore supplies a distinct signal for fusion weighting. We will revise the abstract wording to state 'decoupling from the detection head' for greater precision. revision: partial

  2. Referee: [Abstract] Abstract (experiments sentence): the statement that 'extensive experiments on real-world datasets show UECP outperforms state-of-the-art methods' is presented without any quantitative metrics, ablation tables, or error analysis, preventing verification of whether the uncertainty weighting actually improves fusion robustness or merely correlates with baseline performance.

    Authors: The abstract is a concise summary; the full manuscript contains the requested quantitative evidence. The experiments in Section 4 report gains on DAIR-V2X and V2V4REAL, ablation studies isolating the uncertainty map and the UAPF components, and robustness tests under pose error and latency. These results show that the uncertainty-weighted fusion contributes measurable improvements beyond baseline performance. revision: no

Circularity Check

0 steps flagged

No significant circularity; uncertainty map uses external LiDAR density supervision

The abstract presents the uncertainty map as directly supervised by an external physical signal (LiDAR point density) rather than derived from or fitted to detection outputs. No equations, self-citations, or derivations are shown that would make the claimed decoupling or weighting equivalent to the detection results by construction. The contrast with confidence maps (co-trained with detection) is explicit, and the fusion modules (UAPF, UWD, UGRF) are described as building on this map without reducing to a redefinition of inputs. This is a standard modeling choice with independent content.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; the central claim rests on the unverified premise that point density is a sufficient and unbiased proxy for perception quality. No free parameters, axioms, or invented entities can be audited without the methods section.

pith-pipeline@v0.9.1-grok · 5785 in / 1099 out tokens · 23367 ms · 2026-06-26T09:04:00.660119+00:00 · methodology

Reference graph

Works this paper leans on

60 extracted references · 3 canonical work pages

  1. [1]

    Alotaibi, E.T., Alqefari, S.S., Koubaa, A.: Lsar: Multi-uav collaboration for search and rescue missions. IEEE Access 7, 55817–55832 (2019)

  2. [2]

    Angelopoulos, A.N., Bates, S.: A gentle introduction to conformal prediction and distribution-free uncertainty quantification. arXiv preprint arXiv:2107.07511 (2021)

  3. [3]

    Arnold, E., Dianati, M., de Temple, R., Fallah, S.: Cooperative perception for 3d object detection in driving scenarios using infrastructure sensors. IEEE Transactions on Intelligent Transportation Systems 23(3), 1852–1864 (2020)

  4. [4]

    Blundell, C., Cornebise, J., Kavukcuoglu, K., Wierstra, D.: Weight uncertainty in neural network. In: International conference on machine learning. pp. 1613–1622. PMLR (2015)

  5. [5]

    Chen, Q., Ma, X., Tang, S., Guo, J., Yang, Q., Fu, S.: F-cooper: Feature based cooperative perception for autonomous vehicle edge computing system using 3d point clouds. In: Proceedings of the 4th ACM/IEEE Symposium on Edge Computing. pp. 88–100 (2019)

  6. [6]

    Chen, Q., Tang, S., Yang, Q., Fu, S.: Cooper: Cooperative perception for connected autonomous vehicles based on 3d point clouds. In: 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). pp. 514–524. IEEE (2019)

  7. [7]

    Chen, Z., Shi, Y., Jia, J.: Transiff: An instance-level feature fusion framework for vehicle-infrastructure cooperative 3d detection with transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 18205–18214 (2023)

  8. [8]

    Durasov, N., Mahmood, R., Choi, J., Law, M.T., Lucas, J., Fua, P., Alvarez, J.M.: Uncertainty estimation for 3d object detection via evidential learning. arXiv preprint arXiv:2410.23910 (2024)

  9. [9]

    Fan, S., Yu, H., Yang, W., Yuan, J., Nie, Z.: Quest: Query stream for practical cooperative perception. In: 2024 IEEE International Conference on Robotics and Automation (ICRA). pp. 18436–18442. IEEE (2024)

  10. [10]

    Feng, D., Rosenbaum, L., Dietmayer, K.: Towards safe autonomous driving: Capture uncertainty in the deep neural network for lidar 3d vehicle detection. In: 2018 21st international conference on intelligent transportation systems (ITSC). pp. 3266–3273. IEEE (2018)

  11. [11]

    Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: international conference on machine learning. pp. 1050–1059. PMLR (2016)

  12. [12]

    Gao, X., Xu, R., Li, J., Wang, Z., Fan, Z., Tu, Z.: Stamp: Scalable task and model-agnostic collaborative perception. arXiv preprint arXiv:2501.18616 (2025)

  13. [13]

    Guo, Y., Ma, J.: Leveraging existing high-occupancy vehicle lanes for mixed-autonomy traffic management with emerging connected automated vehicle applications. Transportmetrica A: Transport Science 16(3), 1375–1399 (2020)

  14. [14]

    He, J.J., Hu, P., Laungani, D., Anguelov, D.: Modeling confidence in lidar-based bev perception. In: Conference on Robot Learning (CoRL). pp. 330–340 (2022)

  15. [15]

    Honari, S., Yosinski, J., Vincent, P., Pal, C.: Recombinator networks: Learning coarse-to-fine feature aggregation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 5743–5752 (2016)

  16. [16]

    Hu, Y., Fang, S., Lei, Z., Zhong, Y., Chen, S.: Where2comm: Communication-efficient collaborative perception via spatial confidence maps. Advances in neural information processing systems 35, 4874–4886 (2022)

  17. [17]

    Hu, Y., Lu, Y., Xu, R., Xie, W., Chen, S., Wang, Y.: Collaboration helps camera overtake lidar in 3d detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 9243–9252 (June 2023)

  18. [18]

    Hu, Y., Peng, J., Liu, S., Ge, J., Liu, S., Chen, S.: Communication-efficient collaborative perception via information filling with codebook. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 15481–15490 (2024)

  19. [19]

    Huang, Z., Wang, S., Wang, Y., Li, W., Li, D., Wang, L.: Roco: Robust cooperative perception by iterative object matching and pose adjustment. In: ACM Multimedia 2024 (2024)

  20. [21]

    Kendall, A., Gal, Y.: What uncertainties do we need in bayesian deep learning for computer vision? Advances in neural information processing systems 30 (2017)

  21. [22]

    Lakshminarayanan, B., Pritzel, A., Blundell, C.: Simple and scalable predictive uncertainty estimation using deep ensembles. In: Advances in Neural Information Processing Systems (NeurIPS). vol. 30 (2017)

  22. [23]

    Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: Pointpillars: Fast encoders for object detection from point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2019)

  23. [24]

    Lee, Y., Chen, C.H., Cheng, H.C., Lin, W.C.: What you see is what you get: A probabilistic volumetric fusion for environment perception. In: IEEE International Conference on Robotics and Automation (ICRA). pp. 8126–8132 (2022)

  24. [25]

    Lei, Z., Ren, S., Hu, Y., Zhang, W., Chen, S.: Latency-aware collaborative perception. In: European Conference on Computer Vision. pp. 316–332. Springer (2022)

  25. [26]

    Lei, Z., Ren, S., Hu, Y., Zhang, W., Chen, S.: Latency-aware collaborative perception. In: Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXII. pp. 316–332. Springer-Verlag, Berlin, Heidelberg (2022). https://doi.org/10.1007/978-3-031-19824-3_19

  26. [27]

    Li, J., Xu, R., Liu, X., Li, B., Zou, Q., Ma, J., Yu, H.: S2r-vit for multi-agent cooperative perception: Bridging the gap from simulation to reality. In: 2024 IEEE International Conference on Robotics and Automation (ICRA). pp. 16374–16380. IEEE (2024)

  27. [28]

    Li, J., Xu, R., Liu, X., Ma, J., Chi, Z., Ma, J., Yu, H.: Learning for vehicle-to-vehicle cooperative perception under lossy communication. IEEE Transactions on Intelligent Vehicles (2023)

  28. [29]

    Li, Y., Ren, S., Wu, P., Chen, S., Feng, C., Zhang, W.: Learning distilled collaboration graph for multi-agent perception. Advances in Neural Information Processing Systems 34, 29541–29552 (2021)

  29. [30]

    Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision. pp. 2980–2988 (2017)

  30. [31]

    Liu, Y.C., Tian, J., Glaser, N., Kira, Z.: When2com: Multi-agent perception via communication graph grouping. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition. pp. 4106–4115 (2020)

  31. [32]

    Liu, Y.C., Tian, J., Ma, C.Y., Glaser, N., Kuo, C.W., Kira, Z.: Who2com: Collaborative perception via learnable handshake communication. In: 2020 IEEE International Conference on Robotics and Automation (ICRA). pp. 6876–6883 (2020). https://doi.org/10.1109/ICRA40945.2020.9197364

  32. [33]

    Loshchilov, I.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)

  33. [34]

    Lu, Y., Hu, Y., Zhong, Y., Wang, D., Chen, S., Wang, Y.: An extensible framework for open heterogeneous collaborative perception. arXiv preprint arXiv:2401.13964 (2024)

  34. [35]

    Lu, Y., Li, Q., Liu, B., Dianati, M., Feng, C., Chen, S., Wang, Y.: Robust collaborative 3d object detection in presence of pose errors. In: 2023 IEEE International Conference on Robotics and Automation (ICRA). pp. 4812–4818. IEEE (2023)

  35. [36]

    Ma, J., Leslie, E., Ghiasi, A., Huang, Z., Guo, Y.: Empirical analysis of a freeway bundled connected-and-automated vehicle application using experimental data. Journal of Transportation Engineering, Part A: Systems 146(6), 04020034 (2020)

  36. [37]

    Michelmore, R., Wicker, M., Laurenti, L., Cardelli, L., Gal, Y., Kwiatkowska, M.: Uncertainty quantification with statistical guarantees in end-to-end autonomous driving control. In: 2020 IEEE International Conference on Robotics and Automation (ICRA). pp. 7344–7350 (2020). https://doi.org/10.1109/ICRA40945.2020.9196844

  37. [38]

    Park, N., Kim, S.: Blurs behave like ensembles: Spatial smoothings to improve accuracy, uncertainty, and robustness. In: International Conference on Machine Learning. pp. 17390–17419. PMLR (2022)

  38. [39]

    Saeedan, F., Weber, N., Goesele, M., Roth, S.: Detail-preserving pooling in deep networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 9108–9116 (2018)

  39. [40]

    Sensoy, M., Kaplan, L., Kandemir, M.: Evidential deep learning to quantify classification uncertainty. In: Advances in Neural Information Processing Systems (NeurIPS). vol. 31 (2018)

  40. [41]

    Smith, L.N., Topin, N.: Super-convergence: Very fast training of neural networks using large learning rates. In: Artificial intelligence and machine learning for multi-domain operations applications. vol. 11006, pp. 369–386. SPIE (2019)

  41. [42]

    Sobel, I., Feldman, G., et al.: A 3x3 isotropic gradient operator for image processing. A talk at the Stanford Artificial Project in 1968, 271–272 (1968)

  42. [43]

    Song, Z., Yang, L., Wen, F., Li, J.: Traf-align: Trajectory-aware feature alignment for asynchronous multi-agent perception. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 12048–12057 (2025)

  43. [44]

    Su, S., Han, S., Li, Y., Zhang, Z., Feng, C., Ding, C., Miao, F.: Collaborative multi-object tracking with conformal uncertainty propagation. IEEE Robotics and Automation Letters 9(4), 3323–3330 (2024)

  44. [45]

    Vadivelu, N., Ren, M., Tu, J., Wang, J., Urtasun, R.: Learning to communicate and correct pose errors. In: Conference on Robot Learning. pp. 1195–1210. PMLR (2021)

  45. [46]

    Wang, R., Gao, X., Xiang, H., Xu, R., Tu, Z.: Cocmt: Communication-efficient cross-modal transformer for collaborative perception. arXiv preprint arXiv:2503.13504 (2025)

  46. [47]

    Wang, T.H., Manivasagam, S., Liang, M., Yang, B., Zeng, W., Urtasun, R.: V2vnet: Vehicle-to-vehicle communication for joint perception and prediction. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II 16. pp. 605–621. Springer (2020)

  47. [48]

    Wei, S., Wei, Y., Hu, Y., Lu, Y., Zhong, Y., Chen, S., Zhang, Y.: Asynchrony-robust collaborative perception via bird’s eye view flow. Advances in Neural Information Processing Systems 36, 28462–28477 (2023)

  48. [49]

    Xiang, H., Xu, R., Ma, J.: Hm-vit: Hetero-modal vehicle-to-vehicle cooperative perception with vision transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 284–295 (2023)

  49. [50]

    Xu, J., Zhang, Y., Cai, Z., Huang, D.: Cosdh: communication-efficient collaborative perception via supply-demand awareness and intermediate-late hybridization. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 6834–6843 (2025)

  50. [51]

    Xu, R., Tu, Z., Xiang, H., Shao, W., Zhou, B., Ma, J.: Cobevt: Cooperative bird’s eye view semantic segmentation with sparse transformers (2022)

  51. [52]

    Xu, R., Xia, X., Li, J., Li, H., Zhang, S., Tu, Z., Meng, Z., Xiang, H., Dong, X., Song, R., et al.: V2v4real: A real-world large-scale dataset for vehicle-to-vehicle cooperative perception. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 13712–13722 (2023)

  52. [53]

    Xu, R., Xiang, H., Tu, Z., Xia, X., Yang, M.H., Ma, J.: V2x-vit: Vehicle-to-everything cooperative perception with vision transformer (2022)

  53. [54]

    Xu, R., Xiang, H., Xia, X., Han, X., Li, J., Ma, J.: Opv2v: An open benchmark dataset and fusion pipeline for perception with vehicle-to-vehicle communication. In: 2022 International Conference on Robotics and Automation (ICRA). pp. 2583–

  54. [55]

    Yang, D., Yang, K., Wang, Y., Liu, J., Xu, Z., Yin, R., Zhai, P., Zhang, L.: How2comm: Communication-efficient and collaboration-pragmatic multi-agent perception. Advances in Neural Information Processing Systems 36, 25151–25164 (2023)

  55. [56]

    Yang, K., Bu, T., Li, L., Li, C., Wang, Y., Li, D.: Is discretization fusion all you need for collaborative perception? In: 2025 IEEE International Conference on Robotics and Automation (ICRA). pp. 9590–9596 (2025). https://doi.org/10.1109/ICRA55743.2025.11128776

  56. [57]

    Yin, H., Tian, D., Lin, C., Duan, X., Zhou, J., Zhao, D., Cao, D.: V2vformer++: Multi-modal vehicle-to-vehicle cooperative perception via global-local transformer. IEEE Transactions on Intelligent Transportation Systems 25(2), 2153–2166 (2023)

  57. [58]

    Yu, H., Luo, Y., Shu, M., Huo, Y., Yang, Z., Shi, Y., Guo, Z., Li, H., Hu, X., Yuan, J., et al.: Dair-v2x: A large-scale dataset for vehicle-infrastructure cooperative 3d object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 21361–21370 (2022)

  58. [59]

    Yu, H., Yang, W., Ruan, H., Yang, Z., Tang, Y., Gao, X., Hao, X., Shi, Y., Pan, Y., Sun, N., et al.: V2x-seq: A large-scale sequential dataset for vehicle-infrastructure cooperative perception and forecasting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5486–5495 (2023)

  59. [60]

    Yuan, Y., Xia, Y., Cremers, D., Sester, M.: Sparsealign: A fully sparse framework for cooperative object detection. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 22296–22305 (2025)

  60. [61]

    Zhao, B., Zhang, W., Zou, Z.: Bm2cp: Efficient collaborative perception with lidar-camera modalities. arXiv preprint arXiv:2310.14702 (2023)