
arxiv: 2605.16087 · v2 · pith:BD73KWXLnew · submitted 2026-05-15 · 💻 cs.RO · cs.AI

Towards Trustworthy and Explainable AI for Perception Models: From Concept to Prototype Vehicle Deployment

Pith reviewed 2026-05-25 06:26 UTC · model grok-4.3

keywords trustworthy AI, explainable AI, autonomous driving, perception, transformer detector, uncertainty calibration, robustness, saliency maps

The pith

A transformer detector for autonomous driving yields faithful explanations from its attention weights at inference time, plus calibrated uncertainty and robustness improvements, all deployed in a prototype vehicle with a real-time interface

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that a complete trustworthy-AI pipeline can be built for 3D perception in driving by taking a transformer detector, extracting explanations directly from its attention maps, adding an uncertainty-calibration module, and applying robustness training. It then validates the explanations with perturbation consistency tests, measures gains in robustness and calibration, and moves the entire system onto a real vehicle where an interface displays the saliency maps, uncertainty state, and documentation artifacts live. A sympathetic reader would care because current deep networks for scene understanding remain black boxes that conflict with safety standards and make debugging or oversight difficult; if the pipeline works, it supplies one concrete route from abstract trustworthy-AI guidelines to an operational perception stack.

Core claim

Building on a transformer-based detector, explanations are derived from the attention mechanism at inference time and validated for faithfulness using perturbation-based consistency tests. An uncertainty estimation and calibration module is integrated, robustness-enhancing training methods are applied, and the resulting system is shown to produce faithful saliency behavior, improved robustness, and well-calibrated uncertainty estimates. The full set of trustworthy-AI elements is finally deployed in a prototype vehicle together with an XAI interface that visualizes documentation artifacts, model uncertainty state, and saliency maps in real time.

What carries the argument

Attention weights extracted from the transformer detector at inference time, used as the source of saliency explanations and validated by perturbation consistency tests, together with a separate uncertainty-calibration module and robustness training.
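As a rough illustration of this mechanism, and not the authors' implementation, the sketch below turns cross-attention weights into a per-token saliency vector for one object query and then sums that saliency per sensor. The layout `attn[head][query][token]`, the plain mean over heads, the convention that camera tokens come before LiDAR tokens, and both function names are assumptions made for this example; this page does not establish whether the mean used here corresponds to the paper's Mean-Fusion.

```python
from typing import Dict, List


def attention_saliency(attn: List[List[List[float]]], query: int) -> List[float]:
    """Per-token saliency for one object query.

    attn[h][q][t] is the weight that head h assigns from object query q
    to input token t (hypothetical layout). Heads are fused with a plain
    mean and the result is renormalised to sum to 1.
    """
    num_heads = len(attn)
    num_tokens = len(attn[0][query])
    fused = [
        sum(attn[h][query][t] for h in range(num_heads)) / num_heads
        for t in range(num_tokens)
    ]
    total = sum(fused)
    return [v / total for v in fused] if total > 0 else fused


def modality_usage(saliency: List[float], num_camera_tokens: int) -> Dict[str, float]:
    """Share of saliency per sensor, assuming camera tokens come first."""
    camera = sum(saliency[:num_camera_tokens])
    lidar = sum(saliency[num_camera_tokens:])
    return {"camera": camera, "lidar": lidar}


if __name__ == "__main__":
    # Two heads, one query, four tokens (two camera, two LiDAR).
    toy_attn = [[[0.7, 0.1, 0.1, 0.1]], [[0.5, 0.3, 0.1, 0.1]]]
    saliency = attention_saliency(toy_attn, query=0)
    print(saliency)
    print(modality_usage(saliency, num_camera_tokens=2))
```

Because the weights are a by-product of the forward pass, a readout of this kind adds only a cheap reduction on top of inference, which is what makes the real-time claim plausible.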

If this is right

  • Explanations become available at inference time with no extra forward passes required.
  • The perception module can be monitored in real time for uncertainty spikes that may indicate out-of-distribution inputs.
  • Robustness training reduces the performance drop under common perturbations such as noise or occlusion.
  • The deployed XAI interface supplies a single screen that combines saliency, uncertainty, and model documentation for human oversight.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same attention-extraction pattern could be tested on other transformer architectures used for 3D detection to see whether faithfulness holds across detector families.
  • If the real-time interface is kept, it might serve as a template for logging artifacts required by future automotive safety standards.
  • Extending the uncertainty calibration to multi-modal sensor fusion would be a direct next step that the current single-detector pipeline leaves open.

Load-bearing premise

Attention weights from the transformer at inference time give faithful accounts of the model's actual decisions, with faithfulness checked only through the described perturbation tests.

What would settle it

A controlled test in which the attention-derived saliency maps are compared against a ground-truth importance measure obtained by systematically ablating input regions and measuring change in the detector's output scores; systematic mismatch would falsify the faithfulness claim.
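One way to picture such a test is a deletion curve: remove an increasing fraction of input tokens in saliency order and record the detector score at each level. The sketch below is a toy version under stated assumptions. The linear `score_fn` stands in for a real detector metric, zeroing a token stands in for whatever perturbation the paper applies, and the names `deletion_curve` and `levels` are hypothetical.

```python
from typing import Callable, List, Sequence


def deletion_curve(
    score_fn: Callable[[List[float]], float],
    tokens: Sequence[float],
    saliency: Sequence[float],
    levels: Sequence[float],
    most_salient_first: bool = True,
) -> List[float]:
    """Score after zeroing a fraction rho of tokens, for each rho in levels.

    Tokens are removed in saliency order: highest first by default,
    lowest first when most_salient_first is False.
    """
    order = sorted(
        range(len(tokens)), key=lambda i: saliency[i], reverse=most_salient_first
    )
    curve = []
    for rho in levels:
        k = int(round(rho * len(tokens)))
        removed = set(order[:k])
        kept = [0.0 if i in removed else x for i, x in enumerate(tokens)]
        curve.append(score_fn(kept))
    return curve


if __name__ == "__main__":
    # Toy "detector": the true importance of token i is weights[i].
    weights = [4.0, 3.0, 2.0, 1.0]

    def score(xs: List[float]) -> float:
        return sum(w * x for w, x in zip(weights, xs))

    tokens = [1.0, 1.0, 1.0, 1.0]
    saliency = [0.4, 0.3, 0.2, 0.1]  # agrees with the true importance
    levels = [0.0, 0.25, 0.5, 0.75, 1.0]
    print(deletion_curve(score, tokens, saliency, levels, True))
    print(deletion_curve(score, tokens, saliency, levels, False))
```

If the saliency is faithful, removing the most salient tokens first should degrade the score at least as fast as removing the least salient first. A saliency map whose two curves coincide, or cross the wrong way, would be evidence against the faithfulness claim.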

Figures

Figures reproduced from arXiv: 2605.16087 by Ayushman Choudhuri, Lutz Eckstein, Manas Mehrotra, Shayan Sharifi, Till Beemelmanns.

Figure 1: Proposed Trustworthy AI Approach. We propose a multi-modal perception module that integrates Trustworthy AI components: robust training, calibrated uncertainty quantification, and explainability. An XAI Interface enables transparent monitoring through documentation, visualized uncertainty state, sensor usage, and saliency maps.

Figure 2: Overview of the Trustworthy AI Approach. From LiDAR point cloud and multi-view camera images, we extract camera and LiDAR tokens that interact with object queries via Cross-Attention and produce calibrated uncertainty-aware 3D bounding boxes. Attention weights, derived sensor usage, and current model uncertainty state are used for visualization in the XAI Interface, along with supporting Mo…

Figure 3: XAI Faithfulness Test. NuScenes Detection Score (NDS) under increasing perturbation level ρ. The proposed Mean-Fusion yields the best overall trade-off for both tests.

Figure 4: XAI Attention Maps Visualization.
Original abstract

Deep Neural Networks have become the dominant solution for Autonomous Driving perception, but their opacity conflicts with emerging Trustworthy AI guidelines and complicates safety assurance, debugging, and human oversight. While theoretical frameworks for safe and Explainable AI (XAI) exist, concrete implementations of Trustworthy AI for 3D scene understanding remain scarce. We address this gap by proposing a Trustworthy AI perception module that is remarkably robust and integrates faithful explainability and calibrated uncertainty estimates. Building on a transformer-based detector, we derive explanations from the attention mechanism at inference time and validate their faithfulness using perturbation-based consistency tests. We further integrate an uncertainty estimation and calibration module, and apply robustness-enhancing training methods. Experiments show faithful saliency behavior, improved robustness, and well-calibrated uncertainty estimates. Finally, we deploy these Trustworthy AI elements in a prototype vehicle and provide an XAI Interface that visualizes documentation artifacts, model uncertainty state, and saliency maps, demonstrating the feasibility of trustworthy perception monitoring in real time. Supplementary materials are available at https://tillbeemelmanns.github.io/trustworthy_ai/ .

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a Trustworthy AI perception module for autonomous driving based on a transformer detector. Explanations are derived from attention weights at inference time and validated via perturbation-based consistency tests; an uncertainty estimation and calibration module is integrated along with robustness-enhancing training. Experiments are claimed to demonstrate faithful saliency behavior, improved robustness, and well-calibrated uncertainty. The full pipeline is deployed in a prototype vehicle with a real-time XAI interface visualizing saliency maps, uncertainty state, and documentation artifacts.

Significance. If the quantitative results and validation hold, the work is significant for providing one of the few end-to-end implementations of trustworthy AI elements (faithful explainability, calibrated uncertainty, robustness) in a 3D perception system for autonomous driving, including real-vehicle deployment and an operational XAI interface. This bridges theoretical frameworks with practical systems integration and could serve as a reference for safety-critical applications.

major comments (2)
  1. [Abstract] The abstract reports positive experimental outcomes on faithfulness, robustness, and calibration but supplies no quantitative numbers, baseline comparisons, or details on post-hoc choices (e.g., perturbation types, calibration method). This absence makes it impossible to assess the magnitude or reliability of the claimed improvements.
  2. [Explanation validation, referenced in Abstract] The claim that attention-derived saliency maps constitute faithful explanations rests solely on perturbation-based consistency tests. The manuscript provides no comparison to alternative methods (e.g., integrated gradients, occlusion), no ablation on perturbation strategy, and no analysis of whether the tests detect known attention failure modes such as spurious focus on background or non-causal tokens. In a safety-critical 3D detector setting, this leaves the faithfulness component of the trustworthy pipeline insufficiently supported.
minor comments (1)
  1. [Abstract] The supplementary materials link is provided, but the main text would benefit from explicit cross-references to specific quantitative results or figures supporting the 'faithful saliency behavior' claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight opportunities to strengthen the presentation of our results. We address each major comment below and commit to revisions that improve clarity and support for the claims without altering the core contributions.

  1. Referee: [Abstract] The abstract reports positive experimental outcomes on faithfulness, robustness, and calibration but supplies no quantitative numbers, baseline comparisons, or details on post-hoc choices (e.g., perturbation types, calibration method). This absence makes it impossible to assess the magnitude or reliability of the claimed improvements.

    Authors: We agree that the abstract would benefit from quantitative details to enable readers to evaluate the scale of improvements. In the revised version we will incorporate representative metrics (e.g., faithfulness consistency scores, robustness gains under perturbation, and expected calibration error) together with concise references to the perturbation strategy and calibration procedure employed. (revision: yes)

  2. Referee: [Explanation validation, referenced in Abstract] The claim that attention-derived saliency maps constitute faithful explanations rests solely on perturbation-based consistency tests. The manuscript provides no comparison to alternative methods (e.g., integrated gradients, occlusion), no ablation on perturbation strategy, and no analysis of whether the tests detect known attention failure modes such as spurious focus on background or non-causal tokens. In a safety-critical 3D detector setting, this leaves the faithfulness component of the trustworthy pipeline insufficiently supported.

    Authors: We recognize that the current validation relies exclusively on perturbation consistency and lacks explicit comparisons or failure-mode analysis. We will add (i) a comparison of attention-derived maps against integrated gradients and occlusion, (ii) an ablation on perturbation parameters, and (iii) a targeted examination of attention behavior on background or non-causal regions, including discussion of implications for 3D detection safety. (revision: yes)
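For readers unfamiliar with the expected calibration error named in the first response, the sketch below shows the common binned form: group predictions by confidence, then take the sample-weighted mean gap between accuracy and average confidence per bin. It is a generic illustration, not the paper's evaluation code. How a detection counts as "correct" (for example, the matching rule against ground-truth boxes), the equal-width binning, and the default of 10 bins are all assumptions.

```python
from typing import List, Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    num_bins: int = 10,
) -> float:
    """Binned ECE: sum over bins of (bin share) * |accuracy - mean confidence|."""
    n = len(confidences)
    if n == 0:
        return 0.0
    bins: List[List[int]] = [[] for _ in range(num_bins)]
    for i, c in enumerate(confidences):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(c * num_bins), num_bins - 1)
        bins[idx].append(i)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(confidences[i] for i in members) / len(members)
        accuracy = sum(1 for i in members if correct[i]) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    conf = [0.9] * 10
    well_calibrated = [True] * 9 + [False]      # 90% correct at 0.9 confidence
    overconfident = [True] * 5 + [False] * 5    # 50% correct at 0.9 confidence
    print(expected_calibration_error(conf, well_calibrated))
    print(expected_calibration_error(conf, overconfident))
```

A value near zero means stated confidence tracks observed accuracy. Reporting it before and after calibration, alongside the baseline detector, would be one way to give the abstract the quantitative anchor the referee asks for.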

Circularity Check

0 steps flagged

No significant circularity; empirical systems integration only


The paper describes an empirical pipeline integrating a transformer detector, attention-derived saliency maps validated via perturbation consistency tests, uncertainty calibration, and robustness training, with vehicle deployment. No mathematical derivation chain, equations, or first-principles results are claimed. No steps reduce any reported outcome to a fitted parameter, self-citation, or self-definition inside the paper. Claims rest on experimental measurements against external benchmarks rather than internal construction, satisfying the self-contained criterion.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the empirical performance of standard components (transformer attention, perturbation testing, calibration) rather than new axioms or invented entities. No free parameters are explicitly introduced in the abstract; the work does not postulate new particles, forces, or dimensions.

axioms (1)
  • domain assumption: Attention weights from the transformer detector can be treated as explanations whose faithfulness can be assessed via perturbation consistency tests.
    This premise is invoked when the paper states that explanations are derived from the attention mechanism and validated by perturbation-based tests.


