pith · machine review for the scientific record

arxiv: 2605.00291 · v1 · submitted 2026-04-30 · 💻 cs.CV · cs.RO

Recognition: unknown

An End-to-End Decision-Aware Multi-Scale Attention-Based Model for Explainable Autonomous Driving

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 19:39 UTC · model grok-4.3

classification 💻 cs.CV cs.RO
keywords explainable AI · autonomous driving · multi-scale attention · decision-aware models · BDD-OIA dataset · nu-AR dataset · XAI evaluation · Joint F1 metric

The pith

A multi-scale attention model that feeds predicted driving decisions back into its reasoning component produces case-specific explanations for autonomous vehicle actions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an end-to-end neural network for explainable autonomous driving that combines multi-scale visual attention with direct use of the model's own decision output. By routing the chosen action into the explanation pathway, the architecture generates explanations tied to each specific decision rather than generic scene descriptions. Evaluation uses the standard F1 score plus a newly proposed Joint F1 metric that measures how well explanations align with the actual decision. The model is tested on the BDD-OIA and nu-AR datasets and reported to outperform both classic attention baselines and recent state-of-the-art explainers.

Core claim

The central claim is that an end-to-end decision-aware multi-scale attention network can simultaneously predict driving actions and generate reliable, case-specific textual explanations by conditioning the reasoning module on the predicted decision itself, achieving higher scores on both F1 and the new Joint F1 metric than prior models on the BDD-OIA and nu-AR benchmarks.
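
The paper does not publish the Joint F1 formula, so any reconstruction is speculative. As one plausible reading, purely for illustration, the Python sketch below counts an explanation label as a true positive only when the sample's driving action is also predicted correctly; the function name, the label layout, and the micro-averaging choice are assumptions, not the authors' definition.

    import numpy as np

    def joint_f1(action_pred, action_true, expl_pred, expl_true):
        """Hypothetical joint metric: micro-F1 over multi-label explanations in which
        a true positive counts only if the sample's driving action is also correct.

        action_pred, action_true: (N,) integer action labels.
        expl_pred, expl_true:     (N, K) binary explanation-label matrices.
        """
        action_ok = (action_pred == action_true)[:, None]   # (N, 1), broadcasts over K labels
        tp = ((expl_pred == 1) & (expl_true == 1) & action_ok).sum()
        predicted = (expl_pred == 1).sum()                   # all predicted explanation labels
        actual = (expl_true == 1).sum()                      # all ground-truth explanation labels
        precision = tp / max(predicted, 1)
        recall = tp / max(actual, 1)
        return 2 * precision * recall / max(precision + recall, 1e-12)

Under this reading the metric is not circular by construction: a model could post a high plain F1 on explanations yet a low Joint F1 if its explanations are right mainly on images where its decision is wrong.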

What carries the argument

The decision-aware multi-scale attention block, which extracts features at multiple image scales and conditions the explanation generator directly on the model's predicted driving action.
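
A minimal sketch of that conditioning pattern, assuming a PyTorch-style implementation; the module names, dimensions, fusion by concatenation, and the BDD-OIA label counts (4 actions, 21 explanations) are illustrative guesses, not the authors' architecture:

    import torch
    import torch.nn as nn

    class DecisionConditionedExplainer(nn.Module):
        """Illustrative only: predict the driving action from fused multi-scale
        features, then feed that prediction back into the explanation head."""
        def __init__(self, feat_dim=2048, n_actions=4, n_explanations=21):
            super().__init__()
            self.action_head = nn.Linear(feat_dim, n_actions)
            # The explanation head sees the visual features AND the predicted action scores.
            self.explanation_head = nn.Sequential(
                nn.Linear(feat_dim + n_actions, 512),
                nn.ReLU(),
                nn.Linear(512, n_explanations),
            )

        def forward(self, fused_features):
            action_logits = self.action_head(fused_features)            # decision branch
            decision = torch.sigmoid(action_logits)                     # soft decision vector
            expl_input = torch.cat([fused_features, decision], dim=-1)  # decision-aware conditioning
            explanation_logits = self.explanation_head(expl_input)
            return action_logits, explanation_logits

Writing it out makes the load-bearing premise below concrete: an explanation head wired this way cannot help but be consistent with the decision it is handed, which is exactly what the ablation under "What would settle it" should probe.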

Load-bearing premise

Routing the model's own driving decision into the explanation module produces genuinely better and more reliable explanations rather than explanations that are simply forced to match the decision by construction.

What would settle it

A controlled ablation that replaces the true driving decision with a random or mismatched decision before feeding it into the explanation module; if the Joint F1 score remains essentially unchanged, the decision input would not be causally contributing to explanation quality.
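
A minimal sketch of that ablation, assuming explanations can be re-generated with an arbitrary decision vector substituted for the model's own; the predict_explanations hook and the permutation scheme are hypothetical:

    import numpy as np
    from sklearn.metrics import f1_score

    def decision_swap_check(predict_explanations, images, decisions, expl_true, seed=0):
        """Score explanations produced with the model's own decisions against
        explanations produced with randomly mismatched decisions.

        predict_explanations(images, decisions) -> (N, K) binary array  (assumed hook)
        decisions: (N, A) array of the model's predicted decision vectors.
        """
        rng = np.random.default_rng(seed)
        expl_own = predict_explanations(images, decisions)
        shuffled = decisions[rng.permutation(len(decisions))]        # mismatched decisions
        expl_mismatched = predict_explanations(images, shuffled)

        f1_own = f1_score(expl_true, expl_own, average="micro")
        f1_mismatched = f1_score(expl_true, expl_mismatched, average="micro")
        # If the two scores are close, the decision input is doing little causal work.
        return f1_own, f1_mismatched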

Figures

Figures reproduced from arXiv: 2605.00291 by Amir Abbas Hamidi Imani, Maryam Sadat Hosseini Azad, Randy Goebel, Shahin Atakishiyev, Shahriar Baradaran Shokouhi.

Figure 1
Figure 1: Overview of the problem and its solution investigated in this study. The problem is to understand the driving decision of the ego-vehicle from a dashboard camera image and explain why that decision was made by this black-box model. The output includes the predicted next decision of the vehicle, and textual along with visual reasons behind the prediction. The decision confidence scores are shown as numbers … view at source ↗
Figure 2
Figure 2: The architecture of our proposed model. The structure consists of three main parts. The first part is a feature extractor using semantic segmentation, ResNet50, and a multi-scale feature extraction module for processing input RGB images. The results are then fed to the attention module, which has both channel and spatial attention with connections between the attention blocks. In the third part, the featur… view at source ↗
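
For orientation, a CBAM-style channel-then-spatial attention block of the kind the caption describes is sketched below in PyTorch; the exact wiring, the inter-block connections, and the multi-scale fusion of the paper's module are not reproduced, and the dimensions are assumptions:

    import torch
    import torch.nn as nn

    class ChannelSpatialAttention(nn.Module):
        """CBAM-style block for orientation only; not the paper's exact attention module."""
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.channel_mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction), nn.ReLU(),
                nn.Linear(channels // reduction, channels),
            )
            self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

        def forward(self, x):                                # x: (B, C, H, W)
            avg = x.mean(dim=(2, 3))                         # global average pooling per channel
            mx = x.amax(dim=(2, 3))                          # global max pooling per channel
            ch = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))[:, :, None, None]
            x = x * ch                                       # channel attention
            stats = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
            return x * torch.sigmoid(self.spatial_conv(stats))   # spatial attention
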
read the original abstract

The application of computer vision is gradually increasing across various domains. They employ deep learning models with a black-box nature. Without the ability to explain the behavior of neural networks, especially their decision-making processes, it is not possible to recognize their efficiency, predict system failures, or effectively implement them in real-world applications. Due to the inevitable use of deep learning in fully automated driving systems, many methods have been proposed to explain their behavior; however, they suffer from flawed reasoning and unreliable metrics, which have prevented a comprehensive understanding of complex models in autonomous vehicles and hindered the development of truly reliable systems. In this study, we propose a multi-scale attention-based model in which driving decisions are fed into the reasoning component to provide case-specific explanations for each decision simultaneously. For quantitative evaluation of our model's performance, we employ the F1-score metric, and also proposed a new metric called the Joint F1 score to demonstrate the accurate and reliable performance of the model in terms of Explainable Artificial Intelligence (XAI). In addition to the BDD-OIA dataset, the nu-AR dataset is utilized to further validate the generalization capability and robustness of the proposed network. The results demonstrate the superiority of our reasoning network over the classic and state-of-the-art models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes an end-to-end multi-scale attention-based model for explainable autonomous driving. Driving decisions are fed directly into the reasoning component to generate case-specific explanations simultaneously with the decisions. Quantitative evaluation relies on the standard F1-score plus a newly introduced Joint F1 score; the model is tested on the BDD-OIA and nu-AR datasets and claimed to outperform classic and state-of-the-art baselines.

Significance. If the Joint F1 metric can be shown to be non-circular and if ablations confirm that decision conditioning improves explanation fidelity rather than merely enforcing consistency, the approach could strengthen decision-explanation coupling in safety-critical vision systems. The use of two distinct datasets for generalization testing is a positive element.

major comments (3)
  1. [Abstract] Abstract: the superiority claim rests on F1 and a proposed Joint F1 score, yet neither the formula nor the computation of the Joint F1 score is supplied anywhere in the manuscript, rendering the central quantitative argument unverifiable.
  2. [Method] Method (reasoning component description): no ablation isolates the effect of feeding the driving decision token into the multi-scale attention module versus an otherwise identical architecture without that input; without this, it is impossible to distinguish genuine explanatory power from consistency-by-construction.
  3. [Experiments] Experiments: the manuscript asserts quantitative superiority but supplies no tables of per-class F1 scores, no Joint F1 values, no baseline comparisons, and no statistical significance tests, so the empirical support for the claims cannot be assessed.
minor comments (1)
  1. [Method] Notation for the multi-scale attention blocks is introduced without an accompanying diagram or explicit tensor dimensions, making the architecture description difficult to follow precisely.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions that will be incorporated to improve the manuscript.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the superiority claim rests on F1 and a proposed Joint F1 score, yet neither the formula nor the computation of the Joint F1 score is supplied anywhere in the manuscript, rendering the central quantitative argument unverifiable.

    Authors: We acknowledge the omission of an explicit formula. The Joint F1 metric is introduced in the Experiments section to jointly assess decision correctness and explanation alignment using independent ground-truth labels, but we will add a dedicated subsection with the full mathematical definition, computation steps, and pseudocode in the revised manuscript to ensure the metric is fully verifiable and non-circular. revision: yes

  2. Referee: [Method] Method (reasoning component description): no ablation isolates the effect of feeding the driving decision token into the multi-scale attention module versus an otherwise identical architecture without that input; without this, it is impossible to distinguish genuine explanatory power from consistency-by-construction.

    Authors: We agree that an explicit ablation is required. The revised manuscript will include a new ablation study that removes the driving decision token from the multi-scale attention module while keeping the rest of the architecture identical. Results will quantify the improvement in explanation quality attributable to decision conditioning. revision: yes

  3. Referee: [Experiments] Experiments: the manuscript asserts quantitative superiority but supplies no tables of per-class F1 scores, no Joint F1 values, no baseline comparisons, and no statistical significance tests, so the empirical support for the claims cannot be assessed.

    Authors: We apologize for the insufficient presentation of results. The revised version will add complete tables with per-class F1 scores, Joint F1 values for the proposed model and all baselines on both BDD-OIA and nu-AR, direct numerical comparisons, and statistical significance tests (paired t-tests with p-values) to substantiate all superiority claims. revision: yes
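
One conventional way to make the promised significance testing concrete, offered purely as an illustration of the kind of paired comparison that would substantiate the claims (the per-image granularity and the choice of a paired t-test are assumptions, not something the authors commit to):

    import numpy as np
    from scipy.stats import ttest_rel

    def paired_significance(per_image_f1_ours, per_image_f1_baseline):
        """Paired two-sided t-test over matched test images: returns the mean
        improvement of the proposed model and the associated p-value."""
        ours = np.asarray(per_image_f1_ours, dtype=float)
        base = np.asarray(per_image_f1_baseline, dtype=float)
        stat, p_value = ttest_rel(ours, base)
        return ours.mean() - base.mean(), p_value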

Circularity Check

2 steps flagged

Decision-conditioned explanations and custom Joint F1 metric risk consistency-by-construction

specific steps
  1. self definitional [Abstract]
    "we propose a multi-scale attention-based model in which driving decisions are fed into the reasoning component to provide case-specific explanations for each decision simultaneously"

    The explanations are produced by directly conditioning the reasoning component on the driving decision; therefore any alignment or 'case-specific' property between explanation and decision is guaranteed by the architecture rather than emerging from independent visual reasoning.

  2. fitted input called prediction [Abstract (evaluation)]
    "For quantitative evaluation of our model's performance, we employ the F1-score metric, and also proposed a new metric called the Joint F1 score to demonstrate the accurate and reliable performance of the model in terms of Explainable Artificial Intelligence (XAI)"

    A custom Joint F1 metric is introduced by the authors to validate their decision-aware model. Because the metric is new and its formula is not shown, it may be constructed around joint decision-explanation consistency, turning the reported XAI superiority into a direct consequence of the conditioning rather than external evidence.

full rationale

The paper's central architecture feeds the driving decision as input to the explanation/reasoning module, so case-specific explanations are aligned with the decision by design rather than derived independently from visual features. The authors then introduce a new Joint F1 metric (in addition to standard F1) specifically to quantify the model's XAI performance; absent an explicit formula or ablation isolating the decision token, this metric may be defined to reward the joint conditioning, making reported superiority reduce to the input choice. No self-citation chains, uniqueness theorems, or ansatzes are present, and standard F1 is also reported, so the circularity is partial rather than total.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the unstated assumption that attention mechanisms plus decision feedback produce faithful explanations and that the newly introduced Joint F1 score is an unbiased evaluator. No free parameters are enumerated in the abstract; the new metric itself functions as an invented evaluation entity.

axioms (1)
  • domain assumption Attention-based models can produce human-interpretable explanations when conditioned on the model's own decision output.
    Invoked by the proposal to feed decisions into the reasoning component.
invented entities (1)
  • Joint F1 score no independent evidence
    purpose: Quantitative metric that jointly evaluates driving decision accuracy and explanation quality.
    Newly proposed metric whose definition is not supplied in the abstract and whose independence from the model architecture is unverified.

pith-pipeline@v0.9.0 · 5551 in / 1365 out tokens · 43316 ms · 2026-05-09T19:39:55.088775+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

53 extracted references · 6 canonical work pages · 1 internal anchor

  1. [1]

    To explain or not to explain: A study on the necessity of explanations for autonomous vehicles,

    Y. Shen, S. Jiang, Y. Chen, and K. D. Campbell, "To explain or not to explain: A study on the necessity of explanations for autonomous vehicles," arXiv preprint arXiv:2006.11684, 2020

  2. [2]

    Effects of Explanation Specificity on Passengers in Autonomous Driving,

    D. Omeiza, R. Bhattacharyya, N. Hawes, M. Jirotka, and L. Kunze, "Effects of Explanation Specificity on Passengers in Autonomous Driving," arXiv preprint arXiv:2307.00633, 2023

  3. [3]

    Explainable AI for Safe and Trustworthy Autonomous Driving: A Systematic Review,

    A. Kuznietsov, B. Gyevnar, C. Wang, S. Peters, and S. V. Albrecht, "Explainable AI for Safe and Trustworthy Autonomous Driving: A Systematic Review," IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 12, pp. 19342-19364, 2024, doi: 10.1109/TITS.2024.3474469

  4. [4]

    A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts,

    G. Schwalbe and B. Finzel, "A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts," Data Mining and Knowledge Discovery, vol. 38, no. 5, pp. 3043-3101, 2024

  5. [5]

    A unified approach to interpreting model predictions,

    S. M. Lundberg and S.-I. Lee, "A unified approach to interpreting model predictions," Advances in neural information processing systems, vol. 30, 2017

  6. [6]

    Textual explanations for self-driving vehicles,

    J. Kim, A. Rohrbach, T. Darrell, J. Canny, and Z. Akata, "Textual explanations for self-driving vehicles," in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 563-578

  7. [7]

    GRIT: Fast, interpretable, and verifiable goal recognition with learned decision trees for autonomous driving,

    C. Brewitt, B. Gyevnar, S. Garcin, and S. V. Albrecht, "GRIT: Fast, interpretable, and verifiable goal recognition with learned decision trees for autonomous driving," in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021: IEEE, pp. 1023-1030

  8. [8]

    From spoken thoughts to automated driving commentary: Predicting and explaining intelligent vehicles’ actions,

    D. Omeiza, S. Anjomshoae, H. Webb, M. Jirotka, and L. Kunze, "From spoken thoughts to automated driving commentary: Predicting and explaining intelligent vehicles’ actions," in 2022 IEEE Intelligent Vehicles Symposium (IV), 2022: IEEE, pp. 1040-1047

  9. [9]

    Recent advancements in end-to-end autonomous driving using deep learning: A survey,

    P. S. Chib and P. Singh, "Recent advancements in end-to-end autonomous driving using deep learning: A survey," IEEE Transactions on Intelligent Vehicles, vol. 9, no. 1, pp. 103-118, 2023

  10. [10]

    Computer vision for autonomous vehicles: Problems, datasets and state of the art,

    J. Janai, F. Güney, A. Behl, and A. Geiger, "Computer vision for autonomous vehicles: Problems, datasets and state of the art," Foundations and Trends® in Computer Graphics and Vision, vol. 12, no. 1–3, pp. 1-308, 2020

  11. [11]

    Autonomous vehicle localization without prior high-definition map,

    S. Lee and J.-H. Ryu, "Autonomous vehicle localization without prior high-definition map," IEEE Transactions on Robotics, vol. 40, pp. 2888-2906, 2024

  12. [12]

    Fault Detection and Data-driven Optimal Adaptive Fault-tolerant Control for Autonomous Driving using Learning-based SMPC,

    Y. Lu, G. Li, Y. Yue, and Z. Wang, "Fault Detection and Data-driven Optimal Adaptive Fault-tolerant Control for Autonomous Driving using Learning-based SMPC," IEEE Transactions on Intelligent Vehicles, 2024

  13. [13]

    Motion planning for autonomous driving: The state of the art and future perspectives,

    S. Teng, X. Hu, P. Deng, B. Li, Y. Li, Y. Ai, D. Yang, L. Li, Z. Xuanyuan, and F. Zhu, "Motion planning for autonomous driving: The state of the art and future perspectives," IEEE Transactions on Intelligent Vehicles, vol. 8, no. 6, pp. 3692-3711, 2023

  14. [14]

    A survey of lateral stability criterion and control application for autonomous vehicles,

    Z. Zhu, X. Tang, Y. Qin, Y. Huang, and E. Hashemi, "A survey of lateral stability criterion and control application for autonomous vehicles," IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 10, pp. 10382-10399, 2023

  15. [15]

    Explanations in autonomous driving: A survey,

    D. Omeiza, H. Webb, M. Jirotka, and L. Kunze, "Explanations in autonomous driving: A survey," IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 8, pp. 10142-10162, 2021

  16. [16]

    Why did the AI make that decision? Towards an explainable artificial intelligence (XAI) for autonomous driving systems,

    J. Dong, S. Chen, M. Miralinaghi, T. Chen, P. Li, and S. Labi, "Why did the AI make that decision? Towards an explainable artificial intelligence (XAI) for autonomous driving systems," Transportation research part C: emerging technologies, vol. 156, p. 104358, 2023

  17. [17]

    Meaningful explanations of black box AI decision systems,

    D. Pedreschi, F. Giannotti, R. Guidotti, A. Monreale, S. Ruggieri, and F. Turini, "Meaningful explanations of black box AI decision systems," in Proceedings of the AAAI conference on artificial intelligence, 2019, vol. 33, no. 01, pp. 9780-9784

  18. [18]

    A survey on the explainability of supervised machine learning,

    N. Burkart and M. F. Huber, "A survey on the explainability of supervised machine learning," Journal of Artificial Intelligence Research, vol. 70, pp. 245-317, 2021

  19. [19]

    Data science principles for interpretable and explainable AI,

    K. Sankaran, "Data science principles for interpretable and explainable AI," arXiv preprint arXiv:2405.10552, 2024

  20. [20]

    Explainable object-induced action decision for autonomous vehicles,

    Y. Xu, X. Yang, L. Gong, H.-C. Lin, T.-Y. Wu, Y. Li, and N. Vasconcelos, "Explainable object-induced action decision for autonomous vehicles," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9523-9532

  21. [21]

    Multimodal-XAD: Explainable Autonomous Driving Based on Multimodal Environment Descriptions,

    Y. Feng, Z. Feng, W. Hua, and Y. Sun, "Multimodal-XAD: Explainable Autonomous Driving Based on Multimodal Environment Descriptions," IEEE Transactions on Intelligent Transportation Systems, 2024

  22. [22]

    Discovering the rationale of decisions: towards a method for aligning learning and reasoning,

    S. Cor, R. Silja, and V. Bart, "Discovering the rationale of decisions: towards a method for aligning learning and reasoning," in Proceedings of the 18th international conference on artificial intelligence and law, 2021, pp. 235-239

  23. [23]

    How should AI decisions be explained? Requirements for Explanations from the Perspective of European Law,

    B. Fresz, E. Dubovitskaya, D. Brajovic, M. F. Huber, and C. Horz, "How should AI decisions be explained? Requirements for Explanations from the Perspective of European Law," in Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 2024, vol. 7, pp. 438-450

  24. [24]

    " Why should i trust you?

    M. T. Ribeiro, S. Singh, and C. Guestrin, "" Why should i trust you?" Explaining the predictions of any classifier," in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 2016, pp. 1135-1144

  25. [25]

    Explainable artificial intelligence for autonomous driving: A comprehensive overview and field guide for future research directions,

    S. Atakishiyev, M. Salameh, H. Yao, and R. Goebel, "Explainable artificial intelligence for autonomous driving: A comprehensive overview and field guide for future research directions," IEEE Access, 2024

  26. [26]

    Interpretable Multi-Task Prediction Neural Network for Autonomous Vehicles,

    Q. Wang, H. Hu, B. Yang, L. Song, and C. Lv, "Interpretable Multi-Task Prediction Neural Network for Autonomous Vehicles," IEEE Transactions on Intelligent Transportation Systems, 2025

  27. [27]

    Hierarchical interpretable imitation learning for end-to-end autonomous driving,

    S. Teng, L. Chen, Y. Ai, Y. Zhou, Z. Xuanyuan, and X. Hu, "Hierarchical interpretable imitation learning for end-to-end autonomous driving," IEEE Transactions on Intelligent Vehicles, vol. 8, no. 1, pp. 673-683, 2022

  28. [28]

    Toward Explainable End-to-End Driving Models via Simplified Objectification Constraints,

    C. Zhang, D. Deguchi, J. Chen, and H. Murase, "Toward Explainable End-to-End Driving Models via Simplified Objectification Constraints," IEEE Transactions on Intelligent Transportation Systems, 2024

  29. [29]

    Interpretable end-to-end urban autonomous driving with latent deep reinforcement learning,

    J. Chen, S. E. Li, and M. Tomizuka, "Interpretable end-to-end urban autonomous driving with latent deep reinforcement learning," IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 6, pp. 5068-5078, 2021

  30. [30]

    Explaining autonomous driving actions with visual question answering,

    S. Atakishiyev, M. Salameh, H. Babiker, and R. Goebel, "Explaining autonomous driving actions with visual question answering," in 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), 2023: IEEE, pp. 1207-1214

  31. [31]

    Textual explanations for automated commentary driving,

    M. A. Kühn, D. Omeiza, and L. Kunze, "Textual explanations for automated commentary driving," in 2023 IEEE Intelligent Vehicles Symposium (IV), 2023: IEEE, pp. 1-6

  32. [32]

    Octet: Object-aware counterfactual explanations,

    M. Zemni, M. Chen, É. Zablocki, H. Ben-Younes, P. Pérez, and M. Cord, "Octet: Object-aware counterfactual explanations," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 15062-15071

  33. [33]

    Enhancing Interpretability of Autonomous Driving Via Human-Like Cognitive Maps: A Case Study on Lane Change,

    H. Lu, Y. Liu, M. Zhu, C. Lu, H. Yang, and Y. Wang, "Enhancing Interpretability of Autonomous Driving Via Human-Like Cognitive Maps: A Case Study on Lane Change," IEEE Transactions on Intelligent Vehicles, 2024

  34. [34]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778

  35. [35]

    Rethinking Atrous Convolution for Semantic Image Segmentation

    L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, "Rethinking atrous convolution for semantic image segmentation," arXiv preprint arXiv:1706.05587, 2017

  36. [36]

    Bdd100k: A diverse driving dataset for heterogeneous multitask learning,

    F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan, and T. Darrell, "Bdd100k: A diverse driving dataset for heterogeneous multitask learning," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 2636-2645

  37. [37]

    Still image action recognition based on interactions between joints and objects,

    S. S. Ashrafi, S. B. Shokouhi, and A. Ayatollahi, "Still image action recognition based on interactions between joints and objects," Multimedia Tools and Applications, vol. 82, no. 17, pp. 25945-25971, 2023

  38. [38]

    Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,

    L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs," IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 4, pp. 834-848, 2017

  39. [39]

    Cbam: Convolutional block attention module,

    S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "Cbam: Convolutional block attention module," in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 3-19

  40. [40]

    DCANet: Learning connected attentions for convolutional neural networks,

    X. Ma, J. Guo, S. Tang, Z. Qiao, Q. Chen, Q. Yang, and S. Fu, "DCANet: Learning connected attentions for convolutional neural networks," arXiv preprint arXiv:2007.05099, 2020

  41. [41]

    Grad-cam: Visual explanations from deep networks via gradient-based localization,

    R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-cam: Visual explanations from deep networks via gradient-based localization," in Proceedings of the IEEE international conference on computer vision, 2017, pp. 618-626

  42. [42]

    Sanity checks for saliency maps,

    J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow, M. Hardt, and B. Kim, "Sanity checks for saliency maps," Advances in neural information processing systems, vol. 31, 2018

  43. [43]

    Do explanations explain? Model knows best,

    A. Khakzar, P. Khorsandi, R. Nobahari, and N. Navab, "Do explanations explain? Model knows best," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10244-10253

  44. [44]

    An overview of multi-task learning,

    Y. Zhang and Q. Yang, "An overview of multi-task learning," National Science Review, vol. 5, no. 1, pp. 30-43, 2018

  45. [45]

    Nle-dm: Natural-language explanations for decision making of autonomous driving based on semantic scene understanding,

    Y. Feng, W. Hua, and Y. Sun, "Nle-dm: Natural-language explanations for decision making of autonomous driving based on semantic scene understanding," IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 9, pp. 9780-9791, 2023

  46. [46]

    nuscenes: A multimodal dataset for autonomous driving,

    H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, "nuscenes: A multimodal dataset for autonomous driving," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11621-11631

  47. [47]

    Imagenet: A large-scale hierarchical image database,

    J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "Imagenet: A large-scale hierarchical image database," in 2009 IEEE conference on computer vision and pattern recognition, 2009: Ieee, pp. 248-255

  48. [48]

    Deep object-centric policies for autonomous driving,

    D. Wang, C. Devin, Q.-Z. Cai, F. Yu, and T. Darrell, "Deep object-centric policies for autonomous driving," in 2019 International Conference on Robotics and Automation (ICRA), 2019: IEEE, pp. 8853-8859

  49. [49]

    Concept bottleneck models,

    P. W. Koh, T. Nguyen, Y. S. Tang, S. Mussmann, E. Pierson, B. Kim, and P. Liang, "Concept bottleneck models," in International conference on machine learning, 2020: PMLR, pp. 5338-5348

  50. [50]

    Concept bottleneck model with additional unsupervised concepts,

    Y. Sawada and K. Nakamura, "Concept bottleneck model with additional unsupervised concepts," IEEE Access, vol. 10, pp. 41758-41765, 2022

  51. [51]

    Attention-based interrelation modeling for explainable automated driving,

    Z. Zhang, R. Tian, R. Sherony, J. Domeyer, and Z. Ding, "Attention-based interrelation modeling for explainable automated driving," IEEE Transactions on Intelligent Vehicles, vol. 8, no. 2, pp. 1564-1573, 2022

  52. [52]

    Leveraging driver attention for an end-to-end explainable decision-making from frontal images,

    J. Araluce, L. M. Bergasa, M. Ocaña, Á. Llamazares, and E. López-Guillén, "Leveraging driver attention for an end-to-end explainable decision-making from frontal images," IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 8, pp. 10091-10102, 2024

  53. [53]

    XAI for Transparent Autonomous Vehicles: A New Approach to Understanding Decision-Making in Self-Driving Cars,

    M. S. H. Azad, A. A. H. Imani, and S. B. Shokouhi, "XAI for Transparent Autonomous Vehicles: A New Approach to Understanding Decision-Making in Self-Driving Cars," in 2024 14th International Conference on Computer and Knowledge Engineering (ICCKE), 2024: IEEE, pp. 194-199