Explainability Through Human-Centric Design for XAI in Lung Cancer Detection
Pith reviewed 2026-05-22 15:09 UTC · model grok-4.3
The pith
XpertXAI embeds expert clinical concepts into a bottleneck model to achieve higher accuracy and explanations that align with radiologist reasoning for lung cancer detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
XpertXAI is a generalizable expert-driven concept bottleneck model built on an InceptionV3 classifier that preserves human-interpretable clinical concepts while scaling from single-disease to multi-pathology detection on chest X-rays; when evaluated against post-hoc explainability techniques and the unsupervised XCBs baseline, it delivers superior predictive accuracy and concept-level explanations that align more closely with expert radiologist annotations and medical ground truth for lung cancer.
What carries the argument
The expert-guided concept bottleneck layer that converts raw image features into a small set of human-chosen clinical concepts before producing the final diagnosis, thereby enforcing interpretability while retaining accuracy.
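The bottleneck described above can be sketched as a two-stage forward pass. This is a minimal illustration, not the authors' implementation: the concept names, label names, weights, and the 4-dimensional feature vector (standing in for the InceptionV3 backbone output) are all hypothetical, and the use of independent sigmoid outputs for the multi-label case is an assumption.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear(weights, bias, inputs):
    """Return one pre-activation per output row: w . inputs + b."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

# Hypothetical concept and label names, chosen only for illustration.
CONCEPTS = ["mass", "nodule", "pleural_effusion"]
LABELS = ["lung_cancer", "other_pathology"]

def concept_bottleneck_forward(features, w_concept, b_concept, w_label, b_label):
    """features -> concept probabilities -> label probabilities.

    The label head sees only the concept probabilities, never the raw
    features, which is what makes the concept layer a bottleneck.
    """
    concept_probs = [sigmoid(z) for z in linear(w_concept, b_concept, features)]
    label_probs = [sigmoid(z) for z in linear(w_label, b_label, concept_probs)]
    return dict(zip(CONCEPTS, concept_probs)), dict(zip(LABELS, label_probs))

# Toy feature vector standing in for backbone output (assumption).
features = [0.5, -1.0, 2.0, 0.0]
w_concept = [[1.0, 0.0, 0.5, 0.0],
             [0.0, 1.0, 0.0, 0.0],
             [0.0, 0.0, -1.0, 1.0]]
b_concept = [0.0, 0.0, 0.0]
w_label = [[2.0, 1.0, 0.0],
           [0.0, 0.0, 2.0]]
b_label = [-1.0, -1.0]

concepts, labels = concept_bottleneck_forward(
    features, w_concept, b_concept, w_label, b_label)
```

Because every diagnosis is computed from the concept probabilities alone, inspecting `concepts` shows which clinical findings drove a given entry in `labels`.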
If this is right
- Post-hoc explainability methods commonly omit key diagnostic features and conflict with radiologist judgments on chest X-rays.
- Concept-level outputs remain clinically meaningful even after the model is trained for multiple lung pathologies at the same time.
- Human-centric bottleneck design improves both accuracy and explanation quality compared with unsupervised concept models.
- The same architecture supplies a template for extending interpretable AI beyond lung cancer to wider diagnostic tasks.
Where Pith is reading between the lines
- The same expert-concept approach could transfer to other imaging modalities such as CT or MRI if suitable clinical concepts are identified.
- Hospitals might reduce reliance on separate explanation modules by training models with built-in concept layers from the start.
- If the chosen concepts prove robust, the method could shorten regulatory review for medical AI by making decision logic more transparent upfront.
Load-bearing premise
Expert-selected clinical concepts placed in the bottleneck layer accurately capture how radiologists reason about diagnoses and continue to work when the model expands from one lung condition to several at once.
What would settle it
A direct comparison in which radiologists review the same cases and select different diagnostic concepts than those hard-coded in the model, or a test showing that predictive accuracy falls when the model is forced to handle multiple pathologies without those concepts.
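One way to quantify the first comparison is a set-overlap score between the concepts the model reports and the concepts radiologists select for the same case. The sketch below uses the Jaccard index, which is an assumption made for illustration; the review does not state which alignment metric the paper uses, and the cases shown are invented.

```python
def jaccard(predicted, annotated):
    """Overlap between two concept sets: |A & B| / |A | B|."""
    predicted, annotated = set(predicted), set(annotated)
    if not predicted and not annotated:
        return 1.0  # both empty: treated here as full agreement
    return len(predicted & annotated) / len(predicted | annotated)

def mean_alignment(cases):
    """Average Jaccard over (model_concepts, radiologist_concepts) pairs."""
    scores = [jaccard(m, r) for m, r in cases]
    return sum(scores) / len(scores)

# Made-up cases for illustration only; not data from the paper.
cases = [
    ({"mass", "nodule"}, {"mass", "nodule"}),    # full agreement
    ({"mass"}, {"mass", "pleural_effusion"}),    # partial agreement
    ({"nodule"}, {"pleural_effusion"}),          # no agreement
]
score = mean_alignment(cases)
```

A consistently low score across cases would suggest that the concepts fixed in the bottleneck differ from the ones radiologists actually rely on.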
Original abstract
Deep learning models have shown promise in lung pathology detection from chest X-rays, but widespread clinical adoption remains limited due to opaque model decision-making. In prior work, we introduced ClinicXAI, a human-centric, expert-guided concept bottleneck model (CBM) designed for interpretable lung cancer diagnosis. We now extend that approach and present XpertXAI, a generalizable expert-driven model that preserves human-interpretable clinical concepts while scaling to detect multiple lung pathologies. Using a high-performing InceptionV3-based classifier and a public dataset of chest X-rays with radiology reports, we compare XpertXAI against leading post-hoc explainability methods and an unsupervised CBM, XCBs. We assess explanations through comparison with expert radiologist annotations and medical ground truth. Although XpertXAI is trained for multiple pathologies, our expert validation focuses on lung cancer. We find that existing techniques frequently fail to produce clinically meaningful explanations, omitting key diagnostic features and disagreeing with radiologist judgments. XpertXAI not only outperforms these baselines in predictive accuracy but also delivers concept-level explanations that better align with expert reasoning. While our focus remains on explainability in lung cancer detection, this work illustrates how human-centric model design can be effectively extended to broader diagnostic contexts - offering a scalable path toward clinically meaningful explainable AI in medical diagnostics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces XpertXAI as an extension of the authors' prior ClinicXAI work: a human-centric concept bottleneck model that incorporates expert-selected clinical concepts to enable interpretable detection of multiple lung pathologies from chest X-rays. It claims that XpertXAI outperforms post-hoc explainability baselines and an unsupervised CBM (XCBs) in predictive accuracy while producing concept-level explanations that better align with radiologist annotations and medical ground truth, although the expert validation is restricted to lung cancer despite multi-pathology training.
Significance. If the quantitative results hold, the work demonstrates a viable path for scaling expert-guided concept bottleneck models to multi-disease settings in medical imaging, which could improve clinical trust and adoption of deep learning for chest X-ray analysis by preserving human-interpretable concepts.
major comments (1)
- [Abstract] The central claim that XpertXAI provides a scalable, generalizable approach for multiple lung pathologies is load-bearing yet unsupported, because the manuscript states that expert validation and comparison against radiologist annotations focus solely on lung cancer. This leaves the behavior of the expert-selected concepts in the bottleneck layer untested under multi-label conditions where concept co-occurrence and diagnostic interactions differ from the single-disease case.
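The co-occurrence concern in this comment can be made concrete by counting which concept pairs appear together in each setting. The following is a hedged sketch with invented concept names and cases; it is not drawn from the manuscript's data and only illustrates the kind of check that multi-label validation would need.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(cases):
    """Count how often each unordered pair of concepts appears together."""
    counts = Counter()
    for concepts in cases:
        for pair in combinations(sorted(concepts), 2):
            counts[pair] += 1
    return counts

# Hypothetical annotated cases (assumption, for illustration only).
single_disease = [{"mass", "nodule"}, {"mass"}, {"mass", "nodule"}]
multi_label = [
    {"mass", "pleural_effusion"},
    {"consolidation", "pleural_effusion"},
    {"mass", "nodule"},
]

single_counts = cooccurrence(single_disease)
multi_counts = cooccurrence(multi_label)

# Concept pairs seen only under multi-label conditions.
unseen_pairs = sorted(set(multi_counts) - set(single_counts))
```

Any pair in `unseen_pairs` is a combination the single-disease validation never exercised, which is the gap the referee identifies.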
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and for identifying an important point about the scope of our claims. We address the major comment below and have revised the manuscript to ensure the claims accurately reflect the presented evidence.
Point-by-point responses
- Referee: The central claim that XpertXAI provides a scalable, generalizable approach for multiple lung pathologies is load-bearing yet unsupported, because the manuscript states that expert validation and comparison against radiologist annotations focus solely on lung cancer. This leaves the behavior of the expert-selected concepts in the bottleneck layer untested under multi-label conditions where concept co-occurrence and diagnostic interactions differ from the single-disease case.
Authors: We thank the referee for this observation. The manuscript already states that expert validation is restricted to lung cancer, and we do not claim to have conducted radiologist annotation comparisons across all pathologies. XpertXAI is trained in a multi-label setting on a dataset containing multiple lung pathologies, with expert-selected concepts chosen for their relevance to the broader diagnostic task; predictive performance improvements are reported on this multi-pathology objective. We agree that direct testing of concept alignment and interactions under full multi-label conditions for additional diseases would provide stronger support for generalizability. To address this, we have revised the abstract, introduction, and discussion sections to qualify the generalizability statement as an illustration of the approach in a multi-pathology training regime, with detailed expert alignment demonstrated for lung cancer, while explicitly noting the limitation and outlining future multi-disease validation plans. These changes make the claims more precise without overstating the current results.
Revision: yes
Circularity Check
No significant circularity detected
The paper extends prior ClinicXAI work by the same authors to a multi-pathology setting but grounds its central claims in new empirical results: training an InceptionV3-based CBM on a public chest X-ray dataset, comparing predictive accuracy against post-hoc methods and an unsupervised CBM baseline, and assessing concept-level explanations via direct comparison to independent expert radiologist annotations and medical ground truth. These steps rely on external data and annotations rather than reducing by construction to the prior paper's definitions or fitted parameters. The explicit note that expert validation focuses on lung cancer is a scope limitation, not a definitional loop. No load-bearing step equates a reported prediction or alignment result to its own inputs via self-citation or ansatz.
Axiom & Free-Parameter Ledger
free parameters (1)
- Expert-selected clinical concepts
axioms (1)
- Domain assumption: Expert radiologist annotations provide a reliable proxy for clinical ground truth when evaluating explanation quality.