arXiv: 2312.05975 · v3 · pith:GJ4YWULXnew · submitted 2023-12-10 · 💻 cs.CV · cs.AI · cs.LG

FM-G-CAM: A Holistic Approach for Explainable AI in Computer Vision

Pith reviewed 2026-05-24 04:38 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords explainable AI · Grad-CAM · CNN · saliency maps · multi-class explanation · computer vision · model interpretability · activation maps

The pith

FM-G-CAM creates explanations for CNN predictions by fusing saliency maps from the top predicted classes instead of using only one target class.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard explanation tools for convolutional neural networks rely on Grad-CAM and examine only a single chosen class, which leaves out a large portion of the information the model uses when it produces its output. The paper introduces FM-G-CAM to compute and combine activation maps across the model's top predictions, yielding one map intended to reflect more of the decision process. This matters because many image classification tasks involve several plausible categories whose contributions single-class maps hide from users. The authors supply a mathematical description, side-by-side comparisons with Grad-CAM, and an open-source library so the method can be applied directly to existing CNNs.

Core claim

Existing methods for explaining CNN predictions are largely based on Gradient-weighted Class Activation Maps (Grad-CAM) and focus solely on a single target class; this assumption about the target class selection neglects a large portion of the predictor CNN's prediction process. FM-G-CAM considers multiple top-predicted classes and provides a holistic explanation of the predictor CNN's rationale by fusing their individual saliency information.
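
For reference, the single-class baseline that this claim argues against can be written out. The first two expressions are the standard Grad-CAM formulation; the symbols (A^k for the k-th feature map of the chosen convolutional layer, y^c for the pre-softmax score of class c, Z for the number of spatial positions) follow the usual Grad-CAM notation rather than anything stated on this page. The last line is only a placeholder for the multi-class step: the fusion operator Φ and the index set T_K of the top-K predicted classes are assumed notation, because the page does not reproduce the paper's own formula.

```latex
% Standard Grad-CAM for a single target class c
\[
\alpha_k^{c} = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^{c}}{\partial A_{ij}^{k}},
\qquad
L^{c}_{\text{Grad-CAM}} = \operatorname{ReLU}\!\Big( \sum_{k} \alpha_k^{c} A^{k} \Big)
\]

% Assumed notation: \Phi is an unspecified fusion operator over the
% top-K predicted classes T_K (not taken from the source).
\[
L_{\text{fused}} = \Phi\big( \{\, L^{c}_{\text{Grad-CAM}} : c \in T_K \,\} \big)
\]
```

Written this way, the single-class method is the special case where T_K contains only the chosen target class.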

What carries the argument

Fused Multi-class Gradient-weighted Class Activation Map (FM-G-CAM), which aggregates saliency maps computed for the top-k predicted classes to produce a single explanation.
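
The aggregation step can be sketched in a few lines. The page does not give the fusion formula, so the rule below is an illustrative assumption rather than the authors' algorithm: negative values are clipped as in Grad-CAM, each pixel keeps the value of its strongest class, and the result is rescaled to [0, 1]. The function name `fuse_saliency_maps` and the toy maps are hypothetical. Plain nested lists keep the sketch dependency-free.

```python
def fuse_saliency_maps(class_maps):
    """Fuse K per-class saliency maps into one map.

    class_maps: list of K maps, each a list of H rows of W floats.
    Assumed rule (illustrative, not confirmed by the source):
      1. clip negatives to zero (ReLU), as Grad-CAM does per class;
      2. at each pixel keep only the strongest class;
      3. rescale the fused map so its peak is 1.
    Returns (fused, winner); winner[i][j] is the index of the class
    that supplied pixel (i, j).
    """
    height, width = len(class_maps[0]), len(class_maps[0][0])
    fused = [[0.0] * width for _ in range(height)]
    winner = [[0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            values = [max(m[i][j], 0.0) for m in class_maps]
            # Ties resolve to the lowest class index.
            best = max(range(len(values)), key=values.__getitem__)
            fused[i][j] = values[best]
            winner[i][j] = best
    peak = max(max(row) for row in fused)
    if peak > 0:
        fused = [[v / peak for v in row] for row in fused]
    return fused, winner


# Toy 2x2 maps for three classes, ordered by predicted rank.
maps = [
    [[0.9, 0.1], [0.0, 0.2]],
    [[0.2, 0.6], [0.0, 0.1]],
    [[0.1, 0.3], [-0.5, 0.45]],
]
fused, winner = fuse_saliency_maps(maps)
```

Returning the `winner` index alongside the fused intensities is a design choice of this sketch: it lets a viewer colour each region by the class that dominates it, which is one way a single map could still show several classes.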

If this is right

  • Explanations now incorporate evidence from several competing classes rather than isolating one.
  • Quantitative and qualitative comparisons with Grad-CAM are presented to show the method's benefits in practical image-classification use cases.
  • An open-source Python library allows direct generation of the fused maps for predictions of CNN-based models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The fusion step could surface shared visual features that multiple top classes rely on, something single-class maps cannot show.
  • The same multi-class idea might be applied to other gradient-based explanation techniques beyond Grad-CAM.
  • In safety-critical settings the method could flag cases where the model's top predictions rest on overlapping but distinct image regions.

Load-bearing premise

Fusing saliency information from only the top-predicted classes is sufficient to capture the full rationale of the CNN's prediction process.

What would settle it

A controlled test in which adding saliency maps from classes ranked below the top k produces a visibly different fused map that better matches human judgments of the model's actual reasoning.
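
The mechanical half of such a test can be sketched as follows: fuse the top k maps, fuse again with some lower-ranked classes added, and measure how much the map moves. The per-pixel maximum used as the fusion rule, the function names, and the toy maps are all assumptions for illustration. The comparison against human judgments, which the test above also requires, is outside what code like this can show.

```python
def fuse_max(class_maps):
    """Per-pixel maximum over ReLU-clipped class maps (assumed fusion rule)."""
    height, width = len(class_maps[0]), len(class_maps[0][0])
    return [
        [max(max(m[i][j], 0.0) for m in class_maps) for j in range(width)]
        for i in range(height)
    ]


def mean_abs_change(ranked_maps, k, extra):
    """Mean absolute pixel change when `extra` lower-ranked classes join the top k.

    ranked_maps must be ordered from the highest to the lowest predicted class.
    """
    base = fuse_max(ranked_maps[:k])
    wider = fuse_max(ranked_maps[:k + extra])
    diffs = [
        abs(a - b)
        for row_a, row_b in zip(base, wider)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


ranked_maps = [
    [[0.9, 0.1], [0.0, 0.2]],  # rank 1
    [[0.2, 0.6], [0.0, 0.1]],  # rank 2
    [[0.1, 0.3], [0.8, 0.1]],  # rank 3: highlights a region ranks 1-2 ignore
]
change = mean_abs_change(ranked_maps, k=2, extra=1)
```

A change near zero over many images would support the premise that the top k suffice under this fusion rule; a large change, as in this toy case where the rank-3 class lights up an otherwise dark pixel, would count against it.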

Figures

Figures reproduced from arXiv: 2312.05975 by Jordan J. Bird, Ravidu Suien Rammuni Silva.

Figure 1: FM-G-CAM for general image classification tasks ag… [figures/full_fig_p002_1.png]
Figure 2: The process of generating FM-G-CAM. [The extracted caption runs on into the paper's Section 3.3, "Choosing an optimal value for (K)": Unlike other saliency map generation methods, FM-G-CAM utilises multiple classes to generate the final saliency map. We recommend choosing the top predicted classes to be used. The top 3 or top 5 classes are recommended to be used with FM-G-CAM even though the algorithm allows any number of arbitrarily chosen classes to be used in th…]
Figure 3: Effect of L2 Norm for saliency map generation in FM-… [figures/full_fig_p007_3.png]
Figure 4: Overview of the XAI Inference Engine [figures/full_fig_p008_4.png]
Figure 5: Overview of the XAI Inference Engine. Column 2 show… [figures/full_fig_p008_5.png]
Figure 6: Results of the quantitative evaluation tests. [figures/full_fig_p010_6.png]
Figure 7: FM-G-CAM for general image classification tasks in… [figures/full_fig_p011_7.png]
Figure 8: FM-G-CAM for general image classification tasks in… [figures/full_fig_p012_8.png]
Original abstract

Explainability is a vital aspect of modern AI for real-world impact and usability. The main objective of this paper is to emphasise the need to understand the predictions of Computer Vision models, specifically Convolutional Neural Network (CNN) models. Existing methods for explaining CNN predictions are largely based on Gradient-weighted Class Activation Maps (Grad-CAM) and focus solely on a single target class; this assumption about the target class selection neglects a large portion of the predictor CNN's prediction process. In this paper, we present an exhaustive methodology, called Fused Multi-class Gradient-weighted Class Activation Map (FM-G-CAM), that considers multiple top-predicted classes and provides a holistic explanation of the predictor CNN's rationale. We also provide a detailed mathematical and algorithmic description of our method. Furthermore, alongside a concise comparison of existing methods, we compare FM-G-CAM with Grad-CAM, quantitatively and qualitatively highlighting its benefits through real-world practical use cases. Finally, we present an open-source Python library with an FM-G-CAM implementation to conveniently generate saliency maps for CNN-based model predictions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript claims that single-class Grad-CAM methods neglect a large portion of a CNN's prediction process by focusing only on one target class. It introduces FM-G-CAM, which fuses saliency maps from multiple top-predicted classes to deliver a holistic explanation of the model's rationale, accompanied by a mathematical and algorithmic description, quantitative/qualitative comparisons to Grad-CAM on real-world cases, and an open-source Python library implementation.

Significance. If the central claim is substantiated, the work could advance XAI in computer vision by addressing the incompleteness of single-class saliency maps, potentially improving model debugging and user trust. The open-source library is a clear strength for reproducibility and adoption.

major comments (1)
  1. [Abstract] Abstract and the motivation section: the claim that fusing saliency information from only the top-predicted classes yields a 'holistic' explanation of the CNN's full rationale is load-bearing but unsupported. No ablation, analysis, or argument is provided showing that classes outside the top-k contribute negligibly (as opposed to lower-ranked classes, internal activations, or alternative selection/fusion strategies), leaving the 'holistic' descriptor as an assertion rather than a demonstrated necessity or sufficiency.
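
One inexpensive first check in the direction the referee asks for is to measure how much predicted probability sits outside the top k. The sketch below is an assumption-laden proxy, not an analysis from the paper: the function names and the example logits are hypothetical, and a small residual probability does not by itself show that the lower-ranked classes have negligible saliency.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mass_outside_top_k(logits, k):
    """Probability mass assigned to classes ranked below the top k."""
    probs = sorted(softmax(logits), reverse=True)
    return sum(probs[k:])


# Hypothetical logits for a 5-class model with two strong candidates.
logits = [4.0, 3.5, 1.0, 0.5, 0.0]
residual = {k: mass_outside_top_k(logits, k) for k in (1, 3, 5)}
```

For these logits a single-class view (k = 1) leaves a substantial share of the probability unaccounted for, while k = 3 leaves only a few percent. That pattern is the kind of evidence an ablation could report per dataset.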

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. The single major comment is addressed point-by-point below. We agree that the 'holistic' framing requires qualification and will revise accordingly.

Point-by-point responses
  1. Referee: [Abstract] Abstract and the motivation section: the claim that fusing saliency information from only the top-predicted classes yields a 'holistic' explanation of the CNN's full rationale is load-bearing but unsupported. No ablation, analysis, or argument is provided showing that classes outside the top-k contribute negligibly (as opposed to lower-ranked classes, internal activations, or alternative selection/fusion strategies), leaving the 'holistic' descriptor as an assertion rather than a demonstrated necessity or sufficiency.

    Authors: We agree that the manuscript does not provide ablations or formal arguments demonstrating that classes outside the top-k contribute negligibly, nor does it compare against alternative selection or fusion strategies. The motivation section instead rests on the observation that single-class Grad-CAM omits contributions from other high-probability classes that participate in the model's output. The term 'holistic' was intended to contrast with single-class methods rather than to assert completeness over all possible classes or internal activations. We will revise the abstract and motivation section to remove or qualify the 'holistic' descriptor, explicitly stating the scope as fusion over the top-k predictions, and add a limitations paragraph discussing the lack of analysis on lower-ranked classes and alternative strategies.

    revision: yes

Circularity Check

0 steps flagged

No significant circularity; method is an explicit algorithmic extension

Full rationale

The paper defines FM-G-CAM directly as the fusion of Grad-CAM saliency maps computed on the top-k predicted classes, with an explicit mathematical and algorithmic description provided. No parameter fitting occurs, no predictions are claimed that reduce to the inputs by construction, and no load-bearing self-citations or uniqueness theorems are invoked. The proposal is self-contained as a methodological construction built on standard gradient computations, consistent with the reader's assessment of score 2.0 as the upper bound for a direct algorithmic contribution without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The method rests on standard assumptions from gradient-based explainability in CNNs; no free parameters, new entities, or ad-hoc axioms are introduced beyond the domain assumption that top classes suffice for holistic views.

axioms (1)
  • domain assumption: Gradient-weighted class activation maps computed per class can be meaningfully fused to represent the overall prediction rationale of a CNN.
    Invoked in the abstract's description of the FM-G-CAM methodology as an improvement over single-class Grad-CAM.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Efficient KernelSHAP Explanations for Patch-based 3D Medical Image Segmentation

    cs.CV · 2026-04 · unverdicted · novelty 5.0

    An optimized KernelSHAP method for 3D medical image segmentation restricts computation to ROI and receptive fields, uses patch logit caching for 15-30% savings, and compares organ units versus supervoxels for clinical...
