
arxiv: 2512.21372 · v2 · submitted 2025-12-24 · 📡 eess.IV · cs.CV

A Graph-Augmented knowledge Distillation based Dual-Stream Vision Transformer with Region-Aware Attention for Gastrointestinal Disease Classification with Explainable AI

Pith reviewed 2026-05-16 19:55 UTC · model grok-4.3

keywords gastrointestinal disease classification · vision transformer · knowledge distillation · explainable AI · wireless capsule endoscopy · dual-stream model · region-aware attention · graph augmentation

The pith

A dual-stream Vision Transformer with knowledge distillation classifies gastrointestinal diseases at 99.78 percent accuracy while remaining interpretable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a hybrid dual-stream framework in which a large teacher network, combining Swin Transformer global context with Vision Transformer local detail, distills its knowledge into a compact Tiny-ViT student. Graph augmentation and region-aware attention are incorporated to sharpen focus on clinically relevant image regions during distillation. On two balanced Wireless Capsule Endoscopy datasets the student reaches accuracies of 0.9978 and 0.9928 with an average AUC of 1.0000. Grad-CAM, LIME, and Score-CAM visualizations are reported to confirm that the model attends to pathologically meaningful tissue features rather than artifacts. The resulting compact model preserves diagnostic performance while reducing computational cost for potential clinical deployment.

Core claim

The graph-augmented knowledge-distillation dual-stream Vision Transformer with region-aware attention transfers semantic and morphological knowledge from a high-capacity teacher to a lightweight Tiny-ViT student, producing near-perfect classification accuracies of 0.9978 and 0.9928 together with an average AUC of 1.0000 on two curated Wireless Capsule Endoscopy datasets while supplying clinically aligned explanations via Grad-CAM, LIME, and Score-CAM.
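
To make the pairing of the accuracy and AUC figures concrete, here is a minimal sketch of how a macro-averaged one-vs-rest AUC can be computed. The averaging scheme is an assumption, since the abstract does not say how the "average AUC" was formed, and the toy probabilities are hypothetical. The point it illustrates is that an AUC of 1.0 only requires every positive to outrank every negative per class, so it can coexist with an argmax accuracy below 1.

```python
def binary_auc(scores, labels):
    """AUC as P(score_pos > score_neg), counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(prob_rows, true_classes, num_classes):
    """Unweighted mean of the one-vs-rest AUC over all classes (assumed scheme)."""
    aucs = []
    for c in range(num_classes):
        scores = [row[c] for row in prob_rows]
        labels = [1 if t == c else 0 for t in true_classes]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / num_classes


def accuracy(prob_rows, true_classes):
    """Fraction of samples whose argmax class matches the true class."""
    hits = sum(1 for row, t in zip(prob_rows, true_classes)
               if row.index(max(row)) == t)
    return hits / len(true_classes)


# Hypothetical toy predictions for a 3-class problem (not data from the paper).
probs = [
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.20, 0.50, 0.30],
    [0.45, 0.35, 0.20],  # true class 1, but argmax picks class 0
    [0.10, 0.20, 0.70],
    [0.20, 0.20, 0.60],
]
truth = [0, 0, 1, 1, 2, 2]

print("macro AUC:", macro_ovr_auc(probs, truth, 3))  # 1.0 on this toy data
print("accuracy :", accuracy(probs, truth))          # 5/6 on this toy data
```

On this toy data the ranking within each class is perfect while one sample is still misclassified, which is the same qualitative pattern as an AUC of 1.0000 alongside accuracies of 0.9978 and 0.9928.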

What carries the argument

Dual-stream teacher-student architecture in which the teacher fuses Swin Transformer global reasoning with Vision Transformer fine-grained extraction, augmented by graph structures and region-aware attention, then distilled via soft-label supervision into a compact Tiny-ViT student.
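
As an illustration of soft-label supervision, the sketch below implements the widely used temperature-scaled distillation loss: a weighted sum of hard-label cross-entropy and the KL divergence between softened teacher and student distributions. This is the standard formulation, not necessarily the paper's exact loss. The names `distillation_loss`, `temperature`, and `alpha`, and their default values, are assumptions; the paper does not report these hyperparameters.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_class,
                      temperature=4.0, alpha=0.5):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher || student).

    temperature and alpha are hypothetical defaults, not values from the paper.
    """
    # Hard-label term: cross-entropy of the student at temperature 1.
    p_student = softmax(student_logits)
    cross_entropy = -math.log(p_student[true_class])

    # Soft-label term: KL divergence between the softened distributions.
    p_teacher_soft = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher_soft, p_student_soft) if pt > 0)

    # The T^2 factor keeps the soft-term gradient scale comparable across T.
    return alpha * cross_entropy + (1 - alpha) * temperature ** 2 * kl


# Hypothetical logits for a single image and a 3-class problem.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]
print(distillation_loss(student, teacher, true_class=0))
```

A higher temperature flattens the teacher distribution, so the student also learns from the relative probabilities the teacher assigns to the wrong classes.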

If this is right

  • The compact student model enables faster inference suitable for resource-constrained endoscopic equipment.
  • Attention-based explanations allow clinicians to verify that predictions rest on tissue morphology rather than imaging artifacts.
  • The distillation process reduces the data and compute needed to reach high accuracy compared with training a large transformer from scratch.
  • The framework can be extended to additional endoscopic modalities while retaining the same teacher-student structure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the same distillation recipe works across other medical imaging domains, it could reduce the need for massive labeled datasets in radiology and pathology.
  • Region-aware attention may generalize to tasks requiring precise localization, such as polyp detection or lesion segmentation.
  • The near-perfect AUC suggests the model could serve as a reliable first-pass filter that flags only ambiguous cases for human review.
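
The first-pass filter idea in the last bullet can be sketched as a simple confidence threshold on the model's class probabilities. This is a hypothetical illustration of the editorial extension, not something the paper implements; the function name `triage` and the 0.95 threshold are assumptions.

```python
def triage(prob_rows, confidence_threshold=0.95):
    """Split cases into auto-accepted predictions and cases flagged for review.

    Returns (accepted, flagged): accepted holds (case_index, predicted_class)
    pairs, flagged holds the indices of low-confidence cases.
    """
    accepted, flagged = [], []
    for index, row in enumerate(prob_rows):
        top_class = max(range(len(row)), key=row.__getitem__)
        if row[top_class] >= confidence_threshold:
            accepted.append((index, top_class))
        else:
            flagged.append(index)
    return accepted, flagged


# Hypothetical class probabilities for three images.
cases = [[0.99, 0.01], [0.60, 0.40], [0.02, 0.98]]
accepted, flagged = triage(cases)
print(accepted, flagged)
```

Such a filter is only as trustworthy as the calibration of the probabilities, which the abstract does not address.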

Load-bearing premise

The two carefully chosen Wireless Capsule Endoscopy datasets are balanced across the major disease classes and free of systematic inter-sample bias.

What would settle it

An independent test set of Wireless Capsule Endoscopy images, drawn from different sources or with unbalanced class distributions, would falsify the claim of near-perfect discriminative power grounded in clinically relevant features if model accuracy fell below 95 percent on it, or if Grad-CAM and LIME maps consistently highlighted non-pathological regions.

Original abstract

The accurate classification of gastrointestinal diseases from endoscopic and histopathological imagery remains a significant challenge in medical diagnostics, mainly due to the vast data volume and subtle variation in inter-class visuals. This study presents a hybrid dual-stream deep learning framework built on teacher-student knowledge distillation, where a high-capacity teacher model integrates the global contextual reasoning of a Swin Transformer with the local fine-grained feature extraction of a Vision Transformer. The student network was implemented as a compact Tiny-ViT structure that inherits the teacher's semantic and morphological knowledge via soft-label distillation, achieving a balance between efficiency and diagnostic accuracy. Two carefully curated Wireless Capsule Endoscopy datasets, encompassing major GI disease classes, were employed to ensure balanced representation and prevent inter-sample bias. The proposed framework achieved remarkable performance with accuracies of 0.9978 and 0.9928 on Dataset 1 and Dataset 2 respectively, and an average AUC of 1.0000, signifying near-perfect discriminative capability. Interpretability analyses using Grad-CAM, LIME, and Score-CAM confirmed that the model's predictions were grounded in clinically significant tissue regions and pathologically relevant morphological cues, validating the framework's transparency and reliability. The Tiny-ViT demonstrated diagnostic performance with reduced computational complexity comparable to its transformer-based teacher while delivering faster inference, making it suitable for resource-constrained clinical environments. Overall, the proposed framework provides a robust, interpretable, and scalable solution for AI-assisted GI disease diagnosis, paving the way toward future intelligent endoscopic screening that is compatible with clinical practicality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a hybrid dual-stream Vision Transformer framework that combines a high-capacity teacher (Swin Transformer + ViT) with knowledge distillation to a compact Tiny-ViT student for gastrointestinal disease classification on Wireless Capsule Endoscopy images. It reports accuracies of 0.9978 and 0.9928 on two curated datasets together with an average AUC of 1.0000, and supports the predictions with Grad-CAM, LIME, and Score-CAM visualizations.

Significance. If the reported performance is shown to arise from genuine generalization rather than leakage, the work would offer a practical, interpretable, and computationally lighter alternative for clinical GI endoscopy screening. The use of multi-method explainability and explicit attention to model efficiency are constructive elements that align with clinical needs.

major comments (2)
  1. [Abstract] Abstract: the reported accuracies (0.9978 / 0.9928) and mean AUC of 1.0000 are presented without any accompanying dataset statistics (image counts, class balance, patient numbers), split protocol (patient-level vs. image-level), cross-validation procedure, or overlap checks. These omissions are load-bearing because perfect AUC on medical imaging tasks is statistically implausible without leakage or insufficient diversity; the central claim of “near-perfect discriminative capability” cannot be evaluated from the given information.
  2. [Dataset description] Dataset description (assumed §3 or §4): the statement that the two WCE datasets were “carefully curated … to ensure balanced representation and prevent inter-sample bias” is not accompanied by quantitative evidence such as patient counts per split, duplicate-image statistics, or external validation results. This directly affects the credibility of the AUC=1.0000 result highlighted in the skeptic note.
minor comments (2)
  1. [Abstract] Abstract: the phrase “graph-augmented knowledge distillation” appears in the title but is not elaborated in the provided abstract; a brief clarification of the graph component would improve readability.
  2. [Results] The manuscript should include standard error bars or confidence intervals on the reported accuracy and AUC figures to allow assessment of variability.
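
As a sketch of what minor comment 2 asks for, a Wilson score interval gives a confidence interval for an accuracy from the number of correct predictions and the test-set size. The test-set sizes below are hypothetical, since the abstract does not report them, and the Wilson interval is one common choice rather than anything the paper prescribes.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total
                               + z * z / (4 * total * total)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical test-set sizes; the reported accuracy alone does not fix n.
for n in (500, 2000, 10000):
    correct = round(0.9978 * n)
    low, high = wilson_interval(correct, n)
    print(f"n={n:5d}  accuracy={correct / n:.4f}  95% CI=({low:.4f}, {high:.4f})")
```

The interval narrows as the test set grows, which is why the same headline accuracy carries very different evidential weight depending on the unreported sample size.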

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for greater transparency in dataset statistics and experimental protocols. These points are important for establishing the credibility of our reported performance. We address each major comment below and will incorporate the requested details into the revised manuscript.

  1. Referee: [Abstract] Abstract: the reported accuracies (0.9978 / 0.9928) and mean AUC of 1.0000 are presented without any accompanying dataset statistics (image counts, class balance, patient numbers), split protocol (patient-level vs. image-level), cross-validation procedure, or overlap checks. These omissions are load-bearing because perfect AUC on medical imaging tasks is statistically implausible without leakage or insufficient diversity; the central claim of “near-perfect discriminative capability” cannot be evaluated from the given information.

    Authors: We agree that the abstract and main text lack these essential details, which are necessary for independent evaluation. In the revised manuscript we will expand the abstract with a concise statement of dataset sizes, class balance, patient-level splitting, and confirmation of no train-test overlap. We will also add a new table in the methods section listing exact image counts per class and per split, number of unique patients, split ratios (patient-level 70/15/15), and any cross-validation procedure used. We confirm that all splits were performed at the patient level with explicit duplicate-image and overlap checks; these quantitative results will be reported to address concerns about potential leakage. revision: yes

  2. Referee: [Dataset description] Dataset description (assumed §3 or §4): the statement that the two WCE datasets were “carefully curated … to ensure balanced representation and prevent inter-sample bias” is not accompanied by quantitative evidence such as patient counts per split, duplicate-image statistics, or external validation results. This directly affects the credibility of the AUC=1.0000 result highlighted in the skeptic note.

    Authors: We acknowledge that the current description is qualitative only. In the revision we will replace the statement with a quantitative summary, including a table of patient counts per split, class-wise image distributions, and results of duplicate-image removal. We will explicitly state that partitioning was performed at the patient level to eliminate inter-sample bias and will report the exact numbers supporting balanced representation. Regarding external validation, none was performed in the present study; we will note this as a limitation and outline plans for future multi-center validation. revision: yes
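
The patient-level protocol described in the responses above can be sketched as follows: patients, not images, are shuffled and assigned to splits, so no patient contributes images to more than one split. The 70/15/15 ratio comes from the simulated rebuttal. The function names, the record layout, and the exact-duplicate check based on hashing raw bytes are assumptions for illustration.

```python
import hashlib
import random
from collections import defaultdict


def patient_level_split(records, ratios=(0.70, 0.15, 0.15), seed=0):
    """Assign whole patients to train/val/test.

    records: list of (image_id, patient_id) pairs (hypothetical layout).
    Returns (splits, assignment): image ids per split, and split per patient.
    """
    patients = sorted({patient_id for _, patient_id in records})
    random.Random(seed).shuffle(patients)
    n_train = round(ratios[0] * len(patients))
    n_val = round(ratios[1] * len(patients))

    assignment = {}
    for position, patient_id in enumerate(patients):
        if position < n_train:
            assignment[patient_id] = "train"
        elif position < n_train + n_val:
            assignment[patient_id] = "val"
        else:
            assignment[patient_id] = "test"

    splits = defaultdict(list)
    for image_id, patient_id in records:
        splits[assignment[patient_id]].append(image_id)
    return dict(splits), assignment


def find_exact_duplicates(image_bytes_by_id):
    """Group image ids whose raw bytes are identical (exact duplicates only)."""
    groups = defaultdict(list)
    for image_id, data in image_bytes_by_id.items():
        groups[hashlib.sha256(data).hexdigest()].append(image_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical dataset: 20 patients with 5 images each.
records = [(f"img_{p}_{k}", f"patient_{p}") for p in range(20) for k in range(5)]
splits, assignment = patient_level_split(records)
print({name: len(ids) for name, ids in splits.items()})
```

Because frames from one capsule video are highly correlated, hashing only catches byte-identical copies; near-duplicate frames would need a perceptual similarity check, which this sketch does not attempt.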

Circularity Check

0 steps flagged

No circularity: empirical results with no derivations or self-referential reductions


The manuscript contains no equations, derivations, or mathematical claims. Performance figures (accuracies 0.9978/0.9928, AUC 1.0000) are presented as direct empirical measurements on held-out test portions of two curated datasets. No self-definitional steps, fitted-input predictions, load-bearing self-citations, or ansatz smuggling appear. The framework description is architectural and the results are benchmarked externally against the datasets; the derivation chain is empty and therefore cannot reduce to its own inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the unverified assumption that the two curated datasets are representative and unbiased, plus standard deep-learning assumptions about transformer feature extraction and distillation transfer that are not proven in the text.

free parameters (1)
  • knowledge distillation hyperparameters
    Temperature, loss weighting, and training schedule parameters are required for the teacher-student transfer but are not reported or justified.
axioms (1)
  • domain assumption Swin Transformer captures global context while ViT extracts local morphological features
    Invoked in the description of the teacher model without supporting ablation or theoretical justification.

pith-pipeline@v0.9.0 · 5595 in / 1433 out tokens · 39296 ms · 2026-05-16T19:55:20.502338+00:00 · methodology


