
arxiv: 2605.14108 · v1 · pith:GJ2S7I7Qnew · submitted 2026-05-13 · 💻 cs.CV · cs.AI · cs.LG

Bridging the Rural Healthcare Gap: A Cascaded Edge-Cloud Architecture for Automated Retinal Screening

Pith reviewed 2026-05-15 05:17 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords diabetic retinopathy, edge-cloud cascade, retinal screening, mobile deep learning, cloud cost reduction, tele-ophthalmology

The pith

An edge-cloud cascade cuts cloud calls for diabetic retinopathy screening by half with near-identical accuracy to full cloud processing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a two-tier system that runs a lightweight model on local clinic devices to triage retinal images for referable diabetic retinopathy. Images flagged as referable are then sent to a more powerful cloud model for full ordinal severity grading. This design targets rural settings where specialists are scarce and internet is limited or costly. On a 733-image test split from the APTOS dataset, the cascade maintains 98.99 percent sensitivity at the first stage, forwards only 49.52 percent of cases to the cloud, and reaches 80.49 percent accuracy with 0.8167 quadratic weighted kappa. These figures sit very close to the cloud-only baseline while cutting cloud usage by 50.48 percent.

Core claim

The central claim is that a cascaded architecture delivers 80.49 percent accuracy and 0.8167 quadratic weighted kappa on the stratified APTOS test set. It uses MobileNetV3-small on the edge for binary referable versus non-referable triage and RETFound-DINOv2 in the cloud for four-class severity grading of only the forwarded images. This performance nearly matches the cloud-only baseline of 80.76 percent accuracy and 0.8184 kappa while cutting cloud calls roughly in half.

What carries the argument

The high-sensitivity threshold applied to the edge model's output probability, which decides whether an image is forwarded to the cloud grader.
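The gating step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `cascade_predict`, the `cloud_grader` callback, and the integer encoding of the 4-class output space (0 = Class 0-1, 1 = Class 2, 2 = Class 3, 3 = Class 4) are all assumptions made for the example.

```python
from typing import Callable, Sequence

NON_REFERABLE = 0  # assumed code for the merged "Class 0-1" bucket


def cascade_predict(
    edge_probs: Sequence[float],
    threshold: float,
    cloud_grader: Callable[[int], int],
) -> tuple[list[int], float]:
    """Route each image through the edge gate, then (maybe) the cloud grader.

    edge_probs[i] is the edge model's referable-DR probability for image i.
    cloud_grader(i) is a hypothetical stand-in for the cloud call and returns
    a label in {0, 1, 2, 3}. Returns the per-image labels and the fraction
    of images that were forwarded to the cloud.
    """
    if not edge_probs:
        raise ValueError("edge_probs must not be empty")
    labels: list[int] = []
    forwarded = 0
    for i, p in enumerate(edge_probs):
        if p >= threshold:
            # Flagged as referable: pay for a cloud call and use its grade.
            forwarded += 1
            labels.append(cloud_grader(i))
        else:
            # Below threshold: the edge decision is final, no cloud call.
            labels.append(NON_REFERABLE)
    return labels, forwarded / len(edge_probs)


if __name__ == "__main__":
    probs = [0.02, 0.91, 0.40, 0.05]
    fake_cloud = {1: 2, 2: 0}.__getitem__  # toy cloud grades for images 1 and 2
    print(cascade_predict(probs, threshold=0.30, cloud_grader=fake_cloud))
```

Note that in this sketch a forwarded image can still come back as Class 0-1 from the cloud grader, so edge false positives cost a cloud call but need not cost accuracy; an image the edge model wrongly holds back can never be corrected.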

Load-bearing premise

The validation-tuned high-sensitivity threshold on the edge triage model will continue to catch nearly all referable cases when applied to new images from different cameras, lighting, and populations.
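One plausible way to tune such a threshold is to pick the largest cut-off that still meets a target sensitivity on the validation split. The sketch below assumes that procedure; the paper's actual selection rule and target value are not given in the material above, so `pick_threshold` and `target_sensitivity` are hypothetical.

```python
import math
from typing import Sequence


def pick_threshold(
    val_probs: Sequence[float],
    val_referable: Sequence[bool],
    target_sensitivity: float,
) -> float:
    """Return the highest threshold whose validation sensitivity meets the target.

    An image is forwarded when its probability is >= the threshold, so the
    threshold is the k-th highest score among referable validation images,
    where k is the number of referable images that must be caught.
    """
    pos = sorted((p for p, y in zip(val_probs, val_referable) if y), reverse=True)
    if not pos:
        raise ValueError("no referable cases in the validation data")
    # Number of referable images that must pass the gate (at least one).
    need = max(math.ceil(target_sensitivity * len(pos)), 1)
    return pos[need - 1]


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.2, 0.6, 0.1, 0.3]
    referable = [True, True, True, True, False, False]
    print(pick_threshold(probs, referable, target_sensitivity=0.75))
```

Because the cut-off is fixed from validation scores alone, any shift in the score distribution of referable cases at deployment time moves the realized sensitivity, which is exactly the premise this section flags.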

What would settle it

Running the full cascade on a new collection of retinal images gathered from rural clinics with different cameras and comparing the achieved sensitivity and overall accuracy against the reported APTOS figures.
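A check of this kind amounts to freezing the threshold and recomputing the Tier 1 metrics on the new images. The following sketch assumes per-image edge probabilities and referable labels are available for the external set; `triage_metrics` is a hypothetical helper, and the only figure taken from the paper is the reported 98.99% APTOS test sensitivity.

```python
from typing import Sequence

APTOS_TEST_SENSITIVITY = 0.9899  # Tier 1 sensitivity reported on the APTOS test split


def triage_metrics(
    probs: Sequence[float],
    referable: Sequence[bool],
    threshold: float,
) -> dict[str, float]:
    """Sensitivity, specificity and forward rate of the edge gate at a FIXED threshold."""
    tp = fn = fp = tn = 0
    for p, y in zip(probs, referable):
        forwarded = p >= threshold
        if y and forwarded:
            tp += 1
        elif y:
            fn += 1
        elif forwarded:
            fp += 1
        else:
            tn += 1
    total = tp + fn + fp + tn
    if total == 0:
        raise ValueError("no images to evaluate")
    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    return {
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "forward_rate": (tp + fp) / total,
        # Positive gap = sensitivity lost relative to the APTOS figure.
        "sensitivity_gap": APTOS_TEST_SENSITIVITY - sensitivity,
    }


if __name__ == "__main__":
    print(triage_metrics([0.9, 0.2, 0.7, 0.1], [True, True, False, False], 0.5))
```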

Figures

Figures reproduced from arXiv: 2605.14108 by Nishi Doshi, Shrey Shah.

Figure 1: Representative samples from the five severity levels of the APTOS
Figure 2: Cascaded pipeline with edge- and cloud-based models
Figure 3: Confusion matrices on the held-out APTOS test split
Original abstract

Diabetic Retinopathy (DR) is one of the leading causes of preventable blindness, yet rural regions often lack the specialists and infrastructure needed for early detection. Although cloud-based deep learning systems offer high accuracy, they face significant challenges in these settings due to high latency, limited bandwidth, and high data transmission costs. To address these challenges, we propose a two-tier edge-cloud cascade on the public APTOS 2019 Blindness Detection dataset. Tier 1 runs a lightweight MobileNetV3-small model on a local clinic device to perform a binary triage between Referable DR (Classes 2-4) and Non-referable DR (Classes 0-1). Tier 2 runs a RETFound-DINOv2 model in the cloud for ordinal severity grading, but only on the subset of images flagged as referable by Tier 1. On a stratified APTOS test split of 733 images, Tier 1 reaches 98.99% sensitivity and 84.37% specificity at a validation-tuned high-sensitivity threshold. The default cascade forwards 49.52% of test images to Tier 2, reducing cloud calls by 50.48% relative to using a cloud-based model for all images. In the deployed 4-class output space (Class 0-1 / Class 2 / Class 3 / Class 4), the cascade obtains 80.49% accuracy and 0.8167 quadratic weighted kappa; the cloud-only baseline obtains 80.76% accuracy and 0.8184 quadratic weighted kappa. On APTOS, the cascade cuts cloud use by about half with a modest drop in grading performance.

Index Terms: Diabetic Retinopathy, Edge-Cloud Cascade, MobileNetV3-small, RETFound-DINOv2, Retinal Screening, tele-ophthalmology
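For readers unfamiliar with the headline metric, the sketch below shows how quadratic weighted kappa can be computed over the 4-class output space named in the abstract. It uses the standard definition (one minus the ratio of weighted observed to weighted expected disagreement); the function names and the integer encoding 0 = Class 0-1, 1 = Class 2, 2 = Class 3, 3 = Class 4 are assumptions for illustration, not the authors' code.

```python
from typing import Sequence


def to_four_class(aptos_grade: int) -> int:
    """Map an APTOS grade 0-4 to the 4-class space: {0, 1} -> 0, 2 -> 1, 3 -> 2, 4 -> 3."""
    return max(aptos_grade - 1, 0)


def quadratic_weighted_kappa(
    y_true: Sequence[int], y_pred: Sequence[int], k: int = 4
) -> float:
    """Cohen's kappa with weights w_ij = (i - j)^2 / (k - 1)^2 over k ordinal classes."""
    n = len(y_true)
    if n == 0 or k < 2:
        raise ValueError("need at least one sample and two classes")
    # Observed confusion counts: rows are true labels, columns are predictions.
    observed = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            num += w * observed[i][j]
            # Expected count under independence of the two label marginals.
            den += w * true_hist[i] * pred_hist[j] / n
    # den == 0 only when both marginals sit on one shared class.
    return 1.0 if den == 0 else 1.0 - num / den


if __name__ == "__main__":
    truth = [to_four_class(g) for g in [0, 1, 2, 3, 4]]
    print(truth, quadratic_weighted_kappa(truth, truth))
```

The quadratic weights penalize a Class 0-1 versus Class 4 confusion nine times as heavily as an adjacent-class confusion, which is why kappa is reported alongside plain accuracy for ordinal grading.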

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a two-tier cascaded edge-cloud architecture for diabetic retinopathy screening on the APTOS 2019 dataset. Tier 1 deploys a lightweight MobileNetV3-small model on edge devices for high-sensitivity binary triage (referable DR classes 2-4 vs. non-referable 0-1), forwarding only flagged images to Tier 2, which runs a RETFound-DINOv2 model in the cloud for 4-class ordinal grading. On a stratified test split of 733 images, Tier 1 achieves 98.99% sensitivity and 84.37% specificity at a validation-tuned threshold, forwarding 49.52% of cases to the cloud (a 50.48% reduction in cloud calls). The full cascade reports 80.49% accuracy and 0.8167 quadratic weighted kappa, compared to 80.76% accuracy and 0.8184 kappa for a cloud-only baseline.

Significance. If the empirical results on APTOS hold under deployment conditions, the cascade offers a practical route to lower bandwidth and latency costs in rural tele-ophthalmology while maintaining near-equivalent grading performance. The use of a public benchmark with a clear held-out test split, together with concrete reporting of sensitivity, specificity, accuracy, and kappa, provides a reproducible empirical baseline for edge-cloud triage systems. The high-sensitivity first-stage design is a straightforward and clinically motivated contribution.

major comments (2)
  1. [Abstract and Evaluation] Abstract and Evaluation section: The headline claims of ~50% cloud-call reduction and near-zero missed referable cases rest on Tier 1 maintaining 98.99% sensitivity at the chosen operating point. This threshold was tuned on the APTOS validation split and measured on the APTOS test split; no experiments apply the identical fixed threshold to images acquired with different cameras, under different lighting, or from different populations. Domain shift could degrade sensitivity or increase the forward rate, directly affecting both the safety and efficiency arguments for rural deployment.
  2. [Evaluation] Evaluation section: The manuscript reports point estimates for accuracy and kappa but provides no statistical tests (e.g., McNemar's test or bootstrap confidence intervals) comparing the cascade to the cloud-only baseline, nor any ablation on the impact of the triage threshold value itself. This makes it difficult to assess whether the observed 0.27-percentage-point accuracy drop (80.76% to 80.49%) is within noise.
minor comments (2)
  1. [Abstract] Abstract: The exact numerical value of the validation-tuned triage threshold is not stated, nor is the precise validation procedure (e.g., grid search range or target sensitivity level) used to select it.
  2. [Methods] Methods: Training details for both models (optimizer, learning-rate schedule, data augmentation, class weighting, and early-stopping criteria) are not described, which hinders reproducibility even though the dataset is public.
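The bootstrap interval requested in major comment 2 could be computed with a paired percentile bootstrap over test images. The sketch below is one hedged way to do it; `paired_bootstrap_acc_diff` is a hypothetical helper, and the paired resampling scheme and percentile interval are assumptions rather than anything the manuscript specifies.

```python
import random
from typing import Sequence


def paired_bootstrap_acc_diff(
    y_true: Sequence[int],
    pred_a: Sequence[int],
    pred_b: Sequence[int],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile CI for accuracy(A) - accuracy(B), resampling images with replacement.

    Both systems are scored on the same resampled images, so the interval
    reflects the paired difference rather than two independent accuracies.
    """
    n = len(y_true)
    if n == 0:
        raise ValueError("no samples")
    rng = random.Random(seed)
    # Per-image correctness difference: +1, 0, or -1.
    d = [int(a == t) - int(b == t) for t, a, b in zip(y_true, pred_a, pred_b)]
    diffs = sorted(
        sum(d[rng.randrange(n)] for _ in range(n)) / n for _ in range(n_resamples)
    )
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


if __name__ == "__main__":
    truth = [0, 1, 2, 3, 0, 1, 2, 3]
    cascade = [0, 1, 2, 3, 0, 0, 2, 3]
    cloud = [0, 1, 2, 3, 0, 1, 2, 2]
    print(paired_bootstrap_acc_diff(truth, cascade, cloud))
```

If the resulting interval for the cascade minus cloud-only difference straddles zero, the reported gap would be consistent with sampling noise on a 733-image test split.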

Simulated Authors' Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have revised the manuscript to strengthen the evaluation and clarify limitations where feasible.

  1. Referee: [Abstract and Evaluation] Abstract and Evaluation section: The headline claims of ~50% cloud-call reduction and near-zero missed referable cases rest on Tier 1 maintaining 98.99% sensitivity at the chosen operating point. This threshold was tuned on the APTOS validation split and measured on the APTOS test split; no experiments apply the identical fixed threshold to images acquired with different cameras, under different lighting, or from different populations. Domain shift could degrade sensitivity or increase the forward rate, directly affecting both the safety and efficiency arguments for rural deployment.

    Authors: We agree that the current evaluation is confined to the APTOS 2019 dataset splits and that the high-sensitivity threshold was tuned on the validation set. This is a genuine limitation for claiming robustness in rural deployment scenarios. In the revised manuscript we have added an explicit limitations paragraph in the Discussion section noting the risk of domain shift and recommending future multi-center validation. However, we lack access to additional datasets with varying cameras and acquisition conditions, so we cannot perform those experiments at this time. revision: partial

  2. Referee: [Evaluation] Evaluation section: The manuscript reports point estimates for accuracy and kappa but provides no statistical tests (e.g., McNemar's test or bootstrap confidence intervals) comparing the cascade to the cloud-only baseline, nor any ablation on the impact of the triage threshold value itself. This makes it difficult to assess whether the observed 0.27-percentage-point accuracy drop (80.76% to 80.49%) is within noise.

    Authors: We appreciate this observation. In the revised manuscript we now report bootstrap confidence intervals (1,000 resamples) for accuracy and quadratic weighted kappa on both the cascade and cloud-only systems. We also include a McNemar test showing the performance difference is not statistically significant (p > 0.05). Finally, we added an ablation table that varies the Tier-1 decision threshold and reports the resulting cloud-call reduction, sensitivity, and final kappa for each operating point. revision: yes
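As an illustration of the McNemar comparison mentioned in this response, the sketch below computes an exact (binomial) two-sided McNemar p-value from paired per-image predictions. Whether the authors used the exact or the chi-square form is not stated, so the exact variant and the helper name `mcnemar_exact` are assumptions.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(
    y_true: Sequence[int], pred_a: Sequence[int], pred_b: Sequence[int]
) -> tuple[int, int, float]:
    """Exact two-sided McNemar test on paired classifier outputs.

    Returns (b, c, p) where b counts images only system A gets right,
    c counts images only system B gets right, and p is the two-sided
    binomial p-value under the null that b and c are equally likely.
    """
    b = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a != t and o == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the two systems never disagree on correctness
    # Probability of a split at least as lopsided as min(b, c), doubled.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return b, c, min(1.0, 2 * tail)


if __name__ == "__main__":
    truth = [1, 1, 1, 1, 1, 1]
    sys_a = [1, 1, 1, 1, 1, 0]
    sys_b = [0, 0, 0, 0, 0, 1]
    print(mcnemar_exact(truth, sys_a, sys_b))
```

Only the discordant images enter the test, so with two systems that agree on most of the 733 test images the effective sample size can be small and a non-significant result should be read with that in mind.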

standing simulated objections not resolved
  • Empirical validation of the fixed triage threshold on images acquired with different cameras, lighting conditions, or from different patient populations to quantify domain-shift effects.

Circularity Check

0 steps flagged

No significant circularity; results are direct empirical measurements

The paper reports standard machine-learning training and evaluation on stratified splits of the public APTOS 2019 dataset. The Tier 1 threshold is tuned on the validation split, and its performance (sensitivity, specificity, forward rate) is measured on the held-out test split of 733 images; cascade accuracy and kappa are likewise computed directly from the test-set outputs. No equations, derivations, or fitted parameters are presented as predictions. No self-citations supply load-bearing uniqueness theorems or ansatzes. All headline numbers (98.99% sensitivity, 49.52% forwarded, 80.49% accuracy) are falsifiable empirical outcomes measured on held-out data.

Axiom & Free-Parameter Ledger

1 free parameter · 3 axioms · 0 invented entities

The performance claims rest on the suitability of the chosen models for their tiers and the representativeness of the APTOS dataset and its stratified split for real-world conditions.

free parameters (1)
  • triage threshold = validation-tuned
    High-sensitivity threshold tuned on the validation split; it yields 98.99% sensitivity on the held-out test split
axioms (3)
  • domain assumption MobileNetV3-small is appropriate for accurate binary triage on retinal images
    Selected for the edge tier based on its lightweight properties
  • domain assumption RETFound-DINOv2 is appropriate for ordinal severity grading
    Selected for the cloud tier based on prior foundation model performance
  • domain assumption The stratified 733-image test split is representative of deployment conditions
    Used to report all cascade metrics
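Since the triage threshold is the only free parameter in this ledger, its effect can be made visible by sweeping it and tabulating the trade-off between sensitivity and cloud-call reduction. The sketch below is a minimal version of such a sweep; `sweep_thresholds` and the candidate threshold values are hypothetical and not taken from the paper.

```python
from typing import Sequence


def sweep_thresholds(
    probs: Sequence[float],
    referable: Sequence[bool],
    thresholds: Sequence[float],
) -> list[dict[str, float]]:
    """For each candidate threshold, report Tier 1 sensitivity and cloud-call reduction."""
    if not probs:
        raise ValueError("no images to evaluate")
    n_pos = sum(1 for y in referable if y)
    rows = []
    for t in thresholds:
        forwarded = [p >= t for p in probs]
        tp = sum(1 for f, y in zip(forwarded, referable) if f and y)
        rows.append(
            {
                "threshold": t,
                "sensitivity": tp / n_pos if n_pos else float("nan"),
                # Share of images that never reach the cloud at this threshold.
                "cloud_call_reduction": 1 - sum(forwarded) / len(probs),
            }
        )
    return rows


if __name__ == "__main__":
    for row in sweep_thresholds(
        [0.9, 0.6, 0.4, 0.1], [True, True, False, False], [0.5, 0.3]
    ):
        print(row)
```

Lowering the threshold can only hold or raise sensitivity while shrinking the cloud-call reduction, so each row of such a table is one point on the safety versus cost curve that the single reported operating point summarizes.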

pith-pipeline@v0.9.0 · 5649 in / 1578 out tokens · 84403 ms · 2026-05-15T05:17:03.234647+00:00 · methodology
