Bridging the Rural Healthcare Gap: A Cascaded Edge-Cloud Architecture for Automated Retinal Screening
Pith reviewed 2026-05-15 05:17 UTC · model grok-4.3
The pith
An edge-cloud cascade halves cloud calls for diabetic retinopathy screening while keeping accuracy nearly identical to cloud-only processing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a cascaded architecture, with MobileNetV3-small on the edge performing binary triage (referable versus non-referable) and RETFoundDINOv2 in the cloud performing four-class severity grading on only the images the edge model forwards, achieves 80.49 percent accuracy and a quadratic weighted kappa of 0.8167 on the stratified APTOS test set. This nearly matches the cloud-only baseline of 80.76 percent accuracy and 0.8184 kappa while reducing cloud calls by half.
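For reference, quadratic weighted kappa is conventionally defined as below, with O the K-by-K confusion matrix of true versus predicted labels over N test images and K = 4 in the deployed output space. The paper's exact implementation is not described here, so this is the standard form rather than a confirmed detail of the authors' evaluation.

```latex
\kappa_w = 1 - \frac{\sum_{i,j} w_{ij} O_{ij}}{\sum_{i,j} w_{ij} E_{ij}},
\qquad w_{ij} = \frac{(i-j)^2}{(K-1)^2},
\qquad E_{ij} = \frac{\left(\sum_{k} O_{ik}\right)\left(\sum_{k} O_{kj}\right)}{N}
```

Because the weights grow with the squared distance between grades, confusing Class 2 with Class 3 costs far less than confusing Class 0-1 with Class 4.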
What carries the argument
The high-sensitivity threshold applied to the edge model's output probability, which decides whether an image is forwarded to the cloud grader.
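The paper does not state the threshold value or how it was searched. One common way to obtain a "validation-tuned high-sensitivity threshold" is to take the largest threshold whose validation sensitivity meets a target. The sketch below assumes a forwarding rule `p >= t` and a hypothetical target of 0.99; neither is the authors' documented procedure.

```python
import math
from typing import Sequence


def select_high_sensitivity_threshold(
    val_probs: Sequence[float],
    val_labels: Sequence[int],          # 1 = referable (grades 2-4), 0 = non-referable
    target_sensitivity: float = 0.99,   # hypothetical target; the paper's value is not stated
) -> float:
    """Largest threshold t such that forwarding every image with p >= t
    catches at least `target_sensitivity` of the referable validation cases."""
    positives = sorted((p for p, y in zip(val_probs, val_labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("validation split has no referable cases")
    # Number of referable cases that must be forwarded; the small epsilon guards
    # against floating-point products such as 0.99 * n landing just above an integer.
    needed = max(1, math.ceil(target_sensitivity * len(positives) - 1e-9))
    # The needed-th highest referable score is the largest threshold that still
    # forwards at least `needed` referable cases (ties only forward more).
    return positives[needed - 1]
```

This fixes sensitivity on validation data only. Specificity, and therefore the forward rate, is whatever the non-referable scores produce at that threshold, and test sensitivity can still land below the target.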
Load-bearing premise
The validation-tuned high-sensitivity threshold on the edge triage model will continue to catch nearly all referable cases when applied to new images from different cameras, lighting, and populations.
What would settle it
Running the full cascade on a new collection of retinal images gathered from rural clinics with different cameras and comparing the achieved sensitivity and overall accuracy against the reported APTOS figures.
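A new clinic may contribute only a few dozen referable cases, so its observed sensitivity is better reported with a confidence bound than compared with 98.99% as a bare point estimate. A minimal sketch using the Wilson score interval, a standard binomial interval chosen here for illustration (the paper does not prescribe one):

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion.

    For sensitivity, successes = referable cases forwarded (TP) and
    trials = all referable cases (TP + FN); z = 1.96 gives roughly 95% coverage.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    )
    return max(0.0, centre - half_width), min(1.0, centre + half_width)
```

For example, a hypothetical site where 59 of 60 referable cases are forwarded has an observed sensitivity of 98.3%, but `wilson_interval(59, 60)` gives a lower bound of roughly 0.91, so a sample of that size could not by itself confirm that the APTOS figure carries over.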
Abstract
Diabetic Retinopathy (DR) is one of the leading causes of preventable blindness, yet rural regions often lack the specialists and infrastructure needed for early detection. Although cloud-based deep learning systems offer high accuracy, they face significant challenges in these settings due to high latency, limited bandwidth, and high data transmission costs. To address these challenges, we propose a two-tier edge-cloud cascade and evaluate it on the public APTOS 2019 Blindness Detection dataset. Tier 1 runs a lightweight MobileNetV3-small model on a local clinic device to perform binary triage between Referable DR (Classes 2-4) and Non-referable DR (Classes 0-1). Tier 2 runs a RETFoundDINOv2 model in the cloud for ordinal severity grading, but only on the subset of images flagged as referable by Tier 1. On a stratified APTOS test split of 733 images, Tier 1 reaches 98.99% sensitivity and 84.37% specificity at a validation-tuned high-sensitivity threshold. The default cascade forwards 49.52% of test images to Tier 2, reducing cloud calls by 50.48% relative to running the cloud model on every image. In the deployed 4-class output space (Class 0-1 / Class 2 / Class 3 / Class 4), the cascade obtains 80.49% accuracy and 0.8167 quadratic weighted kappa; the cloud-only baseline obtains 80.76% accuracy and 0.8184 quadratic weighted kappa. On APTOS, the cascade cuts cloud use by about half with a modest drop in grading performance.
Index Terms: Diabetic Retinopathy, Edge-Cloud Cascade, MobileNetV3-small, RETFound-DINOv2, Retinal Screening, Tele-ophthalmology
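The routing described in the abstract can be sketched as follows. `edge_prob` and `cloud_grade` are hypothetical callables standing in for the MobileNetV3-small and RETFound-DINOv2 models; the `p >= threshold` forwarding rule, the cloud model returning an APTOS grade 0-4, and the use of label 0 for "Class 0-1" are assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Deployed 4-class output space: 0 = "Class 0-1", 1 = "Class 2", 2 = "Class 3", 3 = "Class 4".


def to_deployed_label(grade: int) -> int:
    """Map an APTOS grade (0-4) to the deployed space by merging grades 0 and 1."""
    return 0 if grade <= 1 else grade - 1


def run_cascade(
    images: Sequence[object],
    edge_prob: Callable[[object], float],   # hypothetical Tier 1 model: P(referable)
    cloud_grade: Callable[[object], int],   # hypothetical Tier 2 model: APTOS grade 0-4
    threshold: float,
) -> Tuple[List[int], int]:
    """Return the deployed label for each image and the number of cloud calls made."""
    outputs: List[int] = []
    cloud_calls = 0
    for image in images:
        if edge_prob(image) >= threshold:
            # Flagged as referable: only these images are sent to the cloud grader.
            cloud_calls += 1
            outputs.append(to_deployed_label(cloud_grade(image)))
        else:
            # Kept on the edge device and reported as non-referable (Class 0-1).
            outputs.append(0)
    return outputs, cloud_calls
```

For the reported figures, 363 is the only forwarded count out of 733 that rounds to 49.52%, and 1 - 363/733 ≈ 0.5048 matches the stated 50.48% reduction in cloud calls.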
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a two-tier cascaded edge-cloud architecture for diabetic retinopathy screening on the APTOS 2019 dataset. Tier 1 deploys a lightweight MobileNetV3-small model on edge devices for high-sensitivity binary triage (referable DR classes 2-4 vs. non-referable 0-1), forwarding only flagged images to Tier 2, which runs a RETFoundDINOv2 model in the cloud for 4-class ordinal grading. On a stratified test split of 733 images, Tier 1 achieves 98.99% sensitivity and 84.37% specificity at a validation-tuned threshold, forwarding 49.52% of cases to the cloud (50.48% reduction in cloud calls). The full cascade reports 80.49% accuracy and 0.8167 quadratic weighted kappa, compared to 80.76% accuracy and 0.8184 kappa for a cloud-only baseline.
Significance. If the empirical results on APTOS hold under deployment conditions, the cascade offers a practical route to lower bandwidth and latency costs in rural tele-ophthalmology while maintaining near-equivalent grading performance. The use of a public benchmark with a clear held-out test split, together with concrete reporting of sensitivity, specificity, accuracy, and kappa, provides a reproducible empirical baseline for edge-cloud triage systems. The high-sensitivity first-stage design is a straightforward and clinically motivated contribution.
major comments (2)
- [Abstract and Evaluation] Abstract and Evaluation section: The headline claims of ~50% cloud-call reduction and near-zero missed referable cases rest on Tier 1 maintaining 98.99% sensitivity at the chosen operating point. This threshold was tuned on the APTOS validation split and measured on the APTOS test split; no experiments apply the identical fixed threshold to images acquired with different cameras, under different lighting, or from different populations. Domain shift could degrade sensitivity or increase the forward rate, directly affecting both the safety and efficiency arguments for rural deployment.
- [Evaluation] Evaluation section: The manuscript reports point estimates for accuracy and kappa but provides no statistical tests (e.g., McNemar or bootstrap confidence intervals) comparing the cascade to the cloud-only baseline, nor any ablation on the impact of the triage threshold value itself. This makes it difficult to assess whether the observed 0.27-percentage-point accuracy drop is within noise.
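Both tests named in this comment can be run on per-image correctness vectors from the two systems. The sketch below uses only the standard library: an exact McNemar test on the discordant pairs and a paired percentile bootstrap for the accuracy difference. The function names, the 1,000-resample default, and the fixed seed are illustrative choices, not the paper's setup.

```python
import math
import random
from typing import List, Sequence, Tuple


def mcnemar_exact_p(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Two-sided exact McNemar test on paired per-image correctness."""
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)  # only A correct
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)  # only B correct
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    # Under H0 the discordant pairs split as Binomial(n, 0.5).
    tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def paired_bootstrap_diff_ci(
    correct_a: Sequence[bool],
    correct_b: Sequence[bool],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for accuracy(A) - accuracy(B), resampling images jointly."""
    if len(correct_a) != len(correct_b) or len(correct_a) == 0:
        raise ValueError("need two non-empty vectors of equal length")
    rng = random.Random(seed)
    n = len(correct_a)
    diffs: List[float] = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(int(correct_a[i]) - int(correct_b[i]) for i in idx) / n)
    diffs.sort()
    return diffs[int(alpha / 2 * n_resamples)], diffs[int((1 - alpha / 2) * n_resamples) - 1]
```

With the reported accuracies, the cascade grades 590 of 733 images correctly and the baseline 592 (the only integer counts that round to 80.49% and 80.76%). Only that net difference of two images is known; the McNemar p-value depends on the individual discordant counts, which the paper does not report.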
minor comments (2)
- [Abstract] Abstract: The exact numerical value of the validation-tuned triage threshold is not stated, nor is the precise validation procedure (e.g., grid search range or target sensitivity level) used to select it.
- [Methods] Methods: Training details for both models (optimizer, learning-rate schedule, data augmentation, class weighting, and early-stopping criteria) are not described, which hinders reproducibility even though the dataset is public.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have revised the manuscript to strengthen the evaluation and clarify limitations where feasible.
- Referee: [Abstract and Evaluation] Abstract and Evaluation section: The headline claims of ~50% cloud-call reduction and near-zero missed referable cases rest on Tier 1 maintaining 98.99% sensitivity at the chosen operating point. This threshold was tuned on the APTOS validation split and measured on the APTOS test split; no experiments apply the identical fixed threshold to images acquired with different cameras, under different lighting, or from different populations. Domain shift could degrade sensitivity or increase the forward rate, directly affecting both the safety and efficiency arguments for rural deployment.
  Authors: We agree that the current evaluation is confined to the APTOS 2019 dataset splits and that the high-sensitivity threshold was tuned on the validation set. This is a genuine limitation for claiming robustness in rural deployment scenarios. In the revised manuscript we have added an explicit limitations paragraph in the Discussion section noting the risk of domain shift and recommending future multi-center validation. However, we lack access to additional datasets with varying cameras and acquisition conditions, so we cannot perform those experiments at this time.
  Revision: partial.
- Referee: [Evaluation] Evaluation section: The manuscript reports point estimates for accuracy and kappa but provides no statistical tests (e.g., McNemar or bootstrap confidence intervals) comparing the cascade to the cloud-only baseline, nor any ablation on the impact of the triage threshold value itself. This makes it difficult to assess whether the observed 0.27-percentage-point accuracy drop is within noise.
  Authors: We appreciate this observation. In the revised manuscript we now report bootstrap confidence intervals (1,000 resamples) for accuracy and quadratic weighted kappa on both the cascade and cloud-only systems. We also include a McNemar test showing the performance difference is not statistically significant (p > 0.05). Finally, we added an ablation table that varies the Tier-1 decision threshold and reports the resulting cloud-call reduction, sensitivity, and final kappa for each operating point.
  Revision: yes.
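A threshold ablation of the kind described in this response could be organised as a sweep that records sensitivity, cloud-call reduction, and quadratic weighted kappa at each candidate threshold. The sketch assumes precomputed, hypothetically named arrays: Tier 1 probabilities, binary referable labels, ground truth in the deployed 4-class space, and cloud predictions for every image, so any threshold can be simulated offline.

```python
from typing import Dict, List, Sequence


def quadratic_weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> float:
    """Cohen's kappa with quadratic disagreement weights (i - j)^2 / (K - 1)^2."""
    n = len(y_true)
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            num += w * observed[i][j]
            den += w * row_totals[i] * col_totals[j] / n  # expected count under independence
    return 1.0 - num / den if den > 0 else float("nan")  # undefined when den == 0


def sweep_thresholds(
    thresholds: Sequence[float],
    edge_probs: Sequence[float],   # Tier 1 P(referable) per test image
    referable: Sequence[int],      # 1 if the true grade is 2-4
    y_true4: Sequence[int],        # true label in the deployed 4-class space (0 = Class 0-1)
    cloud_pred4: Sequence[int],    # Tier 2 prediction for every image, same 4-class space
) -> List[Dict[str, float]]:
    n = len(edge_probs)
    n_pos = sum(referable)
    rows: List[Dict[str, float]] = []
    for t in thresholds:
        forwarded = [p >= t for p in edge_probs]
        caught = sum(1 for f, r in zip(forwarded, referable) if f and r)
        # Images kept on the edge are reported as Class 0-1 (label 0).
        y_pred = [c if f else 0 for f, c in zip(forwarded, cloud_pred4)]
        rows.append({
            "threshold": t,
            "sensitivity": caught / n_pos if n_pos else float("nan"),
            "cloud_call_reduction": 1.0 - sum(forwarded) / n,
            "qwk": quadratic_weighted_kappa(y_true4, y_pred, 4),
        })
    return rows
```

Simulating the sweep offline this way needs cloud predictions for every test image, which the cloud-only baseline already produces.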
- Empirical validation of the fixed triage threshold on images acquired with different cameras, lighting conditions, or from different patient populations to quantify domain-shift effects.
Circularity Check
No significant circularity; results are direct empirical measurements
The paper reports standard machine-learning training and evaluation on stratified splits of the public APTOS 2019 dataset. The Tier 1 threshold is tuned on the validation split, and performance (sensitivity, specificity, forward rate) is measured on the held-out test split of 733 images; cascade accuracy and kappa are likewise computed directly from the test-set outputs. No equations, derivations, or fitted parameters are presented as predictions. No self-citations supply load-bearing uniqueness theorems or ansatzes. All headline numbers (98.99% sensitivity, 49.52% forwarded, 80.49% accuracy) are falsifiable empirical outcomes measured on held-out data rather than quantities fixed by construction.
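A stratified split of the kind mentioned above keeps each grade's share roughly equal across train, validation, and test. The paper's split fractions and seed are not given here, so the 20% validation and 20% test fractions in this sketch are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_split(
    labels: Sequence[int],
    val_frac: float = 0.2,   # hypothetical fraction
    test_frac: float = 0.2,  # hypothetical fraction
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Split image indices into train/val/test, stratified by DR grade."""
    by_grade: Dict[int, List[int]] = defaultdict(list)
    for idx, grade in enumerate(labels):
        by_grade[grade].append(idx)
    rng = random.Random(seed)
    splits: Dict[str, List[int]] = {"train": [], "val": [], "test": []}
    for grade in sorted(by_grade):
        idxs = by_grade[grade]
        rng.shuffle(idxs)
        # Sizes are rounded per grade, so totals can drift by a few images.
        n_test = round(test_frac * len(idxs))
        n_val = round(val_frac * len(idxs))
        splits["test"].extend(idxs[:n_test])
        splits["val"].extend(idxs[n_test:n_test + n_val])
        splits["train"].extend(idxs[n_test + n_val:])
    return splits
```

Tuning the threshold on `val` and reporting only on `test` is what keeps the headline numbers free of the circularity discussed here.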
Axiom & Free-Parameter Ledger
free parameters (1)
- triage threshold = validation-tuned
axioms (3)
- domain assumption: MobileNetV3-small is appropriate for accurate binary triage on retinal images
- domain assumption: RETFoundDINOv2 is appropriate for ordinal severity grading
- domain assumption: The stratified 733-image test split is representative of deployment conditions