Disease classification of macular Optical Coherence Tomography scans using deep learning software: validation on independent, multi-centre data
Pith reviewed 2026-05-24 23:08 UTC · model grok-4.3
The pith
Pegasus-OCT detects macular anomalies with at least 98% AUROC across independent multi-centre OCT datasets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Pegasus-OCT performed with AUROCs of at least 98% for all datasets in the detection of general macular anomalies. For scans of sufficient quality, the AUROCs for general AMD and DME detection were found to be at least 99% and 98%, respectively.
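To make the headline metric concrete, here is a minimal sketch of how a per-dataset AUROC and the "at least X% for all datasets" summary could be computed. It is not the authors' evaluation code: the function names, the site names, and the toy scores are hypothetical, and it assumes binary labels (1 = anomalous, 0 = normal) with one anomaly score per volume.

```python
from typing import Dict, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    This pairwise loop is O(n_pos * n_neg); fine for a sketch, slow at scale.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def min_auroc_across_datasets(
    datasets: Dict[str, Tuple[Sequence[int], Sequence[float]]]
) -> float:
    """Worst-case AUROC over datasets, i.e. the figure behind an 'at least' claim."""
    return min(auroc(labels, scores) for labels, scores in datasets.values())


# Hypothetical toy data: (labels, scores) per site. Not taken from the paper.
toy = {
    "site_a": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    "site_b": ([0, 1, 1, 0], [0.2, 0.9, 0.7, 0.3]),
}
print(min_auroc_across_datasets(toy))  # -> 0.75 (site_a is the weaker dataset)
```

Reporting the minimum across datasets, rather than a pooled figure, is what lets a claim of the form "at least 98% for all datasets" be checked site by site.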
What carries the argument
Pegasus-OCT, deep learning software that identifies features of retinal disease from macular OCT scans, evaluated for consistent performance across heterogeneous populations.
If this is right
- The software maintains performance when applied to data from different patient demographics and device manufacturers.
- High detection rates hold for scans acquired at multiple independent sites by different operators.
- The results support potential use of the software to help manage growing demand in eye care services for retinal disease.
Where Pith is reading between the lines
- Validation on external multi-centre data increases the chance the model will work in new clinics that use different OCT machines.
- If performance remains stable, the software could reduce variability in initial screening for AMD and DME across regions.
- Further tests could measure whether the high AUROCs translate into faster referral decisions in routine practice.
Load-bearing premise
Ground truth labels supplied by the dataset owners are accurate, consistent, and free of systematic bias across centers, devices, and operators.
What would settle it
Independent re-labelling of a random subset of the scans by a new panel of experts, if the new labels differed from the originals on more than 10% of cases and the AUROCs recomputed against them fell below 90%.
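A re-labelling audit of this kind could be scored with a raw disagreement rate and a chance-corrected agreement statistic. The sketch below is illustrative only: the function names and toy labels are hypothetical, Cohen's kappa is one common choice of agreement statistic rather than anything the paper reports, and the 10% threshold is the one stated above.

```python
from collections import Counter
from typing import Hashable, Sequence


def disagreement_rate(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of cases on which two label sets differ."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal in length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    observed = 1.0 - disagreement_rate(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the product of each rater's marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both label sets use one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels: dataset-owner labels vs. a new expert panel.
original = ["normal", "AMD", "DME", "normal", "AMD",
            "normal", "DME", "AMD", "normal", "normal"]
relabelled = ["normal", "AMD", "DME", "normal", "AMD",
              "normal", "DME", "AMD", "normal", "AMD"]

rate = disagreement_rate(original, relabelled)
print(rate)          # -> 0.1
print(rate > 0.10)   # -> False: exactly 10% does not exceed the threshold
print(cohens_kappa(original, relabelled))
```

The disagreement rate alone would not settle the question; under the criterion above, the AUROCs would also need to be recomputed against the new labels.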
Original abstract
Purpose: To evaluate Pegasus-OCT, a clinical decision support software for the identification of features of retinal disease from macula OCT scans, across heterogeneous populations involving varying patient demographics, device manufacturers, acquisition sites and operators. Methods: 5,588 normal and anomalous macular OCT volumes (162,721 B-scans), acquired at independent centres in five countries, were processed using the software. Results were evaluated against ground truth provided by the dataset owners. Results: Pegasus-OCT performed with AUROCs of at least 98% for all datasets in the detection of general macular anomalies. For scans of sufficient quality, the AUROCs for general AMD and DME detection were found to be at least 99% and 98%, respectively. Conclusions: The ability of a clinical decision support system to cater for different populations is key to its adoption. Pegasus-OCT was shown to be able to detect AMD, DME and general anomalies in OCT volumes acquired across multiple independent sites with high performance. Its use thus offers substantial promise, with the potential to alleviate the burden of growing demand in eye care services caused by retinal disease.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates Pegasus-OCT, a deep learning clinical decision support tool, for detecting general macular anomalies, AMD, and DME in 5,588 OCT volumes (162,721 B-scans) acquired at independent centres in five countries using varied devices and operators. Performance is assessed via AUROC against ground-truth labels supplied by the dataset owners, yielding AUROCs ≥98% for general anomalies on all datasets and ≥99% (AMD) / ≥98% (DME) on quality-filtered scans.
Significance. A large-scale, multi-center, multi-device validation study is a strength for assessing real-world robustness of OCT classification software, which could support clinical adoption if the reported metrics are shown to reflect true generalization. The scale (five countries) addresses an important practical need in retinal disease screening.
major comments (3)
- [Methods] Methods: No information is supplied on the composition or provenance of the training data used to develop Pegasus-OCT, nor on any steps taken to exclude overlap with the five validation datasets. This detail is load-bearing for the central claim of robust performance on 'independent' multi-centre data.
- [Methods] Methods: The evaluation relies entirely on ground-truth labels supplied by the five dataset owners, yet the text provides no evidence of a unified labeling protocol, inter-rater reliability statistics, or any post-hoc audit of label consistency across centers, devices, or operators. Because all AUROCs are computed directly against these labels, systematic inter-center labeling differences could inflate or deflate the reported figures without reflecting model behavior.
- [Abstract] Abstract and Results: No confidence intervals, standard errors, or other measures of statistical uncertainty are reported for any AUROC value, and the quality-filtering criteria used to define the 'sufficient quality' subset are not described. Both omissions prevent assessment of the precision and scope of the headline performance claims.
minor comments (1)
- [Abstract] The abstract states results for 'general AMD and DME detection' but does not clarify whether these are binary detection tasks or multi-class; a brief clarification in the Methods would improve readability.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive feedback on our manuscript. We provide point-by-point responses to the major comments below, indicating where revisions will be made.
-
Referee: [Methods] Methods: No information is supplied on the composition or provenance of the training data used to develop Pegasus-OCT, nor on any steps taken to exclude overlap with the five validation datasets. This detail is load-bearing for the central claim of robust performance on 'independent' multi-centre data.
Authors: Pegasus-OCT is a proprietary clinical decision support tool developed using training data collected from clinical sites distinct from the five validation centers described in this study. The validation datasets were acquired independently at centers in five countries with no participation in the model's development. We will revise the Methods section to include this information on the independence of the validation data. revision: yes
-
Referee: [Methods] Methods: The evaluation relies entirely on ground-truth labels supplied by the five dataset owners, yet the text provides no evidence of a unified labeling protocol, inter-rater reliability statistics, or any post-hoc audit of label consistency across centers, devices, or operators. Because all AUROCs are computed directly against these labels, systematic inter-center labeling differences could inflate or deflate the reported figures without reflecting model behavior.
Authors: Each dataset owner supplied ground-truth labels according to their own clinical protocols and standards. As this study utilizes pre-existing datasets for external validation, inter-rater reliability data were not available to the authors. We will add a statement in the Methods section to clarify that labels were used as provided by the dataset owners without additional auditing. revision: partial
-
Referee: [Abstract] Abstract and Results: No confidence intervals, standard errors, or other measures of statistical uncertainty are reported for any AUROC value, and the quality-filtering criteria used to define the 'sufficient quality' subset are not described. Both omissions prevent assessment of the precision and scope of the headline performance claims.
Authors: We agree with the need for statistical uncertainty measures and a description of quality criteria. We will add 95% confidence intervals for all reported AUROCs, computed using bootstrap resampling. We will also describe the quality-filtering criteria applied to define the sufficient quality subset in the Methods section. revision: yes
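The rebuttal promises bootstrap confidence intervals but does not say which variant. As one possible reading, the sketch below uses a percentile bootstrap that resamples individual cases with replacement. The function names, defaults, and toy data are hypothetical. If several volumes come from the same patient, resampling at the patient level would likely be more appropriate than the case-level resampling assumed here.

```python
import random
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap (1 - alpha) confidence interval for AUROC."""
    if len(set(labels)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # simplification: skip resamples that lack one class
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy scores with some class overlap. Not taken from the paper.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.30, 0.45, 0.60, 0.40, 0.70, 0.80, 0.90]
print(auroc(labels, scores))               # -> 0.875
print(bootstrap_auroc_ci(labels, scores))  # interval depends on the seed
```

With AUROCs close to 1, the width of such an interval depends heavily on the number of positive and negative cases in each dataset, which is why the referee asks for it alongside the point estimates.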
Circularity Check
No circularity: empirical validation without derivation chain
The paper is a straightforward empirical validation study that processes 5,588 OCT volumes with existing Pegasus-OCT software and reports observed AUROCs against ground-truth labels supplied by the dataset owners. No equations, parameter fitting, ansatzes, uniqueness theorems, or self-citations appear in the abstract or described methods; the reported performance figures are direct measurements on held-out data rather than quantities derived from or reduced to the paper's own inputs by construction. The central claim therefore contains independent empirical content and does not trigger any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: ground truth labels provided by dataset owners are accurate and unbiased across all centers and devices.
Table 1: Independent evaluation datasets used in this paper. Columns: Name, Acquisition device manufacturer(s), Countries, Number of acquisi...