Disease classification of macular Optical Coherence Tomography scans using deep learning software: validation on independent, multi-centre data

Ashley Wood; Kanwal K. Bhatia; Louise Terry; Mark S. Graham; Nicolas Jaccard; Paris Tranos; Sameer Trikha

arXiv: 1907.05164 · v1 · pith:RZPSSZGZnew · submitted 2019-07-11 · eess.IV · cs.CV · cs.LG

Pith reviewed 2026-05-24 23:08 UTC · model grok-4.3

keywords: optical coherence tomography · deep learning · retinal disease · AMD · DME · multi-centre validation · clinical decision support

The pith

Pegasus-OCT detects macular anomalies with at least 98% AUROC across independent multi-centre OCT datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates Pegasus-OCT, a deep learning clinical decision support software, on 5,588 normal and anomalous macular OCT volumes collected from independent centres in five countries. It processes the scans and compares results against ground truth labels supplied by the dataset owners. The software achieves AUROCs of at least 98% for general macular anomalies, and at least 99% and 98% for AMD and DME on sufficient-quality scans. A sympathetic reader would care because consistent high performance across varied demographics, device manufacturers, sites and operators indicates the tool could operate reliably outside its original training environment.

Core claim

Pegasus-OCT performed with AUROCs of at least 98% for all datasets in the detection of general macular anomalies. For scans of sufficient quality, the AUROCs for general AMD and DME detection were found to be at least 99% and 98%, respectively.
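The phrase "at least 98% for all datasets" reads as a minimum over per-dataset AUROCs. As a minimal sketch of that reading, the snippet below computes AUROC as the probability that a randomly chosen anomalous volume scores higher than a randomly chosen normal one (ties count one half), then takes the worst value across datasets. The function names, the site names, and the toy scores are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def auroc(labels, scores):
    """Pairwise AUROC: P(score of a positive > score of a negative), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_dataset_auroc(records):
    """records: iterable of (dataset_name, label, score) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for name, label, score in records:
        grouped[name][0].append(label)
        grouped[name][1].append(score)
    return {name: auroc(ys, ss) for name, (ys, ss) in grouped.items()}


# Made-up records for illustration only (label 1 = anomalous, 0 = normal).
records = [
    ("site_A", 0, 0.10), ("site_A", 0, 0.40), ("site_A", 1, 0.80), ("site_A", 1, 0.90),
    ("site_B", 0, 0.20), ("site_B", 0, 0.60), ("site_B", 1, 0.50), ("site_B", 1, 0.70),
]
results = per_dataset_auroc(records)
worst = min(results.values())  # the figure an "at least X%" claim would rest on
print(results, worst)
```

The pairwise loop is quadratic in the number of volumes, which is acceptable for a sketch; a rank-based implementation would be the usual choice at the scale of thousands of volumes.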

What carries the argument

Pegasus-OCT, deep learning software that identifies features of retinal disease from macula OCT scans, evaluated across heterogeneous populations.

If this is right

The software maintains performance when applied to data from different patient demographics and device manufacturers.
High detection rates hold for scans acquired at multiple independent sites by different operators.
The results support potential use of the software to help manage growing demand in eye care services for retinal disease.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Validation on external multi-centre data increases the chance the model will work in new clinics that use different OCT machines.
If performance remains stable, the software could reduce variability in initial screening for AMD and DME across regions.
Further tests could measure whether the high AUROCs translate into faster referral decisions in routine practice.

Load-bearing premise

Ground truth labels supplied by the dataset owners are accurate, consistent, and free of systematic bias across centers, devices, and operators.

What would settle it

Independent re-labelling of a random subset of the scans by a new panel of experts that produces labels differing on more than 10% of cases and drops the reported AUROCs below 90%.
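The first half of that test, comparing the supplied labels with a fresh expert panel, could be sketched as below. This is an illustration under stated assumptions: the two label lists are invented, the helper names are hypothetical, and Cohen's kappa is added here as a common chance-corrected agreement measure rather than something the paper reports. The 10% threshold is the one named above.

```python
def disagreement_rate(labels_a, labels_b):
    """Fraction of cases on which two label sets differ."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(a != b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters over the same cases."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:  # both raters used one identical category throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented labels: 1 = anomalous, 0 = normal.
original = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
relabelled = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]

rate = disagreement_rate(original, relabelled)
kappa = cohens_kappa(original, relabelled)
exceeds_threshold = rate > 0.10  # threshold taken from the text above
print(rate, kappa, exceeds_threshold)
```

The second half of the test would then recompute the AUROCs against the new labels and compare them with the 90% mark.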

Original abstract

Purpose: To evaluate Pegasus-OCT, a clinical decision support software for the identification of features of retinal disease from macula OCT scans, across heterogeneous populations involving varying patient demographics, device manufacturers, acquisition sites and operators.

Methods: 5,588 normal and anomalous macular OCT volumes (162,721 B-scans), acquired at independent centres in five countries, were processed using the software. Results were evaluated against ground truth provided by the dataset owners.

Results: Pegasus-OCT performed with AUROCs of at least 98% for all datasets in the detection of general macular anomalies. For scans of sufficient quality, the AUROCs for general AMD and DME detection were found to be at least 99% and 98%, respectively.

Conclusions: The ability of a clinical decision support system to cater for different populations is key to its adoption. Pegasus-OCT was shown to be able to detect AMD, DME and general anomalies in OCT volumes acquired across multiple independent sites with high performance. Its use thus offers substantial promise, with the potential to alleviate the burden of growing demand in eye care services caused by retinal disease.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a straightforward multi-center validation of an existing OCT tool with high reported AUROCs, but missing details on labels and training data keep the claims hard to assess fully.

Pegasus-OCT shows AUROCs of 98% or higher for anomaly detection across data from five countries and multiple devices. The paper runs an existing software on 5,588 independent macular OCT volumes and reports strong numbers against site-provided labels, with even higher figures for AMD and DME on quality-filtered scans. This kind of external test on heterogeneous data is the main point, and it lines up with the need for tools that handle real clinic variation in patient groups and equipment.

Referee Report

3 major / 1 minor

Summary. The manuscript evaluates Pegasus-OCT, a deep learning clinical decision support tool, for detecting general macular anomalies, AMD, and DME in 5,588 OCT volumes (162,721 B-scans) acquired at independent centres in five countries using varied devices and operators. Performance is assessed via AUROC against ground-truth labels supplied by the dataset owners, yielding AUROCs ≥98% for general anomalies on all datasets and ≥99% (AMD) / ≥98% (DME) on quality-filtered scans.

Significance. A large-scale, multi-center, multi-device validation study is a strength for assessing real-world robustness of OCT classification software, which could support clinical adoption if the reported metrics are shown to reflect true generalization. The scale (five countries) addresses an important practical need in retinal disease screening.

major comments (3)

[Methods] Methods: No information is supplied on the composition or provenance of the training data used to develop Pegasus-OCT, nor on any steps taken to exclude overlap with the five validation datasets. This detail is load-bearing for the central claim of robust performance on 'independent' multi-centre data.
[Methods] Methods: The evaluation relies entirely on ground-truth labels supplied by the five dataset owners, yet the text provides no evidence of a unified labeling protocol, inter-rater reliability statistics, or any post-hoc audit of label consistency across centers, devices, or operators. Because all AUROCs are computed directly against these labels, systematic inter-center labeling differences could inflate or deflate the reported figures without reflecting model behavior.
[Abstract] Abstract and Results: No confidence intervals, standard errors, or other measures of statistical uncertainty are reported for any AUROC value, and the quality-filtering criteria used to define the 'sufficient quality' subset are not described. Both omissions prevent assessment of the precision and scope of the headline performance claims.
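The first major comment asks how overlap between development and validation data was excluded. The paper does not describe such a check, so the following is only a minimal sketch of one way exact-duplicate volumes could be flagged, assuming each scan is available as raw bytes. The names `fingerprint` and `find_overlap` and the toy identifiers are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """SHA-256 digest of a scan's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def find_overlap(train_scans, validation_scans):
    """Return ids of validation scans whose bytes exactly match any training scan.

    Both arguments map a scan id to that scan's raw bytes.
    """
    train_hashes = {fingerprint(blob) for blob in train_scans.values()}
    return sorted(
        scan_id
        for scan_id, blob in validation_scans.items()
        if fingerprint(blob) in train_hashes
    )


# Toy byte strings standing in for OCT volume files.
train = {"t1": b"volume-A", "t2": b"volume-B"}
validation = {"v1": b"volume-C", "v2": b"volume-B"}

overlap = find_overlap(train, validation)
print(overlap)
```

Byte-level hashing only catches identical files. It would miss re-exported or re-compressed copies and different scans of the same patient, so a patient-level identifier check would be needed to support a stronger independence claim.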

minor comments (1)

[Abstract] The abstract states results for 'general AMD and DME detection' but does not clarify whether these are binary detection tasks or multi-class; a brief clarification in the Methods would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback on our manuscript. We provide point-by-point responses to the major comments below, indicating where revisions will be made.

Referee: [Methods] Methods: No information is supplied on the composition or provenance of the training data used to develop Pegasus-OCT, nor on any steps taken to exclude overlap with the five validation datasets. This detail is load-bearing for the central claim of robust performance on 'independent' multi-centre data.

Authors: Pegasus-OCT is a proprietary clinical decision support tool developed using training data collected from clinical sites distinct from the five validation centers described in this study. The validation datasets were acquired independently at centers in five countries with no participation in the model's development. We will revise the Methods section to include this information on the independence of the validation data. revision: yes

Referee: [Methods] Methods: The evaluation relies entirely on ground-truth labels supplied by the five dataset owners, yet the text provides no evidence of a unified labeling protocol, inter-rater reliability statistics, or any post-hoc audit of label consistency across centers, devices, or operators. Because all AUROCs are computed directly against these labels, systematic inter-center labeling differences could inflate or deflate the reported figures without reflecting model behavior.

Authors: Each dataset owner supplied ground-truth labels according to their own clinical protocols and standards. As this study utilizes pre-existing datasets for external validation, inter-rater reliability data were not available to the authors. We will add a statement in the Methods section to clarify that labels were used as provided by the dataset owners without additional auditing. revision: partial

Referee: [Abstract] Abstract and Results: No confidence intervals, standard errors, or other measures of statistical uncertainty are reported for any AUROC value, and the quality-filtering criteria used to define the 'sufficient quality' subset are not described. Both omissions prevent assessment of the precision and scope of the headline performance claims.

Authors: We agree with the need for statistical uncertainty measures and a description of quality criteria. We will add 95% confidence intervals for all reported AUROCs, computed using bootstrap resampling. We will also describe the quality-filtering criteria applied to define the sufficient quality subset in the Methods section. revision: yes
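The rebuttal names bootstrap resampling but gives no further detail. As a hedged sketch of what such an interval could look like, the snippet below uses a percentile bootstrap over volumes. The percentile method, the number of resamples, the fixed seed, and the toy data are all assumptions made for illustration, and `bootstrap_ci` is a hypothetical helper name.

```python
import random


def auroc(labels, scores):
    """Pairwise AUROC with ties counted as one half; labels must be 0 or 1."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples that contain both classes
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented labels and scores (1 = anomalous, 0 = normal).
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.40, 0.55, 0.70, 0.80, 0.90, 0.95]

point = auroc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(point, lower, upper)
```

Resampling individual volumes treats them as independent. If several volumes come from the same patient or eye, resampling at the patient level would be the more defensible design, since volume-level resampling can make the interval look narrower than it should.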

Circularity Check

0 steps flagged

No circularity: empirical validation without derivation chain

The paper is a straightforward empirical validation study that processes 5,588 OCT volumes with existing Pegasus-OCT software and reports observed AUROCs against ground-truth labels supplied by the dataset owners. No equations, parameter fitting, ansatzes, uniqueness theorems, or self-citations appear in the abstract or described methods; the reported performance figures are direct measurements on held-out data rather than quantities derived from or reduced to the paper's own inputs by construction. The central claim therefore contains independent empirical content and does not trigger any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central performance claims rest on the unverified accuracy of external ground-truth labels and on the assumption that the test volumes are fully independent of any data used to develop the software.

axioms (1)

Domain assumption: Ground truth labels provided by dataset owners are accurate and unbiased across all centers and devices.
AUROC calculations are computed directly against these labels; any systematic labeling error would invalidate the reported performance figures.
