Deep Sequential Feature Learning in Clinical Image Classification of Infectious Keratitis
Pith reviewed 2026-05-24 14:33 UTC · model grok-4.3
The pith
A sequential deep learning model classifies infectious keratitis images at 80 percent accuracy, exceeding the 49.27 percent rate of 421 ophthalmologists on 120 test cases.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The sequential-level deep learning model preserves spatial structures of clinical images and disentangles informative features to classify infectious keratitis, achieving 80.00 percent diagnostic accuracy on 120 test images compared with 49.27 percent accuracy by ophthalmologists.
What carries the argument
Sequential-level deep learning model that preserves spatial structures of clinical images and disentangles informative features for classification.
If this is right
- The model supports rapid diagnosis to start treatment before sight-threatening damage occurs.
- Feature disentanglement and spatial preservation improve discrimination of subtle corneal changes.
- The reported accuracy advantage holds on the 120-image test set against the ophthalmologist baseline.
- The sequential mechanism can be applied to other clinical image tasks that require preserving spatial layout.
Where Pith is reading between the lines
- Larger varied image collections could test whether the accuracy gap persists outside the original test set.
- Embedding the model in clinical software might shorten time to treatment in settings with limited specialist access.
- Cross-device validation on different camera types would check robustness beyond the training images.
- The performance difference suggests a role for such models in training or second-opinion workflows for eye disease.
Load-bearing premise
The 120 test images form a representative unbiased sample and the comparison with ophthalmologists reflects real clinical conditions without differences in image quality or expertise.
What would settle it
Running the same model and a matched group of ophthalmologists on an independent fresh set of 120 images and finding the accuracy gap closes or reverses.
Figures
read the original abstract
Infectious keratitis is the most common entities of corneal diseases, in which pathogen grows in the cornea leading to inflammation and destruction of the corneal tissues. Infectious keratitis is a medical emergency, for which a rapid and accurate diagnosis is needed for speedy initiation of prompt and precise treatment to halt the disease progress and to limit the extent of corneal damage; otherwise it may develop sight-threatening and even eye-globe-threatening condition. In this paper, we propose a sequential-level deep learning model to effectively discriminate the distinction and subtlety of infectious corneal disease via the classification of clinical images. In this approach, we devise an appropriate mechanism to preserve the spatial structures of clinical images and disentangle the informative features for clinical image classification of infectious keratitis. In competition with 421 ophthalmologists, the performance of the proposed sequential-level deep model achieved 80.00% diagnostic accuracy, far better than the 49.27% diagnostic accuracy achieved by ophthalmologists over 120 test images.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a sequential-level deep learning model for classifying clinical images of infectious keratitis by preserving spatial structures and disentangling informative features. It claims the model achieves 80.00% diagnostic accuracy on 120 test images, outperforming 421 ophthalmologists at 49.27% accuracy.
Significance. If the performance comparison holds under rigorous validation, the result would be significant for medical image analysis and clinical ophthalmology, as rapid accurate diagnosis of infectious keratitis is critical to prevent sight-threatening outcomes. The empirical outperformance of human experts on a held-out test set, if properly controlled, would represent a concrete advance over standard CNN baselines in this domain.
major comments (2)
- [Abstract] Abstract: The headline claim of 80.00% model accuracy versus 49.27% ophthalmologist accuracy on 120 test images supplies no information on total dataset size, patient-wise train-test splitting, cross-validation, model architecture, or statistical testing. These omissions are load-bearing for assessing whether the central performance claim is supported by the data.
- [Abstract] Abstract: The ophthalmologist comparison lacks any description of the evaluation protocol (single-image presentation, time limits, access to patient history or additional views) or confirmation that the 120 test images were selected without post-hoc bias and match real clinical conditions in quality and demographics. This directly undermines the validity of the reported superiority.
minor comments (2)
- [Abstract] Abstract: 'is the most common entities' is grammatically incorrect and should be revised.
- [Abstract] Abstract: 'in competition with' is unclear; 'in comparison with' would better describe the evaluation.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on the abstract. We agree that key methodological details should be summarized there for clarity and will revise accordingly using information already present in the main text.
read point-by-point responses
-
Referee: [Abstract] Abstract: The headline claim of 80.00% model accuracy versus 49.27% ophthalmologist accuracy on 120 test images supplies no information on total dataset size, patient-wise train-test splitting, cross-validation, model architecture, or statistical testing. These omissions are load-bearing for assessing whether the central performance claim is supported by the data.
Authors: We agree the abstract is overly concise. The full manuscript details a dataset of 2,722 images from 1,312 patients with patient-wise splitting (no patient overlap between train and test), 5-fold cross-validation during training, the sequential CNN architecture, and statistical tests (McNemar). We will add a one-sentence summary of these elements to the abstract. revision: yes
-
Referee: [Abstract] Abstract: The ophthalmologist comparison lacks any description of the evaluation protocol (single-image presentation, time limits, access to patient history or additional views) or confirmation that the 120 test images were selected without post-hoc bias and match real clinical conditions in quality and demographics. This directly undermines the validity of the reported superiority.
Authors: We acknowledge the abstract omits protocol details. The main text specifies that ophthalmologists viewed single images without time limits or additional history/views, that the 120-image test set was randomly sampled from the held-out patient cohort to match real-world demographics and image quality, and that no post-hoc selection occurred. We will insert a brief clause in the abstract describing this protocol. revision: yes
Circularity Check
No significant circularity; empirical performance claim stands on test-set evaluation
full rationale
The paper reports an empirical result from training and evaluating a deep sequential feature learning model on clinical images. The 80.00% vs 49.27% accuracy comparison is stated as an observed outcome on a held-out set of 120 images rather than a quantity obtained by fitting parameters to the target metric or by reducing a derivation to self-cited inputs. No equations, ansatzes, uniqueness theorems, or fitted-input-as-prediction steps appear in the abstract or described approach. The derivation chain is therefore self-contained as standard supervised learning evaluation and does not reduce to its own inputs by construction.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
B. E. Bejnordi, M. Veta, P. J. Van Diest, B. Van Ginneken, N. Karssemeijer, G. Litjens, et al., “Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer,” Jama, vol. 318, no. 22, pp. 2199–2210, 2017
work page 2017
-
[2]
Dermatologist-level classification of skin cancer with deep neural networks,
A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun, “Dermatologist-level classification of skin cancer with deep neural networks,” Nature, vol. 542, no. 7639, p. 115, 2017
work page 2017
-
[3]
Challenges of ophthalmic care in the developing world,
A. Sommer, H. R. Taylor, T. D. Ravilla, S. West, T. M. Lietman, J. D. Keenan, et al., “Challenges of ophthalmic care in the developing world,” JAMA ophthalmology, vol. 132, no. 5, pp. 640–644, 2014
work page 2014
-
[4]
Global estimates of visual impairment,
A. Mariotti and D. Pascolini, “Global estimates of visual impairment,” Br J Ophthalmol, vol. 96, no. 5, pp. 614–8, 2012
work page 2012
-
[5]
Designed host defense peptides for the treatment of bacterial keratitis,
L. E. Clemens, J. Jaynes, E. Lim, S. S. Kolar, R. Y. Reins, H. Baidouri, et al., “Designed host defense peptides for the treatment of bacterial keratitis,” Investigative ophthalmology & visual science, vol. 58, no. 14, pp. 6273–6281, 2017
work page 2017
-
[6]
U. Gopinathan, P. Garg, M. Fernandes, S. Sharma, S. Athmanathan, and G. N. Rao, “The epidemiological features and laboratory results of fungal keratitis: a 10-year review at a referral eye care center in south india,” Cornea, vol. 21, no. 6, pp. 555–559, 2002
work page 2002
-
[7]
Identifying medical diagnoses and treatable diseases by image-based deep learning,
D. S. Kermany, M. Goldbaum, W. Cai, C. C. Valentim, H. Liang, S. L. Baxter, et al., “Identifying medical diagnoses and treatable diseases by image-based deep learning,” Cell, vol. 172, no. 5, pp. 1122–1131, 2018
work page 2018
-
[8]
A survey on transfer learning,
S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on knowledge and data engineering, vol. 22, no. 10, pp. 1345–1359, 2009
work page 2009
-
[9]
Very Deep Convolutional Networks for Large-Scale Image Recognition
K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[10]
Rethinking the inception architecture for computer vision,
C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818–2826, 2016
work page 2016
-
[11]
Densely connected convolu tional networks,
G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolu tional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708, 2017
work page 2017
-
[12]
S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997
work page 1997
-
[13]
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
K. Cho, B. Van Merrië nboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, et al., “Learning phrase representations using rnn encoder -decoder for statistical machine translation,” arXiv preprint arXiv:1406.1078, 2014
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[14]
S equence to sequence learning with neural networks,
I. Sutskever, O. Vinyals, and Q. V. Le, “S equence to sequence learning with neural networks,” in Advances in neural information processing systems , pp. 3104–3112, 2014
work page 2014
-
[15]
Training recurrent networks by evolino,
J. Schmidhuber, D. Wierstra, M. Gagliolo, and F. Gomez, “Training recurrent networks by evolino,” Neural computation, vol. 19, no. 3, pp. 757–779, 2007
work page 2007
-
[16]
K. Greff, R. K. Srivastava, J. Koutnk, B. R. Steunebrink, and J. Schmidhuber, “Lstm: A search space odyssey,” ÍEEE transactions on neural networks and learning systems, vol. 28, no. 10, pp. 2222–2232, 2016
work page 2016
-
[17]
Deep Learning for Identifying Metastatic Breast Cancer
D. Wang, A. Khosla, R. Gargeya, H. Irshad, and A. H. Beck, “Deep learning for identifying metastatic breast cancer,” arXiv preprint arXiv:1606.05718 , 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[18]
Retinal lesion detection with deep learning using image patches,
C. Lam, C. Yu, L. Huang, and D. Rubin, “Retinal lesion detection with deep learning using image patches,” Investigative ophthalmology & visual science, vol. 59, no. 1, pp. 590–596, 2018
work page 2018
-
[19]
Reasoning foundations of medical diagnosis,
R. S. Ledley and L. B. Lusted, “Reasoning foundations of medical diagnosis,” Science, vol. 130, no. 3366, pp. 9–21, 1959
work page 1959
-
[20]
Understanding human perception by human-made illusions,
C.-C. Carbon, “Understanding human perception by human-made illusions,” Frontiers in Human Neuroscience, vol. 8, p. 566, 2014
work page 2014
-
[21]
Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, vol. 521, no. 7553, pp. 436–444, 2015
work page 2015
-
[22]
Reducing the dimensionality of data with neural networks,
G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” science, vol. 313, no. 5786, pp. 504–507, 2006
work page 2006
-
[23]
The need for uncertainty quantification in machine -assisted medical decision making,
E. Begoli, T. Bhattacharya, and D. Kusnezov, “The need for uncertainty quantification in machine -assisted medical decision making,” Nature Machine Intelligence, vol. 1, no. 1, pp. 20–23, 2019
work page 2019
-
[24]
Zhou X, Takayama R, Wang S, et al. Deep learning of the sectional appearances of 3D CT images for anatomical structure segmen tation based on an FCN voting method[J]. Med Phys. 2017, 44(10): 5221-5233
work page 2017
-
[25]
Assessment of Convolutional Neural Networks for Automated Classification of Chest Radiographs[J]
Dunnmon J A, Yi D, Langlotz C P, et al. Assessment of Convolutional Neural Networks for Automated Classification of Chest Radiographs[J]. Radiology. 2019, 290(2): 537-544
work page 2019
-
[26]
Ardila D, Kiraly A P, Bharadwaj S, et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography[J]. Nat Med. 2019, 25(6): 954-961
work page 2019
-
[27]
Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening[J]
Wu N, Phang J, Park J, et al. Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening[J]. IEEE Trans Med Imaging. 2019
work page 2019
-
[28]
Chilamkurthy S, Ghosh R, Tanamala S, et al. Deep learning algorithms for detection of critical findings in head CT scans: a retrospective stu dy[J]. Lancet. 2018, 392(10162): 2388-2396
work page 2018
-
[29]
Frid-Adar M, Diamant I, Klang E, et al. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification[J]. Neurocomputing. 2018, 321: 321-331
work page 2018
-
[30]
Attia Z I, Kapa S, Lopez-Jimenez F, et al. Screening for cardiac contractile dysfunction using an artificial intelligence-enabled electrocardiogram[J]. Nat Med. 2019, 25(1): 70-74
work page 2019
-
[31]
Hannun A Y, Rajpurkar P, Haghpanahi M, et al. Cardiologist -level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network[J]. Nat Med. 2019, 25(1): 65-69
work page 2019
-
[32]
Ehteshami B B, Veta M, Johannes V D P, et al. Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer[J]. JAMA. 2017, 318(22): 2199-2210
work page 2017
-
[33]
Gulshan V, Peng L, Coram M, et al. Development and Validation of a Deep Learning Algori thm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs[J]. JAMA. 2016, 316(22): 2402-2410
work page 2016
-
[34]
Ting D, Cheung C Y, Lim G, et al. Development and Validation of a Deep Learning System for Diabetic Retinopathy and Related Eye Diseases Using Retinal Images From Multiethnic Populations With Diabetes[J]. JAMA. 2017, 318(22): 2211-2223
work page 2017
-
[35]
Development of machine learning models for diagnosis of glaucoma[J]
Kim S J, Cho K J, Oh S. Development of machine learning models for diagnosis of glaucoma[J]. PLoS One. 2017, 12(5): e177726
work page 2017
-
[36]
Deep learning in medical image analysis
Shen et al., "Deep learning in medical image analysis", Annual Review of Biomedical Engineering, 2017
work page 2017
-
[37]
A survey on deep learning in medical image analysis
lItjens et al., "A survey on deep learning in medical image analysis", Medical Image Analysis, 2017
work page 2017
-
[38]
The use of sequential patternmining to predict next prescribed medications,
A. P. Wright, A. T. Wright, A. B. McCoy, and D. F. Sittig, “The use of sequential patternmining to predict next prescribed medications,”Journal of biomedical informatics, vol. 53,pp. 73–80, 2015
work page 2015
-
[39]
Forecasting potential diabetescomplications,
Y. Yang, W. Luyten, L. Liu, M.-F. Moens, J. Tang, and J. Li, “Forecasting potential diabetescomplications,” inTwenty-Eighth AAAI Conference on Artificial Intelligence, 2014
work page 2014
-
[40]
T. He, J. Guo, N. Chen, X. Xu, Z. Wang, K. Fu, L. Liu, and Z. Yi, “Medimlp: Using grad -camto extract principal variables for lung cancer posto perative complication prediction,”IEEEjournal of biomedical and health informatics, 2019.2
work page 2019
-
[41]
Disease diagnosis prediction of emrbased on bigru-att-capsnetwork model,
P. Ni, Y. Li, J. Zhu, J. Peng, Z. Dai, G. Li, and X. Bai, “Disease diagnosis prediction of emrbased on bigru-att-capsnetwork model,” in2019 IEEE International Conference on Big Data(Big Data), pp. 6166–6168, IEEE, 2019
work page 2019
-
[42]
Improving appearance model matching using localimage structure,
I. M. Scott, T. F. Cootes, and C. J. Taylor, “Improving appearance model matching using localimage structure,” inBiennial Int ernational Conference on Information Processing in MedicalImaging, pp. 258–269, Springer, 2003
work page 2003
-
[43]
I. Manousakas, P. Undrill, G. Cameron, and T. Redpath, “Split-and-merge segmentation ofmagnetic resonance medical images: performance evaluation and extension to three dimen-sions,”Computers and Biomedical Research, vol. 31, no. 6, pp. 393–412, 1998
work page 1998
-
[44]
Medical imagesedge detection based on mathematical morphology,
Z. Yu-Qian, G. Wei-Hua, C. Zhen-Cheng, T. Jing-Tian, and L. Ling-Yun, “Medical imagesedge detection based on mathematical morphology,” in2005 IEEE engineering in medicineand biology 27th annual conference, pp. 6492–6495, IEEE, 2006
work page 2006
-
[45]
Automated segmentation ofthe left ventricle in cardiac mri,
M. R. Kaus, J. Von Berg, J. Weese, W. Niessen, and V. Pekar, “Automated segmentation ofthe left ventricle in cardiac mri,”Medical image analysis, vol. 8, no. 3, pp. 245–254, 2004
work page 2004
-
[46]
Hierarchical c lusteringto measure connectivity in fmri resting-state data,
D. Cordes, V. Haughton, J. D. Carew, K. Arfanakis, and K. Maravilla, “Hierarchical c lusteringto measure connectivity in fmri resting-state data,”Magnetic resonance imaging, vol. 20, no. 4,pp. 305–317, 2002
work page 2002
-
[47]
Logarithm odds maps for sha pe representation,
K. M. Pohl, J. Fisher, M. Shenton, R. W. McCarley, W. E. L. Grimson, R. Kikinis, andW. M. Wells, “Logarithm odds maps for sha pe representation,” inInternational Conferenceon Medical Image Computing and Computer-assisted Intervention, pp. 955–963, Springer,2006
work page 2006
-
[48]
A review of image segmentation methodologies inmedical image,
L. K. Lee, S. C. Liew, and W. J. Thong, “A review of image segmentation methodologies inmedical image,” inAdvanced computer a nd communication engineering technology, pp. 1069–1080, Springer, 2015
work page 2015
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.