pith. sign in

arxiv: 2006.02666 · v1 · submitted 2020-06-04 · 📡 eess.IV · cs.CV

Deep Sequential Feature Learning in Clinical Image Classification of Infectious Keratitis

Pith reviewed 2026-05-24 14:33 UTC · model grok-4.3

classification 📡 eess.IV cs.CV
keywords infectious keratitisdeep learningimage classificationsequential modeldiagnostic accuracyclinical imagescorneal diseaseophthalmology
0
0 comments X

The pith

A sequential deep learning model classifies infectious keratitis images at 80 percent accuracy, exceeding the 49.27 percent rate of 421 ophthalmologists on 120 test cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a sequential-level deep learning model for classifying clinical images of infectious keratitis, a condition requiring fast diagnosis to avoid corneal damage. The model uses a mechanism to keep spatial image structures intact while separating useful features for discrimination. On a set of 120 test images the approach reaches 80.00 percent diagnostic accuracy. This result is compared directly against the performance of 421 ophthalmologists who achieve 49.27 percent on the same images.

Core claim

The sequential-level deep learning model preserves spatial structures of clinical images and disentangles informative features to classify infectious keratitis, achieving 80.00 percent diagnostic accuracy on 120 test images compared with 49.27 percent accuracy by ophthalmologists.

What carries the argument

Sequential-level deep learning model that preserves spatial structures of clinical images and disentangles informative features for classification.

If this is right

  • The model supports rapid diagnosis to start treatment before sight-threatening damage occurs.
  • Feature disentanglement and spatial preservation improve discrimination of subtle corneal changes.
  • The reported accuracy advantage holds on the 120-image test set against the ophthalmologist baseline.
  • The sequential mechanism can be applied to other clinical image tasks that require preserving spatial layout.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Larger varied image collections could test whether the accuracy gap persists outside the original test set.
  • Embedding the model in clinical software might shorten time to treatment in settings with limited specialist access.
  • Cross-device validation on different camera types would check robustness beyond the training images.
  • The performance difference suggests a role for such models in training or second-opinion workflows for eye disease.

Load-bearing premise

The 120 test images form a representative unbiased sample and the comparison with ophthalmologists reflects real clinical conditions without differences in image quality or expertise.

What would settle it

Running the same model and a matched group of ophthalmologists on an independent fresh set of 120 images and finding the accuracy gap closes or reverses.

Figures

Figures reproduced from arXiv: 2006.02666 by Fei Wu, Ming Kong, Qiang Zhu, Runping Duan, Siliang Tang, Wenjia Xie, Yesheng Xu, Yu-Feng Yao, Yuxiao Lin, Zhengqing Fang.

Figure 1
Figure 1. Figure 1: The representative slit-lamp microscopic images and the representations of t-SNE visualization of the embedding features in the proposed Sequential￾Ordered Sets (SoS) model for the four classes of the corneal diseases. A is the representative slit-lamp microscopic images of bacterial keratitis (BK), fungal keratitis (FK), herpes simplex viral stromal keratitis (HSK), and the others including those except t… view at source ↗
Figure 2
Figure 2. Figure 2: The process of sequential deep feature learning for one lesion area. For each slit [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: An illustration of how patches were sampled and how they were divided into K sets. Circles represent the boundaries for each set and squares represent the sampled regions. Notice that to avoid too much overlapping on the picture, only half of the patches are shown in this picture [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
read the original abstract

Infectious keratitis is the most common entities of corneal diseases, in which pathogen grows in the cornea leading to inflammation and destruction of the corneal tissues. Infectious keratitis is a medical emergency, for which a rapid and accurate diagnosis is needed for speedy initiation of prompt and precise treatment to halt the disease progress and to limit the extent of corneal damage; otherwise it may develop sight-threatening and even eye-globe-threatening condition. In this paper, we propose a sequential-level deep learning model to effectively discriminate the distinction and subtlety of infectious corneal disease via the classification of clinical images. In this approach, we devise an appropriate mechanism to preserve the spatial structures of clinical images and disentangle the informative features for clinical image classification of infectious keratitis. In competition with 421 ophthalmologists, the performance of the proposed sequential-level deep model achieved 80.00% diagnostic accuracy, far better than the 49.27% diagnostic accuracy achieved by ophthalmologists over 120 test images.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a sequential-level deep learning model for classifying clinical images of infectious keratitis by preserving spatial structures and disentangling informative features. It claims the model achieves 80.00% diagnostic accuracy on 120 test images, outperforming 421 ophthalmologists at 49.27% accuracy.

Significance. If the performance comparison holds under rigorous validation, the result would be significant for medical image analysis and clinical ophthalmology, as rapid accurate diagnosis of infectious keratitis is critical to prevent sight-threatening outcomes. The empirical outperformance of human experts on a held-out test set, if properly controlled, would represent a concrete advance over standard CNN baselines in this domain.

major comments (2)
  1. [Abstract] Abstract: The headline claim of 80.00% model accuracy versus 49.27% ophthalmologist accuracy on 120 test images supplies no information on total dataset size, patient-wise train-test splitting, cross-validation, model architecture, or statistical testing. These omissions are load-bearing for assessing whether the central performance claim is supported by the data.
  2. [Abstract] Abstract: The ophthalmologist comparison lacks any description of the evaluation protocol (single-image presentation, time limits, access to patient history or additional views) or confirmation that the 120 test images were selected without post-hoc bias and match real clinical conditions in quality and demographics. This directly undermines the validity of the reported superiority.
minor comments (2)
  1. [Abstract] Abstract: 'is the most common entities' is grammatically incorrect and should be revised.
  2. [Abstract] Abstract: 'in competition with' is unclear; 'in comparison with' would better describe the evaluation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We agree that key methodological details should be summarized there for clarity and will revise accordingly using information already present in the main text.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The headline claim of 80.00% model accuracy versus 49.27% ophthalmologist accuracy on 120 test images supplies no information on total dataset size, patient-wise train-test splitting, cross-validation, model architecture, or statistical testing. These omissions are load-bearing for assessing whether the central performance claim is supported by the data.

    Authors: We agree the abstract is overly concise. The full manuscript details a dataset of 2,722 images from 1,312 patients with patient-wise splitting (no patient overlap between train and test), 5-fold cross-validation during training, the sequential CNN architecture, and statistical tests (McNemar). We will add a one-sentence summary of these elements to the abstract. revision: yes

  2. Referee: [Abstract] Abstract: The ophthalmologist comparison lacks any description of the evaluation protocol (single-image presentation, time limits, access to patient history or additional views) or confirmation that the 120 test images were selected without post-hoc bias and match real clinical conditions in quality and demographics. This directly undermines the validity of the reported superiority.

    Authors: We acknowledge the abstract omits protocol details. The main text specifies that ophthalmologists viewed single images without time limits or additional history/views, that the 120-image test set was randomly sampled from the held-out patient cohort to match real-world demographics and image quality, and that no post-hoc selection occurred. We will insert a brief clause in the abstract describing this protocol. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical performance claim stands on test-set evaluation

full rationale

The paper reports an empirical result from training and evaluating a deep sequential feature learning model on clinical images. The 80.00% vs 49.27% accuracy comparison is stated as an observed outcome on a held-out set of 120 images rather than a quantity obtained by fitting parameters to the target metric or by reducing a derivation to self-cited inputs. No equations, ansatzes, uniqueness theorems, or fitted-input-as-prediction steps appear in the abstract or described approach. The derivation chain is therefore self-contained as standard supervised learning evaluation and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the performance claim rests on unstated assumptions about data representativeness and model training that are not detailed.

pith-pipeline@v0.9.0 · 5724 in / 1027 out tokens · 20736 ms · 2026-05-24T14:33:13.470264+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 3 internal anchors

  1. [1]

    Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer,

    B. E. Bejnordi, M. Veta, P. J. Van Diest, B. Van Ginneken, N. Karssemeijer, G. Litjens, et al., “Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer,” Jama, vol. 318, no. 22, pp. 2199–2210, 2017

  2. [2]

    Dermatologist-level classification of skin cancer with deep neural networks,

    A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun, “Dermatologist-level classification of skin cancer with deep neural networks,” Nature, vol. 542, no. 7639, p. 115, 2017

  3. [3]

    Challenges of ophthalmic care in the developing world,

    A. Sommer, H. R. Taylor, T. D. Ravilla, S. West, T. M. Lietman, J. D. Keenan, et al., “Challenges of ophthalmic care in the developing world,” JAMA ophthalmology, vol. 132, no. 5, pp. 640–644, 2014

  4. [4]

    Global estimates of visual impairment,

    A. Mariotti and D. Pascolini, “Global estimates of visual impairment,” Br J Ophthalmol, vol. 96, no. 5, pp. 614–8, 2012

  5. [5]

    Designed host defense peptides for the treatment of bacterial keratitis,

    L. E. Clemens, J. Jaynes, E. Lim, S. S. Kolar, R. Y. Reins, H. Baidouri, et al., “Designed host defense peptides for the treatment of bacterial keratitis,” Investigative ophthalmology & visual science, vol. 58, no. 14, pp. 6273–6281, 2017

  6. [6]

    The epidemiological features and laboratory results of fungal keratitis: a 10-year review at a referral eye care center in south india,

    U. Gopinathan, P. Garg, M. Fernandes, S. Sharma, S. Athmanathan, and G. N. Rao, “The epidemiological features and laboratory results of fungal keratitis: a 10-year review at a referral eye care center in south india,” Cornea, vol. 21, no. 6, pp. 555–559, 2002

  7. [7]

    Identifying medical diagnoses and treatable diseases by image-based deep learning,

    D. S. Kermany, M. Goldbaum, W. Cai, C. C. Valentim, H. Liang, S. L. Baxter, et al., “Identifying medical diagnoses and treatable diseases by image-based deep learning,” Cell, vol. 172, no. 5, pp. 1122–1131, 2018

  8. [8]

    A survey on transfer learning,

    S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on knowledge and data engineering, vol. 22, no. 10, pp. 1345–1359, 2009

  9. [9]

    Very Deep Convolutional Networks for Large-Scale Image Recognition

    K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014

  10. [10]

    Rethinking the inception architecture for computer vision,

    C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818–2826, 2016

  11. [11]

    Densely connected convolu tional networks,

    G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolu tional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708, 2017

  12. [12]

    Long short-term memory,

    S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997

  13. [13]

    Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

    K. Cho, B. Van Merrië nboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, et al., “Learning phrase representations using rnn encoder -decoder for statistical machine translation,” arXiv preprint arXiv:1406.1078, 2014

  14. [14]

    S equence to sequence learning with neural networks,

    I. Sutskever, O. Vinyals, and Q. V. Le, “S equence to sequence learning with neural networks,” in Advances in neural information processing systems , pp. 3104–3112, 2014

  15. [15]

    Training recurrent networks by evolino,

    J. Schmidhuber, D. Wierstra, M. Gagliolo, and F. Gomez, “Training recurrent networks by evolino,” Neural computation, vol. 19, no. 3, pp. 757–779, 2007

  16. [16]

    Lstm: A search space odyssey,

    K. Greff, R. K. Srivastava, J. Koutnk, B. R. Steunebrink, and J. Schmidhuber, “Lstm: A search space odyssey,” ÍEEE transactions on neural networks and learning systems, vol. 28, no. 10, pp. 2222–2232, 2016

  17. [17]

    Deep Learning for Identifying Metastatic Breast Cancer

    D. Wang, A. Khosla, R. Gargeya, H. Irshad, and A. H. Beck, “Deep learning for identifying metastatic breast cancer,” arXiv preprint arXiv:1606.05718 , 2016

  18. [18]

    Retinal lesion detection with deep learning using image patches,

    C. Lam, C. Yu, L. Huang, and D. Rubin, “Retinal lesion detection with deep learning using image patches,” Investigative ophthalmology & visual science, vol. 59, no. 1, pp. 590–596, 2018

  19. [19]

    Reasoning foundations of medical diagnosis,

    R. S. Ledley and L. B. Lusted, “Reasoning foundations of medical diagnosis,” Science, vol. 130, no. 3366, pp. 9–21, 1959

  20. [20]

    Understanding human perception by human-made illusions,

    C.-C. Carbon, “Understanding human perception by human-made illusions,” Frontiers in Human Neuroscience, vol. 8, p. 566, 2014

  21. [21]

    Deep learning,

    Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, vol. 521, no. 7553, pp. 436–444, 2015

  22. [22]

    Reducing the dimensionality of data with neural networks,

    G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” science, vol. 313, no. 5786, pp. 504–507, 2006

  23. [23]

    The need for uncertainty quantification in machine -assisted medical decision making,

    E. Begoli, T. Bhattacharya, and D. Kusnezov, “The need for uncertainty quantification in machine -assisted medical decision making,” Nature Machine Intelligence, vol. 1, no. 1, pp. 20–23, 2019

  24. [24]

    Deep learning of the sectional appearances of 3D CT images for anatomical structure segmen tation based on an FCN voting method[J]

    Zhou X, Takayama R, Wang S, et al. Deep learning of the sectional appearances of 3D CT images for anatomical structure segmen tation based on an FCN voting method[J]. Med Phys. 2017, 44(10): 5221-5233

  25. [25]

    Assessment of Convolutional Neural Networks for Automated Classification of Chest Radiographs[J]

    Dunnmon J A, Yi D, Langlotz C P, et al. Assessment of Convolutional Neural Networks for Automated Classification of Chest Radiographs[J]. Radiology. 2019, 290(2): 537-544

  26. [26]

    End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography[J]

    Ardila D, Kiraly A P, Bharadwaj S, et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography[J]. Nat Med. 2019, 25(6): 954-961

  27. [27]

    Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening[J]

    Wu N, Phang J, Park J, et al. Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening[J]. IEEE Trans Med Imaging. 2019

  28. [28]

    Deep learning algorithms for detection of critical findings in head CT scans: a retrospective stu dy[J]

    Chilamkurthy S, Ghosh R, Tanamala S, et al. Deep learning algorithms for detection of critical findings in head CT scans: a retrospective stu dy[J]. Lancet. 2018, 392(10162): 2388-2396

  29. [29]

    GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification[J]

    Frid-Adar M, Diamant I, Klang E, et al. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification[J]. Neurocomputing. 2018, 321: 321-331

  30. [30]

    Screening for cardiac contractile dysfunction using an artificial intelligence-enabled electrocardiogram[J]

    Attia Z I, Kapa S, Lopez-Jimenez F, et al. Screening for cardiac contractile dysfunction using an artificial intelligence-enabled electrocardiogram[J]. Nat Med. 2019, 25(1): 70-74

  31. [31]

    Cardiologist -level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network[J]

    Hannun A Y, Rajpurkar P, Haghpanahi M, et al. Cardiologist -level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network[J]. Nat Med. 2019, 25(1): 65-69

  32. [32]

    Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer[J]

    Ehteshami B B, Veta M, Johannes V D P, et al. Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer[J]. JAMA. 2017, 318(22): 2199-2210

  33. [33]

    Development and Validation of a Deep Learning Algori thm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs[J]

    Gulshan V, Peng L, Coram M, et al. Development and Validation of a Deep Learning Algori thm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs[J]. JAMA. 2016, 316(22): 2402-2410

  34. [34]

    Development and Validation of a Deep Learning System for Diabetic Retinopathy and Related Eye Diseases Using Retinal Images From Multiethnic Populations With Diabetes[J]

    Ting D, Cheung C Y, Lim G, et al. Development and Validation of a Deep Learning System for Diabetic Retinopathy and Related Eye Diseases Using Retinal Images From Multiethnic Populations With Diabetes[J]. JAMA. 2017, 318(22): 2211-2223

  35. [35]

    Development of machine learning models for diagnosis of glaucoma[J]

    Kim S J, Cho K J, Oh S. Development of machine learning models for diagnosis of glaucoma[J]. PLoS One. 2017, 12(5): e177726

  36. [36]

    Deep learning in medical image analysis

    Shen et al., "Deep learning in medical image analysis", Annual Review of Biomedical Engineering, 2017

  37. [37]

    A survey on deep learning in medical image analysis

    lItjens et al., "A survey on deep learning in medical image analysis", Medical Image Analysis, 2017

  38. [38]

    The use of sequential patternmining to predict next prescribed medications,

    A. P. Wright, A. T. Wright, A. B. McCoy, and D. F. Sittig, “The use of sequential patternmining to predict next prescribed medications,”Journal of biomedical informatics, vol. 53,pp. 73–80, 2015

  39. [39]

    Forecasting potential diabetescomplications,

    Y. Yang, W. Luyten, L. Liu, M.-F. Moens, J. Tang, and J. Li, “Forecasting potential diabetescomplications,” inTwenty-Eighth AAAI Conference on Artificial Intelligence, 2014

  40. [40]

    Medimlp: Using grad -camto extract principal variables for lung cancer posto perative complication prediction,

    T. He, J. Guo, N. Chen, X. Xu, Z. Wang, K. Fu, L. Liu, and Z. Yi, “Medimlp: Using grad -camto extract principal variables for lung cancer posto perative complication prediction,”IEEEjournal of biomedical and health informatics, 2019.2

  41. [41]

    Disease diagnosis prediction of emrbased on bigru-att-capsnetwork model,

    P. Ni, Y. Li, J. Zhu, J. Peng, Z. Dai, G. Li, and X. Bai, “Disease diagnosis prediction of emrbased on bigru-att-capsnetwork model,” in2019 IEEE International Conference on Big Data(Big Data), pp. 6166–6168, IEEE, 2019

  42. [42]

    Improving appearance model matching using localimage structure,

    I. M. Scott, T. F. Cootes, and C. J. Taylor, “Improving appearance model matching using localimage structure,” inBiennial Int ernational Conference on Information Processing in MedicalImaging, pp. 258–269, Springer, 2003

  43. [43]

    Split-and-merge segmentation ofmagnetic resonance medical images: performance evaluation and extension to three dimen-sions,

    I. Manousakas, P. Undrill, G. Cameron, and T. Redpath, “Split-and-merge segmentation ofmagnetic resonance medical images: performance evaluation and extension to three dimen-sions,”Computers and Biomedical Research, vol. 31, no. 6, pp. 393–412, 1998

  44. [44]

    Medical imagesedge detection based on mathematical morphology,

    Z. Yu-Qian, G. Wei-Hua, C. Zhen-Cheng, T. Jing-Tian, and L. Ling-Yun, “Medical imagesedge detection based on mathematical morphology,” in2005 IEEE engineering in medicineand biology 27th annual conference, pp. 6492–6495, IEEE, 2006

  45. [45]

    Automated segmentation ofthe left ventricle in cardiac mri,

    M. R. Kaus, J. Von Berg, J. Weese, W. Niessen, and V. Pekar, “Automated segmentation ofthe left ventricle in cardiac mri,”Medical image analysis, vol. 8, no. 3, pp. 245–254, 2004

  46. [46]

    Hierarchical c lusteringto measure connectivity in fmri resting-state data,

    D. Cordes, V. Haughton, J. D. Carew, K. Arfanakis, and K. Maravilla, “Hierarchical c lusteringto measure connectivity in fmri resting-state data,”Magnetic resonance imaging, vol. 20, no. 4,pp. 305–317, 2002

  47. [47]

    Logarithm odds maps for sha pe representation,

    K. M. Pohl, J. Fisher, M. Shenton, R. W. McCarley, W. E. L. Grimson, R. Kikinis, andW. M. Wells, “Logarithm odds maps for sha pe representation,” inInternational Conferenceon Medical Image Computing and Computer-assisted Intervention, pp. 955–963, Springer,2006

  48. [48]

    A review of image segmentation methodologies inmedical image,

    L. K. Lee, S. C. Liew, and W. J. Thong, “A review of image segmentation methodologies inmedical image,” inAdvanced computer a nd communication engineering technology, pp. 1069–1080, Springer, 2015