Fused Detection of Retinal Biomarkers in OCT Volumes
Pith reviewed 2026-05-24 21:05 UTC · model grok-4.3
The pith
A bidirectional LSTM fuses CNN outputs to impose coherence on retinal biomarker predictions across OCT volume slices.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present a method that automatically predicts the presence of biomarkers in OCT cross-sections by incorporating information from the entire volume. We do so by adding a bidirectional LSTM to fuse the outputs of a Convolutional Neural Network that predicts individual biomarkers. We thus avoid the need to use pixel-wise annotations to train our method, and instead provide fine-grained biomarker information regardless. On a dataset of 416 volumes, we show that our approach imposes coherence between biomarker predictions across volume slices and our predictions are superior to several existing approaches.
What carries the argument
Bidirectional LSTM that fuses independent CNN predictions from each slice to enforce coherence across the volume.
If this is right
- Biomarker predictions become coherent across adjacent slices.
- The system trains without requiring pixel-wise annotations.
- Performance exceeds several existing approaches on a 416-volume dataset.
- Fine-grained biomarker information is generated from volume-level training data.
Where Pith is reading between the lines
- If the LSTM fusion is the source of the gain, the same idea could apply to other volumetric scans in medicine.
- Coherent slice predictions might reduce the rate of isolated false detections in practice.
- Lower annotation needs could speed up creation of similar tools for other eye diseases.
Load-bearing premise
That the observed superiority stems from the bidirectional LSTM fusion and not from the particular dataset split or evaluation choices.
What would settle it
Running the method without the LSTM component or on an independent dataset and finding no performance improvement would disprove the benefit of the fusion step.
read the original abstract
Optical Coherence Tomography (OCT) is the primary imaging modality for detecting pathological biomarkers associated to retinal diseases such as Age-Related Macular Degeneration. In practice, clinical diagnosis and treatment strategies are closely linked to biomarkers visible in OCT volumes and the ability to identify these plays an important role in the development of ophthalmic pharmaceutical products. In this context, we present a method that automatically predicts the presence of biomarkers in OCT cross-sections by incorporating information from the entire volume. We do so by adding a bidirectional LSTM to fuse the outputs of a Convolutional Neural Network that predicts individual biomarkers. We thus avoid the need to use pixel-wise annotations to train our method, and instead provide fine-grained biomarker information regardless. On a dataset of 416 volumes, we show that our approach imposes coherence between biomarker predictions across volume slices and our predictions are superior to several existing approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes fusing per-slice biomarker predictions from a CNN with a bidirectional LSTM to enforce coherence across OCT volume slices. It claims this yields superior performance to existing approaches on a 416-volume dataset while requiring only volume-level labels rather than pixel-wise annotations.
Significance. If the claimed superiority and coherence gains are demonstrated with proper controls, the method would offer a practical route to slice-level biomarker detection that respects 3D volume structure without dense supervision, which is relevant for scalable analysis of retinal OCT in clinical and pharmaceutical settings.
major comments (1)
- [Abstract] Abstract: the central claim that the biLSTM fusion 'imposes coherence' and produces predictions 'superior to several existing approaches' on 416 volumes is unsupported by any quantitative results, per-biomarker metrics (AUC/F1/etc.), ablation tables isolating the LSTM contribution, statistical tests, or dataset-split details.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract. We address the concern point-by-point below and will incorporate revisions to strengthen the presentation of results.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that the biLSTM fusion 'imposes coherence' and produces predictions 'superior to several existing approaches' on 416 volumes is unsupported by any quantitative results, per-biomarker metrics (AUC/F1/etc.), ablation tables isolating the LSTM contribution, statistical tests, or dataset-split details.
Authors: We agree that the abstract would be strengthened by including concrete quantitative support rather than relying solely on the summary phrasing. The full manuscript already contains per-biomarker AUC/F1 results, ablation studies isolating the bidirectional LSTM contribution, dataset split details (e.g., train/validation/test partitioning of the 416 volumes), and statistical comparisons in the Experiments and Results sections. In the revised version we will update the abstract to explicitly reference key metrics (e.g., average AUC improvement and coherence measures) while keeping it concise. revision: yes
Circularity Check
No circularity; standard CNN+biLSTM supervised pipeline
full rationale
The paper describes a conventional supervised learning setup: a CNN produces per-slice biomarker predictions, followed by a bidirectional LSTM to enforce coherence across volume slices. No equations, fitted parameters, or derivations are presented that reduce the claimed performance gain to a definition or self-citation chain. The abstract and method summary contain no self-citations used as load-bearing uniqueness theorems, no ansatz smuggled via prior work, and no renaming of known results as novel organization. The central claim rests on empirical comparison to baselines on the 416-volume dataset, which is externally falsifiable and does not reduce by construction to the input architecture. This is the most common honest non-finding for standard ML papers.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.