Fused Detection of Retinal Biomarkers in OCT Volumes

Marion Munk; Pablo Márquez-Neila; Raphael Sznitman; Sebastian Wolf; Siqing Yu; Thomas Kurmann

arxiv: 1907.06955 · v1 · pith:JCMWLKIAnew · submitted 2019-07-16 · 💻 cs.CV · cs.LG

Thomas Kurmann, Pablo Márquez-Neila, Siqing Yu, Marion Munk, Sebastian Wolf, Raphael Sznitman

Pith reviewed 2026-05-24 21:05 UTC · model grok-4.3

keywords retinal biomarkers · OCT volumes · bidirectional LSTM · CNN · coherence · biomarker detection · volume fusion · Age-Related Macular Degeneration

The pith

A bidirectional LSTM fuses CNN outputs to impose coherence on retinal biomarker predictions across OCT volume slices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes a technique for detecting retinal biomarkers in OCT volumes that first uses a convolutional neural network to analyze each individual slice and then applies a bidirectional LSTM to combine those results using context from the whole volume. This produces predictions that are consistent from slice to slice. The method trains without pixel-wise annotations yet still delivers detailed biomarker locations. Tests on 416 volumes show it outperforms several prior methods. A reader would care if this coherence improves the reliability of automated diagnosis for conditions like age-related macular degeneration.

Core claim

We present a method that automatically predicts the presence of biomarkers in OCT cross-sections by incorporating information from the entire volume. We do so by adding a bidirectional LSTM to fuse the outputs of a Convolutional Neural Network that predicts individual biomarkers. We thus avoid the need to use pixel-wise annotations to train our method, and instead provide fine-grained biomarker information regardless. On a dataset of 416 volumes, we show that our approach imposes coherence between biomarker predictions across volume slices and our predictions are superior to several existing approaches.

What carries the argument

Bidirectional LSTM that fuses independent CNN predictions from each slice to enforce coherence across the volume.
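As a rough illustration of this fusion step, the sketch below runs a small bidirectional LSTM over per-slice score vectors that stand in for CNN outputs. Everything in it is an assumption made for illustration and is not taken from the paper: the helper names (`make_lstm`, `lstm_pass`, `bilstm_fuse`), the hidden size, the random untrained weights, and the final linear readout with a sigmoid. It only shows how each slice's fused prediction can draw on slices from both sides of the volume.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def make_lstm(n_in, n_hidden, rng):
    # Hypothetical untrained parameters; gates are stacked as
    # [input, forget, candidate, output].
    return {
        "W": rand_matrix(4 * n_hidden, n_in, rng),
        "U": rand_matrix(4 * n_hidden, n_hidden, rng),
        "b": [0.0] * (4 * n_hidden),
        "n": n_hidden,
    }

def lstm_pass(xs, p):
    """Run one LSTM direction over a sequence; return one hidden state per step."""
    n = p["n"]
    h, c = [0.0] * n, [0.0] * n
    states = []
    for x in xs:
        z = [a + r + b for a, r, b in
             zip(matvec(p["W"], x), matvec(p["U"], h), p["b"])]
        i = [sigmoid(v) for v in z[0:n]]
        f = [sigmoid(v) for v in z[n:2 * n]]
        g = [math.tanh(v) for v in z[2 * n:3 * n]]
        o = [sigmoid(v) for v in z[3 * n:4 * n]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        states.append(h)
    return states

def bilstm_fuse(slice_scores, fwd, bwd, readout):
    """Fuse per-slice scores using context from both directions of the volume."""
    forward = lstm_pass(slice_scores, fwd)
    # The backward direction reads the slices in reverse, then is re-aligned.
    backward = lstm_pass(slice_scores[::-1], bwd)[::-1]
    fused = []
    for hf, hb in zip(forward, backward):
        logits = matvec(readout, hf + hb)  # concatenate both directions
        fused.append([sigmoid(v) for v in logits])
    return fused

rng = random.Random(0)
n_biomarkers, n_hidden, n_slices = 3, 4, 5
# Stand-in for per-slice CNN outputs: one score vector per slice.
cnn_scores = [[rng.uniform(-1, 1) for _ in range(n_biomarkers)]
              for _ in range(n_slices)]
fwd = make_lstm(n_biomarkers, n_hidden, rng)
bwd = make_lstm(n_biomarkers, n_hidden, rng)
readout = rand_matrix(n_biomarkers, 2 * n_hidden, rng)
fused = bilstm_fuse(cnn_scores, fwd, bwd, readout)
```

Because the backward pass starts at the last slice, changing the scores of the last slice alters the fused output for the first slice, which is the kind of cross-slice dependence a per-slice classifier alone cannot express.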

If this is right

Biomarker predictions become coherent across adjacent slices.
The system trains without requiring pixel-wise annotations.
Performance exceeds several existing approaches on a 416-volume dataset.
Fine-grained biomarker information is produced for individual cross-sections rather than only for the volume as a whole.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the LSTM fusion is the source of the gain, the same idea could apply to other volumetric scans in medicine.
Coherent slice predictions might reduce the rate of isolated false detections in practice.
Lower annotation needs could speed up creation of similar tools for other eye diseases.
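One way to make the notion of isolated detections concrete is to count positive slices whose neighbours on both sides are negative. The sketch below is a hypothetical helper, not something the paper defines: `count_isolated_detections` and the two example sequences are made up, and the input is assumed to be binary per-slice predictions for a single biomarker.

```python
def count_isolated_detections(slice_preds):
    """Count positive slices that have no positive neighbour in the volume."""
    count = 0
    n = len(slice_preds)
    for k, p in enumerate(slice_preds):
        if not p:
            continue
        # Slices at the volume boundary are treated as having a negative neighbour.
        left = slice_preds[k - 1] if k > 0 else 0
        right = slice_preds[k + 1] if k < n - 1 else 0
        if not left and not right:
            count += 1
    return count

# Made-up binary predictions for one biomarker across the slices of a volume.
incoherent = [0, 1, 0, 0, 1, 1, 0, 1, 0]
coherent = [0, 0, 0, 0, 1, 1, 1, 0, 0]

print(count_isolated_detections(incoherent))  # 2
print(count_isolated_detections(coherent))    # 0
```

A lower count does not by itself mean the predictions are more accurate, so a measure like this would need to be read alongside per-slice accuracy.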

Load-bearing premise

That the observed superiority stems from the bidirectional LSTM fusion and not from the particular dataset split or evaluation choices.

What would settle it

Running the method without the LSTM component or on an independent dataset and finding no performance improvement would disprove the benefit of the fusion step.

Original abstract

Optical Coherence Tomography (OCT) is the primary imaging modality for detecting pathological biomarkers associated to retinal diseases such as Age-Related Macular Degeneration. In practice, clinical diagnosis and treatment strategies are closely linked to biomarkers visible in OCT volumes and the ability to identify these plays an important role in the development of ophthalmic pharmaceutical products. In this context, we present a method that automatically predicts the presence of biomarkers in OCT cross-sections by incorporating information from the entire volume. We do so by adding a bidirectional LSTM to fuse the outputs of a Convolutional Neural Network that predicts individual biomarkers. We thus avoid the need to use pixel-wise annotations to train our method, and instead provide fine-grained biomarker information regardless. On a dataset of 416 volumes, we show that our approach imposes coherence between biomarker predictions across volume slices and our predictions are superior to several existing approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CNN per slice plus biLSTM fusion for OCT biomarkers on 416 volumes, but the abstract supplies no metrics, ablations, or split details to show the fusion actually improves anything.

The paper trains a CNN on individual OCT slices to flag retinal biomarkers, then runs a bidirectional LSTM across the slice predictions to enforce consistency through the volume. On 416 volumes it claims this produces coherent outputs and beats several existing methods. That combination for this exact task is the main thing presented as new; the architecture itself is standard supervised learning with no pixel-level labels required, which is a practical plus for medical data where dense annotations are expensive. The approach makes sense for capturing the fact that biomarkers should not flip arbitrarily between adjacent slices. The main gap is that the abstract states superiority and coherence without any AUC, F1, p-values, or even a simple CNN-only baseline. No ablation isolates the LSTM contribution, no dataset split details appear, and no error bars or per-biomarker breakdowns are mentioned. Without those numbers the central claim cannot be checked, so any reported gain could come from how the data were split or how the metric was computed rather than the fusion step. The work is aimed at groups already building automated OCT tools for AMD or similar retinal conditions; a reader who needs a volume-aware but annotation-light method might find the full paper useful once the tables are there. It is coherent on its own terms and engages the right literature for the application, so it is worth sending to referees who can ask for the missing quantitative evidence.

Referee Report

1 major / 0 minor

Summary. The paper proposes fusing per-slice biomarker predictions from a CNN with a bidirectional LSTM to enforce coherence across OCT volume slices. It claims this yields superior performance to existing approaches on a 416-volume dataset while avoiding the need for pixel-wise annotations.

Significance. If the claimed superiority and coherence gains are demonstrated with proper controls, the method would offer a practical route to slice-level biomarker detection that respects 3D volume structure without dense supervision, which is relevant for scalable analysis of retinal OCT in clinical and pharmaceutical settings.

major comments (1)

[Abstract] The central claim that the biLSTM fusion 'imposes coherence' and produces predictions 'superior to several existing approaches' on 416 volumes is unsupported by any quantitative results, per-biomarker metrics (AUC/F1/etc.), ablation tables isolating the LSTM contribution, statistical tests, or dataset-split details.
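For reference, a per-biomarker metric of the kind requested here could be computed from multi-label slice predictions along the following lines. This is a generic sketch: the function name `per_biomarker_f1`, the label layout (one row per slice, one column per biomarker), and the toy labels are illustrative assumptions and do not reproduce the paper's evaluation protocol.

```python
def per_biomarker_f1(y_true, y_pred):
    """Return one F1 score per biomarker column for binary multi-label data."""
    n_biomarkers = len(y_true[0])
    scores = []
    for j in range(n_biomarkers):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Report 0.0 when a biomarker has no positives in labels or predictions.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

# Toy labels: four slices, two hypothetical biomarkers.
y_true = [[1, 0], [1, 1], [0, 1], [0, 0]]
y_pred = [[1, 0], [0, 1], [0, 1], [0, 1]]

f1 = per_biomarker_f1(y_true, y_pred)
```

Reporting the scores per biomarker rather than as a single average makes it visible when a gain comes from only a few biomarkers.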

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We address the concern point-by-point below and will incorporate revisions to strengthen the presentation of results.

Referee: [Abstract] The central claim that the biLSTM fusion 'imposes coherence' and produces predictions 'superior to several existing approaches' on 416 volumes is unsupported by any quantitative results, per-biomarker metrics (AUC/F1/etc.), ablation tables isolating the LSTM contribution, statistical tests, or dataset-split details.

Authors: We agree that the abstract would be strengthened by including concrete quantitative support rather than relying solely on the summary phrasing. The full manuscript already contains per-biomarker AUC/F1 results, ablation studies isolating the bidirectional LSTM contribution, dataset split details (e.g., train/validation/test partitioning of the 416 volumes), and statistical comparisons in the Experiments and Results sections. In the revised version we will update the abstract to explicitly reference key metrics (e.g., average AUC improvement and coherence measures) while keeping it concise.

Revision: yes

Circularity Check

0 steps flagged

No circularity; standard CNN+biLSTM supervised pipeline

The paper describes a conventional supervised learning setup: a CNN produces per-slice biomarker predictions, followed by a bidirectional LSTM to enforce coherence across volume slices. No equations, fitted parameters, or derivations are presented that reduce the claimed performance gain to a definition or self-citation chain. The abstract and method summary contain no self-citations used as load-bearing uniqueness theorems, no ansatz smuggled via prior work, and no renaming of known results as novel organization. The central claim rests on empirical comparison to baselines on the 416-volume dataset, which is externally falsifiable and does not reduce by construction to the input architecture. This is the most common honest non-finding for standard ML papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; all modeling choices remain implicit.

pith-pipeline@v0.9.0 · 5687 in / 932 out tokens · 18119 ms · 2026-05-24T21:05:06.253883+00:00 · methodology
