You Point, I Learn: Online Adaptation of Interactive Segmentation Models for Handling Distribution Shifts in Medical Imaging

Felix Cohen; Guang Yang; Harry Anthony; Konstantinos Kamnitsas; Wentian Xu; Yasin Ibrahim; Ziyun Liang

arxiv: 2503.06717 · v4 · submitted 2025-03-09 · 💻 cs.CV

Wentian Xu, Ziyun Liang, Harry Anthony, Yasin Ibrahim, Felix Cohen, Guang Yang, Konstantinos Kamnitsas

Pith reviewed 2026-05-23 00:12 UTC · model grok-4.3

classification 💻 cs.CV

keywords interactive segmentation · online adaptation · distribution shifts · medical imaging · pseudo-ground-truth · click-centered loss · fundus imaging · brain MRI

The pith

Treating user-refined outputs as pseudo-ground-truth enables online adaptation of interactive segmentation models to distribution shifts in medical imaging.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a practical online adaptation method for interactive segmentation models so they can handle distribution shifts common when medical images come from new scanners, modalities or patient populations. The core mechanism uses the final user-corrected prediction after interactive refinement as a pseudo label to update the model parameters on each new test image. Adaptation happens in two modes: a full update after the user finishes refining an image and incremental updates after each individual click. A Click-Centered Gaussian loss is added to make the model more responsive to user clicks and to concentrate learning on clinically relevant regions. Experiments across five fundus and four brain-MRI datasets show the method outperforms prior approaches when facing shifts such as unseen modalities and pathologies.

Core claim

By treating the post-interaction user-refined model output as pseudo-ground-truth, a lean online adaptation method can be designed that enables a model to learn effectively across sequential test images. The framework includes a Post-Interaction adaptation process that updates the model after the user completes interactive refinement and a Mid-Interaction adaptation process that updates incrementally after each click. Both processes incorporate a Click-Centered Gaussian loss that strengthens the model's reaction to clicks and focuses learning on user-guided regions. This approach allows the model to adapt to new data distributions without ground-truth labels for the test images.
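The two update schedules described above can be sketched as a control loop. This is a schematic under stated assumptions, not the authors' implementation: `adapt_online`, `predict`, `next_click`, `update` and `max_clicks` are hypothetical names, and the text does not say what supervises the per-click update, so the sketch assumes the current click-refined mask is used as the target.

```python
def adapt_online(state, images, next_click, predict, update, max_clicks=5):
    """Schematic online adaptation over a sequence of test images.

    Hypothetical callables (none of these names come from the paper):
      predict(state, image, clicks)              -> mask
      next_click(image, mask)                    -> a click, or None when the user is satisfied
      update(state, image, clicks, pseudo_label) -> new model state
    """
    schedule = []  # records which kind of update ran, for inspection
    for image in images:
        clicks = []
        mask = predict(state, image, clicks)
        for _ in range(max_clicks):
            click = next_click(image, mask)
            if click is None:
                break
            clicks.append(click)
            mask = predict(state, image, clicks)
            # Mid-Interaction: incremental update after each click.
            # Assumption: the current click-refined mask is the target.
            state = update(state, image, clicks, mask)
            schedule.append("mid")
        # Post-Interaction: the final refined mask acts as pseudo-ground-truth.
        state = update(state, image, clicks, mask)
        schedule.append("post")
    return state, schedule
```

No ground-truth label appears anywhere in the loop: the only supervision is what the user has already accepted, which is why the accuracy of the refined mask matters so much for everything that follows.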

What carries the argument

Post-Interaction and Mid-Interaction adaptation processes that treat user-refined outputs as pseudo-ground-truth, combined with the Click-Centered Gaussian loss to increase responsiveness to clicks.
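One plausible reading of a click-centered Gaussian loss is a per-pixel weight map that peaks at each click and decays with distance, applied to a standard binary cross-entropy. The sketch below illustrates that reading only; the exact form used in the paper is not given here, and `sigma`, `base` and `gain` are hypothetical hyperparameters.

```python
import math


def click_gaussian_weights(height, width, clicks, sigma=5.0, base=1.0, gain=1.0):
    """Per-pixel weights: `base` everywhere plus a Gaussian bump around each click.

    `clicks` is a list of (row, col) positions. `sigma`, `base` and `gain`
    are illustrative values, not taken from the paper.
    """
    weights = []
    for r in range(height):
        row = []
        for c in range(width):
            # Use the strongest bump over all clicks so nearby clicks do not stack.
            bump = 0.0
            for cr, cc in clicks:
                d2 = (r - cr) ** 2 + (c - cc) ** 2
                bump = max(bump, math.exp(-d2 / (2.0 * sigma ** 2)))
            row.append(base + gain * bump)
        weights.append(row)
    return weights


def weighted_bce(probs, targets, weights, eps=1e-7):
    """Weighted binary cross-entropy, normalised by the total weight."""
    total, norm = 0.0, 0.0
    for p_row, t_row, w_row in zip(probs, targets, weights):
        for p, t, w in zip(p_row, t_row, w_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            total += -w * (t * math.log(p) + (1 - t) * math.log(1.0 - p))
            norm += w
    return total / norm
```

With this weighting, an error at a clicked pixel costs more than the same error far from any click, which is the "focus on user-guided regions" behaviour. The weighting changes where the loss looks, not whether the target there is correct.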

If this is right

The model can adapt to previously unseen imaging modalities and pathologies without retraining from scratch.
Adaptation occurs across sequential test images using only the user's interactive refinements rather than new ground-truth annotations.
Both full-image updates after refinement and incremental updates after each click contribute to improved performance under distribution shift.
The Click-Centered Gaussian loss improves the model's initial responsiveness to user clicks on both training and test data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Deployed interactive tools could continuously improve their accuracy on new clinical sites simply through routine user corrections.
The same pseudo-label strategy might extend to other interactive medical tasks where user adjustments naturally supply supervision signals.
Error accumulation remains a risk if early user refinements contain consistent biases that the adaptation then reinforces.

Load-bearing premise

The post-interaction user-refined outputs provide sufficiently accurate pseudo-ground-truth for adaptation without accumulating errors or causing performance degradation over sequential images.

What would settle it

A long sequence of test images in which repeated application of the adaptation causes the model's segmentation accuracy to decline steadily instead of improve or stabilize.
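A simple way to look for such a decline, assuming per-image Dice scores against reference annotations are available for an evaluation sequence, is to fit a least-squares line to the scores and inspect its slope. The function name below is hypothetical, and this is a rough diagnostic, not a statistical test.

```python
def trend_slope(scores):
    """Least-squares slope of scores against their position in the sequence.

    A clearly negative slope over a long sequence would suggest that
    adaptation is degrading the model rather than improving it.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate a trend")
    mean_x = (n - 1) / 2.0
    mean_y = sum(scores) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(scores))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var
```

In practice the slope of the adapted model would be compared with that of a frozen copy run on the same sequence, since image difficulty can itself drift along the sequence.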

Original abstract

Interactive segmentation uses real-time user inputs, such as mouse clicks, to iteratively refine model predictions. Although not originally designed to address distribution shifts, this paradigm naturally lends itself to such challenges. In medical imaging, where distribution shifts are common, interactive methods can use user inputs to guide models towards improved predictions. Moreover, once a model is deployed, user corrections can be used to adapt the network parameters to the new data distribution, mitigating distribution shift. Based on these insights, we aim to develop a practical, effective method for improving the adaptive capabilities of interactive segmentation models to new data distributions in medical imaging. Firstly, we found that strengthening the model's responsiveness to clicks is important for the initial training process. Moreover, we show that by treating the post-interaction user-refined model output as pseudo-ground-truth, we can design a lean, practical online adaptation method that enables a model to learn effectively across sequential test images. The framework includes two components: (i) a Post-Interaction adaptation process, updating the model after the user has completed interactive refinement of an image, and (ii) a Mid-Interaction adaptation process, updating incrementally after each click. Both processes include a Click-Centered Gaussian loss that strengthens the model's reaction to clicks and enhances focus on user-guided, clinically relevant regions. Experiments on 5 fundus and 4 brain-MRI databases show that our approach consistently outperforms existing methods under diverse distribution shifts, including unseen imaging modalities and pathologies. Code and pretrained models will be released upon publication.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The paper gives a practical online adaptation method for interactive medical segmentation by treating user refinements as pseudo-GT, with post- and mid-interaction updates plus a click-centered loss, but the reliability of those pseudo-labels over sequences is not clearly bounded.

The core contribution is a lean online adaptation scheme that updates the model using the final user-refined mask after clicks as pseudo-ground-truth. It splits this into a post-interaction step once the user finishes an image and a mid-interaction step after each click, plus a Click-Centered Gaussian loss that re-weights around the clicks to sharpen response to user input. Experiments across five fundus and four brain-MRI datasets report consistent gains over prior methods under modality and pathology shifts.

That setup is concrete and directly tied to how interactive tools are already used in clinics, so the framework feels deployable without heavy extra machinery. The results on multiple databases give some evidence the approach transfers across shifts.

The main gap is the unexamined assumption that the refined masks stay close enough to true labels. The abstract gives no Dice numbers against held-out expert annotations for the pseudo-GT, nor any check for error buildup across sequential test images. The Gaussian loss only emphasizes click regions; it does not correct label noise. If refinements are incomplete or biased, the loop could reinforce mistakes rather than improve the model. The full paper may address this with additional plots or bounds, but based on what is shown it remains the load-bearing uncertainty.

This work is aimed at groups building interactive segmentation pipelines for medical imaging who already collect user clicks and want a lightweight way to handle new scanners or pathologies. A reader looking for ready-to-test adaptation recipes would find the two-process design and loss useful to try. It is worth sending for peer review because the empirical scope is broad enough and the method is simple enough that referees can give targeted feedback on the pseudo-label question and the exact baselines.

Referee Report

2 major / 2 minor

Summary. The paper claims that interactive segmentation models can be adapted online to distribution shifts in medical imaging by treating post-interaction user-refined masks as pseudo-ground-truth. It introduces two adaptation processes (Post-Interaction after full refinement and Mid-Interaction after each click) plus a Click-Centered Gaussian loss to strengthen click responsiveness, and reports consistent outperformance versus existing methods on 5 fundus and 4 brain-MRI datasets under shifts including unseen modalities and pathologies.

Significance. If the empirical results and the pseudo-GT assumption hold, the work supplies a lean, annotation-free test-time adaptation strategy that exploits the natural user corrections already present in interactive workflows. This is potentially valuable for clinical deployment where scanner, protocol, or pathology shifts are routine and retraining from scratch is impractical.

major comments (2)

[Abstract] Abstract: The central claim that the method 'enables a model to learn effectively across sequential test images' rests on the unverified premise that user-refined outputs constitute sufficiently accurate pseudo-ground-truth. No quantitative check (e.g., Dice overlap of these masks against held-out expert annotations) or bound on label noise is described, leaving open the risk that residual errors accumulate and degrade later images in the sequence.
[Abstract] Abstract (adaptation description): The Click-Centered Gaussian loss only re-weights the loss around user clicks; it does not correct for possible systematic bias or incompleteness in the pseudo-labels themselves. Because both Post-Interaction and Mid-Interaction updates rely on these labels, an explicit ablation or noise-robustness experiment is required to substantiate that the online loop improves rather than reinforces errors.
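The quantitative check requested in the first comment amounts to a Dice overlap between each user-refined mask and a held-out expert annotation. A minimal sketch, assuming binary masks stored as nested lists of 0/1 (the function name and the convention for two empty masks are choices made here, not taken from the paper):

```python
def dice(mask_a, mask_b):
    """Dice overlap between two binary masks given as nested lists of 0/1."""
    inter = size_a = size_b = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            size_a += 1 if a else 0
            size_b += 1 if b else 0
    if size_a + size_b == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * inter / (size_a + size_b)
```

Reporting this score per image along the test sequence would show both how accurate the pseudo-labels are and whether their quality changes as adaptation proceeds.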

minor comments (2)

[Abstract] The abstract states that code and pretrained models will be released, but provides no details on the exact training protocol, hyper-parameter ranges, or statistical testing procedure used to claim 'consistent outperformance.'
[Abstract] Dataset descriptions are summarized at a high level (5 fundus, 4 brain-MRI); a table listing the specific public datasets, number of images per shift type, and the exact distribution-shift categories would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the pseudo-ground-truth assumption and the need for robustness analysis. We address each major comment below and indicate planned revisions.

Referee: [Abstract] Abstract: The central claim that the method 'enables a model to learn effectively across sequential test images' rests on the unverified premise that user-refined outputs constitute sufficiently accurate pseudo-ground-truth. No quantitative check (e.g., Dice overlap of these masks against held-out expert annotations) or bound on label noise is described, leaving open the risk that residual errors accumulate and degrade later images in the sequence.

Authors: We acknowledge that a direct quantitative validation (e.g., Dice against held-out expert annotations) of the user-refined pseudo-GT is not reported. In the test-time adaptation setting, such annotations are unavailable by design for the shifted data; the refined masks are the clinically accepted outputs after user interaction. Our experiments across multiple datasets and shift types show consistent gains on later images in each sequence without degradation, providing indirect evidence against harmful error accumulation. We will add an explicit discussion of this assumption, its limitations, and the observed empirical behavior in the revised manuscript.

revision: partial

Referee: [Abstract] Abstract (adaptation description): The Click-Centered Gaussian loss only re-weights the loss around user clicks; it does not correct for possible systematic bias or incompleteness in the pseudo-labels themselves. Because both Post-Interaction and Mid-Interaction updates rely on these labels, an explicit ablation or noise-robustness experiment is required to substantiate that the online loop improves rather than reinforces errors.

Authors: The Click-Centered Gaussian loss prioritizes the user-provided clicks, which constitute the most reliable signal within the pseudo-labels. The manuscript already contains ablations isolating the loss and the two adaptation schedules. To directly address potential noise reinforcement, we will add a controlled noise-robustness experiment (synthetic perturbations to pseudo-labels) and report its results in the revised version.

revision: yes
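The kind of synthetic perturbation mentioned in this response could take several forms. The sketch below shows two, both hypothetical and not taken from the manuscript: random label flips as a stand-in for unstructured noise, and a one-pixel erosion as a stand-in for systematic under-segmentation.

```python
import random


def flip_labels(mask, flip_rate, seed=0):
    """Flip each 0/1 pixel independently with probability `flip_rate`."""
    rng = random.Random(seed)  # seeded so experiments are repeatable
    return [[(1 - v) if rng.random() < flip_rate else v for v in row]
            for row in mask]


def erode(mask):
    """Keep a pixel only if it and its 4 neighbours are foreground.

    Pixels outside the image count as background, so the mask shrinks by
    one pixel along its boundary: a crude model of under-segmentation.
    """
    h, w = len(mask), len(mask[0])

    def at(r, c):
        return mask[r][c] if 0 <= r < h and 0 <= c < w else 0

    offsets = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))
    return [[1 if all(at(r + dr, c + dc) for dr, dc in offsets) else 0
             for c in range(w)]
            for r in range(h)]
```

Feeding perturbed masks into the adaptation step in place of the clean refined masks, at increasing noise levels, would show how quickly the gains from adaptation erode.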

Circularity Check

0 steps flagged

Empirical pseudo-labeling method with no circular derivations

The paper describes an empirical online adaptation framework for interactive segmentation models. It defines Post-Interaction and Mid-Interaction adaptation by treating user-refined outputs as pseudo-ground-truth and adds a Click-Centered Gaussian loss. No equations, first-principles derivations, or predictions are claimed that reduce to fitted inputs or self-referential definitions. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior work are present. The method is self-contained as a practical engineering approach validated on external datasets; the reader's assessment of score 2.0 is consistent with minor or absent circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Relies on domain assumption that user refinements yield usable pseudo-labels and standard ML assumptions for online learning.

axioms (1)

domain assumption: User-refined model outputs serve as reliable pseudo-ground-truth for parameter updates.
Invoked as the basis for both Post-Interaction and Mid-Interaction adaptation processes.

pith-pipeline@v0.9.0 · 5824 in / 1077 out tokens · 66122 ms · 2026-05-23T00:12:18.845017+00:00 · methodology
