You Point, I Learn: Online Adaptation of Interactive Segmentation Models for Handling Distribution Shifts in Medical Imaging
Pith reviewed 2026-05-23 00:12 UTC · model grok-4.3
The pith
Treating user-refined outputs as pseudo-ground-truth enables online adaptation of interactive segmentation models to distribution shifts in medical imaging.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By treating the post-interaction user-refined model output as pseudo-ground-truth, a lean online adaptation method can be designed that enables a model to learn effectively across sequential test images. The framework includes a Post-Interaction adaptation process that updates the model after the user completes interactive refinement and a Mid-Interaction adaptation process that updates incrementally after each click. Both processes incorporate a Click-Centered Gaussian loss that strengthens the model's reaction to clicks and focuses learning on user-guided regions. This approach allows the model to adapt to new data distributions without ground-truth labels for the test images.
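The two update schedules can be sketched as a control loop. This is a minimal illustration under stated assumptions, not the paper's implementation: `ToyModel`, `refine`, and `adapt_online` are hypothetical names, the "network" is a two-parameter logistic pixel classifier, and user refinement is simulated by copying the user's intended label at each clicked pixel.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ToyModel:
    """Hypothetical stand-in for a segmentation network: p = sigmoid(w * x + b)."""

    def __init__(self, w=1.0, b=0.0, lr=0.5):
        self.w, self.b, self.lr = w, b, lr
        self.n_updates = 0

    def predict(self, pixels):
        return [sigmoid(self.w * x + self.b) for x in pixels]

    def update(self, pixels, pseudo_label):
        # One gradient step of mean binary cross-entropy against the pseudo-label.
        probs = self.predict(pixels)
        n = len(pixels)
        grad_w = sum((p - y) * x for p, y, x in zip(probs, pseudo_label, pixels)) / n
        grad_b = sum(p - y for p, y in zip(probs, pseudo_label)) / n
        self.w -= self.lr * grad_w
        self.b -= self.lr * grad_b
        self.n_updates += 1


def refine(pred_mask, clicks_so_far, user_mask):
    # Assumption: a click simply forces the clicked pixel to the user's label.
    out = list(pred_mask)
    for i in clicks_so_far:
        out[i] = user_mask[i]
    return out


def adapt_online(model, stream, mid=True, post=True):
    """stream yields (pixels, user_mask, clicks) for sequential test images."""
    for pixels, user_mask, clicks in stream:
        mask = [1 if p >= 0.5 else 0 for p in model.predict(pixels)]
        for k in range(1, len(clicks) + 1):
            current = [1 if p >= 0.5 else 0 for p in model.predict(pixels)]
            mask = refine(current, clicks[:k], user_mask)
            if mid:
                model.update(pixels, mask)  # Mid-Interaction: after each click
        if post:
            model.update(pixels, mask)      # Post-Interaction: after refinement ends
    return model


stream = [([0.1, 0.9, 0.4, 0.8], [0, 1, 1, 1], [2, 0, 1])] * 2
model = adapt_online(ToyModel(), stream)
print(model.n_updates)  # 8: (3 mid-interaction + 1 post-interaction) x 2 images
```

No ground-truth label enters the loop: every update target is the mask as it stands after user interaction, which is why the quality of those masks matters so much.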
What carries the argument
Post-Interaction and Mid-Interaction adaptation processes that treat user-refined outputs as pseudo-ground-truth, combined with the Click-Centered Gaussian loss to increase responsiveness to clicks.
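The abstract does not give the formula for the Click-Centered Gaussian loss. One plausible reading, offered here only as an assumption, is a per-pixel cross-entropy whose weights are raised by a Gaussian bump around each click. The function names, the `1 + bump` weighting, and the default `sigma` below are all hypothetical.

```python
import math


def click_weight_map(height, width, clicks, sigma=2.0):
    """Per-pixel weight: 1 plus the largest Gaussian bump over all clicks.

    clicks is a list of (row, col) positions. The weighting form is an
    assumption; the paper's exact definition is not stated in the abstract.
    """
    weights = [[1.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            bump = 0.0
            for cr, cc in clicks:
                d2 = (r - cr) ** 2 + (c - cc) ** 2
                bump = max(bump, math.exp(-d2 / (2.0 * sigma ** 2)))
            weights[r][c] = 1.0 + bump
    return weights


def weighted_bce(probs, target, weights, eps=1e-7):
    """Weight-normalized binary cross-entropy over 2D lists of equal shape."""
    total, norm = 0.0, 0.0
    for p_row, t_row, w_row in zip(probs, target, weights):
        for p, t, w in zip(p_row, t_row, w_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -w * (t * math.log(p) + (1 - t) * math.log(1.0 - p))
            norm += w
    return total / norm
```

Under this sketch a wrong prediction at a clicked pixel costs more than the same error far from any click, which matches the stated goal of focusing learning on user-guided regions.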
If this is right
- The model can adapt to previously unseen imaging modalities and pathologies without retraining from scratch.
- Adaptation occurs across sequential test images using only the user's interactive refinements rather than new ground-truth annotations.
- Both full-image updates after refinement and incremental updates after each click contribute to improved performance under distribution shift.
- The Click-Centered Gaussian loss improves the model's initial responsiveness to user clicks on both training and test data.
Where Pith is reading between the lines
- Deployed interactive tools could continuously improve their accuracy on new clinical sites simply through routine user corrections.
- The same pseudo-label strategy might extend to other interactive medical tasks where user adjustments naturally supply supervision signals.
- Error accumulation remains a risk if early user refinements contain consistent biases that the adaptation then reinforces.
Load-bearing premise
The post-interaction user-refined outputs provide sufficiently accurate pseudo-ground-truth for adaptation without accumulating errors or causing performance degradation over sequential images.
What would settle it
A long sequence of test images in which repeated application of the adaptation causes the model's segmentation accuracy to decline steadily instead of improve or stabilize.
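One simple way to operationalize "decline steadily" is the least-squares slope of per-image accuracy over position in the sequence. The sketch below assumes Dice scores are already available for each test image; `dice_trend` is a hypothetical helper, and a slope alone is not a significance test.

```python
def dice_trend(scores):
    """Least-squares slope of Dice score against image index (0, 1, 2, ...).

    A clearly negative slope over a long sequence would suggest that
    adaptation is degrading the model rather than improving it.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate a trend")
    mean_x = (n - 1) / 2.0
    mean_y = sum(scores) / n
    sxy = sum((i - mean_x) * (s - mean_y) for i, s in enumerate(scores))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    return sxy / sxx
```

Comparing this slope for an adapted model against a frozen baseline on the same image order would separate adaptation effects from images that merely get harder later in the sequence.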
Original abstract
Interactive segmentation uses real-time user inputs, such as mouse clicks, to iteratively refine model predictions. Although not originally designed to address distribution shifts, this paradigm naturally lends itself to such challenges. In medical imaging, where distribution shifts are common, interactive methods can use user inputs to guide models towards improved predictions. Moreover, once a model is deployed, user corrections can be used to adapt the network parameters to the new data distribution, mitigating distribution shift. Based on these insights, we aim to develop a practical, effective method for improving the adaptive capabilities of interactive segmentation models to new data distributions in medical imaging. Firstly, we found that strengthening the model's responsiveness to clicks is important for the initial training process. Moreover, we show that by treating the post-interaction user-refined model output as pseudo-ground-truth, we can design a lean, practical online adaptation method that enables a model to learn effectively across sequential test images. The framework includes two components: (i) a Post-Interaction adaptation process, updating the model after the user has completed interactive refinement of an image, and (ii) a Mid-Interaction adaptation process, updating incrementally after each click. Both processes include a Click-Centered Gaussian loss that strengthens the model's reaction to clicks and enhances focus on user-guided, clinically relevant regions. Experiments on 5 fundus and 4 brain-MRI databases show that our approach consistently outperforms existing methods under diverse distribution shifts, including unseen imaging modalities and pathologies. Code and pretrained models will be released upon publication.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that interactive segmentation models can be adapted online to distribution shifts in medical imaging by treating post-interaction user-refined masks as pseudo-ground-truth. It introduces two adaptation processes (Post-Interaction after full refinement and Mid-Interaction after each click) plus a Click-Centered Gaussian loss to strengthen click responsiveness, and reports consistent outperformance versus existing methods on 5 fundus and 4 brain-MRI datasets under shifts including unseen modalities and pathologies.
Significance. If the empirical results and the pseudo-GT assumption hold, the work supplies a lean, annotation-free test-time adaptation strategy that exploits the natural user corrections already present in interactive workflows. This is potentially valuable for clinical deployment where scanner, protocol, or pathology shifts are routine and retraining from scratch is impractical.
major comments (2)
- [Abstract] Abstract: The central claim that the method 'enables a model to learn effectively across sequential test images' rests on the unverified premise that user-refined outputs constitute sufficiently accurate pseudo-ground-truth. No quantitative check (e.g., Dice overlap of these masks against held-out expert annotations) or bound on label noise is described, leaving open the risk that residual errors accumulate and degrade later images in the sequence.
- [Abstract] Abstract (adaptation description): The Click-Centered Gaussian loss only re-weights the loss around user clicks; it does not correct for possible systematic bias or incompleteness in the pseudo-labels themselves. Because both Post-Interaction and Mid-Interaction updates rely on these labels, an explicit ablation or noise-robustness experiment is required to substantiate that the online loop improves rather than reinforces errors.
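The quantitative check requested in the first major comment could look like the following sketch. It assumes flat binary masks and a held-out set where expert annotations exist; `dice` and `mean_pseudo_label_dice` are hypothetical helper names, and scoring two empty masks as 1.0 is a convention chosen here.

```python
def dice(mask_a, mask_b):
    """Dice overlap 2|A n B| / (|A| + |B|) for flat binary masks (0/1 values)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    inter = sum(a * b for a, b in zip(mask_a, mask_b))
    size = sum(mask_a) + sum(mask_b)
    if size == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * inter / size


def mean_pseudo_label_dice(pairs):
    """Average Dice of user-refined masks against expert masks.

    pairs is a list of (pseudo_label, expert_mask) tuples.
    """
    scores = [dice(pseudo, expert) for pseudo, expert in pairs]
    return sum(scores) / len(scores)
```

Reporting this average, together with its spread, would give a direct estimate of the label noise that the online updates are trained on.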
minor comments (2)
- [Abstract] The abstract states that code and pretrained models will be released, but provides no details on the exact training protocol, hyper-parameter ranges, or statistical testing procedure used to claim 'consistent outperformance.'
- [Abstract] Dataset descriptions are summarized at a high level (5 fundus, 4 brain-MRI); a table listing the specific public datasets, number of images per shift type, and the exact distribution-shift categories would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on the pseudo-ground-truth assumption and the need for robustness analysis. We address each major comment below and indicate planned revisions.
point-by-point responses
- Referee: [Abstract] Abstract: The central claim that the method 'enables a model to learn effectively across sequential test images' rests on the unverified premise that user-refined outputs constitute sufficiently accurate pseudo-ground-truth. No quantitative check (e.g., Dice overlap of these masks against held-out expert annotations) or bound on label noise is described, leaving open the risk that residual errors accumulate and degrade later images in the sequence.
Authors: We acknowledge that a direct quantitative validation (e.g., Dice against held-out expert annotations) of the user-refined pseudo-GT is not reported. In the test-time adaptation setting, such annotations are unavailable by design for the shifted data; the refined masks are the clinically accepted outputs after user interaction. Our experiments across multiple datasets and shift types show consistent gains on later images in each sequence without degradation, providing indirect evidence against harmful error accumulation. We will add an explicit discussion of this assumption, its limitations, and the observed empirical behavior in the revised manuscript. revision: partial
- Referee: [Abstract] Abstract (adaptation description): The Click-Centered Gaussian loss only re-weights the loss around user clicks; it does not correct for possible systematic bias or incompleteness in the pseudo-labels themselves. Because both Post-Interaction and Mid-Interaction updates rely on these labels, an explicit ablation or noise-robustness experiment is required to substantiate that the online loop improves rather than reinforces errors.
Authors: The Click-Centered Gaussian loss prioritizes the user-provided clicks, which constitute the most reliable signal within the pseudo-labels. The manuscript already contains ablations isolating the loss and the two adaptation schedules. To directly address potential noise reinforcement, we will add a controlled noise-robustness experiment (synthetic perturbations to pseudo-labels) and report its results in the revised version. revision: yes
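The rebuttal does not say what form the synthetic perturbations would take. As one assumed option, the sketch below flips a fixed fraction of pixels in a pseudo-label at random; `perturb_mask` is a hypothetical helper, and structured noise such as boundary erosion or dilation may be closer to real user error.

```python
import random


def perturb_mask(mask, flip_rate, seed=0):
    """Flip a fixed fraction of pixels in a flat binary mask (synthetic label noise).

    flip_rate is the fraction of pixels to invert; seed makes the noise reproducible.
    """
    if not 0.0 <= flip_rate <= 1.0:
        raise ValueError("flip_rate must be in [0, 1]")
    n_flip = round(flip_rate * len(mask))
    rng = random.Random(seed)
    flipped = set(rng.sample(range(len(mask)), n_flip))
    return [1 - v if i in flipped else v for i, v in enumerate(mask)]
```

Sweeping `flip_rate` and re-running adaptation on the perturbed pseudo-labels would show how much label noise the online loop tolerates before accuracy on later images drops.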
Circularity Check
Empirical pseudo-labeling method with no circular derivations
full rationale
The paper describes an empirical online adaptation framework for interactive segmentation models. It defines Post-Interaction and Mid-Interaction adaptation by treating user-refined outputs as pseudo-ground-truth and adds a Click-Centered Gaussian loss. It claims no equations, first-principles derivations, or predictions that reduce to fitted inputs or self-referential definitions. It contains no load-bearing self-citations and imports no uniqueness theorems or ansatzes from prior work. The method is self-contained as a practical engineering approach validated on external datasets; the reader's assessment of score 2.0 is consistent with minor or absent circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: User-refined model outputs serve as reliable pseudo-ground-truth for parameter updates.