pith. sign in

arxiv: 2506.23104 · v2 · submitted 2025-06-29 · 💻 cs.CV

DC-TTA: Divide-and-Conquer Framework for Test-Time Adaptation of Interactive Segmentation

Pith reviewed 2026-05-19 08:05 UTC · model grok-4.3

classification 💻 cs.CV
keywords test-time adaptationinteractive segmentationclick partitioningSegment Anything Modelcamouflaged objectsmodel merging
0
0 comments X p. Extension

The pith

Partitioning user clicks into coherent subsets lets SAM adapt at test time without cue conflicts for better interactive segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a test-time adaptation approach that breaks a set of user clicks into smaller, more consistent groups rather than feeding them all to one model at the same time. Each group drives its own independent adaptation of the Segment Anything Model, after which the specialized versions are combined into a single final predictor. This strategy is meant to cut down on contradictory signals from mixed positive and negative clicks and to support more focused updates that work especially well on hard cases like camouflaged or multi-part objects. A reader would care because it points to a practical way to make interactive segmentation more accurate while requiring fewer corrections from the user.

Core claim

Instead of forcing a single model to incorporate all user clicks at once, DC-TTA partitions the clicks into more coherent subsets, each processed independently via test-time adaptation with a separated model; the adapted models are then merged to form a unified predictor that integrates the specialized knowledge from each subset.

What carries the argument

The divide-and-conquer partitioning of clicks into coherent subsets for independent per-subset test-time adaptation followed by model merging.

Load-bearing premise

Splitting the clicks into coherent subsets reduces conflicts among diverse cues and enables more localized updates that improve the final merged predictor.

What would settle it

A benchmark run in which merging separately adapted models on partitioned clicks produces equal or lower accuracy than a single model adapted on all clicks together.

Figures

Figures reproduced from arXiv: 2506.23104 by Hoyong Kwon, Hyeokjun Kweon, Jihun Kim, Kuk-Jin Yoon, Wooseong Jeong.

Figure 1
Figure 1. Figure 1: Motivation of Our DC-TTA. (a) Na¨ [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of the naive TTA approach. Each iteration, a [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Illustration of our DC strategy (without TTA). Given [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Overview of the proposed DC-TTA framework. First, user clicks are assigned into segmentation units (left). Each unit is then [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: A qualitative comparison between the baseline method and our DC-TTA. Each subfigure shows the segmentation result after [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Qualitative examples of challenging camouflaged objects from CAMO [ [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Segmentation units identified in our DC-TTA. User [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗
read the original abstract

Interactive segmentation (IS) allows users to iteratively refine object boundaries with minimal cues, such as positive and negative clicks. While the Segment Anything Model (SAM) has garnered attention in the IS community for its promptable segmentation capabilities, it often struggles in specialized domains or when handling complex scenarios (e.g., camouflaged or multi-part objects). To overcome these challenges, we propose DC-TTA, a novel test-time adaptation (TTA) framework that adapts SAM on a per-sample basis by leveraging user interactions as supervision. Instead of forcing a single model to incorporate all user clicks at once, DC-TTA partitions the clicks into more coherent subsets, each processed independently via TTA with a separated model. This Divide-and-Conquer strategy reduces conflicts among diverse cues and enables more localized updates. Finally, we merge the adapted models to form a unified predictor that integrates the specialized knowledge from each subset. Experimental results across various benchmarks demonstrate that DC-TTA significantly outperforms SAM's zero-shot results and conventional TTA methods, effectively handling complex tasks such as camouflaged object segmentation with fewer interactions and improved accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents DC-TTA, a divide-and-conquer framework for test-time adaptation in interactive segmentation. It partitions user clicks into coherent subsets, adapts separate SAM models on each subset using TTA, and merges the results to create a final predictor. This is claimed to handle conflicts in cues better than applying TTA to all clicks at once, leading to improved performance on standard benchmarks and complex tasks like camouflaged object segmentation compared to zero-shot SAM and conventional TTA methods.

Significance. Should the gains prove to stem from the coherence of the partitions rather than from the general benefits of adapting and merging multiple models, the approach could meaningfully advance interactive segmentation by allowing more effective use of user inputs in difficult scenarios. The framework is empirical and relies on the specific partitioning strategy, so its significance hinges on demonstrating that this strategy is superior to alternatives like random partitioning.

major comments (2)
  1. [§3] §3 (Method): The central claim that coherent partitioning specifically reduces cue conflicts and enables superior localized updates is not isolated experimentally. No control is described that holds the number of adapted models and total TTA optimization steps fixed while comparing coherent partitioning against random partitioning or a single-model baseline.
  2. [§4] §4 (Experiments): The reported outperformance on benchmarks lacks accompanying error bars, statistical tests, or ablations that directly vary only the partitioning criterion. This leaves open whether gains derive from ensembling effects rather than the coherence assumption highlighted in the abstract.
minor comments (2)
  1. [Abstract] The abstract could specify the exact benchmarks, metrics, and interaction counts used to support claims of 'fewer interactions and improved accuracy'.
  2. Notation for the final merging step would benefit from an explicit equation defining how the specialized predictors are combined into the unified model.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects for strengthening the empirical support of our claims. We address each major comment below and will incorporate revisions to provide the requested controls and statistical rigor.

read point-by-point responses
  1. Referee: [§3] §3 (Method): The central claim that coherent partitioning specifically reduces cue conflicts and enables superior localized updates is not isolated experimentally. No control is described that holds the number of adapted models and total TTA optimization steps fixed while comparing coherent partitioning against random partitioning or a single-model baseline.

    Authors: We agree that the current experiments do not fully isolate the contribution of coherent partitioning. While the manuscript includes a single-model TTA baseline, it lacks a matched control with random partitioning under fixed model count and optimization steps. We will add this ablation in the revised version to directly test whether coherence (rather than the mere presence of multiple adapted models) drives the observed benefits. revision: yes

  2. Referee: [§4] §4 (Experiments): The reported outperformance on benchmarks lacks accompanying error bars, statistical tests, or ablations that directly vary only the partitioning criterion. This leaves open whether gains derive from ensembling effects rather than the coherence assumption highlighted in the abstract.

    Authors: We acknowledge that the experimental section would benefit from error bars, statistical significance testing, and ablations that isolate the partitioning criterion. In the revision we will report error bars across multiple runs, include statistical tests, and add an ablation that varies only the partitioning strategy (coherent vs. random) while holding the number of models and total TTA steps constant, thereby clarifying that improvements are attributable to cue coherence rather than generic ensembling. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical framework with independent experimental validation

full rationale

The paper introduces DC-TTA as a practical divide-and-conquer TTA method for interactive segmentation: user clicks are partitioned into coherent subsets, each subset drives independent per-sample adaptation of a separate SAM-based model, and the resulting models are merged into a final predictor. All performance claims rest on reported benchmark experiments comparing against SAM zero-shot and standard single-model TTA baselines. No equations, fitted parameters, or derivations appear in the provided text that would reduce the reported gains to quantities defined by construction within the paper itself. The central design choice (coherent partitioning) is motivated by an empirical hypothesis about cue conflict and is evaluated directly on external datasets rather than being justified by self-citation chains or self-referential definitions. Consequently the derivation chain is self-contained and the circularity score is 0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on the assumption that user-provided clicks serve as reliable per-sample supervision signals for adaptation; no new physical entities or large numbers of free parameters are introduced in the visible description.

axioms (1)
  • domain assumption User clicks provide reliable supervision for test-time adaptation
    The method uses positive and negative clicks as direct supervision to adapt the model on each subset.

pith-pipeline@v0.9.0 · 5739 in / 1207 out tokens · 26542 ms · 2026-05-19T08:05:32.569182+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · 5 internal anchors

  1. [1]

    Interactive graph cuts for optimal boundary & region segmentation of objects in nd images

    Yuri Y Boykov and M-P Jolly. Interactive graph cuts for optimal boundary & region segmentation of objects in nd images. In Proceedings eighth IEEE international confer- ence on computer vision. ICCV 2001, pages 105–112. IEEE,

  2. [2]

    Contrastive test-time adaptation

    Dian Chen, Dequan Wang, Trevor Darrell, and Sayna Ebrahimi. Contrastive test-time adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 295–305, 2022. 2

  3. [3]

    Focalclick: Towards prac- tical interactive image segmentation

    Xi Chen, Zhiyan Zhao, Yilei Zhang, Manni Duan, Donglian Qi, and Hengshuang Zhao. Focalclick: Towards prac- tical interactive image segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1300–1309, 2022. 2, 7, 8

  4. [4]

    Sam-med2d

    Junlong Cheng, Jin Ye, Zhongying Deng, Jianpin Chen, Tianbin Li, Haoyu Wang, Yanzhou Su, Ziyan Huang, Ji- long Chen, Lei Jiang, et al. Sam-med2d. arXiv preprint arXiv:2308.16184, 2023. 1, 2

  5. [5]

    To adapt or not to adapt? real- time adaptation for semantic segmentation

    Marc Botet Colomer, Pier Luigi Dovesi, Theodoros Pana- giotakopoulos, Joao Frederico Carvalho, Linus H ¨arenstam- Nielsen, Hossein Azizpour, Hedvig Kjellstr ¨om, Daniel Cre- mers, and Matteo Poggi. To adapt or not to adapt? real- time adaptation for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vi- sion, pages ...

  6. [6]

    Adaptive stochastic weight averaging

    Caglar Demir, Arnab Sharma, and Axel-Cyrille Ngonga Ngomo. Adaptive stochastic weight averaging. arXiv preprint arXiv:2406.19092, 2024. 3

  7. [7]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, et al. An image is worth 16x16 words: Trans- formers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020. 7

  8. [8]

    The pascal visual object classes (voc) challenge

    Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vision, 88:303–338, 2010. 6, 7

  9. [9]

    Camouflaged object de- tection

    Deng-Ping Fan, Ge-Peng Ji, Guolei Sun, Ming-Ming Cheng, Jianbing Shen, and Ling Shao. Camouflaged object de- tection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2777–2787,

  10. [10]

    Uncertainty reduction for model adaptation in semantic segmentation

    Francois Fleuret et al. Uncertainty reduction for model adaptation in semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9613–9623, 2021. 2

  11. [11]

    Getting to 99% accuracy in interactive seg- mentation

    Marco Forte, Brian Price, Scott Cohen, Ning Xu, and Franc ¸ois Piti´e. Getting to 99% accuracy in interactive seg- mentation. arXiv preprint arXiv:2003.07932, 2020. 2

  12. [12]

    Random walks for image segmentation

    Leo Grady. Random walks for image segmentation. IEEE transactions on pattern analysis and machine intelligence , 28(11):1768–1783, 2006. 2

  13. [13]

    Geodesic star convexity for interactive image segmentation

    Varun Gulshan, Carsten Rother, Antonio Criminisi, Andrew Blake, and Andrew Zisserman. Geodesic star convexity for interactive image segmentation. In 2010 IEEE Computer So- ciety Conference on Computer Vision and Pattern Recogni- tion, pages 3129–3136. IEEE, 2010. 2

  14. [14]

    Trash- can: A semantically-segmented dataset towards visual de- tection of marine debris

    Jungseok Hong, Michael Fulton, and Junaed Sattar. Trash- can: A semantically-segmented dataset towards visual de- tection of marine debris. arXiv preprint arXiv:2007.08097,

  15. [15]

    Foc- sam: Delving deeply into focused objects in segmenting any- thing

    You Huang, Zongyu Lan, Liujuan Cao, Xianming Lin, Shengchuan Zhang, Guannan Jiang, and Rongrong Ji. Foc- sam: Delving deeply into focused objects in segmenting any- thing. In Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition , pages 3120–3130,

  16. [16]

    Editing Models with Task Arithmetic

    Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. Editing models with task arithmetic. arXiv preprint arXiv:2212.04089, 2022. 2, 3

  17. [17]

    Interactive image seg- mentation via backpropagating refinement scheme

    Won-Dong Jang and Chang-Su Kim. Interactive image seg- mentation via backpropagating refinement scheme. In Pro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5297–5306, 2019. 2

  18. [18]

    Fine-tuning linear layers only is a simple yet effective way for task arithmetic

    Ruochen Jin, Bojian Hou, Jiancong Xiao, Weijie Su, and Li Shen. Fine-tuning linear layers only is a simple yet effective way for task arithmetic. arXiv preprint arXiv:2407.07089 ,

  19. [19]

    Segment any- thing

    Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C Berg, Wan-Yen Lo, et al. Segment any- thing. In Proceedings of the IEEE/CVF International Con- ference on Computer Vision, pages 4015–4026, 2023. 1, 2, 3, 7, 8

  20. [20]

    Continuous adaptation for interactive ob- ject segmentation by learning from corrections

    Theodora Kontogianni, Michael Gygli, Jasper Uijlings, and Vittorio Ferrari. Continuous adaptation for interactive ob- ject segmentation by learning from corrections. InComputer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVI 16 , pages 579–596. Springer, 2020. 2

  21. [21]

    Anabranch network for camouflaged object segmentation

    Trung-Nghia Le, Tam V Nguyen, Zhongliang Nie, Minh- Triet Tran, and Akihiro Sugimoto. Anabranch network for camouflaged object segmentation. Computer vision and im- age understanding, 184:45–56, 2019. 6, 7, 8 9

  22. [22]

    Mfp: Mak- ing full use of probability maps for interactive image seg- mentation

    Chaewon Lee, Seon-Ho Lee, and Chang-Su Kim. Mfp: Mak- ing full use of probability maps for interactive image seg- mentation. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages 4051–4059. IEEE, 2024. 2, 8

  23. [23]

    Interactive learn- ing for semantic segmentation in earth observation

    Gaston Lenczner, Adrien Chan-Hon-Tong, Nicola Luminari, Bertrand Le Saux, and Guy Le Besnerais. Interactive learn- ing for semantic segmentation in earth observation. arXiv preprint arXiv:2009.11250, 2020. 2

  24. [24]

    Interactive image segmentation with latent diversity

    Zhuwen Li, Qifeng Chen, and Vladlen Koltun. Interactive image segmentation with latent diversity. In Proceedings of the IEEE conference on computer vision and pattern recog- nition, pages 577–585, 2018. 2, 7

  25. [25]

    Do we really need to access the source data? source hypothesis transfer for un- supervised domain adaptation

    Jian Liang, Dapeng Hu, and Jiashi Feng. Do we really need to access the source data? source hypothesis transfer for un- supervised domain adaptation. In International conference on machine learning, pages 6028–6039. PMLR, 2020. 2

  26. [26]

    Regional interactive image segmentation net- works

    JunHao Liew, Yunchao Wei, Wei Xiong, Sim-Heng Ong, and Jiashi Feng. Regional interactive image segmentation net- works. In 2017 IEEE international conference on computer vision (ICCV), pages 2746–2754. IEEE, 2017. 2

  27. [27]

    Microsoft coco: Common objects in context

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll´ar, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In Computer vision–ECCV 2014: 13th European conference, zurich, Switzerland, September 6-12, 2014, proceedings, part v 13, pages 740–755. Springer, 2014. 6

  28. [28]

    Interactive image segmentation with first click attention

    Zheng Lin, Zhao Zhang, Lin-Zhuo Chen, Ming-Ming Cheng, and Shao-Ping Lu. Interactive image segmentation with first click attention. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages 13339–13348, 2020. 2

  29. [29]

    Click prompt learning with optimal trans- port for interactive segmentation

    Jie Liu, Haochen Wang, Wenzhe Yin, Jan-Jakob Sonke, and Efstratios Gavves. Click prompt learning with optimal trans- port for interactive segmentation. In European Conference on Computer Vision, pages 93–110. Springer, 2024. 2

  30. [30]

    Pseudoclick: Interactive image segmentation with click imi- tation

    Qin Liu, Meng Zheng, Benjamin Planche, Srikrishna Karanam, Terrence Chen, Marc Niethammer, and Ziyan Wu. Pseudoclick: Interactive image segmentation with click imi- tation. In European Conference on Computer Vision, pages 728–745. Springer, 2022. 2

  31. [31]

    Simpleclick: Interactive image segmentation with sim- ple vision transformers

    Qin Liu, Zhenlin Xu, Gedas Bertasius, and Marc Nietham- mer. Simpleclick: Interactive image segmentation with sim- ple vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22290– 22300, 2023. 7

  32. [32]

    Rethinking interactive image segmentation with low latency high quality and diverse prompts

    Qin Liu, Jaemin Cho, Mohit Bansal, and Marc Nietham- mer. Rethinking interactive image segmentation with low latency high quality and diverse prompts. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3773–3782, 2024. 2

  33. [33]

    Tangent transformers for composition, privacy and removal

    Tian Yu Liu, Aditya Golatkar, and Stefano Soatto. Tangent transformers for composition, privacy and removal. arXiv preprint arXiv:2307.08122, 2023. 3

  34. [34]

    Decoupled Weight Decay Regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017. 7

  35. [35]

    A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics

    David Martin, Charless Fowlkes, Doron Tal, and Jitendra Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings eighth IEEE international conference on computer vision. ICCV 2001 , pages 416–423. IEEE, 2001. 6, 7

  36. [36]

    Towards stable test-time adaptation in dynamic wild world

    Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Zhiquan Wen, Yaofo Chen, Peilin Zhao, and Mingkui Tan. Towards stable test-time adaptation in dynamic wild world. In Internetional Conference on Learning Representations, 2023. 2

  37. [37]

    Task arithmetic in the tangent space: Improved editing of pre-trained models

    Guillermo Ortiz-Jimenez, Alessandro Favero, and Pascal Frossard. Task arithmetic in the tangent space: Improved editing of pre-trained models. Advances in Neural Informa- tion Processing Systems, 36, 2024. 3

  38. [38]

    Real-time, accurate, and consistent video semantic segmentation via unsupervised adaptation and cross-unit de- ployment on mobile device

    Hyojin Park, Alan Yessenbayev, Tushar Singhal, Navin Ku- mar Adhikari, Yizhe Zhang, Shubhankar Mangesh Borse, Hong Cai, Nilesh Prasad Pandey, Fei Yin, Frank Mayer, et al. Real-time, accurate, and consistent video semantic segmentation via unsupervised adaptation and cross-unit de- ployment on mobile device. InProceedings of the IEEE/CVF Conference on Comp...

  39. [39]

    A benchmark dataset and evaluation methodology for video object segmentation

    Federico Perazzi, Jordi Pont-Tuset, Brian McWilliams, Luc Van Gool, Markus Gross, and Alexander Sorkine-Hornung. A benchmark dataset and evaluation methodology for video object segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 724–732,

  40. [40]

    ” grabcut” interactive foreground extraction using iterated graph cuts

    Carsten Rother, Vladimir Kolmogorov, and Andrew Blake. ” grabcut” interactive foreground extraction using iterated graph cuts. ACM transactions on graphics (TOG) , 23(3): 309–314, 2004. 2

  41. [41]

    Adapting the segment anything model during usage in novel situations

    Robin Sch ¨on, Julian Lorenz, Katja Ludwig, and Rainer Lien- hart. Adapting the segment anything model during usage in novel situations. In Proceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 3616–3626, 2024. 1, 2, 7, 8

  42. [42]

    f-brs: Rethinking backpropagating refinement for interactive segmentation

    Konstantin Sofiiuk, Ilia Petrov, Olga Barinova, and Anton Konushin. f-brs: Rethinking backpropagating refinement for interactive segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 8623–8632, 2020. 6, 7

  43. [43]

    f-brs: Rethinking backpropagating refinement for interactive segmentation

    Konstantin Sofiiuk, Ilia Petrov, Olga Barinova, and Anton Konushin. f-brs: Rethinking backpropagating refinement for interactive segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 8623–8632, 2020. 2

  44. [44]

    Re- viving iterative training with mask guidance for interactive segmentation

    Konstantin Sofiiuk, Ilya A Petrov, and Anton Konushin. Re- viving iterative training with mask guidance for interactive segmentation. In 2022 IEEE International Conference on Image Processing (ICIP), pages 3141–3145. IEEE, 2022. 2, 7, 8

  45. [45]

    Cfr-icl: Cascade-forward refinement with iterative click loss for interactive image segmentation

    Shoukun Sun, Min Xian, Fei Xu, Luca Capriotti, and Tiankai Yao. Cfr-icl: Cascade-forward refinement with iterative click loss for interactive image segmentation. In Proceedings of the AAAI conference on artificial intelligence , pages 5017– 5024, 2024. 2

  46. [46]

    Parameter efficient multi- 10 task model fusion with partial linearization

    Anke Tang, Li Shen, Yong Luo, Yibing Zhan, Han Hu, Bo Du, Yixin Chen, and Dacheng Tao. Parameter efficient multi- 10 task model fusion with partial linearization. arXiv preprint arXiv:2310.04742, 2023. 3

  47. [47]

    Tesla: Test-time self-learning with automatic adversarial augmentation

    Devavrat Tomar, Guillaume Vray, Behzad Bozorgtabar, and Jean-Philippe Thiran. Tesla: Test-time self-learning with automatic adversarial augmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20341–20350, 2023. 2

  48. [48]

    On the road to online adaptation for semantic image segmentation

    Riccardo V olpi, Pau De Jorge, Diane Larlus, and Gabriela Csurka. On the road to online adaptation for semantic image segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages 19184– 19195, 2022

  49. [49]

    Tent: Fully Test-time Adaptation by Entropy Minimization

    Dequan Wang, Evan Shelhamer, Shaoteng Liu, Bruno Ol- shausen, and Trevor Darrell. Tent: Fully test-time adaptation by entropy minimization. arXiv preprint arXiv:2006.10726,

  50. [50]

    Stacked condi- tional generative adversarial networks for jointly learning shadow detection and shadow removal

    Jifeng Wang, Xiang Li, and Jian Yang. Stacked condi- tional generative adversarial networks for jointly learning shadow detection and shadow removal. In Proceedings of the IEEE conference on computer vision and pattern recog- nition, pages 1788–1797, 2018. 6, 7, 8

  51. [51]

    Continual test-time domain adaptation

    Qin Wang, Olga Fink, Luc Van Gool, and Dengxin Dai. Continual test-time domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7201–7211, 2022. 2

  52. [52]

    Dynamically instance- guided adaptation: A backward-free approach for test-time domain adaptive semantic segmentation

    Wei Wang, Zhun Zhong, Weijie Wang, Xi Chen, Charles Ling, Boyu Wang, and Nicu Sebe. Dynamically instance- guided adaptation: A backward-free approach for test-time domain adaptive semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24090–24099, 2023

  53. [53]

    Continual test-time domain adaptation via dynamic sample selection

    Yanshuo Wang, Jie Hong, Ali Cheraghian, Shafin Rahman, David Ahmedt-Aristizabal, Lars Petersson, and Mehrtash Harandi. Continual test-time domain adaptation via dynamic sample selection. In Proceedings of the IEEE/CVF Win- ter Conference on Applications of Computer Vision , pages 1701–1710, 2024. 2

  54. [54]

    Focused and collaborative feedback integration for interactive image seg- mentation

    Qiaoqiao Wei, Hui Zhang, and Jun-Hai Yong. Focused and collaborative feedback integration for interactive image seg- mentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages 18643– 18652, 2023. 2

  55. [55]

    Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing in- ference time

    Mitchell Wortsman, Gabriel Ilharco, Samir Ya Gadre, Re- becca Roelofs, Raphael Gontijo-Lopes, Ari S Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Ko- rnblith, et al. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing in- ference time. In International conference on machine learn- ing, pages 23965...

  56. [56]

    Robust fine-tuning of zero-shot models

    Mitchell Wortsman, Gabriel Ilharco, Jong Wook Kim, Mike Li, Simon Kornblith, Rebecca Roelofs, Raphael Gon- tijo Lopes, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, et al. Robust fine-tuning of zero-shot models. In Proceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 7959–7971, 2022. 3

  57. [57]

    Ties-merging: Resolving interference when merging models

    Prateek Yadav, Derek Tam, Leshem Choshen, Colin A Raf- fel, and Mohit Bansal. Ties-merging: Resolving interference when merging models. Advances in Neural Information Pro- cessing Systems, 36:7093–7115, 2023. 3

  58. [58]

    Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities

    Enneng Yang, Li Shen, Guibing Guo, Xingwei Wang, Xi- aochun Cao, Jie Zhang, and Dacheng Tao. Model merging in llms, mllms, and beyond: Methods, theories, applications and opportunities. arXiv preprint arXiv:2408.07666, 2024. 3

  59. [59]

    Open-vocabulary sam: Seg- ment and recognize twenty-thousand classes interactively

    Haobo Yuan, Xiangtai Li, Chong Zhou, Yining Li, Kai Chen, and Chen Change Loy. Open-vocabulary sam: Seg- ment and recognize twenty-thousand classes interactively. In European Conference on Computer Vision, pages 419–437. Springer, 2024. 1, 2

  60. [60]

    Auxadapt: Stable and efficient test-time adaptation for temporally consistent video semantic segmentation

    Yizhe Zhang, Shubhankar Borse, Hong Cai, and Fatih Porikli. Auxadapt: Stable and efficient test-time adaptation for temporally consistent video semantic segmentation. In Proceedings of the IEEE/CVF Winter Conference on Appli- cations of Computer Vision, pages 2339–2348, 2022. 2

  61. [61]

    Graco: Granularity-controllable interactive segmentation

    Yian Zhao, Kehan Li, Zesen Cheng, Pengchong Qiao, Xi- awu Zheng, Rongrong Ji, Chang Liu, Li Yuan, and Jie Chen. Graco: Granularity-controllable interactive segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 3501–3510, 2024. 2

  62. [62]

    Interactive segmentation as gaussion process classification

    Minghao Zhou, Hong Wang, Qian Zhao, Yuexiang Li, Yawen Huang, Deyu Meng, and Yefeng Zheng. Interactive segmentation as gaussion process classification. In Proceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19488–19497, 2023. 2 11