pith. machine review for the scientific record.

arxiv: 2604.02877 · v1 · submitted 2026-04-03 · 💻 cs.CV

Recognition: 1 theorem link

· Lean Theorem

Unlocking Positive Transfer in Incrementally Learning Surgical Instruments: A Self-reflection Hierarchical Prompt Framework

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 20:55 UTC · model grok-4.3

classification 💻 cs.CV
keywords class incremental learning · surgical instrument segmentation · prompt learning · knowledge transfer · catastrophic forgetting · hierarchical prompts · self-reflection · incremental segmentation

The pith

A hierarchical prompt framework on frozen models unlocks positive forward and backward knowledge transfer for incrementally learning surgical instruments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper aims to fix the oversight in prior incremental segmentation work by actively exploiting how past instrument knowledge can simplify learning new classes and how new classes can in turn sharpen old representations. It freezes a pre-trained backbone and grows a tree of prompts whose shared partitions let new instruments borrow reusable features while a graph-propagation step lets the model reflect on associations to polish everything already known. If the approach holds, models could keep expanding their instrument vocabulary over time without losing earlier skills, which matters for surgical video systems that must handle evolving tool sets in real procedures. The method reports gains above five percent on one benchmark and eleven percent on the other, and it works for both CNN and transformer backbones.

Core claim

The framework freezes a pre-trained model and adaptively appends instrument-aware prompts organized into a hierarchical parsing tree, with an instrument-shared root, n-part-shared intermediate nodes, and instrument-distinct leaves, so that new classes can draw on historical reusable knowledge for faster learning. It then performs self-reflection by propagating knowledge associations along a directed-weighted graph built from the tree, refining existing prompt representations to improve their quality without catastrophic forgetting of prior instruments.
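The root/intermediate/leaf organization described above can be sketched as a plain tree in which a new class inherits every shared partition on its path to the root. This is an illustrative reconstruction, not the paper's code: the `PromptNode` type, the `prompts_for` helper, and all node and prompt names are assumptions.

```python
# Toy sketch of a hierarchical prompt parsing tree: one instrument-shared
# root, n-part-shared intermediates, instrument-distinct leaves.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PromptNode:
    name: str
    prompts: List[str] = field(default_factory=list)  # stand-ins for prompt vectors
    parent: Optional["PromptNode"] = None

def prompts_for(leaf: PromptNode) -> List[str]:
    """Collect prompts along the leaf-to-root path: a new instrument
    reuses every shared partition above it and adds only its own
    instrument-distinct prompts."""
    out, node = [], leaf
    while node is not None:
        out = node.prompts + out  # root-first ordering
        node = node.parent
    return out

# Root shared by all instruments, one 2-part-shared intermediate,
# two instrument-distinct leaves.
root = PromptNode("instrument-shared", ["p_shared"])
two_part = PromptNode("2-part-shared", ["p_2part"], parent=root)
scissors = PromptNode("scissors", ["p_scissors"], parent=two_part)
grasper = PromptNode("grasper", ["p_grasper"], parent=two_part)

print(prompts_for(scissors))  # ['p_shared', 'p_2part', 'p_scissors']
```

Under this organization, adding a new leaf touches only its distinct partition; the shared partitions it inherits are exactly the "reusable historical knowledge" the claim refers to.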

What carries the argument

The hierarchical prompt parsing tree together with directed-weighted graph propagation for self-reflection, which organizes prompts by degree of sharing and updates them bidirectionally to support positive transfer.
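As a rough illustration of what the "self-reflection" step could look like, here is one propagation step over a toy directed-weighted association graph, where each old prompt is nudged toward a weighted mix of its in-neighbors. The update rule and the mixing coefficient `gamma` are assumptions; the paper's exact formula is not reproduced here.

```python
# One hedged propagation step over a directed-weighted association graph.
# values: {node: float} toy prompt representations.
# edges: {(src, dst): weight} directed association weights.

def propagate(values, edges, gamma=0.5):
    """Blend each node with the weighted average of its incoming messages;
    gamma controls how strongly old representations are refined."""
    incoming = {n: [] for n in values}
    for (src, dst), w in edges.items():
        incoming[dst].append(w * values[src])
    refined = {}
    for n, v in values.items():
        norm = sum(w for (s, d), w in edges.items() if d == n)
        msg = sum(incoming[n])
        refined[n] = (1 - gamma) * v + gamma * (msg / norm if norm else v)
    return refined

old = {"grasper": 1.0, "scissors": 3.0}
assoc = {("scissors", "grasper"): 1.0}  # a new class informs an old one
print(propagate(old, assoc))  # {'grasper': 2.0, 'scissors': 3.0}
```

The point of the sketch: the old class ("grasper") moves without its own weights being retrained, which is the mechanism by which backward transfer could avoid catastrophic forgetting.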

If this is right

  • New instrument classes learn more efficiently by accessing reusable historical knowledge through the shared prompt partitions.
  • Representations of previously learned instruments are refined and improved when new classes are added.
  • Catastrophic forgetting of old instruments is avoided while the model expands its capabilities.
  • The same prompt-tree mechanism delivers measurable gains on both CNN-based and transformer-based models.
  • Performance exceeds competing incremental methods by more than five percent on one public benchmark and eleven percent on the other.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same tree-plus-graph structure could be tested on incremental segmentation tasks outside surgery, such as autonomous driving scenes or robotic assembly.
  • If the graph stays sparse, the self-reflection step might continue to work across dozens of sequentially added classes without retraining the backbone.
  • Real-time operating-room systems could use the framework to incorporate novel tools observed during a procedure and immediately improve recognition of familiar ones.

Load-bearing premise

The hierarchical prompt parsing tree and directed-weighted graph propagation will reliably expose reusable knowledge and refine old representations without introducing negative transfer or instability when instrument classes are added sequentially.

What would settle it

Training the framework on a long sequence of instrument classes and observing that accuracy on earlier classes falls below a non-incremental baseline or that new-class accuracy shows no improvement over plain prompt tuning would falsify the positive-transfer claim.
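One concrete way to run that test is to track the standard backward-transfer (BWT) metric from the continual-learning literature, which the paper's supplement also adopts: a negative value signals the forgetting that would falsify the claim, a positive value the refinement the paper asserts. The accuracy matrix below uses toy numbers.

```python
# Backward transfer (BWT) in the style of Lopez-Paz & Ranzato (2017).
# acc[i][j] is accuracy on class-set j after finishing training episode i
# (toy values, not results from the paper).

def backward_transfer(acc):
    """BWT = mean over old tasks of (final accuracy - accuracy right after
    that task was learned). Negative = forgetting; positive = the backward
    transfer the paper claims."""
    T = len(acc)
    return sum(acc[T - 1][j] - acc[j][j] for j in range(T - 1)) / (T - 1)

acc = [
    [0.80, 0.00, 0.00],
    [0.82, 0.75, 0.00],
    [0.84, 0.77, 0.70],  # earlier classes improved after new ones were added
]
print(round(backward_transfer(acc), 3))  # 0.03
```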

Figures

Figures reproduced from arXiv: 2604.02877 by Kang Li, Pheng-Ann Heng, Yu Zhu, Zheng Li.

Figure 1. (a) Prior works learn each instrument prompt in …
Figure 2. Overview of our self-reflection hierarchical prompt framework. We progressively append instrument-aware prompts into a pre…
Figure 3. Visual comparisons of our approaches and highly competitive approaches. More comparison results are in the supplementary.
Figure 4. Effects of the decay factor γ and the teleport probability α on model performance.
Figure 2 (supplementary). Visualization comparison of segmentation results across all methods.
Figure 3 (supplementary). Automatic estimation of instrument part count.
read the original abstract

To continuously enhance model adaptability in surgical video scene parsing, recent studies incrementally update it to progressively learn to segment an increasing number of surgical instruments over time. However, prior works constantly overlooked the potential of positive forward knowledge transfer, i.e., how past knowledge could help learn new classes, and positive backward knowledge transfer, i.e., how learning new classes could help refine past knowledge. In this paper, we propose a self-reflection hierarchical prompt framework that unlocks the power of positive forward and backward knowledge transfer in class incremental segmentation, aiming to proficiently learn new instruments, improve existing skills of regular instruments, and avoid catastrophic forgetting of old instruments. Our framework is built on a frozen, pre-trained model that adaptively appends instrument-aware prompts for new classes throughout training episodes. To enable positive forward knowledge transfer, we organize instrument prompts into a hierarchical prompt parsing tree with the instrument-shared prompt partition as the root node, n-part-shared prompt partitions as intermediate nodes and instrument-distinct prompt partitions as leaf nodes, to expose the reusable historical knowledge for new classes to simplify their learning. Conversely, to encourage positive backward knowledge transfer, we conduct self-reflection refining on existing knowledge by directed-weighted graph propagation, examining the knowledge associations recorded in the tree to improve its representativeness without causing catastrophic forgetting. Our framework is applicable to both CNN-based models and advanced transformer-based foundation models, yielding more than 5% and 11% improvements over the competing methods on two public benchmarks respectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper claims to unlock positive forward and backward knowledge transfer in class-incremental surgical instrument segmentation via a self-reflection hierarchical prompt framework. It builds on a frozen pre-trained model that appends instrument-aware prompts organized in a hierarchical parsing tree (instrument-shared root, n-part-shared intermediates, distinct leaves) to enable forward transfer of reusable knowledge, and uses directed-weighted graph propagation for self-reflection to refine old representations for backward transfer without catastrophic forgetting. The approach is stated to apply to both CNN and transformer models, with reported gains exceeding 5% and 11% over competing methods on two public benchmarks.

Significance. If the empirical claims hold under detailed validation, the work would be significant for incremental learning in medical computer vision by explicitly targeting positive transfer in both directions, an aspect often neglected in prior incremental segmentation methods. Applicability to both CNN-based models and transformer foundation models is a concrete strength that could extend impact to surgical video analysis pipelines.

major comments (3)
  1. [Abstract] Abstract: the central claim of positive forward and backward transfer yielding >5% and >11% improvements rests on benchmark results, yet the abstract (and visible description) provides no details on experimental protocol, statistical tests, ablation studies, or exact baselines, making the positive-transfer assertion difficult to evaluate.
  2. [Method (hierarchical prompt parsing tree and directed-weighted graph propagation)] Hierarchical prompt parsing tree and directed-weighted graph propagation: the directed-weighted graph propagation is presented as enabling positive backward transfer by examining associations in the tree, but no quantitative isolation of its effect on old-class performance after new instruments are added is shown; without this, the risk of negative transfer remains unaddressed when instrument features overlap (e.g., graspers and scissors).
  3. [Method] Method: prompt partition granularity is a free parameter in the hierarchical tree construction; this undercuts the claim of fully adaptive, reusable knowledge exposure for new classes without additional tuning.
minor comments (1)
  1. [Method] The notation and propagation rule for the directed-weighted graph could be formalized with an equation or pseudocode to clarify how associations are updated without introducing instability.
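For illustration, one propagation rule consistent with the decay factor γ and teleport probability α that the paper sweeps in Figure 4 would be a personalized-PageRank-style update; the exact form used in the paper may differ.

```latex
% Hedged sketch, not the paper's stated equation: a personalized-
% PageRank-style propagation over the prompt association graph.
P^{(t+1)} = \alpha\, P^{(0)} + (1-\alpha)\,\gamma\, \tilde{A}^{\top} P^{(t)},
\qquad
\tilde{A}_{ij} = \frac{A_{ij}}{\sum_{k} A_{ik}}
```

Here P^(t) stacks the prompt representations and A is the directed-weighted adjacency recorded in the tree; with row-normalized A, α in (0, 1], and γ in (0, 1), the iteration is a contraction, which would address the referee's stability concern.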

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve clarity and provide additional evidence where needed.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim of positive forward and backward transfer yielding >5% and >11% improvements rests on benchmark results, yet the abstract (and visible description) provides no details on experimental protocol, statistical tests, ablation studies, or exact baselines, making the positive-transfer assertion difficult to evaluate.

    Authors: We agree the abstract should better contextualize the claims. In revision we will expand it to name the two benchmarks (EndoVis and CholecSeg8k), list the primary baselines, and note that gains are supported by multiple runs with statistical testing. Full protocols, exact numbers, and ablations will stay in the main text and supplement due to length limits. revision: yes

  2. Referee: [Method (hierarchical prompt parsing tree and directed-weighted graph propagation)] Hierarchical prompt parsing tree and directed-weighted graph propagation: the directed-weighted graph propagation is presented as enabling positive backward transfer by examining associations in the tree, but no quantitative isolation of its effect on old-class performance after new instruments are added is shown; without this, the risk of negative transfer remains unaddressed when instrument features overlap (e.g., graspers and scissors).

    Authors: We acknowledge the need for isolation. We will add an ablation in the revised Section 4.3 that reports old-class mIoU before/after new-class addition, with and without the graph-propagation module. This will quantify backward-transfer gains and show mitigation of negative transfer for overlapping instruments (graspers/scissors) via the hierarchical associations. revision: yes

  3. Referee: [Method] Method: prompt partition granularity is a free parameter in the hierarchical tree construction; this undercuts the claim of fully adaptive, reusable knowledge exposure for new classes without additional tuning.

    Authors: The granularity follows a fixed, predefined instrument taxonomy (root = all instruments, intermediates = shared parts such as shaft/tip, leaves = distinct tools) derived from standard surgical knowledge; it is not tuned per run or per incremental step. We will revise the method section to state this construction explicitly and add a sensitivity study in the supplement demonstrating robustness. revision: partial
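A fixed taxonomy of the kind the rebuttal describes could be expressed as plain data driving tree construction. The part and tool names below are illustrative placeholders, not the taxonomy the authors actually use.

```python
# Hypothetical fixed taxonomy: root = all instruments, intermediates =
# shared parts, leaves = distinct tools. Names are illustrative only.
TAXONOMY = {
    "instrument-shared": {
        "shaft-shared": {"grasper": {}, "scissors": {}},
        "tip-shared": {"needle-driver": {}},
    }
}

def leaves(tree):
    """Enumerate the instrument-distinct leaf nodes of the taxonomy."""
    out = []
    for name, sub in tree.items():
        if sub:
            out += leaves(sub)
        else:
            out.append(name)
    return out

print(sorted(leaves(TAXONOMY)))  # ['grasper', 'needle-driver', 'scissors']
```

Because the structure is declared up front rather than tuned, the partition granularity stops being a per-run free parameter, which is the substance of the authors' reply.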

Circularity Check

0 steps flagged

No circularity detected; framework is an independent architectural proposal

full rationale

The paper proposes a self-reflection hierarchical prompt framework consisting of a prompt parsing tree for forward transfer and directed-weighted graph propagation for backward transfer. These are presented as explicit design choices in the abstract and methods, with claimed gains validated empirically on public benchmarks rather than derived from equations that reduce outputs to fitted inputs or self-citations. No load-bearing step matches any enumerated circularity pattern; the derivation chain remains self-contained against external data.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The framework rests on a frozen pre-trained backbone and introduces two new conceptual structures whose effectiveness is asserted rather than derived from prior results.

free parameters (1)
  • prompt partition granularity
    Choice of how to group instruments into n-part-shared nodes is not derived from data and must be selected per dataset.
axioms (1)
  • domain assumption: A frozen pre-trained model remains effective for new classes when only prompts are added and updated.
    Explicitly stated as the foundation of the framework.
invented entities (2)
  • hierarchical prompt parsing tree (no independent evidence)
    purpose: Organize prompts to expose reusable historical knowledge for forward transfer
    Newly defined structure with root, intermediate, and leaf partitions.
  • directed-weighted graph propagation for self-reflection (no independent evidence)
    purpose: Refine existing knowledge associations for backward transfer
    New mechanism that examines tree-recorded associations.

pith-pipeline@v0.9.0 · 5570 in / 1350 out tokens · 48845 ms · 2026-05-13T20:55:02.870823+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear

    Relation between the paper passage and the cited Recognition theorem.

    we organize instrument prompts into a hierarchical prompt parsing tree with the instrument-shared prompt partition as the root node, n-part-shared prompt partitions as intermediate nodes and instrument-distinct prompt partitions as leaf nodes... directed-weighted graph propagation

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 1 internal anchor

  1. [1]

    2017 robotic instrument segmentation challenge

    Max Allan, Alex Shvets, Thomas Kurmann, Zichen Zhang, Rahul Duggal, Yun-Hsuan Su, Nicola Rieke, Iro Laina, Niveditha Kalavakonda, Sebastian Bodenstedt, et al. 2017 robotic instrument segmentation challenge. arXiv preprint arXiv:1902.06426, 2019.

  2. [2]

    2018 robotic scene segmentation challenge

    Max Allan, Satoshi Kondo, Sebastian Bodenstedt, Stefan Leger, Rahim Kadkhodamohammadi, Imanol Luengo, Felix Fuentes, Evangello Flouty, Ahmed Mohammed, Marius Pedersen, et al. 2018 robotic scene segmentation challenge. arXiv preprint arXiv:2001.11190, 2020.

  3. [3]

    Code-cl: Conceptor-based gradient projection for deep continual learning

    Marco PE Apolinario, Sakshi Choudhary, and Kaushik Roy. Code-cl: Conceptor-based gradient projection for deep continual learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 775–784.

  4. [4]

    Algebraic Perron-Frobenius theory

    GP Barker and Hans Schneider. Algebraic Perron-Frobenius theory. Linear Algebra and its Applications, 11(3):219–233.

  5. [5]

    Modeling the background for incremental learning in semantic segmentation

    Fabio Cermelli, Massimiliano Mancini, Samuel Rota Bulo, Elisa Ricci, and Barbara Caputo. Modeling the background for incremental learning in semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9233–9242, 2020.

  6. [6]

    Ma-sam: Modality-agnostic sam adaptation for 3d medical image segmentation

    Cheng Chen, Juzheng Miao, Dufan Wu, Aoxiao Zhong, Zhiling Yan, Sekeun Kim, Jiang Hu, Zhengliang Liu, Lichao Sun, Xiang Li, et al. Ma-sam: Modality-agnostic sam adaptation for 3d medical image segmentation. Medical Image Analysis, 98:103310, 2024.

  7. [7]

    Rethinking atrous convolution for semantic image segmentation

    Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587.

  8. [8]

    Don't forget, there is more than forgetting: new metrics for continual learning

    Natalia Díaz-Rodríguez, Vincenzo Lomonaco, David Filliat, and Davide Maltoni. Don't forget, there is more than forgetting: new metrics for continual learning. arXiv preprint arXiv:1810.13166, 2018.

  9. [9]

    Plop: Learning without forgetting for continual semantic segmentation

    Arthur Douillard, Yifu Chen, Arnaud Dapogny, and Matthieu Cord. Plop: Learning without forgetting for continual semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4040–4050, 2021.

  10. [10]

    Prompt tuning for parameter-efficient medical image segmentation

    Marc Fischer, Alexander Bartler, and Bin Yang. Prompt tuning for parameter-efficient medical image segmentation. Medical Image Analysis, 91:103024, 2024.

  11. [11]

    Multi-domain incremental learning for semantic segmentation

    Prachi Garg, Rohit Saluja, Vineeth N Balasubramanian, Chetan Arora, Anbumani Subramanian, and CV Jawahar. Multi-domain incremental learning for semantic segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 761–771, 2022.

  12. [12]

    Isinet: an instance-based approach for surgical instrument segmentation

    Cristina González, Laura Bravo-Sánchez, and Pablo Arbelaez. Isinet: an instance-based approach for surgical instrument segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 595–605. Springer, 2020.

  13. [13]

    Multiple prompt fusion for zero-shot lesion detection using vision-language models

    Miaotian Guo, Huahui Yi, Ziyuan Qin, Haiying Wang, Aidong Men, and Qicheng Lao. Multiple prompt fusion for zero-shot lesion detection using vision-language models. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 283–292. Springer.

  14. [14]

    Cholecseg8k: a semantic segmentation dataset for laparoscopic cholecystectomy based on cholec80

    W-Y Hong, C-L Kao, Y-H Kuo, J-R Wang, W-L Chang, and C-S Shih. Cholecseg8k: a semantic segmentation dataset for laparoscopic cholecystectomy based on cholec80. arXiv preprint arXiv:2012.12453, 2020.

  15. [15]

    Segment anything

    Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4015–4026, 2023.

  16. [16]

    Domain-incremental cardiac image segmentation with style-oriented replay and domain-sensitive feature whitening

    Kang Li, Lequan Yu, and Pheng-Ann Heng. Domain-incremental cardiac image segmentation with style-oriented replay and domain-sensitive feature whitening. IEEE Transactions on Medical Imaging, 42(3):570–581, 2022.

  17. [17]

    A dual enrichment synergistic strategy to handle data heterogeneity for domain incremental cardiac segmentation

    Kang Li, Yu Zhu, Lequan Yu, and Pheng-Ann Heng. A dual enrichment synergistic strategy to handle data heterogeneity for domain incremental cardiac segmentation. IEEE Transactions on Medical Imaging, 43(6):2279–2290, 2024.

  18. [18]

    Caprompt: Cyclic prompt aggregation for pre-trained model based class incremental learning

    Qiwei Li and Jiahuan Zhou. Caprompt: Cyclic prompt aggregation for pre-trained model based class incremental learning. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 18421–18429, 2025.

  19. [19]

    Learning without forgetting

    Zhizhong Li and Derek Hoiem. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2935–2947, 2017.

  20. [20]

    Learning incrementally to segment multiple organs in a ct image

    Pengbo Liu, Xia Wang, Mengsi Fan, Hongli Pan, Minmin Yin, Xiaohong Zhu, Dandan Du, Xiaoying Zhao, Li Xiao, Lian Ding, et al. Learning incrementally to segment multiple organs in a ct image. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 714–724. Springer, 2022.

  21. [21]

    Gradient episodic memory for continual learning

    David Lopez-Paz and Marc'Aurelio Ranzato. Gradient episodic memory for continual learning. Advances in Neural Information Processing Systems, 30, 2017.

  22. [22]

    m2caiseg: Semantic segmentation of laparoscopic images using convolutional neural networks

    Salman Maqbool, Aqsa Riaz, Hasan Sajid, and Osman Hasan. m2caiseg: Semantic segmentation of laparoscopic images using convolutional neural networks. arXiv preprint arXiv:2008.10134, 2020.

  23. [23]

    Incremental learning techniques for semantic segmentation

    Umberto Michieli and Pietro Zanuttigh. Incremental learning techniques for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2019.

  24. [24]

    Continual semantic segmentation via repulsion-attraction of sparse and disentangled latent representations

    Umberto Michieli and Pietro Zanuttigh. Continual semantic segmentation via repulsion-attraction of sparse and disentangled latent representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1114–1124, 2021.

  25. [25]

    The pagerank citation ranking: Bringing order to the web

    Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.

  26. [26]

    Pearl: Input-agnostic prompt enhancement with negative feedback regulation for class-incremental learning

    Yongchun Qin, Pengfei Fang, and Hui Xue. Pearl: Input-agnostic prompt enhancement with negative feedback regulation for class-incremental learning. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 20051–20059, 2025.

  27. [27]

    The surprising positive knowledge transfer in continual 3d object shape reconstruction

    Anh Thai, Stefan Stojanov, Zixuan Huang, and James M Rehg. The surprising positive knowledge transfer in continual 3d object shape reconstruction. In 2022 International Conference on 3D Vision (3DV), pages 209–218. IEEE.

  28. [28]

    Digraph inception convolutional networks

    Zekun Tong, Yuxuan Liang, Changsheng Sun, Xinke Li, David Rosenblum, and Andrew Lim. Digraph inception convolutional networks. Advances in Neural Information Processing Systems, 33:17907–17918, 2020.

  29. [29]

    Three types of incremental learning

    Gido M Van de Ven, Tinne Tuytelaars, and Andreas S Tolias. Three types of incremental learning. Nature Machine Intelligence, 4(12):1185–1197, 2022.

  30. [30]

    Emerging robotic platforms for minimally invasive surgery

    Valentina Vitiello, Su-Lin Lee, Thomas P Cundy, and Guang-Zhong Yang. Emerging robotic platforms for minimally invasive surgery. IEEE Reviews in Biomedical Engineering, 6:111–126, 2012.

  31. [31]

    Afec: Active forgetting of negative transfer in continual learning

    Liyuan Wang, Mingtian Zhang, Zhongfan Jia, Qian Li, Chenglong Bao, Kaisheng Ma, Jun Zhu, and Yi Zhong. Afec: Active forgetting of negative transfer in continual learning. Advances in Neural Information Processing Systems, 34:22379–22391, 2021.

  32. [32]

    S-prompts learning with pre-trained transformers: An occam's razor for domain incremental learning

    Yabin Wang, Zhiwu Huang, and Xiaopeng Hong. S-prompts learning with pre-trained transformers: An occam's razor for domain incremental learning. Advances in Neural Information Processing Systems, 35:5682–5695, 2022.

  33. [33]

    Dualprompt: Complementary prompting for rehearsal-free continual learning

    Zifeng Wang, Zizhao Zhang, Sayna Ebrahimi, Ruoxi Sun, Han Zhang, Chen-Yu Lee, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, et al. Dualprompt: Complementary prompting for rehearsal-free continual learning. In European Conference on Computer Vision, pages 631–648. Springer.

  34. [34]

    Learning to prompt for continual learning

    Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, and Tomas Pfister. Learning to prompt for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 139–149.

  35. [35]

    Endpoints weight fusion for class incremental semantic segmentation

    Jia-Wen Xiao, Chang-Bin Zhang, Jiekang Feng, Xialei Liu, Joost van de Weijer, and Ming-Ming Cheng. Endpoints weight fusion for class incremental semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7204–7213, 2023.

  36. [36]

    Privacy-preserving synthetic continual semantic segmentation for robotic surgery

    Mengya Xu, Mobarakol Islam, Long Bai, and Hongliang Ren. Privacy-preserving synthetic continual semantic segmentation for robotic surgery. IEEE Transactions on Medical Imaging, 2024.

  37. [37]

    Continual learning through synaptic intelligence

    Friedemann Zenke, Ben Poole, and Surya Ganguli. Continual learning through synaptic intelligence. In International Conference on Machine Learning, pages 3987–3995. PMLR.

  38. [38]

    Representation compensation networks for continual semantic segmentation

    Chang-Bin Zhang, Jia-Wen Xiao, Xialei Liu, Ying-Cong Chen, and Ming-Ming Cheng. Representation compensation networks for continual semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7053–7064, 2022.

  39. [39]

    Coinseg: Contrast inter- and intra-class representations for incremental segmentation

    Zekang Zhang, Guangyu Gao, Jianbo Jiao, Chi Harold Liu, and Yunchao Wei. Coinseg: Contrast inter- and intra-class representations for incremental segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 843–853, 2023.

  40. [40]

    Memory-efficient prompt tuning for incremental histopathology classification

    Yu Zhu, Kang Li, Lequan Yu, and Pheng Ann Heng. Memory-efficient prompt tuning for incremental histopathology classification. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 7802–7810, 2024.

  41. [41]

    How many parts does this instrument consist of?

    MLLM-based Part Count Estimation. In most surgical video datasets, instrument annotations are provided in a part-wise manner, i.e., each part of the instrument is labeled as a different category. Consequently, the number of parts for each instrument class can be directly obtained as a prior (all four datasets used in our paper follow this convention). ...

  42. [42]

    Based on the supplementary materials, along with the provided code and the open-source implementations of [6] and [28], our project can be fully reproduced

    Implementation details. We follow the preprocessing procedure described in [6] to preprocess the dataset. Based on the supplementary materials, along with the provided code and the open-source implementations of [6] and [28], our project can be fully reproduced. When using SAM, we adopt the SAM ViT-H [15] version as our encoder θ_Enc. In our practical im...

  43. [43]

    Illustration of HPPT construction

    Metrics for Class-Incremental Learning. To provide a deeper understanding of model performance in the class-incremental learning (CIL) setting, we adopt two widely-used evaluation metrics: Backward Transfer (BWT) ... [Figure 1: Illustration of HPPT construction.]

  44. [44]

    Training and Inference Scheme. In the first episode, we freeze the encoder of SAM and jointly train the decoder, segmentation head, adapter, and the prompt parsing tree. Since each image in the dataset typically contains more than one class of surgical instrument, we activate all instrument-aware prompts during the forward propagation, allowing each pixe...

  45. [45]

    Discussion. Although prompt-based approaches and foundation models have recently emerged in surgical instrument segmentation, none have addressed the Class-Incremental Learning (CIL) setting. To the best of our knowledge, this work is the first to integrate prompt tuning with the Segment Anything Model (SAM) specifically for surgical instrument class i...