M⁴-SAM: Multi-Modal Mixture-of-Experts with Memory-Augmented SAM for RGB-D Video Salient Object Detection
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-13 06:05 UTC · model grok-4.3
The pith
M⁴-SAM adapts SAM2 for RGB-D video salient object detection using modality-aware experts, gated multi-scale fusion, and prompt-free memory initialization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
M⁴-SAM equips SAM2 with Modality-Aware MoE-LORA, which uses convolutional experts and a modality dispatcher for efficient fine-tuning; Gated Multi-Level Feature Fusion, which hierarchically aggregates multi-scale features via adaptive gating; and Pseudo-Guided Initialization, which bootstraps the memory bank from a coarse mask. Together these enable effective prompt-free RGB-D VSOD and state-of-the-art performance on three public datasets.
What carries the argument
The three integrated components of M⁴-SAM: Modality-Aware MoE-LORA for spatial and multi-modal adaptation, Gated Multi-Level Feature Fusion for hierarchical feature balancing, and Pseudo-Guided Initialization for prompt-free memory setup.
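The review carries no code, so here is a minimal PyTorch sketch of what a Modality-Aware MoE-LoRA adapter could look like, assuming 3×3 convolutional experts acting in the low-rank space, top-2 routing (the matched Lean passage below mentions rank r=4 and top-2 experts), and a learned per-modality bias on the routing logits. Every class name, parameter, and shape is illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAwareMoELoRA(nn.Module):
    """Illustrative MoE-LoRA adapter: LoRA down/up projections wrap a pool
    of 3x3 convolutional experts; a dispatcher biases the routing logits
    by input modality (e.g. 0 = RGB, 1 = depth)."""

    def __init__(self, dim, rank=4, num_experts=4, top_k=2, num_modalities=2):
        super().__init__()
        self.top_k = top_k
        self.down = nn.Conv2d(dim, rank, kernel_size=1, bias=False)  # LoRA down-projection
        self.up = nn.Conv2d(rank, dim, kernel_size=1, bias=False)    # LoRA up-projection
        nn.init.zeros_(self.up.weight)                               # adapter starts as identity
        # Convolutional experts add the local spatial priors linear LoRA lacks.
        self.experts = nn.ModuleList(
            nn.Conv2d(rank, rank, kernel_size=3, padding=1) for _ in range(num_experts)
        )
        self.router = nn.Linear(dim, num_experts)                    # content-based routing
        self.dispatcher = nn.Embedding(num_modalities, num_experts)  # per-modality logit bias

    def forward(self, x, modality_idx):
        # x: (B, C, H, W); modality_idx: (B,) long tensor of modality ids.
        z = self.down(x)
        logits = self.router(x.mean(dim=(2, 3))) + self.dispatcher(modality_idx)
        weights = F.softmax(logits, dim=-1)                          # (B, num_experts)
        topv, topi = weights.topk(self.top_k, dim=-1)                # keep top-2 experts
        out = torch.zeros_like(z)
        for b in range(x.size(0)):                                   # per-sample dispatch
            for v, i in zip(topv[b], topi[b]):
                out[b] = out[b] + v * self.experts[int(i)](z[b : b + 1])[0]
        return x + self.up(out)                                      # residual adaptation

# usage: y = ModalityAwareMoELoRA(dim=256)(torch.randn(2, 256, 32, 32),
#                                          torch.tensor([0, 1]))
```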
If this is right
- RGB-D video salient object detection becomes feasible in a fully prompt-free manner using only a coarse mask to seed the memory bank.
- Convolutional experts inside MoE-LoRA provide stronger local spatial priors than standard linear LoRA for video tasks involving depth.
- Adaptive gating in multi-level fusion lets SAM2 features trade off detail and semantics without manual scale selection (see the sketch after this list).
- The overall design supports direct transfer to other multi-modal video segmentation tasks that currently rely on SAM2.
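As referenced above, a minimal sketch of what adaptive gated multi-level fusion could look like, assuming a top-down pass in which each coarser (more semantic) level is upsampled and blended into the next finer (more detailed) level through a learned per-pixel sigmoid gate. The module name, channel count, and the choice of four scales are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMultiLevelFusion(nn.Module):
    """Illustrative top-down fusion: a learned per-pixel gate decides, at
    every location, how much fine detail versus coarse semantics to keep,
    replacing any fixed manual scale selection."""

    def __init__(self, channels, num_levels=4):
        super().__init__()
        # One 1x1 gate per adjacent pair of levels, fed the concatenated pair.
        self.gates = nn.ModuleList(
            nn.Conv2d(2 * channels, channels, kernel_size=1)
            for _ in range(num_levels - 1)
        )

    def forward(self, feats):
        # feats: list of (B, C, H_i, W_i) tensors, finest resolution first.
        fused = feats[-1]                                  # start from the coarsest level
        for gate, fine in zip(self.gates, reversed(feats[:-1])):
            fused = F.interpolate(fused, size=fine.shape[-2:],
                                  mode="bilinear", align_corners=False)
            g = torch.sigmoid(gate(torch.cat([fine, fused], dim=1)))  # per-pixel gate
            fused = g * fine + (1 - g) * fused             # adaptive blend
        return fused                                       # finest-resolution output

# usage with four hypothetical SAM2 encoder scales:
# feats = [torch.randn(1, 256, s, s) for s in (64, 32, 16, 8)]
# out = GatedMultiLevelFusion(256)(feats)  # -> (1, 256, 64, 64)
```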
Where Pith is reading between the lines
- Memory banks seeded by coarse priors could reduce prompt engineering needs in other SAM-based video applications such as tracking or change detection (see the sketch after this list).
- The modality dispatcher mechanism might generalize to additional input types like thermal or event data if retrained on mixed datasets.
- Hierarchical gated fusion may improve performance in any SAM2 downstream task where multi-scale encoder outputs are currently underutilized.
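As noted in the first bullet, a hedged sketch of how a memory bank could be seeded from a coarse mask in the spirit of Pseudo-Guided Initialization. The function signature and the memory layout are invented for illustration and do not reflect SAM2's actual memory API.

```python
import torch
import torch.nn.functional as F

def seed_memory_from_coarse_mask(frame_feat, coarse_mask):
    """Illustrative pseudo-guided initialization: treat a coarse saliency
    mask as a pseudo prompt and turn the first frame's features into an
    initial memory entry, so later frames need no manual prompt.

    frame_feat:  (B, C, H, W) encoder features of frame 0.
    coarse_mask: (B, 1, h, w) soft mask in [0, 1] from any cheap detector.
    """
    mask = F.interpolate(coarse_mask, size=frame_feat.shape[-2:],
                         mode="bilinear", align_corners=False)
    fg = frame_feat * mask            # foreground-weighted features
    bg = frame_feat * (1.0 - mask)    # background-weighted features
    return {
        "keys": torch.cat([fg, bg], dim=1),  # spatial memory entry for frame 0
        "mask": mask,                        # pseudo prior stored alongside
    }

# usage: memory = seed_memory_from_coarse_mask(torch.randn(1, 256, 64, 64),
#                                              torch.rand(1, 1, 256, 256))
```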
Load-bearing premise
That the three added components directly solve the listed challenges of linear LoRA, multi-scale underuse, and prompt dependence, and that the benchmark comparisons reflect genuine gains rather than dataset-specific tuning.
What would settle it
An ablation on the same three datasets that removes each of the three components in turn would settle it: if performance holds near the reported state of the art with a component removed, the claim that the full combination is necessary is falsified; if removing any component drops performance to or below existing RGB-D VSOD baselines, that component's contribution is confirmed.
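A hypothetical harness showing how such an ablation could be scripted: it evaluates the SAM2 baseline, the full model, and each leave-one-out variant. The build_model and evaluate callables stand in for a training and evaluation pipeline the paper does not publish, and the dataset names are assumptions rather than names confirmed by the abstract.

```python
# Hypothetical ablation harness in the spirit of the experiment described
# above: SAM2 baseline, full model, and each leave-one-out variant.
COMPONENTS = {"moe_lora", "gated_fusion", "pseudo_init"}
DATASETS = ["DVSOD", "ViDSOD-100", "RDVS"]  # assumed benchmark names

def run_ablation(build_model, evaluate):
    """Return metrics per variant per dataset so every delta is attributable
    to exactly one component."""
    variants = {"sam2_baseline": set(), "full_model": set(COMPONENTS)}
    for comp in sorted(COMPONENTS):
        variants[f"full_minus_{comp}"] = COMPONENTS - {comp}  # leave one out
    return {
        name: {ds: evaluate(build_model(enabled=active), ds) for ds in DATASETS}
        for name, active in variants.items()
    }
```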
Original abstract
The Segment Anything Model 2 (SAM2) has emerged as a foundation model for universal segmentation. Owing to its generalizable visual representations, SAM2 has been successfully applied to various downstream tasks. However, extending SAM2 to the RGB-D video salient object detection (RGB-D VSOD) task encounters three challenges including limited spatial modeling of linear LoRA, insufficient employment of SAM's multi-scale features, and dependence of initialization on explicit prompts. To address the issues, we present Multi-Modal Mixture-of-Experts with Memory-Augmented SAM (M⁴-SAM), which equips SAM2 with modality-related PEFT, hierarchical feature fusion, and prompt-free memory initialization. Firstly, we inject Modality-Aware MoE-LORA, which employs convolutional experts to encode local spatial priors and introduces a modality dispatcher for efficient multi-modal fine-tuning, into SAM2's encoder. Secondly, we deploy Gated Multi-Level Feature Fusion, which hierarchically aggregates multi-scale encoder features with an adaptive gating mechanism, to balance spatial details and semantic context. Finally, to conduct zero-shot VSOD without manual prompts, we utilize a Pseudo-Guided Initialization, where a coarse mask is regarded as a pseudo prior and used to bootstrap the memory bank. Extensive experiments demonstrate that M⁴-SAM achieves the state-of-the-art performance across all evaluation metrics on three public RGB-D VSOD datasets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents M⁴-SAM, an extension of SAM2 for RGB-D video salient object detection. It identifies three challenges (limited spatial modeling in linear LoRA, insufficient multi-scale feature use, and prompt dependence) and proposes three components to address them: Modality-Aware MoE-LORA (convolutional experts plus modality dispatcher for multi-modal PEFT), Gated Multi-Level Feature Fusion (adaptive hierarchical aggregation of encoder features), and Pseudo-Guided Initialization (coarse-mask bootstrapping of the memory bank for zero-shot operation). The manuscript claims this yields state-of-the-art results across all metrics on three public RGB-D VSOD datasets.
Significance. If the performance claims and component contributions are substantiated, the work would offer a concrete recipe for adapting SAM2-style foundation models to multi-modal video tasks, particularly by injecting spatial priors, gated multi-scale fusion, and prompt-free memory. This could influence downstream applications in video segmentation where depth and RGB must be jointly modeled without manual prompts.
Major comments (3)
- [Abstract] The SOTA claim across all metrics on three datasets is stated without any quantitative tables, specific metric values, baseline numbers, or statistical significance tests, so the central empirical assertion cannot be evaluated or reproduced from the manuscript text.
- [Method] Method sections describing the three components: No ablation studies isolate the contribution of Modality-Aware MoE-LORA, Gated Multi-Level Feature Fusion, or Pseudo-Guided Initialization to the stated challenges, which is required to attribute any gains to the proposed innovations rather than other factors.
- [Experiments] The manuscript supplies no implementation details, training protocols, hyper-parameter settings, dataset splits, or confirmation that baselines were re-implemented under identical conditions, leaving open the possibility that reported improvements arise from experimental confounds rather than the architectural changes.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive report. The three major comments identify areas where the manuscript can be strengthened for clarity, reproducibility, and attribution of results. We address each point below and will incorporate the suggested improvements in the revised version.
Point-by-point responses
- Referee: [Abstract] The SOTA claim across all metrics on three datasets is stated without any quantitative tables, specific metric values, baseline numbers, or statistical significance tests, so the central empirical assertion cannot be evaluated or reproduced from the manuscript text.
  Authors: We agree that the abstract would be more informative with concrete numbers. In the revision we will add the key quantitative results (maximum F-measure, S-measure, mean absolute error) achieved by M⁴-SAM together with the strongest baseline on each of the three RGB-D VSOD datasets. This will allow readers to directly assess the magnitude of the reported gains without needing to consult the tables. Revision: yes.
- Referee: [Method] Method sections describing the three components: No ablation studies isolate the contribution of Modality-Aware MoE-LORA, Gated Multi-Level Feature Fusion, or Pseudo-Guided Initialization to the stated challenges, which is required to attribute any gains to the proposed innovations rather than other factors.
  Authors: We acknowledge that explicit ablations are necessary to link each component to the three identified challenges. Although the method section describes the modules, we will add a dedicated ablation subsection that incrementally activates Modality-Aware MoE-LORA, Gated Multi-Level Feature Fusion, and Pseudo-Guided Initialization on top of the SAM2 baseline and reports the resulting performance deltas on all three datasets. This will provide direct evidence for the contribution of each innovation. Revision: yes.
- Referee: [Experiments] The manuscript supplies no implementation details, training protocols, hyper-parameter settings, dataset splits, or confirmation that baselines were re-implemented under identical conditions, leaving open the possibility that reported improvements arise from experimental confounds rather than the architectural changes.
  Authors: We agree that full experimental transparency is required. We will insert a new “Implementation Details” subsection that specifies the optimizer, learning-rate schedule, batch size, number of epochs, hardware, exact train/validation/test splits for each dataset, and an explicit statement that all baselines were re-implemented and evaluated under the identical protocol and data splits used for M⁴-SAM. Revision: yes.
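A sketch of the kind of configuration record such an “Implementation Details” subsection would pin down. Apart from AdamW, LoRA rank 4, and top-2 expert routing, which the review's matched Lean passage below mentions, every value here is an illustrative placeholder, not a figure from the paper.

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    """Settings an 'Implementation Details' subsection should fix. Only
    optimizer=AdamW, lora_rank=4, and moe_top_k=2 are hinted at by the
    review's matched passage; every other value is a placeholder."""
    optimizer: str = "AdamW"
    lora_rank: int = 4
    moe_top_k: int = 2
    learning_rate: float = 1e-4   # assumed
    weight_decay: float = 0.01    # assumed
    batch_size: int = 8           # assumed
    epochs: int = 50              # assumed
    seed: int = 0                 # fixed seed for reproducible splits and runs

# usage: cfg = TrainConfig(); print(cfg)
```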
Circularity Check
No significant circularity; empirical model design with benchmark evaluation
Full rationale
The paper presents an empirical architecture extension of SAM2 for RGB-D VSOD, introducing three components (Modality-Aware MoE-LORA, Gated Multi-Level Feature Fusion, Pseudo-Guided Initialization) and claiming SOTA via dataset experiments. No equations, derivations, or first-principles results are shown that reduce any claimed performance or prediction to quantities defined by the authors' own fitted parameters, self-citations, or ansatzes. The central claim rests on experimental comparisons rather than any self-referential mathematical chain, making this a standard non-circular empirical contribution.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: SAM2 supplies generalizable visual representations that can be extended to RGB-D video salient object detection.
Invented entities (3)
- Modality-Aware MoE-LORA (no independent evidence)
- Gated Multi-Level Feature Fusion (no independent evidence)
- Pseudo-Guided Initialization (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
  Relation between the paper passage and the cited Recognition theorem is unclear. Matched passage: "Modality-Aware MoE-LoRA... Gated Multi-Level Feature Fusion... Pseudo-Guided Initialization... AdamW... rank r=4... top-2 experts"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
  Relation between the paper passage and the cited Recognition theorem is unclear. Matched passage: "Extensive experiments demonstrate that M⁴-SAM achieves the state-of-the-art performance across all evaluation metrics on three public RGB-D VSOD datasets."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.