HVPNet: A Bio-Inspired Network for General Salient and Camouflaged Object Detection

Jiacong Yu; Jiawei Xu; Qiangqiang Zhou; Yanjiao Shi; Yugen Yi; Zhouping Li

arxiv: 2606.31496 · v1 · pith:5PYVFK44new · submitted 2026-06-30 · 💻 cs.CV

Pith reviewed 2026-07-01 06:07 UTC · model grok-4.3

classification 💻 cs.CV

keywords salient object detection · camouflaged object detection · bio-inspired network · multimodal fusion · retinal integration · cortical decoder · computer vision · object detection

The pith

A bio-inspired network modeled on retinal integration and cortical decoding detects salient and camouflaged objects accurately across modalities with simpler structure than complex fusion methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that drawing from the human visual system's multi-layered retinal processing and hierarchical cortical decoding allows a simpler architecture to handle multimodal salient and camouflaged object detection without the parameter bloat of typical cross-modal fusion. Current methods often add redundant structures that enlarge models and sometimes hurt performance. HVPNet tests this by introducing a level-specific integration module and a two-stage decoder that together support seven tasks over four modalities. The result is presented as an accuracy-efficiency balance on 22 datasets. If correct, this indicates that bio-mimetic staging can replace elaborate fusion designs for these detection problems.

Core claim

HVPNet is built around a Retinal Integration Module that fuses multimodal features via level-specific multi-stage integration and a cortical decoder that splits decoding into low- and high-level stages. This pair of components lets the single architecture extend directly to seven tasks across four modalities and deliver competitive accuracy with lower complexity on 22 datasets without extra fusion modules or task-specific tuning.

What carries the argument

Retinal Integration Module (RIM) that applies level-specific multi-stage integration to multimodal features, paired with a cortical decoder (CD) that separates low- and high-level visual processing stages.
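
To make the structure concrete, here is a minimal sketch of the pattern described above: a fusion rule chosen per feature level, followed by a decoder that handles high-level and low-level features in separate stages. This page does not give the internals of RIM or CD, so everything in the block is a hypothetical stand-in: the names `integrate`, `decode`, and `STAGE_RULES`, the three placeholder fusion rules, and the plain Python lists used in place of feature maps. It is not the authors' implementation.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
FuseFn = Callable[[Vector, Vector], Vector]


def fuse_sum(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def fuse_product(a: Vector, b: Vector) -> Vector:
    return [x * y for x, y in zip(a, b)]


def fuse_mean(a: Vector, b: Vector) -> Vector:
    return [(x + y) / 2.0 for x, y in zip(a, b)]


# Assumption: one placeholder rule per stage. The real RIM operations are
# not described on this page.
STAGE_RULES: Dict[str, FuseFn] = {
    "shallow": fuse_sum,
    "middle": fuse_product,
    "deep": fuse_mean,
}


def stage_of(level: int, num_levels: int) -> str:
    """Map a feature level to one of three stages by its relative depth."""
    if level < num_levels / 3:
        return "shallow"
    if level < 2 * num_levels / 3:
        return "middle"
    return "deep"


def integrate(primary: Sequence[Vector], auxiliary: Sequence[Vector]) -> List[Vector]:
    """Fuse two modalities level by level, using the rule for each level's stage."""
    n = len(primary)
    return [
        STAGE_RULES[stage_of(i, n)](p, a)
        for i, (p, a) in enumerate(zip(primary, auxiliary))
    ]


def decode(fused: Sequence[Vector], split: int) -> Vector:
    """Two-stage decode: summarize the high levels, then refine the low levels."""
    if not 0 < split < len(fused):
        raise ValueError("split must leave at least one low and one high level")
    low, high = fused[:split], fused[split:]
    # High-level stage: average the deep features into a context vector.
    context = [sum(vals) / len(high) for vals in zip(*high)]
    # Low-level stage: add the context to each shallow feature, then average.
    refined = [fuse_sum(f, context) for f in low]
    return [sum(vals) / len(refined) for vals in zip(*refined)]
```

A call such as `decode(integrate(rgb_levels, depth_levels), split=1)` would treat the first level as low-level and the rest as high-level. The point of the sketch is only that the per-level dispatch and the low/high split are small amounts of control logic, not separate fusion blocks.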

If this is right

The single architecture applies unchanged to seven distinct detection tasks spanning four input modalities.
Accuracy-efficiency trade-offs hold across all 22 evaluated datasets for both salient and camouflaged object detection.
Structural redundancy is reduced by replacing explicit cross-modal fusion blocks with staged retinal-style integration.
No task-specific redesign or additional modules are required to reach the reported performance levels.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The staged integration pattern could be tested on other multimodal vision problems such as semantic segmentation or instance segmentation.
Efficiency gains may prove especially useful in embedded or mobile settings where parameter count directly limits deployment.
Further work might check whether the low/high-level split in the decoder aligns with measurable differences in feature complexity at those stages.

Load-bearing premise

That modeling retinal multi-stage integration and cortical hierarchy produces simpler yet equally or more accurate detection by avoiding the redundancy of conventional cross-modal fusion.

What would settle it

Direct comparison on one of the 22 multimodal datasets where a standard complex fusion model records both higher detection accuracy and lower runtime or parameter count than HVPNet.
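
The test above can be written down as a simple dominance check. The sketch below is one reading of it, assuming a single scalar accuracy score where higher is better; the `Result` type, the `refutes` function, and the field names are hypothetical, and the numbers in the usage note are made up for illustration rather than taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    name: str
    accuracy: float    # higher is better (assumed scalar score)
    params_m: float    # parameter count in millions, lower is better
    runtime_ms: float  # per-image runtime in milliseconds, lower is better


def refutes(baseline: Result, candidate: Result) -> bool:
    """True if the baseline is more accurate AND cheaper on runtime or parameters."""
    more_accurate = baseline.accuracy > candidate.accuracy
    cheaper = (
        baseline.params_m < candidate.params_m
        or baseline.runtime_ms < candidate.runtime_ms
    )
    return more_accurate and cheaper
```

For example, with made-up values, `refutes(Result("fusion-baseline", 0.92, 80.0, 12.0), Result("HVPNet", 0.91, 30.0, 15.0))` returns `True`, because the baseline is both more accurate and faster even though it has more parameters. A baseline that is more accurate but larger and slower would not count.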

Figures

Figures reproduced from arXiv: 2606.31496 by Jiacong Yu, Jiawei Xu, Qiangqiang Zhou, Yanjiao Shi, Yugen Yi, Zhouping Li.

Figure 2: Overall architecture of the proposed HVPNet for general SOD and COD tasks. It consists of three stages: feature extraction, fusion, and cortical decoding

Figure 3: Illustration of the retinal integration module (RIM). We employ three distinct stages to address the specificities of features at di...

Figure 4: Illustration of the cortical decoder (CD) module.

Figure 5: PR curves comparison of different models on eight RGB and RGB-D SOD datasets

Figure 6: Qualitative comparison of our model with state-of-the-art methods, including...

Figure 7: Illustration of failure cases. (GT: ground truth)

Figure 8: Visualized feature maps at each stage, along with corresponding...
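
Figure 5 compares models by precision-recall (PR) curves. As a reminder of what such a curve measures, here is a minimal sketch that thresholds one predicted saliency map against a binary ground-truth mask. The function name `pr_curve`, the flat-list input format, and the convention of reporting precision as 1.0 when nothing is predicted are assumptions of this sketch, not details of the paper's evaluation code.

```python
from typing import List, Sequence, Tuple


def pr_curve(
    saliency: Sequence[float],
    mask: Sequence[int],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for each threshold.

    saliency: predicted scores in [0, 1], one per pixel (flattened).
    mask: ground-truth labels, 1 for object pixels and 0 for background.
    """
    positives = sum(mask)
    points = []
    for t in thresholds:
        pred = [1 if s >= t else 0 for s in saliency]
        true_pos = sum(p * m for p, m in zip(pred, mask))
        predicted = sum(pred)
        # Convention (assumed): precision is 1.0 when no pixel is predicted.
        precision = true_pos / predicted if predicted else 1.0
        recall = true_pos / positives if positives else 0.0
        points.append((t, precision, recall))
    return points
```

Sweeping `thresholds` from high to low traces the curve from the high-precision end toward the high-recall end; benchmark curves are typically averaged over all images in a dataset.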

Abstract

In recent years, most research on multimodal salient object detection (SOD) and camouflaged object detection (COD) typically aims to improve performance through complex cross-modal feature fusion and decoding structures. However, this approach leads to an excessively large model parameter scale and often fails to deliver satisfactory detection performance due to structural redundancy. In contrast, the human visual process is able to efficiently perform salient and camouflaged object identification without such complex structures. This contrast raises an important question: Can we draw conceptual inspiration from the human visual process to achieve a simpler modeling strategy, and still realize accurate and efficient object detection? To answer this question, we propose HVPNet, a simple yet general bio-inspired computational architecture. Drawing on the multi-layered information integration of the retina as a conceptual metaphor, we designed a Retinal Integration Module (RIM), which effectively integrates multimodal features through a level-specific multi-stage integration strategy. To fully exploit these features, we further design a cortical decoder (CD) that breaks down the decoding process into low- and high-level visual stages, abstracting the hierarchical processing in the human visual cortex. Benefiting from these designs, HVPNet can readily extend to seven tasks across four modalities. Without bells and whistles, it establishes an excellent accuracy-efficiency trade-off across 22 datasets spanning these seven tasks. Our code is available at https://github.com/jiaweiXu1029/HVPNet.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

HVPNet gives a clean bio-inspired architecture with RIM and cortical decoder that aims for simpler multimodal fusion, but the strength rests entirely on whether the 22-dataset results hold up in the full experiments.

The main takeaway is that HVPNet replaces heavy cross-modal fusion with a staged retinal integration module and a low/high-level cortical decoder, then shows it can handle seven tasks across four modalities on 22 datasets while keeping parameter counts down.

What is actually new is the concrete mapping of level-specific multi-stage integration in the RIM and the split decoding stages in the CD. The paper does a reasonable job explaining why this avoids redundancy that typical fusion stacks introduce, and releasing the code lets others verify the implementation directly.

The soft spots are on the empirical side. The abstract states strong accuracy-efficiency numbers without any tables, ablations, or variance numbers visible here, so it is impossible to tell whether the gains trace to the bio-inspired choices or to training details and dataset selection. If the full paper supplies clear ablations and consistent improvements over recent baselines, the claim lands; otherwise the architecture is just another incremental CNN variant. The bio-inspiration stays at the level of useful metaphor rather than a strict derivation, which is fine as long as the results are reproducible.

This paper is for computer-vision researchers who work on salient or camouflaged object detection and want a lighter multimodal baseline to compare against. Readers looking for new architectural patterns in efficient detection would get value from the module descriptions and the multi-task scope.

It deserves a serious referee because it supplies a complete, runnable system with a clear motivation and broad experimental scope, even if the presentation of results may need tightening.

Referee Report

1 major / 0 minor

Summary. The paper proposes HVPNet, a bio-inspired architecture for salient object detection (SOD) and camouflaged object detection (COD). It introduces a Retinal Integration Module (RIM) that performs level-specific multi-stage integration of multimodal features, modeled on retinal processing, and a cortical decoder (CD) that decomposes decoding into low- and high-level stages, modeled on cortical hierarchy. The central claim is that this simpler design avoids structural redundancy of conventional cross-modal fusion, extends readily to seven tasks across four modalities, and delivers an excellent accuracy-efficiency trade-off on 22 datasets without bells and whistles.

Significance. If the quantitative results and ablations hold, the work would demonstrate that a bio-inspired, non-redundant architecture can match or exceed the performance of more complex fusion-based models while remaining lightweight, offering a practical template for general multimodal detection across modalities and tasks.

major comments (1)

[Abstract] The central claim of performance gains and an 'excellent accuracy-efficiency trade-off' across 22 datasets is asserted without any tables, ablation studies, statistical tests, or implementation details visible in the manuscript text, rendering it impossible to assess whether the empirical results support the claim or are affected by post-hoc choices.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review. The manuscript contains extensive experimental validation supporting the abstract claims; we address the specific concern below.

Referee: [Abstract] The central claim of performance gains and an 'excellent accuracy-efficiency trade-off' across 22 datasets is asserted without any tables, ablation studies, statistical tests, or implementation details visible in the manuscript text, rendering it impossible to assess whether the empirical results support the claim or are affected by post-hoc choices.

Authors: The full manuscript text includes Section 4 (Experiments) with 22 datasets, seven tasks, and four modalities. It reports quantitative tables comparing HVPNet against state-of-the-art methods on all benchmarks, ablation studies isolating RIM and CD contributions, efficiency metrics (parameters, FLOPs, FPS), and implementation details (training protocol, hyperparameters). Consistent gains across diverse datasets provide the empirical basis summarized in the abstract; no post-hoc selection is involved as all reported results follow the same protocol.

revision: no
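
For readers less familiar with the efficiency metrics named in the rebuttal (parameters, FLOPs, FPS), the sketch below shows how they are usually derived for a single standard 2D convolution layer. The function names are hypothetical, grouped and dilated convolutions are ignored, and the count is given in multiply-accumulate operations (MACs); papers differ on whether they report FLOPs as MACs or as twice the MACs, and this page does not say which convention the manuscript uses.

```python
def conv2d_params(c_in: int, c_out: int, kernel: int, bias: bool = True) -> int:
    """Learnable parameters of a standard (ungrouped) k x k convolution."""
    per_filter = c_in * kernel * kernel + (1 if bias else 0)
    return c_out * per_filter


def conv2d_macs(c_in: int, c_out: int, kernel: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulates for one forward pass, bias additions excluded."""
    return c_in * kernel * kernel * c_out * h_out * w_out


def frames_per_second(num_images: int, elapsed_seconds: float) -> float:
    """Throughput from a timed inference run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_images / elapsed_seconds
```

Summing `conv2d_params` over all layers gives a model's parameter count, which is why the channel widths and layer counts chosen inside a module translate directly into the efficiency side of the trade-off.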

Circularity Check

0 steps flagged

No significant circularity

The paper's central claim is an empirical performance result: a bio-inspired architecture (RIM + hierarchical cortical decoder) achieves strong accuracy-efficiency trade-offs when evaluated on 22 external public datasets across seven tasks and four modalities. No equations, fitted parameters, or self-citations are presented that reduce the reported metrics to algebraic identities or inputs introduced within the same paper. The design motivation draws on biological metaphors but does not define the evaluation quantities in terms of themselves, and the quantitative claims rest on measured benchmark numbers rather than internal construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The central performance claim rests on the effectiveness of two newly introduced modules whose design choices are justified by a biological metaphor rather than by independent empirical or theoretical evidence outside the reported experiments.

free parameters (1)

Architecture hyperparameters and training schedule
Specific layer counts, channel widths, and optimization settings inside RIM and CD are selected to achieve the reported trade-off on the target datasets.

axioms (1)

Domain assumption: The human visual system performs multimodal object identification via layered retinal integration followed by hierarchical cortical decoding.
This metaphor is invoked to motivate the level-specific multi-stage integration and the low/high-level decoder split.

invented entities (2)

Retinal Integration Module (RIM) · no independent evidence
purpose: Level-specific multi-stage integration of multimodal features
New module introduced by the paper; no prior independent validation cited.

Cortical decoder (CD) · no independent evidence
purpose: Hierarchical decoding that separates low- and high-level visual stages
New decoder component introduced by the paper; no prior independent validation cited.


2024

[15] [15]

Z. Wu, D. P. Paudel, D.-P. Fan, J. Wang, S. Wang, C. De- monceaux, R. Timofte, L. Van Gool, Source-free depth for object pop-out, in: ICCV , 2023

2023

[16] [16]

H. Wen, K. Song, L. Huang, H. Wang, Y . Yan, Cross- modality salient object detection network with univer- sality and anti-interference, Knowledge-Based Systems 264 (2023) 110322

2023

[17] [17]

H. Gao, Y . Su, F. Wang, H. Li, Heterogeneous fusion and integrity learning network for rgb-d salient object de- tection, ACM Transactions on Multimedia Computing, Communications and Applications 20 (7) (2024) 1–24

2024

[18] [18]

G. Chen, Q. Wang, B. Dong, R. Ma, N. Liu, H. Fu, Y . Xia, Em-trans: Edge-aware multimodal transformer for rgb-d salient object detection, IEEE Transactions on Neural Networks and Learning Systems 36 (2) (2024) 3175–3188

2024

[19] [19]

J. Xu, Q. Zhou, J. Yu, C. Liao, D. Zhu, Semantic- orthogonal multi-modal attention network for rgb-d salient object detection, The Visual Computer (2025) 1– 13

2025

[20] [20]

J. Zhu, X. Qin, A. Elsaddik, Dc-net: Divide-and-conquer for salient object detection, Pattern Recognition 157 (2025) 110903

2025

[21] [21]

F. Sun, P. Ren, B. Yin, F. Wang, H. Li, Catnet: A cascaded and aggregated transformer network for rgb-d salient object detection, IEEE Transactions on Multime- dia 26 (2023) 2249–2262

2023

[22] [22]

X. Hu, F. Sun, J. Sun, F. Wang, H. Li, Cross-modal fu- sion and progressive decoding network for rgb-d salient object detection, International Journal of Computer Vi- sion 132 (8) (2024) 3067–3085. 13

2024

[23] [23]

Gollisch, M

T. Gollisch, M. Meister, Eye smarter than scientists be- lieved: neural computations in circuits of the retina, Neu- ron 65 (2) (2010) 150–164

2010

[24] [24]

D. C. Van Essen, C. H. Anderson, D. J. Felleman, Infor- mation processing in the primate visual system: an in- tegrated systems perspective, Science 255 (5043) (1992) 419–423

1992

[25] [25]

Zhang, Z.-F

Y .-J. Zhang, Z.-F. Yu, J. K. Liu, T.-J. Huang, Neural decoding of visual information across different neural recording modalities and approaches, Machine Intelli- gence Research 19 (5) (2022) 350–365

2022

[26] [26]

Z. Shao, L. Ma, B. Li, D. M. Beck, Leveraging the hu- man ventral visual stream to improve neural network ro- bustness, arXiv preprint arXiv:2405.02564 (2024)

work page arXiv 2024

[27] [27]

Z. Wu, L. Su, Q. Huang, Cascaded partial decoder for fast and accurate salient object detection, in: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019

2019

[28] [28]

J.-J. Liu, Q. Hou, Z.-A. Liu, M.-M. Cheng, Poolnet+: Exploring the potential of pooling for salient object de- tection, IEEE Transactions on Pattern Analysis and Ma- chine Intelligence 45 (1) (2023) 887–904

2023

[29] [29]

X. Zhou, K. Shen, Z. Liu, Admnet: Attention-guided densely multi-scale network for lightweight salient ob- ject detection, IEEE Transactions on Multimedia 26 (2024) 10828–10841

2024

[30] [30]

B.-W. Yin, Z. Lin, Exploring salient object detection with adder neural networks, in: Proceedings of the AAAI Conference on Artificial Intelligence, V ol. 39, 2025, pp. 9490–9498

2025

[31] [31]

Zhuge, D.-P

M. Zhuge, D.-P. Fan, N. Liu, D. Zhang, D. Xu, L. Shao, Salient object detection via integrity learning, IEEE Transactions on Pattern Analysis and Machine Intelli- gence 45 (3) (2023) 3738–3752

2023

[32] [32]

Y . K. Yun, W. Lin, Towards a complete and detail- preserved salient object detection, IEEE Transactions on Multimedia 26 (2023) 4667–4680

2023

[33] [33]

Y . Wang, R. Wang, X. Fan, T. Wang, X. He, Pixels, re- gions, and objects: Multiple enhancement for salient ob- ject detection, in: Proceedings of the IEEE/CVF confer- ence on computer vision and pattern recognition, 2023, pp. 10031–10040

2023

[34] [34]

N. Liu, N. Zhang, K. Wan, L. Shao, J. Han, Visual saliency transformer, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 4722–4732

2021

[35] [35]

J. Xu, Q. Zhou, D. Zhu, Y . Chen, Y . Yi, X. Zhao, Tp- seg: Task-prototype framework for unified medical le- sion segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogni- tion, 2026, pp. 5452–5462

2026

[36] [36]

Q. Zhou, J. Xu, Y . Chen, D. Zhu, Y . Yi, X. Zhao, Dif- ferseg: Towards diverse multimodal binary segmentation via differential perception and frequency guidance, IEEE Transactions on Circuits and Systems for Video Technol- ogy (2026)

2026

[37] [37]

Zhong, J

M. Zhong, J. Sun, P. Ren, F. Wang, F. Sun, Magnet: multi-scale awareness and global fusion network for rgb- d salient object detection, Knowledge-Based Systems 299 (2024) 112126

2024

[38] [38]

H. Chen, F. Shen, D. Ding, Y . Deng, C. Li, Disentangled cross-modal transformer for rgb-d salient object detec- tion and beyond, IEEE Transactions on Image Process- ing (2024)

2024

[39] [39]

N. Liu, Z. Luo, N. Zhang, J. Han, Vst++: Efficient and stronger visual saliency transformer, IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)

2024

[40] [40]

F. Sun, W. Zhou, W. Yan, Y . Zhang, Hfenet: Hybrid fea- ture encoder network for detecting salient objects in rgb- thermal images, Digital Signal Processing 148 (2024) 104439

2024

[41] [41]

S. Duan, X. Yang, N. Wang, X. Gao, Lightweight rgb-d salient object detection from a speed-accuracy tradeoffperspective, IEEE Transactions on Image Pro- cessingEarly Access (2025)

2025

[42] [42]

B. Xu, Q. Jiang, X. Zhao, C. Lu, H. Liang, R. Liang, Multidimensional exploration of segment any- thing model for weakly supervised video salient object detection, IEEE Transactions on circuits and systems for video technology (2024)

2024

[43] [43]

M. Lee, S. Cho, S. Lee, C. Park, S. Lee, Unsupervised video object segmentation via prototype memory net- work, in: Proceedings of the IEEE/CVF Winter Con- ference on Applications of Computer Vision, 2023, pp. 5924–5934

2023

[44] [44]

N. Liu, K. Nan, W. Zhao, X. Yao, J. Han, Learning complementary spatial–temporal transformer for video salient object detection, IEEE Transactions on Neural Networks and Learning Systems 35 (8) (2023) 10663– 10673

2023

[45] [45]

Y . Piao, C. Lu, M. Zhang, H. Lu, Semi-supervised video salient object detection based on uncertainty- guided pseudo labels, Advances in Neural Information Processing Systems 35 (2022) 5614–5627

2022

[46] [46]

Y . Su, J. Deng, R. Sun, G. Lin, Q. Wu, A uni- fied transformer framework for group-based segmenta- tion: Co-segmentation, co-saliency detection and video salient object detection, IEEE Transactions on Multime- dia (2023). 14

2023

[47] [47]

Q. Jia, S. Yao, Y . Liu, X. Fan, R. Liu, Z. Luo, Segment, magnify and reiterate: Detecting camouflaged objects the hard way, in: Proceedings of the IEEE/CVF confer- ence on computer vision and pattern recognition, 2022, pp. 4713–4722

2022

[48] [48]

Y . Sun, C. Xu, J. Yang, H. Xuan, L. Luo, Frequency- spatial entanglement learning for camouflaged object de- tection (2024) 343–360

2024

[49] [49]

Y . Liu, C. Li, X. Dong, L. Li, D. Zhang, S. Xu, J. Han, Seamless detection: Unifying salient object detection and camouflaged object detection, Expert Systems with Applications 274 (2025) 126912

2025

[50] [50]

Z. Yu, X. Zhang, L. Zhao, Y . Bin, G. Xiao, Explor- ing deeper! segment anything model with depth percep- tion for camouflaged object detection, in: Proceedings of the 32nd ACM international conference on multimedia, 2024, pp. 4322–4330

2024

[51] [51]

R. Cong, Q. Lin, C. Zhang, C. Li, X. Cao, Q. Huang, Y . Zhao, Cir-net: Cross-modality interaction and refine- ment for rgb-d salient object detection, IEEE Transac- tions on Image Processing 31 (2022) 6800–6815

2022

[52] [52]

Y . Lv, J. Zhang, Y . Dai, A. Li, B. Liu, N. Barnes, D.- P. Fan, Simultaneously localize, segment and rank the camouflaged objects, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 11591–11601

2021

[53] [53]

Y . Liu, S. Chen, H. Tang, S. Wang, Lightweight hybrid attention rgb-d networks for accurate camouflaged object detection, The Visual Computer (2025) 1–17

2025

[54] [54]

H. Bi, Y . Tong, J. Zhang, C. Zhang, J. Tong, W. Jin, Depth alignment interaction network for camouflaged object detection, Multimedia Systems 30 (1) (2024) 51

2024

[55] [55]

Huang, H

Z. Huang, H. Dai, T.-Z. Xiang, S. Wang, H.-X. Chen, J. Qin, H. Xiong, Feature shrinkage pyramid for camou- flaged object detection with transformers, in: Proceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 5557–5566

2023

[56] [56]

C. He, K. Li, Y . Zhang, L. Tang, Y . Zhang, Z. Guo, X. Li, Camouflaged object detection with feature decom- position and edge reconstruction, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 22046–22055

2023

[57] [57]

S. Yao, H. Sun, T.-Z. Xiang, X. Wang, X. Cao, Hier- archical graph interaction transformer with dynamic to- ken clustering for camouflaged object detection, arXiv preprint arXiv:2408.15020 (2024)

work page arXiv 2024

[58] [58]

L. Itti, C. Koch, E. Niebur, A model of saliency-based visual attention for rapid scene analysis, IEEE Trans- actions on Pattern Analysis and Machine Intelligence 20 (11) (1998) 1254–1259

1998

[59] [59]

Simonyan, A

K. Simonyan, A. Zisserman, Two-stream convolutional networks for action recognition in videos, in: Advances in Neural Information Processing Systems, 2014, pp. 568–576

2014

[60] [60]

W. Wang, J. Shen, X. Dong, A. Borji, Salient object de- tection driven by fixation prediction, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 1711–1720

2018

[61] [61]

W. Zhai, Y . Cao, J. Zhang, Z.-J. Zha, Exploring figure- ground assignment mechanism in perceptual organiza- tion, Advances in Neural Information Processing Sys- tems 35 (2022) 17030–17042

2022

[62] [62]

Yan, T.-N

J. Yan, T.-N. Le, K.-D. Nguyen, M.-T. Tran, T.-T. Do, T. V . Nguyen, Mirrornet: Bio-inspired camouflaged ob- ject segmentation, IEEE access 9 (2021) 43290–43300

2021

[63] [63]

W. Zhai, Y . Cao, H. Xie, Z.-J. Zha, Deep texton- coherence network for camouflaged object detection, IEEE Transactions on Multimedia 25 (2022) 5155–5165

2022

[64] [64]

L. Xu, X. You, F. Jia, K. Liu, Bicod: a camouflaged object detection method directed by cognitive attention, IEEE Sensors Journal 24 (4) (2023) 4711–4721

2023

[65] [65]

Z. Chen, J. Zhang, D. Tao, Recurrent glimpse-based de- coder for detection with transformer, in: Proceedings of the IEEE/CVF conference on computer vision and pat- tern recognition, 2022, pp. 5260–5269

2022

[66] [66]

F. Yang, Q. Zhai, X. Li, R. Huang, A. Luo, H. Cheng, D.-P. Fan, Uncertainty-guided transformer reasoning for camouflaged object detection, in: Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 4146–4155

2021

[67] [67]

Zhang, M

Z. Zhang, M. Sabuncu, Generalized cross entropy loss for training deep neural networks with noisy labels, Advances in neural information processing systems 31 (2018)

2018

[68] [68]

Rezatofighi, N

H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, S. Savarese, Generalized intersection over union: A met- ric and a loss for bounding box regression, in: Proceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019

2019

[69] [69]

L. Wang, H. Lu, Y . Wang, M. Feng, D. Wang, B. Yin, X. Ruan, Learning to detect salient objects with image- level supervision, in: Proceedings of the IEEE confer- ence on computer vision and pattern recognition, 2017, pp. 136–145

2017

[70] [70]

G. Li, Y . Yu, Visual saliency based on multiscale deep features, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 5455–5463. 15

2015

[71] [71]

Q. Yan, L. Xu, J. Shi, J. Jia, Hierarchical saliency detec- tion, in: Proceedings of the IEEE conference on com- puter vision and pattern recognition, 2013, pp. 1155– 1162

2013

[72] [72]

W. Liu, X. Shen, C.-M. Pun, X. Cun, Explicit visual prompting for low-level structure segmentations, in: Pro- ceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, 2023, pp. 19434–19445

2023

[73] [73]

C. Cen, F. Li, Z. Li, Y . Wang, Towards salient object detection via parallel dual-decoder network, Engineer- ing Applications of Artificial Intelligence 139 (2025) 109638

2025

[74] [74]

H. Peng, B. Li, W. Xiong, W. Hu, R. Ji, Rgbd salient ob- ject detection: A benchmark and algorithms, in: Com- puter Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceed- ings, Part III 13, Springer, 2014, pp. 92–109

2014

[75] [75]

R. Ju, L. Ge, W. Geng, T. Ren, G. Wu, Depth saliency based on anisotropic center-surround difference, in: 2014 IEEE international conference on image process- ing (ICIP), IEEE, 2014, pp. 1115–1119

2014

[76] [76]

Y . Piao, W. Ji, J. Li, M. Zhang, H. Lu, Depth-induced multi-scale recurrent attention network for saliency de- tection, in: Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 7254–7263

2019

[77] [77]

W. Zhou, Y . Zhu, J. Lei, R. Yang, L. Yu, Lsnet: Lightweight spatial boosting network for detecting salient objects in rgb-thermal images, IEEE Transactions on Image Processing 32 (2023) 1329–1340

2023

[78] [78]

Z. Zeng, H. Liu, F. Chen, X. Tan, Airsod: A lightweight network for rgb-d salient object detection, IEEE Trans- actions on Circuits and Systems for Video Technology 34 (3) (2024) 1656–1669

2024

[79] [79]

Y . Zhan, Z. Zeng, H. Liu, X. Tan, Y . Tian, Mambasod: Dual mamba-driven cross-modal fusion network for rgb- d salient object detection, Neurocomputing 631 (2025) 129718

2025

[80] [80]

Z. Tu, T. Xia, C. Li, X. Wang, Y . Ma, J. Tang, Rgb-t im- age saliency detection via collaborative graph learning, IEEE Transactions on Multimedia 22 (1) (2019) 160– 173

2019