
arxiv: 2606.31496 · v1 · pith:5PYVFK44new · submitted 2026-06-30 · 💻 cs.CV

HVPNet: A Bio-Inspired Network for General Salient and Camouflaged Object Detection

Pith reviewed 2026-07-01 06:07 UTC · model grok-4.3

classification 💻 cs.CV
keywords salient object detection · camouflaged object detection · bio-inspired network · multimodal fusion · retinal integration · cortical decoder · computer vision · object detection

The pith

A bio-inspired network modeled on retinal integration and cortical decoding detects salient and camouflaged objects accurately across modalities, with a simpler structure than complex fusion methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that drawing from the human visual system's multi-layered retinal processing and hierarchical cortical decoding allows a simpler architecture to handle multimodal salient and camouflaged object detection without the parameter bloat of typical cross-modal fusion. Current methods often add redundant structures that enlarge models and sometimes hurt performance. HVPNet tests this by introducing a level-specific integration module and a two-stage decoder that together support seven tasks over four modalities. The result is presented as an accuracy-efficiency balance on 22 datasets. If correct, this indicates that bio-mimetic staging can replace elaborate fusion designs for these detection problems.

Core claim

HVPNet is built around a Retinal Integration Module that fuses multimodal features via level-specific multi-stage integration and a cortical decoder that splits decoding into low- and high-level stages. This pair of components lets the single architecture extend directly to seven tasks across four modalities and deliver competitive accuracy with lower complexity on 22 datasets without extra fusion modules or task-specific tuning.

What carries the argument

Retinal Integration Module (RIM) that applies level-specific multi-stage integration to multimodal features, paired with a cortical decoder (CD) that separates low- and high-level visual processing stages.
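
To make the data flow concrete, here is a toy sketch of the two ideas named above: a fusion rule chosen per encoder level, followed by a decoder that handles high-level and low-level features in separate stages. It is an illustration under loud assumptions, not the paper's implementation. The function names (`stage_for_level`, `integrate`, `decode`), the three fusion rules, and the use of plain lists of floats in place of feature maps are all hypothetical; the source only states that RIM uses level-specific multi-stage integration and that CD splits decoding into low- and high-level stages.

```python
from typing import List

Feature = List[float]  # toy stand-in for one feature map (assumption)


def stage_for_level(index: int, num_levels: int) -> str:
    """Pick a hypothetical integration stage from the encoder level."""
    if index == 0:
        return "shallow"
    if index == num_levels - 1:
        return "deep"
    return "middle"


def integrate(primary: List[Feature], auxiliary: List[Feature]) -> List[Feature]:
    """Fuse two modalities level by level with a level-specific rule.

    The three rules below are placeholders chosen for readability,
    not the operators used by the paper's RIM.
    """
    fused = []
    for i, (p, a) in enumerate(zip(primary, auxiliary)):
        stage = stage_for_level(i, len(primary))
        if stage == "shallow":      # keep detail: element-wise sum
            fused.append([x + y for x, y in zip(p, a)])
        elif stage == "middle":     # auxiliary modality re-weights the primary
            fused.append([x * (1.0 + y) for x, y in zip(p, a)])
        else:                       # deep: element-wise average
            fused.append([(x + y) / 2.0 for x, y in zip(p, a)])
    return fused


def decode(fused: List[Feature]) -> List[Feature]:
    """Two-stage decoding: summarize deep levels, then refine shallow ones."""
    split = len(fused) // 2
    low, high = fused[:split], fused[split:]
    # High-level stage: collapse the deep features into one context value.
    values = [v for feature in high for v in feature]
    context = sum(values) / len(values)
    # Low-level stage: scale shallow features by that context.
    return [[v * context for v in feature] for feature in low]


rgb = [[1.0, 2.0] for _ in range(4)]     # four encoder levels (toy values)
depth = [[0.5, 0.5] for _ in range(4)]
print(decode(integrate(rgb, depth)))
```

The point of the sketch is structural: the same `integrate` and `decode` functions run regardless of which second modality is supplied, which is the property the paper relies on when it claims one architecture for several tasks.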

If this is right

  • The single architecture applies unchanged to seven distinct detection tasks spanning four input modalities.
  • Accuracy-efficiency trade-offs hold across all 22 evaluated datasets for both salient and camouflaged object detection.
  • Structural redundancy is reduced by replacing explicit cross-modal fusion blocks with staged retinal-style integration.
  • No task-specific redesign or additional modules are required to reach the reported performance levels.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The staged integration pattern could be tested on other multimodal vision problems such as semantic segmentation or instance segmentation.
  • Efficiency gains may prove especially useful in embedded or mobile settings where parameter count directly limits deployment.
  • Further work might check whether the low/high-level split in the decoder aligns with measurable differences in feature complexity at those stages.

Load-bearing premise

That modeling retinal multi-stage integration and cortical hierarchy produces simpler yet equally or more accurate detection by avoiding the redundancy of conventional cross-modal fusion.

What would settle it

Direct comparison on one of the 22 multimodal datasets where a standard complex fusion model records both higher detection accuracy and lower runtime or parameter count than HVPNet.
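
The falsification condition above can be written as a small predicate. This is a minimal sketch: the `Result` record, the field names, and the numbers in the usage lines are hypothetical placeholders, not figures from the paper, and the choice of accuracy metric is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    name: str
    accuracy: float    # higher is better; which metric is an assumption
    params_m: float    # parameter count in millions, lower is better
    runtime_ms: float  # per-image runtime, lower is better


def refutes_tradeoff(baseline: Result, hvpnet: Result) -> bool:
    """True if the baseline is more accurate AND cheaper on at least one
    cost axis (runtime or parameter count), as the condition above states."""
    more_accurate = baseline.accuracy > hvpnet.accuracy
    cheaper = (baseline.params_m < hvpnet.params_m
               or baseline.runtime_ms < hvpnet.runtime_ms)
    return more_accurate and cheaper


# Placeholder numbers for illustration only.
ours = Result("HVPNet", accuracy=0.92, params_m=20.0, runtime_ms=10.0)
heavy = Result("complex-fusion", accuracy=0.90, params_m=80.0, runtime_ms=30.0)
print(refutes_tradeoff(heavy, ours))  # False: less accurate and more costly
```

A baseline that is only more accurate, or only cheaper, does not meet the condition; both must hold on the same dataset.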

Figures

Figures reproduced from arXiv: 2606.31496 by Jiacong Yu, Jiawei Xu, Qiangqiang Zhou, Yanjiao Shi, Yugen Yi, Zhouping Li.

Figure 1: Comparison of parameter counts and …
Figure 2: Overall architecture of the proposed HVPNet for general SOD and COD tasks. It consists of three stages: feature extraction, fusion, and cortical decoding.
Figure 3: Illustration of the retinal integration module (RIM). We employ three distinct stages to address the specificities of features at di…
Figure 4: Illustration of the cortical decoder (CD) module.
Figure 5: PR curves comparison of different models on eight RGB and RGB-D SOD datasets.
Figure 6: Qualitative comparison of our model with state-of-the-art methods, including …
Figure 7: Illustration of failure cases. (GT: ground truth)
Figure 8: Visualized feature maps at each stage, along with corresponding …

Original abstract

In recent years, most research on multimodal salient object detection (SOD) and camouflaged object detection (COD) typically aims to improve performance through complex cross-modal feature fusion and decoding structures. However, this approach leads to an excessively large model parameter scale and often fails to deliver satisfactory detection performance due to structural redundancy. In contrast, the human visual process is able to efficiently perform salient and camouflaged object identification without such complex structures. This contrast raises an important question: Can we draw conceptual inspiration from the human visual process to achieve a simpler modeling strategy, and still realize accurate and efficient object detection? To answer this question, we propose HVPNet, a simple yet general bio-inspired computational architecture. Drawing on the multi-layered information integration of the retina as a conceptual metaphor, we designed a Retinal Integration Module (RIM), which effectively integrates multimodal features through a level-specific multi-stage integration strategy. To fully exploit these features, we further design a cortical decoder (CD) that breaks down the decoding process into low- and high-level visual stages, abstracting the hierarchical processing in the human visual cortex. Benefiting from these designs, HVPNet can readily extend to seven tasks across four modalities. Without bells and whistles, it establishes an excellent accuracy-efficiency trade-off across 22 datasets spanning these seven tasks. Our code is available at https://github.com/jiaweiXu1029/HVPNet.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper proposes HVPNet, a bio-inspired architecture for salient object detection (SOD) and camouflaged object detection (COD). It introduces a Retinal Integration Module (RIM) that performs level-specific multi-stage integration of multimodal features, modeled on retinal processing, and a cortical decoder (CD) that decomposes decoding into low- and high-level stages, modeled on cortical hierarchy. The central claim is that this simpler design avoids structural redundancy of conventional cross-modal fusion, extends readily to seven tasks across four modalities, and delivers an excellent accuracy-efficiency trade-off on 22 datasets without bells and whistles.

Significance. If the quantitative results and ablations hold, the work would demonstrate that a bio-inspired, non-redundant architecture can match or exceed the performance of more complex fusion-based models while remaining lightweight, offering a practical template for general multimodal detection across modalities and tasks.

major comments (1)
  1. [Abstract] The central claim of performance gains and an 'excellent accuracy-efficiency trade-off' across 22 datasets is asserted without any tables, ablation studies, statistical tests, or implementation details visible in the manuscript text, making it impossible to assess whether the empirical results support the claim or are affected by post-hoc choices.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review. The manuscript contains extensive experimental validation supporting the abstract claims; we address the specific concern below.

  1. Referee: [Abstract] The central claim of performance gains and an 'excellent accuracy-efficiency trade-off' across 22 datasets is asserted without any tables, ablation studies, statistical tests, or implementation details visible in the manuscript text, making it impossible to assess whether the empirical results support the claim or are affected by post-hoc choices.

    Authors: The full manuscript text includes Section 4 (Experiments) with 22 datasets, seven tasks, and four modalities. It reports quantitative tables comparing HVPNet against state-of-the-art methods on all benchmarks, ablation studies isolating RIM and CD contributions, efficiency metrics (parameters, FLOPs, FPS), and implementation details (training protocol, hyperparameters). Consistent gains across diverse datasets provide the empirical basis summarized in the abstract; no post-hoc selection is involved as all reported results follow the same protocol.

    revision: no
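
For readers unfamiliar with how the parameter and FLOP figures mentioned in the response are usually tallied, the following sketch counts both for a single standard 2D convolution. It uses the textbook formulas only; it is not the paper's measurement protocol, and the helper name `conv2d_cost` and the example layer shape are assumptions made for illustration.

```python
from typing import Tuple


def conv2d_cost(c_in: int, c_out: int, kernel: int,
                h_out: int, w_out: int, bias: bool = True) -> Tuple[int, int]:
    """Return (parameters, multiply-accumulates) for one dense 2D convolution.

    Assumes a square kernel, groups=1, and that h_out x w_out is the
    spatial size of the OUTPUT feature map.
    """
    weights = c_in * c_out * kernel * kernel
    params = weights + (c_out if bias else 0)
    # Each output position applies every weight exactly once.
    macs = weights * h_out * w_out
    return params, macs


# Hypothetical first layer: 3 -> 64 channels, 3x3 kernel, 224x224 output.
params, macs = conv2d_cost(3, 64, 3, 224, 224)
print(params, macs)  # 1792 86704128
```

Summing such per-layer counts over a whole network gives the totals that efficiency tables report; FLOPs are often quoted as twice the multiply-accumulate count, so conventions should be checked before comparing numbers across papers.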

Circularity Check

0 steps flagged

No significant circularity


The paper's central claim is an empirical performance result: a bio-inspired architecture (RIM + hierarchical cortical decoder) achieves strong accuracy-efficiency trade-offs when evaluated on 22 external public datasets across seven tasks and four modalities. No equations, fitted parameters, or self-citations are presented that reduce the reported metrics to algebraic identities or inputs introduced within the same paper. The design motivation draws on biological metaphors but does not define the evaluation quantities in terms of themselves, and the quantitative claims rest on measured benchmark numbers rather than internal construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The central performance claim rests on the effectiveness of two newly introduced modules whose design choices are justified by a biological metaphor rather than by independent empirical or theoretical evidence outside the reported experiments.

free parameters (1)
  • Architecture hyperparameters and training schedule
    Specific layer counts, channel widths, and optimization settings inside RIM and CD are selected to achieve the reported trade-off on the target datasets.
axioms (1)
  • domain assumption: The human visual system performs multimodal object identification via layered retinal integration followed by hierarchical cortical decoding.
    This metaphor is invoked to motivate the level-specific multi-stage integration and the low/high-level decoder split.
invented entities (2)
  • Retinal Integration Module (RIM) · no independent evidence
    purpose: Level-specific multi-stage integration of multimodal features
    New module introduced by the paper; no prior independent validation cited.
  • Cortical decoder (CD) · no independent evidence
    purpose: Hierarchical decoding that separates low- and high-level visual stages
    New decoder component introduced by the paper; no prior independent validation cited.


