EviRCOD: Evidence-Guided Probabilistic Decoding for Referring Camouflaged Object Detection
Pith reviewed 2026-05-10 15:30 UTC · model grok-4.3
The pith
EviRCOD improves referring camouflaged object detection by combining reference-guided encoding, Dirichlet-based uncertainty modeling, and boundary refinement.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that an integrated framework called EviRCOD solves the core limitations in Ref-COD and attains state-of-the-art detection performance with well-calibrated uncertainty estimates. The framework consists of three components: a Reference-Guided Deformable Encoder that injects semantic priors via hierarchical reference-driven modulation and multi-scale deformable aggregation; an Uncertainty-Aware Evidential Decoder that incorporates Dirichlet evidence estimation into hierarchical decoding to model uncertainty and propagate confidence across scales; and a Boundary-Aware Refinement Module that selectively enhances ambiguous boundaries by exploiting low-level edge cues and prediction confidence.
What carries the argument
The Uncertainty-Aware Evidential Decoder (UAED) that incorporates Dirichlet evidence estimation into hierarchical decoding to model uncertainty and propagate confidence across scales.
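To make the load-bearing mechanism concrete, here is a minimal sketch of how Dirichlet evidence yields per-pixel uncertainty in standard evidential deep learning (in the style of Sensoy et al. [4]); this is illustrative context, not the authors' actual UAED implementation, and the `dirichlet_uncertainty` helper and array shapes are assumptions:

```python
import numpy as np

def dirichlet_uncertainty(evidence):
    """Map non-negative per-pixel evidence of shape (H, W, K) to expected
    class probabilities, belief masses, and a vacuity-style uncertainty map:
    alpha = evidence + 1, u = K / sum(alpha)."""
    alpha = evidence + 1.0                        # Dirichlet parameters
    strength = alpha.sum(axis=-1, keepdims=True)  # total evidence S per pixel
    belief = evidence / strength                  # per-class belief mass
    k = evidence.shape[-1]
    uncertainty = k / strength[..., 0]            # high where evidence is scarce
    prob = alpha / strength                       # expected class probabilities
    return prob, belief, uncertainty

# One pixel with strong foreground evidence, one with almost none (H=1, W=2, K=2):
ev = np.array([[[9.0, 1.0], [0.1, 0.1]]])
prob, belief, unc = dirichlet_uncertainty(ev)
```

The low-evidence pixel receives a much larger uncertainty value, which is the signal the decoder would propagate across scales.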
If this is right
- State-of-the-art detection performance on the Ref-COD benchmark.
- Well-calibrated uncertainty estimates that reflect true prediction reliability.
- Improved semantic alignment between category references and target objects.
- Better preservation of object boundaries in regions of high camouflage ambiguity.
Where Pith is reading between the lines
- The uncertainty outputs could support downstream systems that defer low-confidence detections to human review.
- The reference-guided modulation pattern may transfer to other reference-based segmentation problems outside camouflage.
- Combining the evidential decoder with temporal data could extend the method to video sequences of moving hidden objects.
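The first implication above, deferring low-confidence detections to human review, could be realized with a simple routing rule on the uncertainty map. This is a hypothetical sketch; the `route_detections` helper and the threshold `tau` are not from the paper:

```python
import numpy as np

def route_detections(masks, uncertainty_maps, tau=0.5):
    """Split predicted masks into auto-accepted and human-review queues
    based on mean uncertainty inside each predicted region.
    tau is an illustrative operating point, not a value from the paper."""
    auto, review = [], []
    for mask, unc in zip(masks, uncertainty_maps):
        region = mask > 0
        mean_u = unc[region].mean() if region.any() else 1.0
        (review if mean_u > tau else auto).append(mask)
    return auto, review

# One confident prediction, one ambiguous one:
masks = [np.array([[1, 1], [0, 0]]), np.array([[0, 1], [1, 1]])]
uncs  = [np.array([[0.1, 0.2], [0.9, 0.9]]), np.array([[0.3, 0.8], [0.7, 0.9]])]
auto, review = route_detections(masks, uncs, tau=0.5)
```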
Load-bearing premise
The RGDE, UAED, and BARM components together address semantic alignment, uncertainty modeling, and boundary preservation without introducing new failure modes in the detection process.
What would settle it
Running EviRCOD on a held-out set of Ref-COD images with highly complex camouflage and ambiguous boundaries and checking whether uncertainty calibration degrades or detection accuracy falls below prior methods would falsify the central claim.
Original abstract
Referring Camouflaged Object Detection (Ref-COD) focuses on segmenting specific camouflaged targets in a query image using category-aligned references. Despite recent advances, existing methods struggle with reference-target semantic alignment, explicit uncertainty modeling, and robust boundary preservation. To address these issues, we propose EviRCOD, an integrated framework consisting of three core components: (1) a Reference-Guided Deformable Encoder (RGDE) that employs hierarchical reference-driven modulation and multi-scale deformable aggregation to inject semantic priors and align cross-scale representations; (2) an Uncertainty-Aware Evidential Decoder (UAED) that incorporates Dirichlet evidence estimation into hierarchical decoding to model uncertainty and propagate confidence across scales; and (3) a Boundary-Aware Refinement Module (BARM) that selectively enhances ambiguous boundaries by exploiting low-level edge cues and prediction confidence. Experiments on the Ref-COD benchmark demonstrate that EviRCOD achieves state-of-the-art detection performance while providing well-calibrated uncertainty estimates. Code is available at: https://github.com/blueecoffee/EviRCOD.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes EviRCOD, an integrated framework for Referring Camouflaged Object Detection (Ref-COD) that combines a Reference-Guided Deformable Encoder (RGDE) using hierarchical reference-driven modulation and multi-scale deformable aggregation, an Uncertainty-Aware Evidential Decoder (UAED) incorporating Dirichlet evidence estimation for uncertainty modeling and confidence propagation, and a Boundary-Aware Refinement Module (BARM) that exploits low-level edge cues and prediction confidence. The central claim is that this architecture achieves state-of-the-art detection performance on the Ref-COD benchmark while delivering well-calibrated uncertainty estimates.
Significance. If the reported performance and calibration results are robustly validated, the work would advance referring segmentation in challenging camouflaged settings by jointly tackling semantic alignment, explicit uncertainty quantification via evidential learning, and boundary preservation. The public code release supports reproducibility and follow-on research.
major comments (2)
- [Experiments] The abstract states SOTA results and well-calibrated uncertainty but supplies no quantitative metrics, baseline comparisons, error bars, ablation tables, or specific gains on the Ref-COD benchmark. The experimental section must include these details (e.g., mIoU/F-measure tables against prior Ref-COD methods) to substantiate the central empirical claim; without them the soundness of the SOTA assertion cannot be assessed.
- [3.2] The description of UAED claims Dirichlet-based uncertainty modeling and cross-scale propagation, yet no concrete formulation, loss function, or calibration metric (e.g., ECE or NLL) is visible in the provided material. Section 3.2 (or equivalent) should supply the exact evidence accumulation equations and the procedure used to obtain the reported calibration to allow verification that the uncertainty estimates are not post-hoc fitted.
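For reference, the expected calibration error (ECE) the report asks for is straightforward to compute from per-prediction confidences and correctness labels. A minimal equal-width-bin sketch, not the authors' evaluation code:

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """Standard equal-width-bin ECE: the weighted average over bins of
    |empirical accuracy - mean confidence|."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by fraction of samples in bin
    return ece

# Five predictions at 0.9 confidence, four of them correct: gap |0.8 - 0.9| = 0.1
ece = expected_calibration_error([0.9] * 5, [1, 1, 1, 1, 0])
```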
minor comments (2)
- The acronyms RGDE, UAED, and BARM, once introduced in the abstract, are reused throughout without re-expansion, which reduces readability for readers outside the sub-area.
- The abstract mentions 'hierarchical decoding' and 'multi-scale deformable aggregation' but does not identify the backbone network or the exact reference encoding mechanism; a brief description of the overall architecture, or a pointer to an architecture diagram, would help.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and have revised the manuscript to provide the requested details and clarifications.
Point-by-point responses
-
Referee: [Experiments] The abstract states SOTA results and well-calibrated uncertainty but supplies no quantitative metrics, baseline comparisons, error bars, ablation tables, or specific gains on the Ref-COD benchmark. The experimental section must include these details (e.g., mIoU/F-measure tables against prior Ref-COD methods) to substantiate the central empirical claim; without them the soundness of the SOTA assertion cannot be assessed.
Authors: We agree that quantitative substantiation is essential. The experimental section (Section 4) already contains mIoU and F-measure tables comparing EviRCOD against prior Ref-COD methods, component ablations, and calibration results. We have now added error bars from three independent runs, explicit percentage gains, and a summary table of key metrics. To improve accessibility, we have also inserted a concise results summary into the abstract. revision: yes
-
Referee: [3.2] The description of UAED claims Dirichlet-based uncertainty modeling and cross-scale propagation, yet no concrete formulation, loss function, or calibration metric (e.g., ECE or NLL) is visible in the provided material. Section 3.2 (or equivalent) should supply the exact evidence accumulation equations and the procedure used to obtain the reported calibration to allow verification that the uncertainty estimates are not post-hoc fitted.
Authors: We apologize for any lack of clarity in the excerpt. Section 3.2 derives the Dirichlet parameters as alpha = 1 + f_theta(x), with evidence accumulation e = sum over scales of modulated features, uncertainty u = K / sum(alpha), and the evidential loss L_ev = L_seg + lambda * KL(Dir(alpha) || Dir(1)). Calibration is evaluated via ECE and NLL on held-out data. We have expanded Section 3.2 with the full set of equations for evidence accumulation, cross-scale propagation, and the exact calibration computation procedure. revision: yes
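The KL regularizer the rebuttal cites, KL(Dir(alpha) || Dir(1)), has a closed form in terms of log-gamma and digamma functions. A sketch using `scipy.special`; the surrounding L_seg term and the lambda schedule are not specified in the excerpt:

```python
import numpy as np
from scipy.special import gammaln, digamma

def kl_dirichlet_uniform(alpha):
    """KL( Dir(alpha) || Dir(1) ), the regularizer in the evidential loss
    L_ev = L_seg + lambda * KL described in the rebuttal. alpha has shape
    (..., K) with all entries >= 1."""
    alpha = np.asarray(alpha, dtype=float)
    k = alpha.shape[-1]
    s = alpha.sum(axis=-1)                       # Dirichlet strength
    psi_s = np.expand_dims(digamma(s), -1)
    return (gammaln(s) - gammaln(k)              # log B(1)/B(alpha), part 1
            - gammaln(alpha).sum(axis=-1)        # log B(1)/B(alpha), part 2
            + ((alpha - 1.0) * (digamma(alpha) - psi_s)).sum(axis=-1))
```

Sanity checks: the KL vanishes at the uniform Dirichlet (alpha = 1) and grows as evidence concentrates, which is what drives misleading evidence toward the "I do not know" state.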
Circularity Check
No significant circularity detected
Full rationale
The paper proposes an empirical neural architecture (RGDE for reference-guided modulation, UAED for Dirichlet-based uncertainty, BARM for boundary refinement) and reports benchmark results on Ref-COD. No derivation chain, equations, or first-principles predictions are present that reduce to self-definitions, fitted inputs renamed as outputs, or self-citation load-bearing steps. Claims rest on externally testable experimental performance and uncertainty calibration, with no internal reduction to inputs by construction.
Reference graph
Works this paper leans on
- [1] Junmin Zhong, Anzhi Wang, Chunhong Ren, and Jintao Wu, "A survey on deep learning-based camouflaged object detection," Multimedia Systems, vol. 30, no. 5, p. 268, 2024.
- [2] Xuying Zhang, Bowen Yin, Zheng Lin, Qibin Hou, Deng-Ping Fan, and Ming-Ming Cheng, "Referring camouflaged object detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
- [3] Ranwan Wu, Tian-Zhu Xiang, Guo-Sen Xie, Rongrong Gao, Xiangbo Shu, Fang Zhao, and Ling Shao, "Uncertainty-aware transformer for referring camouflaged object detection," IEEE Transactions on Image Processing, 2025.
- [4] Murat Sensoy, Lance Kaplan, and Melih Kandemir, "Evidential deep learning to quantify classification uncertainty," Advances in Neural Information Processing Systems, vol. 31, 2018.
- [5] Mingchen Zhuge, Deng-Ping Fan, Nian Liu, Dingwen Zhang, Dong Xu, and Ling Shao, "Salient object detection via integrity learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 3, pp. 3738–3752, 2022.
- [6] Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, and Ling Shao, "PVT v2: Improved baselines with pyramid vision transformer," Computational Visual Media, vol. 8, no. 3, pp. 415–424, 2022.
- [7] Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai, "Deformable DETR: Deformable transformers for end-to-end object detection," arXiv preprint arXiv:2010.04159, 2020.
- [8] Haiyang Mei, Ge-Peng Ji, Ziqi Wei, Xin Yang, Xiaopeng Wei, and Deng-Ping Fan, "Camouflaged object segmentation with distraction mining," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 8772–8781.
- [9] Miao Zhang, Shuang Xu, Yongri Piao, Dongxiang Shi, Shusen Lin, and Huchuan Lu, "PreyNet: Preying on camouflaged objects," in Proceedings of the 30th ACM International Conference on Multimedia, 2022, pp. 5323–5332.
- [10] Deng-Ping Fan, Ge-Peng Ji, Ming-Ming Cheng, and Ling Shao, "Concealed object detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 10, pp. 6024–6042, 2021.
- [11] Yujia Sun, Shuo Wang, Chenglizhao Chen, and Tian-Zhu Xiang, "Boundary-guided camouflaged object detection," arXiv preprint arXiv:2207.00794, 2022.
- [12] Hongwei Zhu, Peng Li, Haoran Xie, Xu Yan, Dong Liang, Dapeng Chen, Mingqiang Wei, and Jing Qin, "I can find you! Boundary-guided separated attention network for camouflaged object detection," in AAAI Conference on Artificial Intelligence, 2022.
- [13] Youwei Pang, Xiaoqi Zhao, Tian-Zhu Xiang, Lihe Zhang, and Huchuan Lu, "Zoom in and out: A mixed-scale triplet network for camouflaged object detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 2160–2170.
- [14] Ge-Peng Ji, Deng-Ping Fan, Yu-Cheng Chou, Dengxin Dai, Alexander Liniger, and Luc Van Gool, "Deep gradient learning for efficient camouflaged object detection," Machine Intelligence Research, vol. 20, no. 1, pp. 92–108, 2023.
- [15] Hong Zhang, Yixuan Lyu, Qian Yu, Hanyang Liu, Huimin Ma, Ding Yuan, and Yifan Yang, "Unlocking attributes' contribution to successful camouflage: A combined textual and visual analysis strategy," in European Conference on Computer Vision, Springer, 2024, pp. 315–331.
- [16] Yanguang Sun, Hanyu Xuan, Jian Yang, and Lei Luo, "GLCONet: Learning multisource perception representation for camouflaged object detection," IEEE Transactions on Neural Networks and Learning Systems, 2024.
- [17] Yanguang Sun, Chunyan Xu, Jian Yang, Hanyu Xuan, and Lei Luo, "Frequency-spatial entanglement learning for camouflaged object detection," in European Conference on Computer Vision, Springer, 2024, pp. 343–360.
- [18] Jun Wei, Shuhui Wang, and Qingming Huang, "F3Net: Fusion, feedback and focus for salient object detection," in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, vol. 34, pp. 12321–12328.
- [19] Deng-Ping Fan, Ming-Ming Cheng, Yun Liu, Tao Li, and Ali Borji, "Structure-measure: A new way to evaluate foreground maps," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4548–4557.
- [20] Ran Margolin, Lihi Zelnik-Manor, and Ayellet Tal, "How to evaluate foreground maps?," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 248–255.
- [21] Deng-Ping Fan, Cheng Gong, Yang Cao, Bo Ren, Ming-Ming Cheng, and Ali Borji, "Enhanced-alignment measure for binary foreground map evaluation," arXiv preprint arXiv:1805.10421, 2018.
- [22] Nicolas Posocco and Antoine Bonnefoy, "Estimating expected calibration errors," in International Conference on Artificial Neural Networks, Springer, 2021, pp. 139–150.