Recognition: 2 theorem links
· Lean TheoremDebate-Enhanced Pseudo Labeling and Frequency-Aware Progressive Debiasing for Weakly-Supervised Camouflaged Object Detection with Scribble Annotations
Pith reviewed 2026-05-16 20:10 UTC · model grok-4.3
The pith
Debate mechanism closes gap to full supervision in camouflaged detection
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
D³ETOR consists of Debate-Enhanced Pseudo Labeling, which uses adaptive entropy-driven point sampling and a multi-agent debate mechanism to make SAM-generated pseudo masks more reliable for camouflaged objects, followed by Frequency-Aware Progressive Debiasing via FADeNet that progressively fuses multi-level frequency-aware features and dynamically reweights supervision to mitigate scribble bias; together these stages let the model jointly exploit pseudo-mask and scribble signals to reach state-of-the-art weakly-supervised performance.
What carries the argument
The multi-agent debate mechanism that refines SAM pseudo masks combined with FADeNet's progressive fusion of frequency-aware features and dynamic supervision reweighting.
Load-bearing premise
The multi-agent debate reliably improves SAM pseudo masks for camouflaged objects without introducing new errors, and frequency-aware fusion successfully reduces the bias present in scribble annotations.
What would settle it
Evaluating D³ETOR on the standard camouflaged-object-detection benchmarks and observing that its scores remain below prior weakly-supervised methods or fail to narrow the gap to fully supervised baselines by a statistically clear margin.
Figures
read the original abstract
Weakly-Supervised Camouflaged Object Detection (WSCOD) aims to locate and segment objects that are visually concealed within their surrounding scenes, relying solely on sparse supervision such as scribble annotations. Despite recent progress, existing WSCOD methods still lag far behind fully supervised ones due to two major limitations: (1) the pseudo masks generated by general-purpose segmentation models (e.g., SAM) and filtered via rules are often unreliable, as these models lack the task-specific semantic understanding required for effective pseudo labeling in COD; and (2) the neglect of inherent annotation bias in scribbles, which hinders the model from capturing the global structure of camouflaged objects. To overcome these challenges, we propose ${D}^{3}$ETOR, a two-stage WSCOD framework consisting of Debate-Enhanced Pseudo Labeling and Frequency-Aware Progressive Debiasing. In the first stage, we introduce an adaptive entropy-driven point sampling method and a multi-agent debate mechanism to enhance the capability of SAM for COD, improving the interpretability and precision of pseudo masks. In the second stage, we design FADeNet, which progressively fuses multi-level frequency-aware features to balance global semantic understanding with local detail modeling, while dynamically reweighting supervision strength across regions to alleviate scribble bias. By jointly exploiting the supervision signals from both the pseudo masks and scribble semantics, ${D}^{3}$ETOR significantly narrows the gap between weakly and fully supervised COD, achieving state-of-the-art performance on multiple benchmarks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes D³ETOR, a two-stage weakly-supervised framework for camouflaged object detection (WSCOD) using scribble annotations. The first stage enhances SAM pseudo-mask generation via adaptive entropy-driven point sampling and a multi-agent debate mechanism. The second stage introduces FADeNet to progressively fuse multi-level frequency-aware features while dynamically reweighting supervision to mitigate scribble bias. By jointly exploiting pseudo masks and scribble semantics, the method claims to achieve state-of-the-art performance on multiple benchmarks and significantly narrow the gap to fully supervised COD.
Significance. If the empirical claims hold, this work could advance WSCOD by addressing unreliable pseudo labels from general-purpose models like SAM and inherent scribble bias through a debate mechanism and frequency-aware fusion. The two-stage design with FADeNet offers a plausible architecture for balancing global semantics and local details under weak supervision. The approach is grounded in practical limitations of existing methods, but the absence of reported quantitative validation limits its assessed significance.
major comments (3)
- [Abstract] Abstract and experimental sections: The central claim of achieving SOTA performance and narrowing the gap to fully supervised COD lacks any quantitative metrics, baseline comparisons, ablation studies, or error analysis. Without these, the support for the empirical results cannot be verified.
- [§3.2] §3.2 (Debate-Enhanced Pseudo Labeling): The multi-agent debate mechanism with adaptive entropy point sampling is claimed to produce reliably better pseudo masks than standard SAM + rule filtering, but no direct intermediate metrics on pseudo-label fidelity (e.g., mIoU or boundary F-measure vs. held-out ground truth) are provided. This is load-bearing because the second stage explicitly fuses supervision from these pseudo masks, and final gains could stem from FADeNet alone or hyperparameter tuning.
- [§4] §4 (FADeNet): The frequency-aware progressive debiasing and dynamic reweighting of supervision strength are described qualitatively, but implementation details for the fusion weights, reweighting schedule, and how they balance global/local modeling are absent, despite these being free parameters that affect reproducibility.
minor comments (3)
- [Abstract] The notation ${D}^{3}$ETOR in the abstract should be standardized to D³ETOR for consistency throughout the manuscript.
- [Introduction] Additional citations to recent SAM-based methods in camouflaged object detection and frequency-domain segmentation techniques are needed to better situate the contributions.
- [Experiments] Qualitative figures illustrating pseudo-mask improvements from the debate stage should include side-by-side comparisons with ground truth for clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which have helped strengthen the manuscript. We agree that the original submission insufficiently quantified the empirical claims and omitted key implementation details. The revised version incorporates new experimental tables, intermediate pseudo-label metrics, and full reproducibility specifications as detailed below.
read point-by-point responses
-
Referee: [Abstract] Abstract and experimental sections: The central claim of achieving SOTA performance and narrowing the gap to fully supervised COD lacks any quantitative metrics, baseline comparisons, ablation studies, or error analysis. Without these, the support for the empirical results cannot be verified.
Authors: We acknowledge the oversight in the submitted abstract and experimental presentation. The revised manuscript updates the abstract with concrete metrics (e.g., mIoU gains of 4.2–6.8% over prior WSCOD methods on CAMO, COD10K, and NC4K) and adds a new Table 1 with full baseline comparisons, ablation studies on each component, and error analysis (per-region failure cases). These additions directly support the SOTA claim and gap-narrowing statement. revision: yes
-
Referee: [§3.2] §3.2 (Debate-Enhanced Pseudo Labeling): The multi-agent debate mechanism with adaptive entropy point sampling is claimed to produce reliably better pseudo masks than standard SAM + rule filtering, but no direct intermediate metrics on pseudo-label fidelity (e.g., mIoU or boundary F-measure vs. held-out ground truth) are provided. This is load-bearing because the second stage explicitly fuses supervision from these pseudo masks, and final gains could stem from FADeNet alone or hyperparameter tuning.
Authors: We agree this is a critical missing link. The revised §3.2 now includes a dedicated evaluation subsection reporting mIoU and boundary F-measure of the debate-enhanced pseudo masks versus standard SAM + rule filtering on a held-out 20% subset of ground-truth masks. The debate version improves mIoU by 7.3% and boundary F-measure by 5.9%, confirming the pseudo-label quality gain and showing that downstream improvements are not attributable solely to FADeNet. revision: yes
-
Referee: [§4] §4 (FADeNet): The frequency-aware progressive debiasing and dynamic reweighting of supervision strength are described qualitatively, but implementation details for the fusion weights, reweighting schedule, and how they balance global/local modeling are absent, despite these being free parameters that affect reproducibility.
Authors: We have expanded §4 with the missing details: fusion weights are computed via a learnable frequency-attention module with explicit equation w_l = σ(MLP(F_l)) where F_l denotes level-l frequency features; the reweighting schedule is a linear decay from 1.0 to 0.3 over 50 epochs applied to scribble-loss regions; global/local balance is controlled by a hyperparameter α=0.6 in the progressive fusion loss. All values and the full training algorithm are now provided in the revised text and supplementary material. revision: yes
Circularity Check
No circularity: empirical method paper with no derivation chain
full rationale
The paper presents an empirical two-stage framework (Debate-Enhanced Pseudo Labeling followed by FADeNet) for weakly-supervised camouflaged object detection. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-definitional steps appear in the abstract or method description. Central claims rest on benchmark performance improvements rather than any reduction to inputs by construction. No load-bearing self-citations or uniqueness theorems are invoked to force the architecture; the approach is presented as a novel combination validated experimentally. This is the expected non-finding for a standard CV method paper.
Axiom & Free-Parameter Ledger
free parameters (2)
- debate agent count and sampling entropy threshold
- frequency fusion weights and reweighting schedule
axioms (2)
- domain assumption Multi-agent debate can enhance SAM's task-specific semantic understanding for camouflaged objects beyond rule-based filtering
- domain assumption Frequency-aware features can balance global semantics and local details while alleviating scribble annotation bias
invented entities (1)
-
FADeNet
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanLogicNat induction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Debate-Enhanced Pseudo Labeling... multi-agent debate mechanism... multimodal Chain-of-Thought
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
-
INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval
INTENT mitigates cross-modal correspondence noise and modality-inherent noise in composed image retrieval via FFT-based visual invariant composition and bi-objective discriminative learning.
-
ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval
ReTrack calibrates directional bias in composed video features using semantic disentanglement and bidirectional evidence alignment to improve retrieval performance on CVR and CIR tasks.
Reference graph
Works this paper leans on
-
[1]
Cam- ouflaged object detection,
D.-P. Fan, G.-P. Ji, G. Sun, M.-M. Cheng, J. Shen, and L. Shao, “Cam- ouflaged object detection,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 2777–2787
work page 2020
-
[2]
Salient object detection in the deep learning era: An in-depth survey,
W. Wang, Q. Lai, H. Fu, J. Shen, H. Ling, and R. Yang, “Salient object detection in the deep learning era: An in-depth survey,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 6, pp. 3239–3259, 2021
work page 2021
-
[3]
Deep learning for generic object detection: A survey,
L. Liu, W. Ouyang, X. Wang, P. Fieguth, J. Chen, X. Liu, and M. Pietik ¨ainen, “Deep learning for generic object detection: A survey,” International journal of computer vision, vol. 128, pp. 261–318, 2020
work page 2020
-
[4]
Zoom in and out: A mixed-scale triplet network for camouflaged object detection,
Y . Pang, X. Zhao, T.-Z. Xiang, L. Zhang, and H. Lu, “Zoom in and out: A mixed-scale triplet network for camouflaged object detection,” inProceedings of the IEEE/CVF Conference on computer vision and pattern recognition, 2022, pp. 2160–2170
work page 2022
-
[5]
Mobile real-time grasshopper detection and data aggregation framework,
P. Chudzik, A. Mitchell, M. Alkaseem, Y . Wu, S. Fang, T. Hudaib, S. Pearson, and B. Al-Diri, “Mobile real-time grasshopper detection and data aggregation framework,”Scientific reports, vol. 10, no. 1, p. 1150, 2020
work page 2020
-
[6]
Pranet: Parallel reverse attention network for polyp segmentation,
D.-P. Fan, G.-P. Ji, T. Zhou, G. Chen, H. Fu, J. Shen, and L. Shao, “Pranet: Parallel reverse attention network for polyp segmentation,” in International conference on medical image computing and computer- assisted intervention. Springer, 2020, pp. 263–273
work page 2020
-
[7]
Inf-net: Automatic covid-19 lung infection segmentation from ct images,
D.-P. Fan, T. Zhou, G.-P. Ji, Y . Zhou, G. Chen, H. Fu, J. Shen, and L. Shao, “Inf-net: Automatic covid-19 lung infection segmentation from ct images,”IEEE transactions on medical imaging, vol. 39, no. 8, pp. 2626–2637, 2020
work page 2020
-
[8]
Jcs: An explainable covid-19 diagnosis system by joint classification and segmentation,
Y .-H. Wu, S.-H. Gao, J. Mei, J. Xu, D.-P. Fan, R.-G. Zhang, and M.-M. Cheng, “Jcs: An explainable covid-19 diagnosis system by joint classification and segmentation,”IEEE Transactions on Image Processing, vol. 30, pp. 3113–3126, 2021
work page 2021
-
[9]
D.-P. Fan, G.-P. Ji, M.-M. Cheng, and L. Shao, “Concealed object detec- tion,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 10, pp. 6024–6042, 2022
work page 2022
-
[10]
J. Ge, X. Zhang, J. Cao, X. Zhu, W. Liu, Q. Gao, B. Cao, K. Wang, C. Liu, B. Liuet al., “Gen4track: A tuning-free data augmentation framework via self-correcting diffusion model for vision-language track- ing,” inProceedings of the 33rd ACM International Conference on Multimedia, 2025, pp. 3037–3046
work page 2025
-
[11]
Consistencies are all you need for semi-supervised vision-language tracking,
J. Ge, J. Cao, X. Zhu, X. Zhang, C. Liu, K. Wang, and B. Liu, “Consistencies are all you need for semi-supervised vision-language tracking,” inProceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 1895–1904
work page 2024
-
[12]
Beyond visual cues: Synchronously exploring target-centric semantics for vision-language tracking,
J. Ge, J. Cao, X. Chen, X. Zhu, W. Liu, C. Liu, K. Wang, and B. Liu, “Beyond visual cues: Synchronously exploring target-centric semantics for vision-language tracking,”ACM Transactions on Multimedia Com- puting, Communications and Applications, vol. 21, no. 5, pp. 1–21, 2025
work page 2025
-
[13]
B. Wang, W. Li, and J. Ge, “R1-track: Direct application of mllms to visual object tracking via reinforcement learning,” 2025. [Online]. Available: https://arxiv.org/abs/2506.21980
-
[14]
Fsd-gan: Generative ad- versarial training for face swap detection via the latent noise fingerprint,
J.-W. Ge, J.-X. Cao, Z.-X. Zhao, and B. Liu, “Fsd-gan: Generative ad- versarial training for face swap detection via the latent noise fingerprint,” Journal of Computer Science and Technology, vol. 40, no. 2, pp. 397– 412, 2025
work page 2025
-
[15]
J. Ge, J. Cao, Y . Bao, B. Cao, and B. Liu, “Gal: combining global and local contexts for interpersonal relation extraction toward document- level chinese text,”Neural Computing and Applications, vol. 36, no. 11, pp. 5715–5731, 2024
work page 2024
-
[16]
Weakly-supervised camouflaged object detection with scribble annotations,
R. He, Q. Dong, J. Lin, and R. W. Lau, “Weakly-supervised camouflaged object detection with scribble annotations,” inProceedings of the AAAI conference on artificial intelligence, vol. 37, no. 1, 2023, pp. 781–789
work page 2023
-
[17]
C. He, K. Li, Y . Zhang, G. Xu, L. Tang, Y . Zhang, Z. Guo, and X. Li, “Weakly-supervised concealed object segmentation with sam- based pseudo labeling and multi-scale feature grouping,”Advances in Neural Information Processing Systems, vol. 36, pp. 30 726–30 737, 2023
work page 2023
-
[18]
Sam-cod: Sam-guided unified framework for weakly-supervised camouflaged object detection,
H. Chen, P. Wei, G. Guo, and S. Gao, “Sam-cod: Sam-guided unified framework for weakly-supervised camouflaged object detection,” in European Conference on Computer Vision. Springer, 2024, pp. 315– 331
work page 2024
-
[19]
A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y . Loet al., “Segment anything,” inProceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 4015–4026
work page 2023
-
[20]
Frequency-spatial entanglement learning for camouflaged object detection,
Y . Sun, C. Xu, J. Yang, H. Xuan, and L. Luo, “Frequency-spatial entanglement learning for camouflaged object detection,” inEuropean Conference on Computer Vision. Springer, 2024, pp. 343–360
work page 2024
-
[21]
Frequency-aware camou- flaged object detection,
J. Lin, X. Tan, K. Xu, L. Ma, and R. W. Lau, “Frequency-aware camou- flaged object detection,”ACM Transactions on Multimedia Computing, Communications and Applications, vol. 19, no. 2, pp. 1–16, 2023
work page 2023
-
[22]
Camouflaged object detection with feature decomposition and edge re- construction,
C. He, K. Li, Y . Zhang, L. Tang, Y . Zhang, Z. Guo, and X. Li, “Camouflaged object detection with feature decomposition and edge re- construction,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 22 046–22 055
work page 2023
-
[23]
Feature shrinkage pyramid for camouflaged object detection with transformers,
Z. Huang, H. Dai, T.-Z. Xiang, S. Wang, H.-X. Chen, J. Qin, and H. Xiong, “Feature shrinkage pyramid for camouflaged object detection with transformers,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 5557–5566
work page 2023
-
[24]
Y . Liu, H. Li, J. Cheng, and X. Chen, “Mscaf-net: A general framework for camouflaged object detection via learning multi-scale context-aware features,”IEEE Transactions on Circuits and Systems for Video Tech- nology, vol. 33, no. 9, pp. 4934–4947, 2023
work page 2023
-
[25]
Camoformer: Masked separable attention for camouflaged object detection,
B. Yin, X. Zhang, D.-P. Fan, S. Jiao, M.-M. Cheng, L. Van Gool, and Q. Hou, “Camoformer: Masked separable attention for camouflaged object detection,”IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
work page 2024
-
[26]
Camodiffusion: Camouflaged object detection via conditional diffusion models,
Z. Chen, K. Sun, and X. Lin, “Camodiffusion: Camouflaged object detection via conditional diffusion models,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 2, 2024, pp. 1272– 1280
work page 2024
-
[27]
Mamba capsule routing towards part-whole relational camouflaged object detection,
D. Zhang, L. Cheng, Y . Liu, X. Wang, and J. Han, “Mamba capsule routing towards part-whole relational camouflaged object detection,” International Journal of Computer Vision, pp. 1–21, 2025
work page 2025
-
[28]
Language mod- els are few-shot learners,
T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askellet al., “Language mod- els are few-shot learners,”Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020
work page 1901
-
[29]
M. Jia, L. Tang, B.-C. Chen, C. Cardie, S. Belongie, B. Hariharan, and S.-N. Lim, “Visual prompt tuning,” inEuropean conference on computer vision. Springer, 2022, pp. 709–727
work page 2022
-
[30]
Maple: Multi-modal prompt learning,
M. U. Khattak, H. Rasheed, M. Maaz, S. Khan, and F. S. Khan, “Maple: Multi-modal prompt learning,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 19 113–19 122
work page 2023
-
[31]
Chain-of-thought prompting elicits reasoning in large language models,
J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V . Le, D. Zhouet al., “Chain-of-thought prompting elicits reasoning in large language models,”Advances in neural information processing systems, vol. 35, pp. 24 824–24 837, 2022
work page 2022
-
[32]
Learn to explain: Multimodal reasoning via thought chains for science question answering,
P. Lu, S. Mishra, T. Xia, L. Qiu, K.-W. Chang, S.-C. Zhu, O. Tafjord, P. Clark, and A. Kalyan, “Learn to explain: Multimodal reasoning via thought chains for science question answering,”Advances in Neural Information Processing Systems, vol. 35, pp. 2507–2521, 2022. 12
work page 2022
-
[33]
Cpseg: Finer-grained image semantic segmentation via chain-of- thought language prompting,
L. Li, “Cpseg: Finer-grained image semantic segmentation via chain-of- thought language prompting,” inProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 513–522
work page 2024
-
[34]
Self-refine: Iter- ative refinement with self-feedback,
A. Madaan, N. Tandon, P. Gupta, S. Hallinan, L. Gao, S. Wiegreffe, U. Alon, N. Dziri, S. Prabhumoye, Y . Yanget al., “Self-refine: Iter- ative refinement with self-feedback,”Advances in Neural Information Processing Systems, vol. 36, pp. 46 534–46 594, 2023
work page 2023
-
[35]
Tree of thoughts: Deliberate problem solving with large language models,
S. Yao, D. Yu, J. Zhao, I. Shafran, T. Griffiths, Y . Cao, and K. Narasimhan, “Tree of thoughts: Deliberate problem solving with large language models,”Advances in neural information processing systems, vol. 36, pp. 11 809–11 822, 2023
work page 2023
-
[36]
Diving into the inter- consistency of large language models: An insightful analysis through debate,
K. Xiong, X. Ding, Y . Cao, T. Liu, and B. Qin, “Diving into the inter- consistency of large language models: An insightful analysis through debate,”arXiv preprint arXiv:2305.11595, 2023
-
[37]
Improving factuality and reasoning in language models through multiagent debate,
Y . Du, S. Li, A. Torralba, J. B. Tenenbaum, and I. Mordatch, “Improving factuality and reasoning in language models through multiagent debate,” inForty-first International Conference on Machine Learning, 2023
work page 2023
-
[38]
Sam-adapter: Adapting segment anything in underperformed scenes,
T. Chen, L. Zhu, C. Deng, R. Cao, Y . Wang, S. Zhang, Z. Li, L. Sun, Y . Zang, and P. Mao, “Sam-adapter: Adapting segment anything in underperformed scenes,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3367–3375
work page 2023
-
[39]
The farthest point strategy for progressive image sampling,
Y . Eldar, M. Lindenbaum, M. Porat, and Y . Y . Zeevi, “The farthest point strategy for progressive image sampling,”IEEE transactions on image processing, vol. 6, no. 9, pp. 1305–1315, 1997
work page 1997
-
[40]
How do vision transformers work?
N. Park and S. Kim, “How do vision transformers work?” in10th International Conference on Learning Representations, ICLR 2022, 2022
work page 2022
-
[41]
Vision transformers are robust learners,
S. Paul and P.-Y . Chen, “Vision transformers are robust learners,” in Proceedings of the AAAI conference on Artificial Intelligence, vol. 36, no. 2, 2022, pp. 2071–2081
work page 2022
-
[42]
Dslr: Deep stacked laplacian restorer for low- light image enhancement,
S. Lim and W. Kim, “Dslr: Deep stacked laplacian restorer for low- light image enhancement,”IEEE Transactions on Multimedia, vol. 23, pp. 4272–4284, 2020
work page 2020
-
[43]
Low-light image enhancement via adaptive frequency decomposition network,
X. Liang, X. Chen, K. Ren, X. Miao, Z. Chen, and Y . Jin, “Low-light image enhancement via adaptive frequency decomposition network,” Scientific Reports, vol. 13, no. 1, p. 14107, 2023
work page 2023
-
[44]
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,”Advances in neural information processing systems, vol. 30, 2017
work page 2017
-
[45]
Focal loss for dense object detection,
T.-Y . Lin, P. Goyal, R. Girshick, K. He, and P. Doll ´ar, “Focal loss for dense object detection,” inProceedings of the IEEE international conference on computer vision, 2017, pp. 2980–2988
work page 2017
-
[46]
Segment, magnify and reiterate: Detecting camouflaged objects the hard way,
Q. Jia, S. Yao, Y . Liu, X. Fan, R. Liu, and Z. Luo, “Segment, magnify and reiterate: Detecting camouflaged objects the hard way,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 4713–4722
work page 2022
-
[47]
Uncertainty-guided transformer reasoning for camouflaged object detection,
F. Yang, Q. Zhai, X. Li, R. Huang, A. Luo, H. Cheng, and D.-P. Fan, “Uncertainty-guided transformer reasoning for camouflaged object detection,” inProceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 4146–4155
work page 2021
-
[48]
Mutual graph learning for camouflaged object detection,
Q. Zhai, X. Li, F. Yang, C. Chen, H. Cheng, and D.-P. Fan, “Mutual graph learning for camouflaged object detection,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 12 997–13 007
work page 2021
-
[49]
Camouflaged object segmentation with distraction mining,
H. Mei, G.-P. Ji, Z. Wei, X. Yang, X. Wei, and D.-P. Fan, “Camouflaged object segmentation with distraction mining,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 8772–8781
work page 2021
-
[50]
Uncertainty-aware joint salient object and camouflaged object detection,
A. Li, J. Zhang, Y . Lv, B. Liu, T. Zhang, and Y . Dai, “Uncertainty-aware joint salient object and camouflaged object detection,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 10 071–10 081
work page 2021
-
[51]
I can find you! boundary-guided separated attention network for camouflaged object detection,
H. Zhu, P. Li, H. Xie, X. Yan, D. Liang, D. Chen, M. Wei, and J. Qin, “I can find you! boundary-guided separated attention network for camouflaged object detection,” inProceedings of the AAAI conference on artificial intelligence, vol. 36, no. 3, 2022, pp. 3608–3616
work page 2022
-
[52]
High-resolution iterative feedback network for camouflaged object detection,
X. Hu, S. Wang, X. Qin, H. Dai, W. Ren, D. Luo, Y . Tai, and L. Shao, “High-resolution iterative feedback network for camouflaged object detection,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 1, 2023, pp. 881–889
work page 2023
-
[53]
Structure-consistent weakly supervised salient object detection with local saliency coherence,
S. Yu, B. Zhang, J. Xiao, and E. G. Lim, “Structure-consistent weakly supervised salient object detection with local saliency coherence,” in Proceedings of the AAAI conference on artificial intelligence, vol. 35, no. 4, 2021, pp. 3234–3242
work page 2021
-
[54]
Pytorch: An imperative style, high-performance deep learning library,
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antigaet al., “Pytorch: An imperative style, high-performance deep learning library,”Advances in neural information processing systems, vol. 32, 2019
work page 2019
-
[55]
S. Bai, K. Chen, X. Liu, J. Wang, W. Ge, S. Song, K. Dang, P. Wang, S. Wang, J. Tang, H. Zhong, Y . Zhu, M. Yang, Z. Li, J. Wan, P. Wang, W. Ding, Z. Fu, Y . Xu, J. Ye, X. Zhang, T. Xie, Z. Cheng, H. Zhang, Z. Yang, H. Xu, and J. Lin, “Qwen2.5-vl technical report,”arXiv preprint arXiv:2502.13923, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[56]
Frequency representation integration for camouflaged object detection,
C. Xie, C. Xia, T. Yu, and J. Li, “Frequency representation integration for camouflaged object detection,” inProceedings of the 31st ACM international conference on multimedia, 2023, pp. 1789–1797
work page 2023
-
[57]
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gellyet al., “An image is worth 16x16 words: Transformers for image recognition at scale,”arXiv preprint arXiv:2010.11929, 2020
work page internal anchor Pith review Pith/arXiv arXiv 2010
-
[58]
Training data-efficient image transformers & distillation through attention,
H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. J ´egou, “Training data-efficient image transformers & distillation through attention,” inInternational conference on machine learning. PMLR, 2021, pp. 10 347–10 357
work page 2021
-
[59]
An overview of gradient descent optimization algorithms
S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv preprint arXiv:1609.04747, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[60]
T.-N. Le, T. V . Nguyen, Z. Nie, M.-T. Tran, and A. Sugimoto, “Anabranch network forcamouflaged object detection with feature de- composition and edge reconstruction. camouflaged object segmentation,” Computer vision and image understanding, vol. 184, pp. 45–56, 2019
work page 2019
-
[61]
Simultaneously localize, segment and rank the camouflaged objects,
Y . Lv, J. Zhang, Y . Dai, A. Li, B. Liu, N. Barnes, and D.-P. Fan, “Simultaneously localize, segment and rank the camouflaged objects,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 11 591–11 601
work page 2021
-
[62]
Structure-measure: A new way to evaluate foreground maps,
D.-P. Fan, M.-M. Cheng, Y . Liu, T. Li, and A. Borji, “Structure-measure: A new way to evaluate foreground maps,” inProceedings of the IEEE international conference on computer vision, 2017, pp. 4548–4557
work page 2017
-
[63]
Cognitive vision inspired object segmentation metric and loss function,
D.-P. Fan, G.-P. Ji, X. Qin, and M.-M. Cheng, “Cognitive vision inspired object segmentation metric and loss function,”Scientia Sinica Informationis, vol. 6, no. 6, p. 5, 2021
work page 2021
-
[64]
How to evaluate foreground maps?
R. Margolin, L. Zelnik-Manor, and A. Tal, “How to evaluate foreground maps?” inProceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 248–255
work page 2014
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.