
arxiv: 2604.16582 · v1 · submitted 2026-04-17 · 💻 cs.CV · cs.AI

Camo-M3FD: A New Benchmark Dataset for Cross-Spectral Camouflaged Pedestrian Detection

Pith reviewed 2026-05-10 08:56 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords camouflaged pedestrian detection · cross-spectral imaging · multispectral fusion · thermal imaging · benchmark dataset · visible-thermal pairs · object detection

The pith

The Camo-M3FD benchmark shows thermal signals locate camouflaged pedestrians while multispectral fusion refines structural details

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces Camo-M3FD as a benchmark of registered visible and thermal image pairs selected for high foreground-background similarity to support detection of camouflaged pedestrians. Existing camouflaged object detection work focuses on animals, leaving a gap for human targets in safety-critical applications such as autonomous driving and surveillance. The dataset supplies pixel-level masks and a standard evaluation setup using current models. Results indicate thermal data supplies key localization cues while combining both spectra improves accuracy on finer structural features. Readers care because better handling of hidden pedestrians can reduce risks in poor-visibility or cluttered real scenes.

Core claim

We introduce Camo-M3FD, a benchmark derived from the M3FD dataset and consisting of registered visible-thermal image pairs curated via quantitative metrics to ensure high foreground-background similarity. High-quality pixel-level masks are supplied along with a standardized evaluation framework using state-of-the-art camouflaged object detection models. Our results demonstrate that while thermal signals provide indispensable localization cues, multispectral fusion is essential for refining structural details.
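
This page does not say which quantitative metrics the authors used for curation. As an illustration only, the sketch below scores foreground-background similarity with a histogram intersection over pixel intensities inside and outside a ground-truth mask. The function names, the bin count, and the 0.6 threshold are all hypothetical choices and are not taken from the paper.

```python
import numpy as np

def fg_bg_similarity(image, mask, bins=32):
    """Histogram intersection between foreground and background intensities.

    image: 2-D array of intensities in [0, 255] (one channel of either spectrum).
    mask:  2-D boolean array, True on the pedestrian.
    Returns a value in [0, 1]; 1 means the two distributions are identical.
    """
    image = np.asarray(image, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any() or mask.all():
        raise ValueError("mask must contain both foreground and background pixels")
    # Shared bin edges so the two histograms are directly comparable.
    edges = np.linspace(0.0, 256.0, bins + 1)
    fg_hist, _ = np.histogram(image[mask], bins=edges)
    bg_hist, _ = np.histogram(image[~mask], bins=edges)
    # Normalise to probability distributions before intersecting.
    fg_hist = fg_hist / fg_hist.sum()
    bg_hist = bg_hist / bg_hist.sum()
    return float(np.minimum(fg_hist, bg_hist).sum())

def is_camouflaged(image, mask, threshold=0.6):
    """Hypothetical acceptance rule: keep the image if similarity is high enough."""
    return fg_bg_similarity(image, mask) >= threshold
```

A curation pass in this style would compute the score for each annotated image and keep only those above the threshold. Intensity histograms ignore texture and edges, so a real pipeline would likely combine several criteria.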

What carries the argument

The Camo-M3FD dataset of cross-spectral visible-thermal pairs selected by quantitative foreground-background similarity metrics and equipped with pixel-level masks for evaluating camouflaged pedestrian detection

If this is right

  • Thermal imaging supplies essential localization cues for pedestrians that blend with their surroundings.
  • Multispectral fusion of visible and thermal data is required to refine structural accuracy in the detections.
  • The benchmark supplies a standardized platform for testing and improving detection systems aimed at safety-critical uses.
  • Existing models can now be assessed specifically on human camouflaged targets rather than only biological species.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Autonomous vehicle developers could adopt the benchmark to train models that lower missed detections in environments where people blend into backgrounds.
  • The similarity-metric curation method could be reused to build benchmarks for other hidden-object problems with different sensor pairs.
  • Extensions might add temporal sequences or additional spectral channels to examine motion and multi-band fusion effects.

Load-bearing premise

Quantitative similarity metrics on the source images correctly identify pairs that represent real-world camouflaged pedestrians in safety-critical conditions.

What would settle it

A test in which human raters judge the camouflage level of the selected pairs against random pairs from the source collection, or field trials where models trained on the benchmark show no detection gain over standard pedestrian datasets in actual scenes with hidden humans.
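
The human-rater comparison described above could be analysed with a simple permutation test. The sketch below assumes each image pair receives one numeric camouflage rating (higher means better hidden) and asks whether the curated pairs score higher on average than random pairs. The function name, the rating scale, and the one-sided design are assumptions made for illustration.

```python
import numpy as np

def permutation_test(selected, random_pairs, n_perm=10_000, seed=0):
    """One-sided permutation test on the difference of mean ratings.

    Returns (observed_difference, p_value), where the p-value estimates how
    often a random relabelling yields a difference at least as large.
    """
    rng = np.random.default_rng(seed)
    selected = np.asarray(selected, dtype=np.float64)
    random_pairs = np.asarray(random_pairs, dtype=np.float64)
    observed = selected.mean() - random_pairs.mean()
    pooled = np.concatenate([selected, random_pairs])
    n_sel = selected.size
    count = 0
    for _ in range(n_perm):
        # Shuffle group labels by permuting the pooled ratings.
        perm = rng.permutation(pooled)
        diff = perm[:n_sel].mean() - perm[n_sel:].mean()
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (count + 1) / (n_perm + 1)
    return float(observed), p_value
```

A small p-value would suggest that the similarity metrics pick out pairs humans also find harder to see. A large one would leave the load-bearing premise unsupported.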

Figures

Figures reproduced from arXiv: 2604.16582 by Andrea Mero, Angel D. Sappa, Guillermo A. Castillo, Henry O. Velesaca.

Figure 1: Example images of the Camo-M3FD dataset.
Figure 2: Spatial distribution of the centroids of the annotated GT …
Figure 3: Aspect-ratio distribution of the GT masks.
Figure 4: Examples of accepted and rejected (marked in red) images alongside their respective edges extracted by RGB using Sobel, edges …
Figure 5: Results using SoTA COD techniques that have achieved first or second place in at least one of the metrics. Successful matches …
Original abstract

Pedestrian detection is fundamental to autonomous driving, robotics, and surveillance. Despite progress in deep learning, reliable identification remains challenging due to occlusions, cluttered backgrounds, and degraded visibility. While multispectral detection, which combines visible and thermal sensors, mitigates poor visibility, the challenge of camouflaged pedestrians remains largely unexplored. Existing Camouflaged Object Detection (COD) benchmarks focus on biological species, leaving a gap in safety-critical human detection where targets blend into their surroundings. To address this, we introduce Camo-M3FD (derived from the M3FD dataset), a novel benchmark for cross-spectral camouflaged pedestrian detection, consisting of registered visible-thermal image pairs. The dataset is curated using quantitative metrics to ensure high foreground-background similarity. We provide high-quality pixel-level masks and establish a standardized evaluation framework using state-of-the-art COD models. Our results demonstrate that while thermal signals provide indispensable localization cues, multispectral fusion is essential for refining structural details. Camo-M3FD serves as a foundational resource for developing robust and safety-critical detection systems. The dataset is available on GitHub: https://cod-espol.github.io/Camo-M3FD/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Camo-M3FD, a benchmark dataset of registered visible-thermal image pairs derived from M3FD and curated via quantitative foreground-background similarity metrics to represent camouflaged pedestrians. It supplies pixel-level masks and benchmarks state-of-the-art camouflaged object detection (COD) models, claiming that thermal cues enable localization while multispectral fusion refines structural details for safety-critical applications.

Significance. If the curation successfully isolates genuinely challenging camouflaged cases, the dataset addresses a clear gap in existing COD benchmarks (which focus on biological species) by targeting human pedestrians in autonomous driving and surveillance contexts. The provision of masks and standardized evaluation on SOTA models supplies a reproducible starting point for cross-spectral detector development.

major comments (2)
  1. [Dataset Curation] Dataset curation section: the quantitative similarity metrics (e.g., color histograms, SSIM, feature distances) used to select high foreground-background similarity pairs from M3FD are not validated against human camouflage ratings or by comparing detection difficulty on selected vs. unselected pairs. This is load-bearing for the central claim that Camo-M3FD forms a representative benchmark for real-world camouflaged pedestrians, since thermal signatures often reveal body heat regardless of visible blending.
  2. [Experiments and Results] Experimental results and conclusion: the claim that thermal signals 'provide indispensable localization cues' while multispectral fusion 'is essential for refining structural details' is not supported by ablations that isolate the effect of the selected camouflage cases, for example by showing performance degradation on them relative to standard M3FD pairs or other multispectral benchmarks.
minor comments (2)
  1. The GitHub link is mentioned only in the abstract; include it (or a permanent DOI) in the main text and data availability statement.
  2. [Related Work] Related work should explicitly compare Camo-M3FD against other multispectral pedestrian datasets (e.g., KAIST, FLIR) to clarify the novelty of the camouflage focus.
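
The check requested in major comment 1, comparing detection difficulty on selected versus unselected pairs, can be sketched with mean absolute error (MAE), a metric widely used for camouflaged object detection. This page does not list which metrics the paper reports, so the choice of MAE, the function names, and the data layout below are assumptions for illustration.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a predicted map and a binary mask, both in [0, 1]."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    return float(np.abs(pred - gt).mean())

def difficulty_gap(selected, unselected):
    """Mean MAE on selected pairs minus mean MAE on unselected pairs.

    Each argument is a sequence of (prediction, ground_truth) tuples produced by
    the same model. A positive gap means the selected subset is harder.
    """
    sel = np.mean([mae(p, g) for p, g in selected])
    uns = np.mean([mae(p, g) for p, g in unselected])
    return float(sel - uns)
```

A clearly positive gap across several models would support the claim that curation isolates harder cases. A gap near zero would suggest the similarity metrics do not track detection difficulty.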

Circularity Check

0 steps flagged

No circularity: dataset curation and empirical benchmarking are self-contained

full rationale

The paper introduces Camo-M3FD by selecting registered visible-thermal pairs from M3FD via quantitative similarity metrics and supplies pixel masks plus COD-model baselines. No equations, fitted parameters, or derivations are present; the central claims rest on the curation process and reported detection results rather than any self-referential reduction. Self-citations, if any, are not load-bearing for the benchmark construction itself. The contribution is therefore independent of its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work introduces no new free parameters, mathematical axioms, or invented entities; it builds directly on the existing M3FD dataset and established COD models.

pith-pipeline@v0.9.0 · 5521 in / 981 out tokens · 47412 ms · 2026-05-10T08:56:36.384878+00:00 · methodology

