SWNet: A Cross-Spectral Network for Camouflaged Weed Detection
Pith reviewed 2026-05-10 08:04 UTC · model grok-4.3
The pith
A cross-spectral network fuses visible and near-infrared images to detect weeds that blend with crops in dense fields.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SWNet is a bimodal end-to-end cross-spectral network that uses a Pyramid Vision Transformer v2 backbone to capture long-range dependencies, a Bimodal Gated Fusion Module to dynamically integrate Visible and Near-Infrared information, and an Edge-Aware Refinement module to produce sharper object boundaries. By leveraging the physiological differences in chlorophyll reflectance captured in the NIR spectrum, the proposed architecture effectively discriminates targets that are otherwise indistinguishable in the visible range and outperforms ten state-of-the-art methods on the Weeds-Banana dataset.
What carries the argument
The Bimodal Gated Fusion Module that dynamically integrates Visible and Near-Infrared information to exploit chlorophyll reflectance differences for discriminating camouflaged weeds.
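To make "dynamically integrates" concrete, below is a minimal per-pixel sketch of gated fusion. It assumes a common gating form: g = sigmoid(w · [v; n] + b) and fused = g · v + (1 - g) · n, where v and n are aligned VIS and NIR feature vectors. The gate is "dynamic" because it depends on the input features. The function name `gated_fuse` and the parameters `w` and `b` are hypothetical. This is not SWNet's published module, whose exact design is not given here.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(vis: Sequence[float], nir: Sequence[float],
               w: Sequence[float], b: float) -> List[float]:
    """Fuse one pixel's VIS and NIR feature vectors with a scalar gate.

    Hypothetical formulation (an assumption, not SWNet's exact module):
      g     = sigmoid(w . concat(vis, nir) + b)
      fused = g * vis + (1 - g) * nir
    """
    if len(vis) != len(nir):
        raise ValueError("VIS and NIR features must have the same length")
    concat = list(vis) + list(nir)
    if len(w) != len(concat):
        raise ValueError("gate weights must match the concatenated length")
    # Input-dependent gate in (0, 1): how much to trust the VIS branch here.
    g = sigmoid(sum(wi * xi for wi, xi in zip(w, concat)) + b)
    # Convex combination of the two modalities.
    return [g * v + (1.0 - g) * n for v, n in zip(vis, nir)]


# With zero weights and zero bias the gate is 0.5, so the result is the average.
print(gated_fuse([1.0, 0.0], [0.0, 1.0], [0.0] * 4, 0.0))  # [0.5, 0.5]
```

A strongly negative bias pushes the gate toward 0, so the output falls back on the NIR features. A trained module would typically learn per-channel or per-location gates with convolutions rather than a single scalar.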
Load-bearing premise
Physiological differences in chlorophyll reflectance captured in the NIR spectrum are sufficient to discriminate targets indistinguishable in the visible range under dense agricultural canopy conditions.
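The physiology behind this premise can be illustrated with the standard Normalized Difference Vegetation Index, NDVI = (NIR - Red) / (NIR + Red). Healthy leaves absorb much of the red light (chlorophyll) and reflect strongly in the near infrared, so vigorous vegetation tends to have high NDVI. The sketch below is for intuition only. It does not claim that SWNet computes NDVI, and the reflectance values in the usage line are made-up toy numbers.

```python
from typing import List


def ndvi(nir: List[float], red: List[float], eps: float = 1e-8) -> List[float]:
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red) for non-negative reflectances.

    `eps` guards against division by zero on very dark pixels.
    """
    if len(nir) != len(red):
        raise ValueError("NIR and Red bands must have the same number of pixels")
    return [(n - r) / (n + r + eps) for n, r in zip(nir, red)]


# Toy reflectances: a strongly NIR-reflective pixel vs. a weaker one.
print(ndvi([0.5, 0.3], [0.1, 0.25]))  # roughly [0.667, 0.091]
```

Two plants with similar visible colour can still differ in NDVI if their NIR response differs. Whether that holds for the weed and banana classes in Weeds-Banana is exactly the load-bearing premise.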
What would settle it
If a visible-spectrum-only network matches or exceeds SWNet accuracy on the Weeds-Banana dataset, the claimed necessity of NIR integration would not hold.
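One way to run this check is to score a VIS-only model and the VIS+NIR model on the same held-out images, then test whether the per-image differences favour the fused model. The sketch below uses an exact paired sign-flip permutation test on per-image IoU differences. The choice of test, the function name, and the toy scores are assumptions for illustration; none of them is reported by the paper.

```python
from itertools import product
from typing import Sequence


def paired_sign_flip_pvalue(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact one-sided p-value for H0: model A is not better than model B.

    `a` and `b` hold per-image scores (e.g. IoU) on the same test images.
    Under H0 each paired difference is equally likely to have either sign,
    so all 2**n sign flips are enumerated (only feasible for small n).
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    observed = sum(diffs)
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        # Small tolerance so the observed pattern itself is always counted.
        if sum(s * d for s, d in zip(signs, diffs)) >= observed - 1e-12:
            count += 1
    return count / total


# Toy per-image IoU: VIS+NIR model vs. VIS-only model on four images.
fused = [0.80, 0.75, 0.90, 0.70]
vis_only = [0.70, 0.72, 0.85, 0.60]
print(paired_sign_flip_pvalue(fused, vis_only))  # 0.0625
```

With only four images the smallest attainable p-value is 1/16, even though the fused model wins on every image. This is one reason the review asks for dataset statistics alongside the comparison.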
Original abstract
This paper presents SWNet, a bimodal end-to-end cross-spectral network specifically engineered for the detection of camouflaged weeds in dense agricultural environments. Plant camouflage, characterized by homochromatic blending where invasive species mimic the phenotypic traits of primary crops, poses a significant challenge for traditional computer vision systems. To overcome these limitations, SWNet utilizes a Pyramid Vision Transformer v2 backbone to capture long-range dependencies and a Bimodal Gated Fusion Module to dynamically integrate Visible and Near-Infrared information. By leveraging the physiological differences in chlorophyll reflectance captured in the NIR spectrum, the proposed architecture effectively discriminates targets that are otherwise indistinguishable in the visible range. Furthermore, an Edge-Aware Refinement module is employed to produce sharper object boundaries and reduce structural ambiguity. Experimental results on the Weeds-Banana dataset indicate that SWNet outperforms ten state-of-the-art methods. The study demonstrates that the integration of cross-spectral data and boundary-guided refinement is essential for high segmentation accuracy in complex crop canopies. The code is available on GitHub: https://cod-espol.github.io/SWNet/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SWNet, a bimodal end-to-end cross-spectral network for camouflaged weed detection in dense agricultural environments. It uses a Pyramid Vision Transformer v2 (PVTv2) backbone to capture long-range dependencies, a Bimodal Gated Fusion Module to integrate visible and near-infrared (NIR) channels by exploiting chlorophyll reflectance differences, and an Edge-Aware Refinement module for sharper boundaries. The central claim is that SWNet outperforms ten state-of-the-art methods on the Weeds-Banana dataset.
Significance. If the empirical claims hold with proper validation, the work targets a practical challenge in precision agriculture where homochromatic camouflage hinders robotic weed detection. The cross-spectral fusion strategy and public code release are strengths that could support follow-on research in multimodal agricultural vision. However, the absence of supporting metrics and analyses limits immediate impact assessment.
major comments (3)
- [Abstract] Abstract: The assertion that 'Experimental results on the Weeds-Banana dataset indicate that SWNet outperforms ten state-of-the-art methods' is made without any reported quantitative metrics (e.g., mIoU, F1, or pixel accuracy), comparison tables, ablation results, or statistical tests, preventing evaluation of the central empirical claim.
- [Introduction and Experiments] Motivation and Experiments: No per-class VIS vs. NIR separability statistics, qualitative examples showing camouflage failure in VIS but success in NIR, or ablation isolating the NIR channel and Bimodal Gated Fusion Module contribution are provided; this leaves open whether gains derive from cross-spectral discrimination or from the PVTv2 backbone and edge refinement alone, undermining the core physiological motivation.
- [Experiments] Experiments: No dataset statistics (image count, train/test split, class balance in Weeds-Banana), training hyperparameters, or implementation details are reported, rendering the outperformance claim non-reproducible and unevaluable.
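The metrics the referee asks for can be computed from a binary prediction mask and a binary ground-truth mask, as sketched below. This generic sketch assumes flattened 0/1 masks with a single "weed" foreground class. The paper's exact metric definitions (for example, whether mIoU averages over classes or over images) are not stated here.

```python
from typing import Dict, Sequence


def binary_seg_metrics(pred: Sequence[int], gt: Sequence[int]) -> Dict[str, float]:
    """IoU, F1 (Dice) and pixel accuracy for flattened 0/1 masks."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("masks must be non-empty and the same size")
    tp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gt) if p == 0 and g == 1)
    tn = len(gt) - tp - fp - fn  # valid because masks contain only 0 and 1
    union = tp + fp + fn
    # Convention (an assumption): an empty prediction on an empty mask scores 1.0.
    iou = tp / union if union else 1.0
    f1 = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    return {"iou": iou, "f1": f1, "pixel_acc": (tp + tn) / len(gt)}


pred = [1, 1, 0, 0, 1, 0]
gt = [1, 0, 0, 0, 1, 1]
print(binary_seg_metrics(pred, gt))  # iou 0.5, f1 ~0.667, pixel_acc ~0.667
```

A two-class (weed/background) mIoU can then be taken as the mean of the weed IoU and the background IoU, obtained by inverting both masks.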
minor comments (2)
- [Abstract] The code link (https://cod-espol.github.io/SWNet/) may contain a typo, as 'cod-espol' does not look like a standard organization name; verify and correct it so the code remains accessible.
- [Method] Notation for the Bimodal Gated Fusion Module and Edge-Aware Refinement could be formalized with equations or pseudocode to improve clarity.
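For the edge branch, a formal definition might start from the auxiliary target. Edge-aware camouflaged object detection models often supervise an auxiliary boundary map derived from the ground-truth mask. The sketch below derives such a map by marking foreground pixels that have at least one 4-connected background neighbour. This is an assumed, generic construction with a hypothetical function name, not the paper's Edge-Aware Refinement module.

```python
from typing import List

Mask = List[List[int]]


def boundary_map(mask: Mask) -> Mask:
    """Mark foreground pixels (1) that touch background via a 4-neighbour.

    Pixels outside the image are treated as background, so foreground
    touching the image border also counts as boundary.
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < h and 0 <= nx < w)
                if outside or mask[ny][nx] == 0:
                    out[y][x] = 1
                    break
    return out


square = [[0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
for row in boundary_map(square):
    print(row)  # the 3x3 ring is marked; its centre pixel stays 0
```

A refinement module could use such a map as an extra prediction target, or weight its segmentation loss more heavily near these pixels.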
Simulated Author's Rebuttal
We thank the referee for the constructive comments that identify key areas where additional detail will strengthen the presentation of our results and motivation. We have revised the manuscript accordingly to address each point.
Point-by-point responses
Referee: [Abstract] Abstract: The assertion that 'Experimental results on the Weeds-Banana dataset indicate that SWNet outperforms ten state-of-the-art methods' is made without any reported quantitative metrics (e.g., mIoU, F1, or pixel accuracy), comparison tables, ablation results, or statistical tests, preventing evaluation of the central empirical claim.
Authors: We agree that the abstract should include quantitative support. In the revised manuscript we have added the key metrics (mIoU and F1-score) to the abstract and expanded the experiments section with full comparison tables, ablation results, and statistical tests so that the outperformance claim can be directly evaluated. revision: yes
Referee: [Introduction and Experiments] Motivation and Experiments: No per-class VIS vs. NIR separability statistics, qualitative examples showing camouflage failure in VIS but success in NIR, or ablation isolating the NIR channel and Bimodal Gated Fusion Module contribution are provided; this leaves open whether gains derive from cross-spectral discrimination or from the PVTv2 backbone and edge refinement alone, undermining the core physiological motivation.
Authors: We acknowledge the value of explicit evidence for the physiological motivation. The revision now contains per-class VIS/NIR separability statistics, qualitative examples that illustrate camouflage failure in the visible spectrum and its resolution in NIR, and an ablation study that isolates the contribution of the NIR channel together with the Bimodal Gated Fusion Module. These additions demonstrate that the performance gains are attributable to cross-spectral fusion rather than the backbone or edge refinement alone. revision: yes
Referee: [Experiments] Experiments: No dataset statistics (image count, train/test split, class balance in Weeds-Banana), training hyperparameters, or implementation details are reported, rendering the outperformance claim non-reproducible and unevaluable.
Authors: We thank the referee for highlighting the reproducibility gap. The revised manuscript includes a new subsection with complete Weeds-Banana dataset statistics (image counts, train/test splits, class balance), the full set of training hyperparameters, and implementation details (optimizer, learning-rate schedule, hardware, and code repository). revision: yes
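Per-class VIS/NIR separability statistics of the kind discussed above could, for instance, be reported as a Fisher discriminant ratio per band: J = (mu_weed - mu_crop)^2 / (var_weed + var_crop), computed over the pixel intensities of each class. A larger J means the band separates the two classes better. The statistic, the function name, and the toy values are assumptions for illustration; the review does not say which measure the revision uses.

```python
from statistics import mean, pvariance
from typing import Sequence


def fisher_ratio(class_a: Sequence[float], class_b: Sequence[float]) -> float:
    """Fisher discriminant ratio of one spectral band between two pixel classes."""
    if len(class_a) < 1 or len(class_b) < 1:
        raise ValueError("each class needs at least one pixel")
    spread = pvariance(class_a) + pvariance(class_b)
    gap = (mean(class_a) - mean(class_b)) ** 2
    if spread == 0:
        return float("inf") if gap > 0 else 0.0
    return gap / spread


# Hypothetical toy intensities: identical in VIS, well separated in NIR.
vis_weed, vis_crop = [0.30, 0.32, 0.31], [0.31, 0.30, 0.32]
nir_weed, nir_crop = [0.40, 0.42, 0.41], [0.60, 0.62, 0.61]
print(fisher_ratio(vis_weed, vis_crop))  # 0.0
print(fisher_ratio(nir_weed, nir_crop))  # roughly 300
```

Reporting J for each band and each class pair would show directly whether the NIR band adds discriminative signal that the visible bands lack.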
Circularity Check
No significant circularity in empirical architecture proposal and evaluation
The manuscript proposes the SWNet architecture (PVTv2 backbone + bimodal gated fusion + edge refinement) and reports empirical outperformance on the Weeds-Banana dataset against ten external baselines. No derivation chain, equations, or first-principles predictions exist that reduce to fitted inputs by construction. The physiological NIR reflectance motivation is stated as domain knowledge rather than derived within the paper, and no self-citation load-bearing steps or self-definitional loops are present. The central claim remains an independent empirical comparison on held-out data.