USU-Corn-WeedDB: A UAV RGB Image Dataset for Multi-Species Weed Detection in Forage Corn

Aaron Etienne; Eric Westra; Rhonda Miller; Saroj Burlakoti; Sierra Young; Utsav Bhandari

arxiv: 2606.06709 · v1 · pith:72S3SLZYnew · submitted 2026-06-04 · 💻 cs.CV

Utsav Bhandari, Saroj Burlakoti, Rhonda Miller, Sierra Young, Eric Westra, Aaron Etienne

Pith reviewed 2026-06-28 01:33 UTC · model grok-4.3

classification 💻 cs.CV

keywords weed detection · UAV imagery · object detection · corn fields · dataset · bounding box annotation · semi-supervised learning

The pith

USU-Corn-WeedDB supplies 8800 UAV image patches with 10539 bounding boxes of three weed species for corn field detection models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents USU-Corn-WeedDB as a publicly released collection of drone-captured RGB images from a commercial forage corn field to fill the gap in representative training data for automated weed identification. The dataset tiles 366 images into 8800 patches at 640 by 640 pixels, with 800 of them manually labeled for common lambsquarters, redroot pigweed, and green foxtail while keeping the remaining 8000 unlabeled for semi-supervised experiments. Twenty-eight object detectors spanning YOLO and RT-DETR families trained on the labeled portion reach test mAP@0.5 values between 0.773 and 0.840, and lightweight variants perform competitively. A sympathetic reader would care because weed pressure can cut corn yields by as much as 31.5 percent and site-specific management depends on having field-realistic data to train systems that reduce blanket herbicide use.
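The tiling step can be pictured as a grid of patch origins laid over each full-resolution image. The following is a minimal sketch, not the authors' pipeline: the text here does not state the source image dimensions, the stride, or how edge remainders were handled, so the image size, the non-overlapping stride, and the drop-the-remainder policy below are all assumptions, and `tile_origins` is a hypothetical helper name.

```python
from typing import Iterator, Tuple


def tile_origins(width: int, height: int, tile: int = 640,
                 stride: int = 640) -> Iterator[Tuple[int, int]]:
    """Yield (x, y) top-left corners of full tiles.

    Assumption: partial tiles at the right and bottom edges are dropped.
    """
    for y in range(0, height - tile + 1, stride):
        for x in range(0, width - tile + 1, stride):
            yield (x, y)


# Hypothetical image size, chosen only to keep the illustration small.
origins = list(tile_origins(width=1920, height=1280))
print(len(origins), origins[:3])
```

Setting `stride` below `tile` would produce overlapping patches, which is one of the choices a full description of the dataset would need to pin down.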

Core claim

The paper establishes that USU-Corn-WeedDB, collected at approximately 0.48 cm per pixel from 10 m altitude, contains sufficient annotated instances (10539 bounding boxes across three species with intentional class imbalance) to train detection models that achieve mAP@0.5 scores from 0.773 to 0.840 under fixed training conditions. The 800 labeled patches plus 8000 unlabeled tiles are shown to support both fully supervised and semi-supervised pipelines; the reported results come from training 28 models from five architecture families under identical settings, without hyperparameter search.
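A few derived quantities follow directly from the reported counts. The sketch below is plain arithmetic on figures quoted on this page (the 53.86% redroot pigweed share comes from the abstract); the implied per-class count is a rounded estimate, not a number the paper reports, and the split of the remaining boxes between the other two species is not given here.

```python
LABELED_PATCHES = 800
UNLABELED_PATCHES = 8_000
TOTAL_BOXES = 10_539
PIGWEED_SHARE = 0.5386  # redroot pigweed share reported in the abstract

total_patches = LABELED_PATCHES + UNLABELED_PATCHES
labeled_fraction = LABELED_PATCHES / total_patches
mean_boxes_per_patch = TOTAL_BOXES / LABELED_PATCHES

# Implied by the reported share; rounded, so treat it as an estimate.
pigweed_boxes = round(TOTAL_BOXES * PIGWEED_SHARE)
# Common lambsquarters and green foxtail combined.
other_boxes = TOTAL_BOXES - pigweed_boxes

print(f"labeled fraction: {labeled_fraction:.1%}")
print(f"mean boxes per labeled patch: {mean_boxes_per_patch:.2f}")
print(f"implied pigweed boxes: {pigweed_boxes}, other species: {other_boxes}")
```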

What carries the argument

The USU-Corn-WeedDB dataset of tiled UAV RGB patches, 800 of which carry manual bounding-box annotations for three weed species together with an 8000-patch unlabeled pool.

If this is right

Multi-class detectors can be developed for the three named weed species in forage corn using the provided labels.
Lightweight models among the tested families are viable candidates for onboard UAV inference.
The unlabeled pool of 8000 tiles can be used directly in semi-supervised training loops.
The preserved natural class imbalance allows models to be trained under realistic frequency conditions rather than artificially balanced ones.
The dataset enables repeated benchmarking of new detection architectures on the same fixed train-test division.
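One common way to use an unlabeled pool in a semi-supervised loop is confidence-thresholded pseudo-labeling. The sketch below illustrates that generic idea only; this page does not describe the authors' semi-supervised method, and the `Detection` type, the 0.7 threshold, the class-id ordering, and the tile names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Detection:
    class_id: int      # 0, 1, 2 for the three weed species (ordering assumed)
    confidence: float  # detector score in [0, 1]
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def select_pseudo_labels(predictions: Dict[str, List[Detection]],
                         min_conf: float = 0.7) -> Dict[str, List[Detection]]:
    """Keep only confident detections; drop tiles left with none."""
    kept: Dict[str, List[Detection]] = {}
    for tile_name, detections in predictions.items():
        confident = [d for d in detections if d.confidence >= min_conf]
        if confident:
            kept[tile_name] = confident
    return kept


# Hypothetical detector output on two unlabeled tiles.
preds = {
    "tile_0001.jpg": [Detection(1, 0.91, (10, 20, 60, 80)),
                      Detection(0, 0.42, (100, 100, 140, 150))],
    "tile_0002.jpg": [Detection(2, 0.30, (5, 5, 25, 30))],
}
pseudo = select_pseudo_labels(preds)
print({name: len(dets) for name, dets in pseudo.items()})
```

With a dominant class in the labeled data, a single global threshold may favor that class in the pseudo-labels; per-class thresholds are one possible variation.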

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Replicating the exact flight altitude, date, and annotation protocol on other commercial corn fields would test whether the reported mAP range generalizes beyond the single Utah site.
Pairing the RGB patches with additional spectral bands could be examined to determine whether the current three-class performance improves without changing the annotation effort.
The 640 by 640 patch size and 0.48 cm ground sampling distance may constrain detection of very small seedlings or widely spaced weeds, suggesting a controlled experiment that varies patch resolution on the same source imagery.
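The resolution concern in the last point can be made concrete with the reported ground sampling distance. This is a minimal sketch using the 0.48 cm/pixel and 640-pixel figures from the paper; the 3 cm seedling width and the downscale factors are hypothetical values chosen for illustration.

```python
GSD_CM_PER_PX = 0.48  # approximate ground sampling distance from the paper
PATCH_PX = 640        # patch side length in pixels


def patch_side_m(gsd_cm_per_px: float = GSD_CM_PER_PX,
                 patch_px: int = PATCH_PX) -> float:
    """Ground length covered by one patch side, in metres."""
    return gsd_cm_per_px * patch_px / 100.0


def plant_width_px(plant_cm: float, gsd_cm_per_px: float = GSD_CM_PER_PX,
                   downscale: float = 1.0) -> float:
    """Apparent width in pixels; downscale=2 means half the resolution."""
    return plant_cm / (gsd_cm_per_px * downscale)


print(f"patch side on the ground: {patch_side_m():.3f} m")
for factor in (1.0, 2.0, 4.0):
    width = plant_width_px(3.0, downscale=factor)  # hypothetical 3 cm seedling
    print(f"downscale x{factor:g}: {width:.2f} px")
```

At native resolution a patch spans roughly 3 m per side, and a plant a few centimetres wide occupies only a handful of pixels, which is why varying the resolution on the same source imagery would be an informative experiment.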

Load-bearing premise

The 800 manually annotated patches carry accurate labels that faithfully represent the range of appearances and densities encountered in the actual field.

What would settle it

Training the same 28 models on the released split and then evaluating them on a newly collected and independently annotated set of images from the identical field location under comparable lighting would show whether mAP remains above 0.75.

Original abstract

Weed pressure in forage corn production causes yield losses of up to 31.5%, yet site-specific weed management (SSWM) systems built on UAV imagery and deep learning remain constrained by the scarcity of field-representative training datasets. We present USU-Corn-WeedDB, a publicly available UAV RGB image dataset collected from a commercial forage corn field in Cache Valley, Utah, designed to support multi-class weed detection under both supervised and semi-supervised learning frameworks. RGB imagery was acquired on 27 June 2025 using an Autel EVO II Dual 640T V2 drone at ~10 m above ground level, yielding a ground sampling distance of approximately 0.48 cm/pixel. A total of 366 full-resolution images were tiled into 8,800 patches at 640 x 640-pixel resolution. Of these, 800 images were manually annotated for three weed species: common lambsquarters (Chenopodium album), redroot pigweed (Amaranthus retroflexus), and green foxtail (Setaria viridis), comprising 10,539 bounding-box instances, with the remaining 8,000 tiles retained as an unlabeled pool for semi-supervised experiments. This dataset reflects a natural class imbalance where redroot pigweed constitutes 53.86% of annotated instances, which was preserved intentionally to mirror real field conditions. To validate dataset utility, we trained 28 object detection models spanning five architecture families including YOLOv8, YOLOv9, YOLOv10, YOLO11, YOLO26, and RT-DETR under identical conditions without hyperparameter tuning. Test set mAP@0.5 ranged from 0.773 to 0.840, with lightweight models achieving competitive performance relevant to edge-deployed UAV systems. USU-Corn-WeedDB is publicly available at https://doi.org/10.5281/zenodo.20044178.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a straightforward dataset paper that releases usable UAV weed imagery with baselines, but the missing annotation details are a real gap.

The one thing to know is that the authors have put out a new public UAV dataset for three weed species in forage corn, collected at high resolution from a real commercial field, along with mAP numbers from 28 standard detectors. The data is available on Zenodo and they kept the natural imbalance instead of balancing it artificially.

What the paper does well is the practical setup: 0.48 cm/pixel GSD from 10 m altitude, tiling into 640x640 patches, 800 annotated images with over 10k boxes, and an 8000-image unlabeled pool for semi-supervised work. Running YOLOv8 through YOLO11, YOLO26, and RT-DETR under fixed conditions without tuning gives a clear baseline picture, and the lightweight models performing competitively is useful for edge UAV deployment. The collection parameters and class counts are described clearly enough to replicate the acquisition side.

The soft spot is exactly what the stress test flags. The abstract says the 800 patches were "manually annotated" but gives no protocol, no annotator count, no inter-annotator agreement, and no split criteria. Without that, the 0.773–0.840 mAP range is hard to interpret: if the labels have systematic errors or the test set isn't independent, the numbers lose value. Everything else (public release, imbalance preservation, no hyperparameter search) is secondary to this.

This is for people working on precision agriculture and agricultural computer vision who need field-representative weed data. It is the kind of incremental but concrete resource that gets used. The work shows clear thinking on the application side and honest engagement with the need for real imbalance, so it deserves peer review. I'd send it out, but the referees will need to press on the labeling process.

Referee Report

2 major / 1 minor

Summary. The paper presents USU-Corn-WeedDB, a publicly released UAV RGB dataset collected from a commercial forage corn field in Utah. It describes the acquisition of 366 full-resolution images at ~0.48 cm/pixel GSD, tiled into 8,800 patches of 640x640 pixels. Of these, 800 patches were manually annotated for three weed species (common lambsquarters, redroot pigweed, green foxtail), yielding 10,539 bounding boxes with the natural class imbalance preserved (redroot pigweed at 53.86%); the remaining 8,000 patches are retained unlabeled for semi-supervised use. For validation, 28 object detectors across five architecture families (YOLOv8–YOLO26, RT-DETR) were trained under fixed conditions without hyperparameter tuning, with a reported test mAP@0.5 of 0.773–0.840.

Significance. If the provided labels prove accurate and the splits representative, the dataset would meaningfully address the scarcity of field-representative UAV data for site-specific weed management. Strengths include the public Zenodo release, intentional retention of natural class imbalance, evaluation across 28 models from multiple families, and demonstration that lightweight detectors achieve competitive performance relevant to edge UAV deployment. These elements support reproducibility and practical utility.

major comments (2)

[Abstract] The statement that the 800 patches were 'manually annotated' for 10,539 bounding-box instances provides no annotation protocol, number of annotators, guidelines, quality-assurance steps, or inter-annotator agreement metric. This information is required to establish the reliability of the labels that underpin the reported mAP values and the claim that the dataset supports multi-class detection.
[Abstract] No details are given on how the 800 annotated patches were partitioned into training and test sets (or on any stratification to preserve class imbalance or field variability). Without this, it is impossible to verify that the test mAP@0.5 range of 0.773–0.840 reflects performance on an independent, representative hold-out set.
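The independence concern in the second comment arises because patches tiled from the same source image can share plants, lighting, and soil texture. One way to guard against that kind of leakage is to split by source image rather than by patch. The sketch below illustrates this under stated assumptions: the patch naming scheme, the 20% test fraction, and the `split_by_source` helper are hypothetical, and nothing here is known to match the split the authors used.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def split_by_source(patch_to_source: Dict[str, str],
                    test_fraction: float = 0.2,
                    seed: int = 0) -> Tuple[List[str], List[str]]:
    """Assign whole source images to train or test, never splitting one."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for patch, source in patch_to_source.items():
        groups[source].append(patch)

    sources = sorted(groups)               # fixed order before shuffling
    random.Random(seed).shuffle(sources)   # reproducible for a given seed
    n_test = max(1, round(len(sources) * test_fraction))
    test_sources = set(sources[:n_test])

    train = sorted(p for s in sources if s not in test_sources
                   for p in groups[s])
    test = sorted(p for s in test_sources for p in groups[s])
    return train, test


# Hypothetical naming: 10 source images with 4 annotated patches each.
patch_to_source = {f"img{i:03d}_tile{j}": f"img{i:03d}"
                   for i in range(10) for j in range(4)}
train, test = split_by_source(patch_to_source)
print(len(train), len(test))
```

This sketch does not stratify by species, so class proportions in the two sets should be checked after splitting, especially for the less frequent classes.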

minor comments (1)

[Abstract] The collection date '27 June 2025' appears in the abstract; confirm whether this is the intended date or a typographical error.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive review. The comments highlight important aspects of dataset documentation that will improve the manuscript's clarity and utility. We address each point below and will revise the manuscript accordingly.

Referee: [Abstract] The statement that the 800 patches were 'manually annotated' for 10,539 bounding-box instances provides no annotation protocol, number of annotators, guidelines, quality-assurance steps, or inter-annotator agreement metric. This information is required to establish the reliability of the labels that underpin the reported mAP values and the claim that the dataset supports multi-class detection.

Authors: We agree that annotation details are necessary to substantiate label reliability. The full manuscript will be revised to include a new subsection in the Methods describing the annotation protocol (including tools used and class definitions), the number of annotators, annotation guidelines, quality-assurance steps such as review by a domain expert, and any inter-annotator agreement metrics. These additions will directly support the reliability of the reported mAP values.
revision: yes

Referee: [Abstract] No details are given on how the 800 annotated patches were partitioned into training and test sets (or on any stratification to preserve class imbalance or field variability). Without this, it is impossible to verify that the test mAP@0.5 range of 0.773–0.840 reflects performance on an independent, representative hold-out set.

Authors: We acknowledge that explicit partitioning details are required for reproducibility. The revised manuscript will add a clear description of the train/test split procedure for the 800 annotated patches, including the split ratio, the method used to ensure independence, and any stratification applied to preserve the observed class imbalance (redroot pigweed at 53.86%) and field variability across the original 366 images.
revision: yes

Circularity Check

0 steps flagged

No circularity; direct empirical dataset release and standard model benchmarking.

The paper presents a new UAV image dataset with manual annotations and evaluates 28 standard object-detection models using conventional mAP@0.5 metrics under fixed training conditions. No equations, fitted parameters, predictions derived from those parameters, uniqueness theorems, or self-citations appear in the provided text. All reported numbers are direct outputs of external training runs on the released data; nothing reduces to a self-referential definition or fit. This is the normal non-circular case for a dataset paper.
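For readers less familiar with the metric, average precision for one class can be computed from confidence-ranked detections once each has been marked as a true or false positive at the 0.5 IoU threshold; mAP@0.5 is then the mean over classes. The sketch below uses all-point interpolation of the precision-recall curve. It is an illustration of the general definition, not the evaluation code behind the reported numbers, and toolkits differ in interpolation details, so small numeric differences are to be expected. The example detections are hypothetical.

```python
from typing import List, Tuple


def average_precision(detections: List[Tuple[float, bool]],
                      num_ground_truth: int) -> float:
    """AP for one class from (confidence, is_true_positive) pairs.

    Assumption: is_true_positive was already decided by matching each
    detection to an unused ground-truth box at IoU >= 0.5.
    """
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the enveloped precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical class with 2 ground-truth boxes and 3 detections.
ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], 2)
print(round(ap, 4))
```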

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

This is a dataset release paper with no mathematical modeling or parameter fitting. It rests on standard domain assumptions in remote sensing and computer vision.

axioms (1)

domain assumption: Manual bounding-box annotations accurately capture weed instances in the RGB images.
Invoked implicitly when claiming the 10,539 instances support model training; no inter-annotator metrics provided.
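If a subset of patches were labeled independently by two annotators, one simple agreement measure would be an F1 score over boxes matched at an IoU threshold. The sketch below is one possible formulation, assuming greedy one-to-one matching and ignoring class labels; it is not a protocol taken from the paper, and the example boxes are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def agreement_f1(boxes_a: List[Box], boxes_b: List[Box],
                 threshold: float = 0.5) -> float:
    """F1 over greedily matched box pairs with IoU >= threshold."""
    if not boxes_a and not boxes_b:
        return 1.0  # both annotators agree the patch is empty
    pairs = sorted(((iou(a, b), i, j)
                    for i, a in enumerate(boxes_a)
                    for j, b in enumerate(boxes_b)), reverse=True)
    used_a, used_b = set(), set()
    matches = 0
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs overlap even less
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        matches += 1
    return 2 * matches / (len(boxes_a) + len(boxes_b))


# Hypothetical boxes from two annotators on the same patch.
annotator_a = [(0, 0, 10, 10), (20, 20, 30, 30)]
annotator_b = [(1, 1, 11, 11), (50, 50, 60, 60)]
print(agreement_f1(annotator_a, annotator_b))
```

A class-aware variant would additionally require matched boxes to carry the same species label, which matters for visually similar broadleaf seedlings.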

pith-pipeline@v0.9.1-grok · 5915 in / 1277 out tokens · 37092 ms · 2026-06-28T01:33:58.928086+00:00 · methodology

Reference graph

Works this paper leans on

16 extracted references · 9 canonical work pages

[1] Background. Weed pressure remains one of the most significant challenges in crop production, with annual yield losses up to 31.5% [1]. Site-specific weed management (SSWM) systems built on unmanned aerial vehicle (UAV) imagery and deep learning-based detection offer a pathway to weed management with efficient use of herbicides [2]. Yet despite meaningful ...

[3] Technical validation. To validate the utility of USU-Corn-WeedDB for supervised weed detection, a benchmark evaluation was conducted using 28 single-stage object detection models spanning five architecture families: YOLOv8, YOLOv9, YOLOv10, YOLO11, YOLO26, and RT-DETR. All models were trained on the designated training split and evaluated on both the valid...

[4] Data availability. The full dataset is publicly available in: https://doi.org/10.5281/zenodo.20044178

[5] Limitations. Several limitations should be considered when utilizing this dataset. All images were collected from a single field in Cache Valley, Utah, on a single acquisition day at a fixed altitude of ~13 m above ground level under clear midday conditions. This means the dataset captures neither geographic nor temporal variability across crop and weed gr...

[6] A. Kubiak, A. Wolna-Maruwka, A. Niewiadomska, A.A. Pilarska, The Problem of Weed Infestation of Agricultural Plantations vs. the Assumptions of the European Biodiversity Strategy, Agronomy 12 (2022). https://doi.org/10.3390/agronomy12081808

[7] A. Etienne, A. Ahmad, V. Aggarwal, D. Saraswat, Deep Learning-Based Object Detection System for Identifying Weeds Using UAS Imagery, Remote Sensing 13 (2021). https://doi.org/10.3390/rs13245182

[8] C. Wang, B. Balasubramaniyam, A. Sangem, N. Guevara, D. Caragea, Practical Insights into Semi-Supervised Object Detection Approaches, (2026). https://doi.org/10.48550/arXiv.2601.13380

[9] A. Upadhyay, S.G. C, M.V. Mahecha, J. Mettler, K. Howatt, W. Aderholdt, M. Ostlie, X. Sun, Weed-crop dataset in precision agriculture: Resource for AI-based robotic weed control systems, Data in Brief 60 (2025) 111486. https://doi.org/10.1016/j.dib.2025.111486

[10] M. Xu, S. Yoon, A. Fuentes, D.S. Park, A Comprehensive Survey of Image Augmentation Techniques for Deep Learning, Pattern Recognition 137 (2023) 109347. https://doi.org/10.1016/j.patcog.2023.109347

[11] U. Bhandari, A. Etienne, Precision weed detection using UAVs and deep learning: Models, paradigms, and challenges, Smart Agricultural Technology 13 (2026) 101656. https://doi.org/10.1016/j.atech.2025.101656

[12] T. Liu, X. Jin, L. Zhang, J. Wang, Y. Chen, C. Hu, J. Yu, Semi-supervised learning and attention mechanism for weed detection in wheat, Crop Protection 174 (2023) 106389. https://doi.org/10.1016/j.cropro.2023.106389

[13] S. Konstantakos, J. Cani, I. Mademlis, D.I. Chalkiadaki, Y.M. Asano, E. Gavves, G.Th. Papadopoulos, Self-supervised visual learning in the low-data regime: A comparative evaluation, Neurocomputing 620 (2025) 129199. https://doi.org/10.1016/j.neucom.2024.129199

[14] N.R.C.S. USDA, Web Soil Survey [dataset], 18, 2025. https://websoilsurvey.nrcs.usda.gov/app/WebSoilSurvey.aspx

[15] Autel Robotics, EVO II Dual 640T V3, Autel Robotics, Bothell, WA, 2025. https://www.autelrobotics.com/productdetail/evo-ii-dual-640t-drones (accessed May 11, 2026)

[16] S. Charette, Darknet/YOLO, (2026). https://www.ccoderun.ca/programming/darknet_faq/#negative_samples (accessed April 16, 2026)

[17] U. Bhandari, S. Burlakoti, R. Miller, S. Young, E. Westra, A. Etienne, Multi-species weed detection using supervised (CNN and transformer-based) and semi-supervised learning approaches in corn field, ASABE Paper No. 2600368, 2026 ASABE Annual International Meeting, Indianapolis, Indiana, USA, July 12–15, 2026. CRediT Author Statement Utsav Bhandari: Conce...
