CANSURF: An ASV-View Can Dataset and Benchmark for Detection and Tracking of Surface-Level Debris
Pith reviewed 2026-05-19 21:49 UTC · model grok-4.3
The pith
A dataset tailored to aluminum cans on water surfaces is reported to improve YOLOv11 detection performance twelvefold over training on generic datasets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors create and release CANSURF, consisting of annotated ASV-view images of aluminum cans on water, and show that training YOLOv11 on it boosts performance 12x compared to generic datasets. They find that YOLOv11 combined with ByteTrack gives the most stable tracks, while YOLOv11 with SAHI is better for detecting the maximum number of cans in single-can pickup scenarios. The dataset addresses the lack of prior open data for this specific marine debris detection from surface level.
What carries the argument
The CANSURF dataset of surface-level can images from ASV viewpoint with bounding box annotations and multiple augmentation types, used to train and evaluate detection and tracking pipelines.
If this is right
- YOLOv11 models achieve higher accuracy in detecting cans when trained on CANSURF rather than generic image collections.
- Using ByteTrack with YOLOv11 results in fewer identity switches during tracking of multiple cans.
- SAHI integration with YOLOv11 increases recall for far-field cans at the cost of lower precision on full-context inputs.
- Single-can pickup operations benefit more from the SAHI-enhanced detector for maximizing detections.
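The precision and recall trade-off in the points above can be made concrete with a small sketch. It assumes axis-aligned boxes in pixel coordinates and a greedy one-to-one match at an IoU threshold of 0.5; the function names, the threshold, and the toy boxes are illustrative assumptions, not the paper's evaluation code.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Tuple[Box, float]], truths: List[Box],
                     iou_thr: float = 0.5) -> Tuple[float, float]:
    """Greedy one-to-one matching, highest-confidence predictions first."""
    unmatched = list(range(len(truths)))
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = None, iou_thr
        for j in unmatched:
            v = iou(box, truths[j])
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j is not None:
            unmatched.remove(best_j)  # each ground-truth box is matched once
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(truths) if truths else 0.0
    return precision, recall


# Toy frame (made-up boxes): one near can, one far can.
truths = [(0, 0, 10, 10), (100, 100, 110, 110)]
full_frame = [((0, 0, 10, 10), 0.9)]                      # misses the far can
sliced = full_frame + [((100, 100, 110, 110), 0.6),       # finds the far can
                       ((50, 50, 60, 60), 0.4)]           # plus one false alarm
print(precision_recall(full_frame, truths))
print(precision_recall(sliced, truths))
```

In the toy frame the extra detections raise recall and lower precision, which is the pattern the abstract attributes to the SAHI variant.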
Where Pith is reading between the lines
- This dataset could be extended to include other types of floating debris for broader cleanup applications.
- Real-world ASV deployments could test these models to validate performance beyond augmented data.
- Integration with robotic grasping systems might enable end-to-end autonomous debris collection using these detection methods.
Load-bearing premise
The collected raw images and the ten augmentation types produce a training distribution that is sufficiently representative of real ASV operating conditions including glare, ripples, and partial submersion.
What would settle it
A field test would settle it: an ASV equipped with a camera records new videos of cans on water under varying conditions. If a model trained only on CANSURF shows no significant improvement in detection rate or tracking stability over one trained on generic datasets, the claim is falsified.
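One way to make "no significant improvement" operational is a paired bootstrap over field clips. This is a minimal sketch, assuming each clip yields one detection rate per model. The function name, the resampling count, and the example rates are hypothetical and do not come from the paper.

```python
import random
from typing import List, Tuple


def paired_bootstrap_ci(a: List[float], b: List[float], n_boot: int = 2000,
                        alpha: float = 0.05, seed: int = 0) -> Tuple[float, float, float]:
    """Return (mean difference, CI low, CI high) for a - b over paired clips."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equal length")
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    means = []
    for _ in range(n_boot):
        # Resample clips with replacement, keeping each pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(diffs) / n, lo, hi


# Hypothetical per-clip detection rates, for illustration only.
cansurf_model = [0.80, 0.70, 0.90, 0.85]
generic_model = [0.30, 0.20, 0.40, 0.35]
mean_diff, ci_low, ci_high = paired_bootstrap_ci(cansurf_model, generic_model)
print(mean_diff, ci_low, ci_high)
```

If the interval for the difference includes zero on new field data, the test offers no evidence that CANSURF training helps outside its own distribution.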
Original abstract
Surface-level marine debris remains a practical bottleneck for autonomous clean-up, where small, reflective targets (e.g., aluminum cans) must be detected at distance under glare, ripples, and partial submersion. This paper presents an ASV vision system and a new surface-can dataset. The dataset comprises ~7.3k raw images extracted from videos and annotated with bounding boxes, expanded via ten augmentation types to ~57k training/validation images spanning diverse lighting and water states. A family of detector and detector-tracker pipelines tailored to surface operations was benchmarked. Training YOLOv11 on CANSURF boosts performance 12x over generic datasets, highlighting the dataset's value. Experiments show that YOLOv11+ByteTrack yields the most stable tracks (fewer identity switches) and stronger multi-object accuracy under, while YOLOv11+SAHI increases recall on far-field cans at the cost of lower precision in full-context inputs. Given the mission profile, single-can pickup with approach and grab, YOLOv11+SAHI proves better for detecting the maximum number of cans. No prior open dataset targets aluminum cans on water from a surface-level viewpoint; this dataset fills this gap and supports reproducible evaluation.
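The abstract measures track stability by identity switches. The sketch below shows one simplified way to count them, assuming ground-truth objects have already been matched to tracker IDs in each frame. The function name and the `can_a`/`can_b` labels are hypothetical, and full MOT metrics handle matching and missed frames with more care than this.

```python
from typing import Dict, List


def count_id_switches(frames: List[Dict[str, int]]) -> int:
    """Count how often a ground-truth object changes its assigned track ID.

    Each frame maps a ground-truth object name to the track ID matched to it.
    Objects that are missed in a frame are simply absent from that dict.
    """
    last_id: Dict[str, int] = {}
    switches = 0
    for frame in frames:
        for gt_name, track_id in frame.items():
            if gt_name in last_id and last_id[gt_name] != track_id:
                switches += 1
            last_id[gt_name] = track_id
    return switches


frames = [
    {"can_a": 1, "can_b": 2},
    {"can_a": 1, "can_b": 2},
    {"can_a": 3},              # can_a is re-identified as track 3; can_b is missed
    {"can_a": 3, "can_b": 2},  # can_b returns with its old ID, so no switch
]
print(count_id_switches(frames))  # 1
```

A tracker that keeps the same ID through a brief miss, as for `can_b` here, scores better on this count than one that starts a fresh track.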
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the CANSURF dataset of ~7.3k raw ASV-view images of surface-level aluminum cans, expanded via ten augmentation types to ~57k images, and benchmarks YOLOv11-based detection and tracking pipelines (including combinations with ByteTrack and SAHI). It claims that training YOLOv11 on CANSURF yields a 12x performance boost over generic datasets, with YOLOv11+ByteTrack providing stable tracks and YOLOv11+SAHI improving far-field recall, filling a gap as the first open dataset for this specific viewpoint and target.
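The far-field recall gain attributed to SAHI comes from running the detector on overlapping tiles of the frame, so that small cans occupy more of each input. The sketch below only computes the tile windows. It is not the `sahi` library's API, and the tile size, overlap ratio, and function name are illustrative assumptions.

```python
from typing import List, Tuple


def slice_windows(width: int, height: int, tile: int = 640,
                  overlap: float = 0.2) -> List[Tuple[int, int, int, int]]:
    """Return (x1, y1, x2, y2) tiles covering the image with the given overlap."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, int(round(tile * (1.0 - overlap))))

    def starts(size: int) -> List[int]:
        if size <= tile:
            return [0]
        out = list(range(0, size - tile, step))
        out.append(size - tile)  # last tile sits flush with the image border
        return out

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]


tiles = slice_windows(1920, 1080)
print(len(tiles), tiles[0], tiles[-1])
```

Detections from each tile are shifted by the tile's `(x1, y1)` offset back into full-frame coordinates and then merged, typically with non-maximum suppression. Extra false positives from that merge step are one plausible source of the precision loss the abstract reports.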
Significance. If the reported gains are reproducible and the dataset distribution matches real ASV conditions, this work provides a practical resource for marine debris detection in autonomous clean-up missions. The contribution lies in domain-specific data collection and straightforward benchmarking rather than novel algorithms; the absence of prior open datasets for aluminum cans on water from surface level makes the release potentially useful for reproducible evaluation in robotics and CV applications.
major comments (2)
- [Abstract] Abstract: The central claim that 'Training YOLOv11 on CANSURF boosts performance 12x over generic datasets' is presented without any supporting quantitative metrics (e.g., mAP, precision, recall values), baseline numbers from the generic datasets, error bars, or statistical tests. This absence leaves the headline result unverified and load-bearing for the paper's assertion of the dataset's value.
- [Experiments] Experiments / Benchmark section: No distribution statistics, failure-case analysis, or external real-world test set is provided to substantiate that the ~7.3k raw frames plus the ten augmentation types produce a training distribution representative of actual ASV conditions (glare, ripples, partial submersion). Without this, the measured gains risk being an in-distribution artifact rather than evidence of practical utility.
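A quick arithmetic check on the stated dataset sizes illustrates why distribution statistics would help. The sketch treats the approximate figures from the abstract as exact, which is an assumption made only for illustration.

```python
RAW_IMAGES = 7_300        # "~7.3k raw images", treated as exact for illustration
EXPANDED_IMAGES = 57_000  # "~57k training/validation images", likewise
AUGMENTATION_TYPES = 10

expansion = EXPANDED_IMAGES / RAW_IMAGES
# Size if every raw image were kept and also received all ten augmentations.
full_expansion = RAW_IMAGES * (1 + AUGMENTATION_TYPES)

print(f"expansion factor: {expansion:.1f}x")
print(f"size under full augmentation: {full_expansion}")
```

The expansion factor of roughly 7.8 is below the 11 that full augmentation of every raw image would give. One possible reading is that some raw frames were held out or that not every augmentation was applied to every image; the abstract does not say which, so a per-augmentation breakdown would resolve it.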
minor comments (2)
- [Abstract] The abstract mentions 'stronger multi-object accuracy under' but the sentence appears truncated; clarify the exact condition or metric being referenced.
- [Dataset] Annotation quality and inter-annotator agreement are not discussed; adding a brief description of the annotation protocol would improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating where revisions will be made to strengthen the paper.
- Referee: [Abstract] Abstract: The central claim that 'Training YOLOv11 on CANSURF boosts performance 12x over generic datasets' is presented without any supporting quantitative metrics (e.g., mAP, precision, recall values), baseline numbers from the generic datasets, error bars, or statistical tests. This absence leaves the headline result unverified and load-bearing for the paper's assertion of the dataset's value.
Authors: We agree that the abstract would benefit from explicit quantitative support for the reported performance gain. In the revised manuscript, we will update the abstract to include the specific metrics underlying the 12x claim, such as the mAP@0.5 and mAP@0.5:0.95 values for YOLOv11 trained on CANSURF versus the generic baselines, along with the corresponding precision and recall figures. These numbers are already detailed in the experiments section and will now be referenced directly in the abstract for immediate verifiability. revision: yes
- Referee: [Experiments] Experiments / Benchmark section: No distribution statistics, failure-case analysis, or external real-world test set is provided to substantiate that the ~7.3k raw frames plus the ten augmentation types produce a training distribution representative of actual ASV conditions (glare, ripples, partial submersion). Without this, the measured gains risk being an in-distribution artifact rather than evidence of practical utility.
Authors: We acknowledge the value of additional evidence for dataset representativeness. In the revision, we will add distribution statistics for the raw frames (e.g., histograms and breakdowns across lighting conditions, ripple intensity, and submersion levels) and a new failure-case analysis subsection that examines detection errors under challenging ASV conditions and how the augmentations mitigate them. Our current test split is drawn from temporally held-out ASV video sequences to approximate real deployment; we will explicitly discuss this as a limitation and note that a fully independent external test set collected on different platforms or dates is not available in the present work. revision: partial
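The temporally held-out split mentioned in the response can be sketched as holding out whole video sequences, so that near-duplicate neighbouring frames never straddle the train/test boundary. The data layout, function name, and file names below are hypothetical; the rebuttal does not describe the actual procedure.

```python
from typing import Dict, List, Tuple


def temporal_holdout(videos: Dict[str, Tuple[int, List[str]]],
                     test_fraction: float = 0.2):
    """Hold out the most recently recorded sequences as the test set.

    `videos` maps a video ID to (recording order, list of frame file names).
    """
    ordered = sorted(videos, key=lambda v: videos[v][0])
    n_test = max(1, round(len(ordered) * test_fraction))
    train_ids, test_ids = ordered[:-n_test], ordered[-n_test:]
    train = [f for v in train_ids for f in videos[v][1]]
    test = [f for v in test_ids for f in videos[v][1]]
    return train, test, train_ids, test_ids


# Hypothetical sequences, for illustration only.
videos = {
    "seq_a": (1, ["a_000.jpg", "a_001.jpg"]),
    "seq_b": (2, ["b_000.jpg"]),
    "seq_c": (3, ["c_000.jpg", "c_001.jpg"]),
    "seq_d": (4, ["d_000.jpg"]),
    "seq_e": (5, ["e_000.jpg", "e_001.jpg"]),
}
train, test, train_ids, test_ids = temporal_holdout(videos)
print(train_ids, test_ids)
```

A split of this kind still shares platform, camera, and site with the training data, which is why the response concedes that a fully independent external test set is not available.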
Circularity Check
No circularity: empirical dataset collection and external benchmarking
The paper introduces a new surface-can dataset from ~7.3k raw ASV video frames plus ten augmentations, then benchmarks standard off-the-shelf detectors (YOLOv11, ByteTrack, SAHI) and reports measured performance gains against generic external datasets. No equations, derivations, fitted parameters, or self-citation chains appear in the abstract or described content. All load-bearing claims rest on new data collection and reproducible evaluation against independent baselines rather than any reduction to prior fitted inputs or author-specific uniqueness theorems. This is a standard empirical contribution whose central result (12x boost) is externally falsifiable and does not reduce by construction to its own inputs.