SAM3-Assisted Training of Lightweight YOLO Models for Precision Pig Farming
Pith reviewed 2026-06-29 23:02 UTC · model grok-4.3
The pith
SAM 3 zero-shot labels train a YOLOv8m to 79.4% mAP on pig images with no human annotation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Treating SAM 3 as an offline auto-annotator produces YOLOv8 models that need no human labels. The YOLOv8m variant reaches 79.4 percent mean average precision (mAP) on the PigLife pig dataset, with inference latency roughly 200 times lower than that of the teacher model. On low-occlusion subsets, the automated pipeline reaches AP50 above 99 percent, comparable to the human-annotated benchmarks.
What carries the argument
SAM 3 zero-shot pseudo-label generation used as training supervision for YOLOv8 detectors
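As an illustration of what "pseudo-labels as training supervision" can mean in practice, here is a minimal sketch of turning one segmentation mask into a YOLO-style detection label. It assumes the mask is already available as a binary grid (a list of rows of 0/1); how SAM 3 is invoked, and whether the paper derives boxes this way, is not stated in the text, so `mask_to_yolo_line` and its inputs are hypothetical.

```python
def mask_to_yolo_line(mask, class_id=0):
    """Convert one binary mask (list of rows of 0/1) into a YOLO label line.

    Output format: "class x_center y_center width height", with all four
    box values normalized to [0, 1] by the image width and height.
    The mask layout and the function name are assumptions for illustration.
    """
    if not mask or not mask[0]:
        return None  # no image data
    height = len(mask)
    width = len(mask[0])

    # Rows and columns that contain at least one foreground pixel.
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for x in range(width) if any(row[x] for row in mask)]
    if not ys or not xs:
        return None  # empty mask: nothing to label

    # Tight bounding box; the max edges are exclusive pixel coordinates.
    x_min, x_max = xs[0], xs[-1] + 1
    y_min, y_max = ys[0], ys[-1] + 1

    x_center = (x_min + x_max) / 2 / width
    y_center = (y_min + y_max) / 2 / height
    box_w = (x_max - x_min) / width
    box_h = (y_max - y_min) / height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}"


if __name__ == "__main__":
    demo_mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    print(mask_to_yolo_line(demo_mask))
```

In the usual YOLO dataset layout, each image has a text file with one such line per detected animal, so a detector can be trained on these files exactly as it would be on human-drawn boxes.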
If this is right
- Lightweight YOLO models become deployable on edge devices for continuous pig monitoring without manual annotation labor.
- Low-occlusion detection performance reaches levels previously requiring human labels, enabling selective use of the automated pipeline.
- The knowledge-distillation step removes the computational cost of running the foundation model at inference time.
- The pipeline scales to new images or farms by simply running SAM 3 offline rather than hiring annotators.
Where Pith is reading between the lines
- The same auto-annotation route could be tested on other livestock species or farm objects where occlusion patterns differ.
- Combining the pipeline with a small amount of human verification on difficult cases might close any remaining gap to fully supervised performance.
- If SAM 3 pseudo-label quality varies by camera angle or lighting, farm-specific fine-tuning of the labeler could become necessary.
Load-bearing premise
Zero-shot pseudo-labels from SAM 3 on the PigLife dataset are accurate enough that models trained on them lose little accuracy compared with models trained on human bounding boxes.
What would settle it
A side-by-side experiment in which the same YOLO architecture trained on SAM 3 labels shows more than a 10-point mAP gap versus the identical architecture trained on human labels across the full PigLife test set.
Original abstract
Deep learning-based object detection has revolutionized Precision Livestock Farming (PLF), yet a critical barrier remains: high-performance Foundation Models (such as SAM 3) are too computationally intensive for edge deployment, while lightweight models (like YOLO) require prohibitive manual annotation efforts. This work proposes a fully automated knowledge distillation pipeline that leverages the Segment Anything Model 3 (SAM 3) to generate zero-shot pseudo-labels for training efficient YOLOv8 detectors. By treating SAM 3 as an offline auto-annotator, we eliminate the manual labeling bottleneck, producing models capable of real-time inference on resource-constrained hardware. We systematically evaluate this approach on the PigLife dataset, comparing SAM 3-supervised models against human-annotated baselines. Results demonstrate that a SAM 3-trained YOLOv8m achieves a mean Average Precision (mAP) of 79.4% without human intervention, while reducing inference latency by approximately 200$\times$ compared to the teacher model. Furthermore, stratified analysis reveals that in low-occlusion scenarios, the automated pipeline achieves detection rates comparable to human benchmarks ($AP_{50} > 99\%$). These findings indicate that foundation models can serve as effective, zero-annotation-cost supervisors, enabling scalable edge computing solutions for smart agriculture.
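For readers unfamiliar with the AP50 figure quoted above, the following sketch shows one common way to compute average precision for a single class once each detection has been marked as a true or false positive at IoU 0.5. It uses all-point interpolation of the precision-recall curve; the paper's evaluation tool may use a different convention, and the function name and input layout are assumptions.

```python
def average_precision(scored_hits, num_ground_truth):
    """Area under the precision-recall curve for one class.

    scored_hits: list of (confidence, is_true_positive) pairs, where the
    true-positive flag was decided beforehand at a fixed IoU threshold
    (0.5 for AP50). num_ground_truth: number of annotated objects.
    """
    if num_ground_truth == 0:
        return 0.0

    # Rank detections from most to least confident.
    ranked = sorted(scored_hits, key=lambda d: d[0], reverse=True)

    tp = fp = 0
    recalls, precisions = [], []
    for _, is_hit in ranked:
        if is_hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Precision envelope: each point takes the best precision achievable
    # at that recall or any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas between consecutive recall values.
    ap = 0.0
    prev_recall = 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    toy = [(0.9, True), (0.8, False), (0.7, True)]  # toy values, not paper data
    print(round(average_precision(toy, num_ground_truth=2), 4))
```

A stratified analysis like the one the abstract describes would run this separately on each occlusion subset, which is how a low-occlusion AP50 can sit far above the overall mAP.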
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a knowledge distillation pipeline that uses SAM 3 as an offline auto-annotator to generate zero-shot pseudo-labels for training lightweight YOLOv8 detectors on the PigLife dataset. It reports that a SAM 3-trained YOLOv8m model reaches 79.4% mAP without human intervention, achieves AP50 >99% in low-occlusion cases, and reduces inference latency by ~200× relative to the teacher model.
Significance. If the pseudo-label quality claim holds, the approach would remove the manual annotation bottleneck for precision livestock farming, enabling scalable training and edge deployment of efficient detectors.
major comments (2)
- [Abstract] The central performance claim (79.4% mAP for SAM 3-trained YOLOv8m) is presented without a numeric mAP value for the human-annotated baseline, so the assertion of 'no substantial performance loss' cannot be evaluated.
- [Abstract] No IoU, precision-recall, or agreement statistics between SAM 3 zero-shot pseudo-labels and human ground truth on PigLife are supplied, leaving the quality of the training signal unquantified despite its load-bearing role for the distillation result.
minor comments (1)
- [Abstract] Dataset statistics, error bars, and ablation details are absent from the summary of results, reducing interpretability of the reported figures.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on the abstract. We address each point below and will revise the manuscript to improve clarity and completeness.
- Referee: [Abstract] The central performance claim (79.4% mAP for SAM 3-trained YOLOv8m) is presented without a numeric mAP value for the human-annotated baseline, so the assertion of 'no substantial performance loss' cannot be evaluated.
  Authors: We agree that the abstract should include the baseline mAP to allow readers to evaluate the claim directly. The manuscript body reports the human-annotated YOLOv8m baseline performance. In the revised version, we will update the abstract to explicitly include this numeric baseline mAP value alongside the 79.4% result. Revision: yes.
- Referee: [Abstract] No IoU, precision-recall, or agreement statistics between SAM 3 zero-shot pseudo-labels and human ground truth on PigLife are supplied, leaving the quality of the training signal unquantified despite its load-bearing role for the distillation result.
  Authors: The current manuscript evaluates the pipeline via end-to-end detector performance on the PigLife test set rather than intermediate pseudo-label agreement. We acknowledge that direct quantification of SAM 3 pseudo-label quality would be informative. We will add these statistics (IoU, precision-recall, and agreement metrics) computed on a held-out subset in the revised manuscript. Revision: yes.
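The agreement statistics discussed in this exchange could be computed along the following lines. This is a sketch under stated assumptions, not the authors' procedure: boxes are taken to be `(x_min, y_min, x_max, y_max)` tuples for a single image, pseudo-labels are matched to human boxes greedily by descending IoU, and the 0.5 threshold and all function names are illustrative choices.

```python
def box_iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_agreement(pseudo_boxes, human_boxes, iou_threshold=0.5):
    """Precision, recall and mean matched IoU of pseudo-labels vs. human boxes.

    Greedy one-to-one matching: the highest-IoU pair is matched first, and
    each box can take part in at most one match.
    """
    candidates = sorted(
        (
            (box_iou(p, h), i, j)
            for i, p in enumerate(pseudo_boxes)
            for j, h in enumerate(human_boxes)
        ),
        reverse=True,
    )
    used_pseudo, used_human, matched_ious = set(), set(), []
    for iou, i, j in candidates:
        if iou < iou_threshold:
            break  # remaining pairs overlap even less
        if i in used_pseudo or j in used_human:
            continue
        used_pseudo.add(i)
        used_human.add(j)
        matched_ious.append(iou)

    matches = len(matched_ious)
    return {
        "precision": matches / len(pseudo_boxes) if pseudo_boxes else 0.0,
        "recall": matches / len(human_boxes) if human_boxes else 0.0,
        "mean_matched_iou": sum(matched_ious) / matches if matches else 0.0,
    }


if __name__ == "__main__":
    # Toy boxes for illustration only; not taken from PigLife.
    pseudo = [(0, 0, 10, 10), (20, 20, 30, 30)]
    human = [(0, 0, 10, 10), (50, 50, 60, 60), (100, 100, 110, 110)]
    print(label_agreement(pseudo, human))
```

Low recall here would indicate pigs that SAM 3 missed (often the occluded ones), while low precision would indicate spurious boxes. Both kinds of error feed directly into the training signal, which is why the referee treats them as load-bearing.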
Circularity Check
No significant circularity; empirical pipeline with external human baseline
The paper reports an empirical knowledge-distillation pipeline in which SAM 3 supplies zero-shot pseudo-labels that are used to train YOLOv8 detectors; performance is then measured against a separately human-annotated test set from the PigLife dataset. The supplied text contains no equations, no fitted parameters renamed as predictions, no uniqueness theorems resting on self-citation, and no smuggled ansatz. The headline mAP of 79.4% and the stratified AP50 > 99% figures are presented as measured outcomes of training, not quantities forced by the definition of the training signal itself. The derivation chain is therefore self-contained against the external human benchmark and receives the default non-circularity finding.