arxiv: 2605.25860 · v1 · pith:CAHGWZ32new · submitted 2026-05-25 · 💻 cs.CV

SAM3-Assisted Training of Lightweight YOLO Models for Precision Pig Farming

Pith reviewed 2026-06-29 23:02 UTC · model grok-4.3

classification 💻 cs.CV
keywords SAM 3 · YOLOv8 · pig detection · precision livestock farming · zero-shot annotation · knowledge distillation · edge deployment

The pith

SAM 3 zero-shot labels train a YOLOv8m to 79.4% mAP on pig images with no human annotation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a foundation model can serve as an automatic labeler that produces training data for lightweight object detectors in livestock monitoring. SAM 3 generates pseudo-labels on the PigLife dataset, and YOLOv8 variants are then trained on those labels. The resulting models perform close to human-annotated baselines in low-occlusion scenes while cutting inference latency by roughly two hundred times relative to SAM 3. This removes the manual bounding-box work that currently limits the scaling of computer vision in precision farming. The approach is tested by comparing SAM 3-supervised models against human-supervised ones on the same images, with an additional breakdown by occlusion level. If the pseudo-label quality holds, it opens a route to deploying real-time detectors on resource-constrained farm hardware without annotation costs.

Core claim

Treating SAM 3 as an offline auto-annotator produces YOLOv8 models whose mean average precision reaches 79.4 percent on the PigLife pig dataset without any human labels, while inference latency drops by roughly two hundred times relative to the teacher model; in low-occlusion subsets the automated pipeline matches human-annotated detection rates above 99 percent AP50.

What carries the argument

SAM 3 zero-shot pseudo-label generation used as training supervision for YOLOv8 detectors
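
The label-conversion step that this supervision depends on can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each SAM 3 instance mask has already been exported as a binary grid (nested lists of 0/1), takes the tight bounding box of each mask, and writes it in the normalized `class x_center y_center width height` text format used for YOLO training labels. The function names and the single class id 0 for "pig" are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for one exported instance mask: rows of 0/1 values.
Mask = List[List[int]]
PixelBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def mask_to_bbox(mask: Mask) -> Optional[PixelBox]:
    """Return the tight pixel box around the mask, or None if the mask is empty."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def bbox_to_yolo_line(bbox: PixelBox, img_w: int, img_h: int, class_id: int = 0) -> str:
    """Format one box as a normalized YOLO label line."""
    x_min, y_min, x_max, y_max = bbox
    # Pixel indices are inclusive, so a box covering columns 1..2 is 2 pixels wide.
    w = (x_max - x_min + 1) / img_w
    h = (y_max - y_min + 1) / img_h
    cx = (x_min + x_max + 1) / 2 / img_w
    cy = (y_min + y_max + 1) / 2 / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def masks_to_yolo_labels(masks: List[Mask], img_w: int, img_h: int,
                         class_id: int = 0) -> List[str]:
    """Convert all non-empty masks of one image into YOLO label lines."""
    lines = []
    for mask in masks:
        bbox = mask_to_bbox(mask)
        if bbox is not None:  # skip empty masks instead of writing a degenerate box
            lines.append(bbox_to_yolo_line(bbox, img_w, img_h, class_id))
    return lines
```

Writing the returned lines to one `.txt` file per image yields a label set that a YOLO trainer can consume in place of human boxes. Any filtering of low-confidence masks before this step would be a further design choice that the abstract does not describe.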

If this is right

  • Lightweight YOLO models become deployable on edge devices for continuous pig monitoring without manual annotation labor.
  • Low-occlusion detection performance reaches levels previously requiring human labels, enabling selective use of the automated pipeline.
  • The knowledge-distillation step removes the computational cost of running the foundation model at inference time.
  • The pipeline scales to new images or farms by simply running SAM 3 offline rather than hiring annotators.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same auto-annotation route could be tested on other livestock species or farm objects where occlusion patterns differ.
  • Combining the pipeline with a small amount of human verification on difficult cases might close any remaining gap to fully supervised performance.
  • If SAM 3 pseudo-label quality varies by camera angle or lighting, farm-specific fine-tuning of the labeler could become necessary.

Load-bearing premise

Zero-shot pseudo-labels from SAM 3 on the PigLife dataset are accurate enough that models trained on them lose little accuracy compared with models trained on human bounding boxes.

What would settle it

A side-by-side experiment in which the same YOLO architecture trained on SAM 3 labels shows more than a 10-point mAP gap versus the identical architecture trained on human labels across the full PigLife test set.

Figures

Figures reproduced from arXiv: 2605.25860 by Francisco de Assis Boldt, Isabella C.F.S. Condotta, Marcos Vinicius Mendes Faria, Thiago Borges Pereira, Thiago Meireles Paixão.

Figure 1: Overview of the SAM-3-assisted pig detection pipeline illustrated in two stages: an offline self-training phase, where the foundation model generates …
Figure 2: Visual breakdown of the eight test scenarios (Groups 1–8).
Figure 3: Captured from a top-down camera view, the image displays three …
Original abstract

Deep learning-based object detection has revolutionized Precision Livestock Farming (PLF), yet a critical barrier remains: high-performance Foundation Models (such as SAM 3) are too computationally intensive for edge deployment, while lightweight models (like YOLO) require prohibitive manual annotation efforts. This work proposes a fully automated knowledge distillation pipeline that leverages the Segment Anything Model 3 (SAM 3) to generate zero-shot pseudo-labels for training efficient YOLOv8 detectors. By treating SAM 3 as an offline auto-annotator, we eliminate the manual labeling bottleneck, producing models capable of real-time inference on resource-constrained hardware. We systematically evaluate this approach on the PigLife dataset, comparing SAM 3-supervised models against human-annotated baselines. Results demonstrate that a SAM 3-trained YOLOv8m achieves a mean Average Precision (mAP) of 79.4% without human intervention, while reducing inference latency by approximately 200$\times$ compared to the teacher model. Furthermore, stratified analysis reveals that in low-occlusion scenarios, the automated pipeline achieves detection rates comparable to human benchmarks ($AP_{50} > 99\%$). These findings indicate that foundation models can serve as effective, zero-annotation-cost supervisors, enabling scalable edge computing solutions for smart agriculture.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a knowledge distillation pipeline that uses SAM 3 as an offline auto-annotator to generate zero-shot pseudo-labels for training lightweight YOLOv8 detectors on the PigLife dataset. It reports that a SAM 3-trained YOLOv8m model reaches 79.4% mAP without human intervention, achieves AP50 >99% in low-occlusion cases, and reduces inference latency by ~200× relative to the teacher model.

Significance. If the pseudo-label quality claim holds, the approach would remove the manual annotation bottleneck for precision livestock farming, enabling scalable training and edge deployment of efficient detectors.

major comments (2)
  1. [Abstract] Abstract: the central performance claim (79.4% mAP for SAM 3-trained YOLOv8m) is presented without a numeric mAP value for the human-annotated baseline, so the assertion of 'no substantial performance loss' cannot be evaluated.
  2. [Abstract] Abstract: no IoU, precision-recall, or agreement statistics between SAM 3 zero-shot pseudo-labels and human ground truth on PigLife are supplied, leaving the quality of the training signal unquantified despite its load-bearing role for the distillation result.
minor comments (1)
  1. [Abstract] Abstract: dataset statistics, error bars, and ablation details are absent from the summary of results, reducing interpretability of the reported figures.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We address each point below and will revise the manuscript to improve clarity and completeness.

  1. Referee: [Abstract] Abstract: the central performance claim (79.4% mAP for SAM 3-trained YOLOv8m) is presented without a numeric mAP value for the human-annotated baseline, so the assertion of 'no substantial performance loss' cannot be evaluated.

    Authors: We agree that the abstract should include the baseline mAP to allow readers to evaluate the claim directly. The manuscript body reports the human-annotated YOLOv8m baseline performance. In the revised version, we will update the abstract to explicitly include this numeric baseline mAP value alongside the 79.4% result. revision: yes

  2. Referee: [Abstract] Abstract: no IoU, precision-recall, or agreement statistics between SAM 3 zero-shot pseudo-labels and human ground truth on PigLife are supplied, leaving the quality of the training signal unquantified despite its load-bearing role for the distillation result.

    Authors: The current manuscript evaluates the pipeline via end-to-end detector performance on the PigLife test set rather than intermediate pseudo-label agreement. We acknowledge that direct quantification of SAM 3 pseudo-label quality would be informative. We will add these statistics (IoU, precision-recall, and agreement metrics) computed on a held-out subset in the revised manuscript. revision: yes
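
A sketch of how such agreement statistics might be computed for a single image is given below. It is one possible formulation, not the procedure the authors commit to: pseudo-label boxes and human boxes are paired greedily in order of decreasing IoU, a pair counts as a match when its IoU is at least a threshold (0.5 here, an assumption chosen to mirror AP50), and precision, recall, and mean matched IoU are reported. The function names and the corner-format box representation are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_agreement(pseudo: List[Box], human: List[Box],
                    thr: float = 0.5) -> Dict[str, float]:
    """Greedy one-to-one matching of pseudo-labels to human boxes by IoU."""
    # Score every candidate pair, best overlap first.
    pairs = sorted(
        ((iou(p, h), i, j) for i, p in enumerate(pseudo) for j, h in enumerate(human)),
        reverse=True,
    )
    used_pseudo, used_human, matched = set(), set(), []
    for score, i, j in pairs:
        if score < thr:
            break  # remaining pairs overlap even less
        if i in used_pseudo or j in used_human:
            continue  # each box may be matched at most once
        used_pseudo.add(i)
        used_human.add(j)
        matched.append(score)
    tp = len(matched)
    return {
        "precision": tp / len(pseudo) if pseudo else 0.0,  # matched share of pseudo-labels
        "recall": tp / len(human) if human else 0.0,       # matched share of human boxes
        "mean_iou": sum(matched) / tp if tp else 0.0,      # box tightness on matches
    }
```

Greedy matching is used for brevity; an optimal assignment (for example the Hungarian algorithm) could give slightly different counts in crowded, high-occlusion pens, which is exactly where the stratified analysis suggests pseudo-label quality matters most.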

Circularity Check

0 steps flagged

No significant circularity; empirical pipeline with external human baseline

The paper reports an empirical knowledge-distillation pipeline in which SAM 3 supplies zero-shot pseudo-labels that are used to train YOLOv8 detectors; performance is then measured against a separately human-annotated test set on the PigLife dataset. No equations, fitted parameters renamed as predictions, load-bearing self-citations, uniqueness theorems, or ansatz smuggling appear in the supplied text. The headline mAP of 79.4% and the stratified AP50 > 99% figures are presented as measured outcomes of training, not quantities forced by the definition of the training signal itself. The derivation chain is therefore self-contained against the external human benchmark and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.1-grok · 5784 in / 1076 out tokens · 39513 ms · 2026-06-29T23:02:15.180976+00:00 · methodology
