
arxiv: 2507.18558 · v1 · submitted 2025-07-24 · 💻 cs.CV · eess.IV

Synthetic Data Augmentation for Enhanced Chicken Carcass Instance Segmentation

Pith reviewed 2026-05-19 02:39 UTC · model grok-4.3

classification 💻 cs.CV eess.IV
keywords synthetic data · instance segmentation · chicken carcass · poultry processing · data augmentation · deep learning · benchmark dataset · photo-realistic images

The pith

Mixing synthetic images with limited real data significantly improves instance segmentation of chicken carcasses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a pipeline that generates photo-realistic synthetic images of chicken carcasses with automatic labels and pairs it with a new benchmark of 300 real annotated processing-line images. It tests whether adding varying amounts of these synthetic images to small real datasets can raise the accuracy of instance segmentation models. A reader would care because collecting and labeling large real datasets from fast-moving poultry lines is costly and slow. The work shows that the mixed training sets produce better results on real test images than real data alone across multiple models. This points to a practical way to build reliable automated detection for quality control without exhaustive manual annotation.

Core claim

A pipeline for creating photo-realistic, automatically labeled synthetic images of chicken carcasses, together with a curated benchmark of 300 real annotated images, enables synthetic data augmentation to measurably raise instance segmentation performance when real annotated data from the processing line is scarce.
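As a rough illustration of what "automatically labeled" can mean in a rendering pipeline, the sketch below derives bounding boxes and binary masks from a per-pixel instance-ID map. The ID-map format and the names `boxes_from_instance_map` and `mask_for_instance` are assumptions made for this example; this page does not describe the paper's actual label format.

```python
def boxes_from_instance_map(instance_map, background=0):
    """Return {instance_id: (x_min, y_min, x_max, y_max)} for a 2-D ID map.

    instance_map is a list of rows; each cell holds an integer instance ID,
    with `background` marking pixels that belong to no object (assumed format).
    """
    boxes = {}
    for y, row in enumerate(instance_map):
        for x, inst in enumerate(row):
            if inst == background:
                continue
            if inst not in boxes:
                boxes[inst] = [x, y, x, y]
            else:
                b = boxes[inst]
                b[0] = min(b[0], x)
                b[1] = min(b[1], y)
                b[2] = max(b[2], x)
                b[3] = max(b[3], y)
    return {inst: tuple(b) for inst, b in boxes.items()}


def mask_for_instance(instance_map, inst):
    """Return a binary mask (1 = pixel belongs to `inst`) of the same shape."""
    return [[1 if v == inst else 0 for v in row] for row in instance_map]
```

Because the labels come straight from the rendered ID map, no manual annotation step is needed for the synthetic images, which is the property the core claim relies on.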

What carries the argument

The synthetic data generation pipeline that produces photo-realistic images with automatic labels for direct mixing with limited real data.
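The mixing step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `mix_training_set`, the `synthetic_per_real` parameter, and the placeholder file names are all hypothetical, and the ratios the paper actually tested are not given on this page.

```python
import random


def mix_training_set(real_items, synthetic_items, synthetic_per_real, seed=0):
    """Combine all real items with a random subset of synthetic items.

    synthetic_per_real is the number of synthetic items added per real item
    (for example 2.0 adds twice as many synthetic items as real ones).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_synth = round(len(real_items) * synthetic_per_real)
    if n_synth > len(synthetic_items):
        raise ValueError("not enough synthetic images for the requested ratio")
    mixed = list(real_items) + rng.sample(list(synthetic_items), n_synth)
    rng.shuffle(mixed)
    return mixed


# Hypothetical usage with placeholder file names.
real = [f"real_{i}.png" for i in range(60)]
synthetic = [f"synth_{i}.png" for i in range(300)]
train_list = mix_training_set(real, synthetic, synthetic_per_real=2.0)
```

Keeping every real image and sampling only the synthetic side means the real data is never diluted out of the training set, whatever ratio is chosen.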

If this is right

  • Instance segmentation accuracy on real processing-line images rises when synthetic images are added to small real training sets.
  • The amount of manual annotation required for poultry datasets can be reduced while still reaching usable model performance.
  • Automated carcass detection systems become feasible even when only limited real labeled data is available.
  • The performance lift appears across several prominent instance segmentation architectures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same synthetic-generation approach could be adapted for other food-processing or industrial inspection tasks that face similar data-scarcity problems.
  • Testing the mixed datasets under varied lighting, camera angles, or line speeds in actual plants would check whether the gains hold in live deployment.
  • The pipeline might also support related tasks such as object detection or classification within the same poultry domain.

Load-bearing premise

The synthetic images must be close enough in visual appearance and statistical distribution to real processing-line photos that adding them improves generalization instead of introducing harmful domain shift.

What would settle it

The claim would fail if models trained on mixed synthetic-plus-real sets, evaluated on a held-out real test set, showed no gain or a drop in standard segmentation metrics such as mean average precision compared with models trained on real data alone.
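A comparison of this kind can be sketched with mask intersection-over-union (IoU). Mean IoU is used here as a simpler stand-in for mean average precision, which needs confidence scores and instance matching; the function names and the pairing of predictions with ground truth are assumptions made for illustration.

```python
def mask_iou(pred, truth):
    """Intersection-over-union of two binary masks given as lists of rows."""
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention chosen for this sketch: two empty masks score 0.0.
    return inter / union if union else 0.0


def mean_iou(pairs):
    """Average IoU over (predicted_mask, ground_truth_mask) pairs."""
    scores = [mask_iou(p, t) for p, t in pairs]
    if not scores:
        raise ValueError("no mask pairs to score")
    return sum(scores) / len(scores)


def gain_over_baseline(baseline_pairs, mixed_pairs):
    """Positive when the mixed-data model beats the real-only baseline."""
    return mean_iou(mixed_pairs) - mean_iou(baseline_pairs)
```

A gain at or below zero on the held-out real images, repeated across models, would be the outcome that counts against the augmentation claim.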

Figures

Figures reproduced from arXiv: 2507.18558 by Chaitanya Pallerla, Dongyi Wang, Philip Crandall, Pouya Sohrabipour Sr, Wan Shou, Xiaomin Lin, Yihong Feng, Yu She.

Figure 1: Chicken carcasses on a processing line in a poultry slaughter plant.
Figure 2: The real chicken data acquisition system
Figure 4: An overview for Synthetic data generation and model training. The process begins with a high fidelity 3D chicken carcass model, which is used …
Figure 5: Comparison of bounding box and instance segmentation results obtained with the best-performing training ratios for each model (Mask-RCNN …
Original abstract

The poultry industry has been driven by broiler chicken production and has grown into the world's largest animal protein sector. Automated detection of chicken carcasses on processing lines is vital for quality control, food safety, and operational efficiency in slaughterhouses and poultry processing plants. However, developing robust deep learning models for tasks like instance segmentation in these fast-paced industrial environments is often hampered by the need for laborious acquisition and annotation of large-scale real-world image datasets. We present the first pipeline generating photo-realistic, automatically labeled synthetic images of chicken carcasses. We also introduce a new benchmark dataset containing 300 annotated real-world images, curated specifically for poultry segmentation research. Using these datasets, this study investigates the efficacy of synthetic data and automatic data annotation to enhance the instance segmentation of chicken carcasses, particularly when real annotated data from the processing line is scarce. A small real dataset with varying proportions of synthetic images was evaluated in prominent instance segmentation models. Results show that synthetic data significantly boosts segmentation performance for chicken carcasses across all models. This research underscores the value of synthetic data augmentation as a viable and effective strategy to mitigate data scarcity, reduce manual annotation efforts, and advance the development of robust AI-driven automated detection systems for chicken carcasses in the poultry processing industry.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a pipeline for generating photo-realistic synthetic images of chicken carcasses with automatic labeling, introduces a new benchmark dataset of 300 annotated real-world images from poultry processing lines, and evaluates the impact of mixing varying proportions of synthetic data with limited real data on instance segmentation performance in prominent deep learning models, claiming significant boosts across all models.

Significance. If the results hold with proper verification of domain similarity and quantitative controls, this work would offer a practical approach to mitigating data scarcity and annotation costs in industrial computer vision for food processing, with the benchmark dataset providing a useful resource for the community.

major comments (2)
  1. [Abstract] The central claim that 'synthetic data significantly boosts segmentation performance for chicken carcasses across all models' is asserted without any quantitative metrics (e.g., mAP or IoU scores), model architectures, exact mixing ratios, or statistical tests. This omission prevents assessment of evidence strength for the headline result.
  2. [Experimental Evaluation] No quantitative domain-distance metrics (e.g., FID scores or domain-classifier accuracy) or controlled ablations holding total sample count fixed are described. This leaves the key assumption (that synthetic images are distributionally close enough to the 300 real images in lighting, texture, pose, and background to produce genuine generalization gains rather than artifacts) unverified and load-bearing for the augmentation efficacy claim.
minor comments (2)
  1. [Methods] Provide the specific instance segmentation architectures employed and full training details (hyperparameters, augmentation pipelines) to support reproducibility.
  2. [Dataset] Clarify the curation process, annotation protocol, and any quality assurance steps for the 300 real benchmark images.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive feedback on our manuscript. We address each of the major comments below and have made revisions to strengthen the paper where appropriate.

Point-by-point responses
  1. Referee: [Abstract] The central claim that 'synthetic data significantly boosts segmentation performance for chicken carcasses across all models' is asserted without any quantitative metrics (e.g., mAP or IoU scores), model architectures, exact mixing ratios, or statistical tests. This omission prevents assessment of evidence strength for the headline result.

    Authors: We agree that the abstract would benefit from including key quantitative results to allow readers to immediately assess the strength of the central claim. The body of the manuscript reports these details, including performance metrics across the evaluated models and mixing proportions. We have revised the abstract to incorporate representative quantitative findings, model references, and mixing ratios while maintaining conciseness.
    Revision: yes

  2. Referee: [Experimental Evaluation] No quantitative domain-distance metrics (e.g., FID scores or domain-classifier accuracy) or controlled ablations holding total sample count fixed are described. This leaves the key assumption (that synthetic images are distributionally close enough to the 300 real images in lighting, texture, pose, and background to produce genuine generalization gains rather than artifacts) unverified and load-bearing for the augmentation efficacy claim.

    Authors: We acknowledge the value of explicit domain-similarity quantification and controlled ablations. In the revised manuscript we have added Fréchet Inception Distance (FID) scores comparing the synthetic and real distributions, along with an ablation that holds the total number of training samples constant while varying the synthetic-to-real ratio. These additions directly address the concern and help confirm that observed gains on the held-out real test set reflect genuine generalization rather than dataset-size effects alone.
    Revision: yes
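The fixed-budget ablation described in the second response can be planned with a few lines of code. The sketch below is an assumption-laden illustration: `fixed_budget_counts`, the budget of 60 samples, and the fractions shown are hypothetical and are not taken from the manuscript.

```python
def fixed_budget_counts(total, synthetic_fractions):
    """Return (fraction, n_real, n_synthetic) tuples that all sum to `total`.

    Holding `total` fixed separates the effect of the synthetic-to-real ratio
    from the effect of simply training on more images.
    """
    plan = []
    for frac in synthetic_fractions:
        if not 0.0 <= frac <= 1.0:
            raise ValueError("fraction must be in [0, 1]")
        n_synth = round(total * frac)
        plan.append((frac, total - n_synth, n_synth))
    return plan


# Hypothetical budget and fractions, chosen only for illustration.
for frac, n_real, n_synth in fixed_budget_counts(60, [0.0, 0.25, 0.5]):
    print(f"synthetic fraction {frac:.2f}: {n_real} real + {n_synth} synthetic")
```

If accuracy still improves as the synthetic fraction rises under a constant budget, the gain cannot be attributed to dataset size alone.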

Circularity Check

0 steps flagged

Empirical evaluation of synthetic data mixing; no derivations or predictions reduce to inputs by construction

Full rationale

The paper introduces a synthetic image generation pipeline and a 300-image real benchmark, then reports standard instance segmentation experiments that mix varying proportions of synthetic and real training data and measure performance on held-out real images. No equations, first-principles derivations, or fitted parameters are presented whose outputs are renamed as predictions. The central result is an empirical comparison across models (e.g., Mask R-CNN) under controlled data ratios; this constitutes an independent test against external benchmarks rather than a self-referential loop. Any self-citations concern prior rendering techniques and do not carry the load of the reported gains, which are measured directly on the new real dataset. The study is therefore self-contained against observable performance metrics.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Relies on the domain assumption that synthetic images transfer usefully to real industrial scenes; standard computer vision training assumptions apply with no new entities introduced.

axioms (1)
  • domain assumption: Photo-realistic synthetic images of chicken carcasses can be generated automatically with accurate labels that improve model performance on real data.
    Central premise stated in the abstract for the efficacy of the augmentation strategy.

pith-pipeline@v0.9.0 · 5772 in / 1109 out tokens · 32081 ms · 2026-05-19T02:39:05.665192+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

    Relation between the paper passage and the cited Recognition theorem.

    We present the first pipeline generating photo-realistic, automatically labeled synthetic images of chicken carcasses... A small real dataset (60 images...) with varying proportions of synthetic images were evaluated in prominent instance segmentation models: YOLOv11-seg, Mask R-CNN... Results show that synthetic data significantly boosts segmentation performance

  • IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat recovery theorem unclear

    Relation between the paper passage and the cited Recognition theorem.

    Blender... physically-based rendering engine... Ground truth labels are automatically generated during the rendering process

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
