pith.

arxiv: 2605.23863 · v1 · pith:XZEOGGBZ · submitted 2026-05-22 · 💻 cs.RO

Robotic Strawberry Harvesting with Robust Vision and Deep Reinforcement Learning based Sim-to-Real Control

Pith reviewed 2026-05-25 03:46 UTC · model grok-4.3

classification 💻 cs.RO
keywords strawberry harvesting · robotic manipulation · deep reinforcement learning · sim-to-real transfer · instance segmentation · PPO control · greenhouse robotics · UR10e manipulator

The pith

A robot arm harvests strawberries with 84.3 percent overall success using a control policy trained only in simulation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a complete harvesting pipeline that pairs a custom vision model for locating fruit in clutter with a reinforcement learning controller that learns reaching and grasping motions entirely inside a simulator. The controller then runs directly on a physical UR10e arm inside real greenhouses, producing joint commands without any real-world policy updates or traditional motion planners. Across 281 strawberries the system reached 96.6 percent success on the reach phase, 91.3 percent on grasp-and-pull, and 84.3 percent end-to-end. A reader should care because the approach replaces costly physical trial-and-error with cheaper simulation training while still delivering usable performance in unstructured agricultural scenes.

Core claim

The paper shows that a target-conditioned PPO policy trained in Isaac Lab to output joint-position commands, when combined with the HRAttnEdge-YOLO26-seg perception model, produces stable closed-loop harvesting on a UR10e manipulator that reaches 84.3 percent overall success on 281 strawberries in greenhouse conditions, outperforming an inverse-kinematics MoveIt baseline in motion smoothness and eliminating the need for exhaustive real-robot data collection before deployment.
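For readers unfamiliar with PPO, the objective it optimizes is the clipped surrogate introduced by Schulman et al. (2017). The following is a minimal NumPy sketch of that generic objective; the authors' training code is not shown on this page, and `clip_eps=0.2` is the value commonly used since the original PPO paper, not a hyperparameter reported here.

```python
import numpy as np


def ppo_clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective L^CLIP (to be maximized).

    L^CLIP = mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)),
    where r = exp(logp_new - logp_old) is the new/old policy probability ratio.
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The minimum makes the objective a pessimistic bound: gains from moving
    # the ratio outside [1 - eps, 1 + eps] are ignored.
    return float(np.mean(np.minimum(unclipped, clipped)))
```

With a ratio of 1.5 and advantage 1.0, the clipped term (1.2) is used; with advantage -1.0, the unclipped term (-1.5) is used, so large harmful policy updates are still fully penalized.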

What carries the argument

The target-conditioned PPO policy trained in Isaac Lab that maps fruit location observations to smooth joint-position commands for the UR10e arm.
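The page only says that the policy maps fruit-location observations to joint-position commands. One plausible layout is sketched below with hypothetical names: the observation concatenates joint positions, joint velocities, the target position, and the target-minus-end-effector offset, and each normalized action in [-1, 1] becomes a small joint-position increment clipped to joint limits. The layout, the 0.05 rad scale, and the limits are assumptions for illustration, not the authors' design.

```python
import numpy as np

NUM_JOINTS = 6  # the UR10e is a six-joint arm


def build_observation(joint_pos, joint_vel, ee_pos, target_pos):
    """Target-conditioned observation (hypothetical layout): 6 + 6 + 3 + 3 = 18 values."""
    joint_pos = np.asarray(joint_pos, dtype=float)
    joint_vel = np.asarray(joint_vel, dtype=float)
    ee_pos = np.asarray(ee_pos, dtype=float)
    target_pos = np.asarray(target_pos, dtype=float)
    if joint_pos.shape != (NUM_JOINTS,) or joint_vel.shape != (NUM_JOINTS,):
        raise ValueError("expected one position and one velocity per joint")
    # The target-minus-end-effector offset gives the policy the direction still to travel.
    offset = target_pos - ee_pos
    return np.concatenate([joint_pos, joint_vel, target_pos, offset])


def action_to_joint_targets(action, joint_pos, lower, upper, scale=0.05):
    """Turn a normalized action in [-1, 1] into joint-position targets in radians."""
    delta = scale * np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
    return np.clip(np.asarray(joint_pos, dtype=float) + delta, lower, upper)


# Placeholder limits of +/- 2*pi rad per joint (assumed, for illustration only).
LOWER = np.full(NUM_JOINTS, -2 * np.pi)
UPPER = np.full(NUM_JOINTS, 2 * np.pi)
```

Emitting small increments rather than absolute joint angles is one way to keep consecutive commands close together, which is consistent with, though not proof of, the smoother motion the paper reports.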

If this is right

  • The vision model raises segmentation performance by 10 to 14 percent over baseline YOLO variants on both in-house and public datasets.
  • The PPO controller produces smoother joint trajectories than the MoveIt IK baseline in controlled lab tests.
  • The full pipeline reduces hardware dependency by training the controller exclusively in simulation before direct real-robot deployment.
  • The integrated system achieves 84.3 percent end-to-end success on 281 strawberries without planner-dependent reaching.
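The page does not name the metric behind the "10 to 14 percent" gain, nor whether it is a relative gain or a gain in percentage points. Instance-segmentation mAP is built on mask intersection-over-union (IoU) thresholds, so a minimal sketch of mask IoU is shown below, together with a helper that reports the same pair of scores both ways. Function names are hypothetical, and the numbers in the usage note are illustrative, not results from the paper.

```python
import numpy as np


def mask_iou(pred, gt):
    """Intersection-over-union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: counted as perfect agreement (a convention)
    return float(np.logical_and(pred, gt).sum() / union)


def gain(baseline, proposed):
    """Absolute gain (same units as the scores) and relative gain (percent of baseline)."""
    absolute = proposed - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative
```

For example, `gain(70.0, 77.0)` returns `(7.0, 10.0)`: a 7-point absolute gain is a 10 percent relative gain, so the same headline figure can describe quite different results depending on the convention.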

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same simulation-trained policy structure could be reused for other soft or clustered fruits if the contact-force model in simulation is adjusted accordingly.
  • Adding camera calibration drift detection at runtime would test whether the reported success rates remain stable across multiple days of greenhouse operation.
  • The approach suggests that scaling the number of simulated environments could reduce the remaining 15.7 percent failure rate without collecting new real-robot failures.

Load-bearing premise

The Isaac Lab simulator reproduces real greenhouse lighting, fruit dynamics, and robot contact forces closely enough that a policy trained only inside it transfers to the physical robot without extra fine-tuning or new failure modes.

What would settle it

Redeploy the same PPO policy on the UR10e in a new set of greenhouse trials and record whether the grasp-and-pull success rate falls materially below the reported 91.3 percent because of unmodeled contact forces or lighting changes.
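"Falls materially below" can be made concrete with a one-sided exact binomial test against the reported rate. The sketch below assumes that harvest attempts in the replication are independent Bernoulli trials, which is an assumption: fruit in the same cluster may not fail independently. The 170-of-200 figures in the usage line are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def below_reference_pvalue(successes, trials, p_ref):
    """One-sided exact p-value for H1: the true success rate is below p_ref.

    It is the probability of seeing this few successes (or fewer) if the
    true rate were exactly p_ref.
    """
    return binom_cdf(successes, trials, p_ref)


# Hypothetical replication: 170 successful grasp-and-pulls out of 200 attempts,
# tested against the reported 91.3 percent.
p_value = below_reference_pvalue(170, 200, 0.913)
```

A small p-value would indicate a real drop rather than trial-to-trial noise; the threshold for "material" (for example 0.05) is a choice the replicator has to make in advance.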

Figures

Figures reproduced from arXiv: 2605.23863 by Al Bashir, Azlan Zahid, Chen-Kang Huang, Partho Ghose, Prem Raj, Shao-Yang Chang.

Figure 1: Overview of the closed-loop robotic strawberry harvesting system. An RGB-D camera acquires …
Figure 2: Side-by-side comparison of baseline YOLO26-seg and the proposed HRAttnEdge-YOLO26-seg.
Figure 3: PPO-based reaching policy learning for the UR10e manipulator in Isaac Lab. At each time …
Figure 4: ROS-based integration architecture for greenhouse strawberry harvesting. The vision node …
Figure 5: Qualitative comparison of strawberry instance segmentation results at the nano …
Figure 6: Representative in-house trajectories for qualitative comparison between the PPO-based …
Figure 7: Demonstration of greenhouse strawberry harvesting using integrated vision and trajectory …
Figure 8: Representative greenhouse harvesting trajectories of the PPO-based controller: (a) 3D …
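Figures 1 and 4 describe an RGB-D camera feeding a vision node that localizes the target fruit, but the captions are truncated here and the exact procedure is not shown. A common way to turn a segmentation mask into a 3D reach target is sketched below under stated assumptions: the mask centroid serves as the target pixel, depth is metric and aligned to the color image, the camera follows a pinhole model with intrinsics `fx, fy, cx, cy`, and the 4x4 transform `T_base_cam` comes from hand-eye calibration. All function names are hypothetical.

```python
import numpy as np


def mask_centroid(mask):
    """Pixel centroid (u, v) of a binary mask; u is the column index, v the row index."""
    rows, cols = np.nonzero(np.asarray(mask, dtype=bool))
    if rows.size == 0:
        raise ValueError("empty mask")
    return float(cols.mean()), float(rows.mean())


def deproject_pixel(u, v, depth_m, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at metric depth into the camera frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])


def camera_to_base(point_cam, T_base_cam):
    """Map a camera-frame point into the robot base frame with a 4x4 homogeneous transform."""
    p = np.append(np.asarray(point_cam, dtype=float), 1.0)
    return (np.asarray(T_base_cam, dtype=float) @ p)[:3]
```

In practice the depth value would be read at the rounded centroid pixel; taking the median depth over the mask instead is a frequently used, more noise-tolerant alternative, though the page does not say which (if either) the authors use.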
Original abstract

This study presents a closed-loop robotic strawberry harvesting system that combines a robust vision module, simulation-trained deep reinforcement learning (DRL) control, and ROS-based real-robot execution. For perception, we propose HRAttnEdge-YOLO26-seg, a modified YOLO26-seg architecture that incorporates a high-resolution P2 branch, segmentation-path attention, and edge-supervised prototype learning to improve instance segmentation in cluttered scenes. For control, we train a target-conditioned Proximal Policy Optimization (PPO) policy in Isaac Lab to produce smooth joint-position commands for a UR10e manipulator and deploy it on a UR10e robot for target-fruit reaching and harvesting. This simulation-based approach reduces hardware dependency, lowers development cost, and allows scalable policy training without exhaustive physical trials before real deployment. The proposed vision model demonstrated the highest overall performance among the evaluated methods. On both self-collected and public datasets, the model showed a 10 to 14% improvement in segmentation performance. In controlled in-house tests, the PPO controller produced stable and dynamically smoother motion than an inverse kinematics (IK)-based MoveIt baseline. In greenhouse trials, the proposed integrated system harvested 281 strawberries, achieving 96.6% reaching success, 91.3% grasp-and-pull success, and 84.3% overall harvesting success. These results illustrate that task-specific perception combined with simulation-trained PPO can serve as a practical and resource-efficient alternative to conventional planner-dependent reaching in manipulation, enabling reliable closed-loop robotic harvesting in complex agricultural environments.
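The abstract reports "dynamically smoother motion" for the PPO controller but this page does not state the smoothness metric. One widely used proxy is mean squared jerk, the third time derivative of joint position, estimated by finite differences on uniformly sampled trajectories. The sketch below assumes that metric and a fixed sample period `dt`; it is not necessarily the measure used in the paper.

```python
import numpy as np


def mean_squared_jerk(q, dt):
    """Mean squared jerk of a joint trajectory.

    q has shape (T, n_joints) with samples spaced dt seconds apart.
    Jerk is approximated by the third forward difference divided by dt**3.
    """
    q = np.asarray(q, dtype=float)
    if q.ndim != 2 or q.shape[0] < 4:
        raise ValueError("need a (T, n_joints) array with T >= 4")
    jerk = np.diff(q, n=3, axis=0) / dt**3
    return float(np.mean(jerk**2))
```

Lower values mean smoother motion; comparing the PPO and MoveIt trajectories for the same targets with the same `dt` keeps the comparison fair. Linear and quadratic joint profiles give zero jerk, while a cubic `q = t**3` sampled at `dt = 1` gives a constant jerk of 6.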

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper presents a closed-loop robotic strawberry harvesting system that integrates a modified YOLO26-seg architecture (HRAttnEdge-YOLO26-seg), which adds a high-resolution P2 branch, segmentation-path attention, and edge-supervised prototype learning for instance segmentation in cluttered scenes, with a target-conditioned PPO policy trained in Isaac Lab to generate joint-position commands for a UR10e arm. The policy is deployed zero-shot on physical hardware; greenhouse trials on 281 strawberries yield 96.6% reaching success, 91.3% grasp-and-pull success, and 84.3% overall harvesting success. The vision model shows 10-14% segmentation gains on self-collected and public datasets, and the controller produces smoother motion than an IK-based MoveIt baseline.

Significance. If the sim-to-real transfer is substantiated, the work supplies concrete empirical evidence that a simulation-trained PPO policy can deliver high success rates on a physically deployed agricultural manipulator without real-world fine-tuning, together with a task-specific perception module that improves segmentation in occlusion-heavy scenes. The scale of the greenhouse evaluation (281 strawberries) and the direct baseline comparison constitute a tangible data point for sim-to-real manipulation in unstructured environments.

major comments (1)
  1. [Control and deployment description (greenhouse trials paragraph)] The central claim that the reported greenhouse performance demonstrates successful zero-shot sim-to-real transfer of the PPO policy rests on the unverified assumption that Isaac Lab reproduces the relevant contact forces, stem detachment dynamics, and lighting statistics. No domain-randomization parameter ranges, force-torque trajectory matching metrics, or sensitivity analysis of policy performance to simulation-reality mismatch are supplied, leaving the 84.3% overall success rate consistent with transfer but unable to rule out condition-specific alignment.
minor comments (2)
  1. [Abstract] The abstract states success percentages without accompanying trial counts per metric, error bars, or statistical tests; these details appear only later in the greenhouse trials paragraph and should be summarized upfront for clarity.
  2. [Perception results] The claim of '10 to 14% improvement in segmentation performance' is stated without naming the exact metrics (mAP, IoU, etc.) or the precise baseline models against which the gain is measured.
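The first minor comment asks for error bars on the success rates. A minimal sketch of 95% Wilson score intervals follows, assuming each rate is a binomial proportion over independent attempts. The page gives no per-metric denominators; for the overall rate, 237 of 281 is the only whole count that rounds to 84.3%, so that implied count is used in the usage line (an inference, not a reported figure).

```python
from math import sqrt


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


# Implied overall count: 237 / 281 rounds to 84.3%.
low, high = wilson_interval(237, 281)
```

The Wilson interval is chosen over the simple normal approximation because it stays inside [0, 1] and behaves better for rates near 100%, such as the 96.6% reaching figure.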

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback. We address the single major comment below and indicate the planned revision.

Point-by-point responses
  1. Referee: The central claim that the reported greenhouse performance demonstrates successful zero-shot sim-to-real transfer of the PPO policy rests on the unverified assumption that Isaac Lab reproduces the relevant contact forces, stem detachment dynamics, and lighting statistics. No domain-randomization parameter ranges, force-torque trajectory matching metrics, or sensitivity analysis of policy performance to simulation-reality mismatch are supplied, leaving the 84.3% overall success rate consistent with transfer but unable to rule out condition-specific alignment.

    Authors: We acknowledge that the original manuscript does not report the specific domain-randomization parameter ranges, force-torque matching metrics, or a sensitivity analysis. In the revised version we will add the exact randomization ranges used in Isaac Lab (lighting intensity and color temperature, stem stiffness and friction coefficients, fruit mass and size variation, and camera pose noise). We will also include a brief sensitivity study showing policy success rate versus selected randomization magnitudes. Direct force-torque trajectory matching is not feasible because the greenhouse trials did not instrument the UR10e with a force-torque sensor; we will therefore add an explicit limitations paragraph noting this gap and explaining that the 84.3% success rate across 281 strawberries in an unstructured greenhouse, together with smoother motion than the MoveIt baseline, constitutes empirical support for transfer rather than conclusive proof against all possible sim-reality mismatches. revision: yes
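The promised revision lists the randomized quantities (lighting intensity and color temperature, stem stiffness and friction, fruit mass and size, camera pose noise) and a sensitivity study over randomization magnitude. A per-episode sampler with a single magnitude knob is one simple way to set that up. The sketch below is plain Python and does not use Isaac Lab's own randomization APIs; every parameter name and range is a hypothetical placeholder, since the actual ranges are not given on this page.

```python
import random

# Hypothetical nominal ranges (placeholders, not the paper's values).
NOMINAL_RANGES = {
    "light_intensity_scale": (0.5, 1.5),
    "color_temperature_k": (3000.0, 6500.0),
    "stem_stiffness_scale": (0.7, 1.3),
    "friction_coeff": (0.4, 1.0),
    "fruit_mass_kg": (0.010, 0.030),
    "fruit_size_scale": (0.8, 1.2),
    "camera_pos_noise_std_m": (0.0, 0.01),
}


def scaled_range(lo, hi, magnitude):
    """Shrink (magnitude < 1) or widen (magnitude > 1) a range around its midpoint.

    magnitude = 0 collapses the range to its midpoint. The lower bound is
    clamped at 0 because every parameter above is a non-negative quantity.
    """
    if magnitude < 0:
        raise ValueError("magnitude must be non-negative")
    mid = 0.5 * (lo + hi)
    half = 0.5 * (hi - lo) * magnitude
    return max(0.0, mid - half), mid + half


def sample_episode_params(rng, magnitude=1.0):
    """Draw one set of randomized simulation parameters for a training episode."""
    return {
        name: rng.uniform(*scaled_range(lo, hi, magnitude))
        for name, (lo, hi) in NOMINAL_RANGES.items()
    }


rng = random.Random(0)
params = sample_episode_params(rng, magnitude=1.0)
```

For the sensitivity study, the policy would be trained or evaluated at several `magnitude` values and success rate plotted against it; a flat curve would be evidence that the policy does not depend on one narrow simulator configuration.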

Circularity Check

0 steps flagged

No circularity: purely empirical results from physical trials and dataset tests

full rationale

The paper reports measured success rates (96.6% reaching, 91.3% grasp-and-pull, 84.3% overall on 281 strawberries) from greenhouse deployment of a simulation-trained PPO policy and a modified YOLO segmentation model. These are direct experimental outcomes, not predictions or derivations that reduce to fitted inputs, self-citations, or ansatzes by construction. The sim-to-real transfer is presented as an empirical fact verified by real-robot performance rather than any mathematical chain that collapses to its own assumptions. No equations, uniqueness theorems, or self-referential definitions appear in the load-bearing claims.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The paper rests on standard robotics and machine-learning assumptions without introducing new physical entities or deriving constants; the main unstated premises are accurate sim-to-real transfer and generalization of the vision model to the target greenhouse.

free parameters (1)
  • PPO reward weights and network hyperparameters
    Typical training knobs chosen to produce stable joint commands; values not reported in the abstract.
axioms (1)
  • domain assumption: Isaac Lab simulator dynamics are sufficiently faithful for policy transfer to the real UR10e in contact tasks
    Invoked when the trained policy is deployed directly on hardware without further adaptation.
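The ledger lists the PPO reward weights as unreported free parameters. To show what such weights control, here is a sketch of a typical reaching reward: a distance penalty, an action-rate penalty that discourages abrupt changes between consecutive commands, and a success bonus inside a small radius. The terms, weights, and the 1 cm radius are all hypothetical and are not the authors' reward.

```python
import numpy as np

# Hypothetical weights (the page says the real values are not reported).
W_DIST = 1.0           # penalty per metre of end-effector-to-fruit distance
W_ACTION_RATE = 0.01   # penalty on squared change between consecutive actions
W_SUCCESS = 10.0       # bonus once the end effector is within the success radius
SUCCESS_RADIUS_M = 0.01


def reach_reward(ee_pos, target_pos, action, prev_action):
    """Per-step reward for a target-conditioned reaching task (illustrative)."""
    dist = float(np.linalg.norm(np.asarray(target_pos, dtype=float) - np.asarray(ee_pos, dtype=float)))
    action_rate = float(np.sum((np.asarray(action, dtype=float) - np.asarray(prev_action, dtype=float)) ** 2))
    bonus = W_SUCCESS if dist < SUCCESS_RADIUS_M else 0.0
    return -W_DIST * dist - W_ACTION_RATE * action_rate + bonus
```

Raising `W_ACTION_RATE` trades reaching speed for smoother commands, which is why such weights are load-bearing for the smoothness comparison against MoveIt even though they appear only as tuning knobs.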

pith-pipeline@v0.9.0 · 5827 in / 1509 out tokens · 28954 ms · 2026-05-25T03:46:02.910110+00:00 · methodology

