Robotic Strawberry Harvesting with Robust Vision and Deep Reinforcement Learning based Sim-to-Real Control
Pith reviewed 2026-05-25 03:46 UTC · model grok-4.3
The pith
A robot arm harvests strawberries at 84.3 percent overall success after its control policy is trained only in simulation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper shows that a target-conditioned PPO policy trained in Isaac Lab to output joint-position commands, when combined with the HRAttnEdge-YOLO26-seg perception model, produces stable closed-loop harvesting on a UR10e manipulator that reaches 84.3 percent overall success on 281 strawberries in greenhouse conditions, outperforming an inverse-kinematics MoveIt baseline in motion smoothness and eliminating the need for exhaustive real-robot data collection before deployment.
What carries the argument
The target-conditioned PPO policy trained in Isaac Lab that maps fruit location observations to smooth joint-position commands for the UR10e arm.
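For readers less familiar with PPO, the clipped surrogate objective from Schulman et al. (2017) can be sketched in a few lines. This is a minimal illustration and not the paper's training code: the function names, the `clip_eps` value of 0.2, and the observation layout (six joint positions, six joint velocities, and a 3-D target position) are assumptions made here for illustration only.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate loss (to be minimized), averaged over samples.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. advantages: advantage estimates.
    """
    terms = []
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the two surrogate terms.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return -sum(terms) / len(terms)


def build_observation(joint_pos, joint_vel, target_xyz):
    """Hypothetical target-conditioned observation: arm state plus fruit position."""
    return list(joint_pos) + list(joint_vel) + list(target_xyz)


# Example: a policy update that doubles the action probability is clipped at 1.2.
loss = ppo_clip_loss([math.log(2.0)], [0.0], [1.0])
obs = build_observation([0.0] * 6, [0.0] * 6, [0.4, 0.1, 0.3])
```

The clipping is what keeps each update close to the data-collecting policy; conditioning on the target simply means the fruit position is part of the observation vector.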
If this is right
- The vision model raises segmentation accuracy by 10 to 14 percent over baseline YOLO variants on both in-house and public datasets.
- The PPO controller produces smoother joint trajectories than the MoveIt IK baseline in controlled lab tests.
- The full pipeline reduces hardware dependency by training the controller exclusively in simulation before direct real-robot deployment.
- The integrated system achieves 84.3 percent end-to-end success on 281 strawberries without planner-dependent reaching.
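The smoothness claim in the list above can be made measurable with a jerk-based score. The sketch below computes mean squared jerk from sampled joint positions using third-order finite differences. It is one common choice; the text here does not say which smoothness metric the paper uses, and the function name and input layout are assumptions.

```python
def mean_squared_jerk(positions, dt):
    """Mean squared jerk over all joints and time steps.

    positions: list of per-timestep lists of joint angles (rad).
    dt: sampling period in seconds.
    """
    if len(positions) < 4:
        raise ValueError("need at least 4 samples for a third difference")
    n_joints = len(positions[0])
    total = 0.0
    count = 0
    for t in range(len(positions) - 3):
        for j in range(n_joints):
            # Third forward difference approximates the third derivative.
            third_diff = (positions[t + 3][j] - 3.0 * positions[t + 2][j]
                          + 3.0 * positions[t + 1][j] - positions[t][j])
            jerk = third_diff / dt ** 3
            total += jerk * jerk
            count += 1
    return total / count


# A cubic trajectory q(t) = t^3 has constant jerk 6, so the score is 36.
cubic = [[float(k ** 3)] for k in range(6)]
score = mean_squared_jerk(cubic, dt=1.0)
```

A lower score on the same reaching task would support the claim that one controller moves more smoothly than another, provided both trajectories are sampled at the same rate.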
Where Pith is reading between the lines
- The same simulation-trained policy structure could be reused for other soft or clustered fruits if the contact-force model in simulation is adjusted accordingly.
- Adding camera calibration drift detection at runtime would test whether the reported success rates remain stable across multiple days of greenhouse operation.
- The approach implies that scaling the number of simulated environments could reduce the remaining 15.7 percent failure rate without collecting new real-robot failures.
Load-bearing premise
The Isaac Lab simulator reproduces real greenhouse lighting, fruit dynamics, and robot contact forces closely enough that a policy trained only inside it transfers to the physical robot without extra fine-tuning or new failure modes.
What would settle it
Deploy the same PPO policy on the UR10e in the greenhouse and record whether the overall harvesting success rate falls materially below the reported 84.3 percent because of unmodeled contact forces or lighting changes.
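One way to make "materially below" concrete is to put an interval around a reported rate. The sketch below computes a Wilson score interval for a binomial proportion. The count of 237 successes is an assumption: it is the integer that rounds to 84.3 percent of 281, and the per-stage counts are not stated in the text here.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("require 0 <= successes <= trials and trials > 0")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials
                         + z * z / (4.0 * trials * trials)) / denom
    return center - half, center + half


# Assumed count: 237 of 281 rounds to the reported 84.3 percent overall.
low, high = wilson_interval(237, 281)
```

Under that assumption the interval spans roughly 80 to 88 percent, so a repeat trial landing well under 80 percent would be hard to attribute to sampling noise alone.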
Original abstract
This study presents a closed-loop robotic strawberry harvesting system that combines a robust vision module, simulation-trained deep reinforcement learning (DRL) control, and ROS-based real-robot execution. For perception, we propose HRAttnEdge-YOLO26-seg, a modified YOLO26-seg architecture that incorporates a high-resolution P2 branch, segmentation-path attention, and edge-supervised prototype learning to improve instance segmentation in cluttered scenes. For control, we train a target-conditioned Proximal Policy Optimization (PPO) policy in Isaac Lab to produce smooth joint-position commands for a UR10e manipulator and deploy it on a UR10e robot for target-fruit reaching and harvesting. This simulation-based approach reduces hardware dependency, lowers development cost, and allows scalable policy training without exhaustive physical trials before real deployment. The proposed vision model demonstrated the highest overall performance among the evaluated methods. On both self-collected and public datasets, the model showed a 10 to 14% improvement in segmentation performance. In controlled in-house tests, the PPO controller produced stable and dynamically smoother motion than an inverse kinematics (IK)-based MoveIt baseline. In greenhouse trials, the proposed integrated system harvested 281 strawberries, achieving 96.6% reaching success, 91.3% grasp-and-pull success, and 84.3% overall harvesting success. These results illustrate that task-specific perception combined with simulation-trained PPO can serve as a practical and resource-efficient alternative to conventional planner-dependent reaching in manipulation, enabling reliable closed-loop robotic harvesting in complex agricultural environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a closed-loop robotic strawberry harvesting system that integrates a modified YOLO26-seg architecture (HRAttnEdge-YOLO26-seg), which adds a high-resolution P2 branch, segmentation-path attention, and edge-supervised prototype learning for instance segmentation in cluttered scenes, with a target-conditioned PPO policy trained in Isaac Lab to generate joint-position commands for a UR10e arm. The policy is deployed zero-shot on physical hardware; greenhouse trials on 281 strawberries yield 96.6% reaching success, 91.3% grasp-and-pull success, and 84.3% overall harvesting success. The vision model shows 10-14% segmentation gains on self-collected and public datasets, and the controller produces smoother motion than an IK-based MoveIt baseline.
Significance. If the sim-to-real transfer is substantiated, the work supplies concrete empirical evidence that a simulation-trained PPO policy can deliver high success rates on a physically deployed agricultural manipulator without real-world fine-tuning, together with a task-specific perception module that improves segmentation in occlusion-heavy scenes. The scale of the greenhouse evaluation (281 strawberries) and the direct baseline comparison constitute a tangible data point for sim-to-real manipulation in unstructured environments.
major comments (1)
- [Control and deployment description (greenhouse trials paragraph)] The central claim that the reported greenhouse performance demonstrates successful zero-shot sim-to-real transfer of the PPO policy rests on the unverified assumption that Isaac Lab reproduces the relevant contact forces, stem detachment dynamics, and lighting statistics. No domain-randomization parameter ranges, force-torque trajectory matching metrics, or sensitivity analysis of policy performance to simulation-reality mismatch are supplied, leaving the 84.3% overall success rate consistent with transfer but unable to rule out condition-specific alignment.
minor comments (2)
- [Abstract] The abstract states success percentages without accompanying trial counts per metric, error bars, or statistical tests; these details appear only later in the greenhouse trials paragraph and should be summarized upfront for clarity.
- [Perception results] The claim of '10 to 14% improvement in segmentation performance' is stated without naming the exact metrics (mAP, IoU, etc.) or the precise baseline models against which the gain is measured.
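To illustrate why the second minor comment matters, the sketch below computes mask IoU for one predicted and one ground-truth mask, and shows how the same pair of scores can be reported as either an absolute or a relative gain. The helper names and the scores 0.70 and 0.80 are hypothetical and are not taken from the paper.

```python
def mask_iou(pred, truth):
    """Intersection over union of two binary masks given as nested lists."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention assumed here: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def absolute_gain(baseline, improved):
    """Difference in score, in the metric's own units."""
    return improved - baseline


def relative_gain(baseline, improved):
    """Difference expressed as a fraction of the baseline score."""
    return (improved - baseline) / baseline


pred = [[1, 1], [0, 0]]
truth = [[1, 0], [1, 0]]
iou = mask_iou(pred, truth)  # one shared pixel out of three covered pixels

# Hypothetical scores: the same change reads as 0.10 absolute or about 0.14 relative.
abs_gain = absolute_gain(0.70, 0.80)
rel_gain = relative_gain(0.70, 0.80)
```

Because the absolute and relative readings differ, a percentage improvement is only interpretable once the metric, the baseline, and the kind of gain are all named.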
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We address the single major comment below and indicate the planned revision.
- Referee: The central claim that the reported greenhouse performance demonstrates successful zero-shot sim-to-real transfer of the PPO policy rests on the unverified assumption that Isaac Lab reproduces the relevant contact forces, stem detachment dynamics, and lighting statistics. No domain-randomization parameter ranges, force-torque trajectory matching metrics, or sensitivity analysis of policy performance to simulation-reality mismatch are supplied, leaving the 84.3% overall success rate consistent with transfer but unable to rule out condition-specific alignment.
Authors: We acknowledge that the original manuscript does not report the specific domain-randomization parameter ranges, force-torque matching metrics, or a sensitivity analysis. In the revised version we will add the exact randomization ranges used in Isaac Lab (lighting intensity and color temperature, stem stiffness and friction coefficients, fruit mass and size variation, and camera pose noise). We will also include a brief sensitivity study showing policy success rate versus selected randomization magnitudes. Direct force-torque trajectory matching is not feasible because the greenhouse trials did not instrument the UR10e with a force-torque sensor; we will therefore add an explicit limitations paragraph noting this gap and explaining that the 84.3% success rate across 281 strawberries in an unstructured greenhouse, together with smoother motion than the MoveIt baseline, constitutes empirical support for transfer rather than conclusive proof against all possible sim-reality mismatches.
revision: yes
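The randomization the authors promise to document could be organized along the following lines. This is a plain-Python sketch that does not use the Isaac Lab API: the parameter names follow the categories listed in the rebuttal, but every numeric range, the `magnitude` knob, and the function name are placeholders invented here, not values from the paper.

```python
import random

# Placeholder ranges for illustration only; the paper's values are not given here.
HYPOTHETICAL_RANGES = {
    "light_intensity_scale": (0.5, 1.5),
    "color_temperature_k": (3000.0, 7000.0),
    "stem_stiffness_scale": (0.7, 1.3),
    "friction_coefficient": (0.4, 1.0),
    "fruit_mass_kg": (0.010, 0.040),
    "fruit_size_scale": (0.8, 1.2),
    "camera_pos_noise_m": (0.0, 0.01),
}


def sample_episode_params(rng, ranges=HYPOTHETICAL_RANGES, magnitude=1.0):
    """Draw one set of simulation parameters, uniformly within each range.

    magnitude scales each range's width about its midpoint, so 0.0 gives the
    nominal (midpoint) value and 1.0 gives the full range.
    """
    params = {}
    for name, (lo, hi) in ranges.items():
        mid = 0.5 * (lo + hi)
        half = 0.5 * (hi - lo) * magnitude
        params[name] = rng.uniform(mid - half, mid + half)
    return params


rng = random.Random(0)  # fixed seed so a sweep is reproducible
sweep = {m: sample_episode_params(rng, magnitude=m) for m in (0.0, 0.5, 1.0)}
```

Sweeping `magnitude` and recording success rate at each setting is one simple shape the proposed sensitivity study could take.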
Circularity Check
No circularity: purely empirical results from physical trials and dataset tests
The paper reports measured success rates (96.6% reaching, 91.3% grasp-and-pull, 84.3% overall on 281 strawberries) from greenhouse deployment of a simulation-trained PPO policy and a modified YOLO segmentation model. These are direct experimental outcomes, not predictions or derivations that reduce to fitted inputs, self-citations, or ansatzes by construction. The sim-to-real transfer is presented as an empirical fact verified by real-robot performance rather than any mathematical chain that collapses to its own assumptions. No equations, uniqueness theorems, or self-referential definitions appear in the load-bearing claims.
Axiom & Free-Parameter Ledger
free parameters (1)
- PPO reward weights and network hyperparameters
axioms (1)
- domain assumption: Isaac Lab simulator dynamics are sufficiently faithful for policy transfer to the real UR10e in contact tasks