ROG-Grasp: Root-Oriented Geometry for Robotic Grasping and Placement

Augustus Sroka; Bill Cai; Brian Poon; Feng Liu; Kelvin Cai; Lifeng Zhou; Ran Yang; Satoru Eto; Shijie Geng; Yiming Feng

arxiv: 2606.00449 · v1 · pith:ABGEP2HBnew · submitted 2026-05-30 · 💻 cs.RO

Zijian An, Augustus Sroka, Ran Yang, Bill Cai, Satoru Eto, Brian Poon, Kelvin Cai, Shijie Geng, Feng Liu, Yiming Feng, Lifeng Zhou

Pith reviewed 2026-06-28 19:08 UTC · model grok-4.3

classification 💻 cs.RO

keywords robotic grasping · RGB-D perception · orientation estimation · agricultural robotics · YOLO detection · plane fitting · grasp planning · produce manipulation

The pith

Root-oriented geometry from RGB-D point clouds enables reliable, orientation-aware robotic grasping and placement of produce.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ROG-Grasp, a framework that infers the orientation of agricultural produce like tomatoes and onions from the geometry of their root surfaces using RGB-D sensors. It uses a YOLO detector to find the root and plane fitting on the point cloud to determine the normal direction, which then guides grasp pose generation and motion planning for consistent placement. This geometry-driven approach is shown to achieve high success rates in both isolated and cluttered scenes while executing faster than vision-language-action policies. A sympathetic reader would care because consistent orientation is critical in post-harvest processing, and this offers a more reliable alternative to learning-based methods for such tasks.

Core claim

ROG-Grasp estimates the produce orientation from root surface geometry using a YOLO-based root detector and point cloud plane fitting to infer the root normal. This enables stable grasp pose generation and orientation-constrained Cartesian motion planning, leading to high success rates and stable execution times on tomatoes and onions in isolated and cluttered scenarios, with more reliable and accurate grasp completion and faster execution than vision-language-action policies.
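
One way to picture how a root normal could drive grasp pose generation is to build a gripper frame whose approach axis points into the root surface, plus a pre-grasp point a short distance out along the normal. The sketch below is an illustration under stated assumptions, not the authors' implementation: the names `grasp_frame_from_normal` and `pregrasp_point`, the `hint` axis, and the `standoff` distance are all hypothetical, and the paper may define its grasp frame differently.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / n for c in v)

def grasp_frame_from_normal(normal, hint=(1.0, 0.0, 0.0)):
    """Return right-handed unit axes (x, y, z) with z pointing into the surface.

    Assumption: the gripper approaches along the negated root normal.
    `hint` only fixes the otherwise free rotation about that axis.
    """
    z = normalize(tuple(-c for c in normal))
    if abs(dot(hint, z)) > 0.99:      # hint nearly parallel to z: pick another
        hint = (0.0, 1.0, 0.0)
    x = normalize(cross(hint, z))     # perpendicular to the approach axis
    y = cross(z, x)                   # completes the right-handed frame
    return x, y, z

def pregrasp_point(root_center, normal, standoff):
    """Point `standoff` metres away from the root centre along the normal."""
    n = normalize(normal)
    return tuple(p + standoff * c for p, c in zip(root_center, n))
```

With a root normal pointing straight up, the approach axis `z` points straight down and the pre-grasp point sits directly above the root centre.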

What carries the argument

Root normal inference from YOLO root detection combined with plane fitting on RGB-D point clouds, which determines the produce orientation used for grasp planning and constrained motion.
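
The plane-fitting step can be sketched as a least-squares fit: the normal is the direction of least variance in the root points. The code below is a minimal stdlib-only illustration, assuming the points are already expressed in the camera frame with the camera at the origin. The paper does not say which fitting method it uses (a RANSAC variant is equally plausible), and `fit_plane_normal` is a hypothetical name.

```python
import math

def _power_iterate(M, start, iters):
    """Dominant eigenvector of a symmetric 3x3 matrix, or None if it collapses."""
    v = start
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return None
        v = [c / norm for c in w]
    return v

def fit_plane_normal(points, iters=200):
    """Return (centroid, unit normal) of the least-squares plane through points."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 points")
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    # Scatter matrix of the centred points.
    C = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[k] - centroid[k] for k in range(3)]
        for i in range(3):
            for j in range(3):
                C[i][j] += d[i] * d[j]
    # The dominant eigenvector of (trace*I - C) is the least-variance direction of C.
    trace = C[0][0] + C[1][1] + C[2][2]
    M = [[(trace if i == j else 0.0) - C[i][j] for j in range(3)] for i in range(3)]
    best = None
    for start in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
        v = _power_iterate(M, start, iters)
        if v is None:
            continue
        spread = sum(v[i] * C[i][j] * v[j] for i in range(3) for j in range(3))
        if best is None or spread < best[0]:
            best = (spread, v)
    if best is None:
        raise ValueError("degenerate point set")
    normal = best[1]
    # Assumption: camera at the origin, so flip the normal to face the camera.
    if sum(normal[k] * centroid[k] for k in range(3)) > 0.0:
        normal = [-c for c in normal]
    return centroid, normal
```

Trying three starting vectors and keeping the one with the smallest spread guards against a start that happens to be perpendicular to the true normal. Collinear points leave the normal ambiguous, so a real pipeline would want a planarity check before trusting the result.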

If this is right

High success rates for grasping and placement in isolated and cluttered scenarios.
Stable execution times for orientation-controlled manipulation tasks.
More reliable and accurate grasp completion compared to vision-language-action policies.
Faster execution than VLA policies for the same tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar root-oriented perception could apply to other agricultural items where orientation matters for packaging or processing.
Integrating this geometric method with learned policies might improve overall robustness in varying conditions.
The approach relies on clear root visibility, so handling occluded roots or differently shaped produce is a natural direction for extension.

Load-bearing premise

Root surface geometry from RGB-D point clouds can be reliably inferred via YOLO detection and plane fitting to determine produce orientation for grasp planning.
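
To make the premise concrete, the link between a 2D detection and 3D root geometry is usually a pinhole back-projection of the depth pixels inside the detected box. The sketch below assumes a standard pinhole model with intrinsics `fx`, `fy`, `cx`, `cy` and raw depth in millimetres. These parameters, the box layout, and the name `deproject_box` are illustrative assumptions, not details taken from the paper.

```python
def deproject_box(depth, box, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project the depth pixels inside a bounding box to camera-frame points.

    depth: 2D list indexed as depth[v][u], raw sensor units.
    box:   (u_min, v_min, u_max, v_max), max bounds exclusive.
    depth_scale: metres per raw unit (0.001 assumes millimetre depth).
    """
    u_min, v_min, u_max, v_max = box
    points = []
    for v in range(v_min, v_max):
        for u in range(u_min, u_max):
            z = depth[v][u] * depth_scale
            if z <= 0.0:              # zero depth marks an invalid reading
                continue
            x = (u - cx) * z / fx     # pinhole model, no lens distortion
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

The resulting points are what a plane fit would consume. Invalid depth readings are skipped rather than interpolated, which keeps the sketch simple but thins the point set near depth holes.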

What would settle it

Demonstrating cases where YOLO root detection fails or plane fitting produces incorrect normals, resulting in unstable grasps or incorrect orientations.
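
A test of this kind needs a measure of how wrong a fitted normal is. A common choice is the angle between the estimated normal and a reference normal, as in the minimal sketch below. How a reference normal would be obtained (for example, from a fixture or manual annotation) is left open here, and the tolerance in `normal_is_acceptable` is a hypothetical parameter rather than a value from the paper.

```python
import math

def angular_error_deg(n_est, n_ref):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(n_est, n_ref))
    norm_est = math.sqrt(sum(a * a for a in n_est))
    norm_ref = math.sqrt(sum(b * b for b in n_ref))
    if norm_est == 0.0 or norm_ref == 0.0:
        raise ValueError("zero-length normal")
    # Clamp to guard acos against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_est * norm_ref)))
    return math.degrees(math.acos(cos_angle))

def normal_is_acceptable(n_est, n_ref, tol_deg):
    """True if the estimated normal is within tol_deg of the reference."""
    return angular_error_deg(n_est, n_ref) <= tol_deg
```

A flipped normal shows up as an error near 180 degrees, which makes sign mistakes easy to separate from ordinary fitting noise.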

Figures

Figures reproduced from arXiv: 2606.00449 by Augustus Sroka, Bill Cai, Brian Poon, Feng Liu, Kelvin Cai, Lifeng Zhou, Ran Yang, Satoru Eto, Shijie Geng, Yiming Feng, Zijian An.

**Figure 1.** Overview of the proposed ROG-Grasp framework. The pipeline consists of two main modules: vision perception and motion execution. In the …

**Figure 2.** YOLO dataset annotation process.

The grasping operation requires accurate positioning of the gripper rather than the robot flange. Therefore, a coordinate transformation between the gripper frame and the robot flange frame must be considered. This transformation is formulated using homogeneous transformation matrices in SE(3), which provide a unified representation of rigid-body motion including both rotat…

**Figure 3.** Key components of geometry-based orientation estimation and pose generation.

**Figure 4.** Waypoint-based execution pipeline for orientation-aware grasping.

**Figure 5.** Sequential execution of orientation-aware grasping and placement for a tomato.

**Figure 6.** Sequential execution of orientation-aware grasping and placement for an onion. Notably, in subfigures (f)–(h), the robot moves from waypoint …
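
The text extracted alongside Figure 2 mentions a gripper-to-flange transformation expressed with homogeneous matrices in SE(3). A minimal sketch of that idea follows, assuming the fixed gripper pose in the flange frame is known from calibration or the tool datasheet. The matrix names and the helper `flange_target` are hypothetical, and the frame conventions may differ from the paper's.

```python
def matmul4(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_rigid(T):
    """Inverse of a rigid transform [R t; 0 1], computed as [R^T, -R^T t; 0 1]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[0] + [t_inv[0]],
            Rt[1] + [t_inv[1]],
            Rt[2] + [t_inv[2]],
            [0.0, 0.0, 0.0, 1.0]]

def flange_target(T_base_gripper, T_flange_gripper):
    """Flange pose in the base frame that puts the gripper at T_base_gripper.

    Uses T_base_gripper = T_base_flange * T_flange_gripper, solved for
    T_base_flange.
    """
    return matmul4(T_base_gripper, invert_rigid(T_flange_gripper))
```

For a gripper mounted 0.15 m along the flange z-axis with no rotation, a desired gripper height of 0.2 m corresponds to a flange height of 0.05 m.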

read the original abstract

Orientation-aware manipulation is essential in post-harvest agricultural processing, where produce must be grasped and placed in consistent configurations. This paper presents ROG-Grasp, a geometry-based robotic grasping and placement framework that estimates the produce orientation from root surface geometry using RGB-D perception. A YOLO-based root detector and point cloud plane fitting are used to infer the root normal, enabling stable grasp pose generation and orientation-constrained Cartesian motion planning. Experiments on tomatoes and onions demonstrate high success rates and stable execution time in both isolated and cluttered scenarios. Compared with vision-language-action (VLA) policies, the proposed method achieves more reliable and accurate grasp completion with faster execution. These results highlight the effectiveness of geometry-driven perception for practical orientation-controlled manipulation tasks. A video of our paper is available online at https://youtu.be/Ir2UtGODdMo.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ROG-Grasp applies YOLO detection plus plane fitting to recover root normals for orientation-aware grasping of tomatoes and onions, but the abstract supplies zero numbers to back the reliability or speed claims over VLA policies.

This paper is an application of standard computer vision tools to a practical problem in agricultural robotics. It detects the root of produce like tomatoes and onions with YOLO, fits a plane to the RGB-D points to find the normal, and uses that to plan grasps and placements that respect orientation.

What it does reasonably well is target a real need where produce has to be handled consistently after harvest. The geometry approach avoids the black-box nature of VLA policies and promises faster execution in both isolated and cluttered scenes.

The new element is framing the root geometry as the key for orientation, but the components themselves are off-the-shelf.

The soft spots are significant. There are no quantitative results at all in the abstract: no success percentages, no timing data, no comparison tables. The claim of more reliable and accurate grasp completion than VLA is stated but not supported by evidence. More importantly, the plane fitting for the root normal is presented without any validation of its accuracy or robustness to noise, partial views, or non-planar surfaces. That step is central to the whole pipeline, and its reliability is not demonstrated.

Readers in applied robotics for food processing might get some ideas from the pipeline and the video. For the broader community, there's little here that advances the state of the art beyond a domain-specific implementation.

I would not bring this to a reading group. It does not deserve peer review in its current form because the experimental support is missing. If the full paper has detailed metrics and analysis, then maybe, but as described the central claims can't be assessed.

Referee Report

3 major / 0 minor

Summary. The manuscript presents ROG-Grasp, a geometry-based robotic grasping and placement framework for agricultural produce (tomatoes, onions) that detects the root via YOLO on RGB-D images, fits a plane to the corresponding point cloud to recover the root normal, and uses the resulting orientation to generate stable grasp poses and execute Cartesian motion plans with orientation constraints. Experiments in isolated and cluttered scenes are reported to yield high success rates with faster execution than vision-language-action (VLA) policies.
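
One simple reading of "Cartesian motion plans with orientation constraints" is a straight-line path in position with the end-effector orientation held fixed at every waypoint. The sketch below illustrates only that reading. The function name, the step count, and the representation of orientation as an opaque value are assumptions, and the paper's planner may enforce the constraint differently.

```python
def cartesian_waypoints(start, goal, orientation, steps):
    """Evenly spaced positions from start to goal, each paired with one fixed orientation.

    start, goal: (x, y, z) positions in metres.
    orientation: any representation (e.g. a quaternion tuple); passed through unchanged.
    steps: number of segments, so steps + 1 waypoints are returned.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    waypoints = []
    for k in range(steps + 1):
        s = k / steps                                   # path parameter in [0, 1]
        position = tuple(a + s * (b - a) for a, b in zip(start, goal))
        waypoints.append((position, orientation))
    return waypoints
```

A descent from a pre-grasp point to the grasp point would then be a call such as `cartesian_waypoints(pregrasp, grasp, q, 10)`, with each waypoint handed to an inverse-kinematics solver.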

Significance. If the quantitative claims hold, the work supplies a lightweight, interpretable, and training-data-efficient alternative to end-to-end learned policies for orientation-controlled manipulation tasks that arise in post-harvest processing; the explicit use of surface geometry rather than learned priors is a clear methodological strength.

major comments (3)

[Abstract] The central claim that the method 'achieves more reliable and accurate grasp completion with faster execution' than VLA policies is stated without any numerical success rates, timing values, error bars, trial counts, or statistical comparisons, so the performance advantage cannot be assessed.
[Methods (root detection and plane fitting)] The root normal inference step (YOLO detection + plane fitting on RGB-D points) comes with no quantitative orientation error (e.g., angular deviation of the fitted normal), no ablation on plane-fitting robustness to occlusion or non-planar roots, and no failure-mode analysis, yet this step directly determines the grasp pose and motion constraints that underpin the orientation-aware claim.
[Experiments] The manuscript supplies no dataset sizes, no number of grasp trials, and no comparison table or statistical test against the VLA baseline, leaving the reported 'high success rates' and 'stable execution time' unsupported.
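
The request for trial counts and error bars can be made concrete with a confidence interval on a success rate. The sketch below computes a Wilson score interval, a standard choice for proportions from a modest number of trials. The counts in the usage note are placeholders chosen for illustration only and are not results from the paper.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials
                         + z * z / (4.0 * trials * trials)) / denom
    return center - half, center + half
```

For a hypothetical 18 successes in 20 trials, the interval runs from roughly 0.70 to 0.97, which shows how wide the uncertainty stays at small trial counts and why the referee asks for them.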

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments correctly identify areas where additional quantitative details are needed to support our claims. We will revise the manuscript to incorporate these elements.

Referee: [Abstract] The central claim that the method 'achieves more reliable and accurate grasp completion with faster execution' than VLA policies is stated without any numerical success rates, timing values, error bars, trial counts, or statistical comparisons, so the performance advantage cannot be assessed.

Authors: We agree that the abstract lacks specific numerical support for the performance claims. In the revised version, we will update the abstract to include key quantitative results such as success rates, execution times, trial counts, and references to statistical comparisons from the experiments.

Revision: yes

Referee: [Methods (root detection and plane fitting)] The root normal inference step (YOLO detection + plane fitting on RGB-D points) comes with no quantitative orientation error (e.g., angular deviation of the fitted normal), no ablation on plane-fitting robustness to occlusion or non-planar roots, and no failure-mode analysis, yet this step directly determines the grasp pose and motion constraints that underpin the orientation-aware claim.

Authors: We acknowledge this limitation in the current manuscript. The revised paper will add quantitative evaluation of the orientation error for the root normal, an ablation study on the plane fitting step's robustness, and an analysis of failure modes.

Revision: yes

Referee: [Experiments] The manuscript supplies no dataset sizes, no number of grasp trials, and no comparison table or statistical test against the VLA baseline, leaving the reported 'high success rates' and 'stable execution time' unsupported.

Authors: We agree that more detailed experimental reporting is required. We will revise the experimental section to include dataset sizes, the number of grasp trials performed, a comparison table with the VLA baseline, and the results of appropriate statistical tests.

Revision: yes

Circularity Check

0 steps flagged

No circularity; purely empirical method with no derivation chain

The paper describes an empirical pipeline (YOLO root detection + plane fitting on RGB-D points to obtain normals, followed by grasp planning and motion constraints) and reports experimental success rates on tomatoes/onions versus VLA baselines. No equations, fitted parameters, uniqueness theorems, or self-citations appear in the provided text. The central claims rest on direct hardware trials rather than any reduction of a prediction to its own inputs. This is self-contained empirical work; the reader's circularity score of 0.0 is confirmed.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model, derivation, or theoretical component; paper is an empirical robotics application with no free parameters, axioms, or invented entities.
