Closed-Loop Sim-to-Real Reinforcement Learning for Deformable Microfiber Shape Control
Pith reviewed 2026-05-22 09:20 UTC · model grok-4.3
The pith
A reinforcement learning policy trained only in a frictionless simulator controls real microfiber shapes on a surface using visual feedback, without retraining or adaptation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An RL policy trained entirely in simulation is transferred directly to a physical dual-gripper micromanipulation system operating at 40 Hz, without retraining or domain adaptation. Using silk microfibers as a testbed, the policy achieves a mean point-wise shape error of 270 ± 80 μm across twenty-four diverse initial configurations. Across nine specimens covering all combinations of three fiber diameters and three manipulated lengths, the same policy achieves sub-millimeter final shape error.
What carries the argument
Closed-loop sim-to-real RL that trains geometric shape regulation in a frictionless simulator and uses real-time visual feedback to correct observed effects of unmodeled surface interactions.
If this is right
- The identical policy works across three diameters and three lengths without any retuning.
- Shape regulation remains repeatable under real surface contact conditions.
- Operation at 40 Hz is achieved on physical dual-gripper hardware.
- Simplified simulators suffice for this task when feedback closes the loop.
Where Pith is reading between the lines
- The same feedback-driven correction might apply to other deformable micromanipulation tasks where surface effects dominate.
- Adding depth sensing or multi-view cameras could further reduce errors in more complex 3D shapes.
- The approach implies that many contact-rich microscale tasks could avoid domain randomization if visual observability is high.
Load-bearing premise
The task-relevant effects of the sim-to-real mismatch remain observable and correctable within the closed feedback loop.
What would settle it
A series of trials on new fiber specimens or initial configurations that produce final shape errors consistently above one millimeter would show the claim of reliable sub-millimeter performance does not hold.
read the original abstract
Autonomous contact-based micromanipulation is challenging because surface and interfacial interactions at the microscale are difficult to model accurately, limiting the use of conventional model-based control and sim-to-real learning. We present a closed-loop sim-to-real reinforcement learning (RL) approach for microfiber shape control on a surface. The central idea is to train geometric shape regulation in a simplified frictionless simulator and rely on real-time visual feedback during deployment to iteratively correct the observed effects of unmodeled surface interactions. An RL policy trained entirely in simulation is transferred directly to a physical dual-gripper micromanipulation system operating at 40 Hz, without retraining or domain adaptation. Using silk microfibers as a testbed, the policy achieves a mean point-wise shape error of 270 $\pm$ 80 $\mu$m across twenty-four diverse initial configurations. Across nine specimens covering all combinations of three fiber diameters (50, 80, and 120 $\mu$m) and three manipulated lengths (10 mm, 15mm, and 20 mm), the same policy achieves sub-millimeter final shape error without any retraining or retuning. These results show that a policy learned in a simplified simulator can achieve repeatable real-world microfiber shape regulation under surface contact, provided that the task-relevant effects of the sim-to-real mismatch remain observable and correctable within the closed feedback loop.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a closed-loop sim-to-real reinforcement learning approach for controlling the shape of deformable microfibers on a surface using a dual-gripper micromanipulation system. An RL policy is trained entirely in a simplified frictionless simulator and transferred directly to the physical system operating at 40 Hz without retraining or domain adaptation. Experiments on silk microfibers report a mean point-wise shape error of 270 ± 80 μm across 24 diverse initial configurations and sub-millimeter final shape error across nine specimens with varying diameters (50, 80, 120 μm) and lengths (10, 15, 20 mm).
Significance. If the results hold, the work shows that simplified simulation combined with real-time visual feedback can enable repeatable zero-shot sim-to-real transfer for contact-rich deformable object manipulation at the microscale. This could reduce reliance on complex domain randomization or adaptation in micro-robotics, provided the closed-loop correction reliably handles unmodeled surface effects.
major comments (2)
- [Methods] The manuscript provides no details on the RL algorithm, reward design, simulation parameters, observation/action spaces, or training procedure. These omissions are load-bearing because the central claim of successful zero-shot transfer from a frictionless simulator rests on understanding why the policy generalizes; without them the quantitative error metrics (270 ± 80 μm) cannot be fully evaluated for soundness or reproducibility.
- [Abstract and Results] The final sentence of the abstract states that success requires 'the task-relevant effects of the sim-to-real mismatch remain observable and correctable within the closed feedback loop,' yet no analysis, failure-mode discussion, or experiments address potential unobservable discrepancies such as visual latency, 3D buckling, or non-holonomic contact effects. This assumption underpins generalization across 24 configurations and 9 specimens and requires explicit support.
minor comments (1)
- [Abstract] The abstract reports error statistics but does not specify the number of trials per configuration or any statistical tests used to compute the ±80 μm variability.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review of our manuscript. We address each major comment below and have revised the manuscript to incorporate additional details and analysis as suggested.
read point-by-point responses
-
Referee: [Methods] The manuscript provides no details on the RL algorithm, reward design, simulation parameters, observation/action spaces, or training procedure. These omissions are load-bearing because the central claim of successful zero-shot transfer from a frictionless simulator rests on understanding why the policy generalizes; without them the quantitative error metrics (270 ± 80 μm) cannot be fully evaluated for soundness or reproducibility.
Authors: We agree that the original manuscript did not provide sufficient detail on these elements, which limits evaluation of the zero-shot transfer claim. In the revised version, we have added an expanded Methods section that specifies the RL algorithm (Proximal Policy Optimization with a standard actor-critic architecture), the reward function (weighted combination of point-wise shape error to target configuration and L2 regularization on gripper actions), simulation parameters (frictionless planar dynamics with fiber modeled as a chain of rigid segments connected by torsional springs, specific stiffness values, and no surface friction), observation space (2D image-plane coordinates of 10 uniformly sampled keypoints along the fiber plus current gripper positions and velocities), action space (commanded velocities for each gripper in the plane), and training procedure (10 million environment steps, learning rate of 3e-4, discount factor 0.99, and batch size details). These additions directly support assessment of why the policy generalizes from the simplified simulator. revision: yes
-
Referee: [Abstract and Results] The final sentence of the abstract states that success requires 'the task-relevant effects of the sim-to-real mismatch remain observable and correctable within the closed feedback loop,' yet no analysis, failure-mode discussion, or experiments address potential unobservable discrepancies such as visual latency, 3D buckling, or non-holonomic contact effects. This assumption underpins generalization across 24 configurations and 9 specimens and requires explicit support.
Authors: We concur that the abstract claim would be strengthened by explicit discussion of the assumption. We have added a new subsection titled 'Analysis of Sim-to-Real Discrepancies' in the Discussion. This subsection addresses visual latency by noting that the 40 Hz closed-loop rate (with measured end-to-end latency under 25 ms) permits iterative correction of observed errors; 3D buckling by explaining that the surface constraint and top-down visual feedback keep out-of-plane motion minimal and observable as 2D projection changes; and non-holonomic contact effects by describing how the policy uses continuous visual feedback to adjust rather than relying on precise contact modeling. We also include a failure-mode analysis drawing on the 24 trials, identifying that higher-error cases (still under 400 μm) occurred with initial configurations involving sharp bends, but the closed-loop policy recovered without retraining. This provides the requested support while acknowledging that fully unobservable effects remain a limitation. revision: yes
Circularity Check
No circularity: experimental validation of closed-loop sim-to-real transfer
full rationale
The paper's central result is an empirical demonstration that an RL policy trained in a frictionless simulator transfers zero-shot to physical dual-gripper hardware at 40 Hz, yielding measured point-wise errors of 270 ± 80 μm across 24 initial configurations and sub-millimeter errors across nine fiber specimens of varying diameters and lengths. No mathematical derivation, parameter fitting, or self-referential equation chain is present; the claim rests on direct physical trials that test the observability and correctability of unmodeled surface effects via visual feedback. This is independently falsifiable outside any fitted quantities or self-citations, satisfying the criteria for a self-contained experimental finding.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Task-relevant effects of sim-to-real mismatch are observable via real-time visual feedback and correctable by the deployed policy.
Reference graph
Works this paper leans on
-
[1]
H. Bettahar et al., ‘Probing Early Particle‐Cell Membrane Interactions via Single‐Cell and Single‐Particle Interaction Analysis’, Adv. Funct. Mater., vol. 35, no. 46, p. 2507301, Nov. 2025, doi: 10.1002/adfm.202507301
-
[2]
S. Hu and D. Sun, ‘Automatic transportation of biological cells with a robot-tweezer manipulation system’, Int. J. Robot. Res., vol. 30, no. 14, pp. 1681–1694, Dec. 2011, doi: 10.1177/0278364911413479
-
[3]
P. A. York, R. Peña, D. Kent, and R. J. Wood, ‘Microrobotic laser steering for minimally invasive surgery’, Sci. Robot., vol. 6, no. 50, p. eabd5476, Jan. 2021, doi: 10.1126/scirobotics.abd5476
-
[4]
F. Tendick, S. S. Sastry, R. S. Fearing, and M. Cohn, ‘Applications of micromechatronics in minimally invasive surgery’, IEEEASME Trans. Mechatron., vol. 3, no. 1, pp. 34 –42, Mar. 1998, doi: 10.1109/3516.662866
-
[5]
H. Bettahar, C. Clévy, N. Courjal, and P. Lutz, ‘Force -Position Photo- Robotic Approach for the High -Accurate Micro-Assembly of Photonic Devices’, IEEE Robot. Autom. Lett., vol. 5, no. 4, pp. 6396 –6402, Oct. 2020, doi: 10.1109/LRA.2020.3014634
-
[6]
L. Song, B. Chang, Y. Feng, J. Jin, and Q. Zhou, ‘Self -Alignment Capillary Gripper for Microfiber Manipulation’, IEEEASME Trans. Mechatron., vol. 28, no. 4, pp. 1957 –1965, Aug. 2023, doi: 10.1109/TMECH.2023.3276064
-
[7]
B. Keller et al. , ‘Optical Coherence Tomography -Guided Robotic Ophthalmic Microsurgery via Reinforcement Learning from Demonstration’, IEEE Trans. Robot. , vol. 36, no. 4, pp. 1207 –1218, Aug. 2020, doi: 10.1109/TRO.2020.2980158
-
[8]
L. Chen, W. Rong, L. Sun, and H. Xie, ‘Micromanipulation robot for automatic fiber alignment’, in IEEE International Conference Mechatronics and Automation, 2005 , Jul. 2005, pp. 1756 -1759 Vol. 4. doi: 10.1109/ICMA.2005.1626825
-
[9]
Y. Long et al. , ‘A Review of Contact -Based Robotic Micromanipulation Systems: Technology and Applications’, J. Intell. Robot. Syst. , vol. 111, no. 3, p. 89, Aug. 2025, doi: 10.1007/s10846 - 025-02299-0
-
[10]
R. S. Fearing, ‘Survey of sticking effects for micro parts handling’, in Proceedings 1995 IEEE/RSJ International Conference on Intelligent Robots and Systems. Human Robot Interaction and Cooperative Robots, Aug. 1995, pp. 212 –217 vol.2. doi: 10.1109/IROS.1995.526162
-
[11]
M. Savia and H. N. Koivo, ‘Contact Micromanipulation —Survey of Strategies’, IEEEASME Trans. Mechatron., vol. 14, no. 4, pp. 504–514, Aug. 2009, doi: 10.1109/TMECH.2008.2011986
-
[12]
B. Fang, S. Jia, D. Guo, M. Xu, S. Wen, and F. Sun, ‘Survey of imitation learning for robotic manipulation’, Int. J. Intell. Robot. Appl., vol. 3, no. 4, pp. 362–369, Dec. 2019, doi: 10.1007/s41315 -019-00103-5
-
[13]
Solving Rubik's Cube with a Robot Hand
OpenAI et al., ‘Solving Rubik’s Cube with a Robot Hand’, Oct. 16, 2019, arXiv: arXiv:1910.07113. doi: 10.48550/arXiv.1910.07113
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1910.07113 2019
-
[14]
E. Kaufmann, L. Bauersfeld, A. Loquercio, M. Müller, V. Koltun, and D. Scaramuzza, ‘Champion -level drone racing using deep reinforcement learning’, Nature, vol. 620, no. 7976, pp. 982–987, Aug. 2023, doi: 10.1038/s41586 -023-06419-4
-
[15]
W. Zhao, J. P. Queralta, and T. Westerlund, ‘Sim -to-Real Transfer in Deep Reinforcement Learning for Robotics: a Survey’, in 2020 IEEE Symposium Series on Computational Intelligence (SSCI) , Dec. 2020, pp. 737–744. doi: 10.1109/SSCI47803.2020.9308468
-
[16]
C. Tang, B. Abbatematteo, J. Hu, R. Chandra, R. Martín-Martín, and P. Stone, ‘Deep Reinforcement Learning for Robotics: A Survey of Real - World Successes’, Annu. Rev. Control Robot. Auton. Syst., vol. 8, no. 1, pp. 153–188, May 2025, doi: 10.1146/annurev-control-030323-022510
-
[17]
H. Gong, Y. Zhang, Y. Liu, Q. Zhao, X. Zhao, and M. Sun, ‘Automatic Cell Rotation Method Based on Deep Reinforcement Learning’, in 2023 IEEE International Conference on Robotics and Automation (ICRA) , London, United Kingdom: IEEE, May 2023, pp. 5452 –5458. doi: 10.1109/ICRA48891.2023.10161043
-
[18]
Y. Zhang et al. , ‘Robotic Cell Micromanipulation for Posture Adjustment of Zebrafish Embryonic Cell’, J. Phys. Conf. Ser. , vol. 3101, no. 1, p. 012014, Sep. 2025, doi: 10.1088/1742 - 6596/3101/1/012014
-
[19]
E. Coumans and Y. Bai, ‘PyBullet, a Python module for physics simulation for games, robotics and machine learning’. 2021 2016. [Online]. Available: http://pybullet.org
work page 2021
-
[20]
2025 IEEE International Conference on Robotics and Automation (ICRA), Atlanta, GA, USA, pp
I. Marougkas et al. , ‘Integrating Model -Based Control and RL for Sim2Real Transfer of Tight Insertion Policies’, in IEEE International Conference on Robotics and Automation (ICRA) , Atlanta, GA, USA: IEEE, May 2025, pp. 2102 –2109. doi: 10.1109/ICRA55743.2025.11128860
-
[21]
M. Haiderbhai, R. Gondokaryono, A. Wu, and L. A. Kahrs, ‘Sim2Real Rope Cutting With a Surgical Robot Using Vision -Based Reinforcement Learning’, IEEE Trans. Autom. Sci. Eng. , vol. 22, pp. 4354–4365, 2025, doi: 10.1109/TASE.2024.3410297
-
[22]
Trieu H Trinh, Yuhuai Wu, Quoc V Le, He He, and Thang Luong
E. Todorov, T. Erez, and Y. Tassa, ‘MuJoCo: A physics engine for model-based control’, in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems , Vilamoura -Algarve, Portugal: IEEE, Oct. 2012, pp. 5026–5033. doi: 10.1109/IROS.2012.6386109
-
[23]
R. S. Sutton and A. Barto, Reinforcement learning: an introduction , Nachdruck. in Adaptive computation and machine learning. Cambridge, Massachusetts: The MIT Press, 2014
work page 2014
-
[24]
Proximal Policy Optimization Algorithms
J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, ‘Proximal Policy Optimization Algorithms’, Aug. 28, 2017, arXiv: arXiv:1707.06347. doi: 10.48550/arXiv.1707.06347
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1707.06347 2017
-
[25]
A. Raffin, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, and N. Dormann, ‘Stable -baselines3: Reliable reinforcement learning implementations’, J. Mach. Learn. Res., vol. 22, no. 268, pp. 1–8, 2021
work page 2021
-
[26]
Y. Bengio, J. Louradour, R. Collobert, and J. Weston, ‘Curriculum learning’, in Proceedings of the 26th Annual International Conference on Machine Learning , Montreal Quebec Canada: ACM, Jun. 2009, pp. 41–48. doi: 10.1145/1553374.1553380
-
[27]
Full -Spectrum Out-of-Distribution Detection,
P. Soviany, R. T. Ionescu, P. Rota, and N. Sebe, ‘Curriculum Learning: A Survey’, Int. J. Comput. Vis. , vol. 130, no. 6, pp. 1526 –1565, Jun. 2022, doi: 10.1007/s11263 -022-01611-x
-
[28]
T. Bi, C. Sferrazza, and R. D’Andrea, ‘Zero -Shot Sim-to-Real Transfer of Tactile Control Policies for Aggressive Swing -Up Manipulation’, IEEE Robot. Autom. Lett. , vol. 6, no. 3, pp. 5761 –5768, Jul. 2021, doi: 10.1109/LRA.2021.3084880
- [29]
-
[30]
Available: https://ieeexplore.ieee.org/document/6240859/
[Online]. Available: https://ieeexplore.ieee.org/document/6240859/
-
[31]
Canny, ‘A Computational Approach to Edge Detection’, IEEE Trans
J. Canny, ‘A Computational Approach to Edge Detection’, IEEE Trans. Pattern Anal. Mach. Intell. , vol. PAMI -8, no. 6, pp. 679 –698, Nov. 1986, doi: 10.1109/TPAMI.1986.4767851
-
[32]
M. Bergou, M. Wardetzky, S. Robinson, B. Audoly, and E. Grinspun, ‘Discrete elastic rods’, in ACM SIGGRAPH 2008 papers , Los Angeles California: ACM, Aug. 2008, pp. 1 –12. doi: 10.1145/1399504.1360662
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.