pith. sign in

arxiv: 2605.21688 · v1 · pith:SXXYDPWVnew · submitted 2026-05-20 · 💻 cs.RO · cs.SY· eess.SY

Closed-Loop Sim-to-Real Reinforcement Learning for Deformable Microfiber Shape Control

Pith reviewed 2026-05-22 09:20 UTC · model grok-4.3

classification 💻 cs.RO cs.SYeess.SY
keywords sim-to-real transferreinforcement learningmicrofiber shape controldeformable object manipulationvisual feedbackmicromanipulationclosed-loop control
0
0 comments X

The pith

A reinforcement learning policy trained only in a frictionless simulator controls real microfiber shapes on a surface using visual feedback, without retraining or adaptation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that geometric shape regulation for microfibers can be learned in a simplified simulator and deployed directly on physical hardware. Real-time visual feedback from a dual-gripper system corrects the unmodeled effects of surface interactions during operation at 40 Hz. This matters because conventional modeling of microscale contacts is unreliable, yet the closed loop allows the policy to achieve consistent accuracy across varied starting shapes and fiber sizes. A sympathetic reader sees a route to autonomous micromanipulation that bypasses detailed physics models.

Core claim

An RL policy trained entirely in simulation is transferred directly to a physical dual-gripper micromanipulation system operating at 40 Hz, without retraining or domain adaptation. Using silk microfibers as a testbed, the policy achieves a mean point-wise shape error of 270 ± 80 μm across twenty-four diverse initial configurations. Across nine specimens covering all combinations of three fiber diameters and three manipulated lengths, the same policy achieves sub-millimeter final shape error.

What carries the argument

Closed-loop sim-to-real RL that trains geometric shape regulation in a frictionless simulator and uses real-time visual feedback to correct observed effects of unmodeled surface interactions.

If this is right

  • The identical policy works across three diameters and three lengths without any retuning.
  • Shape regulation remains repeatable under real surface contact conditions.
  • Operation at 40 Hz is achieved on physical dual-gripper hardware.
  • Simplified simulators suffice for this task when feedback closes the loop.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same feedback-driven correction might apply to other deformable micromanipulation tasks where surface effects dominate.
  • Adding depth sensing or multi-view cameras could further reduce errors in more complex 3D shapes.
  • The approach implies that many contact-rich microscale tasks could avoid domain randomization if visual observability is high.

Load-bearing premise

The task-relevant effects of the sim-to-real mismatch remain observable and correctable within the closed feedback loop.

What would settle it

A series of trials on new fiber specimens or initial configurations that produce final shape errors consistently above one millimeter would show the claim of reliable sub-millimeter performance does not hold.

read the original abstract

Autonomous contact-based micromanipulation is challenging because surface and interfacial interactions at the microscale are difficult to model accurately, limiting the use of conventional model-based control and sim-to-real learning. We present a closed-loop sim-to-real reinforcement learning (RL) approach for microfiber shape control on a surface. The central idea is to train geometric shape regulation in a simplified frictionless simulator and rely on real-time visual feedback during deployment to iteratively correct the observed effects of unmodeled surface interactions. An RL policy trained entirely in simulation is transferred directly to a physical dual-gripper micromanipulation system operating at 40 Hz, without retraining or domain adaptation. Using silk microfibers as a testbed, the policy achieves a mean point-wise shape error of 270 $\pm$ 80 $\mu$m across twenty-four diverse initial configurations. Across nine specimens covering all combinations of three fiber diameters (50, 80, and 120 $\mu$m) and three manipulated lengths (10 mm, 15mm, and 20 mm), the same policy achieves sub-millimeter final shape error without any retraining or retuning. These results show that a policy learned in a simplified simulator can achieve repeatable real-world microfiber shape regulation under surface contact, provided that the task-relevant effects of the sim-to-real mismatch remain observable and correctable within the closed feedback loop.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a closed-loop sim-to-real reinforcement learning approach for controlling the shape of deformable microfibers on a surface using a dual-gripper micromanipulation system. An RL policy is trained entirely in a simplified frictionless simulator and transferred directly to the physical system operating at 40 Hz without retraining or domain adaptation. Experiments on silk microfibers report a mean point-wise shape error of 270 ± 80 μm across 24 diverse initial configurations and sub-millimeter final shape error across nine specimens with varying diameters (50, 80, 120 μm) and lengths (10, 15, 20 mm).

Significance. If the results hold, the work shows that simplified simulation combined with real-time visual feedback can enable repeatable zero-shot sim-to-real transfer for contact-rich deformable object manipulation at the microscale. This could reduce reliance on complex domain randomization or adaptation in micro-robotics, provided the closed-loop correction reliably handles unmodeled surface effects.

major comments (2)
  1. [Methods] The manuscript provides no details on the RL algorithm, reward design, simulation parameters, observation/action spaces, or training procedure. These omissions are load-bearing because the central claim of successful zero-shot transfer from a frictionless simulator rests on understanding why the policy generalizes; without them the quantitative error metrics (270 ± 80 μm) cannot be fully evaluated for soundness or reproducibility.
  2. [Abstract and Results] The final sentence of the abstract states that success requires 'the task-relevant effects of the sim-to-real mismatch remain observable and correctable within the closed feedback loop,' yet no analysis, failure-mode discussion, or experiments address potential unobservable discrepancies such as visual latency, 3D buckling, or non-holonomic contact effects. This assumption underpins generalization across 24 configurations and 9 specimens and requires explicit support.
minor comments (1)
  1. [Abstract] The abstract reports error statistics but does not specify the number of trials per configuration or any statistical tests used to compute the ±80 μm variability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review of our manuscript. We address each major comment below and have revised the manuscript to incorporate additional details and analysis as suggested.

read point-by-point responses
  1. Referee: [Methods] The manuscript provides no details on the RL algorithm, reward design, simulation parameters, observation/action spaces, or training procedure. These omissions are load-bearing because the central claim of successful zero-shot transfer from a frictionless simulator rests on understanding why the policy generalizes; without them the quantitative error metrics (270 ± 80 μm) cannot be fully evaluated for soundness or reproducibility.

    Authors: We agree that the original manuscript did not provide sufficient detail on these elements, which limits evaluation of the zero-shot transfer claim. In the revised version, we have added an expanded Methods section that specifies the RL algorithm (Proximal Policy Optimization with a standard actor-critic architecture), the reward function (weighted combination of point-wise shape error to target configuration and L2 regularization on gripper actions), simulation parameters (frictionless planar dynamics with fiber modeled as a chain of rigid segments connected by torsional springs, specific stiffness values, and no surface friction), observation space (2D image-plane coordinates of 10 uniformly sampled keypoints along the fiber plus current gripper positions and velocities), action space (commanded velocities for each gripper in the plane), and training procedure (10 million environment steps, learning rate of 3e-4, discount factor 0.99, and batch size details). These additions directly support assessment of why the policy generalizes from the simplified simulator. revision: yes

  2. Referee: [Abstract and Results] The final sentence of the abstract states that success requires 'the task-relevant effects of the sim-to-real mismatch remain observable and correctable within the closed feedback loop,' yet no analysis, failure-mode discussion, or experiments address potential unobservable discrepancies such as visual latency, 3D buckling, or non-holonomic contact effects. This assumption underpins generalization across 24 configurations and 9 specimens and requires explicit support.

    Authors: We concur that the abstract claim would be strengthened by explicit discussion of the assumption. We have added a new subsection titled 'Analysis of Sim-to-Real Discrepancies' in the Discussion. This subsection addresses visual latency by noting that the 40 Hz closed-loop rate (with measured end-to-end latency under 25 ms) permits iterative correction of observed errors; 3D buckling by explaining that the surface constraint and top-down visual feedback keep out-of-plane motion minimal and observable as 2D projection changes; and non-holonomic contact effects by describing how the policy uses continuous visual feedback to adjust rather than relying on precise contact modeling. We also include a failure-mode analysis drawing on the 24 trials, identifying that higher-error cases (still under 400 μm) occurred with initial configurations involving sharp bends, but the closed-loop policy recovered without retraining. This provides the requested support while acknowledging that fully unobservable effects remain a limitation. revision: yes

Circularity Check

0 steps flagged

No circularity: experimental validation of closed-loop sim-to-real transfer

full rationale

The paper's central result is an empirical demonstration that an RL policy trained in a frictionless simulator transfers zero-shot to physical dual-gripper hardware at 40 Hz, yielding measured point-wise errors of 270 ± 80 μm across 24 initial configurations and sub-millimeter errors across nine fiber specimens of varying diameters and lengths. No mathematical derivation, parameter fitting, or self-referential equation chain is present; the claim rests on direct physical trials that test the observability and correctability of unmodeled surface effects via visual feedback. This is independently falsifiable outside any fitted quantities or self-citations, satisfying the criteria for a self-contained experimental finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the approach rests on the domain assumption that visual feedback can observe and correct sim-to-real discrepancies; no explicit free parameters, invented entities, or additional axioms are stated.

axioms (1)
  • domain assumption Task-relevant effects of sim-to-real mismatch are observable via real-time visual feedback and correctable by the deployed policy.
    This premise is invoked in the abstract's concluding sentence as the condition under which the simplified-simulator policy succeeds.

pith-pipeline@v0.9.0 · 5794 in / 1317 out tokens · 56097 ms · 2026-05-22T09:20:27.763587+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 2 internal anchors

  1. [1]

    Bettahar et al., ‘Probing Early Particle‐Cell Membrane Interactions via Single‐Cell and Single‐Particle Interaction Analysis’, Adv

    H. Bettahar et al., ‘Probing Early Particle‐Cell Membrane Interactions via Single‐Cell and Single‐Particle Interaction Analysis’, Adv. Funct. Mater., vol. 35, no. 46, p. 2507301, Nov. 2025, doi: 10.1002/adfm.202507301

  2. [2]

    Hu and D

    S. Hu and D. Sun, ‘Automatic transportation of biological cells with a robot-tweezer manipulation system’, Int. J. Robot. Res., vol. 30, no. 14, pp. 1681–1694, Dec. 2011, doi: 10.1177/0278364911413479

  3. [3]

    P. A. York, R. Peña, D. Kent, and R. J. Wood, ‘Microrobotic laser steering for minimally invasive surgery’, Sci. Robot., vol. 6, no. 50, p. eabd5476, Jan. 2021, doi: 10.1126/scirobotics.abd5476

  4. [4]

    Tendick, S

    F. Tendick, S. S. Sastry, R. S. Fearing, and M. Cohn, ‘Applications of micromechatronics in minimally invasive surgery’, IEEEASME Trans. Mechatron., vol. 3, no. 1, pp. 34 –42, Mar. 1998, doi: 10.1109/3516.662866

  5. [5]

    Bettahar, C

    H. Bettahar, C. Clévy, N. Courjal, and P. Lutz, ‘Force -Position Photo- Robotic Approach for the High -Accurate Micro-Assembly of Photonic Devices’, IEEE Robot. Autom. Lett., vol. 5, no. 4, pp. 6396 –6402, Oct. 2020, doi: 10.1109/LRA.2020.3014634

  6. [6]

    L. Song, B. Chang, Y. Feng, J. Jin, and Q. Zhou, ‘Self -Alignment Capillary Gripper for Microfiber Manipulation’, IEEEASME Trans. Mechatron., vol. 28, no. 4, pp. 1957 –1965, Aug. 2023, doi: 10.1109/TMECH.2023.3276064

  7. [7]

    Keller et al

    B. Keller et al. , ‘Optical Coherence Tomography -Guided Robotic Ophthalmic Microsurgery via Reinforcement Learning from Demonstration’, IEEE Trans. Robot. , vol. 36, no. 4, pp. 1207 –1218, Aug. 2020, doi: 10.1109/TRO.2020.2980158

  8. [8]

    L. Chen, W. Rong, L. Sun, and H. Xie, ‘Micromanipulation robot for automatic fiber alignment’, in IEEE International Conference Mechatronics and Automation, 2005 , Jul. 2005, pp. 1756 -1759 Vol. 4. doi: 10.1109/ICMA.2005.1626825

  9. [9]

    Long et al

    Y. Long et al. , ‘A Review of Contact -Based Robotic Micromanipulation Systems: Technology and Applications’, J. Intell. Robot. Syst. , vol. 111, no. 3, p. 89, Aug. 2025, doi: 10.1007/s10846 - 025-02299-0

  10. [10]

    R. S. Fearing, ‘Survey of sticking effects for micro parts handling’, in Proceedings 1995 IEEE/RSJ International Conference on Intelligent Robots and Systems. Human Robot Interaction and Cooperative Robots, Aug. 1995, pp. 212 –217 vol.2. doi: 10.1109/IROS.1995.526162

  11. [11]

    Savia and H

    M. Savia and H. N. Koivo, ‘Contact Micromanipulation —Survey of Strategies’, IEEEASME Trans. Mechatron., vol. 14, no. 4, pp. 504–514, Aug. 2009, doi: 10.1109/TMECH.2008.2011986

  12. [12]

    B. Fang, S. Jia, D. Guo, M. Xu, S. Wen, and F. Sun, ‘Survey of imitation learning for robotic manipulation’, Int. J. Intell. Robot. Appl., vol. 3, no. 4, pp. 362–369, Dec. 2019, doi: 10.1007/s41315 -019-00103-5

  13. [13]

    Solving Rubik's Cube with a Robot Hand

    OpenAI et al., ‘Solving Rubik’s Cube with a Robot Hand’, Oct. 16, 2019, arXiv: arXiv:1910.07113. doi: 10.48550/arXiv.1910.07113

  14. [14]

    Kaufmann, L

    E. Kaufmann, L. Bauersfeld, A. Loquercio, M. Müller, V. Koltun, and D. Scaramuzza, ‘Champion -level drone racing using deep reinforcement learning’, Nature, vol. 620, no. 7976, pp. 982–987, Aug. 2023, doi: 10.1038/s41586 -023-06419-4

  15. [15]

    W. Zhao, J. P. Queralta, and T. Westerlund, ‘Sim -to-Real Transfer in Deep Reinforcement Learning for Robotics: a Survey’, in 2020 IEEE Symposium Series on Computational Intelligence (SSCI) , Dec. 2020, pp. 737–744. doi: 10.1109/SSCI47803.2020.9308468

  16. [16]

    C. Tang, B. Abbatematteo, J. Hu, R. Chandra, R. Martín-Martín, and P. Stone, ‘Deep Reinforcement Learning for Robotics: A Survey of Real - World Successes’, Annu. Rev. Control Robot. Auton. Syst., vol. 8, no. 1, pp. 153–188, May 2025, doi: 10.1146/annurev-control-030323-022510

  17. [17]

    H. Gong, Y. Zhang, Y. Liu, Q. Zhao, X. Zhao, and M. Sun, ‘Automatic Cell Rotation Method Based on Deep Reinforcement Learning’, in 2023 IEEE International Conference on Robotics and Automation (ICRA) , London, United Kingdom: IEEE, May 2023, pp. 5452 –5458. doi: 10.1109/ICRA48891.2023.10161043

  18. [18]

    Zhang et al

    Y. Zhang et al. , ‘Robotic Cell Micromanipulation for Posture Adjustment of Zebrafish Embryonic Cell’, J. Phys. Conf. Ser. , vol. 3101, no. 1, p. 012014, Sep. 2025, doi: 10.1088/1742 - 6596/3101/1/012014

  19. [19]

    Coumans and Y

    E. Coumans and Y. Bai, ‘PyBullet, a Python module for physics simulation for games, robotics and machine learning’. 2021 2016. [Online]. Available: http://pybullet.org

  20. [20]

    2025 IEEE International Conference on Robotics and Automation (ICRA), Atlanta, GA, USA, pp

    I. Marougkas et al. , ‘Integrating Model -Based Control and RL for Sim2Real Transfer of Tight Insertion Policies’, in IEEE International Conference on Robotics and Automation (ICRA) , Atlanta, GA, USA: IEEE, May 2025, pp. 2102 –2109. doi: 10.1109/ICRA55743.2025.11128860

  21. [21]

    Haiderbhai, R

    M. Haiderbhai, R. Gondokaryono, A. Wu, and L. A. Kahrs, ‘Sim2Real Rope Cutting With a Surgical Robot Using Vision -Based Reinforcement Learning’, IEEE Trans. Autom. Sci. Eng. , vol. 22, pp. 4354–4365, 2025, doi: 10.1109/TASE.2024.3410297

  22. [22]

    Todorov, T

    E. Todorov, T. Erez, and Y. Tassa, ‘MuJoCo: A physics engine for model-based control’, in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems , Vilamoura -Algarve, Portugal: IEEE, Oct. 2012, pp. 5026–5033. doi: 10.1109/IROS.2012.6386109

  23. [23]

    R. S. Sutton and A. Barto, Reinforcement learning: an introduction , Nachdruck. in Adaptive computation and machine learning. Cambridge, Massachusetts: The MIT Press, 2014

  24. [24]

    Proximal Policy Optimization Algorithms

    J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, ‘Proximal Policy Optimization Algorithms’, Aug. 28, 2017, arXiv: arXiv:1707.06347. doi: 10.48550/arXiv.1707.06347

  25. [25]

    A. Raffin, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, and N. Dormann, ‘Stable -baselines3: Reliable reinforcement learning implementations’, J. Mach. Learn. Res., vol. 22, no. 268, pp. 1–8, 2021

  26. [26]

    Curriculum learning,

    Y. Bengio, J. Louradour, R. Collobert, and J. Weston, ‘Curriculum learning’, in Proceedings of the 26th Annual International Conference on Machine Learning , Montreal Quebec Canada: ACM, Jun. 2009, pp. 41–48. doi: 10.1145/1553374.1553380

  27. [27]

    Full -Spectrum Out-of-Distribution Detection,

    P. Soviany, R. T. Ionescu, P. Rota, and N. Sebe, ‘Curriculum Learning: A Survey’, Int. J. Comput. Vis. , vol. 130, no. 6, pp. 1526 –1565, Jun. 2022, doi: 10.1007/s11263 -022-01611-x

  28. [28]

    T. Bi, C. Sferrazza, and R. D’Andrea, ‘Zero -Shot Sim-to-Real Transfer of Tactile Control Policies for Aggressive Swing -Up Manipulation’, IEEE Robot. Autom. Lett. , vol. 6, no. 3, pp. 5761 –5768, Jul. 2021, doi: 10.1109/LRA.2021.3084880

  29. [29]

    Culjak, D

    I. Culjak, D. Abram, T. Pribanic, H. Dzapo, and M. Cifrek, ‘A brief introduction to OpenCV’, in 2012 Proceedings of the 35th International Convention MIPRO , May 2012, pp. 1725 –1730. Accessed: Mar. 31,

  30. [30]

    Available: https://ieeexplore.ieee.org/document/6240859/

    [Online]. Available: https://ieeexplore.ieee.org/document/6240859/

  31. [31]

    Canny, ‘A Computational Approach to Edge Detection’, IEEE Trans

    J. Canny, ‘A Computational Approach to Edge Detection’, IEEE Trans. Pattern Anal. Mach. Intell. , vol. PAMI -8, no. 6, pp. 679 –698, Nov. 1986, doi: 10.1109/TPAMI.1986.4767851

  32. [32]

    Bergou, M

    M. Bergou, M. Wardetzky, S. Robinson, B. Audoly, and E. Grinspun, ‘Discrete elastic rods’, in ACM SIGGRAPH 2008 papers , Los Angeles California: ACM, Aug. 2008, pp. 1 –12. doi: 10.1145/1399504.1360662