Closed-Loop Sim-to-Real Reinforcement Learning for Deformable Microfiber Shape Control

Alessandro Amici; Houari Bettahar; Quan Zhou; Veeti Jaakkola

arxiv: 2605.21688 · v1 · pith:SXXYDPWVnew · submitted 2026-05-20 · 💻 cs.RO · cs.SY· eess.SY

Closed-Loop Sim-to-Real Reinforcement Learning for Deformable Microfiber Shape Control

Alessandro Amici , Houari Bettahar , Veeti Jaakkola , Quan Zhou This is my paper

Pith reviewed 2026-05-22 09:20 UTC · model grok-4.3

classification 💻 cs.RO cs.SYeess.SY

keywords sim-to-real transferreinforcement learningmicrofiber shape controldeformable object manipulationvisual feedbackmicromanipulationclosed-loop control

0 comments

The pith

A reinforcement learning policy trained only in a frictionless simulator controls real microfiber shapes on a surface using visual feedback, without retraining or adaptation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that geometric shape regulation for microfibers can be learned in a simplified simulator and deployed directly on physical hardware. Real-time visual feedback from a dual-gripper system corrects the unmodeled effects of surface interactions during operation at 40 Hz. This matters because conventional modeling of microscale contacts is unreliable, yet the closed loop allows the policy to achieve consistent accuracy across varied starting shapes and fiber sizes. A sympathetic reader sees a route to autonomous micromanipulation that bypasses detailed physics models.

Core claim

An RL policy trained entirely in simulation is transferred directly to a physical dual-gripper micromanipulation system operating at 40 Hz, without retraining or domain adaptation. Using silk microfibers as a testbed, the policy achieves a mean point-wise shape error of 270 ± 80 μm across twenty-four diverse initial configurations. Across nine specimens covering all combinations of three fiber diameters and three manipulated lengths, the same policy achieves sub-millimeter final shape error.

What carries the argument

Closed-loop sim-to-real RL that trains geometric shape regulation in a frictionless simulator and uses real-time visual feedback to correct observed effects of unmodeled surface interactions.

If this is right

The identical policy works across three diameters and three lengths without any retuning.
Shape regulation remains repeatable under real surface contact conditions.
Operation at 40 Hz is achieved on physical dual-gripper hardware.
Simplified simulators suffice for this task when feedback closes the loop.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same feedback-driven correction might apply to other deformable micromanipulation tasks where surface effects dominate.
Adding depth sensing or multi-view cameras could further reduce errors in more complex 3D shapes.
The approach implies that many contact-rich microscale tasks could avoid domain randomization if visual observability is high.

Load-bearing premise

The task-relevant effects of the sim-to-real mismatch remain observable and correctable within the closed feedback loop.

What would settle it

A series of trials on new fiber specimens or initial configurations that produce final shape errors consistently above one millimeter would show the claim of reliable sub-millimeter performance does not hold.

read the original abstract

Autonomous contact-based micromanipulation is challenging because surface and interfacial interactions at the microscale are difficult to model accurately, limiting the use of conventional model-based control and sim-to-real learning. We present a closed-loop sim-to-real reinforcement learning (RL) approach for microfiber shape control on a surface. The central idea is to train geometric shape regulation in a simplified frictionless simulator and rely on real-time visual feedback during deployment to iteratively correct the observed effects of unmodeled surface interactions. An RL policy trained entirely in simulation is transferred directly to a physical dual-gripper micromanipulation system operating at 40 Hz, without retraining or domain adaptation. Using silk microfibers as a testbed, the policy achieves a mean point-wise shape error of 270 $\pm$ 80 $\mu$m across twenty-four diverse initial configurations. Across nine specimens covering all combinations of three fiber diameters (50, 80, and 120 $\mu$m) and three manipulated lengths (10 mm, 15mm, and 20 mm), the same policy achieves sub-millimeter final shape error without any retraining or retuning. These results show that a policy learned in a simplified simulator can achieve repeatable real-world microfiber shape regulation under surface contact, provided that the task-relevant effects of the sim-to-real mismatch remain observable and correctable within the closed feedback loop.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper shows a sim-trained RL policy can control real microfiber shapes zero-shot via visual feedback at 40 Hz, hitting sub-mm errors across varied specimens without adaptation.

read the letter

This paper's main result is a zero-shot transfer of an RL policy from a frictionless simulator to real microfiber shape control on a dual-gripper setup, using visual feedback at 40 Hz to achieve mean errors of about 270 microns across 24 configs and sub-mm on nine different specimens. They do a good job showing that you can train for the geometry in sim and let the closed loop sort out the surface interactions that are tough to model. The experiments with varied initial conditions and fiber dimensions (50 to 120 μm diameter, 10 to 20 mm length) provide decent evidence that the method works without retraining or adaptation. That's useful for micro-robotics where contact models are unreliable. The approach is straightforward and the quantitative metrics are reported, which is better than many sim-to-real papers that stay in simulation. One soft spot is that the success relies on the visual feedback capturing all the important discrepancies, like any sticking or out-of-plane effects. The paper's results indicate it worked for these silk fibers, but if the observation space misses something or if conditions change, the policy might not correct it. More on how they validated the observation sufficiency would strengthen it. Another minor point is that without the full methods section details on the RL algorithm and reward, it's a bit hard to judge reproducibility right away, though the abstract gives the high-level idea. This is the sort of applied paper that would interest people in precision manipulation and learning-based control for deformable objects. A reader in that area could pick up practical insights on using feedback to bridge sim gaps. It should go to peer review. The empirical work is there and the claim is testable.

Referee Report

2 major / 1 minor

Summary. The paper proposes a closed-loop sim-to-real reinforcement learning approach for controlling the shape of deformable microfibers on a surface using a dual-gripper micromanipulation system. An RL policy is trained entirely in a simplified frictionless simulator and transferred directly to the physical system operating at 40 Hz without retraining or domain adaptation. Experiments on silk microfibers report a mean point-wise shape error of 270 ± 80 μm across 24 diverse initial configurations and sub-millimeter final shape error across nine specimens with varying diameters (50, 80, 120 μm) and lengths (10, 15, 20 mm).

Significance. If the results hold, the work shows that simplified simulation combined with real-time visual feedback can enable repeatable zero-shot sim-to-real transfer for contact-rich deformable object manipulation at the microscale. This could reduce reliance on complex domain randomization or adaptation in micro-robotics, provided the closed-loop correction reliably handles unmodeled surface effects.

major comments (2)

[Methods] The manuscript provides no details on the RL algorithm, reward design, simulation parameters, observation/action spaces, or training procedure. These omissions are load-bearing because the central claim of successful zero-shot transfer from a frictionless simulator rests on understanding why the policy generalizes; without them the quantitative error metrics (270 ± 80 μm) cannot be fully evaluated for soundness or reproducibility.
[Abstract and Results] The final sentence of the abstract states that success requires 'the task-relevant effects of the sim-to-real mismatch remain observable and correctable within the closed feedback loop,' yet no analysis, failure-mode discussion, or experiments address potential unobservable discrepancies such as visual latency, 3D buckling, or non-holonomic contact effects. This assumption underpins generalization across 24 configurations and 9 specimens and requires explicit support.

minor comments (1)

[Abstract] The abstract reports error statistics but does not specify the number of trials per configuration or any statistical tests used to compute the ±80 μm variability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review of our manuscript. We address each major comment below and have revised the manuscript to incorporate additional details and analysis as suggested.

read point-by-point responses

Referee: [Methods] The manuscript provides no details on the RL algorithm, reward design, simulation parameters, observation/action spaces, or training procedure. These omissions are load-bearing because the central claim of successful zero-shot transfer from a frictionless simulator rests on understanding why the policy generalizes; without them the quantitative error metrics (270 ± 80 μm) cannot be fully evaluated for soundness or reproducibility.

Authors: We agree that the original manuscript did not provide sufficient detail on these elements, which limits evaluation of the zero-shot transfer claim. In the revised version, we have added an expanded Methods section that specifies the RL algorithm (Proximal Policy Optimization with a standard actor-critic architecture), the reward function (weighted combination of point-wise shape error to target configuration and L2 regularization on gripper actions), simulation parameters (frictionless planar dynamics with fiber modeled as a chain of rigid segments connected by torsional springs, specific stiffness values, and no surface friction), observation space (2D image-plane coordinates of 10 uniformly sampled keypoints along the fiber plus current gripper positions and velocities), action space (commanded velocities for each gripper in the plane), and training procedure (10 million environment steps, learning rate of 3e-4, discount factor 0.99, and batch size details). These additions directly support assessment of why the policy generalizes from the simplified simulator. revision: yes
Referee: [Abstract and Results] The final sentence of the abstract states that success requires 'the task-relevant effects of the sim-to-real mismatch remain observable and correctable within the closed feedback loop,' yet no analysis, failure-mode discussion, or experiments address potential unobservable discrepancies such as visual latency, 3D buckling, or non-holonomic contact effects. This assumption underpins generalization across 24 configurations and 9 specimens and requires explicit support.

Authors: We concur that the abstract claim would be strengthened by explicit discussion of the assumption. We have added a new subsection titled 'Analysis of Sim-to-Real Discrepancies' in the Discussion. This subsection addresses visual latency by noting that the 40 Hz closed-loop rate (with measured end-to-end latency under 25 ms) permits iterative correction of observed errors; 3D buckling by explaining that the surface constraint and top-down visual feedback keep out-of-plane motion minimal and observable as 2D projection changes; and non-holonomic contact effects by describing how the policy uses continuous visual feedback to adjust rather than relying on precise contact modeling. We also include a failure-mode analysis drawing on the 24 trials, identifying that higher-error cases (still under 400 μm) occurred with initial configurations involving sharp bends, but the closed-loop policy recovered without retraining. This provides the requested support while acknowledging that fully unobservable effects remain a limitation. revision: yes

Circularity Check

0 steps flagged

No circularity: experimental validation of closed-loop sim-to-real transfer

full rationale

The paper's central result is an empirical demonstration that an RL policy trained in a frictionless simulator transfers zero-shot to physical dual-gripper hardware at 40 Hz, yielding measured point-wise errors of 270 ± 80 μm across 24 initial configurations and sub-millimeter errors across nine fiber specimens of varying diameters and lengths. No mathematical derivation, parameter fitting, or self-referential equation chain is present; the claim rests on direct physical trials that test the observability and correctability of unmodeled surface effects via visual feedback. This is independently falsifiable outside any fitted quantities or self-citations, satisfying the criteria for a self-contained experimental finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the approach rests on the domain assumption that visual feedback can observe and correct sim-to-real discrepancies; no explicit free parameters, invented entities, or additional axioms are stated.

axioms (1)

domain assumption Task-relevant effects of sim-to-real mismatch are observable via real-time visual feedback and correctable by the deployed policy.
This premise is invoked in the abstract's concluding sentence as the condition under which the simplified-simulator policy succeeds.

pith-pipeline@v0.9.0 · 5794 in / 1317 out tokens · 56097 ms · 2026-05-22T09:20:27.763587+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 2 internal anchors

[1]

Bettahar et al., ‘Probing Early Particle‐Cell Membrane Interactions via Single‐Cell and Single‐Particle Interaction Analysis’, Adv

H. Bettahar et al., ‘Probing Early Particle‐Cell Membrane Interactions via Single‐Cell and Single‐Particle Interaction Analysis’, Adv. Funct. Mater., vol. 35, no. 46, p. 2507301, Nov. 2025, doi: 10.1002/adfm.202507301

work page doi:10.1002/adfm.202507301 2025
[2]

Hu and D

S. Hu and D. Sun, ‘Automatic transportation of biological cells with a robot-tweezer manipulation system’, Int. J. Robot. Res., vol. 30, no. 14, pp. 1681–1694, Dec. 2011, doi: 10.1177/0278364911413479

work page doi:10.1177/0278364911413479 2011
[3]

P. A. York, R. Peña, D. Kent, and R. J. Wood, ‘Microrobotic laser steering for minimally invasive surgery’, Sci. Robot., vol. 6, no. 50, p. eabd5476, Jan. 2021, doi: 10.1126/scirobotics.abd5476

work page doi:10.1126/scirobotics.abd5476 2021
[4]

Tendick, S

F. Tendick, S. S. Sastry, R. S. Fearing, and M. Cohn, ‘Applications of micromechatronics in minimally invasive surgery’, IEEEASME Trans. Mechatron., vol. 3, no. 1, pp. 34 –42, Mar. 1998, doi: 10.1109/3516.662866

work page doi:10.1109/3516.662866 1998
[5]

Bettahar, C

H. Bettahar, C. Clévy, N. Courjal, and P. Lutz, ‘Force -Position Photo- Robotic Approach for the High -Accurate Micro-Assembly of Photonic Devices’, IEEE Robot. Autom. Lett., vol. 5, no. 4, pp. 6396 –6402, Oct. 2020, doi: 10.1109/LRA.2020.3014634

work page doi:10.1109/lra.2020.3014634 2020
[6]

L. Song, B. Chang, Y. Feng, J. Jin, and Q. Zhou, ‘Self -Alignment Capillary Gripper for Microfiber Manipulation’, IEEEASME Trans. Mechatron., vol. 28, no. 4, pp. 1957 –1965, Aug. 2023, doi: 10.1109/TMECH.2023.3276064

work page doi:10.1109/tmech.2023.3276064 1957
[7]

Keller et al

B. Keller et al. , ‘Optical Coherence Tomography -Guided Robotic Ophthalmic Microsurgery via Reinforcement Learning from Demonstration’, IEEE Trans. Robot. , vol. 36, no. 4, pp. 1207 –1218, Aug. 2020, doi: 10.1109/TRO.2020.2980158

work page doi:10.1109/tro.2020.2980158 2020
[8]

L. Chen, W. Rong, L. Sun, and H. Xie, ‘Micromanipulation robot for automatic fiber alignment’, in IEEE International Conference Mechatronics and Automation, 2005 , Jul. 2005, pp. 1756 -1759 Vol. 4. doi: 10.1109/ICMA.2005.1626825

work page doi:10.1109/icma.2005.1626825 2005
[9]

Long et al

Y. Long et al. , ‘A Review of Contact -Based Robotic Micromanipulation Systems: Technology and Applications’, J. Intell. Robot. Syst. , vol. 111, no. 3, p. 89, Aug. 2025, doi: 10.1007/s10846 - 025-02299-0

work page doi:10.1007/s10846 2025
[10]

R. S. Fearing, ‘Survey of sticking effects for micro parts handling’, in Proceedings 1995 IEEE/RSJ International Conference on Intelligent Robots and Systems. Human Robot Interaction and Cooperative Robots, Aug. 1995, pp. 212 –217 vol.2. doi: 10.1109/IROS.1995.526162

work page doi:10.1109/iros.1995.526162 1995
[11]

Savia and H

M. Savia and H. N. Koivo, ‘Contact Micromanipulation —Survey of Strategies’, IEEEASME Trans. Mechatron., vol. 14, no. 4, pp. 504–514, Aug. 2009, doi: 10.1109/TMECH.2008.2011986

work page doi:10.1109/tmech.2008.2011986 2009
[12]

B. Fang, S. Jia, D. Guo, M. Xu, S. Wen, and F. Sun, ‘Survey of imitation learning for robotic manipulation’, Int. J. Intell. Robot. Appl., vol. 3, no. 4, pp. 362–369, Dec. 2019, doi: 10.1007/s41315 -019-00103-5

work page doi:10.1007/s41315 2019
[13]

Solving Rubik's Cube with a Robot Hand

OpenAI et al., ‘Solving Rubik’s Cube with a Robot Hand’, Oct. 16, 2019, arXiv: arXiv:1910.07113. doi: 10.48550/arXiv.1910.07113

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1910.07113 2019
[14]

Kaufmann, L

E. Kaufmann, L. Bauersfeld, A. Loquercio, M. Müller, V. Koltun, and D. Scaramuzza, ‘Champion -level drone racing using deep reinforcement learning’, Nature, vol. 620, no. 7976, pp. 982–987, Aug. 2023, doi: 10.1038/s41586 -023-06419-4

work page doi:10.1038/s41586 2023
[15]

W. Zhao, J. P. Queralta, and T. Westerlund, ‘Sim -to-Real Transfer in Deep Reinforcement Learning for Robotics: a Survey’, in 2020 IEEE Symposium Series on Computational Intelligence (SSCI) , Dec. 2020, pp. 737–744. doi: 10.1109/SSCI47803.2020.9308468

work page doi:10.1109/ssci47803.2020.9308468 2020
[16]

C. Tang, B. Abbatematteo, J. Hu, R. Chandra, R. Martín-Martín, and P. Stone, ‘Deep Reinforcement Learning for Robotics: A Survey of Real - World Successes’, Annu. Rev. Control Robot. Auton. Syst., vol. 8, no. 1, pp. 153–188, May 2025, doi: 10.1146/annurev-control-030323-022510

work page doi:10.1146/annurev-control-030323-022510 2025
[17]

H. Gong, Y. Zhang, Y. Liu, Q. Zhao, X. Zhao, and M. Sun, ‘Automatic Cell Rotation Method Based on Deep Reinforcement Learning’, in 2023 IEEE International Conference on Robotics and Automation (ICRA) , London, United Kingdom: IEEE, May 2023, pp. 5452 –5458. doi: 10.1109/ICRA48891.2023.10161043

work page doi:10.1109/icra48891.2023.10161043 2023
[18]

Zhang et al

Y. Zhang et al. , ‘Robotic Cell Micromanipulation for Posture Adjustment of Zebrafish Embryonic Cell’, J. Phys. Conf. Ser. , vol. 3101, no. 1, p. 012014, Sep. 2025, doi: 10.1088/1742 - 6596/3101/1/012014

work page doi:10.1088/1742 2025
[19]

Coumans and Y

E. Coumans and Y. Bai, ‘PyBullet, a Python module for physics simulation for games, robotics and machine learning’. 2021 2016. [Online]. Available: http://pybullet.org

work page 2021
[20]

2025 IEEE International Conference on Robotics and Automation (ICRA), Atlanta, GA, USA, pp

I. Marougkas et al. , ‘Integrating Model -Based Control and RL for Sim2Real Transfer of Tight Insertion Policies’, in IEEE International Conference on Robotics and Automation (ICRA) , Atlanta, GA, USA: IEEE, May 2025, pp. 2102 –2109. doi: 10.1109/ICRA55743.2025.11128860

work page doi:10.1109/icra55743.2025.11128860 2025
[21]

Haiderbhai, R

M. Haiderbhai, R. Gondokaryono, A. Wu, and L. A. Kahrs, ‘Sim2Real Rope Cutting With a Surgical Robot Using Vision -Based Reinforcement Learning’, IEEE Trans. Autom. Sci. Eng. , vol. 22, pp. 4354–4365, 2025, doi: 10.1109/TASE.2024.3410297

work page doi:10.1109/tase.2024.3410297 2025
[22]

Trieu H Trinh, Yuhuai Wu, Quoc V Le, He He, and Thang Luong

E. Todorov, T. Erez, and Y. Tassa, ‘MuJoCo: A physics engine for model-based control’, in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems , Vilamoura -Algarve, Portugal: IEEE, Oct. 2012, pp. 5026–5033. doi: 10.1109/IROS.2012.6386109

work page doi:10.1109/iros.2012.6386109 2012
[23]

R. S. Sutton and A. Barto, Reinforcement learning: an introduction , Nachdruck. in Adaptive computation and machine learning. Cambridge, Massachusetts: The MIT Press, 2014

work page 2014
[24]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, ‘Proximal Policy Optimization Algorithms’, Aug. 28, 2017, arXiv: arXiv:1707.06347. doi: 10.48550/arXiv.1707.06347

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1707.06347 2017
[25]

A. Raﬃn, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, and N. Dormann, ‘Stable -baselines3: Reliable reinforcement learning implementations’, J. Mach. Learn. Res., vol. 22, no. 268, pp. 1–8, 2021

work page 2021
[26]

Curriculum learning,

Y. Bengio, J. Louradour, R. Collobert, and J. Weston, ‘Curriculum learning’, in Proceedings of the 26th Annual International Conference on Machine Learning , Montreal Quebec Canada: ACM, Jun. 2009, pp. 41–48. doi: 10.1145/1553374.1553380

work page doi:10.1145/1553374.1553380 2009
[27]

Full -Spectrum Out-of-Distribution Detection,

P. Soviany, R. T. Ionescu, P. Rota, and N. Sebe, ‘Curriculum Learning: A Survey’, Int. J. Comput. Vis. , vol. 130, no. 6, pp. 1526 –1565, Jun. 2022, doi: 10.1007/s11263 -022-01611-x

work page doi:10.1007/s11263 2022
[28]

T. Bi, C. Sferrazza, and R. D’Andrea, ‘Zero -Shot Sim-to-Real Transfer of Tactile Control Policies for Aggressive Swing -Up Manipulation’, IEEE Robot. Autom. Lett. , vol. 6, no. 3, pp. 5761 –5768, Jul. 2021, doi: 10.1109/LRA.2021.3084880

work page doi:10.1109/lra.2021.3084880 2021
[29]

Culjak, D

I. Culjak, D. Abram, T. Pribanic, H. Dzapo, and M. Cifrek, ‘A brief introduction to OpenCV’, in 2012 Proceedings of the 35th International Convention MIPRO , May 2012, pp. 1725 –1730. Accessed: Mar. 31,

work page 2012
[30]

Available: https://ieeexplore.ieee.org/document/6240859/

[Online]. Available: https://ieeexplore.ieee.org/document/6240859/

work page arXiv
[31]

Canny, ‘A Computational Approach to Edge Detection’, IEEE Trans

J. Canny, ‘A Computational Approach to Edge Detection’, IEEE Trans. Pattern Anal. Mach. Intell. , vol. PAMI -8, no. 6, pp. 679 –698, Nov. 1986, doi: 10.1109/TPAMI.1986.4767851

work page doi:10.1109/tpami.1986.4767851 1986
[32]

Bergou, M

M. Bergou, M. Wardetzky, S. Robinson, B. Audoly, and E. Grinspun, ‘Discrete elastic rods’, in ACM SIGGRAPH 2008 papers , Los Angeles California: ACM, Aug. 2008, pp. 1 –12. doi: 10.1145/1399504.1360662

work page doi:10.1145/1399504.1360662 2008

[1] [1]

Bettahar et al., ‘Probing Early Particle‐Cell Membrane Interactions via Single‐Cell and Single‐Particle Interaction Analysis’, Adv

H. Bettahar et al., ‘Probing Early Particle‐Cell Membrane Interactions via Single‐Cell and Single‐Particle Interaction Analysis’, Adv. Funct. Mater., vol. 35, no. 46, p. 2507301, Nov. 2025, doi: 10.1002/adfm.202507301

work page doi:10.1002/adfm.202507301 2025

[2] [2]

Hu and D

S. Hu and D. Sun, ‘Automatic transportation of biological cells with a robot-tweezer manipulation system’, Int. J. Robot. Res., vol. 30, no. 14, pp. 1681–1694, Dec. 2011, doi: 10.1177/0278364911413479

work page doi:10.1177/0278364911413479 2011

[3] [3]

P. A. York, R. Peña, D. Kent, and R. J. Wood, ‘Microrobotic laser steering for minimally invasive surgery’, Sci. Robot., vol. 6, no. 50, p. eabd5476, Jan. 2021, doi: 10.1126/scirobotics.abd5476

work page doi:10.1126/scirobotics.abd5476 2021

[4] [4]

Tendick, S

F. Tendick, S. S. Sastry, R. S. Fearing, and M. Cohn, ‘Applications of micromechatronics in minimally invasive surgery’, IEEEASME Trans. Mechatron., vol. 3, no. 1, pp. 34 –42, Mar. 1998, doi: 10.1109/3516.662866

work page doi:10.1109/3516.662866 1998

[5] [5]

Bettahar, C

H. Bettahar, C. Clévy, N. Courjal, and P. Lutz, ‘Force -Position Photo- Robotic Approach for the High -Accurate Micro-Assembly of Photonic Devices’, IEEE Robot. Autom. Lett., vol. 5, no. 4, pp. 6396 –6402, Oct. 2020, doi: 10.1109/LRA.2020.3014634

work page doi:10.1109/lra.2020.3014634 2020

[6] [6]

L. Song, B. Chang, Y. Feng, J. Jin, and Q. Zhou, ‘Self -Alignment Capillary Gripper for Microfiber Manipulation’, IEEEASME Trans. Mechatron., vol. 28, no. 4, pp. 1957 –1965, Aug. 2023, doi: 10.1109/TMECH.2023.3276064

work page doi:10.1109/tmech.2023.3276064 1957

[7] [7]

Keller et al

B. Keller et al. , ‘Optical Coherence Tomography -Guided Robotic Ophthalmic Microsurgery via Reinforcement Learning from Demonstration’, IEEE Trans. Robot. , vol. 36, no. 4, pp. 1207 –1218, Aug. 2020, doi: 10.1109/TRO.2020.2980158

work page doi:10.1109/tro.2020.2980158 2020

[8] [8]

L. Chen, W. Rong, L. Sun, and H. Xie, ‘Micromanipulation robot for automatic fiber alignment’, in IEEE International Conference Mechatronics and Automation, 2005 , Jul. 2005, pp. 1756 -1759 Vol. 4. doi: 10.1109/ICMA.2005.1626825

work page doi:10.1109/icma.2005.1626825 2005

[9] [9]

Long et al

Y. Long et al. , ‘A Review of Contact -Based Robotic Micromanipulation Systems: Technology and Applications’, J. Intell. Robot. Syst. , vol. 111, no. 3, p. 89, Aug. 2025, doi: 10.1007/s10846 - 025-02299-0

work page doi:10.1007/s10846 2025

[10] [10]

R. S. Fearing, ‘Survey of sticking effects for micro parts handling’, in Proceedings 1995 IEEE/RSJ International Conference on Intelligent Robots and Systems. Human Robot Interaction and Cooperative Robots, Aug. 1995, pp. 212 –217 vol.2. doi: 10.1109/IROS.1995.526162

work page doi:10.1109/iros.1995.526162 1995

[11] [11]

Savia and H

M. Savia and H. N. Koivo, ‘Contact Micromanipulation —Survey of Strategies’, IEEEASME Trans. Mechatron., vol. 14, no. 4, pp. 504–514, Aug. 2009, doi: 10.1109/TMECH.2008.2011986

work page doi:10.1109/tmech.2008.2011986 2009

[12] [12]

B. Fang, S. Jia, D. Guo, M. Xu, S. Wen, and F. Sun, ‘Survey of imitation learning for robotic manipulation’, Int. J. Intell. Robot. Appl., vol. 3, no. 4, pp. 362–369, Dec. 2019, doi: 10.1007/s41315 -019-00103-5

work page doi:10.1007/s41315 2019

[13] [13]

Solving Rubik's Cube with a Robot Hand

OpenAI et al., ‘Solving Rubik’s Cube with a Robot Hand’, Oct. 16, 2019, arXiv: arXiv:1910.07113. doi: 10.48550/arXiv.1910.07113

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1910.07113 2019

[14] [14]

Kaufmann, L

E. Kaufmann, L. Bauersfeld, A. Loquercio, M. Müller, V. Koltun, and D. Scaramuzza, ‘Champion -level drone racing using deep reinforcement learning’, Nature, vol. 620, no. 7976, pp. 982–987, Aug. 2023, doi: 10.1038/s41586 -023-06419-4

work page doi:10.1038/s41586 2023

[15] [15]

W. Zhao, J. P. Queralta, and T. Westerlund, ‘Sim -to-Real Transfer in Deep Reinforcement Learning for Robotics: a Survey’, in 2020 IEEE Symposium Series on Computational Intelligence (SSCI) , Dec. 2020, pp. 737–744. doi: 10.1109/SSCI47803.2020.9308468

work page doi:10.1109/ssci47803.2020.9308468 2020

[16] [16]

C. Tang, B. Abbatematteo, J. Hu, R. Chandra, R. Martín-Martín, and P. Stone, ‘Deep Reinforcement Learning for Robotics: A Survey of Real - World Successes’, Annu. Rev. Control Robot. Auton. Syst., vol. 8, no. 1, pp. 153–188, May 2025, doi: 10.1146/annurev-control-030323-022510

work page doi:10.1146/annurev-control-030323-022510 2025

[17] [17]

H. Gong, Y. Zhang, Y. Liu, Q. Zhao, X. Zhao, and M. Sun, ‘Automatic Cell Rotation Method Based on Deep Reinforcement Learning’, in 2023 IEEE International Conference on Robotics and Automation (ICRA) , London, United Kingdom: IEEE, May 2023, pp. 5452 –5458. doi: 10.1109/ICRA48891.2023.10161043

work page doi:10.1109/icra48891.2023.10161043 2023

[18] [18]

Zhang et al

Y. Zhang et al. , ‘Robotic Cell Micromanipulation for Posture Adjustment of Zebrafish Embryonic Cell’, J. Phys. Conf. Ser. , vol. 3101, no. 1, p. 012014, Sep. 2025, doi: 10.1088/1742 - 6596/3101/1/012014

work page doi:10.1088/1742 2025

[19] [19]

Coumans and Y

E. Coumans and Y. Bai, ‘PyBullet, a Python module for physics simulation for games, robotics and machine learning’. 2021 2016. [Online]. Available: http://pybullet.org

work page 2021

[20] [20]

2025 IEEE International Conference on Robotics and Automation (ICRA), Atlanta, GA, USA, pp

I. Marougkas et al. , ‘Integrating Model -Based Control and RL for Sim2Real Transfer of Tight Insertion Policies’, in IEEE International Conference on Robotics and Automation (ICRA) , Atlanta, GA, USA: IEEE, May 2025, pp. 2102 –2109. doi: 10.1109/ICRA55743.2025.11128860

work page doi:10.1109/icra55743.2025.11128860 2025

[21] [21]

Haiderbhai, R

M. Haiderbhai, R. Gondokaryono, A. Wu, and L. A. Kahrs, ‘Sim2Real Rope Cutting With a Surgical Robot Using Vision -Based Reinforcement Learning’, IEEE Trans. Autom. Sci. Eng. , vol. 22, pp. 4354–4365, 2025, doi: 10.1109/TASE.2024.3410297

work page doi:10.1109/tase.2024.3410297 2025

[22] [22]

Trieu H Trinh, Yuhuai Wu, Quoc V Le, He He, and Thang Luong

E. Todorov, T. Erez, and Y. Tassa, ‘MuJoCo: A physics engine for model-based control’, in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems , Vilamoura -Algarve, Portugal: IEEE, Oct. 2012, pp. 5026–5033. doi: 10.1109/IROS.2012.6386109

work page doi:10.1109/iros.2012.6386109 2012

[23] [23]

R. S. Sutton and A. Barto, Reinforcement learning: an introduction , Nachdruck. in Adaptive computation and machine learning. Cambridge, Massachusetts: The MIT Press, 2014

work page 2014

[24] [24]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, ‘Proximal Policy Optimization Algorithms’, Aug. 28, 2017, arXiv: arXiv:1707.06347. doi: 10.48550/arXiv.1707.06347

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1707.06347 2017

[25] [25]

A. Raﬃn, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, and N. Dormann, ‘Stable -baselines3: Reliable reinforcement learning implementations’, J. Mach. Learn. Res., vol. 22, no. 268, pp. 1–8, 2021

work page 2021

[26] [26]

Curriculum learning,

Y. Bengio, J. Louradour, R. Collobert, and J. Weston, ‘Curriculum learning’, in Proceedings of the 26th Annual International Conference on Machine Learning , Montreal Quebec Canada: ACM, Jun. 2009, pp. 41–48. doi: 10.1145/1553374.1553380

work page doi:10.1145/1553374.1553380 2009

[27] [27]

Full -Spectrum Out-of-Distribution Detection,

P. Soviany, R. T. Ionescu, P. Rota, and N. Sebe, ‘Curriculum Learning: A Survey’, Int. J. Comput. Vis. , vol. 130, no. 6, pp. 1526 –1565, Jun. 2022, doi: 10.1007/s11263 -022-01611-x

work page doi:10.1007/s11263 2022

[28] [28]

T. Bi, C. Sferrazza, and R. D’Andrea, ‘Zero -Shot Sim-to-Real Transfer of Tactile Control Policies for Aggressive Swing -Up Manipulation’, IEEE Robot. Autom. Lett. , vol. 6, no. 3, pp. 5761 –5768, Jul. 2021, doi: 10.1109/LRA.2021.3084880

work page doi:10.1109/lra.2021.3084880 2021

[29] [29]

Culjak, D

I. Culjak, D. Abram, T. Pribanic, H. Dzapo, and M. Cifrek, ‘A brief introduction to OpenCV’, in 2012 Proceedings of the 35th International Convention MIPRO , May 2012, pp. 1725 –1730. Accessed: Mar. 31,

work page 2012

[30] [30]

Available: https://ieeexplore.ieee.org/document/6240859/

[Online]. Available: https://ieeexplore.ieee.org/document/6240859/

work page arXiv

[31] [31]

Canny, ‘A Computational Approach to Edge Detection’, IEEE Trans

J. Canny, ‘A Computational Approach to Edge Detection’, IEEE Trans. Pattern Anal. Mach. Intell. , vol. PAMI -8, no. 6, pp. 679 –698, Nov. 1986, doi: 10.1109/TPAMI.1986.4767851

work page doi:10.1109/tpami.1986.4767851 1986

[32] [32]

Bergou, M

M. Bergou, M. Wardetzky, S. Robinson, B. Audoly, and E. Grinspun, ‘Discrete elastic rods’, in ACM SIGGRAPH 2008 papers , Los Angeles California: ACM, Aug. 2008, pp. 1 –12. doi: 10.1145/1399504.1360662

work page doi:10.1145/1399504.1360662 2008