Recognition: 2 theorem links · Lean Theorem
Tune to Learn: How Controller Gains Shape Robot Policy Learning
Pith reviewed 2026-05-13 20:59 UTC · model grok-4.3
The pith
Controller gains for robot policy learning should be chosen according to the learning method rather than the target task stiffness.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Systematic tests demonstrate that position controller gains affect learnability differently across paradigms: behavior cloning benefits from compliant and overdamped regimes, reinforcement learning succeeds across all regimes when hyperparameters are tuned compatibly, and sim-to-real transfer is harmed by stiff and overdamped regimes. Effective stiffness therefore arises from the interplay between the learned reactions and the control dynamics rather than from the gains in isolation.
What carries the argument
Position controller gains viewed as a learnability filter that modulates the interaction between the policy output and the robot's closed-loop dynamics in behavior cloning, reinforcement learning, and sim-to-real pipelines.
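To make the learnability-filter idea concrete, here is a minimal closed-loop sketch (not from the paper; the reactive setpoint rule, unit mass, and all constants are illustrative assumptions). It shows how identical PD gains produce different effective stiffness once a state-conditioned policy chooses the position targets:

```python
# Minimal sketch: same PD gains, different effective stiffness once a
# state-conditioned policy sets the targets. Unit point mass,
# semi-implicit Euler integration; the "reactive" setpoint rule is a
# stand-in for a learned policy, not the paper's method.
import numpy as np

def rollout(kp, kd, reactive, steps=500, dt=0.002):
    q, qd = 0.0, 0.0          # joint position and velocity
    goal = 1.0                # task target
    for _ in range(steps):
        # A reactive policy re-targets each step; the commanded error
        # shrinks, so the torque -- and hence the effective stiffness
        # toward the goal -- is set by policy and gains together.
        q_des = q + 0.1 * (goal - q) if reactive else goal
        tau = kp * (q_des - q) - kd * qd   # PD control law
        qd += tau * dt                     # semi-implicit Euler
        q += qd * dt
    return q

for reactive in (False, True):
    # Same (kp, kd); only the target-setting rule changes.
    print(f"reactive={reactive}: q(1s) = {rollout(400.0, 40.0, reactive):.3f}")
```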
If this is right
- Behavior cloning performs reliably only when the controller is set to compliant and overdamped gains.
- Reinforcement learning from scratch can succeed in any gain regime once its hyperparameters are adjusted to that regime.
- Sim-to-real transfer success drops when stiff and overdamped gain settings are used.
- Gain selection must be decided by the learning paradigm in use rather than by the compliance desired at execution time.
Where Pith is reading between the lines
- Practitioners should decide controller gains at the start of a project once they have chosen imitation learning versus reinforcement learning.
- The same gain-paradigm dependence may appear when other low-level controllers such as velocity or torque interfaces are substituted for position control.
- Policy architectures that explicitly model the controller dynamics could reduce or eliminate the need for separate gain tuning.
Load-bearing premise
The tested tasks, robots, and hyperparameter regimes are representative enough that the observed patterns will hold for other manipulation settings and learning algorithms.
What would settle it
Repeating the exact experimental protocol on a different robot embodiment or task and finding that behavior cloning performs best under stiff gains instead of compliant ones would falsify the central claim.
Original abstract
Position controllers have become the dominant interface for executing learned manipulation policies. Yet a critical design decision remains understudied: how should we choose controller gains for policy learning? The conventional wisdom is to select gains based on desired task compliance or stiffness. However, this logic breaks down when controllers are paired with state-conditioned policies: effective stiffness emerges from the interplay between learned reactions and control dynamics, not from gains alone. We argue that gain selection should instead be guided by learnability: how amenable different gain settings are to the learning algorithm in use. In this work, we systematically investigate how position controller gains affect three core components of modern robot learning pipelines: behavior cloning, reinforcement learning from scratch, and sim-to-real transfer. Through extensive experiments across multiple tasks and robot embodiments, we find that: (1) behavior cloning benefits from compliant and overdamped gain regimes, (2) reinforcement learning can succeed across all gain regimes given compatible hyperparameter tuning, and (3) sim-to-real transfer is harmed by stiff and overdamped gain regimes. These findings reveal that optimal gain selection depends not on the desired task behavior, but on the learning paradigm employed. Project website: https://younghyopark.me/tune-to-learn
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that position controller gains for learned robot manipulation policies should be selected based on learnability for the specific paradigm (behavior cloning, RL from scratch, or sim-to-real transfer) rather than conventional task-based compliance or stiffness requirements. It supports this via experiments across multiple tasks and robot embodiments showing that (1) behavior cloning benefits from compliant/overdamped gains, (2) RL succeeds across gain regimes when hyperparameters are compatibly tuned, and (3) sim-to-real transfer is harmed by stiff/overdamped regimes, concluding that optimal gains depend on the learning algorithm rather than desired task behavior.
Significance. If the reported patterns hold under broader conditions, the work provides actionable empirical guidance that could improve success rates in robot learning pipelines by decoupling gain choice from task physics. The multi-task, multi-embodiment experimental scope is a strength, offering concrete evidence against purely task-driven gain selection.
major comments (2)
- [Experiments (across tasks and embodiments)] The central claim that paradigm dictates gains independently of task behavior rests on the assumption that the tested tasks do not embed varying stiffness requirements; without explicit ablation varying target trajectories or adding stiffness objectives while holding the learning paradigm fixed, the decoupling cannot be isolated from task-specific effects.
- [RL experiments and hyperparameter details] For the RL result that it 'can succeed across all gain regimes given compatible hyperparameter tuning,' the manuscript must report the exact search ranges, number of trials, and exclusion criteria used to identify compatible tunings; otherwise the claim reduces to post-hoc selection rather than a general property of the paradigm.
minor comments (2)
- [Abstract] The terms 'compliant,' 'overdamped,' and 'stiff' should be defined quantitatively (e.g., via damping-ratio ranges or specific K_p/K_d values) rather than left qualitative; a minimal formalization follows this list.
- [Experimental setup] Provide the full list of tasks, robot platforms, and success metrics in a table for reproducibility; the current description is high-level.
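As a concrete version of the quantitative definitions requested above, one conventional mapping (an assumption for illustration, not the paper's definition) uses the unit-mass closed-loop model:

\[
\ddot{q} = K_p\,(q_{\mathrm{des}} - q) - K_d\,\dot{q},
\qquad
\omega_n = \sqrt{K_p},
\qquad
\zeta = \frac{K_d}{2\sqrt{K_p}},
\]

so 'compliant' vs. 'stiff' corresponds to low vs. high $K_p$ (equivalently $\omega_n$), and 'overdamped' to $\zeta > 1$; with joint inertia $m$, the damping ratio generalizes to $\zeta = K_d / (2\sqrt{K_p m})$.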
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and for recognizing the multi-task, multi-embodiment scope of our experiments. We address each major comment below with clarifications and proposed revisions.
Point-by-point responses
-
Referee: [Experiments (across tasks and embodiments)] The central claim that paradigm dictates gains independently of task behavior rests on the assumption that the tested tasks do not embed varying stiffness requirements; without explicit ablation varying target trajectories or adding stiffness objectives while holding the learning paradigm fixed, the decoupling cannot be isolated from task-specific effects.
Authors: We agree that an explicit ablation holding the learning paradigm fixed while varying target stiffness or trajectory requirements would provide stronger isolation. Our current design instead demonstrates consistent paradigm-dependent patterns across a deliberately diverse set of tasks (pushing, grasping, insertion) and two robot embodiments with differing dynamics. This cross-task consistency is our primary evidence that gain effects are not reducible to task-specific stiffness demands. In the revision we will add a dedicated limitations paragraph acknowledging the absence of a controlled stiffness-objective ablation and will include additional discussion of how task selection was intended to mitigate this concern. revision: partial
-
Referee: [RL experiments and hyperparameter details] For the RL result that it 'can succeed across all gain regimes given compatible hyperparameter tuning,' the manuscript must report the exact search ranges, number of trials, and exclusion criteria used to identify compatible tunings; otherwise the claim reduces to post-hoc selection rather than a general property of the paradigm.
Authors: We accept this point. The original manuscript summarized the tuning process at a high level. In the revised version we will add a new subsection (or appendix) that explicitly lists: (i) the hyperparameter search ranges explored for each gain regime, (ii) the total number of trials per regime, and (iii) the quantitative success criteria and exclusion rules applied when declaring a tuning “compatible.” This documentation will make clear that the reported success across regimes rests on systematic search rather than post-hoc selection. revision: yes
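For orientation, a minimal sketch of the kind of per-regime search such a subsection might document, using Optuna [21], which the paper cites. The search ranges, trial count, and the train_and_eval stub are illustrative assumptions, not the paper's protocol:

```python
# Minimal sketch of a per-gain-regime hyperparameter search with
# Optuna [21]. Ranges, trial counts, and the evaluation stub are
# assumptions for illustration, not the paper's documented protocol.
import optuna

def train_and_eval(gain_regime: str, hparams: dict) -> float:
    # Hypothetical stand-in: train PPO under the fixed gain regime and
    # return the success rate over 100 evaluation rollouts. A dummy
    # score keeps the sketch runnable end-to-end.
    return hparams["lr"] * 1e3 - abs(hparams["action_scale"] - 0.5)

def make_objective(gain_regime: str):
    def objective(trial: optuna.Trial) -> float:
        hparams = {
            "lr": trial.suggest_float("lr", 1e-5, 1e-3, log=True),
            "entropy_coef": trial.suggest_float("entropy_coef", 0.0, 0.02),
            "action_scale": trial.suggest_float("action_scale", 0.05, 1.0),
        }
        return train_and_eval(gain_regime, hparams)
    return objective

for regime in ("compliant_overdamped", "stiff_underdamped", "stiff_overdamped"):
    study = optuna.create_study(direction="maximize")
    study.optimize(make_objective(regime), n_trials=50)
    print(regime, study.best_value, study.best_params)
```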
Circularity Check
No circularity: purely empirical observations from controlled experiments
Full rationale
The paper reports direct experimental measurements of success rates and transfer performance under varying position controller gains for behavior cloning, RL, and sim-to-real pipelines across multiple tasks and robot embodiments. No mathematical derivations, parameter fits, or predictions are presented that reduce to the inputs by construction. Central claims rest on observed patterns (e.g., BC favoring compliant overdamped regimes) rather than self-definitional equations or load-bearing self-citations. The findings are falsifiable via new experiments and do not invoke uniqueness theorems or ansatzes from prior author work.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel — unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
Passage: "optimal gain selection depends not on the desired task behavior, but on the learning paradigm employed"
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction — unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
Passage: "compliant and overdamped gain regimes"
What do these tags mean?
- matches — The paper's claim is directly supported by a theorem in the formal canon.
- supports — The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends — The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses — The paper appears to rely on the theorem as machinery.
- contradicts — The paper's claim conflicts with a theorem or certificate in the canon.
- unclear — Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
-
A Nonasymptotic Theory of Gain-Dependent Error Dynamics in Behavior Cloning
Nonasymptotic analysis shows compliant overdamped PD controllers minimize position error tails in behavior cloning by bounding gain-dependent amplification of sub-Gaussian action errors.
-
A Nonasymptotic Theory of Gain-Dependent Error Dynamics in Behavior Cloning
Nonasymptotic analysis shows sub-Gaussian action errors in behavior cloning propagate through gain-dependent closed-loop dynamics to produce sub-Gaussian position errors whose tail is governed by a proxy matrix and am...
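For orientation, a minimal sketch (our reconstruction under assumed linear closed-loop dynamics, not the cited paper's actual statement) of how a gain-dependent sub-Gaussian tail bound can arise:

\[
x_{t+1} = A(K_p, K_d)\,x_t + B\,e_t, \qquad e_t \ \text{i.i.d. sub-Gaussian with proxy matrix } \Sigma,
\]
\[
x_T = \sum_{k=0}^{T-1} A^{k} B\, e_{T-1-k}
\ \Longrightarrow\
x_T \ \text{is sub-Gaussian with proxy } \Sigma_T = \sum_{k=0}^{T-1} A^{k} B\, \Sigma\, B^{\top} (A^{k})^{\top},
\]

assuming $x_0 = 0$; gains that keep the closed-loop matrix $A$ well damped keep $\lVert A \rVert$ small, so $\Sigma_T$ stays bounded and position-error tails stay light.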
Reference graph
Works this paper leans on
-
[1]
M. Takegaki and S. Arimoto, “A new feedback method for dynamic control of manipulators,” ASME Journal of Dynamic Systems, Measurement, and Control, 1981.
-
[2]
R. Kelly, “PD control with desired gravity compensation of robotic manipulators: a review,” The International Journal of Robotics Research, vol. 16, no. 5, pp. 660–672, 1997.
-
[3]
E. Aljalbout, F. Frank, M. Karl, and P. van der Smagt, “On the role of the action space in robot manipulation learning and sim-to-real transfer,” IEEE Robotics and Automation Letters, vol. 9, no. 6, pp. 5895–5902, Jun. 2024. http://dx.doi.org/10.1109/LRA.2024.3398428
-
[4]
D. Kim, G. Berseth, M. Schwartz, and J. Park, “Torque-based deep reinforcement learning for task-and-robot agnostic learning on bipedal robots using sim-to-real transfer,” IEEE Robotics and Automation Letters, vol. 8, no. 10, pp. 6251–6258, Oct. 2023. http://dx.doi.org/10.1109/LRA.2023.3304561
-
[5]
J. Eßer, G. B. Margolis, O. Urbann, S. Kerner, and P. Agrawal, “Action space design in reinforcement learning for robot motor skills,” in 8th Annual Conference on Robot Learning, 2024.
-
[6]
Y. Wu, F. Zhao, T. Tao, and A. Ajoudani, “A framework for autonomous impedance regulation of robots based on imitation learning and optimal control,” IEEE Robotics and Automation Letters, vol. 6, no. 1, pp. 127–134, 2021.
-
[7]
K. Kronander and A. Billard, “Learning compliant manipulation through kinesthetic and tactile human-robot interaction,” IEEE Transactions on Haptics, vol. 7, no. 3, pp. 367–380, 2014.
-
[8]
G. B. Margolis, M. Wang, N. Fey, and P. Agrawal, “SoftMimic: Learning compliant whole-body control from examples,” arXiv preprint arXiv:2510.17792, 2025.
-
[9]
N. R. Arachchige, Z. Chen, W. Jung, W. C. Shin, R. Bansal, P. Barroso, Y. H. He, Y. C. Lin, B. Joffe, S. Kousik et al., “SAIL: Faster-than-demonstration execution of imitation learning policies,” arXiv preprint arXiv:2506.11948, 2025.
-
[10]
J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain randomization for transferring deep neural networks from simulation to the real world,” 2017. https://arxiv.org/abs/1703.06907
-
[11]
X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Sim-to-real transfer of robotic control with dynamics randomization,” in 2018 IEEE International Conference on Robotics and Automation (ICRA), May 2018, pp. 3803–3810. http://dx.doi.org/10.1109/ICRA.2018.8460528
-
[12]
OpenAI, I. Akkaya, M. Andrychowicz, M. Chociej, M. Litwin, B. McGrew, A. Petron, A. Paino, M. Plappert, G. Powell, R. Ribas, J. Schneider, N. Tezak, J. Tworek, P. Welinder, L. Weng, Q. Yuan, W. Zaremba, and L. Zhang, “Solving Rubik’s cube with a robot hand,” 2019. https://arxiv.org/abs/1910.07113
-
[14]
F. Muratore, F. Ramos, G. Turk, W. Yu, M. Gienger, and J. Peters, “Robot learning from randomized simulations: A review,” 2022. https://arxiv.org/abs/2111.00956
-
[15]
S. Gangapurwala, L. Campanaro, and I. Havoutis, “Learning low-frequency motion control for robust and dynamic robot locomotion,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023, pp. 5085–5091.
-
[16]
A. Khazatsky, K. Pertsch, S. Nair, A. Balakrishna, S. Dasari, S. Karamcheti, S. Nasiriany, M. K. Srirama, L. Y. Chen, K. Ellis et al., “DROID: A large-scale in-the-wild robot manipulation dataset,” arXiv preprint arXiv:2403.12945, 2024.
-
[17]
Q. Vuong, S. Levine, H. R. Walke, K. Pertsch, A. Singh, R. Doshi, C. Xu, J. Luo, L. Tan, D. Shah et al., “Open X-Embodiment: Robotic learning datasets and RT-X models,” in Towards Generalist Robots: Learning Paradigms for Scalable Skill Acquisition @ CoRL 2023, 2023.
-
[18]
O. Khatib, “A unified approach for motion and force control of robot manipulators: The operational space formulation,” IEEE Journal on Robotics and Automation, vol. 3, no. 1, pp. 43–53, 1987.
-
[19]
C. Chi, Z. Xu, S. Feng, E. Cousineau, Y. Du, B. Burchfiel, R. Tedrake, and S. Song, “Diffusion policy: Visuomotor policy learning via action diffusion,” The International Journal of Robotics Research, vol. 44, no. 10-11, pp. 1684–1704, 2025.
-
[20]
Y. Park, G. B. Margolis, and P. Agrawal, “Automatic environment shaping is the next frontier in RL,” arXiv preprint arXiv:2407.16186, 2024.
-
[21]
T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, “Optuna: A next-generation hyperparameter optimization framework,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019.
-
[22]
A. Serrano-Muñoz, D. Chrysostomou, S. Bøgh, and N. Arana-Arexolaleiba, “skrl: Modular and flexible library for reinforcement learning,” Journal of Machine Learning Research, vol. 24, no. 254, pp. 1–9, 2023. http://jmlr.org/papers/v24/23-0112.html
-
[23]
J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” 2017. https://arxiv.org/abs/1707.06347
-
[25]
NVIDIA: M. Mittal, P. Roth, J. Tigue, A. Richard, O. Zhang, P. Du, A. Serrano-Muñoz, X. Yao, R. Zurbrügg, N. Rudin, L. Wawrzyniak, M. Rakhsha, A. Denzler, E. Heiden, A. Borovicka, O. Ahmed, I. Akinola, A. Anwar, M. T. Carlson, J. Y. Feng, A. Garg, R. Gasoto, L. Gulich, Y. Guo, M. Gussert, A. Hansen, M. Kulkarni, C. Li, W. Liu, V. Makoviychuk, G. …, “Isaac Lab: A GPU-accelerated simulation framework for multi-modal robot learning,” 2025.
-
[26]
R.-Z. Qiu, S. Yang, X. Cheng, C. Chawla, J. Li, T. He, G. Yan, D. J. Yoon, R. Hoque, L. Paulsen et al., “Humanoid policy ~ human policy,” arXiv preprint arXiv:2503.13441, 2025.
-
[27]
K. Grauman, A. Westbury, E. Byrne, Z. Chavis, A. Furnari, R. Girdhar, J. Hamburger, H. Jiang, M. Liu, X. Liu et al., “Ego4D: Around the world in 3,000 hours of egocentric video,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 18995–19012.
-
[28]
C. Chi, Z. Xu, C. Pan, E. Cousineau, B. Burchfiel, S. Feng, R. Tedrake, and S. Song, “Universal Manipulation Interface: In-the-wild robot teaching without in-the-wild robots,” arXiv preprint arXiv:2402.10329, 2024.
-
[29]
S. H. Crandall and W. D. Mark, Random Vibration in Mechanical Systems. Academic Press, 2014.
-
[30]
Y. Park and P. Agrawal, “Using Apple Vision Pro to train and control robots,” 2024. https://github.com/Improbable-AI/VisionProTeleop
-
[31]
Y. Park, J. S. Bhatia, L. Ankile, and P. Agrawal, “DexHub and DART: Towards internet scale robot data collection,” arXiv preprint arXiv:2411.02214, 2024.
-
[32]
M. Nomura and M. Shibata, “cmaes: A simple yet practical Python library for CMA-ES,” 2024. https://arxiv.org/abs/2402.01373
-
[33]
Y. Park, “aiofranka: Asyncio-based Franka robot control,” 2025. https://github.com/Improbable-AI/aiofranka
Appendix A (excerpt), Analytical Proof of Gain-Dependent Error Sensitivity: We formalize the empirical observation that compliant and overdamped controller gains attenuate action prediction errors during behavior cloning. We analyze a sim…
-
[34]
Task Descriptions: The six tasks we study are: Bimanual Handover, Dishrack Unload, Dishrack Load, Dishwasher Open, Mug Hang, and Block Stack (Figure 16). For all tasks besides Block Stack, we collect 100 teleoperated demonstrations with the Apple Vision Pro [28, 29] for each task. For Block Stack, we use motion-planned trajectories. These demonstration…
-
[35]
Nominal Training Configuration: As a nominal configuration, we use a VAE as the generative model with an MLP network, observation size 10 and action chunk size 10, with privileged simulation states as inputs, using absolute joint positions as the action space.
-
[36]
Ablation Training Configurations: We present ablation results across dataset size (Figure 23), policy architectures (Figure 24), action chunk size (Figure 25), action representation (Figure 26), and control frequency (Figure 27). Across… [Fig. 16 caption: panels (a) Bimanual Handover, (b) Dishrack Unload, (c) Dishrack Load, (d) Dishwasher Open, (e) Mug Hang, (f) Block Stack. S…]
-
[37]
Scaling Law: Beyond absolute performance, the choice of controller gains also affects how efficiently policies improve with additional data. As shown in Fig. 28, compliant and overdamped gains exhibit steeper scaling with dataset size, implying that data collection efforts yield greater returns in this regime. For practitioners with limited demonstration b…
-
[38]
TPR Fidelity Validation: To quantify how faithfully Torque-to-Position Retargeting (TPR) preserves the original demonstration trajectories, we retarget a motion-planned Block Stacking trajectory to four representative gain configurations spanning the gain grid corners and evaluate at varying decimation rates (from 1× at 500 Hz down to 50× at 10 Hz). For each…
-
[39]
Extension to Task-Space Position Control: While the TPR formulation in Section IV-A addresses joint-space position control, many manipulation systems instead use operational space control (OSC) [17] with SE(3) end-effector pose targets. OSC computes joint torques through a task-space impedance law, $\tau = J^{\top} M_x (K_p \tilde{x} - K_d \dot{x}) + \tau_{\mathrm{null}}$ (Eq. 23), where $\tilde{x}$ is the pose…
-
[40]
Statistical Significance Analysis: We provide a formal statistical analysis to verify that the compliant-overdamped gain region $G_{\mathrm{CO}}$ significantly outperforms its complement $G \setminus G_{\mathrm{CO}}$ across all six BC tasks. For each task and gain cell, we evaluate $N = 100$ closed-loop rollouts and record the binary success outcome. Logistic Regression: We fit a binomial logistic…
-
[41]
Task Description: The non-prehensile box manipulation task used in the user study is shown in Figure 18. For each trial, users teleoperate a Franka Research 3 robot with a SpaceMouse in order to push the box from an initial pose to the goal (Figure 18b). The box is always initialized to the left and off-axis relative to the goal (Figure 18a), but the preci…
-
[42]
Experimental Design and Results: As described in Section IV-A, the study collected 1,297 trials from 12 users over 1-hour sessions with randomized, blind gain presentation. The subjective rating is on a scale from 1–5, where 1 means the gain setting provides a completely unintuitive interface and 5 means a completely intuitive interface. Users complete t…
-
[43]
Task Descriptions: The five tasks we study are: FR3 Joint-Reach, FR3 EE-Reach, FR3 Lift Cube, FR3 Open Drawer, and Unitree G1 Track Velocity (Figure 20). Each task is derived from the IsaacLab [23] template environments. [Fig. 19 caption: User study survey. After each trial, users complete the survey to rate their subjective experience teleoperating with a given gain…]
-
[44]
Action Representations: For all tasks, the position target sent to the low-level PD controller at each timestep is $q_{\mathrm{des}}(t) = \alpha \odot \pi_{\theta}(s_t) + q_{\mathrm{ref}}(t)$ (Eq. 26), with $\alpha = [\underbrace{\alpha_1, \ldots, \alpha_1}_{G_1}, \underbrace{\alpha_2, \ldots, \alpha_2}_{G_2}]$, where $q_{\mathrm{ref}}(t)$ is an offset equal to either the current joint position $q(t)$ or the default joint position $q_0$, depending on the task. Joints are partitioned…
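A minimal numpy rendering of the Eq. 26 mapping; the partition sizes and alpha values here are illustrative assumptions:

```python
# Minimal sketch of the Eq. 26 action mapping; partition sizes G1, G2
# and the alpha values are illustrative assumptions.
import numpy as np

G1, G2 = 4, 3  # assumed joint counts in the two partitions
alpha = np.concatenate([np.full(G1, 0.5), np.full(G2, 0.1)])

def position_target(policy_action: np.ndarray, q_ref: np.ndarray) -> np.ndarray:
    # q_des(t) = alpha (element-wise) * pi_theta(s_t) + q_ref(t)
    return alpha * policy_action + q_ref

q_des = position_target(np.zeros(G1 + G2), q_ref=np.zeros(G1 + G2))
```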
-
[45]
Success Criteria: For each policy trained during hyperparameter optimization, we record the success rate across 100 simulated trials according to the success metrics in Table IV. We evaluate the best (highest reward) checkpoint for each policy. Table IV (success criteria for each RL task, excerpt): FR3 Joint-Reach, criterion $\lVert q - q_{\mathrm{goal}} \rVert < \epsilon$ with threshold $\epsilon = 0.1$ rad; FR…
-
[46]
PPO Hyperparameters: We use largely the same PPO hyperparameters as the IsaacLab [23] template environments. Hyperparameters, including any changes we made, are reproduced here (Table V and Table VI). Table V (PPO hyperparameters shared across all tasks, excerpt): algorithm PPO (SKRL); discount factor $\gamma = 0.99$; GAE $\lambda = 0.95$; learning epochs 5; clip range (…
-
[47]
System Identification Data Collection: For each gain configuration $(K_p, K_d)$, the real robot executes a sinusoidal reference trajectory $q_{\mathrm{des}}(t) = q_0 + 0.1 \sin(\pi t / 50)$ applied uniformly across all joints for 4 seconds. During execution, we log joint positions $q$, joint velocities $\dot{q}$, and desired positions $q_{\mathrm{des}}$ at 50 Hz. The low-level torque controller on the rea…
-
[48]
System Identification Procedure: For each gain configuration, we use CMA-ES [30] to optimize per-actuator simulation parameters $\psi$ (Table VII) to minimize the discrepancy between real and simulated response trajectories. Table VII (system identification parameter bounds, optimized per actuator, excerpt): stiffness $K_p$, lower 1, upper 1024; damp…
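A minimal sketch of this fitting loop with the cmaes library [32]; the toy second-order simulate() model, the synthetic "real" trajectory, and the two-parameter ψ are illustrative assumptions standing in for the logged 50 Hz data and the full Table VII parameter set:

```python
# Minimal sketch of per-actuator system identification with the cmaes
# library [32]. The toy simulate() model and synthetic "real"
# trajectory stand in for the logged 50 Hz robot data.
import numpy as np
from cmaes import CMA

def simulate(psi, steps=200, dt=0.02):
    kp, kd = psi                       # per-actuator stiffness, damping
    q, qd, q0 = 0.0, 0.0, 0.0
    out = []
    for t in range(steps):
        q_des = q0 + 0.1 * np.sin(np.pi * t * dt / 50)  # sinusoidal reference
        qd += (kp * (q_des - q) - kd * qd) * dt          # semi-implicit Euler
        q += qd * dt
        out.append(q)
    return np.array(out)

real_q = simulate((700.0, 60.0))       # synthetic stand-in for the real response

def trajectory_error(psi):
    return float(np.mean((simulate(psi) - real_q) ** 2))

# Bounds follow the spirit of Table VII (e.g., stiffness in [1, 1024]).
optimizer = CMA(mean=np.array([500.0, 50.0]), sigma=100.0,
                bounds=np.array([[1.0, 1024.0], [1.0, 256.0]]))
for _ in range(100):                   # CMA-ES generations
    solutions = []
    for _ in range(optimizer.population_size):
        psi = optimizer.ask()
        solutions.append((psi, trajectory_error(psi)))
    optimizer.tell(solutions)
```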
-
[49]
Training Deployable Policies: We train deployable FR3 Joint-Reach and FR3 EE-Reach policies. To discover policies that respect the real robot’s limits, we modify the outer-loop Optuna objective to a two-stage formulation that always prefers constraint-satisfying configurations over violating ones: $J = 1 + r_{\mathrm{success}}$ if all $v_c \le \bar{v}_c$, otherwise $J = r_{\mathrm{success}} \prod_{c \in \mathcal{C}} \phi_c$…
-
[50]
Statistical Significance Analysis: We provide a formal statistical analysis to verify that the stiff-overdamped gain region $G_{\mathrm{SO}}$ produces significantly larger sim-to-real trajectory error than its complement $G \setminus G_{\mathrm{SO}}$ across all three sim-to-real conditions. For each gain cell, we compute the trajectory error (Eq. 11) averaged over 30 real-world rollouts. OLS…