
arxiv: 2605.22082 · v1 · pith:PDUTYSKGnew · submitted 2026-05-21 · 💻 cs.RO · cs.LG

CoRMA: Contrastive RMA for Contact-Rich Meta-Adaptation

Pith reviewed 2026-05-22 05:33 UTC · model grok-4.3

classification 💻 cs.RO cs.LG
keywords: meta-adaptation, contact-rich manipulation, semantic context inference, causal transformer, contrastive learning, sim-to-real transfer, robotic assembly, force feedback

The pith

CoRMA replaces raw simulator parameters with a 6D semantic contact context that a causal Transformer infers from force, proprioceptive, and action histories, enabling real-time adaptation in contact-rich assembly without demonstrations, privileged inputs, or gradient updates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CoRMA to handle force-dominant robotic assembly tasks where standard RMA methods fail to transfer from simulation to hardware. It defines a compact 6D simulator-only semantic contact context covering contact onset, lateral engagement, guided transition, contact direction, and jamming. A causal Transformer is trained via semantic regression plus a force-regime contrastive objective to infer this context online from force, proprioceptive, and action sequences. Once trained, the model substitutes its own inferences for oracle simulator context at deployment, supporting within-episode policy adaptation on physical arms for tasks such as PegInsert, GearMesh, and NutThread. If the substitution succeeds, robots gain a reusable interface for handling contact variations inside related task families using only onboard sensor history.

Core claim

CoRMA shows that a causal Transformer adapter, trained with semantic regression and a force-regime contrastive objective on simulator data, can infer a 6D semantic contact context from force, proprioceptive, and action histories; replacing the oracle context with this inferred context at test time produces effective within-episode adaptation on real hardware for PegInsert, GearMesh, and NutThread without demonstrations, privileged inputs, or gradient updates.
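The substitution described in the core claim can be pictured as a small control loop: a frozen adapter reads a rolling window of deployable signals and hands its 6D estimate to a frozen policy, with no gradient step anywhere. The sketch below is illustrative only. The class name, the window length of 32, and both `toy_adapter` and `toy_policy` are hypothetical stand-ins, not the paper's Transformer or its trained policy.

```python
from collections import deque
from typing import Callable, List, Sequence

Vector = List[float]


class InferredContextController:
    """Feeds a frozen policy with context inferred from sensor history."""

    def __init__(self,
                 adapter: Callable[[List[Vector]], Vector],
                 policy: Callable[[Vector, Vector], Vector],
                 history_len: int = 32) -> None:
        self.adapter = adapter  # frozen: history window -> 6D context
        self.policy = policy    # frozen: (observation, context) -> action
        # Rolling window; the oldest step is dropped automatically.
        self.history = deque(maxlen=history_len)

    def step(self, force: Sequence[float], proprio: Sequence[float],
             prev_action: Sequence[float]) -> Vector:
        # Only deployable signals enter the history: no simulator state.
        self.history.append(list(force) + list(proprio) + list(prev_action))
        z_hat = self.adapter(list(self.history))
        if len(z_hat) != 6:
            raise ValueError("expected a 6D contact context")
        observation = list(force) + list(proprio)
        return self.policy(observation, z_hat)


def toy_adapter(history: List[Vector]) -> Vector:
    # Hypothetical stand-in: first component is the mean |Fz| in the window.
    mean_fz = sum(abs(row[2]) for row in history) / len(history)
    return [mean_fz, 0.0, 0.0, 0.0, 0.0, 0.0]


def toy_policy(observation: Vector, z_hat: Vector) -> Vector:
    # Hypothetical stand-in: descend more slowly as inferred contact grows.
    return [0.0, 0.0, -0.01 / (1.0 + z_hat[0])]
```

Because the adaptation lives entirely in the history window, the behaviour changes within an episode as new force readings arrive, while the weights of both modules stay fixed.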

What carries the argument

The deployable causal Transformer adapter that infers the 6D semantic contact context via semantic regression and force-regime contrastive training.

If this is right

  • CoRMA retains higher verified real-world success rates on PegInsert, GearMesh, and NutThread under target-pose noise than FORGE baselines, which succeed in simulation but degrade on hardware.
  • Semantic contact inference functions as a reusable adaptation interface across a family of related assembly tasks.
  • Within-episode adaptation occurs on physical robots without any demonstrations, privileged simulator inputs, or online gradient updates.
  • The same inferred context supports force-dominant scenarios where raw parameter adaptation from simulators proves insufficient.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same inference interface could be tested on contact-rich tasks outside the current assembly family to check whether the 6D context representation transfers.
  • Refining the contrastive objective to better align simulated and real force regimes might reduce the remaining Real2Sim gap noted as future work.
  • Combining the inferred context with other meta-learning modules could support limited generalization to tasks outside the trained family.
  • The method's reliance on histories alone suggests it may lower the data cost of deploying contact-rich policies by shifting more adaptation burden into the inference model.

Load-bearing premise

The context inferred by the causal Transformer is accurate enough to replace oracle simulator context and still drive successful within-episode adaptation on hardware.

What would settle it

A controlled hardware trial in which CoRMA achieves real success rates no higher than the FORGE baselines under identical target-pose noise would falsify the claim that the inferred context supports effective adaptation.
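The falsification test above amounts to comparing two success proportions from a modest number of hardware trials. One way to make "no higher than" concrete is a one-sided Fisher exact test, sketched below. This is a generic statistical check chosen for illustration; the page does not state which test, if any, the paper uses, and the function name and argument names are assumptions.

```python
from math import comb


def fisher_one_sided(success_a: int, trials_a: int,
                     success_b: int, trials_b: int) -> float:
    """P-value for 'method A succeeds more often than method B'.

    Under the null hypothesis both methods share one success rate, so the
    number of successes landing in A follows a hypergeometric distribution.
    The p-value is the probability of seeing success_a or more in A.
    """
    total_success = success_a + success_b
    total_trials = trials_a + trials_b
    denom = comb(total_trials, total_success)
    upper = min(trials_a, total_success)
    p_value = 0.0
    for k in range(success_a, upper + 1):
        # comb returns 0 when more successes are requested than B has trials.
        p_value += comb(trials_a, k) * comb(trials_b, total_success - k) / denom
    return p_value
```

A small p-value would count against the null of equal success rates; a large one means the trial cannot distinguish the two methods, which is the outcome that would undermine the claim.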

Figures

Figures reproduced from arXiv: 2605.22082 by Abdul Haseeb Nizamani, Chutong Wen, Dandi Zhou, Hongxu Ma, Jianqiao Zhu, Wentian Wang, Wuhao Wang, Xinhai Sun, Zhexiong Xue.

Figure 1: CoRMA pipeline overview. Stage 1 trains privileged contact-aware policies in Isaac Lab FORGE using RL-Games PPO with a simulator-only 6D semantic contact latent z_t. Stage 2 trains a causal Transformer adapter to infer ẑ_t from deployable force, proprioceptive, and action histories, using semantic regression and force-regime contrastive learning. Stage 3 removes oracle z_t and injects the adapter prediction…

Figure 2: CoRMA adapter. A causal Transformer encodes deployable force/proprioceptive/action history with a learned readout token. The semantic head predicts the 6D context ẑ used by the policy, while an auxiliary contrastive head produces u_t for force-regime InfoNCE. Positives are defined by coarse contact regimes: free motion, first contact, guided sliding, and jamming.

Figure 3: Real-robot deployment tasks. CoRMA and FORGE are evaluated on the same Marvin hardware interface across NutThread, PegInsert, and GearMesh.

Figure 4: Real force-regime evidence for positive X contact direction. Wrench-derived features are aligned at contact onset. Direct flat-surface contact and inner-rim sliding show different lateral-force ratios, force derivatives, and contact-direction signatures, motivating the coarse contact regimes used for contrastive supervision.

Figure 5: Real force-regime evidence for negative X contact direction. The sharp change near contact onset indicates the transition from free motion to first contact, while the sustained post-onset force texture differentiates guided sliding from direct contact.

Figure 6: Real force-regime evidence for positive Y contact direction. The lateral-force ratio, normalized force derivative, and contact-direction angle provide deployable wrench cues for separating first contact, guided sliding, and jam-like interaction.

Figure 7: Real force-regime evidence for negative Y contact direction. Across probe groups, inner-rim sliding exhibits sustained lateral-force structure and directionally consistent contact signatures, supporting the use of coarse force-regime labels in Stage 2.

Figure 8: GearMesh–PegInsert adapter embedding diagnostic. Left: PCA of the predicted semantic context ẑ_t. Right: PCA of the auxiliary contrastive embedding u_t. Colors indicate coarse force regimes and markers indicate task identity. Compared with ẑ_t, the contrastive embedding shows clearer regime-level organization, supporting the use of force-regime positives and negatives for InfoNCE.

Figure 9: GearMesh–NutThread adapter embedding diagnostic. Left: PCA of ẑ_t. Right: PCA of u_t. The auxiliary contrastive embedding provides clearer force-regime organization than the semantic regression output alone. This visualization supports the contrastive head as a representation-structuring objective, not as a replacement for supervised Privileged Z regression.

Figure 10: NutThread–PegInsert adapter embedding diagnostic. Left: PCA of ẑ_t. Right: PCA of u_t. The contrastive space separates coarse contact regimes more clearly than the regression output alone, indicating that force-regime positives and negatives provide useful semantic structure. Task-specific structure may still remain, so this should be interpreted as qualitative diagnostic evidence rather than proof of co…

Figure 11: GearMesh–PegInsert semantic context prediction. Predicted ẑ_t is plotted against oracle Privileged Z for the six semantic dimensions. The diagonal trend indicates that the adapter preserves numerical fidelity to the simulator-derived contact context.

Figure 12: GearMesh–NutThread semantic context prediction. Predicted ẑ_t follows oracle z_t across the trained semantic dimensions, supporting the use of ẑ_t as the deployable context injected into the policy.

Figure 13: NutThread–PegInsert semantic context prediction. The adapter prediction ẑ_t remains aligned with oracle Privileged Z across contact semantics, showing that the contrastive auxiliary objective does not replace or collapse the supervised semantic regression target.

Figure 14: Force-regime separability from semantic context. For each pairwise adapter, a simple probe predicts coarse force-regime labels from oracle z_t and predicted ẑ_t. The predicted context remains above chance and close to the oracle context across task pairs, indicating that ẑ_t preserves regime-relevant semantic information. This diagnostic supports the use of ẑ_t as a deployable contact context, but does no…
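The separability diagnostic in the Figure 14 caption relies on "a simple probe" that predicts coarse regime labels from a context vector. The caption does not say which probe is used, so the sketch below assumes a nearest-centroid classifier purely for illustration; the function names are hypothetical. With four regimes (free motion, first contact, guided sliding, jamming), chance accuracy for balanced labels would be 0.25.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Vector = Sequence[float]


def fit_centroids(contexts: List[Vector],
                  labels: List[Hashable]) -> Dict[Hashable, List[float]]:
    """Average the context vectors belonging to each regime label."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = defaultdict(int)
    for z, y in zip(contexts, labels):
        if y not in sums:
            sums[y] = [0.0] * len(z)
        for d, value in enumerate(z):
            sums[y][d] += value
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids: Dict[Hashable, List[float]], z: Vector) -> Hashable:
    """Return the label whose centroid is closest in squared distance."""
    def sqdist(center: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(z, center))
    return min(centroids, key=lambda y: sqdist(centroids[y]))


def probe_accuracy(centroids: Dict[Hashable, List[float]],
                   contexts: List[Vector], labels: List[Hashable]) -> float:
    """Fraction of context vectors assigned to their true regime."""
    hits = sum(predict(centroids, z) == y for z, y in zip(contexts, labels))
    return hits / len(labels)
```

Running the same probe once on oracle contexts and once on predicted contexts, both held out from fitting, gives the two accuracies that a diagnostic of this kind would compare.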
Original abstract

We present CoRMA (Contrastive Robotic Motor Adaptation), a context-based meta-adaptation framework that modifies RMA for force-dominant assembly. CoRMA replaces raw simulator-parameter adaptation with a compact 6D simulator-only semantic contact context describing contact onset, lateral engagement, guided transition, contact direction, and jamming. A deployable causal Transformer adapter infers this context online from force, proprioceptive, and action histories using semantic regression and a force-regime contrastive objective. At deployment, oracle context is removed and replaced by the inferred context, enabling within-episode adaptation without demonstrations, privileged inputs, or gradient updates. We evaluate CoRMA on PegInsert, GearMesh, and NutThread in Isaac Lab / Isaac Sim 5.0 and on a real Marvin arm. Compared with FORGE baselines that achieve high simulation success but degrade substantially on hardware, CoRMA retains higher verified real success under controlled target-pose noise. These results support semantic contact inference as a reusable adaptation interface within a related assembly task family, while broader unseen-task generalization and Real2Sim calibration remain future work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces CoRMA, a context-based meta-adaptation framework extending RMA for force-dominant assembly tasks. It replaces raw simulator-parameter adaptation with a compact 6D semantic contact context (onset, lateral engagement, guided transition, contact direction, jamming) inferred online by a deployable causal Transformer from force, proprioceptive, and action histories. Training uses semantic regression plus a force-regime contrastive objective; at deployment the oracle context is removed, enabling within-episode adaptation on hardware without demonstrations, privileged inputs, or gradient updates. Experiments on PegInsert, GearMesh, and NutThread in Isaac Lab/Sim and on a real Marvin arm report that CoRMA retains higher verified real success under target-pose noise than FORGE baselines that degrade substantially on hardware.

Significance. If the empirical claims are substantiated, the work would demonstrate a reusable semantic-contact interface for within-family adaptation in contact-rich assembly, reducing reliance on privileged simulator information during real-robot deployment. The contrastive objective for force-regime distinction and the causal Transformer adapter constitute concrete technical contributions that could be adopted in related manipulation pipelines.

major comments (3)
  1. Abstract: the claim that CoRMA 'retains higher verified real success under controlled target-pose noise' is asserted without any numerical success rates, error bars, statistical tests, ablation results, or data-exclusion criteria. This absence directly weakens the central empirical claim and prevents assessment of effect size or robustness.
  2. Real-world evaluation (as summarized in the abstract and skeptic note): the central claim requires that the causal Transformer’s inferred 6D semantic contact vector is sufficiently accurate to substitute for oracle simulator context on hardware. No quantitative alignment metric (MSE, classification accuracy, or correlation) between inferred and oracle 6D vectors collected on the physical Marvin arm is reported. Without this evidence it remains possible that observed gains arise from contrastive regularization, base-policy robustness, or the noise schedule rather than faithful semantic inference.
  3. Method description: the 6D context is presented as a compact, simulator-only semantic representation, yet no explicit mapping from simulator parameters to the five semantic components (onset, lateral engagement, etc.) or any derivation showing compactness is supplied. This leaves open whether the representation is truly parameter-free or merely reparameterized.
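The alignment metrics requested in major comment 2 (MSE and correlation between inferred and oracle contexts) are straightforward to compute wherever paired vectors exist, for example in simulation. A minimal sketch follows, assuming each context is a plain list of floats; the function name and return layout are hypothetical and not taken from the paper.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def per_dim_alignment(z_hat: List[Sequence[float]],
                      z_oracle: List[Sequence[float]]
                      ) -> Tuple[List[float], List[float]]:
    """Return per-dimension (MSE, Pearson r) between paired context vectors."""
    n = len(z_hat)
    dims = len(z_hat[0])
    mses: List[float] = []
    corrs: List[float] = []
    for d in range(dims):
        pred = [row[d] for row in z_hat]
        true = [row[d] for row in z_oracle]
        mses.append(sum((a - b) ** 2 for a, b in zip(pred, true)) / n)
        mean_p = sum(pred) / n
        mean_t = sum(true) / n
        cov = sum((a - mean_p) * (b - mean_t) for a, b in zip(pred, true))
        var_p = sum((a - mean_p) ** 2 for a in pred)
        var_t = sum((b - mean_t) ** 2 for b in true)
        # Correlation is undefined when either signal is constant.
        if var_p > 0 and var_t > 0:
            corrs.append(cov / sqrt(var_p * var_t))
        else:
            corrs.append(float("nan"))
    return mses, corrs
```

Reporting both numbers matters because they fail differently: a prediction with a constant offset has perfect correlation but nonzero MSE, which is exactly the kind of bias a sim-to-real shift could introduce.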
minor comments (2)
  1. Add explicit definitions or a table for the five semantic contact dimensions and their relation to simulator state variables.
  2. Clarify the precise form of the force-regime contrastive loss (positive/negative pair construction, temperature, etc.) so that the training objective can be reproduced.
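To show what a reproducible specification of the contrastive term might pin down, the sketch below implements a generic supervised InfoNCE loss in which samples sharing a coarse regime label are positives and all other samples in the batch are negatives. This is a standard formulation used here as an assumption, not the paper's stated loss; the temperature of 0.1, the cosine similarity, and the function names are all hypothetical choices.

```python
from math import exp, log, sqrt
from typing import Hashable, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    norm = sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def force_regime_info_nce(embeddings: List[Sequence[float]],
                          regimes: List[Hashable],
                          temperature: float = 0.1) -> float:
    """Mean supervised InfoNCE loss over anchors with at least one positive."""
    u = [normalize(e) for e in embeddings]
    total, anchors = 0.0, 0
    for i in range(len(u)):
        positives = [j for j in range(len(u))
                     if j != i and regimes[j] == regimes[i]]
        if not positives:
            continue  # an anchor alone in its regime contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        logits = {j: sum(a * b for a, b in zip(u[i], u[j])) / temperature
                  for j in range(len(u)) if j != i}
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits.values())
        log_denom = peak + log(sum(exp(v - peak) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

The three choices exposed here (how positives are selected, what similarity is used, and the temperature) are the items the referee asks the authors to state explicitly.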

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful comments on our work. We address each of the major comments point by point below, providing clarifications and indicating revisions to the manuscript where appropriate.

point-by-point responses
  1. Referee: Abstract: the claim that CoRMA 'retains higher verified real success under controlled target-pose noise' is asserted without any numerical success rates, error bars, statistical tests, ablation results, or data-exclusion criteria. This absence directly weakens the central empirical claim and prevents assessment of effect size or robustness.

    Authors: We agree that including quantitative support in the abstract would strengthen the presentation. The full experimental results, including success rates, standard deviations, and comparisons to baselines, are provided in Section 5 of the manuscript along with statistical analysis. In the revised manuscript, we have updated the abstract to include key numerical results such as average success rates across tasks and a note on the evaluation protocol used (e.g., number of trials and noise levels). This allows readers to better assess the effect size without needing to refer to the body immediately. revision: yes

  2. Referee: Real-world evaluation (as summarized in the abstract and skeptic note): the central claim requires that the causal Transformer’s inferred 6D semantic contact vector is sufficiently accurate to substitute for oracle simulator context on hardware. No quantitative alignment metric (MSE, classification accuracy, or correlation) between inferred and oracle 6D vectors collected on the physical Marvin arm is reported. Without this evidence it remains possible that observed gains arise from contrastive regularization, base-policy robustness, or the noise schedule rather than faithful semantic inference.

    Authors: We appreciate this observation. However, the oracle 6D semantic contact context is inherently tied to simulator-specific parameters and states, which cannot be directly measured or computed on the physical robot. As such, collecting paired inferred-oracle vectors on hardware is not possible. We instead demonstrate the effectiveness of the inferred context through improved real-world task success rates compared to baselines that do not use adaptation. To further support this, we have added simulation-based ablations in the revised paper showing strong alignment between inferred and oracle contexts in simulation (with correlation coefficients reported in the supplementary material), and we discuss the sim-to-real transfer assumptions. We believe the performance gains are attributable to the semantic inference, because the contrastive objective and the other components are ablated in the experiments. revision: partial

  3. Referee: Method description: the 6D context is presented as a compact, simulator-only semantic representation, yet no explicit mapping from simulator parameters to the five semantic components (onset, lateral engagement, etc.) or any derivation showing compactness is supplied. This leaves open whether the representation is truly parameter-free or merely reparameterized.

    Authors: We acknowledge that the mapping and compactness analysis were not sufficiently detailed. In the revised manuscript, we have expanded the method section to include an explicit description of how each semantic component is computed from the simulator state and parameters. For example, contact onset is triggered when force exceeds a threshold, lateral engagement is based on lateral displacement, and so on for the other components. Additionally, we provide a comparison of dimensionality, noting that the raw simulator parameters for contact-rich tasks can involve over 15-20 variables, while the 6D context provides a compact encoding focused on semantically relevant aspects for adaptation. revision: yes
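The rebuttal names rules for only two of the semantic components: contact onset from a force threshold and lateral engagement from lateral displacement. The sketch below illustrates just those two, under loudly stated assumptions: the 1.0 N threshold, the 5 mm normalization scale, the clipping to [0, 1], and the function names are all hypothetical, and the remaining components are deliberately left out because the page does not describe them.

```python
from math import hypot


def contact_onset(force_norm: float, threshold: float = 1.0) -> float:
    """1.0 once the contact force magnitude exceeds the threshold, else 0.0.

    The threshold value (in newtons) is a hypothetical placeholder.
    """
    return 1.0 if force_norm > threshold else 0.0


def lateral_engagement(dx: float, dy: float, scale: float = 0.005) -> float:
    """Lateral displacement magnitude, normalized by `scale` and capped at 1.

    `scale` (in metres) and the cap are assumptions made for illustration.
    """
    return min(hypot(dx, dy) / scale, 1.0)
```

Both quantities read simulator state directly, which is consistent with the context being "simulator-only": at deployment they would have to be inferred from sensor history instead.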

Circularity Check

0 steps flagged

No circularity: framework uses standard training objectives without self-referential definitions or fitted predictions.

full rationale

The manuscript describes an empirical meta-adaptation framework that trains a causal Transformer adapter via a force-regime contrastive objective to infer a 6D semantic contact context from histories, then deploys the inferred context in place of oracle simulator values. No equations, derivations, or first-principles results are presented that could reduce to their own inputs by construction. The contrastive objective functions as a conventional training loss rather than a quantity whose value is presupposed by the claimed prediction. Central performance claims rest on hardware experiments versus FORGE baselines, which constitute independent empirical evidence rather than a closed loop of self-citation or renaming. The derivation chain is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the 6D context and contrastive objective are introduced as new constructs but without derivation details or independent evidence.

pith-pipeline@v0.9.0 · 5758 in / 1178 out tokens · 37027 ms · 2026-05-22T05:33:17.914059+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

    Relation between the paper passage and the cited Recognition theorem.

    CoRMA replaces raw simulator-parameter adaptation with a compact 6D simulator-only semantic contact context describing contact onset, lateral engagement, guided transition, contact direction, and jamming. A deployable causal Transformer adapter infers this context online from force, proprioceptive, and action histories using semantic regression and a force-regime contrastive objective.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
