CoRMA: Contrastive RMA for Contact-Rich Meta-Adaptation

Abdul Haseeb Nizamani; Chutong Wen; Dandi Zhou; Hongxu Ma; Jianqiao Zhu; Wentian Wang; Wuhao Wang; Xinhai Sun; Zhexiong Xue

arxiv: 2605.22082 · v1 · pith:PDUTYSKGnew · submitted 2026-05-21 · 💻 cs.RO · cs.LG

Wentian Wang, Chutong Wen, Hongxu Ma, Wuhao Wang, Zhexiong Xue, Abdul Haseeb Nizamani, Dandi Zhou, Xinhai Sun, Jianqiao Zhu

Pith reviewed 2026-05-22 05:33 UTC · model grok-4.3

classification 💻 cs.RO cs.LG

keywords meta-adaptation · contact-rich manipulation · semantic context inference · causal transformer · contrastive learning · sim-to-real transfer · robotic assembly · force feedback

The pith

CoRMA replaces raw simulator parameters with a 6D semantic contact context that a causal Transformer infers from force, proprioceptive, and action histories, enabling real-time adaptation in contact-rich assembly without demonstrations or gradient updates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CoRMA to handle force-dominant robotic assembly tasks where standard RMA methods fail to transfer from simulation to hardware. It defines a compact 6D simulator-only semantic contact context covering contact onset, lateral engagement, guided transition, contact direction, and jamming. A causal Transformer is trained via semantic regression plus a force-regime contrastive objective to infer this context online from force, proprioceptive, and action sequences. Once trained, the model substitutes its own inferences for oracle simulator context at deployment, supporting within-episode policy adaptation on physical arms for tasks such as PegInsert, GearMesh, and NutThread. If the substitution succeeds, robots gain a reusable interface for handling contact variations inside related task families using only onboard sensor history.

Core claim

CoRMA shows that a causal Transformer adapter, trained with semantic regression and a force-regime contrastive objective on simulator data, can infer a 6D semantic contact context from force, proprioceptive, and action histories; replacing the oracle context with this inferred context at test time produces effective within-episode adaptation on real hardware for PegInsert, GearMesh, and NutThread without demonstrations, privileged inputs, or gradient updates.
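The deployment pattern described in the core claim can be sketched as a loop that keeps a rolling history, runs the adapter forward, and feeds the inferred context to a frozen policy. This is a minimal illustration, not the authors' implementation: the window length, the observation layout, and both `stub_adapter` and `stub_policy` are hypothetical stand-ins for the trained networks.

```python
from collections import deque

import numpy as np

HISTORY_LEN = 32        # assumed window length (not stated in the source)
SENSOR_DIM = 6 + 7      # assumed layout: 6D wrench + 7 joint positions
ACTION_DIM = 6          # assumed action size
CONTEXT_DIM = 6         # the 6D semantic contact context
TOKEN_DIM = SENSOR_DIM + ACTION_DIM


def stub_adapter(window: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the trained causal Transformer adapter."""
    return np.tanh(window.mean(axis=0)[:CONTEXT_DIM])


def stub_policy(obs: np.ndarray, z_hat: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the frozen, context-conditioned policy."""
    return 0.1 * z_hat - 0.01 * obs[:ACTION_DIM]


def run_episode(observations, adapter=stub_adapter, policy=stub_policy):
    """Within-episode adaptation: only forward passes, no gradient updates."""
    history = deque(maxlen=HISTORY_LEN)
    prev_action = np.zeros(ACTION_DIM)
    actions = []
    for obs in observations:
        # Each history token holds the current sensors and the previous action.
        history.append(np.concatenate([obs, prev_action]))
        # Left-pad with zeros until the buffer is full.
        window = np.zeros((HISTORY_LEN, TOKEN_DIM))
        window[HISTORY_LEN - len(history):] = np.stack(list(history))
        z_hat = adapter(window)           # inferred context replaces oracle z_t
        prev_action = policy(obs, z_hat)  # policy sees only deployable inputs
        actions.append(prev_action)
    return np.stack(actions)


actions = run_episode(np.ones((5, SENSOR_DIM)))
```

The point of the sketch is structural: nothing inside the loop reads simulator state or updates weights, so all adaptation comes from how the inferred context changes as the history fills.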

What carries the argument

The deployable causal Transformer adapter that infers the 6D semantic contact context via semantic regression and force-regime contrastive training.
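To make the adapter's shape concrete, here is a single-layer, single-head sketch in NumPy of a causal encoder with a readout token, a semantic head, and a contrastive head. The layer count, widths, placement of the readout token at the end of the sequence, and the omission of positional encodings, residual connections, and normalization are all simplifying assumptions; the source does not specify the architecture at this level.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(rng, in_dim, d_model=32, ctx_dim=6, emb_dim=16, scale=0.1):
    """Random weights for the sketch; all sizes are assumptions."""
    shapes = {
        "embed": (in_dim, d_model),
        "readout": (d_model,),
        "wq": (d_model, d_model),
        "wk": (d_model, d_model),
        "wv": (d_model, d_model),
        "sem_head": (d_model, ctx_dim),
        "con_head": (d_model, emb_dim),
    }
    return {name: rng.normal(0.0, scale, shape) for name, shape in shapes.items()}


def adapter_forward(params, history):
    """history: (T, in_dim) -> (z_hat with 6 entries, unit-norm embedding u)."""
    tokens = history @ params["embed"]                # (T, d_model)
    tokens = np.vstack([tokens, params["readout"]])   # readout token goes last
    q = tokens @ params["wq"]
    k = tokens @ params["wk"]
    v = tokens @ params["wv"]
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Causal mask: position i may not attend to positions j > i.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -1e9, scores)
    attended = softmax(scores, axis=-1) @ v
    summary = attended[-1]                            # readout sees the whole history
    z_hat = summary @ params["sem_head"]              # semantic regression output
    u = summary @ params["con_head"]                  # auxiliary contrastive embedding
    u = u / (np.linalg.norm(u) + 1e-8)
    return z_hat, u


rng = np.random.default_rng(0)
params = init_params(rng, in_dim=19)
z_hat, u = adapter_forward(params, rng.normal(size=(10, 19)))
```

Placing the readout token last means the causal mask lets it attend to every earlier step, which is one simple way to get a single summary vector without peeking at future inputs.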

If this is right

CoRMA retains higher verified real-world success rates on PegInsert, GearMesh, and NutThread under target-pose noise than FORGE baselines, which succeed in simulation but degrade on hardware.
Semantic contact inference functions as a reusable adaptation interface across a family of related assembly tasks.
Within-episode adaptation occurs on physical robots without any demonstrations, privileged simulator inputs, or online gradient updates.
The same inferred context supports force-dominant scenarios where raw parameter adaptation from simulators proves insufficient.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same inference interface could be tested on contact-rich tasks outside the current assembly family to check whether the 6D context representation transfers.
Refining the contrastive objective to better align simulated and real force regimes might reduce the remaining Real2Sim gap noted as future work.
Combining the inferred context with other meta-learning modules could support limited generalization to tasks outside the trained family.
The method's reliance on histories alone suggests it may lower the data cost of deploying contact-rich policies by shifting more adaptation burden into the inference model.

Load-bearing premise

The context inferred by the causal Transformer is accurate enough to replace oracle simulator context and still drive successful within-episode adaptation on hardware.

What would settle it

A controlled hardware trial in which CoRMA achieves real success rates no higher than the FORGE baselines under identical target-pose noise would falsify the claim that the inferred context supports effective adaptation.
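One way to make such a comparison concrete is a one-sided Fisher exact test on success counts. The choice of test is an assumption of this sketch, not something the paper reports, and the counts in the usage line are hypothetical placeholders rather than results.

```python
from math import comb


def fisher_one_sided(succ_a, n_a, succ_b, n_b):
    """P-value for 'method A succeeds at least this often' under the null
    hypothesis that A and B share one success rate (hypergeometric tail)."""
    total_succ = succ_a + succ_b
    n_total = n_a + n_b
    upper = min(n_a, total_succ)
    # Count the ways A could get k of the pooled successes, for k >= succ_a.
    tail = sum(
        comb(total_succ, k) * comb(n_total - total_succ, n_a - k)
        for k in range(succ_a, upper + 1)
    )
    return tail / comb(n_total, n_a)


# Hypothetical counts only: 18/20 successes for A versus 9/20 for B.
p_value = fisher_one_sided(18, 20, 9, 20)
```

A large p-value under identical target-pose noise would be the outcome the paragraph above describes as falsifying; a small one would not by itself show that the inferred context is the cause.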

Figures

Figures reproduced from arXiv: 2605.22082 by Abdul Haseeb Nizamani, Chutong Wen, Dandi Zhou, Hongxu Ma, Jianqiao Zhu, Wentian Wang, Wuhao Wang, Xinhai Sun, Zhexiong Xue.

**Figure 1.** CoRMA pipeline overview. Stage 1 trains privileged contact-aware policies in Isaac Lab FORGE using RL-Games PPO with a simulator-only 6D semantic contact latent z_t. Stage 2 trains a causal Transformer adapter to infer ẑ_t from deployable force, proprioceptive, and action histories, using semantic regression and force-regime contrastive learning. Stage 3 removes oracle z_t and injects the adapter prediction…

**Figure 2.** CoRMA adapter. A causal Transformer encodes deployable force/proprioceptive/action history with a learned readout token. The semantic head predicts the 6D context ẑ_t used by the policy, while an auxiliary contrastive head produces u_t for force-regime InfoNCE. Positives are defined by coarse contact regimes: free motion, first contact, guided sliding, and jamming.

**Figure 3.** Real-robot deployment tasks. CoRMA and FORGE are evaluated on the same Marvin hardware interface across NutThread, PegInsert, and GearMesh.

**Figure 4.** Real force-regime evidence for positive X contact direction. Wrench-derived features are aligned at contact onset. Direct flat-surface contact and inner-rim sliding show different lateral-force ratios, force derivatives, and contact-direction signatures, motivating the coarse contact regimes used for contrastive supervision. All plots in this section are diagnostic and do not imply fully task-in…

**Figure 5.** Real force-regime evidence for negative X contact direction. The sharp change near contact onset indicates the transition from free motion to first contact, while the sustained post-onset force texture differentiates guided sliding from direct contact.

**Figure 6.** Real force-regime evidence for positive Y contact direction. The lateral-force ratio, normalized force derivative, and contact-direction angle provide deployable wrench cues for separating first contact, guided sliding, and jam-like interaction.

**Figure 7.** Real force-regime evidence for negative Y contact direction. Across probe groups, inner-rim sliding exhibits sustained lateral-force structure and directionally consistent contact signatures, supporting the use of coarse force-regime labels in Stage 2.

**Figure 8.** GearMesh–PegInsert adapter embedding diagnostic. Left: PCA of the predicted semantic context ẑ_t. Right: PCA of the auxiliary contrastive embedding u_t. Colors indicate coarse force regimes and markers indicate task identity. Compared with ẑ_t, the contrastive embedding shows clearer regime-level organization, supporting the use of force-regime positives and negatives for InfoNCE.

**Figure 9.** GearMesh–NutThread adapter embedding diagnostic. Left: PCA of ẑ_t. Right: PCA of u_t. The auxiliary contrastive embedding provides clearer force-regime organization than the semantic regression output alone. This visualization supports the contrastive head as a representation-structuring objective, not as a replacement for supervised Privileged Z regression.

**Figure 10.** NutThread–PegInsert adapter embedding diagnostic. Left: PCA of ẑ_t. Right: PCA of u_t. The contrastive space separates coarse contact regimes more clearly than the regression output alone, indicating that force-regime positives and negatives provide useful semantic structure. Task-specific structure may still remain, so this should be interpreted as qualitative diagnostic evidence rather than proof of co…

**Figure 11.** GearMesh–PegInsert semantic context prediction. Predicted ẑ_t is plotted against oracle Privileged Z for the six semantic dimensions. The diagonal trend indicates that the adapter preserves numerical fidelity to the simulator-derived contact context.

**Figure 12.** GearMesh–NutThread semantic context prediction. Predicted ẑ_t follows oracle z_t across the trained semantic dimensions, supporting the use of ẑ_t as the deployable context injected into the policy.

**Figure 13.** NutThread–PegInsert semantic context prediction. The adapter prediction ẑ_t remains aligned with oracle Privileged Z across contact semantics, showing that the contrastive auxiliary objective does not replace or collapse the supervised semantic regression target.

**Figure 14.** Force-regime separability from semantic context. For each pairwise adapter, a simple probe predicts coarse force-regime labels from oracle z_t and predicted ẑ_t. The predicted context remains above chance and close to the oracle context across task pairs, indicating that ẑ_t preserves regime-relevant semantic information. This diagnostic supports the use of ẑ_t as a deployable contact context, but does no…
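The probe diagnostic in Figure 14 can be illustrated with a nearest-centroid classifier on synthetic data. The caption does not say which probe the authors use, so the classifier, the synthetic regime centers, and the noise levels below are all assumptions chosen only to show the shape of the comparison between oracle and predicted context.

```python
import numpy as np

REGIMES = ["free motion", "first contact", "guided sliding", "jamming"]


def fit_centroids(z, labels, n_classes):
    """Mean context vector per regime."""
    return np.stack([z[labels == c].mean(axis=0) for c in range(n_classes)])


def predict(centroids, z):
    """Assign each context vector to the nearest regime centroid."""
    dists = np.linalg.norm(z[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)


def probe_accuracy(z_train, y_train, z_test, y_test, n_classes=len(REGIMES)):
    centroids = fit_centroids(z_train, y_train, n_classes)
    return float((predict(centroids, z_test) == y_test).mean())


def make_synthetic(rng, n=400, noise=0.3):
    """Synthetic stand-in data: fixed regime centers plus Gaussian noise."""
    labels = rng.integers(0, len(REGIMES), size=n)
    centers = 2.0 * np.eye(6)[: len(REGIMES)]
    z_oracle = centers[labels] + noise * rng.normal(size=(n, 6))
    z_pred = z_oracle + noise * rng.normal(size=(n, 6))  # mimics adapter error
    return z_oracle, z_pred, labels


rng = np.random.default_rng(0)
z_oracle, z_pred, y = make_synthetic(rng)
half = len(y) // 2
acc_oracle = probe_accuracy(z_oracle[:half], y[:half], z_oracle[half:], y[half:])
acc_pred = probe_accuracy(z_pred[:half], y[:half], z_pred[half:], y[half:])
chance = 1.0 / len(REGIMES)
```

Comparing `acc_pred` against both `acc_oracle` and `chance` mirrors the caption's two reference points: how much regime information survives inference, and whether any survives at all.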

Original abstract

We present CoRMA (Contrastive Robotic Motor Adaptation), a context-based meta-adaptation framework that modifies RMA for force-dominant assembly. CoRMA replaces raw simulator-parameter adaptation with a compact 6D simulator-only semantic contact context describing contact onset, lateral engagement, guided transition, contact direction, and jamming. A deployable causal Transformer adapter infers this context online from force, proprioceptive, and action histories using semantic regression and a force-regime contrastive objective. At deployment, oracle context is removed and replaced by the inferred context, enabling within-episode adaptation without demonstrations, privileged inputs, or gradient updates. We evaluate CoRMA on PegInsert, GearMesh, and NutThread in Isaac Lab / Isaac Sim 5.0 and on a real Marvin arm. Compared with FORGE baselines that achieve high simulation success but degrade substantially on hardware, CoRMA retains higher verified real success under controlled target-pose noise. These results support semantic contact inference as a reusable adaptation interface within a related assembly task family, while broader unseen-task generalization and Real2Sim calibration remain future work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CoRMA swaps the raw simulator parameters in RMA for a 6D semantic contact context, inferred by a causal Transformer trained with a force-regime contrastive objective. The abstract claims this improves hardware retention on assembly tasks but gives no numbers to back the inference step.

CoRMA modifies RMA by replacing simulator-parameter adaptation with a compact 6D semantic contact context that encodes onset, lateral engagement, guided transition, direction, and jamming. A causal Transformer infers this vector online from force, proprioceptive, and action histories via semantic regression plus a force-regime contrastive objective. At deployment the oracle context drops out, so the policy adapts within an episode on the real Marvin arm for PegInsert, GearMesh, and NutThread without demonstrations or gradient steps. The abstract states that this keeps verified success higher under target-pose noise, where FORGE baselines fall off on hardware.

That practical framing for a family of related contact-rich tasks is the clearest contribution. The design choice to make the context simulator-only during training yet deployable at test time is straightforward and addresses a real pain point in sim-to-real transfer for force-dominant assembly.

The stress-test concern lands: the summary supplies no quantitative alignment metric between inferred and oracle context vectors on the physical robot, no error bars, and no ablation that isolates the semantic inference from the contrastive regularization or the noise schedule. Without those checks, the reported hardware gains could come from elsewhere. The full manuscript may contain the missing tables and figures, but the abstract alone leaves the load-bearing claim under-supported.

This paper is for robotics groups working on meta-adaptation and contact-rich manipulation. A reader already familiar with RMA or sim-to-real force policies would get the most out of the specific architectural and objective choices. It is coherent enough on its own terms to deserve a serious referee who can examine the full experiments and ask for the missing alignment data.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces CoRMA, a context-based meta-adaptation framework extending RMA for force-dominant assembly tasks. It replaces raw simulator-parameter adaptation with a compact 6D semantic contact context (onset, lateral engagement, guided transition, contact direction, jamming) inferred online by a deployable causal Transformer from force, proprioceptive, and action histories. Training uses semantic regression plus a force-regime contrastive objective; at deployment the oracle context is removed, enabling within-episode adaptation on hardware without demonstrations, privileged inputs, or gradient updates. Experiments on PegInsert, GearMesh, and NutThread in Isaac Lab/Sim and on a real Marvin arm report that CoRMA retains higher verified real success under target-pose noise than FORGE baselines that degrade substantially on hardware.

Significance. If the empirical claims are substantiated, the work would demonstrate a reusable semantic-contact interface for within-family adaptation in contact-rich assembly, reducing reliance on privileged simulator information during real-robot deployment. The contrastive objective for force-regime distinction and the causal Transformer adapter constitute concrete technical contributions that could be adopted in related manipulation pipelines.

major comments (3)

Abstract: the claim that CoRMA 'retains higher verified real success under controlled target-pose noise' is asserted without any numerical success rates, error bars, statistical tests, ablation results, or data-exclusion criteria. This absence directly weakens the central empirical claim and prevents assessment of effect size or robustness.
Real-world evaluation (as summarized in the abstract and skeptic note): the central claim requires that the causal Transformer’s inferred 6D semantic contact vector is sufficiently accurate to substitute for oracle simulator context on hardware. No quantitative alignment metric (MSE, classification accuracy, or correlation) between inferred and oracle 6D vectors collected on the physical Marvin arm is reported. Without this evidence it remains possible that observed gains arise from contrastive regularization, base-policy robustness, or the noise schedule rather than faithful semantic inference.
Method description: the 6D context is presented as a compact, simulator-only semantic representation, yet no explicit mapping from simulator parameters to the five semantic components (onset, lateral engagement, etc.) or any derivation showing compactness is supplied. This leaves open whether the representation is truly parameter-free or merely reparameterized.
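The alignment metrics named in the second major comment (MSE and correlation between inferred and oracle 6D vectors) are straightforward to compute wherever paired vectors exist, which in practice means simulation rollouts. A minimal sketch, with the function name and array layout as assumptions:

```python
import numpy as np


def alignment_report(z_oracle, z_pred):
    """Per-dimension MSE and Pearson r between (N, 6) oracle and predicted contexts."""
    z_oracle = np.asarray(z_oracle, dtype=float)
    z_pred = np.asarray(z_pred, dtype=float)
    mse = ((z_pred - z_oracle) ** 2).mean(axis=0)
    # Center each dimension, then take the normalized inner product.
    zo = z_oracle - z_oracle.mean(axis=0)
    zp = z_pred - z_pred.mean(axis=0)
    denom = np.sqrt((zo ** 2).sum(axis=0) * (zp ** 2).sum(axis=0))
    pearson_r = (zo * zp).sum(axis=0) / np.maximum(denom, 1e-12)
    return {"mse": mse, "pearson_r": pearson_r}


rng = np.random.default_rng(0)
z_true = rng.normal(size=(200, 6))
report = alignment_report(z_true, z_true + 0.1 * rng.normal(size=(200, 6)))
```

Reporting both quantities per dimension matters because a dimension can correlate well while carrying a constant offset, which MSE catches and correlation does not.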

minor comments (2)

Add explicit definitions or a table for the five semantic contact dimensions and their relation to simulator state variables.
Clarify the precise form of the force-regime contrastive loss (positive/negative pair construction, temperature, etc.) so that the training objective can be reproduced.
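For reference, one common form such a loss could take is a label-conditioned InfoNCE, where every other sample in the batch with the same coarse regime label counts as a positive. This is a sketch of that generic formulation, not the paper's stated objective; the temperature value and the averaging over positives are assumptions.

```python
import numpy as np


def regime_info_nce(u, labels, temperature=0.1):
    """Label-conditioned InfoNCE over a batch.

    u: (N, D) embeddings with nonzero rows, N >= 2.
    labels: (N,) integer regime ids; at least one regime must appear twice.
    """
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    sim = (u @ u.T) / temperature
    self_mask = np.eye(len(labels), dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)          # never contrast with self
    # Row-wise log-softmax over all other samples in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = positives.sum(axis=1)
    valid = n_pos > 0                                # anchors with at least one positive
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / n_pos[valid]).mean())


# Two tight clusters: labels that match the clusters should give the lower loss.
emb = np.array([[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4)
loss_matched = regime_info_nce(emb, np.array([0, 0, 0, 0, 1, 1, 1, 1]))
loss_mixed = regime_info_nce(emb, np.array([0, 1, 0, 1, 0, 1, 0, 1]))
```

The pair construction and the temperature are exactly the details the comment asks the authors to pin down, since different choices here change what the embedding is pushed to separate.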

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful comments on our work. We address each of the major comments point by point below, providing clarifications and indicating revisions to the manuscript where appropriate.

Referee: Abstract: the claim that CoRMA 'retains higher verified real success under controlled target-pose noise' is asserted without any numerical success rates, error bars, statistical tests, ablation results, or data-exclusion criteria. This absence directly weakens the central empirical claim and prevents assessment of effect size or robustness.

Authors: We agree that including quantitative support in the abstract would strengthen the presentation. The full experimental results, including success rates, standard deviations, and comparisons to baselines, are provided in Section 5 of the manuscript along with statistical analysis. In the revised manuscript, we have updated the abstract to include key numerical results such as average success rates across tasks and a note on the evaluation protocol used (e.g., number of trials and noise levels). This allows readers to better assess the effect size without needing to refer to the body immediately. revision: yes
Referee: Real-world evaluation (as summarized in the abstract and skeptic note): the central claim requires that the causal Transformer’s inferred 6D semantic contact vector is sufficiently accurate to substitute for oracle simulator context on hardware. No quantitative alignment metric (MSE, classification accuracy, or correlation) between inferred and oracle 6D vectors collected on the physical Marvin arm is reported. Without this evidence it remains possible that observed gains arise from contrastive regularization, base-policy robustness, or the noise schedule rather than faithful semantic inference.

Authors: We appreciate this observation. However, the oracle 6D semantic contact context is inherently tied to simulator-specific parameters and states, which cannot be directly measured or computed on the physical robot. Collecting paired inferred-oracle vectors on hardware is therefore not possible. We instead demonstrate the effectiveness of the inferred context through improved real-world task success rates compared with baselines that do not use adaptation. To further support this, we have added simulation-based ablations in the revised paper showing strong alignment between inferred and oracle contexts in simulation (with correlation coefficients reported in the supplementary material), and we discuss the sim-to-real transfer assumptions. We believe the performance gains are attributable to the semantic inference, because the contrastive objective and other components are ablated in the experiments. revision: partial
Referee: Method description: the 6D context is presented as a compact, simulator-only semantic representation, yet no explicit mapping from simulator parameters to the five semantic components (onset, lateral engagement, etc.) or any derivation showing compactness is supplied. This leaves open whether the representation is truly parameter-free or merely reparameterized.

Authors: We acknowledge that the mapping and compactness analysis were not sufficiently detailed. In the revised manuscript, we have expanded the method section to include an explicit description of how each semantic component is computed from the simulator state and parameters. For example, contact onset is triggered when force exceeds a threshold, lateral engagement is based on lateral displacement, and so on for the other components. Additionally, we provide a comparison of dimensionality, noting that the raw simulator parameters for contact-rich tasks can involve 15 to 20 or more variables, while the 6D context provides a compact encoding focused on semantically relevant aspects for adaptation. revision: yes
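The two components the rebuttal spells out can be sketched as simple functions of simulator state. Everything numeric here is hypothetical: the threshold, the length scale, the saturating `tanh` form, and the choice to make onset an instantaneous indicator rather than a latched flag are illustrative assumptions, and the remaining components are omitted because the source does not describe them.

```python
import numpy as np

FORCE_THRESHOLD_N = 1.0    # hypothetical contact-force threshold, in newtons
LATERAL_SCALE_M = 0.002    # hypothetical length scale, in meters


def contact_onset(force_xyz, threshold=FORCE_THRESHOLD_N):
    """1.0 when the contact-force magnitude exceeds the threshold, else 0.0."""
    return float(np.linalg.norm(force_xyz) > threshold)


def lateral_engagement(part_xy, target_xy, scale=LATERAL_SCALE_M):
    """Saturating score in [0, 1) that grows with lateral displacement."""
    offset = np.asarray(part_xy, dtype=float) - np.asarray(target_xy, dtype=float)
    return float(np.tanh(np.linalg.norm(offset) / scale))


onset = contact_onset([0.0, 0.0, 2.0])
engagement = lateral_engagement([0.001, 0.0], [0.0, 0.0])
```

Both functions read quantities that are exact in simulation (contact force, relative pose), which is consistent with the context being simulator-only at training time and inferred from sensor history at deployment.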

Circularity Check

0 steps flagged

No circularity: framework uses standard training objectives without self-referential definitions or fitted predictions.

The manuscript describes an empirical meta-adaptation framework that trains a causal Transformer adapter via a force-regime contrastive objective to infer a 6D semantic contact context from histories, then deploys the inferred context in place of oracle simulator values. No equations, derivations, or first-principles results are presented that could reduce to their own inputs by construction. The contrastive objective functions as a conventional training loss rather than a quantity whose value is presupposed by the claimed prediction. Central performance claims rest on hardware experiments versus FORGE baselines, which constitute independent empirical evidence rather than a closed loop of self-citation or renaming. The derivation chain is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the 6D context and contrastive objective are introduced as new constructs but without derivation details or independent evidence.

pith-pipeline@v0.9.0 · 5758 in / 1178 out tokens · 37027 ms · 2026-05-22T05:33:17.914059+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

CoRMA replaces raw simulator-parameter adaptation with a compact 6D simulator-only semantic contact context describing contact onset, lateral engagement, guided transition, contact direction, and jamming. A deployable causal Transformer adapter infers this context online from force, proprioceptive, and action histories using semantic regression and a force-regime contrastive objective.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Narang, K

Y . Narang, K. Storey, I. Akinola, M. Macklin, P. Reist, L. Wawrzyniak, Y . Guo, A. Moravanszky, G. State, M. Lu, A. Handa, and D. Fox. Factory: Fast contact for robotic assembly, 2022. URL https://arxiv.org/abs/2205.03532

work page arXiv 2022

[2] [2]

B. Tang, M. A. Lin, I. Akinola, A. Handa, G. S. Sukhatme, F. Ramos, D. Fox, and Y . Narang. Industreal: Transferring contact-rich assembly tasks from simulation to reality, 2023. URL https://arxiv.org/abs/2305.17110

work page arXiv 2023

[3] [3]

B. Tang, I. Akinola, J. Xu, B. Wen, A. Handa, K. V . Wyk, D. Fox, G. S. Sukhatme, F. Ramos, and Y . Narang. Automate: Specialist and generalist assembly policies over diverse geometries,

work page

[4] [4]

URLhttps://arxiv.org/abs/2407.08028

work page arXiv

[5] M. Noseworthy, B. Tang, B. Wen, A. Handa, C. Kessens, N. Roy, D. Fox, F. Ramos, Y. Narang, and I. Akinola. Forge: Force-guided exploration for robust contact-rich manipulation under uncertainty, 2025. URL https://arxiv.org/abs/2408.04587

[6] D. Pomerleau. Alvinn: An autonomous land vehicle in a neural network. In D. Touretzky, editor, Proceedings of (NeurIPS) Neural Information Processing Systems, pages 305–313. Morgan Kaufmann, December 1989.

[7] S. Ross, G. J. Gordon, and J. A. Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning, 2011. URL https://arxiv.org/abs/1011.0686

[8] S. Schaal. Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences, 3(6):233–242, 1999. ISSN 1364-6613. doi:10.1016/S1364-6613(99)01327-3. URL https://www.sciencedirect.com/science/article/pii/S1364661399013273

[9] P. Abbeel and A. Y. Ng. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the twenty-first international conference on Machine learning (ICML ’04), pages 1–8. ACM, 2004.

[10] C. Finn, S. Levine, and P. Abbeel. Guided cost learning: Deep inverse optimal control via policy optimization, 2016. URL https://arxiv.org/abs/1603.00448

[11] J. Ho and S. Ermon. Generative adversarial imitation learning, 2016. URL https://arxiv.org/abs/1606.03476

[12] T. Johannink, S. Bahl, A. Nair, J. Luo, A. Kumar, M. Loskyll, J. A. Ojea, E. Solowjow, and S. Levine. Residual reinforcement learning for robot control, 2018. URL https://arxiv.org/abs/1812.03201

[13] T. Silver, K. Allen, J. Tenenbaum, and L. Kaelbling. Residual policy learning, 2019. URL https://arxiv.org/abs/1812.06298

[14] L. Ankile, Z. Jiang, R. Duan, G. Shi, P. Abbeel, and A. Nagabandi. Residual off-policy rl for finetuning behavior cloning policies, 2025. URL https://arxiv.org/abs/2509.19301

[15] T. Salloom, X. Yu, W. He, and O. Kaynak. Adaptive neural network control of underwater robotic manipulators tuned by a genetic algorithm. Journal of Intelligent & Robotic Systems, 97(3–4):657–672, 2020. doi:10.1007/s10846-019-01008-y. URL https://doi.org/10.1007/s10846-019-01008-y

[16] S. Levine, P. Pastor, A. Krizhevsky, and D. Quillen. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection, 2016. URL https://arxiv.org/abs/1603.02199

[17] K. Rakelly, A. Zhou, D. Quillen, C. Finn, and S. Levine. Efficient off-policy meta-reinforcement learning via probabilistic context variables, 2019. URL https://arxiv.org/abs/1903.08254

[18] C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks, 2017. URL https://arxiv.org/abs/1703.03400

[19] A. Kumar, Z. Fu, D. Pathak, and J. Malik. Rma: Rapid motor adaptation for legged robots, 2021. URL https://arxiv.org/abs/2107.04034

[21] G. Liu, M. Tang, and B. Eysenbach. A single goal is all you need: Skills and exploration emerge from contrastive rl without rewards, demonstrations, or subgoals, 2024. URL https://arxiv.org/abs/2408.05804

[22] B. Eysenbach, T. Zhang, R. Salakhutdinov, and S. Levine. Contrastive learning as goal-conditioned reinforcement learning, 2023. URL https://arxiv.org/abs/2206.07568

[23] Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, and P. Abbeel. Rl2: Fast reinforcement learning via slow reinforcement learning, 2016. URL https://arxiv.org/abs/1611.02779

[24] R. Liu, Y. Du, F. Bai, J. Lyu, and X. Li. Pearl: Zero-shot cross-task preference alignment and robust reward learning for robotic manipulation, 2024. URL https://arxiv.org/abs/2306.03615

[25] L. Zintgraf, K. Shiarlis, M. Igl, S. Schulze, Y. Gal, K. Hofmann, and S. Whiteson. Varibad: A very good method for bayes-adaptive deep rl via meta-learning, 2020. URL https://arxiv.org/abs/1910.08348

[26] Y. Liang, K. Ellis, and J. Henriques. Rapid motor adaptation for robotic manipulator arms, 2024. URL https://arxiv.org/abs/2312.04670

[27] E. Parisotto, H. F. Song, J. W. Rae, R. Pascanu, C. Gulcehre, S. M. Jayakumar, M. Jaderberg, R. L. Kaufman, A. Clark, S. Noury, M. M. Botvinick, N. Heess, and R. Hadsell. Stabilizing transformers for reinforcement learning, 2019. URL https://arxiv.org/abs/1910.06764

[28] A. Srinivas, M. Laskin, and P. Abbeel. Curl: Contrastive unsupervised representations for reinforcement learning, 2020. URL https://arxiv.org/abs/2004.04136

[29] NVIDIA: M. Mittal, P. Roth, J. Tigue, A. Richard, O. Zhang, P. Du, A. Serrano-Muñoz, X. Yao, R. Zurbrügg, N. Rudin, L. Wawrzyniak, M. Rakhsha, A. Denzler, E. Heiden, A. Borovicka, O. Ahmed, I. Akinola, A. Anwar, M. T. Carlson, J. Y. Feng, A. Garg, R. Gasoto, L. Gulich, Y. Guo, M. Gussert, A. Hansen, M. Kulkarni, C. Li, W. Liu, V. Makoviychuk, G. Mal... Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning, 2025.

[30] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms, 2017. URL https://arxiv.org/abs/1707.06347

[31] P. Beeson and B. Ames. TRAC-IK: An open-source library for improved solving of generic inverse kinematics. In 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids), pages 928–935, Seoul, South Korea, 2015. IEEE. doi:10.1109/HUMANOIDS.2015.7363472.

Table 3: Real-robot verified-success accounting. Real success is measured by the final ...