CoRMA: Contrastive RMA for Contact-Rich Meta-Adaptation
Pith reviewed 2026-05-22 05:33 UTC · model grok-4.3
The pith
CoRMA replaces raw simulator parameters with a 6D semantic contact context, inferred by a causal Transformer from force and motion histories, to enable real-time adaptation in contact-rich assembly without demonstrations or gradient updates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CoRMA shows that a causal Transformer adapter, trained with semantic regression and a force-regime contrastive objective on simulator data, can infer a 6D semantic contact context from force, proprioceptive, and action histories; replacing the oracle context with this inferred context at test time produces effective within-episode adaptation on real hardware for PegInsert, GearMesh, and NutThread without demonstrations, privileged inputs, or gradient updates.
What carries the argument
The deployable causal Transformer adapter that infers the 6D semantic contact context via semantic regression and force-regime contrastive training.
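This page does not describe the adapter's architecture in detail. The sketch below is a minimal illustration of what "causal" means here: the context inferred at step t depends only on history up to t. It assumes a single self-attention head with fixed, untrained random projections and a linear readout to 6 outputs. A real adapter would add trained weights, positional encodings, residual blocks and several layers. All names, widths and weights are hypothetical.

```python
import math
import random

CONTEXT_DIM = 6  # dimensionality of the semantic contact context


def matvec(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def causal_adapter(history, wq, wk, wv, w_out):
    """Return one inferred 6D context per time step; step t only sees steps 0..t."""
    queries = [matvec(wq, x) for x in history]
    keys = [matvec(wk, x) for x in history]
    values = [matvec(wv, x) for x in history]
    dim = len(queries[0])
    contexts = []
    for t in range(len(history)):
        # Causal mask: attention scores exist only for positions s <= t.
        scores = [sum(q * k for q, k in zip(queries[t], keys[s])) / math.sqrt(dim)
                  for s in range(t + 1)]
        peak = max(scores)
        attn = [math.exp(sc - peak) for sc in scores]  # numerically stable softmax
        total = sum(attn)
        attended = [sum(attn[s] / total * values[s][j] for s in range(t + 1))
                    for j in range(dim)]
        contexts.append(matvec(w_out, attended))  # linear readout to CONTEXT_DIM
    return contexts


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
OBS_DIM, HIDDEN = 9, 8  # hypothetical per-step feature and attention widths
wq, wk, wv = (random_matrix(HIDDEN, OBS_DIM, rng) for _ in range(3))
w_out = random_matrix(CONTEXT_DIM, HIDDEN, rng)

history = [[rng.gauss(0.0, 1.0) for _ in range(OBS_DIM)] for _ in range(5)]
contexts = causal_adapter(history, wq, wk, wv, w_out)

# Perturbing the last observation must not change any earlier inferred context.
perturbed = [row[:] for row in history]
perturbed[-1] = [x + 10.0 for x in perturbed[-1]]
assert causal_adapter(perturbed, wq, wk, wv, w_out)[:-1] == contexts[:-1]
```

The closing assertion is the property that makes the adapter deployable online. It never needs future force or action samples to produce the current context.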
If this is right
- CoRMA retains higher verified real-world success rates on PegInsert, GearMesh, and NutThread under target-pose noise than baselines, which succeed in simulation but degrade on hardware.
- Semantic contact inference functions as a reusable adaptation interface across a family of related assembly tasks.
- Within-episode adaptation occurs on physical robots without any demonstrations, privileged simulator inputs, or online gradient updates.
- The same inferred context supports force-dominant scenarios where raw parameter adaptation from simulators proves insufficient.
Where Pith is reading between the lines
- The same inference interface could be tested on contact-rich tasks outside the current assembly family to check whether the 6D context representation transfers.
- Refining the contrastive objective to better align simulated and real force regimes might reduce the remaining Real2Sim gap noted as future work.
- Combining the inferred context with other meta-learning modules could support limited generalization to tasks outside the trained family.
- The method's reliance on histories alone suggests it may lower the data cost of deploying contact-rich policies by shifting more adaptation burden into the inference model.
Load-bearing premise
The context inferred by the causal Transformer is accurate enough to replace oracle simulator context and still drive successful within-episode adaptation on hardware.
What would settle it
A controlled hardware trial in which CoRMA achieves real success rates no higher than the FORGE baselines under identical target-pose noise would falsify the claim that the inferred context supports effective adaptation.
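One way to make this comparison concrete is a one-sided Fisher exact test on verified success counts. The page does not say which test, if any, the authors use, so this is an assumed choice. The counts in the sketch are placeholders, not results from the paper.

```python
from math import comb


def fisher_one_sided(succ_a, n_a, succ_b, n_b):
    """One-sided Fisher exact p-value for H1: success rate of A exceeds that of B.

    Conditioning on the total number of successes, the success count of A follows
    a hypergeometric distribution under H0; the p-value is P(X >= succ_a).
    """
    total = n_a + n_b
    successes = succ_a + succ_b
    upper = min(successes, n_a)
    tail = sum(comb(successes, x) * comb(total - successes, n_a - x)
               for x in range(succ_a, upper + 1))
    return tail / comb(total, n_a)


# Placeholder counts for illustration only (not results from the paper).
p_value = fisher_one_sided(succ_a=18, n_a=20, succ_b=9, n_b=20)
```

A small p-value is evidence that the CoRMA rate is higher. A large p-value only means the trial did not establish a difference, which is weaker than showing the rates are equal. A falsification attempt should therefore fix the trial count in advance.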
Abstract
We present CoRMA (Contrastive Robotic Motor Adaptation), a context-based meta-adaptation framework that modifies RMA for force-dominant assembly. CoRMA replaces raw simulator-parameter adaptation with a compact 6D simulator-only semantic contact context describing contact onset, lateral engagement, guided transition, contact direction, and jamming. A deployable causal Transformer adapter infers this context online from force, proprioceptive, and action histories using semantic regression and a force-regime contrastive objective. At deployment, oracle context is removed and replaced by the inferred context, enabling within-episode adaptation without demonstrations, privileged inputs, or gradient updates. We evaluate CoRMA on PegInsert, GearMesh, and NutThread in Isaac Lab / Isaac Sim 5.0 and on a real Marvin arm. Compared with FORGE baselines that achieve high simulation success but degrade substantially on hardware, CoRMA retains higher verified real success under controlled target-pose noise. These results support semantic contact inference as a reusable adaptation interface within a related assembly task family, while broader unseen-task generalization and Real2Sim calibration remain future work.
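The deployment step described in the abstract removes the oracle context, substitutes the inferred context, and performs no gradient updates. It can be sketched as a control loop over a rolling history window. The adapter, policy, window length and environment step below are hypothetical stand-ins, not the authors' code. The adapter here is a hand-written function, not a trained Transformer.

```python
from collections import deque

CONTEXT_DIM = 6
WINDOW = 4  # hypothetical history length seen by the adapter


def adapter(history):
    """Stand-in for the trained causal Transformer adapter (hypothetical).

    Maps recent (force, action) history to a 6D context; here only the first
    entry is filled, with the mean measured force, purely for illustration.
    """
    if not history:
        return [0.0] * CONTEXT_DIM
    mean_force = sum(step["force"] for step in history) / len(history)
    return [mean_force] + [0.0] * (CONTEXT_DIM - 1)


def policy(obs, context):
    """Stand-in for the frozen, context-conditioned base policy (hypothetical)."""
    return -0.1 * obs["force"] - 0.05 * context[0]


def run_episode(env_step, obs, steps):
    history = deque(maxlen=WINDOW)  # oldest entries drop out automatically
    actions = []
    for _ in range(steps):
        context = adapter(list(history))  # inferred context replaces the oracle
        action = policy(obs, context)     # forward passes only; no gradient updates
        history.append({"force": obs["force"], "action": action})
        obs = env_step(obs, action)
        actions.append(action)
    return actions, history
```

A bounded `deque` keeps per-step cost constant over long episodes. Adaptation within an episode comes entirely from the changing history, since neither `adapter` nor `policy` is modified during the loop.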
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces CoRMA, a context-based meta-adaptation framework extending RMA for force-dominant assembly tasks. It replaces raw simulator-parameter adaptation with a compact 6D semantic contact context (onset, lateral engagement, guided transition, contact direction, jamming) inferred online by a deployable causal Transformer from force, proprioceptive, and action histories. Training uses semantic regression plus a force-regime contrastive objective; at deployment the oracle context is removed, enabling within-episode adaptation on hardware without demonstrations, privileged inputs, or gradient updates. Experiments on PegInsert, GearMesh, and NutThread in Isaac Lab/Sim and on a real Marvin arm report that CoRMA retains higher verified real success under target-pose noise than FORGE baselines that degrade substantially on hardware.
Significance. If the empirical claims are substantiated, the work would demonstrate a reusable semantic-contact interface for within-family adaptation in contact-rich assembly, reducing reliance on privileged simulator information during real-robot deployment. The contrastive objective for force-regime distinction and the causal Transformer adapter constitute concrete technical contributions that could be adopted in related manipulation pipelines.
major comments (3)
- Abstract: the claim that CoRMA 'retains higher verified real success under controlled target-pose noise' is asserted without any numerical success rates, error bars, statistical tests, ablation results, or data-exclusion criteria. This absence directly weakens the central empirical claim and prevents assessment of effect size or robustness.
- Real-world evaluation (as summarized in the abstract and skeptic note): the central claim requires that the causal Transformer’s inferred 6D semantic contact vector is sufficiently accurate to substitute for oracle simulator context on hardware. No quantitative alignment metric (MSE, classification accuracy, or correlation) between inferred and oracle 6D vectors collected on the physical Marvin arm is reported. Without this evidence it remains possible that observed gains arise from contrastive regularization, base-policy robustness, or the noise schedule rather than faithful semantic inference.
- Method description: the 6D context is presented as a compact, simulator-only semantic representation, yet no explicit mapping from simulator parameters to the five semantic components (onset, lateral engagement, etc.) or any derivation showing compactness is supplied. This leaves open whether the representation is truly parameter-free or merely reparameterized.
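In simulation, where oracle labels exist, the alignment metrics named in the second comment (MSE and correlation) can be computed per context dimension. The sketch below assumes inferred and oracle contexts are stored as equal-length lists of 6D vectors; the function name is hypothetical. It checks simulated alignment only and does not address the hardware gap the comment raises.

```python
import math


def per_dim_alignment(inferred, oracle):
    """Return per-dimension (MSE, Pearson r) between inferred and oracle contexts."""
    n, dims = len(oracle), len(oracle[0])
    results = []
    for j in range(dims):
        x = [row[j] for row in inferred]
        y = [row[j] for row in oracle]
        mse = sum((a - b) ** 2 for a, b in zip(x, y)) / n
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = math.sqrt(sum((a - mx) ** 2 for a in x))
        sy = math.sqrt(sum((b - my) ** 2 for b in y))
        # Correlation is undefined when either series is constant.
        r = cov / (sx * sy) if sx > 0 and sy > 0 else float("nan")
        results.append((mse, r))
    return results
```

Reporting both numbers matters. A constant offset leaves r at 1 while MSE grows, and a scale mismatch can do the reverse.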
minor comments (2)
- Add explicit definitions or a table for the five semantic contact dimensions and their relation to simulator state variables.
- Clarify the precise form of the force-regime contrastive loss (positive/negative pair construction, temperature, etc.) so that the training objective can be reproduced.
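The exact loss is not given on this page. One plausible reading, which is explicitly an assumption, is a supervised-contrastive (SupCon-style) loss. In that form, embeddings that share a force-regime label are positives, all other samples in the batch are negatives, and a temperature `tau` scales the cosine similarities. The regime labels and the default `tau = 0.1` are hypothetical.

```python
import math


def _normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def force_regime_supcon_loss(embeddings, regimes, tau=0.1):
    """Supervised-contrastive loss: same force regime = positive pair (assumed form)."""
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if regimes[j] == regimes[i]]
        if not positives:
            continue  # anchors without a positive contribute nothing
        logits = {j: sum(a * b for a, b in zip(z[i], z[j])) / tau for j in others}
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        # Average of -log softmax probability over this anchor's positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

A reproducible description would need to specify the three items the comment lists: how regime labels are assigned, whether positives may come from other episodes, and the value of `tau`.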
Simulated Author's Rebuttal
We thank the referee for their insightful comments on our work. We address each of the major comments point by point below, providing clarifications and indicating revisions to the manuscript where appropriate.
- Referee: Abstract: the claim that CoRMA 'retains higher verified real success under controlled target-pose noise' is asserted without any numerical success rates, error bars, statistical tests, ablation results, or data-exclusion criteria. This absence directly weakens the central empirical claim and prevents assessment of effect size or robustness.
Authors: We agree that including quantitative support in the abstract would strengthen the presentation. The full experimental results, including success rates, standard deviations, and comparisons to baselines, are provided in Section 5 of the manuscript along with statistical analysis. In the revised manuscript, we have updated the abstract to include key numerical results such as average success rates across tasks and a note on the evaluation protocol used (e.g., number of trials and noise levels). This allows readers to better assess the effect size without needing to refer to the body immediately.
revision: yes
- Referee: Real-world evaluation (as summarized in the abstract and skeptic note): the central claim requires that the causal Transformer’s inferred 6D semantic contact vector is sufficiently accurate to substitute for oracle simulator context on hardware. No quantitative alignment metric (MSE, classification accuracy, or correlation) between inferred and oracle 6D vectors collected on the physical Marvin arm is reported. Without this evidence it remains possible that observed gains arise from contrastive regularization, base-policy robustness, or the noise schedule rather than faithful semantic inference.
Authors: We appreciate this observation. However, the oracle 6D semantic contact context is inherently tied to simulator-specific parameters and states, which cannot be directly measured or computed on the physical robot. As such, collecting paired inferred-oracle vectors on hardware is not possible. We instead demonstrate the effectiveness of the inferred context through improved real-world task success rates compared to baselines that do not use adaptation. To further support this, we have added simulation-based ablations in the revised paper showing strong alignment between inferred and oracle contexts in simulation (with correlation coefficients reported in the supplementary material), and we discuss the sim-to-real transfer assumptions. We believe the performance gains are attributable to the semantic inference, since the contrastive objective and other components are ablated in the experiments.
revision: partial
- Referee: Method description: the 6D context is presented as a compact, simulator-only semantic representation, yet no explicit mapping from simulator parameters to the five semantic components (onset, lateral engagement, etc.) or any derivation showing compactness is supplied. This leaves open whether the representation is truly parameter-free or merely reparameterized.
Authors: We acknowledge that the mapping and compactness analysis were not sufficiently detailed. In the revised manuscript, we have expanded the method section to include an explicit description of how each semantic component is computed from the simulator state and parameters. For example, contact onset is triggered when force exceeds a threshold, lateral engagement is based on lateral displacement, and so on for the other components. Additionally, we provide a comparison of dimensionality, noting that the raw simulator parameters for contact-rich tasks can involve 15 to 20 or more variables, while the 6D context provides a compact encoding focused on the aspects most relevant to adaptation.
revision: yes
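The reply describes only two components, and only qualitatively: onset from a force threshold, and lateral engagement from lateral displacement. The sketch below shows what such simulator-side labeling could look like. It assumes hypothetical thresholds, a hypothetical `SimStep` record, and a normalized engagement score. The remaining components, and how the five named quantities fill a 6D vector, are not specified on this page, so they are omitted.

```python
import math
from dataclasses import dataclass

ONSET_FORCE_N = 2.0       # hypothetical contact-onset force threshold (N)
LATERAL_SCALE_M = 0.002   # hypothetical lateral offset treated as full engagement (m)


@dataclass
class SimStep:
    force: tuple           # simulator contact force (fx, fy, fz), newtons
    lateral_offset: tuple  # (dx, dy) of the held part from the target axis, metres


def label_episode(steps):
    """Return (onset, lateral_engagement) labels per step for a simulated episode."""
    labels, in_contact = [], False
    for step in steps:
        force_mag = math.sqrt(sum(f * f for f in step.force))
        touching = force_mag > ONSET_FORCE_N
        onset = 1.0 if touching and not in_contact else 0.0  # fires once, at first contact
        in_contact = in_contact or touching
        lateral = math.hypot(*step.lateral_offset)
        engagement = min(lateral / LATERAL_SCALE_M, 1.0) if in_contact else 0.0
        labels.append((onset, engagement))
    return labels
```

Labels of this kind read privileged simulator state, such as exact contact forces and the true target axis. This is why, as the reply notes, oracle vectors cannot be reproduced on hardware.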
Circularity Check
No circularity: framework uses standard training objectives without self-referential definitions or fitted predictions.
full rationale
The manuscript describes an empirical meta-adaptation framework that trains a causal Transformer adapter via a force-regime contrastive objective to infer a 6D semantic contact context from histories, then deploys the inferred context in place of oracle simulator values. No equations, derivations, or first-principles results are presented that could reduce to their own inputs by construction. The contrastive objective functions as a conventional training loss rather than a quantity whose value is presupposed by the claimed prediction. Central performance claims rest on hardware experiments versus FORGE baselines, which constitute independent empirical evidence rather than a closed loop of self-citation or renaming. The derivation chain is therefore self-contained.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel. Tag: unclear (relation between the paper passage and the cited Recognition theorem).
CoRMA replaces raw simulator-parameter adaptation with a compact 6D simulator-only semantic contact context describing contact onset, lateral engagement, guided transition, contact direction, and jamming. A deployable causal Transformer adapter infers this context online from force, proprioceptive, and action histories using semantic regression and a force-regime contrastive objective.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1]
- [2]
- [3] B. Tang, I. Akinola, J. Xu, B. Wen, A. Handa, K. V. Wyk, D. Fox, G. S. Sukhatme, F. Ramos, and Y. Narang. Automate: Specialist and generalist assembly policies over diverse geometries,
- [4]
- [5] M. Noseworthy, B. Tang, B. Wen, A. Handa, C. Kessens, N. Roy, D. Fox, F. Ramos, Y. Narang, and I. Akinola. Forge: Force-guided exploration for robust contact-rich manipulation under uncertainty, 2025. URL https://arxiv.org/abs/2408.04587
- [6]
- [7] S. Ross, G. J. Gordon, and J. A. Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning, 2011. URL https://arxiv.org/abs/1011.0686
- [8] S. Schaal. Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences, 3(6):233–242, 1999. ISSN 1364-6613. doi:10.1016/S1364-6613(99)01327-3. URL https://www.sciencedirect.com/science/article/pii/S1364661399013273
- [9] P. Abbeel and A. Y. Ng. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the twenty-first international conference on Machine learning (ICML ’04), pages 1–8. ACM, 2004
- [10] C. Finn, S. Levine, and P. Abbeel. Guided cost learning: Deep inverse optimal control via policy optimization, 2016. URL https://arxiv.org/abs/1603.00448
- [11] J. Ho and S. Ermon. Generative adversarial imitation learning, 2016. URL https://arxiv.org/abs/1606.03476
- [12] T. Johannink, S. Bahl, A. Nair, J. Luo, A. Kumar, M. Loskyll, J. A. Ojea, E. Solowjow, and S. Levine. Residual reinforcement learning for robot control, 2018. URL https://arxiv.org/abs/1812.03201
- [13] T. Silver, K. Allen, J. Tenenbaum, and L. Kaelbling. Residual policy learning, 2019. URL https://arxiv.org/abs/1812.06298
- [14]
- [15] T. Salloom, X. Yu, W. He, and O. Kaynak. Adaptive neural network control of underwater robotic manipulators tuned by a genetic algorithm. Journal of Intelligent & Robotic Systems, 97(3–4):657–672, 2020. doi:10.1007/s10846-019-01008-y. URL https://doi.org/10.1007/s10846-019-01008-y
- [16] S. Levine, P. Pastor, A. Krizhevsky, and D. Quillen. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection, 2016. URL https://arxiv.org/abs/1603.02199
- [17] K. Rakelly, A. Zhou, D. Quillen, C. Finn, and S. Levine. Efficient off-policy meta-reinforcement learning via probabilistic context variables, 2019. URL https://arxiv.org/abs/1903.08254
- [18] C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks, 2017. URL https://arxiv.org/abs/1703.03400
- [19]
- [20] URL https://arxiv.org/abs/2107.04034
- [21]
- [22] B. Eysenbach, T. Zhang, R. Salakhutdinov, and S. Levine. Contrastive learning as goal-conditioned reinforcement learning, 2023. URL https://arxiv.org/abs/2206.07568
- [23] Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, and P. Abbeel. Rl2: Fast reinforcement learning via slow reinforcement learning, 2016. URL https://arxiv.org/abs/1611.02779
- [24]
- [25] L. Zintgraf, K. Shiarlis, M. Igl, S. Schulze, Y. Gal, K. Hofmann, and S. Whiteson. Varibad: A very good method for bayes-adaptive deep rl via meta-learning, 2020. URL https://arxiv.org/abs/1910.08348
- [26]
- [27] E. Parisotto, H. F. Song, J. W. Rae, R. Pascanu, C. Gulcehre, S. M. Jayakumar, M. Jaderberg, R. L. Kaufman, A. Clark, S. Noury, M. M. Botvinick, N. Heess, and R. Hadsell. Stabilizing transformers for reinforcement learning, 2019. URL https://arxiv.org/abs/1910.06764
- [28] A. Srinivas, M. Laskin, and P. Abbeel. Curl: Contrastive unsupervised representations for reinforcement learning, 2020. URL https://arxiv.org/abs/2004.04136
- [29] Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning. NVIDIA, M. Mittal, P. Roth, J. Tigue, A. Richard, O. Zhang, P. Du, A. Serrano-Muñoz, X. Yao, R. Zurbrügg, N. Rudin, L. Wawrzyniak, M. Rakhsha, A. Denzler, E. Heiden, A. Borovicka, O. Ahmed, I. Akinola, A. Anwar, M. T. Carlson, J. Y. Feng, A. Garg, R. Gasoto, L. Gulich, Y. Guo, M. Gussert, A. Hansen, M. Kulkarni, C. Li, W. Liu, V. Makoviychuk, G. Mal...
- [30] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms, 2017. URL https://arxiv.org/abs/1707.06347
- [31] P. Beeson and B. Ames. TRAC-IK: An open-source library for improved solving of generic inverse kinematics. In 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids), pages 928–935, Seoul, South Korea, 2015. IEEE. doi:10.1109/HUMANOIDS.2015.7363472.
Table 3: Real-robot verified-success accounting. Real success is measured by the final ...