Real-IKEA: Physical Fidelity is the Prerequisite for Robust Manipulation

Fan Shi; Kunqi Xu; Siyuan Luo; Zhenhao Huang; Ziqiu Zeng

arxiv: 2606.08564 · v1 · pith:JZYTRF3Jnew · submitted 2026-06-07 · 💻 cs.RO

Pith reviewed 2026-06-27 18:17 UTC · model grok-4.3

keywords: articulated manipulation · physical fidelity · reinforcement learning · simulation assets · IKEA objects · contact dynamics · robust policies · mechanical advantage

The pith

High-fidelity simulation of real IKEA handles lets reinforcement-learning policies discover robust mechanical-advantage strategies instead of fragile friction pulling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that simplified simulation assets cause policies to rely on unreliable friction contact, while assets built from real parts with measured resistance and accurate geometry allow policies to find hooking and levering motions that use leverage. The authors build this claim by releasing 1,079 calibrated articulated configurations derived from 83 physical IKEA components through a six-step measurement and meshing workflow. They then train RL agents on both high- and low-fidelity versions and observe qualitatively different behaviors that favor mechanical advantage only when the physics is realistic. A sympathetic reader cares because the physics gap is a common reason sim-trained robots fail on contact-rich tasks once deployed.

Core claim

Real-IKEA demonstrates that physical fidelity in articulated assets is a prerequisite for robust manipulation policies: when damping, friction, and collision geometry are calibrated to real hardware, RL agents learn hooking and levering behaviors that exploit mechanical advantage rather than depending on fragile friction-based pulling.

What carries the argument

The Real-IKEA dataset of 1,079 articulated asset configurations produced from 83 real IKEA handles and knobs via a six-step physical workflow that includes bidirectional surface-deviation collision-mesh validation and resistance-calibrated damping and friction parameters.

If this is right

High-fidelity assets shift learned policies from friction-dependent to leverage-dependent strategies.
Low-fidelity assets produce policies that remain brittle under real contact resistance.
Accurate collision-mesh deviation metrics and resistance calibration are necessary to expose these strategy differences.
Articulated manipulation benchmarks must incorporate measured physical parameters to support transferable robustness claims.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Future sim-to-real pipelines may need to prioritize contact-parameter calibration over visual fidelity alone.
The same fidelity requirement could apply to other contact-rich tasks such as door opening or drawer pulling beyond IKEA parts.
Policy comparison studies that omit fidelity controls may systematically underestimate the value of mechanical-advantage behaviors.

Load-bearing premise

The simulation dynamics produced by the six-step workflow and calibrated resistance values are close enough to real articulated contact that the observed policy differences will appear on physical hardware.

What would settle it

Train the same RL policies on the high-fidelity and low-fidelity versions, deploy both on physical IKEA hardware, and check whether the high-fidelity policy consistently succeeds with hooking or levering while the low-fidelity policy fails or reverts to slipping friction pulls.
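
One way to make such a comparison quantitative is to put a confidence interval on each policy's success rate. The sketch below uses a Wilson score interval; the function names and the trial counts are hypothetical and are not taken from the paper.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate (z = 1.96 is roughly 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def intervals_separated(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when two (low, high) intervals do not overlap."""
    return a[1] < b[0] or b[1] < a[0]


# Hypothetical hardware trial counts, for illustration only.
high_fidelity = wilson_interval(successes=18, trials=20)
low_fidelity = wilson_interval(successes=6, trials=20)
print(high_fidelity, low_fidelity, intervals_separated(high_fidelity, low_fidelity))
```

Non-overlapping intervals are a conservative signal that the two policies differ; a formal test on the paired trials would be a natural next step.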

Figures

Figures reproduced from arXiv: 2606.08564 by Fan Shi, Kunqi Xu, Siyuan Luo, Zhenhao Huang, Ziqiu Zeng.

**Figure 1.** Overview of the Real-IKEA. Real-IKEA provides a large-scale, high-fidelity library of articulated configurations as the foundation for learning robust, contact-rich manipulation policies.

**Figure 2.** Evaluation of physical interaction fidelity. We visu...

**Figure 3.** Heatmap visualization of collision errors. Left to...

**Figure 5.** Overview of the RL Policy. The agent observes a...

**Figure 6.** Emergence of form-closure strategies. Trained on...

**Figure 7.** The diversity of interactive components in Real-IKEA, categorized into knobs, two-point handles, and finger-pull...

**Figure 8.** Four characteristic failure modes observed when...

**Figure 9.** Mesh statistics of the Real-IKEA interactive parts. Higher vertex counts and smaller face areas provide the high...

**Figure 10.** Real-IKEA base assets encompass typical joint...

**Figure 11.** Heatmap visualization of physical interaction fidelity...

**Figure 12.** Checkpoint evaluation curves across three distinct...

**Figure 13.** Learning curves for the RL policy trained via PPO...

Abstract

Robotic manipulation robustness often founders on the physics gap between simplified simulations and the resistance-laden real world. In this work, we emphasize that physical realism in articulated interaction is an important ingredient for robust policy learning. We present Real-IKEA, a dataset and simulation framework designed with physical accuracy as a first-class goal. Real-IKEA provides 1,079 articulated asset configurations, derived from 83 authentic IKEA handles and knobs processed through a meticulous six-step physical workflow. For contact-geometry accuracy, we introduce a bidirectional surface-deviation metric to quantify collision meshes. For dynamics realism, we establish resistance-calibrated configurations that vary damping and friction. Crucially, we demonstrate through a Reinforcement Learning (RL) policy that high-fidelity assets enable the discovery of robust "hooking" and "levering" strategies that prioritize mechanical advantage over fragile friction-pulling. Together, these results position Real-IKEA as a critical benchmark for developing manipulation policies capable of human-level robustness in articulated object tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Real-IKEA builds a careful dataset from real IKEA parts with documented calibration steps, but the headline claim about discovering robust strategies rests on a simulation demo that is reported without numbers and with no hardware transfer.

The main thing here is a new collection of 83 real IKEA handles and knobs turned into 1,079 simulation assets through a six-step physical workflow, plus a bidirectional surface-deviation metric for mesh accuracy and resistance-calibrated damping and friction values. That construction process looks more thorough than most sim datasets I've seen for articulated objects.

What stands out is the attention to contact geometry and dynamics parameters drawn from physical measurements. The RL experiment is presented as showing that these assets lead policies toward hooking and levering instead of friction-based pulls, which is a plausible outcome if the simulation is closer to reality.

The soft spot is that none of this is backed by numbers in the abstract: no success rates, no baselines, no error bars, and crucially no real-robot deployment or even torque measurements to check if the calibrated parameters actually match hardware. The title states physical fidelity is a prerequisite for robust manipulation, yet the evidence stays inside simulation. Without a transfer trial, the policy difference could be an artifact of the sim rather than a predictor of real behavior.

This is the kind of work that matters for groups building sim-to-real benchmarks in manipulation. A reader working on articulated object policies would get value from the assets and the workflow description, even if they treat the RL result as preliminary. The citation pattern seems standard for the area.

I would send it to peer review. The dataset effort is solid enough to warrant referee time, but the authors should expect questions on validation and probably need to add at least one hardware experiment or tone down the robustness claim.

Referee Report

2 major / 1 minor

Summary. The paper presents Real-IKEA, a dataset of 1,079 articulated asset configurations derived from 83 IKEA parts via a six-step physical workflow. It introduces a bidirectional surface-deviation metric for collision-mesh accuracy and resistance-calibrated damping/friction values for dynamics. The central claim is that RL policies trained on these high-fidelity assets discover robust 'hooking' and 'levering' strategies that exploit mechanical advantage, unlike fragile friction-pulling approaches, positioning the dataset as a benchmark for human-level robust manipulation.

Significance. If the simulation-to-reality gap is closed, the dataset and workflow would provide a useful benchmark for articulated-object manipulation, with the bidirectional metric and calibrated parameters offering concrete tools for improving contact and dynamics fidelity in simulation.

major comments (2)

[Abstract] The claim that high-fidelity assets enable discovery of robust hooking/levering strategies provides no quantitative metrics, baselines, error bars, success rates, or statistical comparisons between high- and low-fidelity conditions; the demonstration is described only qualitatively.
[Abstract, title] The assertion that physical fidelity is a prerequisite for robust manipulation (implying real-world applicability) is unsupported because no sim-to-real transfer experiments, real-robot deployments, or hardware torque/force validation of the calibrated parameters are reported; all results remain inside simulation.

minor comments (1)

[Methods] The six-step workflow is outlined but lacks explicit numerical values or validation data for the resistance-calibrated damping and friction parameters.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript accordingly.

Referee: [Abstract] The claim that high-fidelity assets enable discovery of robust hooking/levering strategies provides no quantitative metrics, baselines, error bars, success rates, or statistical comparisons between high- and low-fidelity conditions; the demonstration is described only qualitatively.

Authors: We agree that the abstract would be strengthened by quantitative support. The manuscript body contains RL experiments demonstrating the effect; we will revise the abstract to report key success rates, baselines, error bars, and statistical comparisons between high- and low-fidelity conditions.

Revision: yes

Referee: [Abstract, title] The assertion that physical fidelity is a prerequisite for robust manipulation (implying real-world applicability) is unsupported because no sim-to-real transfer experiments, real-robot deployments, or hardware torque/force validation of the calibrated parameters are reported; all results remain inside simulation.

Authors: We acknowledge that all experiments are simulation-only and no sim-to-real transfer or hardware validation is provided. The work positions physical fidelity as a prerequisite for discovering robust strategies inside simulation. We will revise the abstract and title to explicitly limit the scope to simulation-based policy learning and to frame Real-IKEA as a benchmark for simulation fidelity rather than claiming direct real-world applicability.

Revision: yes

Circularity Check

0 steps flagged

No circularity; empirical dataset construction and simulation demonstration

The paper presents a six-step workflow for creating 1,079 articulated assets from IKEA parts, a bidirectional surface-deviation metric, resistance-calibrated damping/friction, and RL policy experiments showing strategy differences in simulation. No load-bearing derivation, equation, or prediction is claimed that reduces to its own inputs by construction. No self-citations are invoked as uniqueness theorems or ansatzes. The work is self-contained as an empirical contribution within simulation; the central claim rests on observed policy outcomes rather than tautological fitting or renaming.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the assumption that the six-step workflow produces dynamics close enough to reality for policy differences to matter; no free parameters, axioms, or invented entities are explicitly introduced beyond standard simulation parameters.

pith-pipeline@v0.9.1-grok · 5711 in / 1152 out tokens · 14858 ms · 2026-06-27T18:17:10.068146+00:00 · methodology

Reference graph

Works this paper leans on

35 extracted references · 15 canonical work pages · 3 internal anchors

[1] K. Mo, S. Zhu, A. X. Chang, L. Yi, S. Tripathi, L. J. Guibas, and H. Su, “Partnet: A large-scale benchmark for fine-grained and hierarchical part-level 3d object understanding,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 909–918.
[2] F. Xiang, Y. Qin, K. Mo, Y. Xia, H. Zhu, F. Liu, M. Liu, H. Jiang, Y. Yuan, H. Wang, et al., “Sapien: A simulated part-based interactive environment,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11097–11107.
[3] J. Gu, F. Xiang, X. Li, Z. Ling, X. Liu, T. Mu, Y. Tang, S. Tao, X. Wei, Y. Yao, X. Yuan, P. Xie, Z. Huang, R. Chen, and H. Su, “ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills,” in International Conference on Learning Representations (ICLR), 2023.
[4] Y. Li, X. Zhang, R. Wu, Z. Zhang, Y. Geng, H. Dong, and Z. He, “Unidoormanip: Learning universal door manipulation policy over large-scale and diverse door manipulation environments,” arXiv preprint arXiv:2403.02604, 2024.
[5] Y. Wang, Z. Wang, M. Nakura, P. Bhowal, C.-L. Kuo, Y.-T. Chen, Z. Erickson, and D. Held, “Articubot: Learning universal articulated object manipulation policy via large scale simulation,” 2025. [Online]. Available: https://arxiv.org/abs/2503.03045
[6] J. Ye, K. Wang, C. Yuan, R. Yang, Y. Li, J. Zhu, Y. Qin, X. Zou, and X. Wang, “Dex1b: Learning with 1b demonstrations for dexterous manipulation,” 2025. [Online]. Available: https://arxiv.org/abs/2506.17198
[7] N. Rudin, D. Hoeller, P. Reist, and M. Hutter, “Learning to walk in minutes using massively parallel deep reinforcement learning,” 2022. [Online]. Available: https://arxiv.org/abs/2109.11978
[8] C. Sferrazza, D.-M. Huang, X. Lin, Y. Lee, and P. Abbeel, “Humanoidbench: Simulated humanoid benchmark for whole-body locomotion and manipulation,” 2024. [Online]. Available: https://arxiv.org/abs/2403.10506
[9] J. Tan, T. Zhang, E. Coumans, A. Iscen, Y. Bai, D. Hafner, S. Bohez, and V. Vanhoucke, “Sim-to-real: Learning agile locomotion for quadruped robots,” 2018. [Online]. Available: https://arxiv.org/abs/1804.10332
[10] D. Blanco-Mulero, O. Barbany, G. Alcan, A. Colomé, C. Torras, and V. Kyrki, “Benchmarking the sim-to-real gap in cloth manipulation,” [Online]. Available: https://arxiv.org/abs/2310.09543
[12] T. Jaunet, G. Bono, R. Vuillemot, and C. Wolf, “Sim2realviz: Visualizing the sim2real gap in robot ego-pose estimation,” 2021. [Online]. Available: https://arxiv.org/abs/2109.11801
[13] Y. Yardi, S. Biruduganti, and L. Ankile, “Bridging the sim2real gap: Vision encoder pre-training for visuomotor policy transfer,” 2025. [Online]. Available: https://arxiv.org/abs/2501.16389
[14] C. Li, R. Zhang, J. Wong, C. Gokmen, S. Srivastava, R. Martín-Martín, C. Wang, G. Levine, W. Ai, B. Martinez, H. Yin, M. Lingelbach, M. Hwang, A. Hiranaka, S. Garlanka, A. Aydin, S. Lee, J. Sun, M. Anvari, M. Sharma, D. Bansal, S. Hunter, K.-Y. Kim, A. Lou, C. R. Matthews, I. Villa-Renteria, J. H. Tang, C. Tang, F. Xia, Y. Li, S. Savarese, H. Gweon, ... “BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation,” arXiv, 2024.
[15] S. Srivastava, C. Li, M. Lingelbach, R. Martín-Martín, F. Xia, K. Vainio, Z. Lian, C. Gokmen, S. Buch, C. K. Liu, S. Savarese, H. Gweon, J. Wu, and L. Fei-Fei, “Behavior: Benchmark for everyday household activities in virtual, interactive, and ecological environments,” 2021. [Online]. Available: https://arxiv.org/abs/2108.03332
[16] Y. Wang, X. Zhang, R. Wu, Y. Li, Y. Shen, M. Wu, Z. He, Y. Wang, and H. Dong, “Adamanip: Adaptive articulated object manipulation environments and policy learning,” 2025. [Online]. Available: https://arxiv.org/abs/2502.11124
[17] S. Tao, F. Xiang, A. Shukla, Y. Qin, X. Hinrichsen, X. Yuan, C. Bao, X. Lin, Y. Liu, T.-K. Chan, Y. Gao, X. Li, T. Mu, N. Xiao, A. Gurha, V. N, Y. Choi, Y.-R. Chen, Z. Huang, R. Calandra, R. Chen, S. Luo, and H. Su, “ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI,” in RSS 2025, 2025.
[18] E. Todorov, T. Erez, and Y. Tassa, “Mujoco: A physics engine for model-based control,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 5026–5033.
[19] V. Makoviychuk, L. Wawrzyniak, Y. Guo, M. Lu, K. Storey, M. Macklin, D. Hoeller, N. Rudin, A. Allshire, A. Handa, and G. State, “Isaac gym: High performance gpu-based physics simulation for robot learning,” 2021.
[20] M. Mittal, C. Yu, Q. Yu, J. Liu, N. Rudin, D. Hoeller, J. L. Yuan, R. Singh, Y. Guo, H. Mazhar, A. Mandlekar, B. Babich, G. State, M. Hutter, and A. Garg, “Orbit: A unified simulation framework for interactive robot learning environments,” IEEE Robotics and Automation Letters, vol. 8, no. 6, pp. 3740–3747, 2023.
[21] X. Wei, M. Liu, Z. Ling, and H. Su, “Approximate convex decomposition for 3d meshes with collision-aware concavity and tree search,” ACM Transactions on Graphics, vol. 41, no. 4, pp. 1–18, July. [Online]. Available: http://dx.doi.org/10.1145/3528223.3530103
[23] A. Murali, B. Sundaralingam, Y.-W. Chao, J. Yamada, W. Yuan, M. Carlson, F. Ramos, S. Birchfield, D. Fox, and C. Eppner, “Graspgen: A diffusion-based framework for 6-dof grasping with on-generator training,” arXiv preprint arXiv:2507.13097, 2025. [Online]. Available: https://arxiv.org/abs/2507.13097
[24] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” 2017. [Online]. Available: https://arxiv.org/abs/1707.06347

APPENDIX A. Real-IKEA Dataset Construction and Physical Modeling Details

While the main text focuses on the emergence of robust manipulation policies, this appendix provides extended detai...
[25] The Four Characteristic Failure Modes in Manipulation: The primary motivation for developing Real-IKEA stems from observing how conventional, friction-dominated policies fail when deployed on real articulated objects. As illustrated in Figure 8, these failures typically manifest in four characteristic modes: i) Slip: When joint resistance exceeds the frict...
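
The slip mode can be illustrated with a textbook Coulomb friction bound. This is a minimal sketch under stated assumptions, not the paper's model: the function names, the two-pad contact count, and the numbers are all hypothetical.

```python
def max_friction_pull(mu: float, normal_force: float, contacts: int = 2) -> float:
    """Coulomb upper bound on the pull force a friction grasp can transmit.

    Assumes each of `contacts` finger pads presses with `normal_force` (N)
    and shares the same friction coefficient `mu`.
    """
    if mu < 0 or normal_force < 0 or contacts < 1:
        raise ValueError("mu and normal_force must be non-negative, contacts >= 1")
    return contacts * mu * normal_force


def friction_pull_slips(joint_resistance: float, mu: float, normal_force: float) -> bool:
    """True when the joint resists harder than friction alone can pull."""
    return joint_resistance > max_friction_pull(mu, normal_force)


# Hypothetical numbers: mu = 0.4 and a 20 N squeeze per finger pad.
print(friction_pull_slips(joint_resistance=25.0, mu=0.4, normal_force=20.0))
print(friction_pull_slips(joint_resistance=10.0, mu=0.4, normal_force=20.0))
```

A hooking or levering contact is not limited by this bound, because the pull is carried by the contact normal rather than by friction.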
[26]

swelling

Dataset Construction Workflow: Unlike existing datasets that often rely on synthetically generated or simplified CAD models [2], all interactive assets in Real-IKEA are derived from authentic IKEA products. Constructing a single ready-to-use, high-fidelity articulated asset requires a meticulous six-step processing workflow: i) Component Segmentation and F...
[27] Interactive Component Diversity: Leveraging IKEA’s modular design philosophy, we combinatorially paired base cabinet units with a curated library of 83 authentic handles and knobs. As shown in Figure 7, we categorize these interactive parts into three main types based on their geometric affordances:
• Knobs: Require wrist rotation to wedge the gripper agai...
[28] Extended Collision Accuracy Visualization: To visually confirm the quality of our COACD processing (Step 4 of our workflow), we provide a heatmap visualization in Figure 11. The metric $H_{Q \to P}$ represents the outward deviation from the collision shell to the visual shell. The results show that the collision models for knobs are highly precise. For complex hand...
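
The exact definition of the bidirectional metric is not reproduced on this page. As a rough sketch, one can assume a Hausdorff-style measure over points sampled from each surface, with P the visual shell and Q the collision shell; the function names and toy points below are hypothetical.

```python
import math

Point = tuple[float, float, float]


def one_sided_deviation(source: list[Point], target: list[Point]) -> float:
    """Largest distance from any source sample to its nearest target sample."""
    if not source or not target:
        raise ValueError("both point sets must be non-empty")
    return max(min(math.dist(s, t) for t in target) for s in source)


def bidirectional_deviation(visual: list[Point], collision: list[Point]) -> tuple[float, float]:
    """Return (H_P_to_Q, H_Q_to_P) with P = visual samples, Q = collision samples."""
    return one_sided_deviation(visual, collision), one_sided_deviation(collision, visual)


# Toy data: the collision shell matches the visual shell except for one
# sample that bulges 0.1 units outward.
visual = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
collision = visual + [(0.0, 0.0, 0.1)]
h_pq, h_qp = bidirectional_deviation(visual, collision)
print(h_pq, h_qp)
```

Reporting both directions separates a collision shell that bulges past the visual surface from one that fails to cover it. The brute-force search is quadratic, so dense meshes would need a spatial index.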
[29] Simulation and Control Rate: The simulator integrates rigid-body dynamics with a fixed physics timestep $\Delta t_{\mathrm{sim}} = 0.002\,\mathrm{s}$. A control decimation of $N_{\mathrm{dec}} = 50$ yields a policy command interval

$$\Delta t = N_{\mathrm{dec}}\,\Delta t_{\mathrm{sim}} = 0.1\,\mathrm{s}, \tag{1}$$

corresponding to a 10 Hz low-level command rate. Episode duration is bounded by a maximum horizon (fixed wall-clock length in simulation time), afte...
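
The decimation arithmetic can be checked in a few lines. The timestep and decimation values come from the text; the `run_control_step` helper and the toy integrator standing in for a physics engine are hypothetical.

```python
def policy_interval(dt_sim: float, n_dec: int) -> float:
    """Policy command interval: n_dec physics steps of length dt_sim."""
    if dt_sim <= 0 or n_dec < 1:
        raise ValueError("dt_sim must be positive and n_dec at least 1")
    return n_dec * dt_sim


DT_SIM = 0.002  # physics timestep in seconds (from the text)
N_DEC = 50      # control decimation (from the text)

dt = policy_interval(DT_SIM, N_DEC)
command_rate_hz = 1.0 / dt


def run_control_step(state: float, action: float, physics_step) -> float:
    """Hold one policy action fixed across N_DEC physics substeps."""
    for _ in range(N_DEC):
        state = physics_step(state, action, DT_SIM)
    return state


# Hypothetical stand-in for a physics engine: integrate a commanded velocity.
position = run_control_step(0.0, 0.5, lambda x, v, h: x + v * h)
print(dt, command_rate_hz, position)
```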
[30] Privileged State: The policy observes low-dimensional features sufficient to specify the manipulation geometry: an end-effector–centric pose summary, the handle position in world coordinates, the drawer joint displacement, a gripper aperture–based closure indicator, the previous action (for temporal context), and a four-dimensional stage indicator aligned ...
[31] Action Parameterization and Kinematics: The policy outputs a five-dimensional continuous action each control step, clipped to $[-1, 1]$ before execution. Four dimensions specify an incremental end-effector motion relative to the current configuration; one dimension commands the parallel gripper. Let $(a_0, a_1, a_2, a_3)$ denote the clipped arm command. These are scal...
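
A minimal sketch of the clipping and splitting step follows. The scaling rule is truncated in the text above, so `arm_scale` is a hypothetical placeholder, and treating the last dimension as the gripper command is an assumption.

```python
def split_action(action: list[float], arm_scale: float = 0.02) -> tuple[list[float], float]:
    """Clip a 5-D action to [-1, 1] and split it into arm and gripper parts.

    `arm_scale` is a hypothetical per-step scale for the four incremental
    end-effector dimensions (a_0..a_3); index 4 is assumed to be the gripper.
    """
    if len(action) != 5:
        raise ValueError("expected a five-dimensional action")
    clipped = [max(-1.0, min(1.0, a)) for a in action]
    arm_delta = [arm_scale * a for a in clipped[:4]]
    gripper_command = clipped[4]
    return arm_delta, gripper_command


# Out-of-range entries are clipped before any scaling is applied.
arm_delta, gripper_command = split_action([2.0, -3.0, 0.5, 0.0, -1.5])
print(arm_delta, gripper_command)
```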
[32]

stay open

Staged Reward: Geometry, Gates, and Monotonic Baselines: Rewards are decomposed into four stages that encourage a coarse ordering: approach a pre-grasp region, refine approach to the handle neighborhood, adopt a grasp-ready closure, then open the drawer by increasing the slide displacement. Let $c$ be the midpoint between the fingertips and $h$ the handle positio...
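
The four-stage ordering can be sketched as distance gates on the fingertip midpoint c and handle position h. The gate radii, the reward shape, and all names below are hypothetical; the reward terms themselves are truncated in the text above.

```python
import math

Vec3 = tuple[float, float, float]

# Hypothetical gate radii in metres; the source does not give these values.
PRE_GRASP_RADIUS = 0.10
HANDLE_RADIUS = 0.02


def stage_index(c: Vec3, h: Vec3, gripper_closed: bool) -> int:
    """0: approach pre-grasp, 1: refine to handle, 2: adopt closure, 3: open drawer."""
    d = math.dist(c, h)
    if d > PRE_GRASP_RADIUS:
        return 0
    if d > HANDLE_RADIUS:
        return 1
    return 3 if gripper_closed else 2


def stage_one_hot(stage: int) -> list[float]:
    """Four-dimensional stage indicator, as listed among the observations."""
    return [1.0 if i == stage else 0.0 for i in range(4)]


def staged_reward(c: Vec3, h: Vec3, gripper_closed: bool, slide_displacement: float) -> float:
    """Stage bonus plus a stage-specific shaping term (illustrative only)."""
    stage = stage_index(c, h, gripper_closed)
    if stage < 2:
        shaping = -math.dist(c, h)      # closer is better while approaching
    elif stage == 2:
        shaping = 0.0                   # waiting for a grasp-ready closure
    else:
        shaping = slide_displacement    # reward drawer opening
    return float(stage) + shaping


print(staged_reward((0.01, 0.0, 0.0), (0.0, 0.0, 0.0), True, 0.2))
```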
[33] Auxiliary Terms and Terminations: A small penalty discourages jerk; inverse-kinematics failures incur a per-step cost. Episodes may also terminate on timeout, prolonged kinematic infeasibility, or stagnation under commanded motion (task-specific definitions).

Fig. 11: Heatmap visualization of physical interaction fidelity ($H_{Q \to P}$) for Real-IKEA components....
[34] Domain Randomization and Two-Phase Training: Training proceeds in two phases. First, the handle and robot base poses are fixed at reset to reduce variance while acquiring a feasible skill. Second, per-episode pose perturbations are applied to the handle and base within bounded ranges to improve robustness. Specifically, the switch to the second phase oc...
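
The two-phase reset schedule might look like the sketch below. The switch criterion is truncated in the text above, so an iteration threshold is assumed here; `switch_iteration`, `max_offset`, and the function name are hypothetical.

```python
import random

Offset = tuple[float, float, float]


def sample_pose_offset(iteration: int, switch_iteration: int,
                       max_offset: float, rng: random.Random) -> Offset:
    """Pose perturbation applied at reset.

    Phase one (before the assumed switch): poses stay fixed, so the offset is zero.
    Phase two: each axis is perturbed uniformly within [-max_offset, max_offset].
    """
    if iteration < switch_iteration:
        return (0.0, 0.0, 0.0)
    return (
        rng.uniform(-max_offset, max_offset),
        rng.uniform(-max_offset, max_offset),
        rng.uniform(-max_offset, max_offset),
    )


rng = random.Random(0)  # seeded for repeatable sampling
phase_one = sample_pose_offset(100, switch_iteration=500, max_offset=0.05, rng=rng)
phase_two = sample_pose_offset(800, switch_iteration=500, max_offset=0.05, rng=rng)
print(phase_one, phase_two)
```

The same helper would be called once for the handle and once for the robot base at every episode reset.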
[35] Policy Selection and Robustness Evaluation: Due to the highly dynamic nature of reinforcement learning and the aggressive domain randomization applied during the second phase of training, the policy’s performance can fluctuate between iterations. To select the most robust policy for deployment and downstream distillation, we systematically evaluated interm...
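
Checkpoint selection of this kind can be sketched as a ranking over per-condition success rates. The selection rule (worst case first, mean as tie-breaker) and the evaluation numbers are hypothetical; the paper's actual criterion is truncated in the text above.

```python
def select_checkpoint(success: dict[int, list[float]]) -> int:
    """Pick the checkpoint whose worst-condition success rate is highest.

    `success` maps a training iteration to success rates measured under
    several evaluation conditions; ties are broken by the mean rate.
    """
    if not success or any(not rates for rates in success.values()):
        raise ValueError("every checkpoint needs at least one evaluation")

    def score(iteration: int) -> tuple[float, float]:
        rates = success[iteration]
        return min(rates), sum(rates) / len(rates)

    return max(success, key=score)


# Hypothetical evaluation results for three checkpoints under three conditions.
evaluations = {
    1000: [0.60, 0.55, 0.40],
    2000: [0.90, 0.85, 0.50],
    3000: [0.80, 0.80, 0.75],
}
best = select_checkpoint(evaluations)
print(best)
```

Ranking by the worst case favors a checkpoint that is consistently good over one with a higher peak but a weak condition.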

[1] [1]

Partnet: A large-scale benchmark for fine-grained and hierarchical part-level 3d object understanding,

K. Mo, S. Zhu, A. X. Chang, L. Yi, S. Tripathi, L. J. Guibas, and H. Su, “Partnet: A large-scale benchmark for fine-grained and hierarchical part-level 3d object understanding,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 909– 918

2019

[2] [2]

Sapien: A simulated part-based interactive environment,

F. Xiang, Y . Qin, K. Mo, Y . Xia, H. Zhu, F. Liu, M. Liu, H. Jiang, Y . Yuan, H. Wang,et al., “Sapien: A simulated part-based interactive environment,” inProceedings of the IEEE/CVF conference on com- puter vision and pattern recognition, 2020, pp. 11 097–11 107

2020

[3] [3]

ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills,

J. Gu, F. Xiang, X. Li, Z. Ling, X. Liu, T. Mu, Y . Tang, S. Tao, X. Wei, Y . Yao, X. Yuan, P. Xie, Z. Huang, R. Chen, and H. Su, “ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills,” in International Conference on Learning Representations (ICLR), 2023

2023

[4] [4]

Unidoormanip: Learning universal door manipulation policy over large-scale and diverse door manipulation environments,

Y . Li, X. Zhang, R. Wu, Z. Zhang, Y . Geng, H. Dong, and Z. He, “Unidoormanip: Learning universal door manipulation policy over large-scale and diverse door manipulation environments,”arXiv preprint arXiv:2403.02604, 2024

work page arXiv 2024

[5] [5]

Articubot: Learning universal articulated object manipulation policy via large scale simulation

Y . Wang, Z. Wang, M. Nakura, P. Bhowal, C.-L. Kuo, Y .-T. Chen, Z. Erickson, and D. Held, “Articubot: Learning universal articulated object manipulation policy via large scale simulation,” 2025. [Online]. Available: https://arxiv.org/abs/2503.03045

work page arXiv 2025

[6] [6]

Dex1b: Learning with 1b demonstrations for dexterous manipulation,

J. Ye, K. Wang, C. Yuan, R. Yang, Y . Li, J. Zhu, Y . Qin, X. Zou, and X. Wang, “Dex1b: Learning with 1b demonstrations for dexterous manipulation,” 2025. [Online]. Available: https://arxiv.org/abs/2506.17198

work page arXiv 2025

[7] [7]

Learning to walk in minutes using massively parallel deep reinforcement learning,

N. Rudin, D. Hoeller, P. Reist, and M. Hutter, “Learning to walk in minutes using massively parallel deep reinforcement learning,” 2022. [Online]. Available: https://arxiv.org/abs/2109.11978

work page arXiv 2022

[8] [8]

Humanoidbench: Simulated humanoid benchmark for whole- body locomotion and manipulation,

C. Sferrazza, D.-M. Huang, X. Lin, Y . Lee, and P. Abbeel, “Humanoidbench: Simulated humanoid benchmark for whole- body locomotion and manipulation,” 2024. [Online]. Available: https://arxiv.org/abs/2403.10506

work page arXiv 2024

[9] [9]

Sim-to-Real: Learning Agile Locomotion For Quadruped Robots

J. Tan, T. Zhang, E. Coumans, A. Iscen, Y . Bai, D. Hafner, S. Bohez, and V . Vanhoucke, “Sim-to-real: Learning agile locomotion for quadruped robots,” 2018. [Online]. Available: https://arxiv.org/abs/1804.10332

work page internal anchor Pith review Pith/arXiv arXiv 2018

[10] [10]

Benchmarking the sim-to-real gap in cloth manipulation,

D. Blanco-Mulero, O. Barbany, G. Alcan, A. Colom ´e, C. Torras, and V . Kyrki, “Benchmarking the sim-to-real gap in cloth manipulation,”

[11] [11]

Available: https://arxiv.org/abs/2310.09543

[Online]. Available: https://arxiv.org/abs/2310.09543

work page arXiv

[12] [12]

Sim2realviz: Visualizing the sim2real gap in robot ego-pose estimation,

T. Jaunet, G. Bono, R. Vuillemot, and C. Wolf, “Sim2realviz: Visualizing the sim2real gap in robot ego-pose estimation,” 2021. [Online]. Available: https://arxiv.org/abs/2109.11801

work page arXiv 2021

[13] [13]

Bridging the sim2real gap: Vision encoder pre-training for visuomotor policy transfer,

Y . Yardi, S. Biruduganti, and L. Ankile, “Bridging the sim2real gap: Vision encoder pre-training for visuomotor policy transfer,” 2025. [Online]. Available: https://arxiv.org/abs/2501.16389

work page arXiv 2025

[14] [14]

BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation

C. Li, R. Zhang, J. Wong, C. Gokmen, S. Srivastava, R. Mart ´ın-Mart´ın, C. Wang, G. Levine, W. Ai, B. Martinez, H. Yin, M. Lingelbach, M. Hwang, A. Hiranaka, S. Garlanka, A. Aydin, S. Lee, J. Sun, M. Anvari, M. Sharma, D. Bansal, S. Hunter, K.-Y . Kim, A. Lou, C. R. Matthews, I. Villa-Renteria, J. H. Tang, C. Tang, F. Xia, Y . Li, S. Savarese, H. Gweon, ...

work page internal anchor Pith review Pith/arXiv arXiv 2024

[15] S. Srivastava, C. Li, M. Lingelbach, R. Martín-Martín, F. Xia, K. Vainio, Z. Lian, C. Gokmen, S. Buch, C. K. Liu, S. Savarese, H. Gweon, J. Wu, and L. Fei-Fei, “Behavior: Benchmark for everyday household activities in virtual, interactive, and ecological environments,” 2021. [Online]. Available: https://arxiv.org/abs/2108.03332

[16] Y. Wang, X. Zhang, R. Wu, Y. Li, Y. Shen, M. Wu, Z. He, Y. Wang, and H. Dong, “Adamanip: Adaptive articulated object manipulation environments and policy learning,” 2025. [Online]. Available: https://arxiv.org/abs/2502.11124

[17] S. Tao, F. Xiang, A. Shukla, Y. Qin, X. Hinrichsen, X. Yuan, C. Bao, X. Lin, Y. Liu, T.-K. Chan, Y. Gao, X. Li, T. Mu, N. Xiao, A. Gurha, V. N, Y. Choi, Y.-R. Chen, Z. Huang, R. Calandra, R. Chen, S. Luo, and H. Su, “ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI,” in RSS 2025, 2025.

[18] E. Todorov, T. Erez, and Y. Tassa, “Mujoco: A physics engine for model-based control,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 5026–5033.

[19] V. Makoviychuk, L. Wawrzyniak, Y. Guo, M. Lu, K. Storey, M. Macklin, D. Hoeller, N. Rudin, A. Allshire, A. Handa, and G. State, “Isaac gym: High performance gpu-based physics simulation for robot learning,” 2021.

[20] M. Mittal, C. Yu, Q. Yu, J. Liu, N. Rudin, D. Hoeller, J. L. Yuan, R. Singh, Y. Guo, H. Mazhar, A. Mandlekar, B. Babich, G. State, M. Hutter, and A. Garg, “Orbit: A unified simulation framework for interactive robot learning environments,” IEEE Robotics and Automation Letters, vol. 8, no. 6, pp. 3740–3747, 2023.

[21] X. Wei, M. Liu, Z. Ling, and H. Su, “Approximate convex decomposition for 3d meshes with collision-aware concavity and tree search,” ACM Transactions on Graphics, vol. 41, no. 4, pp. 1–18, July 2022. [Online]. Available: http://dx.doi.org/10.1145/3528223.3530103

[23] A. Murali, B. Sundaralingam, Y.-W. Chao, J. Yamada, W. Yuan, M. Carlson, F. Ramos, S. Birchfield, D. Fox, and C. Eppner, “Graspgen: A diffusion-based framework for 6-dof grasping with on-generator training,” arXiv preprint arXiv:2507.13097, 2025. [Online]. Available: https://arxiv.org/abs/2507.13097

[24] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” 2017. [Online]. Available: https://arxiv.org/abs/1707.06347

APPENDIX

A. Real-IKEA Dataset Construction and Physical Modeling Details

While the main text focuses on the emergence of robust manipulation policies, this appendix provides extended detai...

The Four Characteristic Failure Modes in Manipulation: The primary motivation for developing Real-IKEA stems from observing how conventional, friction-dominated policies fail when deployed on real articulated objects. As illustrated in Figure 8, these failures typically manifest in four characteristic modes: i) Slip: When joint resistance exceeds the frict...

Dataset Construction Workflow: Unlike existing datasets that often rely on synthetically generated or simplified CAD models [2], all interactive assets in Real-IKEA are derived from authentic IKEA products. Constructing a single ready-to-use, high-fidelity articulated asset requires a meticulous six-step processing workflow: i) Component Segmentation and F...

Interactive Component Diversity: Leveraging IKEA’s modular design philosophy, we combinatorially paired base cabinet units with a curated library of 83 authentic handles and knobs. As shown in Figure 7, we categorize these interactive parts into three main types based on their geometric affordances: • Knobs: Require wrist rotation to wedge the gripper agai...

Extended Collision Accuracy Visualization: To visually confirm the quality of our COACD processing (Step 4 of our workflow), we provide a heatmap visualization in Figure 11. The metric $H_{Q \to P}$ represents the outward deviation from the collision shell to the visual shell. The results show that the collision models for knobs are highly precise. For complex hand...
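
One plausible reading of the deviation metric above is a directed Hausdorff-style distance from points sampled on the collision shell Q to points sampled on the visual shell P. The visible text does not give the formula, so the sketch below is an assumption; the function name and the point-sample inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def directed_deviation(collision_pts: Sequence[Point],
                       visual_pts: Sequence[Point]) -> float:
    """Assumed form of H_{Q->P}: for each collision-shell sample q, take the
    distance to its nearest visual-shell sample p, then return the worst case.

    Brute force, O(|Q| * |P|); fine for a small illustrative point set.
    """
    return max(
        min(math.dist(q, p) for p in visual_pts)
        for q in collision_pts
    )
```

The measure is asymmetric: a collision shell that sits entirely on the visual surface scores zero even if it leaves parts of the visual shell uncovered, which is why the direction Q to P matters.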

Simulation and Control Rate: The simulator integrates rigid-body dynamics with a fixed physics timestep $\Delta t_{\mathrm{sim}} = 0.002\,\mathrm{s}$. A control decimation of $N_{\mathrm{dec}} = 50$ yields a policy command interval

$$\Delta t = N_{\mathrm{dec}}\,\Delta t_{\mathrm{sim}} = 0.1\,\mathrm{s}, \tag{1}$$

corresponding to a 10 Hz low-level command rate. Episode duration is bounded by a maximum horizon (fixed wall-clock length in simulation time), afte...
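
The relation between the physics timestep and the policy command rate can be illustrated with a short sketch. The two constants come from the text; the `rollout` helper and its `policy` and `physics_step` callables are hypothetical stand-ins rather than the authors' simulator interface.

```python
from typing import Callable

PHYSICS_DT = 0.002  # physics timestep in seconds (from the text)
N_DEC = 50          # control decimation (from the text)


def control_interval(physics_dt: float = PHYSICS_DT, n_dec: int = N_DEC) -> float:
    """Policy command interval: each action is held for n_dec physics steps."""
    return n_dec * physics_dt


def rollout(policy: Callable[[int], float],
            physics_step: Callable[[float], None],
            max_horizon_s: float) -> int:
    """Hypothetical episode loop; returns the number of policy commands issued."""
    n_commands = int(round(max_horizon_s / control_interval()))
    for k in range(n_commands):
        action = policy(k)           # one policy query per control step
        for _ in range(N_DEC):       # hold the action across N_DEC physics steps
            physics_step(action)
    return n_commands
```

With these values a 2 s horizon corresponds to 20 policy commands and 1000 physics steps.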

Privileged State: The policy observes low-dimensional features sufficient to specify the manipulation geometry: an end-effector–centric pose summary, the handle position in world coordinates, the drawer joint displacement, a gripper aperture–based closure indicator, the previous action (for temporal context), and a four-dimensional stage indicator aligned ...

Action Parameterization and Kinematics: The policy outputs a five-dimensional continuous action each control step, clipped to $[-1, 1]$ before execution. Four dimensions specify an incremental end-effector motion relative to the current configuration; one dimension commands the parallel gripper. Let $(a_0, a_1, a_2, a_3)$ denote the clipped arm command. These are scal...
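
The clipping and the arm/gripper split described above can be sketched as follows. The source text is truncated before the scaling rule is given, so the single uniform `arm_scale` factor and its default value are placeholders, not the paper's values, and the meaning of each arm dimension is left unspecified.

```python
from typing import List, Sequence, Tuple


def split_action(raw: Sequence[float],
                 arm_scale: float = 0.01) -> Tuple[List[float], float]:
    """Clip a 5-D action to [-1, 1] and split it into arm and gripper parts.

    arm_scale is a hypothetical uniform scale applied to (a0, a1, a2, a3);
    the actual per-dimension scaling is not visible in the source.
    """
    if len(raw) != 5:
        raise ValueError("expected a five-dimensional action")
    clipped = [max(-1.0, min(1.0, float(x))) for x in raw]
    arm_delta = [arm_scale * a for a in clipped[:4]]  # incremental end-effector motion
    gripper_cmd = clipped[4]                          # parallel-gripper command
    return arm_delta, gripper_cmd
```

For example, `split_action([2.0, -3.0, 0.5, 0.0, 1.5])` saturates the first two arm dimensions and the gripper command at the clip limits before any scaling is applied.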

Staged Reward: Geometry, Gates, and Monotonic Baselines: Rewards are decomposed into four stages that encourage a coarse ordering: approach a pre-grasp region, refine approach to the handle neighborhood, adopt a grasp-ready closure, then open the drawer by increasing the slide displacement. Let $c$ be the midpoint between the fingertips and $h$ the handle positio...
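
The geometric quantities named above, the fingertip midpoint c and its distance to the handle position h, can be computed as in the sketch below. The stage gate is purely illustrative: the thresholds `pre_grasp_radius` and `handle_radius` and the gating logic are assumptions, since the source text is truncated before the reward terms are defined.

```python
import math
from typing import List, Sequence


def fingertip_midpoint(left: Sequence[float], right: Sequence[float]) -> List[float]:
    """Midpoint c between the two fingertip positions."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def distance(c: Sequence[float], h: Sequence[float]) -> float:
    """Euclidean distance between the midpoint c and the handle position h."""
    return math.sqrt(sum((ci - hi) ** 2 for ci, hi in zip(c, h)))


def stage_index(dist_to_handle: float, gripper_closed: bool,
                pre_grasp_radius: float = 0.10,
                handle_radius: float = 0.02) -> int:
    """Hypothetical gate over the four stages named in the text.

    0: approach pre-grasp region, 1: refine approach to the handle,
    2: adopt grasp-ready closure, 3: open the drawer.
    """
    if dist_to_handle > pre_grasp_radius:
        return 0
    if dist_to_handle > handle_radius:
        return 1
    if not gripper_closed:
        return 2
    return 3
```

A gate of this kind makes the ordering explicit: a later stage only becomes active once the geometric condition of the earlier stage holds.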

Auxiliary Terms and Terminations: A small penalty discourages jerk; inverse-kinematics failures incur a per-step cost. Episodes may also terminate on timeout, prolonged kinematic infeasibility, or stagnation under commanded motion (task-specific definitions).

Fig. 11: Heatmap visualization of physical interaction fidelity ($H_{Q \to P}$) for Real-IKEA components....

Domain Randomization and Two-Phase Training: Training proceeds in two phases. First, the handle and robot base poses are fixed at reset to reduce variance while acquiring a feasible skill. Second, per-episode pose perturbations are applied to the handle and base within bounded ranges to improve robustness. Specifically, the switch to the second phase oc...
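
The difference between the two phases at reset time can be sketched as a pose-offset sampler. The `max_offset` bound and the three-component offset are hypothetical; the source does not show the actual ranges, and the criterion for switching phases is truncated, so it is not modeled here.

```python
import random
from typing import Tuple


def sample_reset_offset(phase: int, rng: random.Random,
                        max_offset: float = 0.05) -> Tuple[float, ...]:
    """Per-episode position offset applied to the handle (or base) at reset.

    Phase 1: poses are fixed, so the offset is zero.
    Phase 2: each component is drawn uniformly from [-max_offset, max_offset].
    max_offset is a placeholder bound, not a value from the paper.
    """
    if phase == 1:
        return (0.0, 0.0, 0.0)
    return tuple(rng.uniform(-max_offset, max_offset) for _ in range(3))
```

Passing an explicit `random.Random` instance keeps resets reproducible across training runs.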

Policy Selection and Robustness Evaluation: Due to the highly dynamic nature of reinforcement learning and the aggressive domain randomization applied during the second phase of training, the policy’s performance can fluctuate between iterations. To select the most robust policy for deployment and downstream distillation, we systematically evaluated interm...
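
Checkpoint selection of this kind reduces to scoring every saved checkpoint under the same evaluation protocol and keeping the best one. In the sketch below, `evaluate` is a hypothetical callable (for instance, a success rate over randomized evaluation episodes); the paper's actual selection metric is not visible in the truncated text.

```python
from typing import Callable, Dict, Iterable, Tuple


def select_checkpoint(checkpoints: Iterable[str],
                      evaluate: Callable[[str], float]) -> Tuple[str, Dict[str, float]]:
    """Score each checkpoint once and return the best one with all scores.

    evaluate(ckpt) is assumed to return a scalar where higher is better.
    """
    scores = {ckpt: evaluate(ckpt) for ckpt in checkpoints}
    if not scores:
        raise ValueError("no checkpoints to evaluate")
    best = max(scores, key=scores.get)
    return best, scores
```

Keeping the full score table alongside the winner makes the iteration-to-iteration fluctuation visible rather than hiding it behind a single number.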