arxiv: 2009.12293 · v3 · submitted 2020-09-25 · 💻 cs.RO · cs.AI · cs.LG

robosuite: A Modular Simulation Framework and Benchmark for Robot Learning

Pith reviewed 2026-05-12 22:31 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.LG
keywords robot learning · simulation framework · MuJoCo · benchmark environments · modular design · reproducible research · robotic tasks

The pith

robosuite is a modular simulation framework powered by MuJoCo that supplies benchmark environments for reproducible robot learning research.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents robosuite as a simulation framework for robot learning built on the MuJoCo physics engine. It emphasizes a modular design that lets users assemble and customize robotic tasks from reusable components. The release also includes a collection of standard benchmark environments intended to make experimental results comparable across different research groups. A reader would care because robot learning experiments often rely on bespoke simulation setups that prevent direct comparisons and slow collective progress.

Core claim

The authors establish that robosuite v1.5 delivers key system modules supporting modular task creation alongside a suite of benchmark environments, enabling researchers to define custom robotic tasks and run reproducible learning experiments without rebuilding simulation infrastructure from scratch.

What carries the argument

The modular system modules for assembling robotic tasks, combined with the provided suite of benchmark environments.
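
To make the idea of assembling tasks from reusable modules concrete, here is a minimal sketch in plain Python. It assumes nothing about robosuite's actual API: `RobotModel`, `Arena`, `TaskSpec`, and `reset` are hypothetical names chosen for illustration, and the seeded placement logic is an assumption about how a reproducible setup could work, not a description of the paper's implementation.

```python
from dataclasses import dataclass
import random

# All names below are hypothetical illustrations, not robosuite's API.


@dataclass(frozen=True)
class RobotModel:
    name: str
    dof: int


@dataclass(frozen=True)
class Arena:
    name: str


@dataclass(frozen=True)
class TaskSpec:
    """A task assembled from interchangeable modules."""
    robot: RobotModel
    arena: Arena
    objects: tuple

    def reset(self, seed: int) -> dict:
        # Seeded sampling makes the initial object placement repeatable.
        rng = random.Random(seed)
        return {
            obj: (round(rng.uniform(-0.2, 0.2), 3),
                  round(rng.uniform(-0.2, 0.2), 3))
            for obj in self.objects
        }


panda = RobotModel("Panda", dof=7)
table = Arena("table")

# Two tasks share the robot and arena modules and differ only in objects.
lift = TaskSpec(panda, table, objects=("cube",))
stack = TaskSpec(panda, table, objects=("cube_a", "cube_b"))

# The same seed yields the same initial state on every call.
assert lift.reset(seed=0) == lift.reset(seed=0)
```

The design point is that a new task reuses existing pieces rather than redefining them, and that a fixed seed gives a fixed starting configuration, which is the property a shared benchmark needs for comparable results.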

If this is right

  • Researchers can compose new robotic tasks by combining existing modules instead of starting from zero.
  • Standard benchmark environments allow direct side-by-side comparison of different learning algorithms.
  • Reproducible simulation setups reduce the time spent on infrastructure and increase time available for algorithm development.
  • Consistent environments support cumulative progress because results from one paper can be verified or extended by others.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Widespread use could reduce duplication of effort across labs by providing a shared simulation base.
  • The same modular structure might later support easier sim-to-real transfer once real-robot interfaces are added.
  • Benchmark results could serve as a common reference point for comparing learning methods that currently rely on private environments.

Load-bearing premise

That researchers will adopt the modular architecture and benchmark environments without needing to write substantial additional custom code for their own tasks.

What would settle it

A survey or usage study in which most researchers report that they must still implement large amounts of custom simulation code to match their experimental needs, or in which benchmark results prove difficult to reproduce across independent implementations.

Original abstract

robosuite is a simulation framework for robot learning powered by the MuJoCo physics engine. It offers a modular design for creating robotic tasks as well as a suite of benchmark environments for reproducible research. This paper discusses the key system modules and the benchmark environments of our new release robosuite v1.5.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper introduces robosuite v1.5, a modular simulation framework for robot learning powered by the MuJoCo physics engine. It describes the key system modules for task creation and presents a suite of benchmark environments intended to support reproducible research in the field.

Significance. If the described modular architecture and benchmarks function as outlined, the framework could provide a standardized platform that reduces the need for custom simulation code, thereby improving reproducibility across robot learning studies. The release of an open tool with explicit benchmark support is a practical contribution to the community.

minor comments (3)
  1. [Abstract] The claim that the framework offers 'a suite of benchmark environments for reproducible research' would be strengthened by briefly noting the specific tasks included (e.g., manipulation, locomotion) and any quantitative validation of their stability or fidelity.
  2. The manuscript should include a dedicated section or table comparing robosuite v1.5 features against prior versions or alternative simulators (e.g., PyBullet, Gazebo) to clarify incremental advances.
  3. Ensure that all module descriptions cite the corresponding source files or API references so readers can directly inspect the implementation.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and significance assessment of our work on robosuite v1.5. The recommendation for minor revision is noted. As no specific major comments were provided in the report, we have no substantive points to address and believe the manuscript requires no technical revisions.

Circularity Check

0 steps flagged

No circularity: purely descriptive software framework paper

Full rationale

The manuscript is a software release note for robosuite v1.5. It describes the modular architecture, MuJoCo integration, task-creation utilities, and benchmark environments without any derivations, equations, fitted parameters, predictions, or uniqueness theorems. No load-bearing self-citations or ansatzes appear; the central claim is simply that the described interfaces exist and are exposed. This is self-contained descriptive documentation rather than a chain of inferences that could reduce to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a software framework description paper containing no mathematical derivations, fitted parameters, or postulated entities.

pith-pipeline@v0.9.0 · 5375 in / 1044 out tokens · 46507 ms · 2026-05-12T22:31:41.630161+00:00 · methodology

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. TAVIS: A Benchmark for Egocentric Active Vision and Anticipatory Gaze in Imitation Learning

    cs.RO 2026-05 accept novelty 8.0

    TAVIS is a released benchmark showing active vision improves imitation learning in a task-dependent manner, multi-task policies struggle with shifts, and imitation produces human-like anticipatory gaze.

  2. RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies

    cs.RO 2026-04 unverdicted novelty 8.0

    RoboLab is a new simulation benchmark with 120 tasks across visual, procedural, and relational axes that quantifies generalization gaps and perturbation sensitivity in task-generalist robotic policies.

  3. BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation

    cs.RO 2024-03 accept novelty 8.0

    BEHAVIOR-1K introduces a benchmark of 1,000 human everyday activities in realistic simulated scenes together with the OMNIGIBSON physics simulator to evaluate embodied AI.

  4. LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning

    cs.AI 2023-06 conditional novelty 8.0

    LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.

  5. CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models

    cs.CV 2026-05 unverdicted novelty 7.0

    Capability vectors extracted from parameter differences between standard and auxiliary-finetuned VLA models can be merged into pretrained weights to match auxiliary-training performance while reducing computational ov...

  6. CoRAL: Contact-Rich Adaptive LLM-based Control for Robotic Manipulation

    cs.RO 2026-05 unverdicted novelty 7.0

    CoRAL lets LLMs act as adaptive cost designers for motion planners while using VLM priors and online identification to handle unknown physics, achieving over 50% higher success rates than baselines in unseen contact-r...

  7. Atomic-Probe Governance for Skill Updates in Compositional Robot Policies

    cs.RO 2026-04 unverdicted novelty 7.0

    A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing ...

  8. HANDFUL: Sequential Grasp-Conditioned Dexterous Manipulation with Resource Awareness

    cs.RO 2026-04 unverdicted novelty 7.0

    HANDFUL learns resource-aware grasps using finger contact rewards and curriculum learning to improve success on sequential dexterous tasks in simulation and on a real LEAP hand.

  9. Agent-Centric Observation Adaptation for Robust Visual Control under Dynamic Perturbations

    cs.RO 2026-04 unverdicted novelty 7.0

    ACO-MoE employs agent-centric mixture-of-experts to decouple task-relevant features from dynamic visual perturbations in RL, recovering 95.3% of clean performance on the new VDCS benchmark.

  10. Agent-Centric Observation Adaptation for Robust Visual Control under Dynamic Perturbations

    cs.RO 2026-04 unverdicted novelty 7.0

    ACO-MoE recovers 95.3% of clean-input performance in visual control tasks under Markov-switching corruptions by routing restoration experts and anchoring representations to clean foreground masks.

  11. BiCoord: A Bimanual Manipulation Benchmark towards Long-Horizon Spatial-Temporal Coordination

    cs.RO 2026-04 conditional novelty 7.0

    BiCoord is a new benchmark for long-horizon tightly coordinated bimanual manipulation that includes quantitative metrics and shows existing policies like DP, RDT, Pi0 and OpenVLA-OFT struggle on such tasks.

  12. Towards Generalizable Robotic Manipulation in Dynamic Environments

    cs.CV 2026-03 unverdicted novelty 7.0

    DOMINO dataset and PUMA architecture enable better dynamic robotic manipulation by incorporating motion history, delivering 6.3% higher success rates than prior VLA models.

  13. ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs

    cs.RO 2026-02 unverdicted novelty 7.0

    ST-BiBench reveals a coordination paradox in which MLLMs show strong high-level strategic reasoning yet fail at fine-grained 16-dimensional bimanual action synthesis and multi-stream fusion.

  14. MIMIC-D: Multi-modal Imitation for MultI-agent Coordination with Decentralized Diffusion Policies

    cs.RO 2025-09 unverdicted novelty 7.0

    MIMIC-D enables multi-modal multi-agent coordination via joint training of decentralized diffusion policies using only local information.

  15. Voyager: An Open-Ended Embodied Agent with Large Language Models

    cs.AI 2023-05 unverdicted novelty 7.0

    Voyager achieves superior lifelong learning in Minecraft by combining an automatic exploration curriculum, a library of executable skills, and iterative LLM prompting with environment feedback, yielding 3.3x more uniq...

  16. Behavior-Consistent Deep Reinforcement Learning

    cs.LG 2026-05 unverdicted novelty 6.0

    QED bounds cross-run KL divergence in Boltzmann policies by setting temperature proportional to Q-disagreement and reduces return variance by two orders of magnitude on 18 continuous-control tasks without performance loss.

  17. Behavior-Consistent Deep Reinforcement Learning

    cs.LG 2026-05 unverdicted novelty 6.0

    QED sets state-dependent temperature proportional to double-critic disagreement to bound pairwise KL divergence between Boltzmann policies, cutting cross-run divergence by two orders of magnitude on 18 continuous-cont...

  18. Beyond Action Residuals: Real-World Robot Policy Steering via Bottleneck Latent Reinforcement Learning

    cs.RO 2026-05 unverdicted novelty 6.0

    ZPRL adapts frozen flow-matching imitation policies via RL perturbations on a task-relevant bottleneck latent, yielding 33.7% higher average success on four real-world manipulation tasks than action-residual baselines.

  19. COBALT: Crowdsourcing Robot Learning via Cloud-Based Teleoperation with Smartphones

    cs.RO 2026-05 conditional novelty 6.0

    COBALT enables scalable crowdsourced teleoperation of robots using smartphones, supporting concurrent users with low latency and yielding a 7500+ demonstration dataset validated on imitation learning tasks.

  20. COBALT: Crowdsourcing Robot Learning via Cloud-Based Teleoperation with Smartphones

    cs.RO 2026-05 unverdicted novelty 6.0

    COBALT provides scalable cloud infrastructure for crowdsourced robot teleoperation via smartphones, supporting concurrent users with low latency and enabling collection of a 7500+ demonstration dataset validated throu...

  21. DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo

    cs.RO 2026-05 conditional novelty 6.0

    DexJoCo is a benchmark and toolkit with 11 functionally grounded tasks, 1.1K trajectories, and empirical benchmarks for task-oriented dexterous manipulation on MuJoCo.

  22. Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making

    cs.LG 2026-05 unverdicted novelty 6.0

    Ada-Diffuser is a causal diffusion model that jointly learns observed interaction structure and underlying latent dynamics from minimal observations for adaptive planning and policy learning.

  23. HeteroGenManip: Generalizable Manipulation For Heterogeneous Object Interactions

    cs.RO 2026-05 unverdicted novelty 6.0

    A task-conditioned two-stage system decouples grasp localization from interaction trajectory planning using specialized foundation models to improve generalization across heterogeneous object types.

  24. HeteroGenManip: Generalizable Manipulation For Heterogeneous Object Interactions

    cs.RO 2026-05 unverdicted novelty 6.0

    HeteroGenManip decouples grasp localization from interaction planning using task-conditioned foundation models and multi-model diffusion policies, delivering 31% average gains in broad simulation tasks and 36.7% in fo...

  25. Kintsugi: Learning Policies by Repairing Executable Knowledge Bases

    cs.LG 2026-05 unverdicted novelty 6.0

    Kintsugi learns policies by repairing composable executable knowledge bases through agentic diagnosis, localized typed edits, and deterministic verification gates that admit only improvements.

  26. BEACON: Cross-Domain Co-Training of Generative Robot Policies via Best-Effort Adaptation

    cs.RO 2026-05 unverdicted novelty 6.0

    BEACON uses discrepancy-aware importance reweighting to co-train generative robot policies from abundant source and limited target demonstrations, yielding better robustness and implicit feature alignment.

  27. BEACON: Cross-Domain Co-Training of Generative Robot Policies via Best-Effort Adaptation

    cs.RO 2026-05 unverdicted novelty 6.0

    BEACON uses discrepancy-aware importance reweighting to jointly train diffusion-based robot policies and source sample weights, improving performance over target-only and fixed-ratio baselines in cross-domain manipula...

  28. How to Utilize Failure Demo Data?: Effective Data Selection for Imitation Learning Using Distribution Differences in Attention Mechanism

    cs.RO 2026-05 unverdicted novelty 6.0

    The method uses attention discrepancy metrics on latent success-failure representations to select beneficial failure data for imitation learning, raising task success rates in simulations.

  29. Atomic-Probe Governance for Skill Updates in Compositional Robot Policies

    cs.RO 2026-04 unverdicted novelty 6.0

    Empirical study on robosuite tasks reveals a dominant-skill effect in compositions and shows that an atomic probe approximates full revalidation for skill updates at much lower cost.

  30. GS-Playground: A High-Throughput Photorealistic Simulator for Vision-Informed Robot Learning

    cs.RO 2026-04 unverdicted novelty 6.0

    GS-Playground delivers a high-throughput photorealistic simulator for vision-informed robot learning via parallel physics integrated with batch 3D Gaussian Splatting at 10^4 FPS and an automated Real2Sim workflow for ...

  31. Visual-Tactile Peg-in-Hole Assembly Learning from Peg-out-of-Hole Disassembly

    cs.RO 2026-04 unverdicted novelty 6.0

    A visual-tactile RL method learns peg-in-hole assembly from reversed peg-out-of-hole disassembly trajectories, reaching 87.5% success on seen objects and 77.1% on unseen objects while lowering contact forces.

  32. A Mechanistic Analysis of Sim-and-Real Co-Training in Generative Robot Policies

    cs.RO 2026-04 unverdicted novelty 6.0

    Sim-and-real co-training for robot policies is driven primarily by balanced cross-domain representation alignment and secondarily by domain-dependent action reweighting.

  33. RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies

    cs.RO 2026-04 unverdicted novelty 6.0

    RoboLab is a photorealistic simulation benchmark with 120 tasks and perturbation analysis to evaluate true generalization and robustness of robotic foundation models.

  34. Learning Without Losing Identity: Capability Evolution for Embodied Agents

    cs.RO 2026-04 unverdicted novelty 6.0

    Embodied agents maintain a persistent identity while evolving capabilities via modular ECMs, raising simulated task success from 32.4% to 91.3% over 20 iterations with zero policy drift or safety violations.

  35. Learning Without Losing Identity: Capability Evolution for Embodied Agents

    cs.RO 2026-04 unverdicted novelty 6.0

    Embodied agents maintain persistent identity while evolving modular capabilities through a closed-loop process, raising simulated task success from 32.4% to 91.3% with zero policy drift.

  36. RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies

    cs.RO 2026-03 unverdicted novelty 6.0

    RoboMME is a new benchmark with 16 tasks and 14 memory-augmented VLA variants that shows memory effectiveness is highly task-dependent.

  37. Unify Robot Actions in Camera Frame

    cs.RO 2025-11 conditional novelty 6.0

    CalibAll estimates camera extrinsics on existing datasets to convert robot actions into a unified camera-frame representation, enabling stronger cross-embodiment pretraining.

  38. RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation

    cs.RO 2025-07 unverdicted novelty 6.0

    RoboEval is a new benchmark providing eight bimanual tasks, thousands of expert demonstrations, and standardized metrics for efficiency, coordination, safety, and failure localization in robotic manipulation.

  39. RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

    cs.RO 2025-06 unverdicted novelty 6.0

    RoboTwin 2.0 automates diverse synthetic data creation for dual-arm robots via MLLMs and five-axis domain randomization, leading to 228-367% gains in manipulation success.

  40. From Action Labels to Sets: Rethinking Action Supervision for Imitation Learning from Corrective Feedback

    cs.RO 2025-02 unverdicted novelty 6.0

    CLIC uses set-valued action targets from interactive human corrections instead of pointwise labels to train more robust imitation learning policies.

  41. RoboMD: Uncovering Robot Vulnerabilities through Semantic Potential Fields

    cs.RO 2024-12 unverdicted novelty 6.0

    A deep RL vulnerability-prediction policy trained in semantic embedding space finds up to 23% more unique robot manipulation failures than vision-language baselines and enables more efficient fine-tuning.

  42. RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

    cs.RO 2024-06 unverdicted novelty 6.0

    RoboCasa supplies a large-scale kitchen simulator, generative assets, 100 tasks, and automated data pipelines that produce a clear scaling trend in imitation learning for generalist robots.

  43. Evaluating Real-World Robot Manipulation Policies in Simulation

    cs.RO 2024-05 conditional novelty 6.0

    SIMPLER simulated environments yield policy performance that correlates strongly with real-world robot manipulation results and captures similar sensitivity to distribution shifts.

  44. 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations

    cs.RO 2024-03 unverdicted novelty 6.0

    DP3 uses compact 3D representations from sparse point clouds inside diffusion policies to learn generalizable visuomotor skills from few demonstrations, reporting 24% gains in simulation and 85% success on real robots.

  45. What Matters in Learning from Offline Human Demonstrations for Robot Manipulation

    cs.RO 2021-08 accept novelty 6.0

    A comprehensive benchmark study of offline imitation learning methods on multi-stage robot manipulation tasks identifies key sensitivities to algorithm design, data quality, and stopping criteria while releasing all d...

  46. ComPose: When to Trust Hands for Object Pose Tracking

    cs.CV 2026-05 unverdicted novelty 5.0

    ComPose tracks object poses in hand-occluded RGB videos by adaptively fusing cues from object and hand foundation models, selecting informative joints, and enforcing temporal consistency without external smoothing.

  47. stable-worldmodel: A Platform for Reproducible World Modeling Research and Evaluation

    cs.LG 2026-05 unverdicted novelty 5.0

    The paper presents stable-worldmodel (swm), a platform with high-performance data layer, modern world model baselines, planning solvers, and extended environments for reproducible research and generalization evaluation.

  48. OrbiSim: World Models as Differentiable Physics Engines for Embodied Intelligence

    cs.RO 2026-05 unverdicted novelty 5.0

    OrbiSim builds a differentiable physics engine from world models to support gradient-based policy optimization and contact modeling in robotics.

  49. Nautilus: From One Prompt to Plug-and-Play Robot Learning

    cs.RO 2026-05 unverdicted novelty 5.0

    NAUTILUS is a prompt-driven harness that automates plug-and-play adapters, typed contracts, and validation for policies, benchmarks, and robots in learning research.

  50. How to Utilize Failure Demo Data?: Effective Data Selection for Imitation Learning Using Distribution Differences in Attention Mechanism

    cs.RO 2026-05 unverdicted novelty 5.0

    A method for imitation learning that learns latent success-failure discrepancy representations in attention and uses an attention-based metric to select beneficial failure demonstrations for improved task performance ...

  51. CoRAL: Contact-Rich Adaptive LLM-based Control for Robotic Manipulation

    cs.RO 2026-05 unverdicted novelty 5.0

    CoRAL lets LLMs design objective functions for robot motion planners and uses vision-language models plus real-time identification to adapt to unknown physical properties, raising success rates by over 50 percent on n...

  52. E$^2$DT: Efficient and Effective Decision Transformer with Experience-Aware Sampling for Robotic Manipulation

    cs.RO 2026-04 unverdicted novelty 5.0

    E²DT couples a Decision Transformer with a k-Determinantal Point Process that scores trajectories on return-to-go quantiles, predictive uncertainty, and stage coverage to improve sample efficiency and policy quality i...

  53. AEGIS: Anchor-Enforced Gradient Isolation for Knowledge-Preserving Vision-Language-Action Fine-Tuning

    cs.LG 2026-04 unverdicted novelty 5.0

    AEGIS uses a pre-computed Gaussian anchor and layer-wise Gram-Schmidt orthogonal projections to isolate destructive gradients during VLA fine-tuning, preserving VQA performance without co-training or replay.

  54. EmbodiedClaw: Conversational Workflow Execution for Embodied AI Development

    cs.RO 2026-04 unverdicted novelty 5.0

    EmbodiedClaw automates embodied AI development workflows through conversation, reducing manual effort and improving consistency and reproducibility.

  55. From Pixels to Digital Agents: An Empirical Study on the Taxonomy and Technological Trends of Reinforcement Learning Environments

    cs.AI 2026-03 unverdicted novelty 5.0

    An empirical literature analysis reveals a bifurcation in RL environments into Semantic Prior (LLM-dominated) and Domain-Specific Generalization ecosystems with distinct cognitive fingerprints.

  56. What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?

    cs.AI 2025-12 unverdicted novelty 5.0

    An empirical study of JEPA world models identifies architecture, training objective, and planning choices that yield a model outperforming DINO-WM and V-JEPA-2-AC on navigation and manipulation tasks.

  57. SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation

    cs.RO 2025-11 unverdicted novelty 5.0

    SlotVLA uses slot attention to model object-relation representations for multitask robotic manipulation, reducing visual tokens while achieving competitive generalization on the new LIBERO+ benchmark.

  58. Robust and Resilient Soft Robotic Object Insertion with Compliance-Enabled Contact Formation and Failure Recovery

    cs.RO 2025-09 unverdicted novelty 5.0

    A passively compliant soft wrist structures insertion as sequential contact formations and uses a VLM to recover from failures, reaching 83% success in simulation across randomized grasp, pose, friction, and shape var...

  59. A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation

    cs.RO 2025-07 accept novelty 5.0

    Multi-task pretraining of diffusion policies on diverse robot data produces more successful, robust, and data-efficient policies for dexterous manipulation than single-task baselines, with performance scaling with pre...

  60. Unreal Robotics Lab: A High-Fidelity Robotics Simulator with Advanced Physics and Rendering

    cs.RO 2025-04 unverdicted novelty 5.0

    Unreal Robotics Lab integrates Unreal Engine rendering with MuJoCo physics to enable high-fidelity simulation for robotics perception, control, and benchmarking under diverse conditions.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · cited by 54 Pith papers · 3 internal anchors

  1. [1]

    OpenAI Gym

    Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016

  2. [2]

    CARLA: An Open Urban Driving Simulator

    Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. CARLA: An open urban driving simulator. arXiv preprint arXiv:1711.03938, 2017

  3. [3]

    Surreal: Open-source reinforcement learning framework and robot manipulation benchmark

    Linxi Fan*, Yuke Zhu*, Jiren Zhu, Zihua Liu, Orien Zeng, Anchit Gupta, Joan Creus-Costa, Silvio Savarese, and Li Fei-Fei. Surreal: Open-source reinforcement learning framework and robot manipulation benchmark. In Conference on Robot Learning, 2018

  4. [4]

    Soft Actor-Critic Algorithms and Applications

    Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, et al. Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905, 2018

  5. [5]

    Deep reinforcement learning that matters

    Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. Deep reinforcement learning that matters. In AAAI, 2018

  6. [6]

    Inertial properties in robotic manipulation: An object-level framework

    Oussama Khatib. Inertial properties in robotic manipulation: An object-level framework. The International Journal of Robotics Research, 14(1):19–36, 1995

  7. [7]

    Reinforcement learning in robotics: A survey

    Jens Kober, J Andrew Bagnell, and Jan Peters. Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32(11):1238–1274, 2013

  8. [8]

    AI2-THOR: An Interactive 3D Environment for Visual AI

    Eric Kolve, Roozbeh Mottaghi, Winson Han, Eli VanderBilt, Luca Weihs, Alvaro Herrasti, Daniel Gordon, Yuke Zhu, Abhinav Gupta, and Ali Farhadi. AI2-THOR: An interactive 3d environment for visual AI. arXiv preprint arXiv:1712.05474, 2017

  9. [9]

    A review of robot learning for manipulation: Challenges, representations, and algorithms

    Oliver Kroemer, Scott Niekum, and George Konidaris. A review of robot learning for manipulation: Challenges, representations, and algorithms. arXiv preprint arXiv:1907.03146, 2019

  10. [10]

    Roboturk: A crowdsourcing platform for robotic skill learning through imitation

    Ajay Mandlekar, Yuke Zhu, Animesh Garg, Jonathan Booher, Max Spero, Albert Tung, Julian Gao, John Emmons, Anchit Gupta, Emre Orbay, et al. Roboturk: A crowdsourcing platform for robotic skill learning through imitation. In Conference on Robot Learning, pages 879–893, 2018

  11. [11]

    Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks

    Roberto Martín-Martín, Michelle A Lee, Rachel Gardner, Silvio Savarese, Jeannette Bohg, and Animesh Garg. Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1010–1017. IEEE, 2019

  12. [12]

    Recent advances in robot learning from demonstration

    Harish Ravichandar, Athanasios S Polydoros, Sonia Chernova, and Aude Billard. Recent advances in robot learning from demonstration. Annual Review of Control, Robotics, and Autonomous Systems, 3, 2020

  13. [13]

    Reinforcement learning: An introduction

    Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT Press, 2018

  14. [14]

    dm_control: Software and tasks for continuous control

    Yuval Tassa, Saran Tunyasuvunakool, Alistair Muldal, Yotam Doron, Siqi Liu, Steven Bohez, Josh Merel, Tom Erez, Timothy Lillicrap, and Nicolas Heess. dm_control: Software and tasks for continuous control. arXiv preprint arXiv:2006.12983, 2020

  15. [15]

    MuJoCo: A physics engine for model-based control

    Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In International Conference on Intelligent Robots and Systems, pages 5026–5033, 2012

  16. [16]

    Interactive gibson benchmark: A benchmark for interactive navigation in cluttered environments

    Fei Xia, William B Shen, Chengshu Li, Priya Kasimbeg, Micael Edmond Tchapmi, Alexander Toshev, Roberto Martín-Martín, and Silvio Savarese. Interactive gibson benchmark: A benchmark for interactive navigation in cluttered environments. IEEE Robotics and Automation Letters, 5(2):713–720, 2020

  17. [17]

    Mink: Python inverse kinematics based on MuJoCo, July 2024

    Kevin Zakka. Mink: Python inverse kinematics based on MuJoCo, July 2024

  18. [18]

    Reinforcement and imitation learning for diverse visuomotor skills

    Yuke Zhu, Ziyu Wang, Josh Merel, Andrei Rusu, Tom Erez, Serkan Cabi, Saran Tunyasuvunakool, János Kramár, Raia Hadsell, Nando de Freitas, et al. Reinforcement and imitation learning for diverse visuomotor skills. Robotics: Science and Systems, 2018