pith. machine review for the scientific record.

arxiv: 2310.17596 · v1 · submitted 2023-10-26 · 💻 cs.RO · cs.AI · cs.CV · cs.LG

Recognition: 2 theorem links · Lean Theorem

MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations

Authors on Pith: no claims yet

Pith reviewed 2026-05-17 09:41 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.CV · cs.LG
keywords imitation learning · robot learning · data generation · human demonstrations · long-horizon tasks · manipulation · scalable datasets

The pith

MimicGen adapts roughly 200 human demonstrations into over 50,000 varied examples that train robots for long-horizon tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MimicGen as a system that automatically synthesizes large robot training datasets by taking a small collection of human demonstrations and adapting them to new object placements, scenes, and robot arms. This addresses the bottleneck of expensive and time-consuming data collection for imitation learning, which currently limits scaling to complex behaviors. A sympathetic reader would care because the generated data lets robots learn high-precision, multi-step skills like assembly and coffee preparation across many starting conditions, without needing fresh human demonstrations for every variation. The work shows that imitation learning on the synthetic data reaches strong performance and compares favorably to using additional real demonstrations, pointing to a more practical route for building capable robot agents.

Core claim

MimicGen is a system for automatically synthesizing large-scale, rich datasets from only a small number of human demonstrations by adapting them to new contexts. We use MimicGen to generate over 50K demonstrations across 18 tasks with diverse scene configurations, object instances, and robot arms from just ~200 human demonstrations. We show that robot agents can be effectively trained on this generated dataset by imitation learning to achieve strong performance in long-horizon and high-precision tasks, such as multi-part assembly and coffee preparation, across broad initial state distributions.

What carries the argument

The adaptation process in MimicGen that modifies existing human demonstrations to fit new scene configurations, object instances, and robot arms while preserving the underlying task behavior.
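The mechanism behind this adaptation is, at its core, a frame change: a demonstrated end-effector segment is expressed relative to the reference object in the source scene and re-anchored to that object's pose in the new scene. A minimal sketch of that transform, assuming 4x4 homogeneous poses in the world frame (the function name and array layout are illustrative, not MimicGen's actual interface):

```python
import numpy as np

def adapt_segment(ee_poses_world, obj_pose_src, obj_pose_tgt):
    """Re-target a demonstrated end-effector segment to a new object pose.

    ee_poses_world : (T, 4, 4) end-effector poses from the source demonstration
    obj_pose_src   : (4, 4) pose of the reference object in the source scene
    obj_pose_tgt   : (4, 4) pose of the same object in the new scene
    Returns the (T, 4, 4) poses to execute in the new scene.
    """
    # Express each pose relative to the source object, then re-anchor that
    # relative motion to the target object's pose.
    rel = np.linalg.inv(obj_pose_src)[None] @ ee_poses_world
    return obj_pose_tgt[None] @ rel
```

Because the motion is preserved in the object's frame, a recorded interaction such as a grasp approach carries over to a new object placement without a fresh demonstration; whether the transformed segment is actually executable is a separate question, taken up in the editorial analysis below.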

If this is right

  • Robots achieve strong performance on long-horizon and high-precision tasks when trained via imitation on the generated demonstrations.
  • Training success holds across broad initial state distributions for tasks such as multi-part assembly and coffee preparation.
  • The generated data compares favorably in effectiveness to collecting additional real human demonstrations.
  • Robot learning becomes more economical because large datasets no longer require proportional increases in human collection effort.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same adaptation idea could be applied in physical robot experiments to test whether the performance gains transfer beyond simulation.
  • Combining MimicGen-style generation with other data sources might further reduce the total number of real demonstrations needed for multi-task training.
  • Similar techniques could shorten iteration cycles when researchers want to explore many environment variations without repeating full human data collection.

Load-bearing premise

Data created by adapting human demonstrations to new contexts trains robots as effectively as fresh human demonstrations collected directly in those same contexts.

What would settle it

A side-by-side test in which imitation learning agents trained on an equal volume of newly collected real human demonstrations in the target contexts outperform agents trained on the MimicGen-generated data would falsify the central claim.
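Read concretely, that test is a matched-size comparison. A hedged sketch of such a harness, where `train_bc` and `evaluate` are hypothetical caller-supplied hooks rather than functions from the paper or its released code:

```python
def side_by_side(train_bc, evaluate, human_demos, mimicgen_demos, n_rollouts=50):
    """Train matched-size policies on real vs. generated data and compare them.

    train_bc(demos)              -> policy        (hypothetical hook)
    evaluate(policy, n_rollouts) -> success rate  (hypothetical hook)
    """
    assert len(human_demos) == len(mimicgen_demos), "datasets must be the same size"
    policy_human = train_bc(human_demos)
    policy_synth = train_bc(mimicgen_demos)
    rate_human = evaluate(policy_human, n_rollouts)
    rate_synth = evaluate(policy_synth, n_rollouts)
    # The central claim is falsified if the fresh-human-data policy clearly
    # outperforms the MimicGen-data policy at this matched dataset size.
    return {"human": rate_human, "mimicgen": rate_synth}
```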

read the original abstract

Imitation learning from a large set of human demonstrations has proved to be an effective paradigm for building capable robot agents. However, the demonstrations can be extremely costly and time-consuming to collect. We introduce MimicGen, a system for automatically synthesizing large-scale, rich datasets from only a small number of human demonstrations by adapting them to new contexts. We use MimicGen to generate over 50K demonstrations across 18 tasks with diverse scene configurations, object instances, and robot arms from just ~200 human demonstrations. We show that robot agents can be effectively trained on this generated dataset by imitation learning to achieve strong performance in long-horizon and high-precision tasks, such as multi-part assembly and coffee preparation, across broad initial state distributions. We further demonstrate that the effectiveness and utility of MimicGen data compare favorably to collecting additional human demonstrations, making it a powerful and economical approach towards scaling up robot learning. Datasets, simulation environments, videos, and more at https://mimicgen.github.io .

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces MimicGen, a system that automatically synthesizes large-scale robot demonstration datasets by adapting a small number (~200) of human demonstrations to new scene configurations, object instances, and robot arms. It generates over 50K demonstrations across 18 tasks and claims that imitation learning agents trained on this data achieve strong performance on long-horizon, high-precision tasks such as multi-part assembly and coffee preparation across broad initial state distributions, with effectiveness and utility that compare favorably to collecting additional human demonstrations.

Significance. If the empirical results hold, the work offers a practical method to scale imitation learning data collection economically, addressing a major bottleneck in robot learning. The scale of generated data, focus on challenging long-horizon tasks, and release of datasets, environments, and videos are strengths that support reproducibility and further research.

major comments (2)
  1. [§4 and §5] §4 (Experiments) and §5 (Results): The claim that MimicGen data compares favorably to additional human demonstrations (and supports strong performance across broad initial states) is load-bearing but requires explicit quantitative evidence that the adaptation process preserves task success. The manuscript should report the success rate of generated trajectories (e.g., fraction of valid, collision-free executions after rigid transformation and sub-task stitching) versus fresh human data in the target contexts; without this, it is unclear whether the imitation learning results reflect high-quality data or are undermined by invalid trajectories.
  2. [§3.2] §3.2 (Adaptation Procedure): The description of context adaptation (object pose changes, new instances, different arms) must detail mechanisms for ensuring kinematic reachability and avoiding collisions or altered contact dynamics. If these are not addressed, the generated dataset may contain a higher fraction of failed executions than real human data, directly affecting the central scalability claim.
minor comments (2)
  1. [Abstract] Abstract: Include at least one key quantitative result (e.g., success rate or performance gap versus baselines) to strengthen the empirical claims made in the opening paragraph.
  2. [Throughout] Notation and figures: Ensure consistent use of terms such as 'context adaptation' and 'sub-task stitching' across text and figures; add a table summarizing the 18 tasks with their horizon lengths and precision requirements.
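Major comment 1 asks for an explicit data-quality metric in addition to downstream policy performance. A minimal sketch of what such a report could compute, assuming each generated trajectory has already been replayed in the simulator and tagged with boolean outcome flags (the field names are illustrative, not MimicGen's data format):

```python
def generation_success_metrics(generated_trajs):
    """Fraction of generated trajectories that are collision-free / successful.

    generated_trajs: non-empty list of dicts with boolean "collision_free"
    and "task_success" flags obtained by replaying each trajectory in the
    simulator (hypothetical fields).
    """
    n = len(generated_trajs)
    collision_free = sum(t["collision_free"] for t in generated_trajs)
    task_success = sum(t["task_success"] for t in generated_trajs)
    return {
        "n_generated": n,
        "frac_collision_free": collision_free / n,
        "frac_task_success": task_success / n,
    }
```

Reporting these fractions next to the imitation learning results would separate "the generated trajectories are valid" from "policies trained on them succeed".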

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We have carefully reviewed the major comments and provide point-by-point responses below. We agree that certain clarifications and additions will strengthen the paper and plan to incorporate them in the revision.

read point-by-point responses
  1. Referee: [§4 and §5] §4 (Experiments) and §5 (Results): The claim that MimicGen data compares favorably to additional human demonstrations (and supports strong performance across broad initial states) is load-bearing but requires explicit quantitative evidence that the adaptation process preserves task success. The manuscript should report the success rate of generated trajectories (e.g., fraction of valid, collision-free executions after rigid transformation and sub-task stitching) versus fresh human data in the target contexts; without this, it is unclear whether the imitation learning results reflect high-quality data or are undermined by invalid trajectories.

    Authors: We agree that directly reporting the success rates of the generated trajectories would provide stronger support for the central claims. The manuscript primarily evaluates data utility through downstream imitation learning performance across broad initial state distributions, which serves as an indirect but practical measure of data quality. However, we acknowledge the value of explicit metrics on adaptation validity. In the revised manuscript, we will add quantitative results (e.g., a table in §4 or §5) reporting the fraction of MimicGen trajectories that are valid, collision-free, and task-successful after adaptation, with direct comparisons to human demonstrations collected in the target contexts. This addition will clarify that the reported IL results are based on high-quality data. revision: yes

  2. Referee: [§3.2] §3.2 (Adaptation Procedure): The description of context adaptation (object pose changes, new instances, different arms) must detail mechanisms for ensuring kinematic reachability and avoiding collisions or altered contact dynamics. If these are not addressed, the generated dataset may contain a higher fraction of failed executions than real human data, directly affecting the central scalability claim.

    Authors: We thank the referee for this suggestion to enhance the technical description. Section 3.2 currently focuses on the high-level adaptation pipeline (rigid transformations for poses, sub-task segmentation, and stitching). The procedure incorporates inverse kinematics feasibility checks during pose adaptation for different robot arms and uses the underlying simulator to detect and filter collisions or unreachable configurations before including trajectories in the dataset. Contact dynamics are preserved by maintaining relative end-effector trajectories within each sub-task. We will expand §3.2 with additional details on these mechanisms, including the specific reachability checks and filtering steps, to address the concern directly. revision: yes
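Taken together, the two responses describe a generate-and-filter loop: transform each sub-task segment to the new context, reject candidates that fail inverse-kinematics reachability, and keep only demonstrations whose simulated replay is collision-free and ends in task success. A hedged sketch under those assumptions, reusing `adapt_segment` from the earlier block and treating `ik_reachable` and `replay_in_sim` as hypothetical hooks around the IK solver and the simulator:

```python
def generate_demo(segments, obj_poses_src, obj_poses_tgt, ik_reachable, replay_in_sim):
    """Adapt one demonstration to a new context and keep it only if feasible.

    segments      : list of (T_i, 4, 4) end-effector sub-task segments
    obj_poses_src : list of (4, 4) reference-object poses in the source scene
    obj_poses_tgt : list of (4, 4) reference-object poses in the new scene
    ik_reachable  : hypothetical hook, pose -> bool (arm can reach this pose)
    replay_in_sim : hypothetical hook, segments -> bool (replay is collision-free
                    and ends in task success)
    """
    adapted = []
    for seg, src, tgt in zip(segments, obj_poses_src, obj_poses_tgt):
        new_seg = adapt_segment(seg, src, tgt)  # transform sketched earlier
        # Discard the whole candidate if any waypoint is kinematically unreachable.
        if not all(ik_reachable(pose) for pose in new_seg):
            return None
        adapted.append(new_seg)
    # Keep the demonstration only if its simulated replay succeeds.
    return adapted if replay_in_sim(adapted) else None
```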

Circularity Check

0 steps flagged

MimicGen is a practical data synthesis system with no circular derivations or modeling assumptions.

full rationale

The paper introduces an engineering system for adapting a small number of human demonstrations to new scene configurations, object instances, and robot arms to synthesize large datasets. No equations, first-principles derivations, fitted parameters, or predictions are described that could reduce to inputs by construction. The central claim rests on empirical results from training imitation learning policies on the generated data, which is externally falsifiable via real-robot or simulator success rates and does not rely on self-citations, uniqueness theorems, or ansatzes from prior author work. This is a standard non-circular systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model or parameters are described in the abstract; the work is an applied system for data generation in robotics.

pith-pipeline@v0.9.0 · 5510 in / 1021 out tokens · 58671 ms · 2026-05-17T09:41:01.720309+00:00 · methodology

Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Good in Bad (GiB): Sifting Through End-user Demonstrations for Learning a Better Policy

    cs.RO 2026-05 unverdicted novelty 7.0

    GiB filters erroneous subtasks from mixed-quality human demonstrations using self-supervised latent features and Mahalanobis distance to train more robust imitation learning policies.

  2. DockAnywhere: Data-Efficient Visuomotor Policy Learning for Mobile Manipulation via Novel Demonstration Generation

    cs.RO 2026-04 unverdicted novelty 7.0

    DockAnywhere lifts single demonstrations to diverse docking points via structure-preserving augmentation and point-cloud spatial editing to improve viewpoint generalization in visuomotor policies for mobile manipulation.

  3. Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation

    cs.RO 2026-04 unverdicted novelty 7.0

    ReV is a referring-aware visuomotor policy using coupled diffusion heads for real-time trajectory replanning in robotic manipulation, trained solely via targeted perturbations to expert demonstrations and achieving hi...

  4. Good in Bad (GiB): Sifting Through End-user Demonstrations for Learning a Better Policy

    cs.RO 2026-05 unverdicted novelty 6.0

    GiB uses self-supervised latent features and Mahalanobis distance to filter erroneous subtasks from mixed-quality human demonstrations, improving robot policy learning in simulation and real-world tasks.

  5. Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation

    cs.RO 2026-04 unverdicted novelty 6.0

    Lucid-XR uses XR-headset physics simulation and physics-guided video generation to create synthetic data that trains robot policies transferring zero-shot to unseen real-world manipulation tasks.

  6. Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising

    cs.RO 2026-04 unverdicted novelty 6.0

    X-WAM unifies robotic action execution and 4D world synthesis by adapting video diffusion priors with a lightweight depth branch and asynchronous noise sampling, achieving 79-91% success on robot benchmarks.

  7. Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising

    cs.RO 2026-04 unverdicted novelty 6.0

    X-WAM unifies real-time robotic action execution with high-fidelity 4D world synthesis by adapting video diffusion priors through lightweight depth branches and asynchronous noise sampling, achieving 79-91% success on...

  8. Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models

    cs.RO 2026-04 unverdicted novelty 6.0

    State-of-the-art vision-language-action models catastrophically fail dynamic embodied reasoning due to lexical-kinematic shortcuts, behavioral inertia, and semantic feature collapse caused by architectural bottlenecks...

  9. A Mechanistic Analysis of Sim-and-Real Co-Training in Generative Robot Policies

    cs.RO 2026-04 unverdicted novelty 6.0

    Sim-and-real co-training for robot policies is driven primarily by balanced cross-domain representation alignment and secondarily by domain-dependent action reweighting.

  10. WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations

    cs.RO 2026-04 unverdicted novelty 6.0

    WARPED synthesizes realistic wrist-view observations from monocular egocentric human videos via foundation models, hand-object tracking, retargeting, and Gaussian Splatting to train visuomotor policies that match tele...

  11. Generative Simulation for Policy Learning in Physical Human-Robot Interaction

    cs.RO 2026-04 unverdicted novelty 6.0

    A text-to-simulation pipeline using LLMs and VLMs generates synthetic pHRI data to train vision-based imitation learning policies that achieve over 80% success in zero-shot sim-to-real transfer on real assistive tasks.

  12. SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds

    cs.RO 2026-04 unverdicted novelty 6.0

    SIM1 converts sparse real demonstrations into high-fidelity synthetic data through physics-aligned simulation, yielding policies that match real-data performance at a 1:15 ratio with 90% zero-shot success on deformabl...

  13. ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors

    cs.RO 2026-03 conditional novelty 6.0

    ExpertGen generates high-success expert policies in simulation from imperfect priors by freezing a diffusion behavior model and optimizing its initial noise via RL, then distills them for real-robot deployment.

  14. Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

    cs.AI 2026-01 conditional novelty 6.0

    Single-stage fine-tuning of a video model to generate actions as latent frames plus future states and values yields state-of-the-art robot policy performance on LIBERO, RoboCasa, and bimanual tasks.

  15. IGen: Scalable Data Generation for Robot Learning from Open-World Images

    cs.RO 2025-12 unverdicted novelty 6.0

    IGen generates realistic visuomotor training data including actions and temporally coherent visuals from unstructured open-world images via 3D reconstruction and VLM reasoning.

  16. $\pi_0$: A Vision-Language-Action Flow Model for General Robot Control

    cs.LG 2024-10 unverdicted novelty 6.0

    π₀ is a vision-language-action flow model trained on diverse multi-platform robot data that supports zero-shot task performance, language instruction following, and efficient fine-tuning for dexterous tasks.

  17. RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

    cs.RO 2024-06 unverdicted novelty 6.0

    RoboCasa supplies a large-scale kitchen simulator, generative assets, 100 tasks, and automated data pipelines that produce a clear scaling trend in imitation learning for generalist robots.

  18. EmbodiedClaw: Conversational Workflow Execution for Embodied AI Development

    cs.RO 2026-04 unverdicted novelty 5.0

    EmbodiedClaw automates embodied AI development workflows through conversation, reducing manual effort and improving consistency and reproducibility.

  19. World Action Models: The Next Frontier in Embodied AI

    cs.RO 2026-05 unverdicted novelty 4.0

    The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

Reference graph

Works this paper leans on

128 extracted references · 128 canonical work pages · cited by 17 Pith papers · 11 internal anchors
