Building a Scalable, Reproducible, Evaluatable, and Closed-Loop Simulation Environment Foundation for Embodied Intelligence: Cloud-Native Simulation Infrastructure for Embodied Intelligence Training, Evaluation, and Data Collection
Pith reviewed 2026-06-29 04:23 UTC · model grok-4.3
The pith
A cloud-native simulation infrastructure unifies data generation, training, evaluation, and deployment for embodied intelligence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that cloud-native simulation infrastructure, through its four-layer architecture and adoption of elastic resource scheduling, containerized simulation, unified data management, and service-oriented design, provides a unified foundation for simulation environment generation, task execution, trajectory collection, model evaluation, data management, and cloud services in embodied intelligence.
What carries the argument
The four-layer cloud-native simulation infrastructure architecture that unifies environment assets, automated task generation, trajectory collection, benchmark evaluation, and closed-loop data optimization.
If this is right
- Enables efficient large-scale simulation for multi-model and multi-task workloads.
- Supports standardized evaluation and real-time data filtering through integrated systems.
- Facilitates closed-loop data optimization for continuous improvement.
- Provides a foundation for real-world deployment of embodied intelligence models.
Where Pith is reading between the lines
- The framework could enable researchers without access to physical robots to conduct large-scale experiments.
- It might accelerate the development of embodied AI by making simulation data more accessible and consistent.
- Future extensions could include direct transfer learning from simulation to real hardware.
- Scalability claims could be tested by running thousands of parallel simulations and measuring resource utilization.
Load-bearing premise
That cloud-native technologies such as elastic scheduling and containerization will substantially reduce the cost, improve scalability, and enhance reproducibility compared to traditional robotic data collection methods.
What would settle it
A side-by-side experiment measuring total cost, number of successful trajectories per unit time, and variance in results between this framework and a non-cloud-native simulation setup.
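The comparison described above can be sketched as a small metrics summary. This is an illustrative sketch only, not something taken from the paper: the `RunRecord` fields, the `summarize` and `compare` helpers, and the choice of sample variance over repeated runs are all assumptions made here. Any numbers fed into it would have to come from real measurements of both setups.

```python
from dataclasses import dataclass
from statistics import variance


@dataclass
class RunRecord:
    """One repeated run of the same task suite (hypothetical schema)."""
    cost_usd: float               # total spend attributed to this run
    wall_clock_hours: float       # elapsed time for this run
    successful_trajectories: int  # trajectories that met the success criterion
    success_rate: float           # fraction of episodes that succeeded


def summarize(runs):
    """Aggregate cost, throughput, and run-to-run variance for one setup."""
    if len(runs) < 2:
        raise ValueError("need at least two repeated runs to estimate variance")
    total_cost = sum(r.cost_usd for r in runs)
    total_hours = sum(r.wall_clock_hours for r in runs)
    total_success = sum(r.successful_trajectories for r in runs)
    if total_hours <= 0 or total_success <= 0:
        raise ValueError("runs must have positive duration and successes")
    return {
        "total_cost_usd": total_cost,
        "trajectories_per_hour": total_success / total_hours,
        "cost_per_trajectory_usd": total_cost / total_success,
        # Sample variance of success rate across repeated runs,
        # used here as a rough proxy for reproducibility.
        "success_rate_variance": variance(r.success_rate for r in runs),
    }


def compare(framework_runs, baseline_runs):
    """Pair each metric as (framework value, baseline value)."""
    framework = summarize(framework_runs)
    baseline = summarize(baseline_runs)
    return {key: (framework[key], baseline[key]) for key in framework}
```

Reporting each metric as a pair, rather than a single ratio, avoids dividing by a zero-variance baseline and keeps the raw values visible. A fair comparison would also hold the task suite, simulator version, and random seeds fixed across both setups.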
Original abstract
This paper presents a cloud-native simulation infrastructure framework for embodied intelligence that supports large-scale training, standardized evaluation, and simulation-based data collection. The framework unifies simulation environment generation, task execution, trajectory collection, model evaluation, data management, and cloud services into a scalable and reproducible platform. To address the high cost, limited scalability, and poor reproducibility of real-world robotic data collection, the framework adopts cloud-native technologies including elastic resource scheduling, containerized simulation, unified data management, and service-oriented system design, enabling efficient large-scale simulation for multi-model and multi-task workloads. Built on a four-layer architecture, the framework provides standardized environment assets, automated task generation, trajectory collection, benchmark evaluation, and closed-loop data optimization. It further integrates representative systems including D-VLA, RL-VLA3, Sword, and Pre-VLA to support scalable simulation, dynamic scheduling, visual augmentation, and real-time data filtering. We argue that cloud-native simulation infrastructure provides a unified foundation for data generation, model training, standardized evaluation, and real-world deployment, and will play a key role in the future development of embodied intelligence.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a cloud-native simulation infrastructure framework for embodied intelligence that unifies environment asset management, automated task generation, trajectory collection, benchmark evaluation, and closed-loop data optimization. It adopts elastic scheduling, containerization, and unified data management in a four-layer architecture, integrates with D-VLA, RL-VLA3, Sword, and Pre-VLA, and argues that this design addresses the high cost, limited scalability, and poor reproducibility of real-world robotic data collection while providing a foundation for training, evaluation, and deployment.
Significance. If implemented and quantitatively validated, the proposed infrastructure could offer a standardized, reproducible platform that lowers barriers to large-scale embodied AI experimentation. The manuscript supplies only a high-level systems description with no scaling curves, throughput numbers, cost comparisons, or reproducibility metrics, so its significance is currently prospective rather than demonstrated.
major comments (2)
- [Abstract] Abstract: the claim that cloud-native technologies 'enable efficient large-scale simulation for multi-model and multi-task workloads' and solve high cost, limited scalability, and poor reproducibility is presented without any supporting measurements, scaling experiments, or comparisons against non-cloud baselines.
- [Four-layer architecture description] Four-layer architecture and integration sections: benefits of the environment-assets / task-generation / trajectory-collection / benchmark-evaluation layers and the cited integrations (D-VLA, RL-VLA3, Sword, Pre-VLA) are asserted as design-enabled outcomes, yet no ablation studies, throughput figures, trajectory-variance statistics, or baseline comparisons are reported.
minor comments (1)
- The manuscript would benefit from explicit definitions or references for the concrete container orchestration and data-management protocols employed in the unified data-management layer.
Simulated Author's Rebuttal
We thank the referee for their detailed review and constructive criticism. The manuscript is intended as a systems paper describing a cloud-native simulation infrastructure framework. We agree with the observation that it lacks quantitative evaluations and will revise the text to more accurately reflect the scope and nature of the contributions.
Point-by-point responses
- Referee: [Abstract] Abstract: the claim that cloud-native technologies 'enable efficient large-scale simulation for multi-model and multi-task workloads' and solve high cost, limited scalability, and poor reproducibility is presented without any supporting measurements, scaling experiments, or comparisons against non-cloud baselines.
  Authors: We accept this point. The abstract overstates the demonstrated benefits. In the revised manuscript, we will rephrase the abstract to present these as intended outcomes of the design rather than proven results, and we will add a discussion on the rationale behind the design choices that are expected to address these issues.
  Revision: yes
- Referee: [Four-layer architecture description] Four-layer architecture and integration sections: benefits of the environment-assets / task-generation / trajectory-collection / benchmark-evaluation layers and the cited integrations (D-VLA, RL-VLA3, Sword, Pre-VLA) are asserted as design-enabled outcomes, yet no ablation studies, throughput figures, trajectory-variance statistics, or baseline comparisons are reported.
  Authors: The four-layer architecture is presented as a proposed structure to achieve the goals of scalability and reproducibility. The integrations are examples of systems that can leverage this infrastructure. We will revise these sections to clarify that the benefits are hypothesized based on the architecture and that no empirical studies are included in this work, as the paper focuses on the infrastructure foundation rather than specific performance metrics.
  Revision: yes
Circularity Check
No circularity: systems architecture paper with no derivations or predictions
Full rationale
The manuscript is a descriptive systems paper proposing a four-layer cloud-native simulation framework. It contains no equations, no fitted parameters, no predictions of quantities, and no derivation chains. Claims about scalability and reproducibility are presented as enabled by the adopted technologies (elastic scheduling, containerization) rather than derived from any inputs or self-citations. No self-citation load-bearing steps, uniqueness theorems, or ansatzes appear. The central assertions reduce to design choices, not to any tautological reduction of outputs to inputs.