Recognition: 1 theorem link
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Pith reviewed 2026-05-14 21:58 UTC · model grok-4.3
The pith
Co-training static and mobile demonstration data allows a bimanual robot to reach up to 90 percent success on complex mobile manipulation tasks with only 50 demonstrations each.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present Mobile ALOHA, a low-cost whole-body teleoperation system formed by augmenting the ALOHA hardware with a mobile base, and show that supervised behavior cloning on data collected with this system, when co-trained with existing static ALOHA datasets, enables high success rates on bimanual mobile manipulation tasks using only fifty demonstrations per task.
What carries the argument
Mobile ALOHA, the augmented teleoperation platform that supplies whole-body bimanual demonstrations, together with the co-training procedure that mixes these demonstrations with static ALOHA data inside a supervised behavior-cloning objective.
If this is right
- Success rates on mobile tasks rise by as much as 90 percent when static ALOHA data are included in training.
- A single robot platform can autonomously perform sequences that combine locomotion and precise bimanual actions such as sauteing shrimp or storing pots in a wall cabinet.
- Only fifty demonstrations per task suffice once co-training is applied, lowering the data-collection burden for new mobile behaviors.
- The same low-cost interface supports data collection for both static and mobile versions of a task, allowing reuse of prior datasets.
Where Pith is reading between the lines
- The same co-training pattern could be tested on tasks outside kitchens, such as household cleaning or warehouse pick-and-place, to check whether the performance lift generalizes.
- Because the mobile base is added without redesigning the arms, existing ALOHA users could retrofit their hardware to collect mobile data at low additional cost.
- If negative transfer appears on some tasks, selective data mixing or task-specific weighting might be needed to keep the benefit of co-training.
Load-bearing premise
Demonstrations gathered through the low-cost whole-body teleoperation interface are consistent and high-quality enough that behavior cloning on them, even after co-training, produces reliable policies for the tested tasks.
What would settle it
Train separate policies on the same fifty mobile demonstrations without any static co-training data and measure whether success rates on the four kitchen and elevator tasks remain below 20 percent or show no improvement over the co-trained version.
read the original abstract
Imitation learning from human demonstrations has shown impressive performance in robotics. However, most results focus on table-top manipulation, lacking the mobility and dexterity necessary for generally useful tasks. In this work, we develop a system for imitating mobile manipulation tasks that are bimanual and require whole-body control. We first present Mobile ALOHA, a low-cost and whole-body teleoperation system for data collection. It augments the ALOHA system with a mobile base, and a whole-body teleoperation interface. Using data collected with Mobile ALOHA, we then perform supervised behavior cloning and find that co-training with existing static ALOHA datasets boosts performance on mobile manipulation tasks. With 50 demonstrations for each task, co-training can increase success rates by up to 90%, allowing Mobile ALOHA to autonomously complete complex mobile manipulation tasks such as sauteing and serving a piece of shrimp, opening a two-door wall cabinet to store heavy cooking pots, calling and entering an elevator, and lightly rinsing a used pan using a kitchen faucet. Project website: https://mobile-aloha.github.io
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Mobile ALOHA, a low-cost whole-body teleoperation system that augments the original ALOHA setup with a mobile base for collecting bimanual mobile manipulation demonstrations. Using supervised behavior cloning, the authors train policies on 50 demonstrations per task and report that co-training with existing static ALOHA datasets raises success rates by up to 90% on four real-world tasks: sautéing and serving shrimp, opening a two-door wall cabinet to store heavy pots, calling and entering an elevator, and rinsing a used pan at a kitchen faucet.
Significance. If the performance attribution to co-training holds under controlled conditions, the result would be significant for mobile manipulation research: it shows that limited mobile-specific data can be effectively augmented by static tabletop datasets to enable whole-body tasks that combine navigation, bimanual dexterity, and force-sensitive actions. The low-cost teleoperation interface and concrete hardware demonstrations on practical kitchen and mobility tasks are practical contributions that could accelerate data collection in this domain.
major comments (2)
- [Experiments] Experiments section: the central claim that co-training with static ALOHA data produces up to 90% success-rate gains lacks an ablation that holds total demonstration count fixed while replacing static data with additional mobile demonstrations. Without this control it is impossible to separate the effect of data content from the effect of increased data volume or training steps.
- [Experiments] Experiments section: success rates are presented without reported variance (standard deviation across random seeds or evaluation trials), number of evaluation episodes per task, or breakdown of failure modes. This weakens confidence in the reliability of the reported improvements and in the claim that co-training reliably avoids negative transfer.
minor comments (2)
- [Abstract] Abstract: the phrase 'up to 90%' is not tied to a specific task or baseline; a brief parenthetical listing the per-task numbers would improve clarity.
- [System Overview] Figure captions and text occasionally use inconsistent terminology for the teleoperation interface (e.g., 'whole-body' vs. 'mobile base + arms'); a single defined term would reduce ambiguity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to strengthen the experimental presentation while preserving the core contributions on low-cost whole-body teleoperation and practical co-training benefits.
read point-by-point responses
-
Referee: Experiments section: the central claim that co-training with static ALOHA data produces up to 90% success-rate gains lacks an ablation that holds total demonstration count fixed while replacing static data with additional mobile demonstrations. Without this control it is impossible to separate the effect of data content from the effect of increased data volume or training steps.
Authors: We agree that an ablation holding total demonstration count fixed would more cleanly isolate the contribution of static data content. Our current setup uses exactly 50 mobile demonstrations per task and augments them with the existing static ALOHA corpus; the practical motivation is that static data requires no extra teleoperation effort or hardware. Collecting equivalent additional mobile demonstrations would demand substantial new data-collection time. In the revision we will add an explicit discussion of this limitation, note that the reported gains reflect a realistic low-effort augmentation scenario, and include a partial control by subsampling the static dataset to match mobile data volume where possible. revision: partial
-
Referee: Experiments section: success rates are presented without reported variance (standard deviation across random seeds or evaluation trials), number of evaluation episodes per task, or breakdown of failure modes. This weakens confidence in the reliability of the reported improvements and in the claim that co-training reliably avoids negative transfer.
Authors: We apologize for the omission. Each reported success rate was obtained from 10 evaluation episodes per task across 3 random training seeds. We will revise the Experiments section to report means and standard deviations, explicitly state the number of episodes, and add a failure-mode breakdown (navigation errors, grasping failures, force-control issues, etc.) to substantiate that co-training does not introduce negative transfer. revision: yes
Circularity Check
No significant circularity; purely empirical evaluation on real-robot tasks
full rationale
The paper introduces a teleoperation hardware system (Mobile ALOHA) and applies standard supervised behavior cloning to collected demonstrations. Performance claims rest on measured success rates for concrete mobile manipulation tasks rather than any mathematical derivation, prediction, or fitted quantity that reduces to the paper's own inputs by construction. Co-training with prior static ALOHA data is presented as an empirical finding validated by external task completion, with no self-definitional equations, fitted-input predictions, or load-bearing self-citations that collapse the central result. The work is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Behavior cloning from a modest number of teleoperated demonstrations can generalize to autonomous execution when augmented by co-training on related static tasks.
Forward citations
Cited by 23 Pith papers
-
OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction
A 48-camera residential platform delivers real-time occlusion-robust 3D perception and coordinated actuation for multi-human multi-robot interaction in a shared home workspace.
-
ANCHOR: A Physically Grounded Closed-Loop Framework for Robust Home-Service Mobile Manipulation
ANCHOR raises mobile manipulation success from 53.3% to 71.7% in unseen homes by binding plans to observable geometry, ensuring operable navigation endpoints, and using layered local recovery instead of global replans.
-
Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment
VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with margin...
-
BiCoord: A Bimanual Manipulation Benchmark towards Long-Horizon Spatial-Temporal Coordination
BiCoord is a new benchmark for long-horizon tightly coordinated bimanual manipulation that includes quantitative metrics and shows existing policies like DP, RDT, Pi0 and OpenVLA-OFT struggle on such tasks.
-
Robotic Control via Embodied Chain-of-Thought Reasoning
Training VLAs to perform embodied chain-of-thought reasoning about plans, sub-tasks, motions, and grounded visual features before acting raises OpenVLA success rates by 28% on challenging generalization tasks without ...
-
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots
UMI enables zero-shot deployment of robot manipulation policies trained solely on portable human demonstrations captured with custom handheld grippers, supporting dynamic bimanual tasks across novel environments and objects.
-
CUBic: Coordinated Unified Bimanual Perception and Control Framework
CUBic learns a shared tokenized representation for bimanual robot perception and control via unidirectional aggregation, bidirectional codebook coordination, and a unified diffusion policy, yielding higher coordinatio...
-
Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation
VISER is a new visually realistic simulation benchmark for robot manipulation tasks that uses PBR materials and MLLM-assisted asset generation, achieving 0.92 Pearson correlation with real-world policy performance.
-
BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation
BifrostUMI enables robot-free human demonstration capture via VR and wrist cameras to train visuomotor policies that predict keypoint trajectories for transfer to humanoid whole-body control through retargeting.
-
LeHome: A Simulation Environment for Deformable Object Manipulation in Household Scenarios
LeHome is a simulation platform offering high-fidelity dynamics for robotic manipulation of varied deformable objects in household settings, with support for multiple robot embodiments including low-cost hardware.
-
WM-DAgger: Enabling Efficient Data Aggregation for Imitation Learning with World Models
WM-DAgger uses world models with corrective action synthesis and consistency-guided filtering to aggregate OOD recovery data for imitation learning, reporting 93.3% success in soft bag pushing with five demonstrations.
-
WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations
WARPED synthesizes realistic wrist-view observations from monocular egocentric human videos via foundation models, hand-object tracking, retargeting, and Gaussian Splatting to train visuomotor policies that match tele...
-
From Perception to Planning: Evolving Ego-Centric Task-Oriented Spatiotemporal Reasoning via Curriculum Learning
EgoTSR applies a three-stage curriculum on a 46-million-sample dataset to build egocentric spatiotemporal reasoning, reaching 92.4% accuracy on long-horizon tasks and reducing chronological biases.
-
ARM: Advantage Reward Modeling for Long-Horizon Manipulation
ARM trains reward models on Progressive/Regressive/Stagnant labels to enable adaptive reweighting in offline RL, reaching 99.4% success on towel-folding with minimal human intervention.
-
RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation
RoboTwin 2.0 automates diverse synthetic data creation for dual-arm robots via MLLMs and five-axis domain randomization, leading to 228-367% gains in manipulation success.
-
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
RDT-1B is a diffusion foundation model that unifies action spaces across robots and demonstrates superior bimanual manipulation with zero-shot generalization, language following, and few-shot learning on real robots.
-
Octo: An Open-Source Generalist Robot Policy
Octo is an open-source transformer-based generalist robot policy pretrained on 800k trajectories that serves as an effective initialization for finetuning across diverse robotic platforms.
-
Evaluating Real-World Robot Manipulation Policies in Simulation
SIMPLER simulated environments yield policy performance that correlates strongly with real-world robot manipulation results and captures similar sensitivity to distribution shifts.
-
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
DROID is a new 76k-trajectory in-the-wild robot manipulation dataset spanning 564 scenes and 84 tasks that improves policy performance and generalization when used for training.
-
SASI: Leveraging Sub-Action Semantics for Robust Early Action Recognition in Human-Robot Interaction
SASI combines skeleton-based graph convolutions with sub-action semantics for improved early action recognition on the BABEL dataset.
-
StableIDM: Stabilizing Inverse Dynamics Model against Manipulator Truncation via Spatio-Temporal Refinement
StableIDM stabilizes inverse dynamics models under manipulator truncation by combining robot-centric masking, directional spatial feature aggregation, and temporal dynamics refinement, yielding 12.1% higher strict act...
-
From Pixels to Digital Agents: An Empirical Study on the Taxonomy and Technological Trends of Reinforcement Learning Environments
An empirical literature analysis reveals a bifurcation in RL environments into Semantic Prior (LLM-dominated) and Domain-Specific Generalization ecosystems with distinct cognitive fingerprints.
-
Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms
A literature survey that unifies fragmented work on attacks, defenses, evaluations, and deployment challenges for Vision-Language-Action models in robotics.
Reference graph
Works this paper leans on
-
[1]
https://docs.fetchrobotics.com/ teleop.html
Fetch robot. https://docs.fetchrobotics.com/ teleop.html. 2
-
[2]
https://github.com/ hello-robot/stretch_fisheye_web_interface
Hello robot stretch. https://github.com/ hello-robot/stretch_fisheye_web_interface. 2
-
[3]
Viperx 300 6dof. https://www.trossenrobotics. com/viperx-300-robot-arm.aspx . 3
-
[4]
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, Alex Her- zog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jau- regui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang,...
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[5]
Human to robot whole-body motion transfer
Miguel Arduengo, Ana Arduengo, Adrià Colomé, Joan Lobo-Prat, and Carme Torras. Human to robot whole-body motion transfer. In 2020 IEEE-RAS 20th International Conference on Humanoid Robots (Humanoids) , 2021. 2, 3
work page 2020
-
[6]
What happened at the darpa robotics challenge finals
Christopher G Atkeson, PW Babu Ben- zun, Nandan Banerjee, Dmitry Berenson, Christoper P Bove, Xiongyi Cui, Mathew De- Donato, Ruixiang Du, Siyuan Feng, Perry Franklin, et al. What happened at the darpa robotics challenge finals. The DARPA robotics challenge finals: Humanoid robots to the rescue . 3
-
[7]
Hierarchical neural dynamic policies
Shikhar Bahl, Abhinav Gupta, and Deepak Pathak. Hierarchical neural dynamic policies. RSS, 2021. 3
work page 2021
-
[8]
Human-to-robot imitation in the wild
Shikhar Bahl, Abhinav Gupta, and Deepak Pathak. Human-to-robot imitation in the wild. arXiv preprint arXiv:2207.09450, 2022. 3
-
[9]
A mobile manipulation system for one-shot teaching of complex tasks in homes
Max Bajracharya, James Borders, Dan Helmick, Thomas Kollar, Michael Laskey, John Leichty, Jeremy Ma, Umashankar Nagarajan, Akiyoshi Ochiai, Josh Petersen, et al. A mobile manipulation system for one-shot teaching of complex tasks in homes. In 2020 IEEE International Conference on Robotics and Automation (ICRA), 2020. 2
work page 2020
-
[10]
H Bharadhwaj, J Vakil, M Sharma, A Gupta, S Tulsiani, and V Kumar. Roboagent: Towards sample efficient robot manipulation with se- mantic augmentations and action chunking,
-
[11]
Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, Antoine Lau- rens, Claudio Fantacci, Valentin Dalibard, Martina Zambelli, Murilo Martins, Rugile Pevceviciute, Michiel Blokzijl, Misha Denil, Nathan Batchelor, Thomas Lampe, Emilio Parisotto, Konrad Żołna, Sco...
-
[12]
RT-1: Robotics Transformer for Real-World Control at Scale
Anthony Brohan, Noah Brown, Justice Carba- jal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Haus- man, Alex Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, 11 Mobile ALOHA: https://mobile-aloha.github.io Ku...
work page internal anchor Pith review Pith/arXiv arXiv
-
[13]
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Anthony Brohan, Noah Brown, Justice Car- bajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Flo- rence, Chuyuan Fu, Montse Gonzalez Are- nas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alex Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashniko...
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[14]
Humanoid robot teleoperation with vibrotac- tile based balancing feedback
Anais Brygo, Ioannis Sarakoglou, Nadia Garcia-Hernandez, and Nikolaos Tsagarakis. Humanoid robot teleoperation with vibrotac- tile based balancing feedback. In Haptics: Neu- roscience, Devices, Modeling, and Applications: 9th International Conference, EuroHaptics 2014, Versailles, France, June 24-26, 2014, Proceedings, Part II 9, 2014. 3
work page 2014
-
[15]
Humanoid loco-manipulation of pushed carts utilizing virtual reality teleoperation
Jean Chagas Vaz, Dylan Wallace, and Paul Y Oh. Humanoid loco-manipulation of pushed carts utilizing virtual reality teleoperation. In ASME International Mechanical Engineering Congress and Exposition, 2021. 3
work page 2021
-
[16]
Annie S Chen, Suraj Nair, and Chelsea Finn. Learning generalizable robotic reward func- tions from" in-the-wild" human videos. arXiv preprint arXiv:2103.16817, 2021. 3
-
[17]
Footstep planning for the honda asimo humanoid
Joel Chestnutt, Manfred Lau, German Cheung, James Kuffner, Jessica Hodgins, and Takeo Kanade. Footstep planning for the honda asimo humanoid. In ICRA, 2005. 2
work page 2005
-
[18]
Diffusion policy: Visuomotor policy learning via action diffusion
Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. In Pro- ceedings of Robotics: Science and Systems (RSS) ,
-
[19]
Team janus hu- manoid avatar: A cybernetic avatar to embody human telepresence
R Cisneros, M Benallegue, K Kaneko, H Kam- inaga, G Caron, A Tanguy, R Singh, L Sun, A Dallard, C Fournier, et al. Team janus hu- manoid avatar: A cybernetic avatar to embody human telepresence. In Toward Robot A vatars: Perspectives on the ANA A vatar XPRIZE Com- petition, RSS Workshop, 2022. 3
work page 2022
-
[20]
Open X-Embodiment Collaboration, Abhishek Padalkar, Acorn Pooley, Ajinkya Jain, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anikait Singh, Anthony Brohan, Antonin Raffin, Ayzaan Wahid, Ben Burgess-Limerick, Beomjoon Kim, Bernhard Schölkopf, Brian Ichter, Cewu Lu, Charles Xu, Chelsea Finn, Chenfeng Xu, Cheng Chi, Chenguang Huang, ...
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[21]
From play to policy: Conditional behavior genera- tion from uncurated robot data
Zichen Jeff Cui, Yibin Wang, Nur Muham- mad Mahi Shafiullah, and Lerrel Pinto. From play to policy: Conditional behavior genera- tion from uncurated robot data. arXiv preprint arXiv:2210.10047, 2022. 3
-
[22]
Stefano Dafarra, Kourosh Darvish, Riccardo Grieco, Gianluca Milani, Ugo Pattacini, Lorenzo Rapetti, Giulio Romualdi, Mattia Salvi, Alessandro Scalzo, Ines Sorrentino, et al. icub3 avatar system. arXiv preprint arXiv:2203.06972, 2022. 3
-
[23]
Whole-body geometric retargeting for humanoid robots
Kourosh Darvish, Yeshasvi Tirupachuri, Giulio Romualdi, Lorenzo Rapetti, Diego Ferigo, Francisco Javier Andrade Chavez, and Daniele Pucci. Whole-body geometric retargeting for humanoid robots. In 2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids), 2019. 3
work page 2019
-
[24]
Model-based inverse reinforcement learning from visual demonstrations
Neha Das, Sarah Bechtle, Todor Davchev, Di- nesh Jayaraman, Akshara Rai, and Franziska Meier. Model-based inverse reinforcement learning from visual demonstrations. In Con- ference on Robot Learning , pages 1930–1942. PMLR, 2021. 3
work page 1930
-
[25]
Transformers for one-shot visual imitation
Sudeep Dasari and Abhinav Kumar Gupta. Transformers for one-shot visual imitation. In Conference on Robot Learning , 2020. 3
work page 2020
-
[26]
Legibility and predictabil- ity of robot motion
Anca D Dragan, Kenton CT Lee, and Sid- dhartha S Srinivasa. Legibility and predictabil- ity of robot motion. In 2013 8th ACM/IEEE International Conference on Human-Robot In- teraction (HRI), 2013. 3
work page 2013
-
[27]
Yan Duan, Marcin Andrychowicz, Bradly C. Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, P. Abbeel, and Wojciech Zaremba. One-shot imitation learning. ArXiv, abs/1703.07326, 2017. 3
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[28]
Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets
Frederik Ebert, Yanlai Yang, Karl Schmeck- peper, Bernadette Bucher, Georgios Georgakis, Kostas Daniilidis, Chelsea Finn, and Sergey Levine. Bridge data: Boosting generalization of robotic skills with cross-domain datasets. ArXiv, abs/2109.13396, 2021. 3
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[29]
Perceptual Values from Observation
Ashley D Edwards and Charles L Isbell. Per- ceptual values from observation. arXiv pre- print arXiv:1905.07861, 2019. 3
work page internal anchor Pith review Pith/arXiv arXiv 1905
-
[30]
Learning manipulation skills from a single demonstra- tion
Peter Englert and Marc Toussaint. Learning manipulation skills from a single demonstra- tion. The International Journal of Robotics Re- search, 37(1):137–154, 2018. 3
work page 2018
-
[31]
Rh20t: A comprehensive robotic dataset for learning diverse skills in one-shot
Hao-Shu Fang, Hongjie Fang, Zhenyu Tang, Jirong Liu, Chenxi Wang, Junbo Wang, Haoyi Zhu, and Cewu Lu. Rh20t: A comprehensive robotic dataset for learning diverse skills in one-shot. In Towards Generalist Robots: Learn- ing Paradigms for Scalable Skill Acquisition@ CoRL2023, 2023. 3, 5
work page 2023
-
[32]
Low-cost exoskeletons for learning whole-arm manipulation in the wild
Hongjie Fang, Hao-Shu Fang, Yiming Wang, Jieji Ren, Jingjing Chen, Ruo Zhang, Weiming Wang, and Cewu Lu. Low-cost exoskeletons for learning whole-arm manipulation in the wild. arXiv preprint arXiv:2309.14975, 2023. 3
-
[33]
Optimization based full body control for the atlas robot
Siyuan Feng, Eric Whitman, X Xinjilefu, and Christopher G Atkeson. Optimization based full body control for the atlas robot. In Inter- national Conference on Humanoid Robots, 2014. 2
work page 2014
-
[34]
One-shot visual imitation learning via meta-learning
Chelsea Finn, Tianhe Yu, Tianhao Zhang, Pieter Abbeel, and Sergey Levine. One-shot visual imitation learning via meta-learning. In Conference on robot learning , 2017. 3
work page 2017
-
[35]
Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian S
Peter R. Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian S. Wong, Johnny Lee, Igor Mordatch, and Jonathan Tompson. Implicit behavioral cloning. ArXiv, abs/2109.00137, 2021. 3
-
[36]
Deep whole-body control: learning a unified policy for manipulation and locomotion
Zipeng Fu, Xuxin Cheng, and Deepak Pathak. Deep whole-body control: learning a unified policy for manipulation and locomotion. In Conference on Robot Learning , 2022. 3
work page 2022
-
[37]
Bootstrap your own latent- a new approach to self-supervised learning
Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, 13 Mobile ALOHA: https://mobile-aloha.github.io Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Ghesh- laghi Azar, et al. Bootstrap your own latent- a new approach to self-supervised learning. Advances in neural information processing sys- te...
work page 2020
-
[38]
Multi-skill mobile manip- ulation for object rearrangement
Jiayuan Gu, Devendra Singh Chaplot, Hao Su, and Jitendra Malik. Multi-skill mobile manip- ulation for object rearrangement. ICLR, 2023. 3
work page 2023
-
[39]
Robot learning in homes: Improving general- ization and reducing dataset bias
Abhinav Gupta, Adithyavairavan Murali, Dhi- raj Prakashchand Gandhi, and Lerrel Pinto. Robot learning in homes: Improving general- ization and reducing dataset bias. Advances in neural information processing systems , 2018. 3
work page 2018
-
[40]
Zhang, Shaoqing Ren, and Jian Sun
Kaiming He, X. Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. 2016 IEEE Conference on Com- puter Vision and Pattern Recognition (CVPR) , pages 770–778, 2015. 19
work page 2016
-
[41]
Vision-based ma- nipulators need to also see from their hands
Kyle Hsu, Moo Jin Kim, Rafael Rafailov, Jia- jun Wu, and Chelsea Finn. Vision-based ma- nipulators need to also see from their hands. ArXiv, abs/2203.12677, 2022. URL https://api. semanticscholar.org/CorpusID:247628166. 9
-
[42]
Causal policy gradient for whole- body mobile manipulation
Jiaheng Hu, Peter Stone, and Roberto Martín- Martín. Causal policy gradient for whole- body mobile manipulation. arXiv preprint arXiv:2305.04866, 2023. 3
-
[43]
Skill transformer: A monolithic policy for mobile manipulation
Xiaoyu Huang, Dhruv Batra, Akshara Rai, and Andrew Szot. Skill transformer: A monolithic policy for mobile manipulation. InProceedings of the IEEE/CVF International Conference on Computer Vision, 2023. 3
work page 2023
-
[44]
Dynam- ical movement primitives: learning attractor models for motor behaviors
Auke Jan Ijspeert, Jun Nakanishi, Heiko Hoff- mann, Peter Pastor, and Stefan Schaal. Dynam- ical movement primitives: learning attractor models for motor behaviors. Neural computa- tion, 2013. 3
work page 2013
-
[45]
Bilateral humanoid teleoper- ation system using whole-body exoskeleton cockpit tablis
Yasuhiro Ishiguro, Tasuku Makabe, Yuya Nagamatsu, Yuta Kojio, Kunio Kojima, Fumi- hito Sugai, Yohei Kakiuchi, Kei Okada, and Masayuki Inaba. Bilateral humanoid teleoper- ation system using whole-body exoskeleton cockpit tablis. IEEE Robotics and Automation Letters, 2020. 3
work page 2020
-
[46]
Stephen James, Michael Bloesch, and An- drew J. Davison. Task-embedded control net- works for few-shot imitation learning. ArXiv, abs/1810.03237, 2018. 3
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[47]
Bc-z: Zero-shot task generalization with robotic imitation learning
Eric Jang, Alex Irpan, Mohi Khansari, Daniel Kappler, Frederik Ebert, Corey Lynch, Sergey Levine, and Chelsea Finn. Bc-z: Zero-shot task generalization with robotic imitation learning. In Conference on Robot Learning , 2022. 3
work page 2022
-
[48]
Robot learning of mobile manipula- tion with reachability behavior priors
Snehal Jauhri, Jan Peters, and Georgia Chal- vatzaki. Robot learning of mobile manipula- tion with reachability behavior priors. IEEE Robotics and Automation Letters , 2022. 3
work page 2022
-
[49]
Coarse-to-fine imitation learn- ing: Robot manipulation from a single demon- stration
Edward Johns. Coarse-to-fine imitation learn- ing: Robot manipulation from a single demon- stration. 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 4613– 4619, 2021. 3
work page 2021
-
[50]
Coarse-to-fine imitation learn- ing: Robot manipulation from a single demon- stration
Edward Johns. Coarse-to-fine imitation learn- ing: Robot manipulation from a single demon- stration. In 2021 IEEE international conference on robotics and automation (ICRA), pages 4613–
work page 2021
-
[51]
Team ihmc’s lessons learned from the darpa robotics challenge trials
Matthew Johnson, Brandon Shrewsbury, Syl- vain Bertrand, Tingfan Wu, Daniel Du- ran, Marshall Floyd, Peter Abeles, Douglas Stephen, Nathan Mertins, Alex Lesman, et al. Team ihmc’s lessons learned from the darpa robotics challenge trials. Journal of Field Robotics, 2015. 3
work page 2015
-
[52]
Force strategies for cooperative tasks in multiple mobile manipulation systems
Oussama Khatib, K Yokoi, K Chang, D Ruspini, R Holmberg, A Casal, and A Baader. Force strategies for cooperative tasks in multiple mobile manipulation systems. In Robotics Re- search: The Seventh International Symposium ,
-
[53]
Doik Kim, Bum-Jae You, and Sang-Rok Oh. Whole body motion control framework for ar- bitrarily and simultaneously assigned upper- body tasks and walking motion. Modeling, Simulation and Optimization of Bipedal Walk- ing, 2013. 3
work page 2013
-
[54]
Robot peels banana with goal- conditioned dual-action deep imitation learn- ing
Heecheol Kim, Yoshiyuki Ohmura, and Ya- suo Kuniyoshi. Robot peels banana with goal- conditioned dual-action deep imitation learn- ing. ArXiv, abs/2203.09749, 2022. 3
-
[55]
Learning motor primitives for robotics
Jens Kober and Jan Peters. Learning motor primitives for robotics. In 2009 IEEE Interna- tional Conference on Robotics and Automation ,
work page 2009
-
[56]
The darpa robotics challenge finals: Results and perspectives
Eric Krotkov, Douglas Hackett, Larry Jackel, Michael Perschbacher, James Pippine, Jesse Strauss, Gill Pratt, and Christopher Orlowski. The darpa robotics challenge finals: Results and perspectives. The DARPA Robotics Chal- lenge Finals: Humanoid Robots To The Rescue ,
-
[57]
Learning latent 14 Mobile ALOHA: https://mobile-aloha.github.io plans from play
Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, and Pierre Sermanet. Learning latent 14 Mobile ALOHA: https://mobile-aloha.github.io plans from play. In Conference on robot learn- ing, pages 1113–1132. PMLR, 2020. 3
work page 2020
-
[58]
Yuntao Ma, Farbod Farshidian, Takahiro Miki, Joonho Lee, and Marco Hutter. Combining learning-based locomotion policy with model- based manipulation for legged mobile manip- ulators. IEEE Robotics and Automation Letters ,
-
[59]
Ajay Mandlekar, Danfei Xu, J. Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei-Fei, Silvio Savarese, Yuke Zhu, and Roberto Mart’in-Mart’in. What matters in learning from offline human demonstrations for robot manipulation. InConference on Robot Learning, 2021. 3
work page 2021
-
[60]
arXiv preprint arXiv:2203.12601 (2022)
Suraj Nair, Aravind Rajeswaran, Vikash Ku- mar, Chelsea Finn, and Abhinav Gupta. R3m: A universal visual representation for robot manipulation. arXiv preprint arXiv:2203.12601,
-
[61]
Octo: An open-source generalist robot policy
Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Charles Xu, Jian- lan Luo, Tobias Kreiman, You Liang Tan, Dorsa Sadigh, Chelsea Finn, and Sergey Levine. Octo: An open-source generalist robot policy. https://octo-models.github.io, 2023. 3, 5
work page 2023
-
[62]
Using proba- bilistic movement primitives in robotics
Alexandros Paraschos, Christian Daniel, Jan Peters, and Gerhard Neumann. Using proba- bilistic movement primitives in robotics. Au- tonomous Robots, 42:529–551, 2018. 3
work page 2018
-
[63]
The surprising ef- fectiveness of representation learning for visual imitation
Jyothish Pari, Nur Muhammad Shafiullah, Sridhar Pandian Arunachalam, and Lerrel Pinto. The surprising effectiveness of repre- sentation learning for visual imitation. arXiv preprint arXiv:2112.01511, 2021. 3, 5, 8, 9
-
[64]
Learning and generaliza- tion of motor skills by learning from demon- stration
Peter Pastor, Heiko Hoffmann, Tamim Asfour, and Stefan Schaal. Learning and generaliza- tion of motor skills by learning from demon- stration. 2009 IEEE International Conference on Robotics and Automation , pages 763–768,
work page 2009
-
[65]
Luigi Penco, Nicola Scianca, Valerio Modugno, Leonardo Lanari, Giuseppe Oriolo, and Serena Ivaldi. A multimode teleoperation framework for humanoid loco-manipulation: An appli- cation for the icub robot. IEEE Robotics & Automation Magazine, 2019. 3
work page 2019
-
[66]
Learning of compliant human–robot interaction using full- body haptic interface
Luka Peternel and Jan Babič. Learning of compliant human–robot interaction using full- body haptic interface. Advanced Robotics, 2013. 3
work page 2013
- [67]
-
[68]
Dynamic mobile manipulation via whole-body bilateral teleoperation of a wheeled humanoid
Amartya Purushottam, Yeongtae Jung, Christopher Xu, and Joao Ramos. Dynamic mobile manipulation via whole-body bilateral teleoperation of a wheeled humanoid. arXiv preprint arXiv:2307.01350, 2023. 3
-
[69]
Real-world robot learning with masked visual pre-training
Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, and Trevor Dar- rell. Real-world robot learning with masked visual pre-training. CoRL, 2022. 3
work page 2022
-
[70]
Robot learning with sensorimotor pre- training
Ilija Radosavovic, Baifeng Shi, Letian Fu, Ken Goldberg, Trevor Darrell, and Jitendra Ma- lik. Robot learning with sensorimotor pre- training. arXiv preprint arXiv:2306.10007, 2023. 9
-
[71]
Rouhollah Rahmatizadeh, Pooya Abol- ghasemi, Ladislau Bölöni, and Sergey Levine. Vision-based multi-task manipulation for inexpensive robots using end-to-end learning from demonstration. 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 3758–3765, 2017. 3
work page 2018
-
[72]
Humanoid dy- namic synchronization through whole-body bilateral feedback teleoperation
Joao Ramos and Sangbae Kim. Humanoid dy- namic synchronization through whole-body bilateral feedback teleoperation. IEEE Trans- actions on Robotics, 2018. 3
work page 2018
-
[73]
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional net- works for biomedical image segmentation. ArXiv, abs/1505.04597, 2015. URL https://api. semanticscholar.org/CorpusID:3719281. 19
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[74]
La- tent plans for task-agnostic offline reinforce- ment learning
Erick Rosete-Beas, Oier Mees, Gabriel Kalweit, Joschka Boedecker, and Wolfram Burgard. La- tent plans for task-agnostic offline reinforce- ment learning. In Conference on Robot Learn- ing, pages 1838–1849. PMLR, 2023. 3
work page 2023
-
[75]
Nim- bro avatar: Interactive immersive telepresence with force-feedback telemanipulation
Max Schwarz, Christian Lenz, Andre Rochow, Michael Schreiber, and Sven Behnke. Nim- bro avatar: Interactive immersive telepresence with force-feedback telemanipulation. In 2021 IEEE/RSJ International Conference on Intelli- gent Robots and Systems (IROS) , pages 5312– 5319, 2021. 3
work page 2021
-
[76]
Deep imitation learning for humanoid loco-manipulation through human teleoperation
Mingyo Seo, Steve Han, Kyutae Sim, Se- ung Hyeon Bang, Carlos Gonzalez, Luis Sentis, and Yuke Zhu. Deep imitation learning for humanoid loco-manipulation through human teleoperation. Humanoids, 2023. 3
work page 2023
-
[77]
Behavior transformers: Cloning k modes with one stone
Nur Muhammad (Mahi) Shafiullah, Zichen Jeff Cui, Ariuntuya Altanzaya, and Lerrel Pinto. Behavior transformers: Cloning k modes with one stone. ArXiv, abs/2206.11251, 2022. 3 15 Mobile ALOHA: https://mobile-aloha.github.io
-
[78]
Nur Muhammad Mahi Shafiullah, Anant Rai, Haritheja Etukuru, Yiqian Liu, Ishan Misra, Soumith Chintala, and Lerrel Pinto. On bringing robots home. arXiv preprint arXiv:2311.16098, 2023. 3
-
[79]
Gnm: A general navigation model to drive any robot
Dhruv Shah, Ajay Sridhar, Arjun Bhorkar, Noriaki Hirose, and Sergey Levine. Gnm: A general navigation model to drive any robot. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 7226–
work page 2023
-
[80]
Con- cept2robot: Learning manipulation concepts from instructions and human demonstrations
Lin Shao, Toki Migimatsu, Qiang Zhang, Karen Yang, and Jeannette Bohg. Con- cept2robot: Learning manipulation concepts from instructions and human demonstrations. The International Journal of Robotics Research , 40(12-14):1419–1434, 2021. 3
work page 2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.