A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation

Aditya Bhat; Aimee Goncalves; Alejandro Castro; Alex Alspach; Allison Henry; Andrew Beaulieu; Aykut Onol; Basile Van Hoorick; Benjamin Burchfiel; Blake Wulfe

arxiv: 2507.05331 · v1 · pith:ZAXTO6RHnew · submitted 2025-07-07 · 💻 cs.RO

TRI LBM Team, Jose Barreiros, Andrew Beaulieu, Aditya Bhat, Rick Cory, Eric Cousineau, Hongkai Dai, Ching-Hsin Fang, Kunimatsu Hashimoto, Muhammad Zubair Irshad, Masha Itkina, Naveen Kuppuswamy, Kuan-Hui Lee, Katherine Liu, Dale McConachie, Ian McMahon, Haruki Nishimura, Calder Phillips-Grafflin, Charles Richter, Paarth Shah, Krishnan Srinivasan, Blake Wulfe, Chen Xu, Mengchao Zhang, Alex Alspach, Maya Angeles, Kushal Arora, Vitor Campagnolo Guizilini, Alejandro Castro, Dian Chen, Ting-Sheng Chu, Sam Creasey, Sean Curtis, Richard Denitto, Emma Dixon, Eric Dusel, Matthew Ferreira, Aimee Goncalves, Grant Gould, Damrong Guoy, Swati Gupta, Xuchen Han, Kyle Hatch, Brendan Hathaway, Allison Henry, Hillel Hochsztein, Phoebe Horgan, Shun Iwase, Donovon Jackson, Siddharth Karamcheti, Sedrick Keh, Joseph Masterjohn, Jean Mercat, Patrick Miller, Paul Mitiguy, Tony Nguyen, Jeremy Nimmer, Yuki Noguchi, Reko Ong, Aykut Onol, Owen Pfannenstiehl, Richard Poyner, Leticia Priebe Mendes Rocha, Gordon Richardson, Christopher Rodriguez, Derick Seale, Michael Sherman, Mariah Smith-Jones, David Tago, Pavel Tokmakov, Matthew Tran, Basile Van Hoorick, Igor Vasiljevic, Sergey Zakharov, Mark Zolotas, Rares Ambrus, Kerri Fetzer-Borelli, Benjamin Burchfiel, Hadas Kress-Gazit, Siyuan Feng, Stacie Ford, Russ Tedrake

Pith reviewed 2026-05-25 04:29 UTC · model grok-4.3

classification 💻 cs.RO

keywords: large behavior models, multitask learning, dexterous manipulation, imitation learning, robot foundation models, diffusion policy, pretraining, sample efficiency

The pith

Multi-task pretraining makes robot policies more successful, robust, and data-efficient than single-task training for dexterous manipulation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates large behavior models by extending the Diffusion Policy approach over a mix of simulated and real robot data for multitask dexterous manipulation. It compares these models against single-task baselines using blind, randomized trials in controlled settings. Multi-task pretraining improves success rates and robustness while allowing new complex tasks to be taught faster with far less data. Performance rises in a predictable way as the scale and diversity of the pretraining data increase. The work supplies a validated evaluation pipeline to support these comparisons with statistical confidence.

Core claim

Multi-task pretraining on a corpus of robot data produces policies that are more successful and robust than single-task policies, allow quicker teaching of new complex tasks with a fraction of the data, and show performance that improves predictably with greater pretraining scale and diversity.

What carries the argument

An evaluation pipeline that analyzes multitask policies with statistical confidence through blind randomized trials on simulated and real-world data.
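
One way to attach statistical confidence to a success-rate comparison is a binomial confidence interval per policy. The sketch below uses the Wilson score interval, which is a standard choice and not necessarily the procedure the paper uses. The function name `wilson_interval` and the trial counts are hypothetical and chosen only for illustration.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial success rate (z = 1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials)
    ) / denom
    # Clamp to [0, 1] to absorb floating-point round-off at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical counts for illustration only; these are NOT results from the paper.
multi_lo, multi_hi = wilson_interval(36, 50)
single_lo, single_hi = wilson_interval(24, 50)
print(f"multi-task:  [{multi_lo:.3f}, {multi_hi:.3f}]")
print(f"single-task: [{single_lo:.3f}, {single_hi:.3f}]")
```

Non-overlapping intervals are a conservative sign of a real difference; overlapping intervals do not by themselves show that two policies perform equally.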

If this is right

Multi-task policies achieve higher success rates and greater robustness than single-task baselines across the evaluated tasks.
New tasks can be taught with substantially less data when starting from a multi-task pretrained model.
Performance gains continue in a predictable manner as pretraining data volume and task diversity increase.
The advantages appear in both simulation and real-world blind trials.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same scaling pattern may apply to other robot learning settings that currently rely on task-specific training.
Collecting larger and more varied robot datasets could accelerate progress toward general manipulation capabilities.
The results motivate experiments that combine these behavior models with language or vision inputs for further gains.
Future work could test whether the observed data-efficiency benefits persist when the new task lies far outside the pretraining distribution.

Load-bearing premise

The selected tasks and data composition give an unbiased test of general multitask benefits that would apply to other dexterous manipulation problems.

What would settle it

A follow-up study that trains and tests single-task and multi-task policies on a fresh corpus of tasks chosen without regard to the original data distribution and finds no advantage for multi-task pretraining.

Original abstract

Robot manipulation has seen tremendous progress in recent years, with imitation learning policies enabling successful performance of dexterous and hard-to-model tasks. Concurrently, scaling data and model size has led to the development of capable language and vision foundation models, motivating large-scale efforts to create general-purpose robot foundation models. While these models have garnered significant enthusiasm and investment, meaningful evaluation of real-world performance remains a challenge, limiting both the pace of development and inhibiting a nuanced understanding of current capabilities. In this paper, we rigorously evaluate multitask robot manipulation policies, referred to as Large Behavior Models (LBMs), by extending the Diffusion Policy paradigm across a corpus of simulated and real-world robot data. We propose and validate an evaluation pipeline to rigorously analyze the capabilities of these models with statistical confidence. We compare against single-task baselines through blind, randomized trials in a controlled setting, using both simulation and real-world experiments. We find that multi-task pretraining makes the policies more successful and robust, and enables teaching complex new tasks more quickly, using a fraction of the data when compared to single-task baselines. Moreover, performance predictably increases as pretraining scale and diversity grows. Project page: https://toyotaresearchinstitute.github.io/lbm1/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Multi-task pretraining on mixed sim and real data beats single-task baselines on success, robustness, and few-shot adaptation for dexterous tasks, with predictable scaling; the blind trials are the main strength.

The main takeaway is that multi-task pretraining on a diverse corpus of simulated and real robot data produces policies that succeed more often, handle perturbations better, and pick up new complex tasks with far less data than single-task training from scratch. Performance also rises steadily as pretraining scale and variety increase. They reach these conclusions by extending Diffusion Policy across the corpus and running blind randomized trials in both simulation and real hardware, with statistical confidence intervals on the comparisons. That setup is stronger than the usual robot learning paper, which often relies on small numbers of hand-picked runs or no randomization at all. The quantified gains in robustness and data efficiency for dexterous manipulation are the concrete new evidence here, even if the underlying multi-task versus single-task idea is familiar from other domains. The evaluation pipeline itself is a useful contribution for anyone trying to measure these models reliably.

The soft spot is the task corpus and data composition. If the chosen tasks share primitives, objects, or state distributions, the multi-task model can exploit that overlap while single-task baselines cannot, which would inflate the reported advantages. The abstract gives little detail on task selection criteria, filtering steps, or explicit checks for independence, so it is hard to judge how much the results depend on the specific mix. The stress-test note correctly flags this as the least secure part of the central claim.

This paper is for groups working on scaling imitation learning or robot foundation models. Anyone running experiments on policy pretraining will find the scaling observations and the evaluation method directly usable. It deserves peer review because the empirical controls are solid enough for referees to test the claims rather than just accept them at face value.

Referee Report

1 major / 2 minor

Summary. The paper extends the Diffusion Policy framework to train Large Behavior Models (LBMs) via multi-task pretraining on a corpus of simulated and real-world dexterous manipulation data. Through blind randomized trials with statistical analysis, it claims that multi-task pretraining yields higher success rates and robustness than single-task baselines, enables faster adaptation to novel tasks with substantially less data, and exhibits predictable performance gains as pretraining scale and diversity increase.

Significance. If the empirical claims hold, the work supplies statistically grounded evidence that multi-task pretraining confers concrete advantages in sample efficiency and robustness for robot manipulation policies. The use of blind randomized trials and controlled real-world experiments is a notable strength that reduces experimenter bias and supports reproducibility in the field.
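
To illustrate what a blind, randomized trial schedule can look like in practice, here is a minimal sketch. It is an assumption about the general technique, not the paper's pipeline: every policy is paired with every initial condition, the order is shuffled, and the operator sees only opaque trial IDs while a separate key maps each ID back to its policy. The names `blind_trial_schedule`, `operator_view`, and `key` are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def blind_trial_schedule(
    policies: Sequence[str],
    conditions: Sequence[str],
    seed: int = 0,
) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Build a shuffled schedule whose operator-facing view hides the policy."""
    rng = random.Random(seed)
    # Full factorial design: each policy is run on each initial condition.
    pairs = [(policy, condition) for policy in policies for condition in conditions]
    rng.shuffle(pairs)

    operator_view: List[Tuple[str, str]] = []
    key: Dict[str, str] = {}
    for index, (policy, condition) in enumerate(pairs):
        trial_id = f"trial-{index:03d}"  # opaque label, carries no policy information
        operator_view.append((trial_id, condition))
        key[trial_id] = policy  # kept away from the operator until scoring is done
    return operator_view, key


# Hypothetical policy and condition names for illustration only.
view, key = blind_trial_schedule(["multi_task", "single_task"], ["c0", "c1", "c2"], seed=1)
for trial_id, condition in view:
    print(trial_id, condition)
```

Using a per-trial ID rather than a fixed alias per policy keeps the operator from gradually inferring which alias belongs to the stronger policy.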

major comments (1)

[§4] §4 (Evaluation Pipeline) and the task corpus description: the central claim that multi-task benefits are general requires explicit evidence that tasks are sufficiently independent (e.g., non-overlapping state distributions or skill primitives). Without reported controls or ablation on task selection criteria, it remains possible that shared structure in the corpus favors multi-task training over single-task baselines.

minor comments (2)

[Abstract] Abstract: provide one additional sentence summarizing the exact number of tasks, total demonstrations, and filtering criteria used in the pretraining corpus.
[§5] Figure captions and §5: ensure all success-rate plots include the number of trials per condition and the exact statistical test employed.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for minor revision. The concern about task independence is well-taken, and we address it directly below.

Point-by-point responses

Referee: [§4] §4 (Evaluation Pipeline) and the task corpus description: the central claim that multi-task benefits are general requires explicit evidence that tasks are sufficiently independent (e.g., non-overlapping state distributions or skill primitives). Without reported controls or ablation on task selection criteria, it remains possible that shared structure in the corpus favors multi-task training over single-task baselines.

Authors: We agree that stronger evidence of task independence would better support the generality claim. Our corpus comprises 20 tasks drawn from distinct sources (simulation and real-world) with varied objects, initial states, and skill primitives (e.g., in-hand reorientation, tool use, and bimanual coordination). Performance scaling with both dataset size and diversity (Figure 7) provides indirect support that gains are not solely due to overlap. Nevertheless, we will add to §4 an analysis of pairwise state-distribution distances (using maximum mean discrepancy on proprioceptive and visual features) across tasks, plus an ablation that removes the most similar task pairs and re-trains. These additions will appear in the revised manuscript.

revision: yes
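
The maximum mean discrepancy (MMD) analysis mentioned in this response could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the biased (V-statistic) estimator with a Gaussian RBF kernel and a fixed bandwidth, and the feature vectors are synthetic 2-D samples standing in for proprioceptive or visual features. The names `rbf_kernel` and `mmd_squared` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], bandwidth: float) -> float:
    """Gaussian RBF kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mmd_squared(
    xs: List[Sequence[float]],
    ys: List[Sequence[float]],
    bandwidth: float = 1.0,
) -> float:
    """Biased (V-statistic) estimate of the squared MMD between two samples."""

    def mean_kernel(a: List[Sequence[float]], b: List[Sequence[float]]) -> float:
        total = sum(rbf_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Synthetic features for illustration only; these are NOT data from the paper.
rng = random.Random(0)
task_a = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
task_b = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]  # same distribution as task_a
task_c = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]  # shifted distribution

near = mmd_squared(task_a, task_b)
far = mmd_squared(task_a, task_c)
print(f"similar tasks:    {near:.3f}")
print(f"dissimilar tasks: {far:.3f}")
```

Task pairs with the smallest pairwise values would be the candidates for the proposed ablation; in practice the bandwidth is often set from the median pairwise distance rather than fixed at 1.0.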

Circularity Check

0 steps flagged

No circularity: empirical comparison of trained policies with no derivations or self-referential reductions.

Full rationale

The paper performs direct empirical evaluation of multi-task vs. single-task policies using imitation learning on a corpus of simulated and real data, with blind randomized trials. No equations, fitted parameters, or derivations are presented that reduce reported performance gains (success rates, robustness, adaptation speed) to quantities defined by the paper's own inputs or self-citations. The central claims rest on experimental measurements rather than any self-definitional, fitted-input, or uniqueness-theorem structure. Self-citations (e.g., to Diffusion Policy) are external and not load-bearing for the comparison results. This matches the default expectation for non-circular empirical work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard imitation learning assumptions and the validity of the experimental design rather than new free parameters, axioms, or invented entities.

axioms (1)

Domain assumption: The Diffusion Policy architecture can be extended to multitask pretraining while preserving its core learning properties.
The paper builds directly on extending the Diffusion Policy paradigm to LBMs without additional justification in the abstract.

Forward citations

Cited by 23 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics
cs.RO 2026-04 conditional novelty 8.0

Open-H-Embodiment is the largest open multi-embodiment medical robotics dataset, used to train GR00T-H, the first open vision-language-action model that achieves end-to-end suturing completion where prior models fail.
π₀.₇: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities
cs.LG 2026-04 unverdicted novelty 7.0

π₀.₇ is a steerable generalist robotic model that uses rich multimodal prompts including language, subgoal images, and performance metadata to achieve out-of-the-box generalization across tasks and robot bodies.
Large Video Planner Enables Generalizable Robot Control
cs.RO 2025-12 conditional novelty 7.0

A video foundation model trained on human demonstrations generates zero-shot plans that convert to executable robot actions on novel scenes and tasks.
Instrumentation for Imitation Learning: Enhancing Training Datasets for Clothes Hanger Insertion
cs.RO 2026-05 unverdicted novelty 6.0

Instrumented objects boost diffusion policy success in robotic hanger insertion by 14-25 percentage points over vision-only baselines, and augmenting datasets with instrumented expert rollouts lets a vision-only stude...
Semantically Structured Mixture-of-Experts for Compositional Robotic Manipulation
cs.RO 2026-05 unverdicted novelty 6.0

SMoDP routes action chunks in a diffusion policy to semantically specialized experts via a VLM-supervised skill predictor and dual contrastive alignment, achieving better efficiency and compositional transfer than baselines.
Safe and Steerable Geometric Motion Policies for Robotic Dexterous Manipulation
cs.RO 2026-05 unverdicted novelty 6.0

SafePBDS uses pullback control barrier functions and a task manifold action interface to generate certifiably safe, steerable motions on high-DOF robots from objectives defined on arbitrary geometric spaces.
Beyond Action Residuals: Real-World Robot Policy Steering via Bottleneck Latent Reinforcement Learning
cs.RO 2026-05 unverdicted novelty 6.0

ZPRL adapts frozen flow-matching imitation policies via RL perturbations on a task-relevant bottleneck latent, yielding 33.7% higher average success on four real-world manipulation tasks than action-residual baselines.
Distributionally Robust Control via Stein Variational Inference for Contact-Rich Manipulation
cs.RO 2026-05 unverdicted novelty 6.0

Introduces a Stein variational inference-based deterministic formulation for distributionally robust control in contact-rich robotic manipulation, reporting up to 3x improved robustness under parametric uncertainty.
From a Single Demonstration to a General Policy for Contact-Rich Manipulation
cs.RO 2026-05 unverdicted novelty 6.0

A one-shot LfD framework abstracts a single demonstration into environmental-constraint primitives, then uses self-exploration, human corrections, and compliant recovery to produce a policy that generalizes across pos...
WarmPrior: Straightening Flow-Matching Policies with Temporal Priors
cs.LG 2026-05 unverdicted novelty 6.0

Replacing Gaussian noise with a temporally grounded prior from recent actions straightens flow-matching paths and improves success rates in robotic manipulation and prior-space RL.
Long-Horizon Manipulation via Trace-Conditioned VLA Planning
cs.RO 2026-04 unverdicted novelty 6.0

LoHo-Manip enables robust long-horizon robot manipulation by using a receding-horizon VLM manager to output progress-aware subtask sequences and 2D visual traces that condition a VLA executor for automatic replanning.
A Mechanistic Analysis of Sim-and-Real Co-Training in Generative Robot Policies
cs.RO 2026-04 unverdicted novelty 6.0

Sim-and-real co-training for robot policies is driven primarily by balanced cross-domain representation alignment and secondarily by domain-dependent action reweighting.
Red-Teaming Vision-Language-Action Models via Quality Diversity Prompt Generation for Robust Robot Policies
cs.RO 2026-03 unverdicted novelty 6.0

Q-DIG applies quality diversity optimization with vision-language models to generate diverse adversarial instructions that reveal VLA robot failures and enable robustness improvements via fine-tuning.
HoMMI: Learning Whole-Body Mobile Manipulation from Human Demonstrations
cs.RO 2026-03 unverdicted novelty 6.0

HoMMI learns whole-body mobile manipulation policies from robot-free human demonstrations by augmenting UMI with egocentric sensing and bridging the embodiment gap through an agnostic visual representation, relaxed he...
World Action Models are Zero-shot Policies
cs.RO 2026-02 unverdicted novelty 6.0

DreamZero uses a 14B video diffusion model as a World Action Model to achieve over 2x better zero-shot generalization on real robots than state-of-the-art VLAs, real-time 7Hz closed-loop control, and cross-embodiment ...
Learning Native Continuation for Action Chunking Flow Policies
cs.RO 2026-02 unverdicted novelty 6.0

Legato trains flow-based VLA policies with schedule-shaped action-noise mixtures and randomized conditions to achieve smoother trajectories and ~10% faster task completion than real-time chunking across five real-worl...
SPEAR-1: Scaling Beyond Robot Demonstrations via 3D Understanding
cs.RO 2025-11 unverdicted novelty 6.0

SPEAR-1 combines a 3D-enriched VLM with embodied control to match or exceed existing robotic foundation models using 20 times fewer robot demonstrations.
Video Generators are Robot Policies
cs.RO 2025-08 conditional novelty 6.0

Training models to generate videos of robot actions produces policies that generalize better to new objects and tasks while using far less demonstration data than standard behavior cloning.
VLA Foundry: A Unified Framework for Training Vision-Language-Action Models
cs.RO 2026-04 unverdicted novelty 5.0

VLA Foundry provides a single training stack for VLA models and releases open models that match prior closed-source performance or outperform baselines on multi-task manipulation in simulation.
Causal World Modeling for Robot Control
cs.CV 2026-01 unverdicted novelty 5.0

LingBot-VA combines video world modeling with policy learning via Mixture-of-Transformers, closed-loop rollouts, and asynchronous inference to improve robot manipulation in simulation and real settings.
Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input
cs.RO 2025-12 conditional novelty 5.0

A four-stage RL system with teacher-student distillation and online constrained adaptation enables humanoid robots to achieve robust ball-kicking accuracy under noisy perception in simulation and on physical hardware.
Contact-Rich Robotic Assembly in Construction via Diffusion Policy Learning
cs.RO 2025-11 unverdicted novelty 5.0

Diffusion policies achieve 100% success on nominal mortise-tenon timber assembly and 75% average success under randomized 10 mm perturbations using force/torque sensing on an industrial robot.
GR-3 Technical Report
cs.RO 2025-07 unverdicted novelty 5.0

GR-3 is a VLA model that generalizes to novel objects, environments, and abstract instructions, outperforms the π0 baseline, and integrates with the new ByteMini bi-manual mobile robot.

Reference graph

Works this paper leans on

93 extracted references · 93 canonical work pages · cited by 23 Pith papers · 28 internal anchors

[1]

Diffusion policy: Visuomotor policy learning via action diffusion,

C. Chi, Z. Xu, S. Feng, E. Cousineau, Y. Du, B. Burchfiel, R. Tedrake, and S. Song, “Diffusion policy: Visuomotor policy learning via action diffusion,” The International Journal of Robotics Research, 2024

work page 2024
[2]

Learning fine-grained bimanual manipulation with low-cost hardware,

T. Z. Zhao, V. Kumar, S. Levine, and C. Finn, “Learning fine-grained bimanual manipulation with low-cost hardware,” in Robotics: Science and Systems XIX . Robotics: Science and Systems Foundation, 2023. [Online]. Available: https: //roboticsproceedings.org/rss19/p078.pdf

work page 2023
[3]

Aloha unleashed: A simple recipe for robot dexterity,

T. Z. Zhao, J. Tompson, D. Driess, P. Florence, K. Ghasemipour, C. Finn, and A. Wahid, “Aloha unleashed: A simple recipe for robot dexterity,” in8th Annual Conference on Robot Learning , 2024

work page 2024
[4]

Octo: An Open-Source Generalist Robot Policy

O. M. Team, D. Ghosh, H. Walke, K. Pertsch, K. Black, O. Mees, S. Dasari, J. Hejna, T. Kreiman, C. Xu, J. Luo, Y. L. Tan, L. Y. Chen, P. Sanketi, Q. Vuong, T. Xiao, D. Sadigh, C. Finn, and S. Levine, “Octo: An open-source generalist robot policy,” 2024. [Online]. Available: https://arxiv.org/abs/2405.12213

work page internal anchor Pith review Pith/arXiv arXiv 2024
[5]

$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control

K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichter, S. Jakubczak, T. Jones, L. Ke, S. Levine, A. Li-Bell, M. Mothukuri, S. Nair, K. Pertsch, L. X. Shi, J. Tanner, Q. Vuong, A. Walling, H. Wang, and U. Zhilinsky, “ 𝜋0: A vision-language-action flow model for general robot control,” 2024. [Online]. Availabl...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[6]

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

NVIDIA, :, J. Bjorck, F. Casta ˜neda, N. Cherniadev, X. Da, R. Ding, L. J. Fan, Y. Fang, D. Fox, F. Hu, S. Huang, J. Jang, Z. Jiang, J. Kautz, K. Kundalia, L. Lao, Z. Li, Z. Lin, K. Lin, G. Liu, E. Llontop, L. Magne, A. Mandlekar, A. Narayan, S. Nasiriany, S. Reed, Y. L. Tan, G. Wang, Z. Wang, J. Wang, Q. Wang, J. Xiang, Y. Xie, Y. Xu, Z. Xu, S. Ye, Z. Yu...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[7]

Gemini Robotics: Bringing AI into the Physical World

G. R. Team, S. Abeyruwan, J. Ainslie, J.-B. Alayrac, M. G. Arenas, T. Armstrong, A. Balakrishna, R. Baruch, M. Bauza, M. Blokzijl, S. Bohez, K. Bousmalis, A. Brohan, T. Buschmann, A. Byravan, S. Cabi, K. Caluwaerts, F. Casarini, O. Chang, J. E. Chen, X. Chen, H.-T. L. Chiang, K. Choromanski, D. D’ Ambrosio, S. Dasari, T. Davchev, C. Devin, N. D. Palo, T. ...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[8]

Scaling proprioceptive- visual learning with heterogeneous pre-trained transformers,

L. Wang, X. Chen, J. Zhao, and K. He, “Scaling proprioceptive- visual learning with heterogeneous pre-trained transformers,” Advances in neural information processing systems , vol. 37, pp. 124 420–124 450, 2024

work page 2024
[9]

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

S. Liu, L. Wu, B. Li, H. Tan, H. Chen, Z. Wang, K. Xu, H. Su, and J. Zhu, “RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation,” Mar. 2025, arXiv:2410.07864 [cs]. [Online]. Available: http://arxiv.org/abs/2410.07864

work page internal anchor Pith review Pith/arXiv arXiv 2025
[10]

Segment any- thing,

A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Loet al., “Segment any- thing,” in2023 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2023, pp. 3992–4003

work page 2023
[11]

Learning transferable visual models from natural language supervision,

A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al. , “Learning transferable visual models from natural language supervision,” in International conference on machine learning. PmLR, 2021, pp. 8748–8763

work page 2021
[12]

Sigmoid loss for language image pre-training,

X. Zhai, B. Mustafa, A. Kolesnikov, and L. Beyer, “Sigmoid loss for language image pre-training,” inProceedings of the IEEE/CVF international conference on computer vision , 2023, pp. 11 975– 11 986

work page 2023
[13]

Dinov2: Learning robust visual features without supervision,

M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Noubyet al., “Dinov2: Learning robust visual features without supervision,” Transactions on Machine Learning Research Journal , pp. 1–31, 2024

work page 2024
[14]

GPT-4 Technical Report

OpenAI, J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, R. Avila, I. Babuschkin, S. Balaji, V. Balcom, P. Baltescu, H. Bao, M. Bavarian, J. Belgum, I. Bello, J. Berdine, G. Bernadett-Shapiro, C. Berner, L. Bogdonoff, O. Boiko, M. Boyd, A.-L. Brakman, G. Brockman, T. Brooks, M. Brundag...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[15]

LLaMA: Open and Efficient Foundation Language Models

H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozi `ere, N. Goyal, E. Hambro, F. Azhar et al. , “Llama: Open and efficient foundation language models,” arXiv preprint arXiv:2302.13971, 2023. 19

work page internal anchor Pith review Pith/arXiv arXiv 2023
[16]

Droid: A large-scale in-the-wild robot manipulation dataset,

A. Khazatsky, K. Pertsch, S. Nair, A. Balakrishna, S. Dasari, S. Karamcheti, S. Nasiriany, M. K. Srirama, L. Y. Chen, K. Ellis, P. D. Fagan, J. Hejna, M. Itkina, M. Lepert, Y. J. Ma, P. T. Miller, J. Wu, S. Belkhale, S. Dass, H. Ha, A. Jain, A. Lee, Y. Lee, M. Memmel, S. Park, I. Radosavovic, K. Wang, A. Zhan, K. Black, C. Chi, K. B. Hatch, S. Lin, J. Lu,...

work page 2024
[17]

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

E. Collaboration, A. O’Neill, A. Rehman, A. Gupta, A. Maddukuri, A. Gupta, A. Padalkar, A. Lee, A. Pooley, A. Gupta, A. Mandlekar, A. Jain, A. Tung, A. Bewley, A. Herzog, A. Irpan, A. Khazatsky, A. Rai, A. Gupta, A. Wang, A. Kolobov, A. Singh, A. Garg, A. Kembhavi, A. Xie, A. Brohan, A. Raffin, A. Sharma, A. Yavary, A. Jain, A. Balakrishna, A. Wahid, B. B...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[18]

AgiBot World Colosseo: Large-scale Manipu- lation Platform for Scalable and Intelligent Embodied Systems

T. AgiBot-World, “AgiBot World Colosseo: Large-scale Manipu- lation Platform for Scalable and Intelligent Embodied Systems.”

work page
[19]

Openvla: An open-source vision-language-action model,

M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, Q. Vuong, T. Kollar, B. Burchfiel, R. Tedrake, D. Sadigh, S. Levine, P. Liang, and C. Finn, “Openvla: An open-source vision-language-action model,” in 8th Annual Conference on Robot Learning

work page
[20]

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, X. Chen, K. Choromanski, T. Ding, D. Driess, A. Dubey, C. Finn, P. Florence, C. Fu, M. G. Arenas, K. Gopalakrishnan, K. Han, K. Hausman, A. Herzog, J. Hsu, B. Ichter, A. Irpan, N. Joshi, R. Julian, D. Kalashnikov, Y. Kuang, I. Leal, L. Lee, T.-W. E. Lee, S. Levine, Y. Lu, H. Michalewski, I. Mordatch, K. Perts...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[21]

𝜋0.5: a vision-language-action model with open-world generalization,

P. Intelligence, K. Black, N. Brown, J. Darpinian, K. Dhabalia, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, M. Y. Galliker, D. Ghosh, L. Groom, K. Hausman, B. Ichter, S. Jakubczak, T. Jones, L. Ke, D. LeBlanc, S. Levine, A. Li-Bell, M. Mothukuri, S. Nair, K. Pertsch, A. Z. Ren, L. X. Shi, L. Smith, J. T. Springenberg, K. Stachowicz, J. Tanner, Q. Vu...

work page
[22]

$\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization

[Online]. Available: https://arxiv.org/abs/2504.16054

work page internal anchor Pith review Pith/arXiv arXiv
[23]

Magma: A foundation model for multimodal ai agents,

J. Yang, R. Tan, Q. Wu, R. Zheng, B. Peng, Y. Liang, Y. Gu, M. Cai, S. Ye, J. Janget al., “Magma: A foundation model for multimodal ai agents,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 14 203–14 214

work page 2025
[24]

A generalist agent,

S. Reed, K. Zolna, E. Parisotto, S. G. Colmenarejo, A. Novikov, G. Barth-maron, M. Gim´enez, Y. Sulsky, J. Kay, J. T. Springenberg et al. , “A generalist agent,” Transactions on Machine Learning Research

work page
[25]

Palm-e: An embodied multimodal language model,

D. Driess, F. Xia, M. S. M. Sajjadi, C. Lynch, A. Chowdhery, B. Ichter, A. Wahid, J. Tompson, Q. Vuong, T. Yu, W. Huang, Y. Chebotar, P. Sermanet, D. Duckworth, S. Levine, V. Vanhoucke, K. Hausman, M. Toussaint, K. Greff, A. Zeng, I. Mordatch, and P. Florence, “Palm-e: An embodied multimodal language model,”

work page
[26]

PaLM-E: An Embodied Multimodal Language Model

[Online]. Available: https://arxiv.org/abs/2303.03378

work page internal anchor Pith review Pith/arXiv arXiv
[27]

Robotic Control via Embodied Chain-of-Thought Reasoning

M. Zawalski, W. Chen, K. Pertsch, O. Mees, C. Finn, and S. Levine, “Robotic control via embodied chain-of-thought reasoning,” 2025. [Online]. Available: https://arxiv.org/abs/2407.08693

work page internal anchor Pith review Pith/arXiv arXiv 2025
[28]

On the opportunities and risks of foundation models,

R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill, E. Brynjolfsson, S. Buch, D. Card, R. Castellon, N. Chatterji, A. Chen, K. Creel, J. Q. Davis, D. Demszky, C. Donahue, M. Doumbouya, E. Durmus, S. Ermon, J. Etchemendy, K. Ethayarajh, L. Fei-Fei, C. Finn, T. Gale, L. Gillespie, K. Go...

work page
[29]

On the Opportunities and Risks of Foundation Models

[Online]. Available: https://arxiv.org/abs/2108.07258

work page internal anchor Pith review Pith/arXiv arXiv
[30]

Gemini Robotics: Bringing AI into the Physical World,

G. R. Team, “Gemini Robotics: Bringing AI into the Physical World,” Tech. Rep., Mar. 2025. [Online]. Available: https://deepmind.google/discover/blog/ gemini-robotics-brings-ai-into-the-physical-world/

work page 2025
[31]

RT-1: Robotics Transformer for Real-World Control at Scale

A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, J. Dabis, C. Finn, 20 K. Gopalakrishnan, K. Hausman, A. Herzog, J. Hsu, J. Ibarz, B. Ichter, A. Irpan, T. Jackson, S. Jesmonth, N. J. Joshi, R. Julian, D. Kalashnikov, Y. Kuang, I. Leal, K.-H. Lee, S. Levine, Y. Lu, U. Malla, D. Manjunath, I. Mordatch, O. Nachum, C. Parada, J. Peralta, E. Perez, K. Pertsch, J...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[32]

OpenVLA: An Open-Source Vision-Language-Action Model

M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, Q. Vuong, T. Kollar, B. Burchfiel, R. Tedrake, D. Sadigh, S. Levine, P. Liang, and C. Finn, “OpenVLA: An Open-Source Vision-Language-Action Model,” Sep. 2024, arXiv:2406.09246 [cs]. [Online]. Available: http://arxiv.org/abs/2406.09246

[33]

Language models are few-shot learners,

T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 1877–1901

[34]

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

L. Gao, S. Biderman, S. Black, L. Golding, T. Hoppe, C. Foster, J. Phang, H. He, A. Thite, N. Nabeshima et al., “The pile: An 800gb dataset of diverse text for language modeling,” arXiv preprint arXiv:2101.00027, 2020

[35]

Documenting large webtext corpora: A case study on the colossal clean crawled corpus,

J. Dodge, M. Sap, A. Marasović, W. Agnew, G. Ilharco, D. Groeneveld, M. Mitchell, and M. Gardner, “Documenting large webtext corpora: A case study on the colossal clean crawled corpus,” arXiv preprint arXiv:2104.08758, 2021

[36]

Laion-5b: An open large-scale dataset for training next generation image-text models,

C. Schuhmann, R. Beaumont, R. Vencu, C. Gordon, R. Wightman, M. Cherti, T. Coombes, A. Katta, C. Mullis, M. Wortsman et al., “Laion-5b: An open large-scale dataset for training next generation image-text models,” Advances in neural information processing systems, vol. 35, pp. 25 278–25 294, 2022

[37]

LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

C. Schuhmann, R. Vencu, R. Beaumont, R. Kaczmarczyk, C. Mullis, A. Katta, T. Coombes, J. Jitsev, and A. Komatsuzaki, “Laion-400m: Open dataset of clip-filtered 400 million image-text pairs,” arXiv preprint arXiv:2111.02114, 2021

[38]

Visual instruction tuning,

H. Liu, C. Li, Q. Wu, and Y. J. Lee, “Visual instruction tuning,” in NeurIPS, 2023

[39]

Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets

F. Ebert, Y. Yang, K. Schmeckpeper, B. Bucher, G. Georgakis, K. Daniilidis, C. Finn, and S. Levine, “Bridge data: Boosting generalization of robotic skills with cross-domain datasets,” 2021. [Online]. Available: https://arxiv.org/abs/2109.13396

[40]

Rh20t: A robotic dataset for learning diverse skills in one-shot,

H.-S. Fang, H. Fang, Z. Tang, J. Liu, J. Wang, H. Zhu, and C. Lu, “Rh20t: A robotic dataset for learning diverse skills in one-shot,” in RSS 2023 Workshop on Learning for Task and Motion Planning, 2023

[41]

AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems

AgiBot-World-Contributors, Q. Bu, J. Cai, L. Chen, X. Cui, Y. Ding, S. Feng, S. Gao, X. He, X. Huang, S. Jiang, Y. Jiang, C. Jing, H. Li, J. Li, C. Liu, Y. Liu, Y. Lu, J. Luo, P. Luo, Y. Mu, Y. Niu, Y. Pan, J. Pang, Y. Qiao, G. Ren, C. Ruan, J. Shan, Y. Shen, C. Shi, M. Shi, M. Shi, C. Sima, J. Song, H. Wang, W. Wang, D. Wei, C. Xie, G. Xu, J. Yan, C. Yan...

[42]

Roboverse: Towards a unified platform, dataset and benchmark for scalable and generalizable robot learning,

H. Geng, F. Wang, S. Wei, Y. Li, B. Wang, B. An, C. T. Cheng, H. Lou, P. Li, Y.-J. Wang, Y. Liang, D. Goetting, C. Xu, H. Chen, Y. Qian, Y. Geng, J. Mao, W. Wan, M. Zhang, J. Lyu, S. Zhao, J. Zhang, J. Zhang, C. Zhao, H. Lu, Y. Ding, R. Gong, Y. Wang, Y. Kuang, R. Wu, B. Jia, C. Sferrazza, H. Dong, S. Huang, K. Sreenath, Y. Wang, J. Malik, and P. Abbeel, ...

[43]

Orbit: A unified simulation framework for interactive robot learning environments,

M. Mittal, C. Yu, Q. Yu, J. Liu, N. Rudin, D. Hoeller, J. L. Yuan, R. Singh, Y. Guo, H. Mazhar et al., “Orbit: A unified simulation framework for interactive robot learning environments,” IEEE Robotics and Automation Letters, vol. 8, no. 6, pp. 3740–3747, 2023

[44]

Maniskill3: Gpu parallelized robotics simulation and rendering for generalizable embodied ai,

S. Tao, F. Xiang, A. Shukla, Y. Qin, X. Hinrichsen, X. Yuan, C. Bao, X. Lin, Y. Liu, T. kai Chan, Y. Gao, X. Li, T. Mu, N. Xiao, A. Gurha, Z. Huang, R. Calandra, R. Chen, S. Luo, and H. Su, “Maniskill3: Gpu parallelized robotics simulation and rendering for generalizable embodied ai,” 2024. [Online]. Available: https://arxiv.org/abs/2410.00425

[45]

Robogen: Towards unleashing infinite data for automated robot learning via generative simulation,

Y. Wang, Z. Xian, F. Chen, T.-H. Wang, Y. Wang, K. Fragkiadaki, Z. Erickson, D. Held, and C. Gan, “Robogen: Towards unleashing infinite data for automated robot learning via generative simulation,” arXiv preprint arXiv:2311.01455, 2023

[46]

RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

S. Nasiriany, A. Maddukuri, L. Zhang, A. Parikh, A. Lo, A. Joshi, A. Mandlekar, and Y. Zhu, “Robocasa: Large-scale simulation of everyday tasks for generalist robots,” arXiv preprint arXiv:2406.02523, 2024

[47]

MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations

A. Mandlekar, S. Nasiriany, B. Wen, I. Akinola, Y. Narang, L. Fan, Y. Zhu, and D. Fox, “Mimicgen: A data generation system for scalable robot learning using human demonstrations,” arXiv preprint arXiv:2310.17596, 2023

[48]

Rlbench: The robot learning benchmark & learning environment,

S. James, Z. Ma, D. R. Arrojo, and A. J. Davison, “Rlbench: The robot learning benchmark & learning environment,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 3019–3026, 2020

[49]

Empirical analysis of sim-and-real cotraining of diffusion policies for planar pushing from pixels,

A. Wei, A. Agarwal, B. Chen, R. Bosworth, N. Pfaff, and R. Tedrake, “Empirical analysis of sim-and-real cotraining of diffusion policies for planar pushing from pixels,” 2025. [Online]. Available: https://arxiv.org/abs/2503.22634

[50]

Sim-and-real co-training: A simple recipe for vision-based robotic manipulation,

A. Maddukuri, Z. Jiang, L. Y. Chen, S. Nasiriany, Y. Xie, Y. Fang, W. Huang, Z. Wang, Z. Xu, N. Chernyadev, S. Reed, K. Goldberg, A. Mandlekar, L. Fan, and Y. Zhu, “Sim-and-real co-training: A simple recipe for vision-based robotic manipulation,” 2025. [Online]. Available: https://arxiv.org/abs/2503.24361

[51]

Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots

C. Chi, Z. Xu, C. Pan, E. Cousineau, B. Burchfiel, S. Feng, R. Tedrake, and S. Song, “Universal manipulation interface: In-the-wild robot teaching without in-the-wild robots,” 2024. [Online]. Available: https://arxiv.org/abs/2402.10329

[52]

Legato: Cross-embodiment imitation using a grasping tool,

M. Seo, H. A. Park, S. Yuan, Y. Zhu, and L. Sentis, “Legato: Cross-embodiment imitation using a grasping tool,” IEEE Robotics and Automation Letters, vol. 10, no. 3, pp. 2854–2861, Mar. 2025. [Online]. Available: http://dx.doi.org/10.1109/LRA.2025.3535182

[53]

Egomimic: Scaling imitation learning via egocentric video,

S. Kareer, D. Patel, R. Punamiya, P. Mathur, S. Cheng, C. Wang, J. Hoffman, and D. Xu, “Egomimic: Scaling imitation learning via egocentric video,” 2024. [Online]. Available: https://arxiv.org/abs/2410.24221

[54]

Airexo: Low-cost exoskeletons for learning whole-arm manipulation in the wild,

H. Fang, H.-S. Fang, Y. Wang, J. Ren, J. Chen, R. Zhang, W. Wang, and C. Lu, “Airexo: Low-cost exoskeletons for learning whole-arm manipulation in the wild,” 2024. [Online]. Available: https://arxiv.org/abs/2309.14975

[55]

Robot learning as an empirical science: Best practices for policy evaluation,

H. Kress-Gazit, K. Hashimoto, N. Kuppuswamy, P. Shah, P. Horgan, G. Richardson, S. Feng, and B. Burchfiel, “Robot learning as an empirical science: Best practices for policy evaluation,” arXiv preprint arXiv:2409.09491, 2024

[56]

Maniskill: Generalizable manipulation skill benchmark with large-scale demonstrations,

T. Mu, Z. Ling, F. Xiang, D. Yang, X. Li, S. Tao, Z. Huang, Z. Jia, and H. Su, “Maniskill: Generalizable manipulation skill benchmark with large-scale demonstrations,” 2021. [Online]. Available: https://arxiv.org/abs/2107.14483

[57]

Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning,

T. Yu, D. Quillen, Z. He, R. Julian, A. Narayan, H. Shively, A. Bellathur, K. Hausman, C. Finn, and S. Levine, “Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning,” 2021. [Online]. Available: https://arxiv.org/abs/1910.10897

[58]

robosuite: A Modular Simulation Framework and Benchmark for Robot Learning

Y. Zhu, J. Wong, A. Mandlekar, R. Martín-Martín, A. Joshi, S. Nasiriany, Y. Zhu, and K. Lin, “robosuite: A modular simulation framework and benchmark for robot learning,” arXiv preprint arXiv:2009.12293, 2020

[59]

Behavior: Benchmark for everyday household activities in virtual, interactive, and ecological environments,

S. Srivastava, C. Li, M. Lingelbach, R. Martín-Martín, F. Xia, K. Vainio, Z. Lian, C. Gokmen, S. Buch, C. K. Liu, S. Savarese, H. Gweon, J. Wu, and L. Fei-Fei, “Behavior: Benchmark for everyday household activities in virtual, interactive, and ecological environments,” 2021. [Online]. Available: https://arxiv.org/abs/2108.03332

[60]

Robothor: An open simulation-to-real embodied ai platform,

M. Deitke, W. Han, A. Herrasti, A. Kembhavi, E. Kolve, R. Mottaghi, J. Salvador, D. Schwenk, E. VanderBilt, M. Wallingford, L. Weihs, M. Yatskar, and A. Farhadi, “Robothor: An open simulation-to-real embodied ai platform,” 2020. [Online]. Available: https://arxiv.org/abs/2004.06799

[61]

Sim2real predictivity: Does evaluation in simulation predict real-world performance?

A. Kadian, J. Truong, A. Gokaslan, A. Clegg, E. Wijmans, S. Lee, M. Savva, S. Chernova, and D. Batra, “Sim2real predictivity: Does evaluation in simulation predict real-world performance?” IEEE Robotics and Automation Letters, vol. 5, no. 4, pp. 6670–6677, Oct. 2020. [Online]. Available: http://dx.doi.org/10.1109/LRA.2020.3013848

[62]

VR-Goggles for Robots: Real-to-sim Domain Adaptation for Visual Control

J. Zhang, L. Tai, P. Yun, Y. Xiong, M. Liu, J. Boedecker, and W. Burgard, “Vr-goggles for robots: Real-to-sim domain adaptation for visual control,” 2019. [Online]. Available: https://arxiv.org/abs/1802.00265

[63]

Evaluating Real-World Robot Manipulation Policies in Simulation

X. Li, K. Hsu, J. Gu, K. Pertsch, O. Mees, H. R. Walke, C. Fu, I. Lunawat, I. Sieh, S. Kirmani, S. Levine, J. Wu, C. Finn, H. Su, Q. Vuong, and T. Xiao, “Evaluating real-world robot manipulation policies in simulation,” arXiv preprint arXiv:2405.05941, 2024

[64]

Asid: Active exploration for system identification in robotic manipulation,

M. Memmel, A. Wagenmaker, C. Zhu, P. Yin, D. Fox, and A. Gupta, “Asid: Active exploration for system identification in robotic manipulation,” 2024. [Online]. Available: https://arxiv.org/abs/2404.12308

[65]

Scalable real2sim: Physics-aware asset generation via robotic pick-and-place setups,

N. Pfaff, E. Fu, J. Binagia, P. Isola, and R. Tedrake, “Scalable real2sim: Physics-aware asset generation via robotic pick-and-place setups,” 2025. [Online]. Available: https://arxiv.org/abs/2503.00370

[66]

Rb2: Robotic manipulation benchmarking with a twist,

S. Dasari, J. Wang, J. Hong, S. Bahl, Y. Lin, A. Wang, A. Thankaraj, K. Chahal, B. Calli, S. Gupta, D. Held, L. Pinto, D. Pathak, V. Kumar, and A. Gupta, “Rb2: Robotic manipulation benchmarking with a twist,” 2022. [Online]. Available: https://arxiv.org/abs/2203.08098

[67]

Benchmarking cluttered robot pick-and-place manipulation with the box and blocks test,

A. S. Morgan, K. Hang, W. G. Bircher, F. M. Alladkani, A. Gandhi, B. Calli, and A. M. Dollar, “Benchmarking cluttered robot pick-and-place manipulation with the box and blocks test,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 454–461, 2019

[68]

Furniturebench: Reproducible real-world benchmark for long-horizon complex manipulation,

M. Heo, Y. Lee, D. Lee, and J. J. Lim, “Furniturebench: Reproducible real-world benchmark for long-horizon complex manipulation,” The International Journal of Robotics Research, p. 02783649241304789, 2023

[69]

Benchmarking protocols for evaluating small parts robotic assembly systems,

K. Kimble, K. Van Wyk, J. Falco, E. Messina, Y. Sun, M. Shibata, W. Uemura, and Y. Yokokohji, “Benchmarking protocols for evaluating small parts robotic assembly systems,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 883–889, 2020

[70]

Scenereplica: Benchmarking real-world robot manipulation by creating replicable scenes,

N. Khargonkar, S. H. Allu, Y. Lu, B. Prabhakaran, Y. Xiang et al., “Scenereplica: Benchmarking real-world robot manipulation by creating replicable scenes,” in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 8258–8264

[71]

Benchmarking protocol for grasp planning algorithms,

Y. Bekiroglu, N. Marturi, M. A. Roa, K. J. M. Adjigble, T. Pardi, C. Grimm, R. Balasubramanian, K. Hang, and R. Stolkin, “Benchmarking protocol for grasp planning algorithms,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 315–322, 2019

[72]

Graspa 1.0: Graspa is a robot arm grasping performance benchmark,

F. Bottarel, G. Vezzani, U. Pattacini, and L. Natale, “Graspa 1.0: Graspa is a robot arm grasping performance benchmark,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 836–843, 2020

[73]

Benchmark for bimanual robotic manipulation of semi-deformable objects,

K. Chatzilygeroudis, B. Fichera, I. Lauzana, F. Bu, K. Yao, F. Khadivar, and A. Billard, “Benchmark for bimanual robotic manipulation of semi-deformable objects,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 2443–2450, 2020

[74]

Ocrtoc: A cloud-based competition and benchmark for robotic grasping and manipulation,

Z. Liu, W. Liu, Y. Qin, F. Xiang, M. Gou, S. Xin, M. A. Roa, B. Calli, H. Su, Y. Sun et al., “Ocrtoc: A cloud-based competition and benchmark for robotic grasping and manipulation,” IEEE Robotics and Automation Letters, vol. 7, no. 1, pp. 486–493, 2021

[75]

Real robot challenge: A robotics competition in the cloud,

S. Bauer, M. Wüthrich, F. Widmaier, A. Buchholz, S. Stark, A. Goyal, T. Steinbrenner, J. Akpo, S. Joshi, V. Berenz et al., “Real robot challenge: A robotics competition in the cloud,” in NeurIPS 2021 Competitions and Demonstrations Track. PMLR, 2022, pp. 190–204

[76]

Train offline, test online: A real robot learning benchmark,

G. Zhou, V. Dean, M. K. Srirama, A. Rajeswaran, J. Pari, K. Hatch, A. Jain, T. Yu, P. Abbeel, L. Pinto et al., “Train offline, test online: A real robot learning benchmark,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 9197–9203

[77]

Autoeval: Autonomous evaluation of generalist robot manipulation policies in the real world,

Z. Zhou, P. Atreya, Y. L. Tan, K. Pertsch, and S. Levine, “Autoeval: Autonomous evaluation of generalist robot manipulation policies in the real world,” 2025. [Online]. Available: https://arxiv.org/abs/2503.24278

[78]

Is your imitation learning policy better than mine? policy comparison with near-optimal stopping,

D. Snyder, A. J. Hancock, A. Badithela, E. Dixon, P. Miller, R. A. Ambrus, A. Majumdar, M. Itkina, and H. Nishimura, “Is your imitation learning policy better than mine? policy comparison with near-optimal stopping,” arXiv preprint arXiv:2503.10966, 2025

[79]

Deep reinforcement learning at the edge of the statistical precipice,

R. Agarwal, M. Schwarzer, P. S. Castro, A. C. Courville, and M. Bellemare, “Deep reinforcement learning at the edge of the statistical precipice,” Advances in neural information processing systems, vol. 34, pp. 29 304–29 320, 2021

[80]

Statistical tests, p values, confidence intervals, and power: a guide to misinterpretations,

S. Greenland, S. J. Senn, K. J. Rothman, J. B. Carlin, C. Poole, S. N. Goodman, and D. G. Altman, “Statistical tests, p values, confidence intervals, and power: a guide to misinterpretations,” European Journal of Epidemiology, vol. 31, no. 4, pp. 337–350, 2016


[25] [25]

Palm-e: An embodied multimodal language model,

D. Driess, F. Xia, M. S. M. Sajjadi, C. Lynch, A. Chowdhery, B. Ichter, A. Wahid, J. Tompson, Q. Vuong, T. Yu, W. Huang, Y. Chebotar, P. Sermanet, D. Duckworth, S. Levine, V. Vanhoucke, K. Hausman, M. Toussaint, K. Greff, A. Zeng, I. Mordatch, and P. Florence, “Palm-e: An embodied multimodal language model,”

work page

[26] [26]

PaLM-E: An Embodied Multimodal Language Model

[Online]. Available: https://arxiv.org/abs/2303.03378

work page internal anchor Pith review Pith/arXiv arXiv

[27] [27]

Robotic Control via Embodied Chain-of-Thought Reasoning

M. Zawalski, W. Chen, K. Pertsch, O. Mees, C. Finn, and S. Levine, “Robotic control via embodied chain-of-thought reasoning,” 2025. [Online]. Available: https://arxiv.org/abs/2407.08693

work page internal anchor Pith review Pith/arXiv arXiv 2025

[28] [28]

On the opportunities and risks of foundation models,

R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill, E. Brynjolfsson, S. Buch, D. Card, R. Castellon, N. Chatterji, A. Chen, K. Creel, J. Q. Davis, D. Demszky, C. Donahue, M. Doumbouya, E. Durmus, S. Ermon, J. Etchemendy, K. Ethayarajh, L. Fei-Fei, C. Finn, T. Gale, L. Gillespie, K. Go...

work page

[29] [29]

On the Opportunities and Risks of Foundation Models

[Online]. Available: https://arxiv.org/abs/2108.07258

work page internal anchor Pith review Pith/arXiv arXiv

[30] [30]

Gemini Robotics: Bringing AI into the Physical World,

G. R. Team, “Gemini Robotics: Bringing AI into the Physical World,” Tech. Rep., Mar. 2025. [Online]. Available: https://deepmind.google/discover/blog/ gemini-robotics-brings-ai-into-the-physical-world/

work page 2025

[31] [31]

RT-1: Robotics Transformer for Real-World Control at Scale

A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, J. Dabis, C. Finn, 20 K. Gopalakrishnan, K. Hausman, A. Herzog, J. Hsu, J. Ibarz, B. Ichter, A. Irpan, T. Jackson, S. Jesmonth, N. J. Joshi, R. Julian, D. Kalashnikov, Y. Kuang, I. Leal, K.-H. Lee, S. Levine, Y. Lu, U. Malla, D. Manjunath, I. Mordatch, O. Nachum, C. Parada, J. Peralta, E. Perez, K. Pertsch, J...

work page internal anchor Pith review Pith/arXiv arXiv 2023

[32] [32]

OpenVLA: An Open-Source Vision-Language-Action Model

M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, Q. Vuong, T. Kollar, B. Burchfiel, R. Tedrake, D. Sadigh, S. Levine, P. Liang, and C. Finn, “OpenVLA: An Open-Source Vision-Language- Action Model,” Sep. 2024, arXiv:2406.09246 [cs]. [Online]. Available: http://arxiv.org/abs/2406.09246

work page internal anchor Pith review Pith/arXiv arXiv 2024

[33] [33]

Language models are few-shot learners,

T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askellet al., “Language models are few-shot learners,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 1877–1901

work page 2020

[34] [34]

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

L. Gao, S. Biderman, S. Black, L. Golding, T. Hoppe, C. Foster, J. Phang, H. He, A. Thite, N. Nabeshimaet al., “The pile: An 800gb dataset of diverse text for language modeling,” arXiv preprint arXiv:2101.00027, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2020

[35] [35]

Documenting large webtext corpora: A case study on the colossal clean crawled corpus,

J. Dodge, M. Sap, A. Marasovi ´c, W. Agnew, G. Ilharco, D. Groen- eveld, M. Mitchell, and M. Gardner, “Documenting large webtext corpora: A case study on the colossal clean crawled corpus,”arXiv preprint arXiv:2104.08758, 2021

work page arXiv 2021

[36] [36]

Laion-5b: An open large-scale dataset for training next generation image-text models,

C. Schuhmann, R. Beaumont, R. Vencu, C. Gordon, R. Wightman, M. Cherti, T. Coombes, A. Katta, C. Mullis, M. Wortsman et al., “Laion-5b: An open large-scale dataset for training next generation image-text models,” Advances in neural information processing systems, vol. 35, pp. 25 278–25 294, 2022

work page 2022

[37] [37]

LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

C. Schuhmann, R. Vencu, R. Beaumont, R. Kaczmarczyk, C. Mullis, A. Katta, T. Coombes, J. Jitsev, and A. Komatsuzaki, “Laion-400m: Open dataset of clip-filtered 400 million image-text pairs,” arXiv preprint arXiv:2111.02114, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[38] [38]

Visual instruction tuning,

H. Liu, C. Li, Q. Wu, and Y. J. Lee, “Visual instruction tuning,” in NeurIPS, 2023

work page 2023

[39] [39]

Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets

F. Ebert, Y. Yang, K. Schmeckpeper, B. Bucher, G. Georgakis, K. Daniilidis, C. Finn, and S. Levine, “Bridge data: Boosting generalization of robotic skills with cross-domain datasets,” 2021. [Online]. Available: https://arxiv.org/abs/2109.13396

work page internal anchor Pith review Pith/arXiv arXiv 2021

[40] [40]

Rh20t: A robotic dataset for learning diverse skills in one-shot,

H.-S. Fang, H. Fang, Z. Tang, J. Liu, J. Wang, H. Zhu, and C. Lu, “Rh20t: A robotic dataset for learning diverse skills in one-shot,” in RSS 2023 Workshop on Learning for Task and Motion Planning , 2023

[41]

AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems

AgiBot-World-Contributors, Q. Bu, J. Cai, L. Chen, X. Cui, Y. Ding, S. Feng, S. Gao, X. He, X. Huang, S. Jiang, Y. Jiang, C. Jing, H. Li, J. Li, C. Liu, Y. Liu, Y. Lu, J. Luo, P. Luo, Y. Mu, Y. Niu, Y. Pan, J. Pang, Y. Qiao, G. Ren, C. Ruan, J. Shan, Y. Shen, C. Shi, M. Shi, M. Shi, C. Sima, J. Song, H. Wang, W. Wang, D. Wei, C. Xie, G. Xu, J. Yan, C. Yan...

[42]

Roboverse: Towards a unified platform, dataset and benchmark for scalable and generalizable robot learning,

H. Geng, F. Wang, S. Wei, Y. Li, B. Wang, B. An, C. T. Cheng, H. Lou, P. Li, Y.-J. Wang, Y. Liang, D. Goetting, C. Xu, H. Chen, Y. Qian, Y. Geng, J. Mao, W. Wan, M. Zhang, J. Lyu, S. Zhao, J. Zhang, J. Zhang, C. Zhao, H. Lu, Y. Ding, R. Gong, Y. Wang, Y. Kuang, R. Wu, B. Jia, C. Sferrazza, H. Dong, S. Huang, K. Sreenath, Y. Wang, J. Malik, and P. Abbeel, ...

[43]

Orbit: A unified simulation framework for interactive robot learning environments,

M. Mittal, C. Yu, Q. Yu, J. Liu, N. Rudin, D. Hoeller, J. L. Yuan, R. Singh, Y. Guo, H. Mazhar et al., “Orbit: A unified simulation framework for interactive robot learning environments,” IEEE Robotics and Automation Letters, vol. 8, no. 6, pp. 3740–3747, 2023

[44]

Maniskill3: Gpu parallelized robotics simulation and rendering for generalizable embodied ai,

S. Tao, F. Xiang, A. Shukla, Y. Qin, X. Hinrichsen, X. Yuan, C. Bao, X. Lin, Y. Liu, T. kai Chan, Y. Gao, X. Li, T. Mu, N. Xiao, A. Gurha, Z. Huang, R. Calandra, R. Chen, S. Luo, and H. Su, “Maniskill3: Gpu parallelized robotics simulation and rendering for generalizable embodied ai,” 2024. [Online]. Available: https://arxiv.org/abs/2410.00425

[45]

Robogen: Towards unleashing infinite data for automated robot learning via generative simulation,

Y. Wang, Z. Xian, F. Chen, T.-H. Wang, Y. Wang, K. Fragkiadaki, Z. Erickson, D. Held, and C. Gan, “Robogen: Towards unleashing infinite data for automated robot learning via generative simulation,” arXiv preprint arXiv:2311.01455, 2023

[46]

RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

S. Nasiriany, A. Maddukuri, L. Zhang, A. Parikh, A. Lo, A. Joshi, A. Mandlekar, and Y. Zhu, “Robocasa: Large-scale simulation of everyday tasks for generalist robots,” arXiv preprint arXiv:2406.02523, 2024

[47]

MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations

A. Mandlekar, S. Nasiriany, B. Wen, I. Akinola, Y. Narang, L. Fan, Y. Zhu, and D. Fox, “Mimicgen: A data generation system for scalable robot learning using human demonstrations,” arXiv preprint arXiv:2310.17596, 2023

[48]

Rlbench: The robot learning benchmark & learning environment,

S. James, Z. Ma, D. R. Arrojo, and A. J. Davison, “Rlbench: The robot learning benchmark & learning environment,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 3019–3026, 2020

[49]

Empirical analysis of sim-and-real cotraining of diffusion policies for planar pushing from pixels,

A. Wei, A. Agarwal, B. Chen, R. Bosworth, N. Pfaff, and R. Tedrake, “Empirical analysis of sim-and-real cotraining of diffusion policies for planar pushing from pixels,” 2025. [Online]. Available: https://arxiv.org/abs/2503.22634

[50]

Sim-and-real co-training: A simple recipe for vision-based robotic manipulation,

A. Maddukuri, Z. Jiang, L. Y. Chen, S. Nasiriany, Y. Xie, Y. Fang, W. Huang, Z. Wang, Z. Xu, N. Chernyadev, S. Reed, K. Goldberg, A. Mandlekar, L. Fan, and Y. Zhu, “Sim-and-real co-training: A simple recipe for vision-based robotic manipulation,” 2025. [Online]. Available: https://arxiv.org/abs/2503.24361

[51]

Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots

C. Chi, Z. Xu, C. Pan, E. Cousineau, B. Burchfiel, S. Feng, R. Tedrake, and S. Song, “Universal manipulation interface: In-the-wild robot teaching without in-the-wild robots,” 2024. [Online]. Available: https://arxiv.org/abs/2402.10329

[52]

Legato: Cross-embodiment imitation using a grasping tool,

M. Seo, H. A. Park, S. Yuan, Y. Zhu, and L. Sentis, “Legato: Cross-embodiment imitation using a grasping tool,” IEEE Robotics and Automation Letters, vol. 10, no. 3, pp. 2854–2861, Mar. 2025. [Online]. Available: http://dx.doi.org/10.1109/LRA.2025.3535182

[53]

Egomimic: Scaling imitation learning via egocentric video,

S. Kareer, D. Patel, R. Punamiya, P. Mathur, S. Cheng, C. Wang, J. Hoffman, and D. Xu, “Egomimic: Scaling imitation learning via egocentric video,” 2024. [Online]. Available: https://arxiv.org/abs/2410.24221

[54]

Airexo: Low-cost exoskeletons for learning whole-arm manipulation in the wild,

H. Fang, H.-S. Fang, Y. Wang, J. Ren, J. Chen, R. Zhang, W. Wang, and C. Lu, “Airexo: Low-cost exoskeletons for learning whole-arm manipulation in the wild,” 2024. [Online]. Available: https://arxiv.org/abs/2309.14975

[55]

Robot learning as an empirical science: Best practices for policy evaluation,

H. Kress-Gazit, K. Hashimoto, N. Kuppuswamy, P. Shah, P. Horgan, G. Richardson, S. Feng, and B. Burchfiel, “Robot learning as an empirical science: Best practices for policy evaluation,” arXiv preprint arXiv:2409.09491, 2024

[56]

Maniskill: Generalizable manipulation skill benchmark with large-scale demonstrations,

T. Mu, Z. Ling, F. Xiang, D. Yang, X. Li, S. Tao, Z. Huang, Z. Jia, and H. Su, “Maniskill: Generalizable manipulation skill benchmark with large-scale demonstrations,” 2021. [Online]. Available: https://arxiv.org/abs/2107.14483

[57]

Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning,

T. Yu, D. Quillen, Z. He, R. Julian, A. Narayan, H. Shively, A. Bellathur, K. Hausman, C. Finn, and S. Levine, “Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning,” 2021. [Online]. Available: https://arxiv.org/abs/1910.10897

[58]

robosuite: A Modular Simulation Framework and Benchmark for Robot Learning

Y. Zhu, J. Wong, A. Mandlekar, R. Martín-Martín, A. Joshi, S. Nasiriany, Y. Zhu, and K. Lin, “robosuite: A modular simulation framework and benchmark for robot learning,” in arXiv preprint arXiv:2009.12293, 2020

[59]

Behavior: Benchmark for everyday household activities in virtual, interactive, and ecological environments,

S. Srivastava, C. Li, M. Lingelbach, R. Martín-Martín, F. Xia, K. Vainio, Z. Lian, C. Gokmen, S. Buch, C. K. Liu, S. Savarese, H. Gweon, J. Wu, and L. Fei-Fei, “Behavior: Benchmark for everyday household activities in virtual, interactive, and ecological environments,” 2021. [Online]. Available: https://arxiv.org/abs/2108.03332

[60]

Robothor: An open simulation-to-real embodied ai platform,

M. Deitke, W. Han, A. Herrasti, A. Kembhavi, E. Kolve, R. Mottaghi, J. Salvador, D. Schwenk, E. VanderBilt, M. Wallingford, L. Weihs, M. Yatskar, and A. Farhadi, “Robothor: An open simulation-to-real embodied ai platform,” 2020. [Online]. Available: https://arxiv.org/abs/2004.06799

[61]

Sim2real predictivity: Does evaluation in simulation predict real-world performance?

A. Kadian, J. Truong, A. Gokaslan, A. Clegg, E. Wijmans, S. Lee, M. Savva, S. Chernova, and D. Batra, “Sim2real predictivity: Does evaluation in simulation predict real-world performance?” IEEE Robotics and Automation Letters, vol. 5, no. 4, pp. 6670–6677, Oct. 2020. [Online]. Available: http://dx.doi.org/10.1109/LRA.2020.3013848

[62]

VR-Goggles for Robots: Real-to-sim Domain Adaptation for Visual Control

J. Zhang, L. Tai, P. Yun, Y. Xiong, M. Liu, J. Boedecker, and W. Burgard, “Vr-goggles for robots: Real-to-sim domain adaptation for visual control,” 2019. [Online]. Available: https://arxiv.org/abs/1802.00265

[63]

Evaluating Real-World Robot Manipulation Policies in Simulation

X. Li, K. Hsu, J. Gu, K. Pertsch, O. Mees, H. R. Walke, C. Fu, I. Lunawat, I. Sieh, S. Kirmani, S. Levine, J. Wu, C. Finn, H. Su, Q. Vuong, and T. Xiao, “Evaluating real-world robot manipulation policies in simulation,” arXiv preprint arXiv:2405.05941, 2024

[64]

Asid: Active exploration for system identification in robotic manipulation,

M. Memmel, A. Wagenmaker, C. Zhu, P. Yin, D. Fox, and A. Gupta, “Asid: Active exploration for system identification in robotic manipulation,” 2024. [Online]. Available: https://arxiv.org/abs/2404.12308

[65]

Scalable real2sim: Physics-aware asset generation via robotic pick-and-place setups,

N. Pfaff, E. Fu, J. Binagia, P. Isola, and R. Tedrake, “Scalable real2sim: Physics-aware asset generation via robotic pick-and-place setups,” 2025. [Online]. Available: https://arxiv.org/abs/2503.00370

[66]

Rb2: Robotic manipulation benchmarking with a twist,

S. Dasari, J. Wang, J. Hong, S. Bahl, Y. Lin, A. Wang, A. Thankaraj, K. Chahal, B. Calli, S. Gupta, D. Held, L. Pinto, D. Pathak, V. Kumar, and A. Gupta, “Rb2: Robotic manipulation benchmarking with a twist,” 2022. [Online]. Available: https://arxiv.org/abs/2203.08098

[67]

Benchmarking cluttered robot pick-and-place manipulation with the box and blocks test,

A. S. Morgan, K. Hang, W. G. Bircher, F. M. Alladkani, A. Gandhi, B. Calli, and A. M. Dollar, “Benchmarking cluttered robot pick-and-place manipulation with the box and blocks test,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 454–461, 2019

[68]

Furniturebench: Reproducible real-world benchmark for long-horizon complex manipulation,

M. Heo, Y. Lee, D. Lee, and J. J. Lim, “Furniturebench: Reproducible real-world benchmark for long-horizon complex manipulation,” The International Journal of Robotics Research, p. 02783649241304789, 2023

[69]

Benchmarking protocols for evaluating small parts robotic assembly systems,

K. Kimble, K. Van Wyk, J. Falco, E. Messina, Y. Sun, M. Shibata, W. Uemura, and Y. Yokokohji, “Benchmarking protocols for evaluating small parts robotic assembly systems,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 883–889, 2020

[70]

Scenereplica: Benchmarking real-world robot manipulation by creating replicable scenes,

N. Khargonkar, S. H. Allu, Y. Lu, B. Prabhakaran, Y. Xiang et al., “Scenereplica: Benchmarking real-world robot manipulation by creating replicable scenes,” in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 8258–8264

[71]

Benchmarking protocol for grasp planning algorithms,

Y. Bekiroglu, N. Marturi, M. A. Roa, K. J. M. Adjigble, T. Pardi, C. Grimm, R. Balasubramanian, K. Hang, and R. Stolkin, “Benchmarking protocol for grasp planning algorithms,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 315–322, 2019

[72]

Graspa 1.0: Graspa is a robot arm grasping performance benchmark,

F. Bottarel, G. Vezzani, U. Pattacini, and L. Natale, “Graspa 1.0: Graspa is a robot arm grasping performance benchmark,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 836–843, 2020

[73]

Benchmark for bimanual robotic manipulation of semi-deformable objects,

K. Chatzilygeroudis, B. Fichera, I. Lauzana, F. Bu, K. Yao, F. Khadivar, and A. Billard, “Benchmark for bimanual robotic manipulation of semi-deformable objects,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 2443–2450, 2020

[74]

Ocrtoc: A cloud-based competition and benchmark for robotic grasping and manipulation,

Z. Liu, W. Liu, Y. Qin, F. Xiang, M. Gou, S. Xin, M. A. Roa, B. Calli, H. Su, Y. Sun et al., “Ocrtoc: A cloud-based competition and benchmark for robotic grasping and manipulation,” IEEE Robotics and Automation Letters, vol. 7, no. 1, pp. 486–493, 2021

[75]

Real robot challenge: A robotics competition in the cloud,

S. Bauer, M. Wüthrich, F. Widmaier, A. Buchholz, S. Stark, A. Goyal, T. Steinbrenner, J. Akpo, S. Joshi, V. Berenz et al., “Real robot challenge: A robotics competition in the cloud,” in NeurIPS 2021 Competitions and Demonstrations Track. PMLR, 2022, pp. 190–204

[76]

Train offline, test online: A real robot learning benchmark,

G. Zhou, V. Dean, M. K. Srirama, A. Rajeswaran, J. Pari, K. Hatch, A. Jain, T. Yu, P. Abbeel, L. Pinto et al., “Train offline, test online: A real robot learning benchmark,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 9197–9203

[77]

Autoeval: Autonomous evaluation of generalist robot manipulation policies in the real world,

Z. Zhou, P. Atreya, Y. L. Tan, K. Pertsch, and S. Levine, “Autoeval: Autonomous evaluation of generalist robot manipulation policies in the real world,” 2025. [Online]. Available: https://arxiv.org/abs/2503.24278

[78]

Is your imitation learning policy better than mine? policy comparison with near-optimal stopping,

D. Snyder, A. J. Hancock, A. Badithela, E. Dixon, P. Miller, R. A. Ambrus, A. Majumdar, M. Itkina, and H. Nishimura, “Is your imitation learning policy better than mine? policy comparison with near-optimal stopping,” arXiv preprint arXiv:2503.10966, 2025

[79]

Deep reinforcement learning at the edge of the statistical precipice,

R. Agarwal, M. Schwarzer, P. S. Castro, A. C. Courville, and M. Bellemare, “Deep reinforcement learning at the edge of the statistical precipice,” Advances in neural information processing systems, vol. 34, pp. 29304–29320, 2021

[80]

Statistical tests, p values, confidence intervals, and power: a guide to misinterpretations,

S. Greenland, S. J. Senn, K. J. Rothman, J. B. Carlin, C. Poole, S. N. Goodman, and D. G. Altman, “Statistical tests, p values, confidence intervals, and power: a guide to misinterpretations,” European journal of epidemiology, vol. 31, no. 4, pp. 337–350, 2016
