A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation

Aditya Bhat; Aimee Goncalves; Alejandro Castro; Alex Alspach; Allison Henry; Andrew Beaulieu; Aykut Onol; Basile Van Hoorick; Benjamin Burchfiel; Blake Wulfe

REVIEW · 1 major objection · 2 minor · cited by 59

Multi-task pretraining makes robot policies more successful, robust, and data-efficient than single-task training for dexterous manipulation.

Reviewed by Pith at T0; open to challenge. T0 means a machine referee read the full paper against a public rubric.


T0 review · grok-4.3

2026-05-25 04:29 UTC pith:ZAXTO6RH

load-bearing objection: Multi-task pretraining on mixed sim and real data beats single-task baselines on success, robustness, and few-shot adaptation for dexterous tasks, with predictable scaling; the blind trials are the main strength.

arxiv 2507.05331 v1 pith:ZAXTO6RH submitted 2025-07-07 cs.RO


TRI LBM Team, Jose Barreiros, Andrew Beaulieu, Aditya Bhat, Rick Cory, Eric Cousineau, Hongkai Dai, Ching-Hsin Fang, Kunimatsu Hashimoto, Muhammad Zubair Irshad, Masha Itkina, Naveen Kuppuswamy, Kuan-Hui Lee, Katherine Liu, Dale McConachie, Ian McMahon, Haruki Nishimura, Calder Phillips-Grafflin, Charles Richter, Paarth Shah, Krishnan Srinivasan, Blake Wulfe, Chen Xu, Mengchao Zhang, Alex Alspach, Maya Angeles, Kushal Arora, Vitor Campagnolo Guizilini, Alejandro Castro, Dian Chen, Ting-Sheng Chu, Sam Creasey, Sean Curtis, Richard Denitto, Emma Dixon, Eric Dusel, Matthew Ferreira, Aimee Goncalves, Grant Gould, Damrong Guoy, Swati Gupta, Xuchen Han, Kyle Hatch, Brendan Hathaway, Allison Henry, Hillel Hochsztein, Phoebe Horgan, Shun Iwase, Donovon Jackson, Siddharth Karamcheti, Sedrick Keh, Joseph Masterjohn, Jean Mercat, Patrick Miller, Paul Mitiguy, Tony Nguyen, Jeremy Nimmer, Yuki Noguchi, Reko Ong, Aykut Onol, Owen Pfannenstiehl, Richard Poyner, Leticia Priebe Mendes Rocha, Gordon Richardson, Christopher Rodriguez, Derick Seale, Michael Sherman, Mariah Smith-Jones, David Tago, Pavel Tokmakov, Matthew Tran, Basile Van Hoorick, Igor Vasiljevic, Sergey Zakharov, Mark Zolotas, Rares Ambrus, Kerri Fetzer-Borelli, Benjamin Burchfiel, Hadas Kress-Gazit, Siyuan Feng, Stacie Ford, Russ Tedrake


classification cs.RO

keywords: large behavior models, multitask learning, dexterous manipulation, imitation learning, robot foundation models, diffusion policy, pretraining, sample efficiency

verification ladder: T0 review · T1 audit · T2 compute · T3 formal · T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates large behavior models by extending the Diffusion Policy approach over a mix of simulated and real robot data for multitask dexterous manipulation. It compares these models against single-task baselines using blind, randomized trials in controlled settings. Multi-task pretraining improves success rates and robustness while allowing new complex tasks to be taught faster with far less data. Performance rises in a predictable way as the scale and diversity of the pretraining data increase. The work supplies a validated evaluation pipeline to support these comparisons with statistical confidence.

Core claim

Multi-task pretraining on a corpus of robot data produces policies that are more successful and robust than single-task policies, allow quicker teaching of new complex tasks with a fraction of the data, and show performance that improves predictably with greater pretraining scale and diversity.

What carries the argument

An evaluation pipeline that analyzes multitask policies with statistical confidence through blind randomized trials on simulated and real-world data.
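The two ingredients named here, blinding with randomization and statistical confidence on success rates, can be illustrated with a minimal sketch. It is not the paper's pipeline: the function names `blinded_schedule` and `wilson_interval`, the opaque label scheme, and the choice of a Wilson score interval are all assumptions made for illustration.

```python
import math
import random


def blinded_schedule(policies, trials_per_policy, seed=0):
    """Build a shuffled trial order that hides which policy is running.

    Returns (schedule, key). `schedule` is a list of (trial_id, opaque_label)
    that the operator sees; `key` maps opaque labels back to policy names and
    is meant to stay sealed until every outcome has been recorded.
    The label scheme is hypothetical.
    """
    rng = random.Random(seed)
    labels = [f"policy_{i}" for i in range(len(policies))]
    shuffled = list(policies)
    rng.shuffle(shuffled)  # random label -> policy assignment
    key = dict(zip(labels, shuffled))
    order = [label for label in labels for _ in range(trials_per_policy)]
    rng.shuffle(order)  # interleave policies in random order
    return list(enumerate(order)), key


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)
```

An interval like this behaves sensibly for the small trial counts typical of real-robot experiments, which is why it is used in this sketch; the review does not say which interval the paper itself reports.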

Load-bearing premise

The selected tasks and data composition give an unbiased test of general multitask benefits that would apply to other dexterous manipulation problems.

What would settle it

A follow-up study that trains and tests single-task and multi-task policies on a fresh corpus of tasks chosen without regard to the original data distribution and finds no advantage for multi-task pretraining.


If this is right

Multi-task policies achieve higher success rates and greater robustness than single-task baselines across the evaluated tasks.
New tasks can be taught with substantially less data when starting from a multi-task pretrained model.
Performance gains continue in a predictable manner as pretraining data volume and task diversity increase.
The advantages appear in both simulation and real-world blind trials.
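One way to make "predictable" concrete is to fit a simple trend of performance against pretraining data size and check how well it extrapolates. The sketch below assumes a log-linear form, score = a + b * log10(size); that functional form, the helper names, and any numbers fed to it are illustrative assumptions, not results or methods taken from the paper.

```python
import math


def fit_log_linear(sizes, scores):
    """Ordinary least-squares fit of score = a + b * log10(size)."""
    if len(sizes) != len(scores) or len(sizes) < 2:
        raise ValueError("need at least two (size, score) pairs")
    xs = [math.log10(s) for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores)) / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, size):
    """Score predicted by the fitted trend at a given data size."""
    return a + b * math.log10(size)


if __name__ == "__main__":
    # Synthetic placeholder points generated from 0.2 + 0.1 * log10(size);
    # they are NOT measurements from the paper.
    a, b = fit_log_linear([100, 1000, 10000], [0.4, 0.5, 0.6])
    print(f"a={a:.3f}, b={b:.3f}, predicted at 1e5: {predict(a, b, 100000):.3f}")
```

A linear trend in log size is unbounded, so a fit like this is only meaningful over the range of sizes actually measured; success rates saturate at 1.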

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same scaling pattern may apply to other robot learning settings that currently rely on task-specific training.
Collecting larger and more varied robot datasets could accelerate progress toward general manipulation capabilities.
The results motivate experiments that combine these behavior models with language or vision inputs for further gains.
Future work could test whether the observed data-efficiency benefits persist when the new task lies far outside the pretraining distribution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Desk Editor's Note

Multi-task pretraining on mixed sim and real data beats single-task baselines on success, robustness, and few-shot adaptation for dexterous tasks, with predictable scaling; the blind trials are the main strength.


The main takeaway is that multi-task pretraining on a diverse corpus of simulated and real robot data produces policies that succeed more often, handle perturbations better, and pick up new complex tasks with far less data than single-task training from scratch. Performance also rises steadily as pretraining scale and variety increase. The authors reach these conclusions by extending Diffusion Policy across the corpus and running blind randomized trials both in simulation and on real hardware, with statistical confidence intervals on the comparisons. That setup is stronger than that of the usual robot learning paper, which often relies on small numbers of hand-picked runs or no randomization at all. The quantified gains in robustness and data efficiency for dexterous manipulation are the concrete new evidence here, even if the underlying multi-task versus single-task idea is familiar from other domains. The evaluation pipeline itself is a useful contribution for anyone trying to measure these models reliably.

The soft spot is the task corpus and data composition. If the chosen tasks share primitives, objects, or state distributions, the multi-task model can exploit that overlap while single-task baselines cannot, which would inflate the reported advantages. The abstract gives little detail on task selection criteria, filtering steps, or explicit checks for independence, so it is hard to judge how much the results depend on the specific mix. The stress-test note correctly flags this as the least secure part of the central claim.

This paper is for groups working on scaling imitation learning or robot foundation models. Anyone running experiments on policy pretraining will find the scaling observations and the evaluation method directly usable. It deserves peer review because the empirical controls are solid enough for referees to test the claims rather than just accept them at face value.

Referee Report

1 major / 2 minor

Summary. The paper extends the Diffusion Policy framework to train Large Behavior Models (LBMs) via multi-task pretraining on a corpus of simulated and real-world dexterous manipulation data. Through blind randomized trials with statistical analysis, it claims that multi-task pretraining yields higher success rates and robustness than single-task baselines, enables faster adaptation to novel tasks with substantially less data, and exhibits predictable performance gains as pretraining scale and diversity increase.

Significance. If the empirical claims hold, the work supplies statistically grounded evidence that multi-task pretraining confers concrete advantages in sample efficiency and robustness for robot manipulation policies. The use of blind randomized trials and controlled real-world experiments is a notable strength that reduces experimenter bias and supports reproducibility in the field.

major comments (1)

[§4] §4 (Evaluation Pipeline) and the task corpus description: the central claim that multi-task benefits are general requires explicit evidence that tasks are sufficiently independent (e.g., non-overlapping state distributions or skill primitives). Without reported controls or ablation on task selection criteria, it remains possible that shared structure in the corpus favors multi-task training over single-task baselines.

minor comments (2)

[Abstract] Abstract: provide one additional sentence summarizing the exact number of tasks, total demonstrations, and filtering criteria used in the pretraining corpus.
[§5] Figure captions and §5: ensure all success-rate plots include the number of trials per condition and the exact statistical test employed.
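To show what reporting "trials per condition and the exact statistical test" could look like, the sketch below compares two success counts with a two-sided Fisher exact test. The review does not state which test the paper uses, so Fisher's test is a stand-in chosen for illustration, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(s1, n1, s2, n2):
    """Two-sided Fisher exact p-value for success counts s1/n1 versus s2/n2.

    Conditions on the total number of successes and sums the probabilities of
    all tables that are no more likely than the observed one.
    """
    if not (0 <= s1 <= n1 and 0 <= s2 <= n2):
        raise ValueError("successes must lie between 0 and the trial count")
    total_s = s1 + s2
    total_n = n1 + n2

    def prob(k):
        # Hypergeometric probability that condition 1 has k of the successes.
        return comb(n1, k) * comb(n2, total_s - k) / comb(total_n, total_s)

    k_min = max(0, total_s - n2)
    k_max = min(n1, total_s)
    p_obs = prob(s1)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Placeholder counts for illustration only, not data from the paper.
    s1, n1, s2, n2 = 10, 10, 0, 10
    p = fisher_exact_two_sided(s1, n1, s2, n2)
    print(f"{s1}/{n1} vs {s2}/{n2}, Fisher exact (two-sided) p = {p:.2e}")
```

Printing the counts next to the p-value, as the demo line does, is the kind of caption detail the minor comment asks for.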

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for minor revision. The concern about task independence is well-taken, and we address it directly below.


Referee: [§4] §4 (Evaluation Pipeline) and the task corpus description: the central claim that multi-task benefits are general requires explicit evidence that tasks are sufficiently independent (e.g., non-overlapping state distributions or skill primitives). Without reported controls or ablation on task selection criteria, it remains possible that shared structure in the corpus favors multi-task training over single-task baselines.

Authors: We agree that stronger evidence of task independence would better support the generality claim. Our corpus comprises 20 tasks drawn from distinct sources (simulation and real-world) with varied objects, initial states, and skill primitives (e.g., in-hand reorientation, tool use, and bimanual coordination). Performance scaling with both dataset size and diversity (Figure 7) provides indirect support that gains are not solely due to overlap. Nevertheless, we will add to §4 an analysis of pairwise state-distribution distances (using maximum mean discrepancy on proprioceptive and visual features) across tasks, plus an ablation that removes the most similar task pairs and re-trains. These additions will appear in the revised manuscript.

Revision: yes
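The pairwise-distance analysis proposed in this response can be sketched as follows. This is a minimal illustration, assuming each task is summarized by a list of fixed-length feature vectors and using a Gaussian (RBF) kernel with a hand-picked bandwidth; the feature extraction, kernel choice, bandwidth, and function names are all assumptions, not details from the paper or the rebuttal.

```python
import math


def rbf_kernel(x, y, bandwidth):
    """Gaussian kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared maximum mean discrepancy."""

    def mean_kernel(a, b):
        return sum(rbf_kernel(u, v, bandwidth) for u in a for v in b) / (
            len(a) * len(b)
        )

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)


def most_similar_pairs(task_features, k=1, bandwidth=1.0):
    """Return the k task pairs with the smallest squared MMD.

    `task_features` maps a task name to its list of feature vectors.
    These pairs are the candidates a similarity ablation would drop.
    """
    names = sorted(task_features)
    scored = [
        (mmd_squared(task_features[a], task_features[b], bandwidth), a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]
    return sorted(scored)[:k]
```

The pairs returned by `most_similar_pairs` are the ones an ablation of the kind described would remove before re-training. In practice the bandwidth would need tuning, for example with a median-distance heuristic, since the estimate is sensitive to it.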

Circularity Check

0 steps flagged

No circularity: empirical comparison of trained policies with no derivations or self-referential reductions.


The paper performs direct empirical evaluation of multi-task vs. single-task policies using imitation learning on a corpus of simulated and real data, with blind randomized trials. No equations, fitted parameters, or derivations are presented that reduce reported performance gains (success rates, robustness, adaptation speed) to quantities defined by the paper's own inputs or self-citations. The central claims rest on experimental measurements rather than any self-definitional, fitted-input, or uniqueness-theorem structure. Self-citations (e.g., to Diffusion Policy) are external and not load-bearing for the comparison results. This matches the default expectation for non-circular empirical work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard imitation learning assumptions and the validity of the experimental design rather than new free parameters, axioms, or invented entities.

axioms (1)

domain assumption: The Diffusion Policy architecture can be extended to multitask pretraining while preserving its core learning properties.
The paper builds directly on extending the Diffusion Policy paradigm to LBMs without additional justification in the abstract.

pith-pipeline@v0.9.0 · 6118 in / 1220 out tokens · 43791 ms · 2026-05-25T04:29:50.491496+00:00 · methodology


Original abstract

Robot manipulation has seen tremendous progress in recent years, with imitation learning policies enabling successful performance of dexterous and hard-to-model tasks. Concurrently, scaling data and model size has led to the development of capable language and vision foundation models, motivating large-scale efforts to create general-purpose robot foundation models. While these models have garnered significant enthusiasm and investment, meaningful evaluation of real-world performance remains a challenge, limiting both the pace of development and inhibiting a nuanced understanding of current capabilities. In this paper, we rigorously evaluate multitask robot manipulation policies, referred to as Large Behavior Models (LBMs), by extending the Diffusion Policy paradigm across a corpus of simulated and real-world robot data. We propose and validate an evaluation pipeline to rigorously analyze the capabilities of these models with statistical confidence. We compare against single-task baselines through blind, randomized trials in a controlled setting, using both simulation and real-world experiments. We find that multi-task pretraining makes the policies more successful and robust, and enables teaching complex new tasks more quickly, using a fraction of the data when compared to single-task baselines. Moreover, performance predictably increases as pretraining scale and diversity grows. Project page: https://toyotaresearchinstitute.github.io/lbm1/


Forward citations

Cited by 59 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics
cs.RO 2026-04 conditional novelty 8.0

Open-H-Embodiment is the largest open multi-embodiment medical robotics dataset, used to train GR00T-H, the first open vision-language-action model that achieves end-to-end suturing completion where prior models fail.
Latent Memory Palace: Reasoning for Control as Autoregressive Variational Inference
cs.LG 2026-07 conditional novelty 7.0

Variable-length autoregressive latent sequences, trained as variational inference with a PPO-style objective, give robot policies adaptive test-time compute and yield a reusable action tokenizer.
Adapting Generalist Robot Policies with Semantic Reinforcement Learning
cs.RO 2026-06 unverdicted novelty 7.0

SARL optimizes language prompt inputs to generalist vision-language-action policies through online RL to solve complex long-horizon tasks by composing existing skills.
One Body, Two Minds: Variable Autonomy Approach for a Co-embodied Robotic Hand
cs.RO 2026-06 unverdicted novelty 7.0

A co-embodied wearable robotic hand uses variable autonomy with a visuomotor diffusion policy for grasping and head gestures for actuation, yielding 23.3% faster task times and 93.6% success in a 44-person study.
Improving Robotic Generalist Policies via Flow Reversal Steering
cs.RO 2026-06 unverdicted novelty 7.0

Flow Reversal Steering steers flow matching generalist policies by reversing suboptimal actions to nearby better modes, enabling improved zero-shot control, quick distillation, and RL bootstrapping in robotic manipulation.
Ambient Diffusion Policy: Imitation Learning from Suboptimal Data in Robotics
cs.RO 2026-06 unverdicted novelty 7.0

Ambient Diffusion Policy enables better imitation learning from suboptimal robot data by leveraging spectral properties to restrict data usage to specific diffusion times.
Dynamic Execution Horizon Prediction for Chunk-based Robot Policies
cs.RO 2026-06 unverdicted novelty 7.0

DEHP adds an online-RL horizon predictor to frozen chunk policies, yielding higher success on precise and long-horizon robot manipulation by adapting chunk length to task stage.
PhAIL: A Real-Robot VLA Benchmark and Distributional Methodology
cs.RO 2026-05 unverdicted novelty 7.0

PhAIL provides an open benchmark and distributional evaluation method for real-robot VLA policies using time-to-success CDF, HRT scoring, and KS significance tests.
Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics
cs.RO 2026-04 unverdicted novelty 7.0

A consortium released the largest open medical robotics dataset spanning 50+ institutions and used it to train an open VLA model achieving 25% full suturing completion and a multi-embodiment surgical world model.
${\pi}_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities
cs.LG 2026-04 unverdicted novelty 7.0

π₀.₇ is a steerable generalist robotic model that uses rich multimodal prompts including language, subgoal images, and performance metadata to achieve out-of-the-box generalization across tasks and robot bodies.
Large Video Planner Enables Generalizable Robot Control
cs.RO 2025-12 conditional novelty 7.0

A video foundation model trained on human demonstrations generates zero-shot plans that convert to executable robot actions on novel scenes and tasks.
Tactile and Vision Conditioned Contact-Centric Control for Whole-Arm Manipulation
cs.RO 2026-07 conditional novelty 6.0

A hybrid contact-centric MPC controller with tactile-vision state, Jacobian-biased sampling, and learned-plus-analytical rollouts improves multi-contact whole-arm task success and force regulation over strong baselines.
FlowDAgger: Human-in-the-Loop Adaptation of Generative Robot Policies in Latent Space
cs.RO 2026-07 conditional novelty 6.0

Human corrective actions can be inverted into noise-space targets that train a lightweight latent policy to steer frozen flow/diffusion robot models from a handful of interventions while preserving pretrained skills.
Structured 4D Latent Predictive Model for Robot Planning
cs.RO 2026-07 unverdicted novelty 6.0

A 4D latent predictive model encodes scenes holistically to generate 3D-consistent futures that an inverse dynamics module converts into robot actions, outperforming video-based planners on manipulation tasks.
GROW$^2$: Grounding Which and Where for Robot Tool Use
cs.RO 2026-06 unverdicted novelty 6.0

GROW² hierarchically grounds open-world tool affordances by using VLMs for semantic selection of objects and parts followed by geometric localization with vision foundation models.
SimFoundry: Modular and Automated Scene Generation for Policy Learning and Evaluation
cs.RO 2026-06 conditional novelty 6.0

An automated real-to-sim pipeline builds digital twins and affordance-preserving cousins from video, yielding sim evaluations that correlate with real robot policy success and zero-shot sim-to-real gains.
SimFoundry: Modular and Automated Scene Generation for Policy Learning and Evaluation
cs.RO 2026-06 unverdicted novelty 6.0

SimFoundry automates zero-shot real-to-sim scene generation from video, producing digital twins and cousins that enable policy training with 0.911 mean Pearson correlation to real-world results and 17-40% success gain...
Translation as a Bridging Action: Transferring Manipulation Skills from Humans to Robots
cs.RO 2026-06 unverdicted novelty 6.0

A relative wrist translation bridging action with a vision-language-action model using interleaved tokens and attention masking transfers human manipulation skills to robots more effectively than 6DoF actions.
Scalable Behavior Cloning with Open Data, Training, and Evaluation
cs.RO 2026-06 unverdicted novelty 6.0

Releases the largest open teleoperation dataset for robot manipulation together with hardware, simulation, and training infrastructure to support scalable behavior cloning.
Robot Critics that Sweat the Small Stuff
cs.RO 2026-06 unverdicted novelty 6.0

Fine-tuning VLMs with pairwise progress supervision from policy rollouts improves fine-grained failure detection and boosts robot manipulation success by 11% real-world and 5.9% in simulation.
Training and Evaluating Diffusion Policies with Long Context Lengths
cs.RO 2026-06 conditional novelty 6.0

Naive long-context Diffusion Policies succeed with UNet+Cross-Attention and sufficient data; variable-history training cuts sample complexity in the low-data regime.
Transferring Contact, Not Just Motion: Compliant Grasping Across Dexterous Hands
cs.RO 2026-06 unverdicted novelty 6.0

A cross-embodiment force-position interface with system-identified torque calibration enables a flow-matching policy to perform transferable compliant grasping on heterogeneous dexterous hands.
GHOST: Hierarchical Sub-Goal Policies for Generalizing Robot Manipulation
cs.RO 2026-06 unverdicted novelty 6.0

GHOST improves generalization in robot manipulation via hierarchical factorization into 3D sub-goal prediction from RGB-D views and a goal-conditioned low-level controller, enabling human video integration without act...
What Matters When Cotraining Robot Manipulation Policies on Everyday Human Videos?
cs.RO 2026-06 unverdicted novelty 6.0

Cotraining on 532 everyday human videos with accurate hand labels improves robot policies by 29.7% when networks specialize to human versus robot embodiments.
MoDex: A Diffusion Policy for Sequential Multi-Object Dexterous Grasping
cs.RO 2026-06 unverdicted novelty 6.0

MoDex is a diffusion policy conditioned on opposition space and point cloud, trained first by imitation learning then RL fine-tuning, that reports higher success rates than baselines for sequential multi-object dexter...
Closed-Loop Neural Activation Control in Vision-Language-Action Models
cs.AI 2026-05 unverdicted novelty 6.0

CTRL-STEER applies PID or RL-based feedback control to adaptively steer motion-aligned residual directions in VLA models, yielding more stable regulation and better task success on LIBERO benchmarks than fixed steering.
Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring
cs.RO 2026-05 unverdicted novelty 6.0

Hide-and-Seek uses contrastive objectives on trajectories to localize failure signals in VLA models from trajectory-level supervision alone.
Instrumentation for Imitation Learning: Enhancing Training Datasets for Clothes Hanger Insertion
cs.RO 2026-05 unverdicted novelty 6.0

Instrumented objects boost diffusion policy success in robotic hanger insertion by 14-25 percentage points over vision-only baselines, and augmenting datasets with instrumented expert rollouts lets a vision-only stude...
Semantically Structured Mixture-of-Experts for Compositional Robotic Manipulation
cs.RO 2026-05 unverdicted novelty 6.0

SMoDP routes action chunks in a diffusion policy to semantically specialized experts via a VLM-supervised skill predictor and dual contrastive alignment, achieving better efficiency and compositional transfer than baselines.
Safe and Steerable Geometric Motion Policies for Robotic Dexterous Manipulation
cs.RO 2026-05 unverdicted novelty 6.0

SafePBDS uses pullback control barrier functions and a task manifold action interface to generate certifiably safe, steerable motions on high-DOF robots from objectives defined on arbitrary geometric spaces.
Beyond Action Residuals: Real-World Robot Policy Steering via Bottleneck Latent Reinforcement Learning
cs.RO 2026-05 unverdicted novelty 6.0

ZPRL adapts frozen flow-matching imitation policies via RL perturbations on a task-relevant bottleneck latent, yielding 33.7% higher average success on four real-world manipulation tasks than action-residual baselines.
Distributionally Robust Control via Stein Variational Inference for Contact-Rich Manipulation
cs.RO 2026-05 unverdicted novelty 6.0

Introduces a Stein variational inference-based deterministic formulation for distributionally robust control in contact-rich robotic manipulation, reporting up to 3x improved robustness under parametric uncertainty.
From a Single Demonstration to a General Policy for Contact-Rich Manipulation
cs.RO 2026-05 unverdicted novelty 6.0

A one-shot LfD framework abstracts a single demonstration into environmental-constraint primitives, then uses self-exploration, human corrections, and compliant recovery to produce a policy that generalizes across pos...
WarmPrior: Straightening Flow-Matching Policies with Temporal Priors
cs.LG 2026-05 unverdicted novelty 6.0

Replacing Gaussian noise with a temporally grounded prior from recent actions straightens flow-matching paths and improves success rates in robotic manipulation and prior-space RL.
Long-Horizon Manipulation via Trace-Conditioned VLA Planning
cs.RO 2026-04 unverdicted novelty 6.0

LoHo-Manip enables robust long-horizon robot manipulation by using a receding-horizon VLM manager to output progress-aware subtask sequences and 2D visual traces that condition a VLA executor for automatic replanning.
A Mechanistic Analysis of Sim-and-Real Co-Training in Generative Robot Policies
cs.RO 2026-04 unverdicted novelty 6.0

Sim-and-real co-training for robot policies is driven primarily by balanced cross-domain representation alignment and secondarily by domain-dependent action reweighting.
Red-Teaming Vision-Language-Action Models via Quality Diversity Prompt Generation for Robust Robot Policies
cs.RO 2026-03 unverdicted novelty 6.0

Q-DIG applies quality diversity optimization with vision-language models to generate diverse adversarial instructions that reveal VLA robot failures and enable robustness improvements via fine-tuning.
HoMMI: Learning Whole-Body Mobile Manipulation from Human Demonstrations
cs.RO 2026-03 unverdicted novelty 6.0

HoMMI learns whole-body mobile manipulation policies from robot-free human demonstrations by augmenting UMI with egocentric sensing and bridging the embodiment gap through an agnostic visual representation, relaxed he...
World Action Models are Zero-shot Policies
cs.RO 2026-02 unverdicted novelty 6.0

DreamZero uses a 14B video diffusion model as a World Action Model to achieve over 2x better zero-shot generalization on real robots than state-of-the-art VLAs, real-time 7Hz closed-loop control, and cross-embodiment ...
Learning Native Continuation for Action Chunking Flow Policies
cs.RO 2026-02 unverdicted novelty 6.0

Legato trains flow-based VLA policies with schedule-shaped action-noise mixtures and randomized conditions to achieve smoother trajectories and ~10% faster task completion than real-time chunking across five real-worl...
SPEAR-1: Scaling Beyond Robot Demonstrations via 3D Understanding
cs.RO 2025-11 unverdicted novelty 6.0

SPEAR-1 combines a 3D-enriched VLM with embodied control to match or exceed existing robotic foundation models using 20 times fewer robot demonstrations.
Video Generators are Robot Policies
cs.RO 2025-08 conditional novelty 6.0

Training models to generate videos of robot actions produces policies that generalize better to new objects and tasks while using far less demonstration data than standard behavior cloning.
Native Video-Action Pretraining for Generalizable Robot Control
cs.RO 2026-07 conditional novelty 5.0

A video-action foundation model pretrained natively for embodiment achieves few-shot generalization and 225 Hz real-time closed-loop robot control.
Dynamic Execution Horizon Prediction for Chunk-based Robot Policies
cs.RO 2026-06 unverdicted novelty 5.0

A frozen chunk-based robot policy plus a lightweight RL-trained horizon predictor raises success on high-precision and long-horizon manipulation by adapting open-loop length on the fly.
CLASP: Language-Driven Robot Skill Selection and Composition using Task-Parameterized Learning
cs.RO 2026-06 unverdicted novelty 5.0

CLASP combines TP-KMPs with VLMs for language-guided skill selection, covariance-weighted composition, and active learning requests, reporting 73.3-100% success on a 7-DoF manipulator.
ConTrack: Constrained Hand Motion Tracking with Adaptive Trade-off Control
cs.RO 2026-06 unverdicted novelty 5.0

ConTrack introduces a constrained RL method with online dual-variable adaptation and adaptive resets for improved long-horizon hand tracking in simulation and on real robots.
HumanoidMimicGen: Data Generation for Loco-Manipulation via Whole-Body Planning
cs.RO 2026-05 unverdicted novelty 5.0

HumanoidMimicGen automatically generates large loco-manipulation datasets from few source demonstrations using whole-body planning, enabling visuomotor policies that outperform real-data-only training by 20% on a new ...
On the Generalization Capabilities, Design Choices and Limitations of Keypoint Imitation Learning
cs.RO 2026-05 conditional novelty 5.0

KIL using foundation model keypoints reaches 75% success on five manipulation tasks, beating RGB (47%) but matching S2-diffusion (73%), with generalization tests on unseen objects via over 2000 real-world rollouts.
VLA Foundry: A Unified Framework for Training Vision-Language-Action Models
cs.RO 2026-04 unverdicted novelty 5.0

VLA Foundry provides a single training stack for VLA models and releases open models that match prior closed-source performance or outperform baselines on multi-task manipulation in simulation.
Causal World Modeling for Robot Control
cs.CV 2026-01 unverdicted novelty 5.0

LingBot-VA combines video world modeling with policy learning via Mixture-of-Transformers, closed-loop rollouts, and asynchronous inference to improve robot manipulation in simulation and real settings.
Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input
cs.RO 2025-12 conditional novelty 5.0

A four-stage RL system with teacher-student distillation and online constrained adaptation enables humanoid robots to achieve robust ball-kicking accuracy under noisy perception in simulation and on physical hardware.
Contact-Rich Robotic Assembly in Construction via Diffusion Policy Learning
cs.RO 2025-11 unverdicted novelty 5.0

Diffusion policies achieve 100% success on nominal mortise-tenon timber assembly and 75% average success under randomized 10 mm perturbations using force/torque sensing on an industrial robot.
GR-3 Technical Report
cs.RO 2025-07 unverdicted novelty 5.0

GR-3 is a VLA model that generalizes to novel objects, environments, and abstract instructions, outperforms the π0 baseline, and integrates with the new ByteMini bi-manual mobile robot.
Human2Any: Human-to-Robot Transfer via Constraint-Aware Compositional Planning
cs.RO 2026-06 unverdicted novelty 4.0

Human2Any transfers human video demonstrations to robots by representing tasks as object-object interactions and composing learned priors with robot-side planning.
A Practical Recipe Towards Improving Sim-and-Real Correlation for VLA Evaluation
cs.RO 2026-06 unverdicted novelty 4.0

Authors perform a cross-simulator, cross-policy empirical study of sim-to-real correlation for VLA policies and distill guidance on using simulation for policy improvement.
Coherent Off-Policy Improvement of Large Behavior Models with Learned Rewards
cs.LG 2026-06 unverdicted novelty 4.0

Coherent IRL learns dense rewards from demos to enable sample-efficient off-policy improvement of large behavior-cloned policies on sparse robotic manipulation tasks.
Learning to Fold: prizewinning solution at LeHome Challenge 2026 (1st place online, 2nd offline)
cs.RO 2026-06 unverdicted novelty 3.0

A competition entry for bimanual garment folding won 1st in simulation and 2nd in reality by making a VLA policy predict its own value quantities to drive advantage estimation, failure detection, and action selection.
Towards Shared Embodied Intelligence in Humanoid Robots through Optimization Development and Testing of the Human Aware ergoCub Robot
cs.RO 2026-05 unverdicted novelty 3.0

An architecture for humanoid robots that optimizes hardware and physical intelligence parameters with respect to human ergonomic metrics, demonstrated via the ergoCub robot.
Understanding the Impact of Geometric Foundation Models on Vision-Language-Action Models
cs.CV 2026-05 unverdicted novelty 3.0

The paper quantifies the geometric gap in current VLAs via linear probing and compares three architectures for injecting geometry from GFMs while analyzing impacts of data, cameras, and reconstruction quality.

Reference graph

Works this paper leans on

93 extracted references · 93 canonical work pages · cited by 56 Pith papers · 28 internal anchors

[1]

Diffusion policy: Visuomotor policy learning via action diffusion,

C. Chi, Z. Xu, S. Feng, E. Cousineau, Y. Du, B. Burchfiel, R. Tedrake, and S. Song, “Diffusion policy: Visuomotor policy learning via action diffusion,” The International Journal of Robotics Research, 2024

work page 2024
[2]

Learning fine-grained bimanual manipulation with low-cost hardware,

T. Z. Zhao, V. Kumar, S. Levine, and C. Finn, “Learning fine-grained bimanual manipulation with low-cost hardware,” in Robotics: Science and Systems XIX. Robotics: Science and Systems Foundation, 2023. [Online]. Available: https://roboticsproceedings.org/rss19/p078.pdf

work page 2023
[3]

Aloha unleashed: A simple recipe for robot dexterity,

T. Z. Zhao, J. Tompson, D. Driess, P. Florence, K. Ghasemipour, C. Finn, and A. Wahid, “Aloha unleashed: A simple recipe for robot dexterity,” in 8th Annual Conference on Robot Learning, 2024

work page 2024
[4]

Octo: An Open-Source Generalist Robot Policy

O. M. Team, D. Ghosh, H. Walke, K. Pertsch, K. Black, O. Mees, S. Dasari, J. Hejna, T. Kreiman, C. Xu, J. Luo, Y. L. Tan, L. Y. Chen, P. Sanketi, Q. Vuong, T. Xiao, D. Sadigh, C. Finn, and S. Levine, “Octo: An open-source generalist robot policy,” 2024. [Online]. Available: https://arxiv.org/abs/2405.12213

work page internal anchor Pith review Pith/arXiv arXiv 2024
[5]

$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control

K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichter, S. Jakubczak, T. Jones, L. Ke, S. Levine, A. Li-Bell, M. Mothukuri, S. Nair, K. Pertsch, L. X. Shi, J. Tanner, Q. Vuong, A. Walling, H. Wang, and U. Zhilinsky, “ 𝜋0: A vision-language-action flow model for general robot control,” 2024. [Online]. Availabl...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[6]

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

NVIDIA, J. Bjorck, F. Castañeda, N. Cherniadev, X. Da, R. Ding, L. J. Fan, Y. Fang, D. Fox, F. Hu, S. Huang, J. Jang, Z. Jiang, J. Kautz, K. Kundalia, L. Lao, Z. Li, Z. Lin, K. Lin, G. Liu, E. Llontop, L. Magne, A. Mandlekar, A. Narayan, S. Nasiriany, S. Reed, Y. L. Tan, G. Wang, Z. Wang, J. Wang, Q. Wang, J. Xiang, Y. Xie, Y. Xu, Z. Xu, S. Ye, Z. Yu...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[7]

Gemini Robotics: Bringing AI into the Physical World

G. R. Team, S. Abeyruwan, J. Ainslie, J.-B. Alayrac, M. G. Arenas, T. Armstrong, A. Balakrishna, R. Baruch, M. Bauza, M. Blokzijl, S. Bohez, K. Bousmalis, A. Brohan, T. Buschmann, A. Byravan, S. Cabi, K. Caluwaerts, F. Casarini, O. Chang, J. E. Chen, X. Chen, H.-T. L. Chiang, K. Choromanski, D. D’Ambrosio, S. Dasari, T. Davchev, C. Devin, N. D. Palo, T. ...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[8]

Scaling proprioceptive-visual learning with heterogeneous pre-trained transformers,

L. Wang, X. Chen, J. Zhao, and K. He, “Scaling proprioceptive-visual learning with heterogeneous pre-trained transformers,” Advances in neural information processing systems, vol. 37, pp. 124420–124450, 2024

work page 2024
[9]

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

S. Liu, L. Wu, B. Li, H. Tan, H. Chen, Z. Wang, K. Xu, H. Su, and J. Zhu, “RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation,” Mar. 2025, arXiv:2410.07864 [cs]. [Online]. Available: http://arxiv.org/abs/2410.07864

work page internal anchor Pith review Pith/arXiv arXiv 2025
[10]

Segment anything,

A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., “Segment anything,” in 2023 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2023, pp. 3992–4003

work page 2023
[11]

Learning transferable visual models from natural language supervision,

A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in International conference on machine learning. PMLR, 2021, pp. 8748–8763

work page 2021
[12]

Sigmoid loss for language image pre-training,

X. Zhai, B. Mustafa, A. Kolesnikov, and L. Beyer, “Sigmoid loss for language image pre-training,” in Proceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 11975–11986

work page 2023
[13]

Dinov2: Learning robust visual features without supervision,

M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby et al., “Dinov2: Learning robust visual features without supervision,” Transactions on Machine Learning Research Journal, pp. 1–31, 2024

work page 2024
[14]

GPT-4 Technical Report

OpenAI, J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, R. Avila, I. Babuschkin, S. Balaji, V. Balcom, P. Baltescu, H. Bao, M. Bavarian, J. Belgum, I. Bello, J. Berdine, G. Bernadett-Shapiro, C. Berner, L. Bogdonoff, O. Boiko, M. Boyd, A.-L. Brakman, G. Brockman, T. Brooks, M. Brundag...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[15]

LLaMA: Open and Efficient Foundation Language Models

H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar et al., “Llama: Open and efficient foundation language models,” arXiv preprint arXiv:2302.13971, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[16]

Droid: A large-scale in-the-wild robot manipulation dataset,

A. Khazatsky, K. Pertsch, S. Nair, A. Balakrishna, S. Dasari, S. Karamcheti, S. Nasiriany, M. K. Srirama, L. Y. Chen, K. Ellis, P. D. Fagan, J. Hejna, M. Itkina, M. Lepert, Y. J. Ma, P. T. Miller, J. Wu, S. Belkhale, S. Dass, H. Ha, A. Jain, A. Lee, Y. Lee, M. Memmel, S. Park, I. Radosavovic, K. Wang, A. Zhan, K. Black, C. Chi, K. B. Hatch, S. Lin, J. Lu,...

work page 2024
[17]

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

E. Collaboration, A. O’Neill, A. Rehman, A. Gupta, A. Maddukuri, A. Gupta, A. Padalkar, A. Lee, A. Pooley, A. Gupta, A. Mandlekar, A. Jain, A. Tung, A. Bewley, A. Herzog, A. Irpan, A. Khazatsky, A. Rai, A. Gupta, A. Wang, A. Kolobov, A. Singh, A. Garg, A. Kembhavi, A. Xie, A. Brohan, A. Raffin, A. Sharma, A. Yavary, A. Jain, A. Balakrishna, A. Wahid, B. B...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[18]

AgiBot World Colosseo: Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems

T. AgiBot-World, “AgiBot World Colosseo: Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems.”

work page
[19]

Openvla: An open-source vision-language-action model,

M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, Q. Vuong, T. Kollar, B. Burchfiel, R. Tedrake, D. Sadigh, S. Levine, P. Liang, and C. Finn, “Openvla: An open-source vision-language-action model,” in 8th Annual Conference on Robot Learning

work page
[20]

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, X. Chen, K. Choromanski, T. Ding, D. Driess, A. Dubey, C. Finn, P. Florence, C. Fu, M. G. Arenas, K. Gopalakrishnan, K. Han, K. Hausman, A. Herzog, J. Hsu, B. Ichter, A. Irpan, N. Joshi, R. Julian, D. Kalashnikov, Y. Kuang, I. Leal, L. Lee, T.-W. E. Lee, S. Levine, Y. Lu, H. Michalewski, I. Mordatch, K. Perts...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[21]

𝜋0.5: a vision-language-action model with open-world generalization,

P. Intelligence, K. Black, N. Brown, J. Darpinian, K. Dhabalia, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, M. Y. Galliker, D. Ghosh, L. Groom, K. Hausman, B. Ichter, S. Jakubczak, T. Jones, L. Ke, D. LeBlanc, S. Levine, A. Li-Bell, M. Mothukuri, S. Nair, K. Pertsch, A. Z. Ren, L. X. Shi, L. Smith, J. T. Springenberg, K. Stachowicz, J. Tanner, Q. Vu...

work page
[22]

$\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization

[Online]. Available: https://arxiv.org/abs/2504.16054

work page internal anchor Pith review Pith/arXiv arXiv
[23]

Magma: A foundation model for multimodal ai agents,

J. Yang, R. Tan, Q. Wu, R. Zheng, B. Peng, Y. Liang, Y. Gu, M. Cai, S. Ye, J. Jang et al., “Magma: A foundation model for multimodal ai agents,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 14203–14214

work page 2025
[24]

A generalist agent,

S. Reed, K. Zolna, E. Parisotto, S. G. Colmenarejo, A. Novikov, G. Barth-Maron, M. Giménez, Y. Sulsky, J. Kay, J. T. Springenberg et al., “A generalist agent,” Transactions on Machine Learning Research

work page
[25]

Palm-e: An embodied multimodal language model,

D. Driess, F. Xia, M. S. M. Sajjadi, C. Lynch, A. Chowdhery, B. Ichter, A. Wahid, J. Tompson, Q. Vuong, T. Yu, W. Huang, Y. Chebotar, P. Sermanet, D. Duckworth, S. Levine, V. Vanhoucke, K. Hausman, M. Toussaint, K. Greff, A. Zeng, I. Mordatch, and P. Florence, “Palm-e: An embodied multimodal language model,”

work page
[26]

PaLM-E: An Embodied Multimodal Language Model

[Online]. Available: https://arxiv.org/abs/2303.03378

work page internal anchor Pith review Pith/arXiv arXiv
[27]

Robotic Control via Embodied Chain-of-Thought Reasoning

M. Zawalski, W. Chen, K. Pertsch, O. Mees, C. Finn, and S. Levine, “Robotic control via embodied chain-of-thought reasoning,” 2025. [Online]. Available: https://arxiv.org/abs/2407.08693

work page internal anchor Pith review Pith/arXiv arXiv 2025
[28]

On the opportunities and risks of foundation models,

R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill, E. Brynjolfsson, S. Buch, D. Card, R. Castellon, N. Chatterji, A. Chen, K. Creel, J. Q. Davis, D. Demszky, C. Donahue, M. Doumbouya, E. Durmus, S. Ermon, J. Etchemendy, K. Ethayarajh, L. Fei-Fei, C. Finn, T. Gale, L. Gillespie, K. Go...

work page
[29]

On the Opportunities and Risks of Foundation Models

[Online]. Available: https://arxiv.org/abs/2108.07258

work page internal anchor Pith review Pith/arXiv arXiv
[30]

Gemini Robotics: Bringing AI into the Physical World,

G. R. Team, “Gemini Robotics: Bringing AI into the Physical World,” Tech. Rep., Mar. 2025. [Online]. Available: https://deepmind.google/discover/blog/gemini-robotics-brings-ai-into-the-physical-world/

work page 2025
[31]

RT-1: Robotics Transformer for Real-World Control at Scale

A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, J. Dabis, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog, J. Hsu, J. Ibarz, B. Ichter, A. Irpan, T. Jackson, S. Jesmonth, N. J. Joshi, R. Julian, D. Kalashnikov, Y. Kuang, I. Leal, K.-H. Lee, S. Levine, Y. Lu, U. Malla, D. Manjunath, I. Mordatch, O. Nachum, C. Parada, J. Peralta, E. Perez, K. Pertsch, J...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[32]

OpenVLA: An Open-Source Vision-Language-Action Model

M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, Q. Vuong, T. Kollar, B. Burchfiel, R. Tedrake, D. Sadigh, S. Levine, P. Liang, and C. Finn, “OpenVLA: An Open-Source Vision-Language-Action Model,” Sep. 2024, arXiv:2406.09246 [cs]. [Online]. Available: http://arxiv.org/abs/2406.09246

work page internal anchor Pith review Pith/arXiv arXiv 2024
[33]

Language models are few-shot learners,

T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 1877–1901

work page 2020
[34]

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

L. Gao, S. Biderman, S. Black, L. Golding, T. Hoppe, C. Foster, J. Phang, H. He, A. Thite, N. Nabeshima et al., “The pile: An 800gb dataset of diverse text for language modeling,” arXiv preprint arXiv:2101.00027, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2020
[35]

Documenting large webtext corpora: A case study on the colossal clean crawled corpus,

J. Dodge, M. Sap, A. Marasović, W. Agnew, G. Ilharco, D. Groeneveld, M. Mitchell, and M. Gardner, “Documenting large webtext corpora: A case study on the colossal clean crawled corpus,” arXiv preprint arXiv:2104.08758, 2021

work page arXiv 2021
[36]

Laion-5b: An open large-scale dataset for training next generation image-text models,

C. Schuhmann, R. Beaumont, R. Vencu, C. Gordon, R. Wightman, M. Cherti, T. Coombes, A. Katta, C. Mullis, M. Wortsman et al., “Laion-5b: An open large-scale dataset for training next generation image-text models,” Advances in neural information processing systems, vol. 35, pp. 25278–25294, 2022

work page 2022
[37]

LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

C. Schuhmann, R. Vencu, R. Beaumont, R. Kaczmarczyk, C. Mullis, A. Katta, T. Coombes, J. Jitsev, and A. Komatsuzaki, “Laion-400m: Open dataset of clip-filtered 400 million image-text pairs,” arXiv preprint arXiv:2111.02114, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[38]

Visual instruction tuning,

H. Liu, C. Li, Q. Wu, and Y. J. Lee, “Visual instruction tuning,” in NeurIPS, 2023

work page 2023
[39]

Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets

F. Ebert, Y. Yang, K. Schmeckpeper, B. Bucher, G. Georgakis, K. Daniilidis, C. Finn, and S. Levine, “Bridge data: Boosting generalization of robotic skills with cross-domain datasets,” 2021. [Online]. Available: https://arxiv.org/abs/2109.13396

work page internal anchor Pith review Pith/arXiv arXiv 2021
[40]

Rh20t: A robotic dataset for learning diverse skills in one-shot,

H.-S. Fang, H. Fang, Z. Tang, J. Liu, J. Wang, H. Zhu, and C. Lu, “Rh20t: A robotic dataset for learning diverse skills in one-shot,” in RSS 2023 Workshop on Learning for Task and Motion Planning, 2023

work page 2023
[41]

AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems

AgiBot-World-Contributors, Q. Bu, J. Cai, L. Chen, X. Cui, Y. Ding, S. Feng, S. Gao, X. He, X. Huang, S. Jiang, Y. Jiang, C. Jing, H. Li, J. Li, C. Liu, Y. Liu, Y. Lu, J. Luo, P. Luo, Y. Mu, Y. Niu, Y. Pan, J. Pang, Y. Qiao, G. Ren, C. Ruan, J. Shan, Y. Shen, C. Shi, M. Shi, M. Shi, C. Sima, J. Song, H. Wang, W. Wang, D. Wei, C. Xie, G. Xu, J. Yan, C. Yan...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[42]

Roboverse: Towards a unified platform, dataset and benchmark for scalable and generalizable robot learning,

H. Geng, F. Wang, S. Wei, Y. Li, B. Wang, B. An, C. T. Cheng, H. Lou, P. Li, Y.-J. Wang, Y. Liang, D. Goetting, C. Xu, H. Chen, Y. Qian, Y. Geng, J. Mao, W. Wan, M. Zhang, J. Lyu, S. Zhao, J. Zhang, J. Zhang, C. Zhao, H. Lu, Y. Ding, R. Gong, Y. Wang, Y. Kuang, R. Wu, B. Jia, C. Sferrazza, H. Dong, S. Huang, K. Sreenath, Y. Wang, J. Malik, and P. Abbeel, ...

work page 2025
[43]

Orbit: A unified simulation framework for interactive robot learning environments,

M. Mittal, C. Yu, Q. Yu, J. Liu, N. Rudin, D. Hoeller, J. L. Yuan, R. Singh, Y. Guo, H. Mazhar et al., “Orbit: A unified simulation framework for interactive robot learning environments,” IEEE Robotics and Automation Letters, vol. 8, no. 6, pp. 3740–3747, 2023

work page 2023
[44]

ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI

S. Tao, F. Xiang, A. Shukla, Y. Qin, X. Hinrichsen, X. Yuan, C. Bao, X. Lin, Y. Liu, T. kai Chan, Y. Gao, X. Li, T. Mu, N. Xiao, A. Gurha, Z. Huang, R. Calandra, R. Chen, S. Luo, and H. Su, “Maniskill3: Gpu parallelized robotics simulation and rendering for generalizable embodied ai,” 2024. [Online]. Available: https://arxiv.org/abs/2410.00425

work page Pith review arXiv 2024
[45]

RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation

Y. Wang, Z. Xian, F. Chen, T.-H. Wang, Y. Wang, K. Fragkiadaki, Z. Erickson, D. Held, and C. Gan, “Robogen: Towards unleashing infinite data for automated robot learning via generative simulation,” arXiv preprint arXiv:2311.01455, 2023

work page Pith review arXiv 2023
[46]

RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

S. Nasiriany, A. Maddukuri, L. Zhang, A. Parikh, A. Lo, A. Joshi, A. Mandlekar, and Y. Zhu, “Robocasa: Large-scale simulation of everyday tasks for generalist robots,” arXiv preprint arXiv:2406.02523, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[47]

MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations

A. Mandlekar, S. Nasiriany, B. Wen, I. Akinola, Y. Narang, L. Fan, Y. Zhu, and D. Fox, “Mimicgen: A data generation system for scalable robot learning using human demonstrations,” arXiv preprint arXiv:2310.17596, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[48]

Rlbench: The robot learning benchmark & learning environment,

S. James, Z. Ma, D. R. Arrojo, and A. J. Davison, “Rlbench: The robot learning benchmark & learning environment,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 3019–3026, 2020

work page 2020
[49]

Empirical analysis of sim-and-real cotraining of diffusion policies for planar pushing from pixels,

A. Wei, A. Agarwal, B. Chen, R. Bosworth, N. Pfaff, and R. Tedrake, “Empirical analysis of sim-and-real cotraining of diffusion policies for planar pushing from pixels,” 2025. [Online]. Available: https://arxiv.org/abs/2503.22634

work page arXiv 2025
[50]

Sim-and-real co-training: A simple recipe for vision-based robotic manipulation,

A. Maddukuri, Z. Jiang, L. Y. Chen, S. Nasiriany, Y. Xie, Y. Fang, W. Huang, Z. Wang, Z. Xu, N. Chernyadev, S. Reed, K. Goldberg, A. Mandlekar, L. Fan, and Y. Zhu, “Sim-and-real co-training: A simple recipe for vision-based robotic manipulation,” 2025. [Online]. Available: https://arxiv.org/abs/2503.24361

work page arXiv 2025
[51]

Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots

C. Chi, Z. Xu, C. Pan, E. Cousineau, B. Burchfiel, S. Feng, R. Tedrake, and S. Song, “Universal manipulation interface: In-the-wild robot teaching without in-the-wild robots,” 2024. [Online]. Available: https://arxiv.org/abs/2402.10329

work page internal anchor Pith review Pith/arXiv arXiv 2024
[52]

Legato: Cross-embodiment imitation using a grasping tool,

M. Seo, H. A. Park, S. Yuan, Y. Zhu, and L. Sentis, “Legato: Cross-embodiment imitation using a grasping tool,” IEEE Robotics and Automation Letters, vol. 10, no. 3, pp. 2854–2861, Mar. 2025. [Online]. Available: http://dx.doi.org/10.1109/LRA.2025.3535182

work page doi:10.1109/lra.2025.3535182 2025
[53]

EgoMimic: Scaling Imitation Learning via Egocentric Video

S. Kareer, D. Patel, R. Punamiya, P. Mathur, S. Cheng, C. Wang, J. Hoffman, and D. Xu, “Egomimic: Scaling imitation learning via egocentric video,” 2024. [Online]. Available: https://arxiv.org/abs/2410.24221

work page Pith review arXiv 2024
[54]

Airexo: Low-cost exoskeletons for learning whole-arm manipulation in the wild,

H. Fang, H.-S. Fang, Y. Wang, J. Ren, J. Chen, R. Zhang, W. Wang, and C. Lu, “Airexo: Low-cost exoskeletons for learning whole-arm manipulation in the wild,” 2024. [Online]. Available: https://arxiv.org/abs/2309.14975

work page arXiv 2024
[55]

Robot learning as an empirical science: Best practices for policy evaluation,

H. Kress-Gazit, K. Hashimoto, N. Kuppuswamy, P. Shah, P. Horgan, G. Richardson, S. Feng, and B. Burchfiel, “Robot learning as an empirical science: Best practices for policy evaluation,” arXiv preprint arXiv:2409.09491, 2024

work page arXiv 2024
[56]

Maniskill: Generalizable manipulation skill benchmark with large-scale demonstrations,

T. Mu, Z. Ling, F. Xiang, D. Yang, X. Li, S. Tao, Z. Huang, Z. Jia, and H. Su, “Maniskill: Generalizable manipulation skill benchmark with large-scale demonstrations,” 2021. [Online]. Available: https://arxiv.org/abs/2107.14483

work page arXiv 2021
[57]

Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning,

T. Yu, D. Quillen, Z. He, R. Julian, A. Narayan, H. Shively, A. Bellathur, K. Hausman, C. Finn, and S. Levine, “Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning,” 2021. [Online]. Available: https://arxiv.org/abs/1910.10897

work page arXiv 2021
[58]

robosuite: A Modular Simulation Framework and Benchmark for Robot Learning

Y. Zhu, J. Wong, A. Mandlekar, R. Martín-Martín, A. Joshi, S. Nasiriany, Y. Zhu, and K. Lin, “robosuite: A modular simulation framework and benchmark for robot learning,” arXiv preprint arXiv:2009.12293, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2020
[59]

Behavior: Benchmark for everyday household activities in virtual, interactive, and ecological environments,

S. Srivastava, C. Li, M. Lingelbach, R. Martín-Martín, F. Xia, K. Vainio, Z. Lian, C. Gokmen, S. Buch, C. K. Liu, S. Savarese, H. Gweon, J. Wu, and L. Fei-Fei, “Behavior: Benchmark for everyday household activities in virtual, interactive, and ecological environments,” 2021. [Online]. Available: https://arxiv.org/abs/2108.03332

work page arXiv 2021
[60]

Robothor: An open simulation-to-real embodied ai platform,

M. Deitke, W. Han, A. Herrasti, A. Kembhavi, E. Kolve, R. Mottaghi, J. Salvador, D. Schwenk, E. VanderBilt, M. Wallingford, L. Weihs, M. Yatskar, and A. Farhadi, “Robothor: An open simulation-to-real embodied ai platform,” 2020. [Online]. Available: https://arxiv.org/abs/2004.06799

work page arXiv 2020
[61]

Sim2real predictivity: Does evaluation in simulation predict real-world performance?

A. Kadian, J. Truong, A. Gokaslan, A. Clegg, E. Wijmans, S. Lee, M. Savva, S. Chernova, and D. Batra, “Sim2real predictivity: Does evaluation in simulation predict real-world performance?” IEEE Robotics and Automation Letters, vol. 5, no. 4, pp. 6670–6677, Oct. 2020. [Online]. Available: http://dx.doi.org/10.1109/LRA.2020.3013848

work page doi:10.1109/lra.2020.3013848 2020
[62]

VR-Goggles for Robots: Real-to-sim Domain Adaptation for Visual Control

J. Zhang, L. Tai, P. Yun, Y. Xiong, M. Liu, J. Boedecker, and W. Burgard, “Vr-goggles for robots: Real-to-sim domain adaptation for visual control,” 2019. [Online]. Available: https://arxiv.org/abs/1802.00265

work page internal anchor Pith review Pith/arXiv arXiv 2019
[63]

Evaluating Real-World Robot Manipulation Policies in Simulation

X. Li, K. Hsu, J. Gu, K. Pertsch, O. Mees, H. R. Walke, C. Fu, I. Lunawat, I. Sieh, S. Kirmani, S. Levine, J. Wu, C. Finn, H. Su, Q. Vuong, and T. Xiao, “Evaluating real-world robot manipulation policies in simulation,” arXiv preprint arXiv:2405.05941, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[64]

Asid: Active exploration for system identification in robotic manipulation,

M. Memmel, A. Wagenmaker, C. Zhu, P. Yin, D. Fox, and A. Gupta, “Asid: Active exploration for system identification in robotic manipulation,” 2024. [Online]. Available: https://arxiv.org/abs/2404.12308

work page arXiv 2024
[65]

Scalable real2sim: Physics-aware asset generation via robotic pick-and-place setups,

N. Pfaff, E. Fu, J. Binagia, P. Isola, and R. Tedrake, “Scalable real2sim: Physics-aware asset generation via robotic pick-and-place setups,” 2025. [Online]. Available: https://arxiv.org/abs/2503.00370

work page arXiv 2025
[66]

Rb2: Robotic manipulation benchmarking with a twist,

S. Dasari, J. Wang, J. Hong, S. Bahl, Y. Lin, A. Wang, A. Thankaraj, K. Chahal, B. Calli, S. Gupta, D. Held, L. Pinto, D. Pathak, V. Kumar, and A. Gupta, “Rb2: Robotic manipulation benchmarking with a twist,” 2022. [Online]. Available: https://arxiv.org/abs/2203.08098

work page arXiv 2022
[67]

Benchmarking cluttered robot pick-and-place manipulation with the box and blocks test,

A. S. Morgan, K. Hang, W. G. Bircher, F. M. Alladkani, A. Gandhi, B. Calli, and A. M. Dollar, “Benchmarking cluttered robot pick-and-place manipulation with the box and blocks test,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 454–461, 2019

work page 2019
[68]

Furniturebench: Reproducible real-world benchmark for long-horizon complex manipulation,

M. Heo, Y. Lee, D. Lee, and J. J. Lim, “Furniturebench: Reproducible real-world benchmark for long-horizon complex manipulation,” The International Journal of Robotics Research, p. 02783649241304789, 2023

work page 2023
[69]

Benchmarking protocols for evaluating small parts robotic assembly systems,

K. Kimble, K. Van Wyk, J. Falco, E. Messina, Y. Sun, M. Shibata, W. Uemura, and Y. Yokokohji, “Benchmarking protocols for evaluating small parts robotic assembly systems,” IEEE robotics and automation letters, vol. 5, no. 2, pp. 883–889, 2020

work page 2020
[70]

Scenereplica: Benchmarking real-world robot manipulation by creating replicable scenes,

N. Khargonkar, S. H. Allu, Y. Lu, B. Prabhakaran, Y. Xiang et al., “Scenereplica: Benchmarking real-world robot manipulation by creating replicable scenes,” in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 8258–8264

work page 2024
[71]

Benchmarking protocol for grasp planning algorithms,

Y. Bekiroglu, N. Marturi, M. A. Roa, K. J. M. Adjigble, T. Pardi, C. Grimm, R. Balasubramanian, K. Hang, and R. Stolkin, “Benchmarking protocol for grasp planning algorithms,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 315–322, 2019

work page 2019
[72]

Graspa 1.0: Graspa is a robot arm grasping performance benchmark,

F. Bottarel, G. Vezzani, U. Pattacini, and L. Natale, “Graspa 1.0: Graspa is a robot arm grasping performance benchmark,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 836–843, 2020

work page 2020
[73]

Benchmark for bimanual robotic manipulation of semi-deformable objects,

K. Chatzilygeroudis, B. Fichera, I. Lauzana, F. Bu, K. Yao, F. Khadivar, and A. Billard, “Benchmark for bimanual robotic manipulation of semi-deformable objects,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 2443–2450, 2020

work page 2020
[74]

Ocrtoc: A cloud-based competition and benchmark for robotic grasping and manipulation,

Z. Liu, W. Liu, Y. Qin, F. Xiang, M. Gou, S. Xin, M. A. Roa, B. Calli, H. Su, Y. Sun et al., “Ocrtoc: A cloud-based competition and benchmark for robotic grasping and manipulation,” IEEE Robotics and Automation Letters, vol. 7, no. 1, pp. 486–493, 2021

work page 2021
[75]

Real robot challenge: A robotics competition in the cloud,

S. Bauer, M. Wüthrich, F. Widmaier, A. Buchholz, S. Stark, A. Goyal, T. Steinbrenner, J. Akpo, S. Joshi, V. Berenz et al., “Real robot challenge: A robotics competition in the cloud,” in NeurIPS 2021 Competitions and Demonstrations Track. PMLR, 2022, pp. 190–204

work page 2021
[76]

Train offline, test online: A real robot learning benchmark,

G. Zhou, V. Dean, M. K. Srirama, A. Rajeswaran, J. Pari, K. Hatch, A. Jain, T. Yu, P. Abbeel, L. Pinto et al., “Train offline, test online: A real robot learning benchmark,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 9197–9203

work page 2023
[77]

Autoeval: Autonomous evaluation of generalist robot manipulation policies in the real world,

Z. Zhou, P. Atreya, Y. L. Tan, K. Pertsch, and S. Levine, “Autoeval: Autonomous evaluation of generalist robot manipulation policies in the real world,” 2025. [Online]. Available: https://arxiv.org/abs/2503.24278

work page arXiv 2025
[78]

Is your imitation learning policy better than mine? policy comparison with near-optimal stopping,

D. Snyder, A. J. Hancock, A. Badithela, E. Dixon, P. Miller, R. A. Ambrus, A. Majumdar, M. Itkina, and H. Nishimura, “Is your imitation learning policy better than mine? policy comparison with near-optimal stopping,” arXiv preprint arXiv:2503.10966, 2025

work page arXiv 2025
[79]

Deep reinforcement learning at the edge of the statistical precipice,

R. Agarwal, M. Schwarzer, P. S. Castro, A. C. Courville, and M. Bellemare, “Deep reinforcement learning at the edge of the statistical precipice,” Advances in neural information processing systems, vol. 34, pp. 29304–29320, 2021

work page 2021
[80]

Statistical tests, p values, confidence intervals, and power: a guide to misinterpretations,

S. Greenland, S. J. Senn, K. J. Rothman, J. B. Carlin, C. Poole, S. N. Goodman, and D. G. Altman, “Statistical tests, p values, confidence intervals, and power: a guide to misinterpretations,” European journal of epidemiology, vol. 31, no. 4, pp. 337–350, 2016

work page 2016

Showing first 80 references.

[1] [1]

Diffusion policy: Visuomotor policy learning via action diffusion,

C. Chi, Z. Xu, S. Feng, E. Cousineau, Y. Du, B. Burchfiel, R. Tedrake, and S. Song, “Diffusion policy: Visuomotor policy learning via action diffusion,” The International Journal of Robotics Research, 2024

work page 2024

[2] [2]

Learning fine-grained bimanual manipulation with low-cost hardware,

T. Z. Zhao, V. Kumar, S. Levine, and C. Finn, “Learning fine-grained bimanual manipulation with low-cost hardware,” in Robotics: Science and Systems XIX . Robotics: Science and Systems Foundation, 2023. [Online]. Available: https: //roboticsproceedings.org/rss19/p078.pdf

work page 2023

[3] [3]

Aloha unleashed: A simple recipe for robot dexterity,

T. Z. Zhao, J. Tompson, D. Driess, P. Florence, K. Ghasemipour, C. Finn, and A. Wahid, “Aloha unleashed: A simple recipe for robot dexterity,” in8th Annual Conference on Robot Learning , 2024

work page 2024

[4] [4]

Octo: An Open-Source Generalist Robot Policy

O. M. Team, D. Ghosh, H. Walke, K. Pertsch, K. Black, O. Mees, S. Dasari, J. Hejna, T. Kreiman, C. Xu, J. Luo, Y. L. Tan, L. Y. Chen, P. Sanketi, Q. Vuong, T. Xiao, D. Sadigh, C. Finn, and S. Levine, “Octo: An open-source generalist robot policy,” 2024. [Online]. Available: https://arxiv.org/abs/2405.12213

work page internal anchor Pith review Pith/arXiv arXiv 2024

[5] [5]

$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control

K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichter, S. Jakubczak, T. Jones, L. Ke, S. Levine, A. Li-Bell, M. Mothukuri, S. Nair, K. Pertsch, L. X. Shi, J. Tanner, Q. Vuong, A. Walling, H. Wang, and U. Zhilinsky, “ 𝜋0: A vision-language-action flow model for general robot control,” 2024. [Online]. Availabl...

work page internal anchor Pith review Pith/arXiv arXiv 2024

[6] [6]

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

NVIDIA, :, J. Bjorck, F. Casta ˜neda, N. Cherniadev, X. Da, R. Ding, L. J. Fan, Y. Fang, D. Fox, F. Hu, S. Huang, J. Jang, Z. Jiang, J. Kautz, K. Kundalia, L. Lao, Z. Li, Z. Lin, K. Lin, G. Liu, E. Llontop, L. Magne, A. Mandlekar, A. Narayan, S. Nasiriany, S. Reed, Y. L. Tan, G. Wang, Z. Wang, J. Wang, Q. Wang, J. Xiang, Y. Xie, Y. Xu, Z. Xu, S. Ye, Z. Yu...

work page internal anchor Pith review Pith/arXiv arXiv 2025

[7] [7]

Gemini Robotics: Bringing AI into the Physical World

G. R. Team, S. Abeyruwan, J. Ainslie, J.-B. Alayrac, M. G. Arenas, T. Armstrong, A. Balakrishna, R. Baruch, M. Bauza, M. Blokzijl, S. Bohez, K. Bousmalis, A. Brohan, T. Buschmann, A. Byravan, S. Cabi, K. Caluwaerts, F. Casarini, O. Chang, J. E. Chen, X. Chen, H.-T. L. Chiang, K. Choromanski, D. D’ Ambrosio, S. Dasari, T. Davchev, C. Devin, N. D. Palo, T. ...

[8]

Scaling proprioceptive-visual learning with heterogeneous pre-trained transformers

L. Wang, X. Chen, J. Zhao, and K. He, “Scaling proprioceptive-visual learning with heterogeneous pre-trained transformers,” Advances in neural information processing systems, vol. 37, pp. 124 420–124 450, 2024

[9]

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

S. Liu, L. Wu, B. Li, H. Tan, H. Chen, Z. Wang, K. Xu, H. Su, and J. Zhu, “RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation,” Mar. 2025, arXiv:2410.07864 [cs]. [Online]. Available: http://arxiv.org/abs/2410.07864

[10]

Segment anything

A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., “Segment anything,” in 2023 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2023, pp. 3992–4003

[11]

Learning transferable visual models from natural language supervision

A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in International conference on machine learning. PMLR, 2021, pp. 8748–8763

[12]

Sigmoid loss for language image pre-training

X. Zhai, B. Mustafa, A. Kolesnikov, and L. Beyer, “Sigmoid loss for language image pre-training,” in Proceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 11 975–11 986

[13]

Dinov2: Learning robust visual features without supervision

M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby et al., “Dinov2: Learning robust visual features without supervision,” Transactions on Machine Learning Research Journal, pp. 1–31, 2024

[14]

GPT-4 Technical Report

OpenAI, J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, R. Avila, I. Babuschkin, S. Balaji, V. Balcom, P. Baltescu, H. Bao, M. Bavarian, J. Belgum, I. Bello, J. Berdine, G. Bernadett-Shapiro, C. Berner, L. Bogdonoff, O. Boiko, M. Boyd, A.-L. Brakman, G. Brockman, T. Brooks, M. Brundag...

[15]

LLaMA: Open and Efficient Foundation Language Models

H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar et al., “Llama: Open and efficient foundation language models,” arXiv preprint arXiv:2302.13971, 2023.

[16]

Droid: A large-scale in-the-wild robot manipulation dataset

A. Khazatsky, K. Pertsch, S. Nair, A. Balakrishna, S. Dasari, S. Karamcheti, S. Nasiriany, M. K. Srirama, L. Y. Chen, K. Ellis, P. D. Fagan, J. Hejna, M. Itkina, M. Lepert, Y. J. Ma, P. T. Miller, J. Wu, S. Belkhale, S. Dass, H. Ha, A. Jain, A. Lee, Y. Lee, M. Memmel, S. Park, I. Radosavovic, K. Wang, A. Zhan, K. Black, C. Chi, K. B. Hatch, S. Lin, J. Lu,...

[17]

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

E. Collaboration, A. O’Neill, A. Rehman, A. Gupta, A. Maddukuri, A. Gupta, A. Padalkar, A. Lee, A. Pooley, A. Gupta, A. Mandlekar, A. Jain, A. Tung, A. Bewley, A. Herzog, A. Irpan, A. Khazatsky, A. Rai, A. Gupta, A. Wang, A. Kolobov, A. Singh, A. Garg, A. Kembhavi, A. Xie, A. Brohan, A. Raffin, A. Sharma, A. Yavary, A. Jain, A. Balakrishna, A. Wahid, B. B...

[18]

AgiBot World Colosseo: Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems

T. AgiBot-World, “AgiBot World Colosseo: Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems.”

[19]

Openvla: An open-source vision-language-action model

M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, Q. Vuong, T. Kollar, B. Burchfiel, R. Tedrake, D. Sadigh, S. Levine, P. Liang, and C. Finn, “Openvla: An open-source vision-language-action model,” in 8th Annual Conference on Robot Learning

[20]

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, X. Chen, K. Choromanski, T. Ding, D. Driess, A. Dubey, C. Finn, P. Florence, C. Fu, M. G. Arenas, K. Gopalakrishnan, K. Han, K. Hausman, A. Herzog, J. Hsu, B. Ichter, A. Irpan, N. Joshi, R. Julian, D. Kalashnikov, Y. Kuang, I. Leal, L. Lee, T.-W. E. Lee, S. Levine, Y. Lu, H. Michalewski, I. Mordatch, K. Perts...

[21]

𝜋0.5: a vision-language-action model with open-world generalization

P. Intelligence, K. Black, N. Brown, J. Darpinian, K. Dhabalia, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, M. Y. Galliker, D. Ghosh, L. Groom, K. Hausman, B. Ichter, S. Jakubczak, T. Jones, L. Ke, D. LeBlanc, S. Levine, A. Li-Bell, M. Mothukuri, S. Nair, K. Pertsch, A. Z. Ren, L. X. Shi, L. Smith, J. T. Springenberg, K. Stachowicz, J. Tanner, Q. Vu...

[22]

$\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization

[Online]. Available: https://arxiv.org/abs/2504.16054

[23]

Magma: A foundation model for multimodal ai agents

J. Yang, R. Tan, Q. Wu, R. Zheng, B. Peng, Y. Liang, Y. Gu, M. Cai, S. Ye, J. Jang et al., “Magma: A foundation model for multimodal ai agents,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 14 203–14 214

[24]

A generalist agent

S. Reed, K. Zolna, E. Parisotto, S. G. Colmenarejo, A. Novikov, G. Barth-Maron, M. Giménez, Y. Sulsky, J. Kay, J. T. Springenberg et al., “A generalist agent,” Transactions on Machine Learning Research

[25]

Palm-e: An embodied multimodal language model

D. Driess, F. Xia, M. S. M. Sajjadi, C. Lynch, A. Chowdhery, B. Ichter, A. Wahid, J. Tompson, Q. Vuong, T. Yu, W. Huang, Y. Chebotar, P. Sermanet, D. Duckworth, S. Levine, V. Vanhoucke, K. Hausman, M. Toussaint, K. Greff, A. Zeng, I. Mordatch, and P. Florence, “Palm-e: An embodied multimodal language model,”

[26]

PaLM-E: An Embodied Multimodal Language Model

[Online]. Available: https://arxiv.org/abs/2303.03378

[27]

Robotic Control via Embodied Chain-of-Thought Reasoning

M. Zawalski, W. Chen, K. Pertsch, O. Mees, C. Finn, and S. Levine, “Robotic control via embodied chain-of-thought reasoning,” 2025. [Online]. Available: https://arxiv.org/abs/2407.08693

[28]

On the opportunities and risks of foundation models

R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill, E. Brynjolfsson, S. Buch, D. Card, R. Castellon, N. Chatterji, A. Chen, K. Creel, J. Q. Davis, D. Demszky, C. Donahue, M. Doumbouya, E. Durmus, S. Ermon, J. Etchemendy, K. Ethayarajh, L. Fei-Fei, C. Finn, T. Gale, L. Gillespie, K. Go...

[29]

On the Opportunities and Risks of Foundation Models

[Online]. Available: https://arxiv.org/abs/2108.07258

[30]

Gemini Robotics: Bringing AI into the Physical World

G. R. Team, “Gemini Robotics: Bringing AI into the Physical World,” Tech. Rep., Mar. 2025. [Online]. Available: https://deepmind.google/discover/blog/gemini-robotics-brings-ai-into-the-physical-world/

[31]

RT-1: Robotics Transformer for Real-World Control at Scale

A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, J. Dabis, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog, J. Hsu, J. Ibarz, B. Ichter, A. Irpan, T. Jackson, S. Jesmonth, N. J. Joshi, R. Julian, D. Kalashnikov, Y. Kuang, I. Leal, K.-H. Lee, S. Levine, Y. Lu, U. Malla, D. Manjunath, I. Mordatch, O. Nachum, C. Parada, J. Peralta, E. Perez, K. Pertsch, J...

[32]

OpenVLA: An Open-Source Vision-Language-Action Model

M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, Q. Vuong, T. Kollar, B. Burchfiel, R. Tedrake, D. Sadigh, S. Levine, P. Liang, and C. Finn, “OpenVLA: An Open-Source Vision-Language-Action Model,” Sep. 2024, arXiv:2406.09246 [cs]. [Online]. Available: http://arxiv.org/abs/2406.09246

[33]

Language models are few-shot learners

T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 1877–1901

[34]

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

L. Gao, S. Biderman, S. Black, L. Golding, T. Hoppe, C. Foster, J. Phang, H. He, A. Thite, N. Nabeshima et al., “The pile: An 800gb dataset of diverse text for language modeling,” arXiv preprint arXiv:2101.00027, 2020

[35]

Documenting large webtext corpora: A case study on the colossal clean crawled corpus

J. Dodge, M. Sap, A. Marasović, W. Agnew, G. Ilharco, D. Groeneveld, M. Mitchell, and M. Gardner, “Documenting large webtext corpora: A case study on the colossal clean crawled corpus,” arXiv preprint arXiv:2104.08758, 2021

[36]

Laion-5b: An open large-scale dataset for training next generation image-text models

C. Schuhmann, R. Beaumont, R. Vencu, C. Gordon, R. Wightman, M. Cherti, T. Coombes, A. Katta, C. Mullis, M. Wortsman et al., “Laion-5b: An open large-scale dataset for training next generation image-text models,” Advances in neural information processing systems, vol. 35, pp. 25 278–25 294, 2022

[37]

LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

C. Schuhmann, R. Vencu, R. Beaumont, R. Kaczmarczyk, C. Mullis, A. Katta, T. Coombes, J. Jitsev, and A. Komatsuzaki, “Laion-400m: Open dataset of clip-filtered 400 million image-text pairs,” arXiv preprint arXiv:2111.02114, 2021

[38]

Visual instruction tuning

H. Liu, C. Li, Q. Wu, and Y. J. Lee, “Visual instruction tuning,” in NeurIPS, 2023

[39]

Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets

F. Ebert, Y. Yang, K. Schmeckpeper, B. Bucher, G. Georgakis, K. Daniilidis, C. Finn, and S. Levine, “Bridge data: Boosting generalization of robotic skills with cross-domain datasets,” 2021. [Online]. Available: https://arxiv.org/abs/2109.13396

[40]

Rh20t: A robotic dataset for learning diverse skills in one-shot

H.-S. Fang, H. Fang, Z. Tang, J. Liu, J. Wang, H. Zhu, and C. Lu, “Rh20t: A robotic dataset for learning diverse skills in one-shot,” in RSS 2023 Workshop on Learning for Task and Motion Planning, 2023

[41]

AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems

AgiBot-World-Contributors, Q. Bu, J. Cai, L. Chen, X. Cui, Y. Ding, S. Feng, S. Gao, X. He, X. Huang, S. Jiang, Y. Jiang, C. Jing, H. Li, J. Li, C. Liu, Y. Liu, Y. Lu, J. Luo, P. Luo, Y. Mu, Y. Niu, Y. Pan, J. Pang, Y. Qiao, G. Ren, C. Ruan, J. Shan, Y. Shen, C. Shi, M. Shi, M. Shi, C. Sima, J. Song, H. Wang, W. Wang, D. Wei, C. Xie, G. Xu, J. Yan, C. Yan...

[42]

Roboverse: Towards a unified platform, dataset and benchmark for scalable and generalizable robot learning

H. Geng, F. Wang, S. Wei, Y. Li, B. Wang, B. An, C. T. Cheng, H. Lou, P. Li, Y.-J. Wang, Y. Liang, D. Goetting, C. Xu, H. Chen, Y. Qian, Y. Geng, J. Mao, W. Wan, M. Zhang, J. Lyu, S. Zhao, J. Zhang, J. Zhang, C. Zhao, H. Lu, Y. Ding, R. Gong, Y. Wang, Y. Kuang, R. Wu, B. Jia, C. Sferrazza, H. Dong, S. Huang, K. Sreenath, Y. Wang, J. Malik, and P. Abbeel, ...

[43]

Orbit: A unified simulation framework for interactive robot learning environments

M. Mittal, C. Yu, Q. Yu, J. Liu, N. Rudin, D. Hoeller, J. L. Yuan, R. Singh, Y. Guo, H. Mazhar et al., “Orbit: A unified simulation framework for interactive robot learning environments,” IEEE Robotics and Automation Letters, vol. 8, no. 6, pp. 3740–3747, 2023

[44]

ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI

S. Tao, F. Xiang, A. Shukla, Y. Qin, X. Hinrichsen, X. Yuan, C. Bao, X. Lin, Y. Liu, T. kai Chan, Y. Gao, X. Li, T. Mu, N. Xiao, A. Gurha, Z. Huang, R. Calandra, R. Chen, S. Luo, and H. Su, “Maniskill3: Gpu parallelized robotics simulation and rendering for generalizable embodied ai,” 2024. [Online]. Available: https://arxiv.org/abs/2410.00425

[45]

RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation

Y. Wang, Z. Xian, F. Chen, T.-H. Wang, Y. Wang, K. Fragkiadaki, Z. Erickson, D. Held, and C. Gan, “Robogen: Towards unleashing infinite data for automated robot learning via generative simulation,” arXiv preprint arXiv:2311.01455, 2023

[46]

RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

S. Nasiriany, A. Maddukuri, L. Zhang, A. Parikh, A. Lo, A. Joshi, A. Mandlekar, and Y. Zhu, “Robocasa: Large-scale simulation of everyday tasks for generalist robots,” arXiv preprint arXiv:2406.02523, 2024

[47]

MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations

A. Mandlekar, S. Nasiriany, B. Wen, I. Akinola, Y. Narang, L. Fan, Y. Zhu, and D. Fox, “Mimicgen: A data generation system for scalable robot learning using human demonstrations,” arXiv preprint arXiv:2310.17596, 2023

[48]

Rlbench: The robot learning benchmark & learning environment

S. James, Z. Ma, D. R. Arrojo, and A. J. Davison, “Rlbench: The robot learning benchmark & learning environment,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 3019–3026, 2020

[49]

Empirical analysis of sim-and-real cotraining of diffusion policies for planar pushing from pixels

A. Wei, A. Agarwal, B. Chen, R. Bosworth, N. Pfaff, and R. Tedrake, “Empirical analysis of sim-and-real cotraining of diffusion policies for planar pushing from pixels,” 2025. [Online]. Available: https://arxiv.org/abs/2503.22634

[50]

Sim-and-real co-training: A simple recipe for vision-based robotic manipulation

A. Maddukuri, Z. Jiang, L. Y. Chen, S. Nasiriany, Y. Xie, Y. Fang, W. Huang, Z. Wang, Z. Xu, N. Chernyadev, S. Reed, K. Goldberg, A. Mandlekar, L. Fan, and Y. Zhu, “Sim-and-real co-training: A simple recipe for vision-based robotic manipulation,” 2025. [Online]. Available: https://arxiv.org/abs/2503.24361

[51]

Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots

C. Chi, Z. Xu, C. Pan, E. Cousineau, B. Burchfiel, S. Feng, R. Tedrake, and S. Song, “Universal manipulation interface: In-the-wild robot teaching without in-the-wild robots,” 2024. [Online]. Available: https://arxiv.org/abs/2402.10329

[52]

Legato: Cross-embodiment imitation using a grasping tool

M. Seo, H. A. Park, S. Yuan, Y. Zhu, and L. Sentis, “Legato: Cross-embodiment imitation using a grasping tool,” IEEE Robotics and Automation Letters, vol. 10, no. 3, pp. 2854–2861, Mar. 2025. [Online]. Available: http://dx.doi.org/10.1109/LRA.2025.3535182

[53]

EgoMimic: Scaling Imitation Learning via Egocentric Video

S. Kareer, D. Patel, R. Punamiya, P. Mathur, S. Cheng, C. Wang, J. Hoffman, and D. Xu, “Egomimic: Scaling imitation learning via egocentric video,” 2024. [Online]. Available: https://arxiv.org/abs/2410.24221

[54]

Airexo: Low-cost exoskeletons for learning whole-arm manipulation in the wild

H. Fang, H.-S. Fang, Y. Wang, J. Ren, J. Chen, R. Zhang, W. Wang, and C. Lu, “Airexo: Low-cost exoskeletons for learning whole-arm manipulation in the wild,” 2024. [Online]. Available: https://arxiv.org/abs/2309.14975

[55]

Robot learning as an empirical science: Best practices for policy evaluation

H. Kress-Gazit, K. Hashimoto, N. Kuppuswamy, P. Shah, P. Horgan, G. Richardson, S. Feng, and B. Burchfiel, “Robot learning as an empirical science: Best practices for policy evaluation,” arXiv preprint arXiv:2409.09491, 2024

[56]

Maniskill: Generalizable manipulation skill benchmark with large-scale demonstrations

T. Mu, Z. Ling, F. Xiang, D. Yang, X. Li, S. Tao, Z. Huang, Z. Jia, and H. Su, “Maniskill: Generalizable manipulation skill benchmark with large-scale demonstrations,” 2021. [Online]. Available: https://arxiv.org/abs/2107.14483

[57]

Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning

T. Yu, D. Quillen, Z. He, R. Julian, A. Narayan, H. Shively, A. Bellathur, K. Hausman, C. Finn, and S. Levine, “Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning,” 2021. [Online]. Available: https://arxiv.org/abs/1910.10897

[58]

robosuite: A Modular Simulation Framework and Benchmark for Robot Learning

Y. Zhu, J. Wong, A. Mandlekar, R. Martín-Martín, A. Joshi, S. Nasiriany, Y. Zhu, and K. Lin, “robosuite: A modular simulation framework and benchmark for robot learning,” arXiv preprint arXiv:2009.12293, 2020

[59]

Behavior: Benchmark for everyday household activities in virtual, interactive, and ecological environments

S. Srivastava, C. Li, M. Lingelbach, R. Martín-Martín, F. Xia, K. Vainio, Z. Lian, C. Gokmen, S. Buch, C. K. Liu, S. Savarese, H. Gweon, J. Wu, and L. Fei-Fei, “Behavior: Benchmark for everyday household activities in virtual, interactive, and ecological environments,” 2021. [Online]. Available: https://arxiv.org/abs/2108.03332

[60]

Robothor: An open simulation-to-real embodied ai platform

M. Deitke, W. Han, A. Herrasti, A. Kembhavi, E. Kolve, R. Mottaghi, J. Salvador, D. Schwenk, E. VanderBilt, M. Wallingford, L. Weihs, M. Yatskar, and A. Farhadi, “Robothor: An open simulation-to-real embodied ai platform,” 2020. [Online]. Available: https://arxiv.org/abs/2004.06799

[61]

Sim2real predictivity: Does evaluation in simulation predict real-world performance?

A. Kadian, J. Truong, A. Gokaslan, A. Clegg, E. Wijmans, S. Lee, M. Savva, S. Chernova, and D. Batra, “Sim2real predictivity: Does evaluation in simulation predict real-world performance?” IEEE Robotics and Automation Letters, vol. 5, no. 4, pp. 6670–6677, Oct. 2020. [Online]. Available: http://dx.doi.org/10.1109/LRA.2020.3013848

[62]

VR-Goggles for Robots: Real-to-sim Domain Adaptation for Visual Control

J. Zhang, L. Tai, P. Yun, Y. Xiong, M. Liu, J. Boedecker, and W. Burgard, “Vr-goggles for robots: Real-to-sim domain adaptation for visual control,” 2019. [Online]. Available: https://arxiv.org/abs/1802.00265

[63]

Evaluating Real-World Robot Manipulation Policies in Simulation

X. Li, K. Hsu, J. Gu, K. Pertsch, O. Mees, H. R. Walke, C. Fu, I. Lunawat, I. Sieh, S. Kirmani, S. Levine, J. Wu, C. Finn, H. Su, Q. Vuong, and T. Xiao, “Evaluating real-world robot manipulation policies in simulation,” arXiv preprint arXiv:2405.05941, 2024

[64]

Asid: Active exploration for system identification in robotic manipulation

M. Memmel, A. Wagenmaker, C. Zhu, P. Yin, D. Fox, and A. Gupta, “Asid: Active exploration for system identification in robotic manipulation,” 2024. [Online]. Available: https://arxiv.org/abs/2404.12308

[65]

Scalable real2sim: Physics-aware asset generation via robotic pick-and-place setups

N. Pfaff, E. Fu, J. Binagia, P. Isola, and R. Tedrake, “Scalable real2sim: Physics-aware asset generation via robotic pick-and-place setups,” 2025. [Online]. Available: https://arxiv.org/abs/2503.00370

[66]

Rb2: Robotic manipulation benchmarking with a twist

S. Dasari, J. Wang, J. Hong, S. Bahl, Y. Lin, A. Wang, A. Thankaraj, K. Chahal, B. Calli, S. Gupta, D. Held, L. Pinto, D. Pathak, V. Kumar, and A. Gupta, “Rb2: Robotic manipulation benchmarking with a twist,” 2022. [Online]. Available: https://arxiv.org/abs/2203.08098

[67]

Benchmarking cluttered robot pick-and-place manipulation with the box and blocks test

A. S. Morgan, K. Hang, W. G. Bircher, F. M. Alladkani, A. Gandhi, B. Calli, and A. M. Dollar, “Benchmarking cluttered robot pick-and-place manipulation with the box and blocks test,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 454–461, 2019

[68]

Furniturebench: Reproducible real-world benchmark for long-horizon complex manipulation

M. Heo, Y. Lee, D. Lee, and J. J. Lim, “Furniturebench: Reproducible real-world benchmark for long-horizon complex manipulation,” The International Journal of Robotics Research, p. 02783649241304789, 2023

[69]

Benchmarking protocols for evaluating small parts robotic assembly systems

K. Kimble, K. Van Wyk, J. Falco, E. Messina, Y. Sun, M. Shibata, W. Uemura, and Y. Yokokohji, “Benchmarking protocols for evaluating small parts robotic assembly systems,” IEEE robotics and automation letters, vol. 5, no. 2, pp. 883–889, 2020

[70]

Scenereplica: Benchmarking real-world robot manipulation by creating replicable scenes

N. Khargonkar, S. H. Allu, Y. Lu, B. Prabhakaran, Y. Xiang et al., “Scenereplica: Benchmarking real-world robot manipulation by creating replicable scenes,” in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 8258–8264

[71]

Benchmarking protocol for grasp planning algorithms

Y. Bekiroglu, N. Marturi, M. A. Roa, K. J. M. Adjigble, T. Pardi, C. Grimm, R. Balasubramanian, K. Hang, and R. Stolkin, “Benchmarking protocol for grasp planning algorithms,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 315–322, 2019

[72]

Graspa 1.0: Graspa is a robot arm grasping performance benchmark

F. Bottarel, G. Vezzani, U. Pattacini, and L. Natale, “Graspa 1.0: Graspa is a robot arm grasping performance benchmark,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 836–843, 2020

[73]

Benchmark for bimanual robotic manipulation of semi-deformable objects

K. Chatzilygeroudis, B. Fichera, I. Lauzana, F. Bu, K. Yao, F. Khadivar, and A. Billard, “Benchmark for bimanual robotic manipulation of semi-deformable objects,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 2443–2450, 2020

[74]

Ocrtoc: A cloud-based competition and benchmark for robotic grasping and manipulation

Z. Liu, W. Liu, Y. Qin, F. Xiang, M. Gou, S. Xin, M. A. Roa, B. Calli, H. Su, Y. Sun et al., “Ocrtoc: A cloud-based competition and benchmark for robotic grasping and manipulation,” IEEE Robotics and Automation Letters, vol. 7, no. 1, pp. 486–493, 2021

[75]

Real robot challenge: A robotics competition in the cloud

S. Bauer, M. Wüthrich, F. Widmaier, A. Buchholz, S. Stark, A. Goyal, T. Steinbrenner, J. Akpo, S. Joshi, V. Berenz et al., “Real robot challenge: A robotics competition in the cloud,” in NeurIPS 2021 Competitions and Demonstrations Track. PMLR, 2022, pp. 190–204

[76]

Train offline, test online: A real robot learning benchmark

G. Zhou, V. Dean, M. K. Srirama, A. Rajeswaran, J. Pari, K. Hatch, A. Jain, T. Yu, P. Abbeel, L. Pinto et al., “Train offline, test online: A real robot learning benchmark,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 9197–9203

[77]

Autoeval: Autonomous evaluation of generalist robot manipulation policies in the real world

Z. Zhou, P. Atreya, Y. L. Tan, K. Pertsch, and S. Levine, “Autoeval: Autonomous evaluation of generalist robot manipulation policies in the real world,” 2025. [Online]. Available: https://arxiv.org/abs/2503.24278

[78]

Is your imitation learning policy better than mine? policy comparison with near-optimal stopping

D. Snyder, A. J. Hancock, A. Badithela, E. Dixon, P. Miller, R. A. Ambrus, A. Majumdar, M. Itkina, and H. Nishimura, “Is your imitation learning policy better than mine? policy comparison with near-optimal stopping,” arXiv preprint arXiv:2503.10966, 2025

[79]

Deep reinforcement learning at the edge of the statistical precipice

R. Agarwal, M. Schwarzer, P. S. Castro, A. C. Courville, and M. Bellemare, “Deep reinforcement learning at the edge of the statistical precipice,” Advances in neural information processing systems, vol. 34, pp. 29 304–29 320, 2021

[80]

Statistical tests, p values, confidence intervals, and power: a guide to misinterpretations

S. Greenland, S. J. Senn, K. J. Rothman, J. B. Carlin, C. Poole, S. N. Goodman, and D. G. Altman, “Statistical tests, p values, confidence intervals, and power: a guide to misinterpretations,” European journal of epidemiology, vol. 31, no. 4, pp. 337–350, 2016
