Scalable Multi-Task Data Generation via Reinforcement Learning for Language-Conditioned Bimanual Dexterous Manipulation

Georgia Chalvatzaki; Jan Peters; Puze Liu; Yufeng Jin; Zechu Li

arxiv: 2606.22471 · v2 · pith:V6O2UVESnew · submitted 2026-06-21 · 💻 cs.RO

Zechu Li, Yufeng Jin, Puze Liu, Jan Peters, Georgia Chalvatzaki

Pith reviewed 2026-06-30 10:43 UTC · model grok-4.3

classification 💻 cs.RO

keywords reinforcement learning · data generation · bimanual manipulation · dexterous manipulation · language-conditioned policies · domain randomization · multi-task learning · synthetic datasets

The pith

A reinforcement learning pipeline with generalizable rewards and domain randomization generates scalable synthetic datasets that improve generalization for language-conditioned bimanual dexterous manipulation policies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes an RL-based pipeline to create large-scale synthetic datasets for bimanual dexterous manipulation, addressing the shortage of high-quality training data for generalist robot policies. It combines a generalizable reward design, domain randomization, and language-conditioned annotations to produce diverse, robot-executable trajectories without task-specific engineering. This matters because human teleoperation methods limit task diversity and scalability, while simulation can generate more data. Experiments show the resulting datasets enable multi-task policies with significantly better generalization on three representative tasks. A sympathetic reader would see value in a method that scales data creation for complex two-handed robot skills.

Core claim

The paper claims that a systematic RL-based data generation pipeline integrating generalizable reward design, effective domain randomization, and language-conditioned task annotations synthesizes diverse, high-quality datasets for dexterous bimanual manipulation and enables training of language-conditioned multi-task policies that improve generalization across tasks.

What carries the argument

The RL-based data generation pipeline that integrates generalizable reward design, domain randomization, and language-conditioned task annotations to produce robot-executable trajectories.
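The three ingredients named above can be illustrated with a toy data-generation loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the reward term names and weights, the randomization ranges, the task names, and the instruction templates are all hypothetical, and a random number stands in for the simulator and the RL policy.

```python
import random
from dataclasses import dataclass

# Hypothetical reward weights. The Figure 2 caption mentions four reward terms,
# but these names and values are invented for illustration.
REWARD_WEIGHTS = {"hand_pose": 0.4, "contact": 0.3, "object_goal": 0.2, "smoothness": 0.1}

# Hypothetical task names and language templates.
INSTRUCTION_TEMPLATES = {
    "lift": ["lift the {obj} with both hands", "pick up the {obj} using two hands"],
    "handover": ["pass the {obj} from one hand to the other"],
}


@dataclass
class Episode:
    task: str
    instruction: str
    physics: dict
    rewards: list


def sample_physics(rng):
    """Domain randomization: draw simulator parameters from assumed ranges."""
    return {
        "friction": rng.uniform(0.5, 1.5),
        "object_mass_kg": rng.uniform(0.1, 1.0),
        "object_scale": rng.uniform(0.9, 1.1),
    }


def combined_reward(terms, weights):
    """Weighted sum of reward terms; the same weights are shared by every task."""
    return sum(weights[name] * terms[name] for name in weights)


def generate_episode(task, obj, rng, horizon=5):
    """Produce one annotated episode under freshly randomized physics."""
    physics = sample_physics(rng)
    rewards = []
    for _ in range(horizon):
        # Stand-in for a simulator step: each term is a random value in [0, 1).
        terms = {name: rng.random() for name in REWARD_WEIGHTS}
        rewards.append(combined_reward(terms, REWARD_WEIGHTS))
    instruction = rng.choice(INSTRUCTION_TEMPLATES[task]).format(obj=obj)
    return Episode(task, instruction, physics, rewards)


rng = random.Random(0)
dataset = [generate_episode(task, "box", rng) for task in INSTRUCTION_TEMPLATES for _ in range(3)]
print(len(dataset), dataset[0].instruction)
```

The point of the sketch is structural: the reward function takes no task argument, so adding a task only means adding templates and scenes, which is the property the premise below depends on.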

If this is right

The generated datasets enable training of policies that generalize better across the three representative manipulation tasks.
Language conditioning supports multi-task policy learning from a unified synthetic dataset.
The pipeline scales data generation beyond the limits of human teleoperation methods.
It reduces the requirement for handcrafted task-specific rewards during data synthesis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The pipeline could extend to generating data for additional bimanual tasks or different robot morphologies by varying the randomization parameters.
Combining the synthetic data with limited real-world demonstrations might further improve real-robot performance.
The method suggests a path toward creating much larger datasets through parallel simulation runs.

Load-bearing premise

A single generalizable reward design combined with domain randomization can produce robot-executable trajectories across diverse tasks without task-specific reward engineering.

What would settle it

An experiment in which policies trained on the generated data show no improvement in generalization on the three manipulation tasks compared to training without it, or where the trajectories cannot be executed on the physical robot.
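One way to frame that comparison is a permutation test on per-trial success outcomes. The sketch below is an assumption about how such a check could be run, not a procedure taken from the paper; the outcome lists are placeholders, not reported results.

```python
import random


def success_rate(outcomes):
    """Fraction of successful trials; outcomes are 1 (success) or 0 (failure)."""
    return sum(outcomes) / len(outcomes)


def permutation_p_value(with_data, without_data, n_perm=10000, seed=0):
    """One-sided p-value for rate(with_data) > rate(without_data).

    Labels are shuffled across the pooled trials to estimate how often a
    difference at least as large as the observed one arises by chance.
    """
    rng = random.Random(seed)
    observed = success_rate(with_data) - success_rate(without_data)
    pooled = list(with_data) + list(without_data)
    n_a = len(with_data)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = success_rate(pooled[:n_a]) - success_rate(pooled[n_a:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Placeholder outcomes for illustration only.
trained_with_generated = [1] * 18 + [0] * 2
trained_without = [1] * 9 + [0] * 11
print(permutation_p_value(trained_with_generated, trained_without))
```

A large p-value on held-out conditions would be the "no improvement" outcome described above. Executability on the physical robot is a separate question that this test does not address.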

Figures

Figures reproduced from arXiv: 2606.22471 by Georgia Chalvatzaki, Jan Peters, Puze Liu, Yufeng Jin, Zechu Li.

**Figure 1.** We propose Reinforcement Learning as Data Collector (RLDC), a scalable pipeline for generating synthetic datasets with task-specialized RL …

**Figure 2.** Four reward terms. (Top-Left) A predefined hand pose associated …

**Figure 3.** Network architecture. The raw point cloud is cropped to the region …

**Figure 4.** (Left) Visualization of three bimanual dexterous manipulation tasks in simulation. (Right) Performance of the proposed reward design compared …

**Figure 5.** Visualization of real-world results. The first two rows show the …

Original abstract

A key bottleneck in training generalist policies for bimanual dexterous manipulation is the lack of large-scale, high-quality datasets. Synthetic data generation in simulation provides a scalable alternative to human video demonstrations by overcoming challenges such as morphology mismatch, missing physical interactions, and the generation of robot actions. However, existing approaches based on human teleoperation offer limited task diversity, as object-centric trajectory matching often neglects the feasibility of robot execution. Reinforcement learning (RL) enables broader scalability but is often constrained by handcrafted, task-specific rewards. In this work, we propose a systematic RL-based data generation pipeline that integrates generalizable reward design, effective domain randomization, and language-conditioned task annotations. This pipeline synthesizes diverse, high-quality datasets for dexterous bimanual manipulation and enables training of language-conditioned multi-task policies. Our experiments show that the generated data significantly improves generalization across three representative manipulation tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

The paper gives an RL pipeline for generating synthetic bimanual dexterous data with generalizable rewards and language labels, but the abstract supplies no numbers or comparisons to judge the results.

The main takeaway is that this paper describes an RL-based pipeline to generate large-scale synthetic data for training language-conditioned policies on bimanual dexterous tasks, using generalizable rewards and domain randomization to avoid task-specific engineering. This addresses a recognized bottleneck in the field.

It does well in framing the problem around data scarcity and in combining several techniques—RL generation, randomization, and language labels—into one system that aims for multi-task applicability. The claim that this leads to better generalization on three tasks is the central result, and the pipeline seems designed to produce robot-executable trajectories.

The soft spot is that the abstract gives no actual numbers, no baselines, and no specifics on the tasks or metrics, so it's impossible to tell how strong the evidence is or whether the generalizable reward really works without hidden task-specific elements. The experiments need to be examined closely in the full paper to see if the data actually supports the generalization.

This is for robotics researchers focused on dexterous manipulation and sim-to-real transfer. Someone looking for new data generation ideas might find it useful if the results check out.

I'd send it to peer review because the topic is timely and the approach is systematic, even if the current presentation is thin on evidence. The full manuscript likely has the details that would allow a proper evaluation.

Referee Report

1 major / 0 minor

Summary. The paper proposes a systematic RL-based data generation pipeline for language-conditioned bimanual dexterous manipulation. The pipeline combines generalizable reward design, domain randomization, and language-conditioned task annotations to synthesize diverse, high-quality synthetic datasets in simulation. These datasets are then used to train language-conditioned multi-task policies, with the central claim being that the generated data significantly improves generalization across three representative manipulation tasks.

Significance. If the experimental results hold with proper quantitative support, the work could meaningfully address the data bottleneck for training generalist policies in complex bimanual dexterous manipulation by providing a scalable simulation-based alternative to human demonstrations that avoids morphology mismatch and task-specific reward engineering.

major comments (1)

[Abstract] The central claim that 'the generated data significantly improves generalization across three representative manipulation tasks' is asserted without any metrics, baselines, task descriptions, statistical details, or quantitative results. This absence is load-bearing because the soundness of the generalization claim cannot be assessed from the provided information.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review. We address the single major comment below. The manuscript contains full experimental details, but we agree the abstract can be strengthened for clarity.

Referee: [Abstract] The central claim that 'the generated data significantly improves generalization across three representative manipulation tasks' is asserted without any metrics, baselines, task descriptions, statistical details, or quantitative results. This absence is load-bearing because the soundness of the generalization claim cannot be assessed from the provided information.

Authors: We agree the abstract, as a concise summary, does not include specific metrics or details. The full manuscript (Section 4: Experiments) provides task descriptions for the three bimanual manipulation tasks, baselines, quantitative success rates, generalization metrics across held-out conditions, and statistical results over multiple seeds. To directly address the concern, we will revise the abstract to incorporate key quantitative results (e.g., relative improvements in multi-task success rates) while preserving its length.

revision: yes
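For readers unfamiliar with the quantities the response refers to, a per-seed summary and a relative improvement could be computed as follows. This is a hedged illustration only: the function names are hypothetical and the rates are placeholders, not values from the manuscript.

```python
from statistics import mean, stdev


def summarize(per_seed_rates):
    """Return (mean, sample standard deviation) of success rates across seeds."""
    spread = stdev(per_seed_rates) if len(per_seed_rates) > 1 else 0.0
    return mean(per_seed_rates), spread


def relative_improvement(baseline, improved):
    """Relative change (improved - baseline) / baseline; baseline must be non-zero."""
    if baseline == 0:
        raise ValueError("baseline success rate must be non-zero")
    return (improved - baseline) / baseline


# Placeholder per-seed success rates for illustration only.
baseline_rates = [0.40, 0.50, 0.45]
augmented_rates = [0.60, 0.70, 0.65]

base_mean, base_std = summarize(baseline_rates)
aug_mean, aug_std = summarize(augmented_rates)
print(f"baseline {base_mean:.2f} ± {base_std:.2f}, augmented {aug_mean:.2f} ± {aug_std:.2f}")
print(f"relative improvement {relative_improvement(base_mean, aug_mean):.1%}")
```

Reporting the spread alongside the mean is what lets a reader judge whether a stated relative improvement exceeds seed-to-seed variation.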

Circularity Check

0 steps flagged

No significant circularity detected

The paper describes an RL-based pipeline for generating synthetic data using generalizable reward design, domain randomization, and language-conditioned annotations to produce datasets for bimanual dexterous manipulation tasks. The central claim rests on experimental results demonstrating improved generalization across three tasks, with no equations, fitted parameters, or predictions shown that reduce by construction to prior inputs or self-citations. The weakest assumption is explicitly presented as the method's contribution rather than an unexamined premise, and the derivation chain is self-contained against external benchmarks with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No concrete free parameters, axioms, or invented entities can be identified from the abstract alone; the approach implicitly assumes standard RL convergence and simulation-to-reality transfer properties.

pith-pipeline@v0.9.1-grok · 5703 in / 943 out tokens · 48222 ms · 2026-06-30T10:43:47.133682+00:00 · methodology


[1] [1]

$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control

K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichteret al., “π 0: A vision- language-action flow model for general robot control,”arXiv preprint arXiv:2410.24164, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[2] [2]

Octo: An Open-Source Generalist Robot Policy

O. M. Team, D. Ghosh, H. Walke, K. Pertsch, K. Black, O. Mees, S. Dasari, J. Hejna, T. Kreiman, C. Xuet al., “Octo: An open-source generalist robot policy,”arXiv preprint arXiv:2405.12213, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[3] [3]

OpenVLA: An Open-Source Vision-Language-Action Model

M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketiet al., “Open- vla: An open-source vision-language-action model,”arXiv preprint arXiv:2406.09246, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[4] [4]

Tinyvla: Towards fast, data-efficient vision- language-action models for robotic manipulation,

J. Wen, Y . Zhu, J. Li, M. Zhu, Z. Tang, K. Wu, Z. Xu, N. Liu, R. Cheng, C. Shenet al., “Tinyvla: Towards fast, data-efficient vision- language-action models for robotic manipulation,”IEEE Robotics and Automation Letters, 2025

2025

[5] [5]

Flower: Democratizing generalist robot policies with efficient vision-language-action flow policies,

M. Reuss, H. Zhou, M. R ¨uhle, ¨O. E. Ya ˘gmurlu, F. Otto, and R. Lioutikov, “Flower: Democratizing generalist robot policies with efficient vision-language-action flow policies,”arXiv preprint arXiv:2509.04996, 2025

work page arXiv 2025

[6] [6]

Diffusion policy: Visuomotor policy learning via action diffusion,

C. Chi, Z. Xu, S. Feng, E. Cousineau, Y . Du, B. Burchfiel, R. Tedrake, and S. Song, “Diffusion policy: Visuomotor policy learning via action diffusion,”The International Journal of Robotics Research, 2024

2024

[7] [7]

Mobile aloha: Learning bimanual mobile manipulation with low-cost whole-body teleoperation,

Z. Fu, T. Z. Zhao, and C. Finn, “Mobile aloha: Learning bimanual mobile manipulation with low-cost whole-body teleoperation,” in Conference on Robot Learning (CoRL), 2024

2024

[8] [8]

Actionflow: Equivariant, accurate, and efficient policies with spatially symmetric flow matching,

N. Funk, J. Urain, J. Carvalho, V . Prasad, G. Chalvatzaki, and J. Peters, “Actionflow: Equivariant, accurate, and efficient policies with spatially symmetric flow matching,”arXiv preprint arXiv:2409.04576, 2024

work page arXiv 2024

[9] [9]

Anyteleop: A general vision-based dexterous robot arm- hand teleoperation system,

Y . Qin, W. Yang, B. Huang, K. Van Wyk, H. Su, X. Wang, Y .-W. Chao, and D. Fox, “Anyteleop: A general vision-based dexterous robot arm- hand teleoperation system,”arXiv preprint arXiv:2307.04577, 2023

work page arXiv 2023

[10] [10]

Humanoid policy human policy,

R.-Z. Qiu, S. Yang, X. Cheng, C. Chawla, J. Li, T. He, G. Yan, D. J. Yoon, R. Hoque, L. Paulsenet al., “Humanoid policy human policy,” arXiv preprint arXiv:2503.13441, 2025

work page arXiv 2025

[11] [11]

Crossing the human-robot embodiment gap with sim-to-real rl using one human demonstration,

T. G. W. Lum, O. Y . Lee, C. K. Liu, and J. Bohg, “Crossing the human-robot embodiment gap with sim-to-real rl using one human demonstration,”arXiv preprint arXiv:2504.12609, 2025

work page arXiv 2025

[12] [12]

Hermes: Human-to-robot embodied learning from multi-source motion data for mobile dexterous manipulation,

Z. Yuan, T. Wei, L. Gu, P. Hua, T. Liang, Y . Chen, and H. Xu, “Hermes: Human-to-robot embodied learning from multi-source motion data for mobile dexterous manipulation,”arXiv preprint arXiv:2508.20085, 2025

work page arXiv 2025

[13] [13]

RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

T. Chen, Z. Chen, B. Chen, Z. Cai, Y . Liu, Q. Liang, Z. Li, X. Lin, Y . Ge, Z. Guet al., “Robotwin 2.0: A scalable data generator and benchmark with strong domain randomization for robust bimanual robotic manipulation,”arXiv preprint arXiv:2506.18088, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[14] [14]

RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

S. Nasiriany, A. Maddukuri, L. Zhang, A. Parikh, A. Lo, A. Joshi, A. Mandlekar, and Y . Zhu, “Robocasa: Large-scale simulation of ev- eryday tasks for generalist robots,”arXiv preprint arXiv:2406.02523, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[15] [15]

Roboverse: Towards a unified platform, dataset and benchmark for scalable and generalizable robot learning,

H. Geng, F. Wang, S. Wei, Y . Li, B. Wang, B. An, C. T. Cheng, H. Lou, P. Li, Y .-J. Wanget al., “Roboverse: Towards a unified platform, dataset and benchmark for scalable and generalizable robot learning,” arXiv preprint arXiv:2504.18904, 2025

work page arXiv 2025

[16] [16]

Behavior: Bench- mark for everyday household activities in virtual, interactive, and ecological environments,

S. Srivastava, C. Li, M. Lingelbach, R. Mart ´ın-Mart´ın, F. Xia, K. E. Vainio, Z. Lian, C. Gokmen, S. Buch, K. Liuet al., “Behavior: Bench- mark for everyday household activities in virtual, interactive, and ecological environments,” inConference on robot learning. PMLR, 2022, pp. 477–490

2022

[17] [17]

Dexmimicgen: Automated data generation for bimanual dexterous manipulation via imitation learning,

Z. Jiang, Y . Xie, K. Lin, Z. Xu, W. Wan, A. Mandlekar, L. J. Fan, and Y . Zhu, “Dexmimicgen: Automated data generation for bimanual dexterous manipulation via imitation learning,” in2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 16 923–16 930

2025

[18] [18]

Cyberdemo: Augmenting simulated human demon- stration for real-world dexterous manipulation,

J. Wang, Y . Qin, K. Kuang, Y . Korkmaz, A. Gurumoorthy, H. Su, and X. Wang, “Cyberdemo: Augmenting simulated human demon- stration for real-world dexterous manipulation,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 17 952–17 963

2024

[19] Z. Li, Y. Jin, D. O. Apraez, C. Semini, P. Liu, and G. Chalvatzaki, “Morphologically symmetric reinforcement learning for ambidextrous bimanual manipulation,” arXiv preprint arXiv:2505.05287, 2025

[20] T. Lin, Z.-H. Yin, H. Qi, P. Abbeel, and J. Malik, “Twisting lids off with two hands,” in 8th Annual Conference on Robot Learning (CoRL), 2024. [Online]. Available: https://openreview.net/forum?id=3wBqoPfoeJ

[21] T. Chen, M. Tippur, S. Wu, V. Kumar, E. Adelson, and P. Agrawal, “Visual dexterity: In-hand dexterous manipulation from depth,” in ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems, 2023

[22] H. Qi, A. Kumar, R. Calandra, Y. Ma, and J. Malik, “In-hand object rotation via rapid motor adaptation,” in Conference on Robot Learning. PMLR, 2023, pp. 1722–1732

[23] T. Chen, E. Cousineau, N. Kuppuswamy, and P. Agrawal, “Vegetable peeling: A case study in constrained dexterous manipulation,” arXiv preprint arXiv:2407.07884, 2024

[24] C. Xu, Q. Li, J. Luo, and S. Levine, “Rldg: Robotic generalist policy distillation via reinforcement learning,” arXiv preprint arXiv:2412.09858, 2024

[25] J. Eschmann, “Reward function design in reinforcement learning,” Reinforcement Learning Algorithms: Analysis and Applications, pp. 25–33, 2021

[26] P. Dan, K. Kedia, A. Chao, E. W. Duan, M. A. Pace, W.-C. Ma, and S. Choudhury, “X-sim: Cross-embodiment learning via real-to-sim-to-real,” arXiv preprint arXiv:2505.07096, 2025

[27] Z. Chen, S. Chen, E. Arlaud, I. Laptev, and C. Schmid, “Vividex: Learning vision-based dexterous manipulation from human videos,” in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 3336–3343

[28] T. Lin, K. Sachdev, L. Fan, J. Malik, and Y. Zhu, “Sim-to-real reinforcement learning for vision-based dexterous manipulation on humanoids,” arXiv preprint arXiv:2502.20396, 2025

[29] T. G. W. Lum, M. Matak, V. Makoviychuk, A. Handa, A. Allshire, T. Hermans, N. D. Ratliff, and K. Van Wyk, “Dextrah-g: Pixels-to-action dexterous arm-hand grasping with geometric fabrics,” arXiv preprint arXiv:2407.02274, 2024

[30] R. Singh, A. Allshire, A. Handa, N. Ratliff, and K. Van Wyk, “Dextrah-rgb: Visuomotor policies to grasp anything with dexterous hands,” arXiv preprint arXiv:2412.01791, 2024

[31] Z. Yuan, T. Wei, S. Cheng, G. Zhang, Y. Chen, and H. Xu, “Learning to manipulate anywhere: A visual generalizable framework for reinforcement learning,” arXiv preprint arXiv:2407.15815, 2024

[32] J. Urain, N. Funk, J. Peters, and G. Chalvatzaki, “Se(3)-diffusionfields: Learning smooth cost functions for joint grasp and motion optimization through diffusion,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 5923–5930

[33] Z. Jiang, Y. Zhu, M. Svetlik, K. Fang, and Y. Zhu, “Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations,” in Proceedings of Robotics: Science and Systems, Virtual, July 2021

[34] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017

[35] A. Hurst, A. Lerer, A. P. Goucher, A. Perelman, A. Ramesh, A. Clark, A. Ostrow, A. Welihinda, A. Hayes, A. Radford et al., “Gpt-4o system card,” arXiv preprint arXiv:2410.21276, 2024

[36] J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” arXiv preprint arXiv:2010.02502, 2020

[37] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “Pointnet++: Deep hierarchical feature learning on point sets in a metric space,” Advances in Neural Information Processing Systems, vol. 30, 2017

[38] J.-J. Jiang, X.-M. Wu, Y.-X. He, L.-A. Zeng, Y.-L. Wei, D. Zhang, and W.-S. Zheng, “Rethinking bimanual robotic manipulation: Learning with decoupled interaction framework,” arXiv preprint arXiv:2503.09186, 2025

[39] X. Zhai, B. Mustafa, A. Kolesnikov, and L. Beyer, “Sigmoid loss for language image pre-training,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 11 975–11 986

[40] Y. Zhou, C. Barnes, J. Lu, J. Yang, and H. Li, “On the continuity of rotation representations in neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 5745–5753

[41] M. Mittal, C. Yu, Q. Yu, J. Liu, N. Rudin, D. Hoeller, J. L. Yuan, R. Singh, Y. Guo, H. Mazhar, A. Mandlekar, B. Babich, G. State, M. Hutter, and A. Garg, “Orbit: A unified simulation framework for interactive robot learning environments,” IEEE Robotics and Automation Letters, vol. 8, no. 6, pp. 3740–3747, 2023

[42] Y. Ze, G. Zhang, K. Zhang, C. Hu, M. Wang, and H. Xu, “3d diffusion policy: Generalizable visuomotor policy learning via simple 3d representations,” arXiv preprint arXiv:2403.03954, 2024

[43] Y. Ze, Z. Chen, W. Wang, T. Chen, X. He, Y. Yuan, X. B. Peng, and J. Wu, “Generalizable humanoid manipulation with 3d diffusion policies,” arXiv preprint arXiv:2410.10803, 2024

[44] M. Torne, A. Simeonov, Z. Li, A. Chan, T. Chen, A. Gupta, and P. Agrawal, “Reconciling reality through simulation: A real-to-sim-to-real approach for robust manipulation,” arXiv preprint arXiv:2403.03949, 2024