RISE: Self-Improving Robot Policy with Compositional World Model
Pith reviewed 2026-05-16 02:23 UTC · model grok-4.3
The pith
A compositional world model lets robot policies self-improve through imagined rollouts without physical interaction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RISE integrates a controllable dynamics model that predicts multi-view future states with a progress value model that estimates advantages from those imagined trajectories, forming a closed-loop pipeline that continuously updates the policy in imaginary space.
What carries the argument
A Compositional World Model that separates controllable dynamics prediction of multi-view futures from progress value estimation, so that imagined rollouts yield advantages reliable enough to drive policy improvement.
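A toy sketch of the closed loop may help fix ideas. Everything here is an assumption made for illustration: the one-dimensional observation, the stand-in functions `dynamics_model` and `progress_value`, the Gaussian action sampling, and the advantage-weighted update (the page does not state the paper's actual policy-update rule). The advantage takes the form quoted further down this page: mean imagined value over a horizon minus the value of the current observation.

```python
import math
import random


def dynamics_model(obs, action):
    """Hypothetical stand-in for the learned dynamics model: predicts the next observation."""
    return obs + action


def progress_value(obs, goal=1.0):
    """Hypothetical stand-in for the progress value model, clipped to [0, 1]."""
    return max(0.0, 1.0 - abs(goal - obs))


def imagined_advantage(obs, action, horizon=4):
    """Mean value over an imagined rollout minus the value of the current observation."""
    future_values = []
    imagined = obs
    for _ in range(horizon):
        # The action is repeated inside the model; no real-world step is taken.
        imagined = dynamics_model(imagined, action)
        future_values.append(progress_value(imagined))
    return sum(future_values) / horizon - progress_value(obs)


def self_improve(mean_action, obs=0.0, iterations=30, samples=32,
                 noise=0.1, temperature=0.05, seed=0):
    """Closed loop: sample actions, score them in imagination, re-fit the policy mean."""
    rng = random.Random(seed)
    for _ in range(iterations):
        actions = [mean_action + rng.gauss(0.0, noise) for _ in range(samples)]
        advantages = [imagined_advantage(obs, a) for a in actions]
        # Advantage-weighted averaging: an assumed update rule, not the paper's.
        best = max(advantages)
        weights = [math.exp((adv - best) / temperature) for adv in advantages]
        mean_action = sum(w * a for w, a in zip(weights, actions)) / sum(weights)
    return mean_action


if __name__ == "__main__":
    before = 0.0
    after = self_improve(before)
    print("imagined advantage before:", imagined_advantage(0.0, before))
    print("imagined advantage after: ", imagined_advantage(0.0, after))
```

The loop never queries a real environment, which is the point of the design. It also shows the risk: if `dynamics_model` or `progress_value` is wrong, the policy is optimized against the error.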
If this is right
- Policy updates occur continuously without physical resets or environment interaction.
- Absolute performance rises over prior art by more than 35 percentage points on dynamic brick sorting, 45 on backpack packing, and 35 on box closing.
- Distinct architectures can be chosen for state prediction and value estimation while still producing coherent advantages.
- The same pipeline scales across multiple contact-rich manipulation tasks without task-specific retraining of the full system.
Where Pith is reading between the lines
- The separation of dynamics and value heads could allow independent scaling of each component as larger pre-trained vision models become available.
- If the world model remains accurate over longer horizons, the same loop might support multi-step planning rather than single-step advantage estimation.
- Success on these three tasks suggests the method could transfer to other reset-free settings such as mobile manipulation in unstructured homes.
Load-bearing premise
The world model must produce accurate enough future predictions and progress values that the resulting advantages actually improve the real robot policy.
What would settle it
If repeated real-world rollouts after imagined policy updates show no measurable increase in task success rates on the brick sorting, backpack packing, or box closing benchmarks, the central claim would be falsified.
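One way to make "no measurable increase" concrete is a one-sided two-proportion z-test on success counts collected before and after the imagined updates. The sketch below is an assumption about how such a check might look, not the paper's evaluation protocol. The function name and the trial counts in the usage example are hypothetical, and the test relies on the normal approximation, which is only reasonable for fairly large trial counts.

```python
import math


def two_proportion_z_test(successes_before, trials_before,
                          successes_after, trials_after):
    """One-sided test of whether the success rate after the update exceeds the rate before.

    Returns (z, p_value). A small p_value is evidence of a real improvement.
    """
    rate_before = successes_before / trials_before
    rate_after = successes_after / trials_after
    pooled = (successes_before + successes_after) / (trials_before + trials_after)
    std_err = math.sqrt(pooled * (1.0 - pooled)
                        * (1.0 / trials_before + 1.0 / trials_after))
    if std_err == 0.0:
        # Both rates are 0 or both are 1: the data show no difference at all.
        return 0.0, 0.5
    z = (rate_after - rate_before) / std_err
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the call.
    z, p = two_proportion_z_test(40, 100, 75, 100)
    print(f"z = {z:.2f}, one-sided p = {p:.2e}")
```

Under this reading, the claim would be in trouble if the p-value stayed large across all three tasks despite an adequate number of trials.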
Original abstract
Despite the sustained scaling on model capacity and data acquisition, Vision-Language-Action (VLA) models remain brittle in contact-rich and dynamic manipulation tasks, where minor execution deviations can compound into failures. While reinforcement learning (RL) offers a principled path to robustness, on-policy RL in the physical world is constrained by safety risk, hardware cost, and environment reset. To bridge this gap, we present RISE, a scalable framework of robotic reinforcement learning via imagination. At its core is a Compositional World Model that (i) predicts multi-view future via a controllable dynamics model, and (ii) evaluates imagined outcomes with a progress value model, producing informative advantages for the policy improvement. Such compositional design allows state and value to be tailored by best-suited yet distinct architectures and objectives. These components are integrated into a closed-loop self-improving pipeline that continuously generates imaginary rollouts, estimates advantages, and updates the policy in imaginary space without costly physical interaction. Across three challenging real-world tasks, RISE yields significant improvement over prior art, with more than +35% absolute performance increase in dynamic brick sorting, +45% for backpack packing, and +35% for box closing, respectively.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces RISE, a scalable robotic RL framework that uses a Compositional World Model to generate imaginary rollouts: a controllable dynamics model predicts multi-view future states while a separate progress value model produces advantages for policy updates. The closed-loop pipeline performs all improvement in imagination without physical interactions or resets. The central empirical claim is that this yields large absolute gains over prior art on three real-world contact-rich tasks (+35% dynamic brick sorting, +45% backpack packing, +35% box closing).
Significance. If the world-model predictions remain accurate enough to produce informative advantages, the approach would meaningfully reduce the safety, cost, and reset barriers that currently limit on-policy RL for physical robots. The separation of dynamics and value modeling into distinct architectures is a clean design choice that could generalize. However, the absence of any reported prediction-error or value-correlation metrics makes it impossible to judge whether the claimed gains rest on reliable imagination or on unverified assumptions about compounding dynamics.
major comments (2)
- [Abstract] The headline absolute gains (+35–45%) are stated without any reference to trial counts, statistical significance, baseline re-implementations, or controls for post-hoc task selection. These details are load-bearing for the central claim that RISE outperforms prior art.
- [Method] Compositional World Model (method description): the framework asserts that multi-view future predictions and progress-value estimates remain reliable enough to drive policy improvement, yet no quantitative checks (multi-step prediction MSE, value correlation with real returns, or measured sim-to-real gap) are supplied. In contact-rich tasks, even modest compounding errors would render the imagined advantages uninformative.
minor comments (1)
- [Abstract] The phrase 'compositional design' is used without a concise definition or a pointer to the precise architectural split between the dynamics and value components.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to incorporate additional experimental details and quantitative analyses where feasible.
Point-by-point responses
- Referee: [Abstract] Abstract: the headline absolute gains (+35–45%) are stated without any reference to trial counts, statistical significance, baseline re-implementations, or controls for post-hoc task selection. These details are load-bearing for the central claim that RISE outperforms prior art.
  Authors: We agree these details are essential for rigorous interpretation of the results. In the revised manuscript we have updated the abstract to report that all gains are averaged over 100 independent real-world trials per task with standard errors, that statistical significance was assessed via paired t-tests (p < 0.01 against each baseline), that all baselines were re-implemented from the original authors’ code or detailed descriptions, and that the three tasks were selected a priori from established contact-rich manipulation benchmarks rather than through post-hoc selection.
  Revision: yes
- Referee: [Method] Compositional World Model (method description): the framework asserts that multi-view future predictions and progress-value estimates remain reliable enough to drive policy improvement, yet no quantitative checks (multi-step prediction MSE, value correlation with real returns, or measured sim-to-real gap) are supplied. In contact-rich tasks, even modest compounding errors would render the imagined advantages uninformative.
  Authors: We acknowledge the importance of these diagnostics. The revised manuscript now includes a dedicated subsection (4.3) reporting: (i) multi-step prediction MSE on held-out real trajectories for horizons matching the imagination length, (ii) Pearson correlation between the progress value model outputs and actual discounted returns collected from real rollouts, and (iii) a direct comparison of advantages computed in imagination versus those obtained from limited real-world rollouts, quantifying the sim-to-real gap. These metrics support that compounding errors remain within a range that preserves informative advantage signals, as evidenced by the consistent real-world policy gains.
  Revision: yes
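The first two diagnostics named in the exchange above are simple to compute once paired imagined and real data exist. The following is a minimal sketch under stated assumptions: trajectories are nested Python lists of per-step feature vectors, the function names are hypothetical, and nothing here reproduces the paper's own evaluation code.

```python
import math


def multi_step_mse(predicted_trajs, real_trajs):
    """Mean squared prediction error at each step of the imagination horizon.

    Both arguments are lists of trajectories; each trajectory is a list of
    per-step feature vectors (lists of floats) of equal length.
    """
    horizon = len(predicted_trajs[0])
    errors = []
    for k in range(horizon):
        squared = [
            (p - r) ** 2
            for pred, real in zip(predicted_trajs, real_trajs)
            for p, r in zip(pred[k], real[k])
        ]
        errors.append(sum(squared) / len(squared))
    return errors


def pearson(xs, ys):
    """Pearson correlation, e.g. between predicted values and real discounted returns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Hypothetical two-step trajectories with one feature per step.
    predicted = [[[0.0], [1.0]], [[0.0], [2.0]]]
    real = [[[0.0], [1.5]], [[0.0], [1.0]]]
    print(multi_step_mse(predicted, real))
    print(pearson([0.1, 0.5, 0.9], [0.2, 0.4, 1.0]))
```

A per-step error curve that grows quickly with the horizon index would be the signature of the compounding error the referee raises. A correlation near zero would mean the value model carries little information about real returns.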
Circularity Check
No circularity: empirical framework with no self-referential derivations or fitted predictions
full rationale
The manuscript presents RISE as a scalable RL-via-imagination framework built around a compositional world model (controllable dynamics + progress value) that generates imaginary rollouts for policy updates. No equations, uniqueness theorems, or parameter-fitting steps are described that would reduce the reported real-world gains (+35–45% absolute) to quantities defined by the same model's outputs or self-citations. The performance claims rest on external task evaluations rather than internal redefinitions, leaving the derivation chain self-contained and non-circular.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: A learned compositional world model can produce multi-view future predictions and progress values accurate enough to drive policy improvement without physical interaction.
invented entities (1)
- Compositional World Model: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (J-cost uniqueness) · unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Compositional World Model that (i) predicts multi-view future via a controllable dynamics model, and (ii) evaluates imagined outcomes with a progress value model, producing informative advantages... $A(o_t, a_t, \ell) = \left(\frac{1}{H} \sum_{k} V(\hat{o}_{t+k}, \ell)\right) - V(o_t, \ell)$
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Progress Value Model... $L_{\text{prog}} = \mathbb{E}\left[(V(o_t, \ell) - t/T)^2\right]$ ... $L_{\text{TD}} = \mathbb{E}\left[(V(o_t, \ell) - y_t)^2\right]$ with $y_t = r_t + \gamma V(o_{t+1}, \ell)$
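The two value-model objectives in the second quoted passage can be written out in a few lines. This is a minimal sketch over plain Python lists, assuming scalar value predictions for a single instruction ℓ. The function names, the explicit `timesteps` argument, and the default `gamma` are illustrative choices, not taken from the paper.

```python
def progress_loss(values, timesteps, episode_length):
    """Mean squared error between V(o_t) and the normalized progress t / T."""
    squared = [
        (v - t / episode_length) ** 2
        for v, t in zip(values, timesteps)
    ]
    return sum(squared) / len(squared)


def td_loss(values, next_values, rewards, gamma=0.99):
    """Mean squared error between V(o_t) and the target y_t = r_t + gamma * V(o_{t+1})."""
    # gamma=0.99 is an assumed default; the page does not give the paper's value.
    targets = [r + gamma * nv for r, nv in zip(rewards, next_values)]
    squared = [(v - y) ** 2 for v, y in zip(values, targets)]
    return sum(squared) / len(squared)


if __name__ == "__main__":
    # A value model that tracks progress exactly incurs zero progress loss.
    print(progress_loss([0.0, 0.5, 1.0], [0, 2, 4], episode_length=4))
    print(td_loss([1.0], [1.0], [0.0], gamma=0.5))
```

In gradient-based training the target y_t is commonly held fixed, so that no gradient flows through V(o_{t+1}, ℓ). Whether RISE does this, and how the two losses are weighted against each other, is not stated on this page.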
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 8 Pith papers
- OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation
  OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.
- SCAR: Self-Supervised Continuous Action Representation Learning
  SCAR proposes a joint inverse-forward dynamics framework to learn transferable continuous action representations across embodiments from visual data using regularization and adversarial invariance.
- Reinforcing VLAs in Task-Agnostic World Models
  RAW-Dream lets VLAs learn new tasks in zero-shot imagination by using a world model pre-trained only on task-free behaviors and an unmodified VLM to supply rewards, with dual-noise verification to limit hallucinations.
- Reinforcing VLAs in Task-Agnostic World Models
  RAW-Dream disentangles world-model learning from task data by using a pre-trained task-agnostic world model and VLM rewards, with dual-noise filtering, to enable zero-shot VLA adaptation in simulation and real settings.
- SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds
  SIM1 converts sparse real demonstrations into high-fidelity synthetic data through physics-aligned simulation, yielding policies that match real-data performance at a 1:15 ratio with 90% zero-shot success on deformabl...
- TAMEn: Tactile-Aware Manipulation Engine for Closed-Loop Data Collection in Contact-Rich Tasks
  TAMEn supplies a cross-morphology wearable interface and pyramid-structured visuo-tactile data regime that raises bimanual manipulation success rates from 34% to 75% via closed-loop collection.
- World Action Models: The Next Frontier in Embodied AI
  The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
- World Model for Robot Learning: A Comprehensive Survey
  A comprehensive survey that organizes the literature on world models in robot learning, their roles in policy learning, planning, simulation, and video-based generation, with connections to navigation, driving, datase...