hub

RISE: Self-Improving Robot Policy with Compositional World Model

· 2026 · cs.RO · arXiv 2602.11075

17 Pith papers cite this work. Polarity classification is still indexing.

17 Pith papers citing it

open full Pith review browse 17 citing papers arXiv PDF

abstract

Despite the sustained scaling on model capacity and data acquisition, Vision-Language-Action (VLA) models remain brittle in contact-rich and dynamic manipulation tasks, where minor execution deviations can compound into failures. While reinforcement learning (RL) offers a principled path to robustness, on-policy RL in the physical world is constrained by safety risk, hardware cost, and environment reset. To bridge this gap, we present RISE, a scalable framework of robotic reinforcement learning via imagination. At its core is a Compositional World Model that (i) predicts multi-view future via a controllable dynamics model, and (ii) evaluates imagined outcomes with a progress value model, producing informative advantages for the policy improvement. Such compositional design allows state and value to be tailored by best-suited yet distinct architectures and objectives. These components are integrated into a closed-loop self-improving pipeline that continuously generates imaginary rollouts, estimates advantages, and updates the policy in imaginary space without costly physical interaction. Across three challenging real-world tasks, RISE yields significant improvement over prior art, with more than +35% absolute performance increase in dynamic brick sorting, +45% for backpack packing, and +35% for box closing, respectively.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 3 other 1

citation-polarity summary

background 3 unclear 1

representative citing papers

Targeting World Models to Compromise Robot Learning Pipelines

cs.RO · 2026-06-08 · unverdicted · novelty 7.0

World models introduce a stealthy poisoning vector into robot learning pipelines where malicious prompts or dynamics in teleoperated data activate only during synthetic trajectory generation, enabling backdoors in downstream policies.

OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation

cs.RO · 2026-05-07 · unverdicted · novelty 7.0

OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.

RoboWorld: Fast and Reliable Neural Simulators for Generalist Robot Policy Evaluation

cs.RO · 2026-07-01 · unverdicted · novelty 6.0

RoboWorld introduces an automated pipeline using autoregressive video world models and task-progress VLM scoring, plus Step Forcing for long-horizon stability, to achieve high correlation with real robot policy evaluation.

STEAM: Self-Supervised Temporal Ensemble Advantage Modeling for Real-World Robot Learning

cs.RO · 2026-06-29 · unverdicted · novelty 6.0

STEAM learns advantages from expert trajectories via self-supervised temporal ensemble modeling to improve policy learning on real robot tasks like bimanual folding and pick-and-place.

SCAR: Self-Supervised Continuous Action Representation Learning

cs.RO · 2026-05-13 · unverdicted · novelty 6.0

SCAR proposes a joint inverse-forward dynamics framework to learn transferable continuous action representations across embodiments from visual data using regularization and adversarial invariance.

Reinforcing VLAs in Task-Agnostic World Models

cs.AI · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

RAW-Dream disentangles world-model learning from task data by using a pre-trained task-agnostic world model and VLM rewards, with dual-noise filtering, to enable zero-shot VLA adaptation in simulation and real settings.

SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds

cs.RO · 2026-04-09 · unverdicted · novelty 6.0

SIM1 converts sparse real demonstrations into high-fidelity synthetic data through physics-aligned simulation, yielding policies that match real-data performance at a 1:15 ratio with 90% zero-shot success on deformable manipulation.

TAMEn: Tactile-Aware Manipulation Engine for Closed-Loop Data Collection in Contact-Rich Tasks

cs.RO · 2026-04-08 · unverdicted · novelty 6.0

TAMEn supplies a cross-morphology wearable interface and pyramid-structured visuo-tactile data regime that raises bimanual manipulation success rates from 34% to 75% via closed-loop collection.

How Should World Models Be Evaluated for Embodied Decision-Making? A Decision-Making-Centric Position

cs.LG · 2026-06-13 · unverdicted · novelty 5.0

The paper proposes an L0-L7 evidential ladder for evaluating world models in embodied decision-making, prioritizing interventional action fidelity and policy optimization utility over visual plausibility.

DexPIE: Stable Dexterous Policy Improvement from Real-World Experience

cs.RO · 2026-06-08 · unverdicted · novelty 5.0

DexPIE improves dexterous manipulation success rates by 37% over demo policies via real-world experience collection with adapted intervention, multi-stage DAgger, asynchronous relative-action inference, and optimality conditioning.

Discrete-WAM: Unified Discrete Vision-Action Token Editing for World-Policy Learning

cs.RO · 2026-06-04 · unverdicted · novelty 5.0

Discrete-WAM unifies world modeling and policy learning for autonomous driving by representing observations, states, decisions, and actions as tokens in one space and using hierarchical token editing for planning.

GeoAlign: Beyond Semantics with State-Guided Spatial Alignment in VLA Models

cs.RO · 2026-06-02 · unverdicted · novelty 5.0

GeoAlign post-trains an RGB geometry branch on robot RGB-D data to produce GEP features that are queried by proprioceptive state to generate phase-dependent geometry tokens, yielding 99.0% on LIBERO, 85.3% on SimplerEnv-Fractal, and 78.8% on real ALOHA tasks.

World Models for Robotic Manipulation: A Survey

cs.RO · 2026-05-27 · accept · novelty 5.0

Survey organizing world models for robotic manipulation into representation families, a functional taxonomy, and infrastructure roles across pretraining, post-training, and inference, while reviewing 34 datasets and evaluation protocols.

SANTS: A State-Adaptive Scheduler for World Action Models

cs.RO · 2026-05-27 · unverdicted · novelty 5.0

SANTS adaptively chooses denoising depth in video-based robot action diffusion policies using a state-dependent stopping hazard and noise ratio, trained via downstream action reward to reduce latency.

World Action Models: The Next Frontier in Embodied AI

cs.RO · 2026-05-12 · unverdicted · novelty 4.0

The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

World Model for Robot Learning: A Comprehensive Survey

cs.RO · 2026-04-30 · unverdicted · novelty 3.0

A comprehensive survey that organizes the literature on world models in robot learning, their roles in policy learning, planning, simulation, and video-based generation, with connections to navigation, driving, datasets, and benchmarks.

DVG-WM: Disentangled Video Generation Enables Efficient Embodied World Model for Robotic Manipulation

cs.RO · 2026-06-30

citing papers explorer

Showing 17 of 17 citing papers.

Targeting World Models to Compromise Robot Learning Pipelines cs.RO · 2026-06-08 · unverdicted · none · ref 15 · internal anchor
World models introduce a stealthy poisoning vector into robot learning pipelines where malicious prompts or dynamics in teleoperated data activate only during synthetic trajectory generation, enabling backdoors in downstream policies.
OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation cs.RO · 2026-05-07 · unverdicted · none · ref 86 · internal anchor
OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.
RoboWorld: Fast and Reliable Neural Simulators for Generalist Robot Policy Evaluation cs.RO · 2026-07-01 · unverdicted · none · ref 37 · internal anchor
RoboWorld introduces an automated pipeline using autoregressive video world models and task-progress VLM scoring, plus Step Forcing for long-horizon stability, to achieve high correlation with real robot policy evaluation.
STEAM: Self-Supervised Temporal Ensemble Advantage Modeling for Real-World Robot Learning cs.RO · 2026-06-29 · unverdicted · none · ref 6 · internal anchor
STEAM learns advantages from expert trajectories via self-supervised temporal ensemble modeling to improve policy learning on real robot tasks like bimanual folding and pick-and-place.
SCAR: Self-Supervised Continuous Action Representation Learning cs.RO · 2026-05-13 · unverdicted · none · ref 30 · internal anchor
SCAR proposes a joint inverse-forward dynamics framework to learn transferable continuous action representations across embodiments from visual data using regularization and adversarial invariance.
Reinforcing VLAs in Task-Agnostic World Models cs.AI · 2026-05-12 · unverdicted · none · ref 39 · 2 links · internal anchor
RAW-Dream disentangles world-model learning from task data by using a pre-trained task-agnostic world model and VLM rewards, with dual-noise filtering, to enable zero-shot VLA adaptation in simulation and real settings.
SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds cs.RO · 2026-04-09 · unverdicted · none · ref 52 · internal anchor
SIM1 converts sparse real demonstrations into high-fidelity synthetic data through physics-aligned simulation, yielding policies that match real-data performance at a 1:15 ratio with 90% zero-shot success on deformable manipulation.
TAMEn: Tactile-Aware Manipulation Engine for Closed-Loop Data Collection in Contact-Rich Tasks cs.RO · 2026-04-08 · unverdicted · none · ref 6 · internal anchor
TAMEn supplies a cross-morphology wearable interface and pyramid-structured visuo-tactile data regime that raises bimanual manipulation success rates from 34% to 75% via closed-loop collection.
How Should World Models Be Evaluated for Embodied Decision-Making? A Decision-Making-Centric Position cs.LG · 2026-06-13 · unverdicted · none · ref 65 · internal anchor
The paper proposes an L0-L7 evidential ladder for evaluating world models in embodied decision-making, prioritizing interventional action fidelity and policy optimization utility over visual plausibility.
DexPIE: Stable Dexterous Policy Improvement from Real-World Experience cs.RO · 2026-06-08 · unverdicted · none · ref 40 · internal anchor
DexPIE improves dexterous manipulation success rates by 37% over demo policies via real-world experience collection with adapted intervention, multi-stage DAgger, asynchronous relative-action inference, and optimality conditioning.
Discrete-WAM: Unified Discrete Vision-Action Token Editing for World-Policy Learning cs.RO · 2026-06-04 · unverdicted · none · ref 88 · internal anchor
Discrete-WAM unifies world modeling and policy learning for autonomous driving by representing observations, states, decisions, and actions as tokens in one space and using hierarchical token editing for planning.
GeoAlign: Beyond Semantics with State-Guided Spatial Alignment in VLA Models cs.RO · 2026-06-02 · unverdicted · none · ref 12 · internal anchor
GeoAlign post-trains an RGB geometry branch on robot RGB-D data to produce GEP features that are queried by proprioceptive state to generate phase-dependent geometry tokens, yielding 99.0% on LIBERO, 85.3% on SimplerEnv-Fractal, and 78.8% on real ALOHA tasks.
World Models for Robotic Manipulation: A Survey cs.RO · 2026-05-27 · accept · none · ref 98 · internal anchor
Survey organizing world models for robotic manipulation into representation families, a functional taxonomy, and infrastructure roles across pretraining, post-training, and inference, while reviewing 34 datasets and evaluation protocols.
SANTS: A State-Adaptive Scheduler for World Action Models cs.RO · 2026-05-27 · unverdicted · none · ref 24 · internal anchor
SANTS adaptively chooses denoising depth in video-based robot action diffusion policies using a state-dependent stopping hazard and noise ratio, trained via downstream action reward to reduce latency.
World Action Models: The Next Frontier in Embodied AI cs.RO · 2026-05-12 · unverdicted · none · ref 47 · internal anchor
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
World Model for Robot Learning: A Comprehensive Survey cs.RO · 2026-04-30 · unverdicted · none · ref 61 · internal anchor
A comprehensive survey that organizes the literature on world models in robot learning, their roles in policy learning, planning, simulation, and video-based generation, with connections to navigation, driving, datasets, and benchmarks.
DVG-WM: Disentangled Video Generation Enables Efficient Embodied World Model for Robotic Manipulation cs.RO · 2026-06-30 · unreviewed · ref 45 · internal anchor

RISE: Self-Improving Robot Policy with Compositional World Model

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer