Canonical reference

Scalable Policy Evaluation with Video World Models

· 2025 · arXiv 2511.11520

Canonical reference. 80% of citing Pith papers cite this work as background.

12 Pith papers citing it

Background 80% of classified citations

read on arXiv browse 12 citing papers

citation-role summary

background 4 other 1

citation-polarity summary

background 4 unclear 1

representative citing papers

EA-WM: Event-Aware Generative World Model with Structured Kinematic-to-Visual Action Fields

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

EA-WM generates more accurate robot world rollouts by projecting actions as structured visual fields in camera space and using event-aware bidirectional fusion to better capture interaction dynamics.

PlayWorld: Learning Robot World Models from Autonomous Play

cs.RO · 2026-03-09 · unverdicted · novelty 7.0

PlayWorld learns high-fidelity robot world models from unsupervised self-play, producing physically consistent video predictions that outperform models trained on human data and enabling 65% better real-world policy performance via model-based RL.

DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos

cs.RO · 2026-02-06 · unverdicted · novelty 7.0

DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.

RoboWorld: Fast and Reliable Neural Simulators for Generalist Robot Policy Evaluation

cs.RO · 2026-07-01 · unverdicted · novelty 6.0

RoboWorld introduces an automated pipeline using autoregressive video world models and task-progress VLM scoring, plus Step Forcing for long-horizon stability, to achieve high correlation with real robot policy evaluation.

SC3-Eval: Evaluating Robot Foundation Models via Self-Consistent Video Generation

cs.RO · 2026-06-17 · unverdicted · novelty 6.0

SC3-Eval enforces three consistencies on a video model to produce policy rollouts that correlate 0.929 with real-world performance across seven vision-language-action policies and reproduce observed failure modes.

OSCAR: Omni-Embodiment Action-Conditioned World Model for Robotics

cs.RO · 2026-06-03 · unverdicted · novelty 6.0

OSCAR finetunes Cosmos-Predict2.5-2B on a deduplicated multi-embodiment robotics dataset with kinematic skeleton conditioning, claiming better action following and significant correlation between virtual and real robot policy evaluations.

StressDream: Steering Video World Models for Robust Policy Evaluation and Improvement

cs.CV · 2026-05-29 · unverdicted · novelty 6.0

StressDream optimizes initial noise in diffusion video world models using VLM semantic and plausibility objectives to steer generations toward specified high-impact outcomes for improved policy evaluation.

Reinforcing VLAs in Task-Agnostic World Models

cs.AI · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

RAW-Dream disentangles world-model learning from task data by using a pre-trained task-agnostic world model and VLM rewards, with dual-noise filtering, to enable zero-shot VLA adaptation in simulation and real settings.

dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model

cs.RO · 2026-04-24 · unverdicted · novelty 6.0

A discrete diffusion model tokenizes multimodal robotic data and uses a progress token to predict future states and task completion for scalable policy evaluation.

Toward Reliable Sim-to-Real Predictability for MoE-based Robust Quadrupedal Locomotion

cs.RO · 2026-01-31 · unverdicted · novelty 6.0

MoE-based locomotion policy with RoboGauge metrics achieves reliable sim-to-real transfer, enabling robust quadrupedal walking on challenging unseen terrains up to 4 m/s.

How Should World Models Be Evaluated for Embodied Decision-Making? A Decision-Making-Centric Position

cs.LG · 2026-06-13 · unverdicted · novelty 5.0

The paper proposes an L0-L7 evidential ladder for evaluating world models in embodied decision-making, prioritizing interventional action fidelity and policy optimization utility over visual plausibility.

Reconstruction or Semantics? What Makes a Latent Space Useful for Robotic World Models

cs.CV · 2026-05-07 · unverdicted · novelty 5.0

Semantic latent spaces from pretrained encoders outperform reconstruction-based spaces for robotic world models on planning and downstream policy performance.

citing papers explorer

Showing 12 of 12 citing papers after filters.

EA-WM: Event-Aware Generative World Model with Structured Kinematic-to-Visual Action Fields cs.CV · 2026-05-07 · unverdicted · none · ref 22
EA-WM generates more accurate robot world rollouts by projecting actions as structured visual fields in camera space and using event-aware bidirectional fusion to better capture interaction dynamics.
PlayWorld: Learning Robot World Models from Autonomous Play cs.RO · 2026-03-09 · unverdicted · none · ref 57
PlayWorld learns high-fidelity robot world models from unsupervised self-play, producing physically consistent video predictions that outperform models trained on human data and enabling 65% better real-world policy performance via model-based RL.
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos cs.RO · 2026-02-06 · unverdicted · none · ref 90
DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.
RoboWorld: Fast and Reliable Neural Simulators for Generalist Robot Policy Evaluation cs.RO · 2026-07-01 · unverdicted · none · ref 56
RoboWorld introduces an automated pipeline using autoregressive video world models and task-progress VLM scoring, plus Step Forcing for long-horizon stability, to achieve high correlation with real robot policy evaluation.
SC3-Eval: Evaluating Robot Foundation Models via Self-Consistent Video Generation cs.RO · 2026-06-17 · unverdicted · none · ref 5
SC3-Eval enforces three consistencies on a video model to produce policy rollouts that correlate 0.929 with real-world performance across seven vision-language-action policies and reproduce observed failure modes.
OSCAR: Omni-Embodiment Action-Conditioned World Model for Robotics cs.RO · 2026-06-03 · unverdicted · none · ref 27
OSCAR finetunes Cosmos-Predict2.5-2B on a deduplicated multi-embodiment robotics dataset with kinematic skeleton conditioning, claiming better action following and significant correlation between virtual and real robot policy evaluations.
StressDream: Steering Video World Models for Robust Policy Evaluation and Improvement cs.CV · 2026-05-29 · unverdicted · none · ref 112
StressDream optimizes initial noise in diffusion video world models using VLM semantic and plausibility objectives to steer generations toward specified high-impact outcomes for improved policy evaluation.
Reinforcing VLAs in Task-Agnostic World Models cs.AI · 2026-05-12 · unverdicted · none · ref 33 · 2 links
RAW-Dream disentangles world-model learning from task data by using a pre-trained task-agnostic world model and VLM rewards, with dual-noise filtering, to enable zero-shot VLA adaptation in simulation and real settings.
dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model cs.RO · 2026-04-24 · unverdicted · none · ref 40
A discrete diffusion model tokenizes multimodal robotic data and uses a progress token to predict future states and task completion for scalable policy evaluation.
Toward Reliable Sim-to-Real Predictability for MoE-based Robust Quadrupedal Locomotion cs.RO · 2026-01-31 · unverdicted · none · ref 47
MoE-based locomotion policy with RoboGauge metrics achieves reliable sim-to-real transfer, enabling robust quadrupedal walking on challenging unseen terrains up to 4 m/s.
How Should World Models Be Evaluated for Embodied Decision-Making? A Decision-Making-Centric Position cs.LG · 2026-06-13 · unverdicted · none · ref 53
The paper proposes an L0-L7 evidential ladder for evaluating world models in embodied decision-making, prioritizing interventional action fidelity and policy optimization utility over visual plausibility.
Reconstruction or Semantics? What Makes a Latent Space Useful for Robotic World Models cs.CV · 2026-05-07 · unverdicted · none · ref 62
Semantic latent spaces from pretrained encoders outperform reconstruction-based spaces for robotic world models on planning and downstream policy performance.

Scalable Policy Evaluation with Video World Models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer