hub Canonical reference

Gigaworld-0: World models as data engine to empower embodied ai

· 2025 · arXiv 2511.19861

Canonical reference. 78% of citing Pith papers cite this work as background.

26 Pith papers citing it

Background 78% of classified citations

read on arXiv browse 26 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 9

citation-polarity summary

background 7 unclear 2

representative citing papers

Flow as Flow: Modeling Robot Velocity Fields as Probability Velocity Fields for Flow-Based Object Manipulation

cs.RO · 2026-06-22 · unverdicted · novelty 7.0

Flow as Flow models robot flows as probability flows using flow matching to generate velocity fields more efficiently than prior sparse keypoint approaches.

Closed Loop Dynamic Driving Data Mixture for Real-Synthetic Co-Training

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

AutoScale is a closed-loop data engine using Graph-RAE for scene representation and Cluster-GA for importance-based retrieval to improve real-synthetic co-training for autonomous driving.

RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation

cs.RO · 2026-04-21 · unverdicted · novelty 7.0 · 2 refs

RoboWM-Bench evaluates video world models by converting their manipulation video predictions into executable actions validated in simulation, showing that visual plausibility does not guarantee physical executability.

3D-Anchored Lookahead Planning for Persistent Robotic Scene Memory via World-Model-Based MCTS

cs.RO · 2026-04-13 · unverdicted · novelty 7.0

3D-ALP achieves 0.65 success on memory-dependent 5-step robotic reach tasks versus near-zero for reactive baselines by anchoring MCTS planning to a persistent 3D camera-to-world frame.

Wh0: Generative World Models as Scalable Sources of Egocentric Human Hand Manipulation Data

cs.RO · 2026-06-20 · unverdicted · novelty 6.0

Wh0 generates scalable egocentric human manipulation videos with world models and converts them to boost pretrained VLA models' zero-shot dexterous task success from 8.3% to 38.9% on 18 real-world tasks.

Mem-World: Memory-Augmented Action-Conditioned World Models for Persistent Robot Manipulation

cs.CV · 2026-06-17 · unverdicted · novelty 6.0

Mem-World augments world models with W-VMem, a wrist-view-centered surfel memory, to generate persistent action-conditioned video rollouts that improve policy evaluation correlation by 14.5% and raise task success from 58% to 72%.

iMaC: Translating Actions into Motion and Contact Images for Embodied World Models

cs.RO · 2026-06-08 · unverdicted · novelty 6.0

iMaC introduces image-based action tokens in a dual-branch architecture to improve future state prediction and control in embodied world models over vector-based baselines.

$\omega$-EVA: Envision, Verify, and Act with Latent Interactive World Models

cs.RO · 2026-06-08 · unverdicted · novelty 6.0

ω-EVA is a three-stage latent world model framework that trains action-conditioned dynamics, a language-conditioned flow policy, and a tri-branch refiner to improve embodied action generation in simulation.

Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising

cs.RO · 2026-04-29 · unverdicted · novelty 6.0 · 2 refs

X-WAM unifies robotic action execution and 4D world synthesis by adapting video diffusion priors with a lightweight depth branch and asynchronous noise sampling, achieving 79-91% success on robot benchmarks.

From Seeing to Simulating: Generative High-Fidelity Simulation with Digital Cousins for Generalizable Robot Learning and Evaluation

cs.RO · 2026-04-17 · unverdicted · novelty 6.0

Digital Cousins is a generative real-to-sim method that creates diverse high-fidelity simulation scenes from real panoramas to improve generalization in robot learning and evaluation.

Grounded World Model for Semantically Generalizable Planning

cs.RO · 2026-04-13 · conditional · novelty 6.0

A vision-language-aligned world model turns visuomotor MPC into a language-following planner that reaches 87% success on 288 unseen semantic tasks where standard VLAs drop to 22%.

VAG: Dual-Stream Video-Action Generation for Embodied Data Synthesis

cs.RO · 2026-04-10 · unverdicted · novelty 6.0

VAG is a synchronized dual-stream flow-matching framework that generates aligned video-action pairs for synthetic embodied data synthesis and policy pretraining.

Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms

eess.IV · 2026-03-30 · unverdicted · novelty 6.0

Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.

RoboStereo: Dual-Tower 4D Embodied World Models for Unified Policy Optimization

cs.CV · 2026-03-13 · unverdicted · novelty 6.0

A dual-tower 4D embodied world model called RoboStereo reduces geometric hallucinations and delivers over 97% relative improvement on manipulation tasks via test-time augmentation, imitative learning, and open exploration.

PhysisForcing: Physics Reinforced World Simulator for Robotic Manipulation

cs.CV · 2026-06-26 · unverdicted · novelty 5.0

PhysisForcing applies trajectory and relational alignment losses to DiT features in video models, improving physical plausibility on R-Bench, PAI-Bench, and EZS-Bench while raising closed-loop robotic success rates from 16% to 24%.

How Should World Models Be Evaluated for Embodied Decision-Making? A Decision-Making-Centric Position

cs.LG · 2026-06-13 · unverdicted · novelty 5.0

The paper proposes an L0-L7 evidential ladder for evaluating world models in embodied decision-making, prioritizing interventional action fidelity and policy optimization utility over visual plausibility.

Making Foresight Actionable: Repurposing Representation Alignment in World Action Models

cs.CV · 2026-06-10 · unverdicted · novelty 5.0

AGRA is an Action-Grounded Representation Alignment objective that aligns intermediate video diffusion features with semantic representations to make world action model hidden states more useful for low-level robot control, improving localization, affordance, and robustness.

World Models for Robotic Manipulation: A Survey

cs.RO · 2026-05-27 · accept · novelty 5.0

Survey organizing world models for robotic manipulation into representation families, a functional taxonomy, and infrastructure roles across pretraining, post-training, and inference, while reviewing 34 datasets and evaluation protocols.

WorldArena 2.0: Extending Embodied World Model Benchmarking on Modality, Functionality and Platform

cs.RO · 2026-05-18 · unverdicted · novelty 5.0

WorldArena 2.0 extends embodied world model benchmarks to visuotactile perception, interactive policy training, and diverse real and simulated robotic platforms under a unified protocol.

From Where Things Are to What They Are For: Benchmarking Spatial-Functional Intelligence in Multimodal LLMs

cs.CV · 2026-05-04 · unverdicted · novelty 5.0

SFI-Bench shows current multimodal LLMs struggle to integrate spatial memory with functional reasoning and external knowledge in video tasks.

STARRY: Spatial-Temporal Action-Centric World Modeling for Robotic Manipulation

cs.RO · 2026-04-29 · unverdicted · novelty 5.0

STARRY uses unified diffusion to align spatial-temporal world predictions with action generation plus GASAM for geometry-aware attention, reaching 93.82%/93.30% success on 50 bimanual tasks in simulation and raising real-world success from 42.5% to 70.8%.

GE-Sim 2.0: A Roadmap Towards Comprehensive Closed-loop Video World Simulators for Robotic Manipulation

cs.RO · 2026-05-26 · unverdicted · novelty 4.0

GE-Sim 2.0 is a video-based closed-loop simulator for robotic manipulation that adds state expert, world judge, and acceleration modules on top of prior video generation to support policy learning and evaluation.

RLDX-1 Technical Report

cs.RO · 2026-05-05 · unverdicted · novelty 4.0 · 2 refs

RLDX-1 outperforms frontier VLAs such as π0.5 and GR00T N1.6 on dexterous manipulation benchmarks, reaching 86.8% success on ALLEX humanoid tasks versus around 40% for the baselines.

World Model for Robot Learning: A Comprehensive Survey

cs.RO · 2026-04-30 · unverdicted · novelty 3.0

A comprehensive survey that organizes the literature on world models in robot learning, their roles in policy learning, planning, simulation, and video-based generation, with connections to navigation, driving, datasets, and benchmarks.

citing papers explorer

Showing 1 of 1 citing paper after filters.

World Models for Robotic Manipulation: A Survey cs.RO · 2026-05-27 · accept · none · ref 88
Survey organizing world models for robotic manipulation into representation families, a functional taxonomy, and infrastructure roles across pretraining, post-training, and inference, while reviewing 34 datasets and evaluation protocols.

Gigaworld-0: World models as data engine to empower embodied ai

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer