Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.
PlayerOne: Egocentric World Simulator
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary: background 1
citing papers explorer
- A Physics-Grounded Benchmark for Multi-Agent Dynamics in World Models
  CrashTwin is a new benchmark framework that exposes physical violations in state-of-the-art world models during multi-agent collisions despite high visual quality.
- E³C: Video Generation with 3D Environmental Memory and Ego-Exo Human Pose Control
  E³C is a video diffusion model that disentangles persistent 3D scene structure via point-cloud memory from human dynamics via ego-exo pose controls for improved egocentric video generation on the Nymeria dataset.
- DexSIM: Real-time Dexterous Simulation with Unified Causal Video Diffusion
  DexSIM is a bi-directional video diffusion model with hand trajectory embedding and spatial memory cache for real-time dexterous hand-object simulation at 15 FPS.
- EgoExo-WM: Unlocking Exo Video for Ego World Models
  Method converts exocentric videos to egocentric format via body-pose extraction and kinematics to improve egocentric world-model prediction and planning.
- Continuous Latent Diffusion Language Model
  Cola DLM proposes a hierarchical latent diffusion model that learns a text-to-latent mapping, fits a global semantic prior in continuous space with a block-causal DiT, and performs conditional decoding, establishing latent prior modeling as an alternative to token-level autoregressive language model