Yan: Foundational interactive video generation

Deheng Ye, Fangyun Zhou, Jiacheng Lv, Jianqi Ma, Jun Zhang, Junyan Lv, Junyou Li, Minwen Deng, Mingyu Yang, Qiang Fu, et al. · 2025 · arXiv 2508.08601

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Efficient Video Diffusion Models: Advancements and Challenges

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.

DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos

cs.RO · 2026-02-06 · unverdicted · novelty 7.0

DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.

Training Agents Inside of Scalable World Models

cs.AI · 2025-09-29 · conditional · novelty 7.0

Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.

DisCo: World Models with Discrete Camera Motion Control

cs.CV · 2026-06-06 · unverdicted · novelty 6.0

DisCo uses discrete action primitives for camera control in video world models to achieve more reliable action following than continuous trajectories.

Streaming Video Generation with Streaming Force Control

cs.CV · 2026-06-05 · unverdicted · novelty 6.0

StreamForce presents a unified causal model for force-controllable streaming video generation using a new force representation and distillation pipeline, claiming SOTA force adherence and 16.6 FPS performance.

minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models

cs.CV · 2026-05-28 · unverdicted · novelty 6.0

minWM supplies an end-to-end pipeline that fine-tunes bidirectional T2V/TI2V models with camera control, then distills them via Causal Forcing into few-step autoregressive generators for low-latency rollout.

WorldKV: Efficient World Memory with World Retrieval and Compression

cs.CV · 2026-05-21 · unverdicted · novelty 6.0

WorldKV enables persistent world memory in autoregressive video diffusion models by selectively retrieving and compressing KV-cache chunks, matching full-cache fidelity at roughly twice the throughput without training.

AstraNav-World: World Model for Foresight Control and Consistency

cs.CV · 2025-12-25 · unverdicted · novelty 6.0

AstraNav-World unifies diffusion video generation and vision-language action planning in a single bidirectional model that improves trajectory accuracy, success rates, and zero-shot real-world adaptation in embodied navigation.

AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization

cs.CV · 2026-06-05 · unverdicted · novelty 5.0

AnchorWorld proposes a simulation framework that adds exogenous viewpoint supervision for full-body grounding and anchor-view text customization for dynamic world evolution in egocentric settings.

DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory

cs.CV · 2026-05-29 · unverdicted · novelty 5.0

DecMem proposes a decoupled memory system using sparse global and anchored local components to enable consistent minute-long controllable video generation in world models.

Towards Interactive Video World Modeling: Frontiers, Challenges, Benchmarks, and Future Trends

cs.CV · 2026-05-31 · unverdicted · novelty 2.0

This survey reviews trends, challenges, benchmarks, and future directions in action-conditioned interactive world modeling for video and 3D generation.

citing papers explorer

Efficient Video Diffusion Models: Advancements and Challenges · cs.CV · 2026-04-17 · unverdicted · none · ref 160
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos · cs.RO · 2026-02-06 · unverdicted · none · ref 113
DisCo: World Models with Discrete Camera Motion Control · cs.CV · 2026-06-06 · unverdicted · none · ref 48
Streaming Video Generation with Streaming Force Control · cs.CV · 2026-06-05 · unverdicted · none · ref 77
minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models · cs.CV · 2026-05-28 · unverdicted · none · ref 16
WorldKV: Efficient World Memory with World Retrieval and Compression · cs.CV · 2026-05-21 · unverdicted · none · ref 29
AstraNav-World: World Model for Foresight Control and Consistency · cs.CV · 2025-12-25 · unverdicted · none · ref 23
AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization · cs.CV · 2026-06-05 · unverdicted · none · ref 54
DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory · cs.CV · 2026-05-29 · unverdicted · none · ref 47
Towards Interactive Video World Modeling: Frontiers, Challenges, Benchmarks, and Future Trends · cs.CV · 2026-05-31 · unverdicted · none · ref 204
