hub Canonical reference

Relic: Interactive video world model with long-horizon memory.arXiv preprint arXiv:2512.04040

Hong, Y · 2025 · arXiv 2512.04040

Canonical reference. 91% of citing Pith papers cite this work as background.

15 Pith papers citing it

Background 91% of classified citations

read on arXiv browse 15 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 11

citation-polarity summary

background 10 unclear 1

representative citing papers

SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

cs.AI · 2026-05-10 · accept · novelty 8.0 · 2 refs

SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.

Latent State Design for World Models under Sufficiency Constraints

cs.AI · 2026-05-03 · unverdicted · novelty 7.0

World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.

MoRight: Motion Control Done Right

cs.CV · 2026-04-08 · unverdicted · novelty 7.0

MoRight disentangles object and camera motion via canonical-view specification and temporal cross-view attention, while decomposing motion into active user-driven and passive consequence components to learn and apply causality in video generation.

DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos

cs.RO · 2026-02-06 · unverdicted · novelty 7.0

DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.

WorldKV: Efficient World Memory with World Retrieval and Compression

cs.CV · 2026-05-21 · unverdicted · novelty 6.0

WorldKV enables persistent world memory in autoregressive video diffusion models by selectively retrieving and compressing KV-cache chunks, matching full-cache fidelity at roughly twice the throughput without training.

Memorize When Needed: Decoupled Memory Control for Spatially Consistent Long-Horizon Video Generation

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

A decoupled memory branch with hybrid cues, cross-attention, and gating improves spatial consistency and data efficiency in long-horizon camera-trajectory video generation.

Lyra 2.0: Explorable Generative 3D Worlds

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

Lyra 2.0 produces persistent 3D-consistent video sequences for large explorable worlds by using per-frame geometry for information routing and self-augmented training to correct temporal drift.

Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation

cs.CV · 2026-04-11 · conditional · novelty 6.0

Hybrid Forcing combines linear temporal attention for long-range retention, block-sparse attention for efficiency, and decoupled distillation to achieve real-time unbounded 832x480 streaming video generation at 29.5 FPS.

Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion

cs.CV · 2026-02-08 · unverdicted · novelty 6.0

Rolling Sink is a training-free cache adjustment technique that maintains visual consistency in autoregressive video diffusion models for ultra-long open-ended generation beyond training horizons.

Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization

cs.LG · 2026-02-03 · unverdicted · novelty 6.0

Quant VideoGen reduces KV cache memory by up to 7 times in autoregressive video diffusion models via semantic aware smoothing and progressive residual quantization, achieving better quality than baselines with under 4% latency overhead.

SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

cs.CV · 2026-05-14 · unverdicted · novelty 5.0

SANA-WM is a 2.6B-parameter efficient world model that synthesizes minute-scale 720p videos with 6-DoF camera control, trained on 213K public clips in 15 days on 64 H100s and runnable on single GPUs at 36x higher throughput than prior open baselines.

Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory

cs.CV · 2026-04-10 · unverdicted · novelty 4.0

Matrix-Game 3.0 delivers 720p real-time video generation at 40 FPS with minute-scale memory consistency by combining residual self-correction training, camera-aware memory injection, and DMD-based autoregressive distillation on a 5B model.

Advancing Open-source World Models

cs.CV · 2026-01-28 · unverdicted · novelty 4.0

LingBot-World is presented as an open-source world model that delivers high-fidelity simulation, minute-level contextual consistency, and real-time interactivity under one second latency.

Evolution of Video Generative Foundations

cs.CV · 2026-04-07 · unverdicted · novelty 2.0

This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.

Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation

cs.CV · 2026-02-02 · 2 refs

citing papers explorer

Showing 15 of 15 citing papers.

SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning cs.AI · 2026-05-10 · accept · none · ref 34 · 2 links
SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.
Latent State Design for World Models under Sufficiency Constraints cs.AI · 2026-05-03 · unverdicted · none · ref 32
World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.
MoRight: Motion Control Done Right cs.CV · 2026-04-08 · unverdicted · none · ref 35
MoRight disentangles object and camera motion via canonical-view specification and temporal cross-view attention, while decomposing motion into active user-driven and passive consequence components to learn and apply causality in video generation.
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos cs.RO · 2026-02-06 · unverdicted · none · ref 34
DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.
WorldKV: Efficient World Memory with World Retrieval and Compression cs.CV · 2026-05-21 · unverdicted · none · ref 8
WorldKV enables persistent world memory in autoregressive video diffusion models by selectively retrieving and compressing KV-cache chunks, matching full-cache fidelity at roughly twice the throughput without training.
Memorize When Needed: Decoupled Memory Control for Spatially Consistent Long-Horizon Video Generation cs.CV · 2026-04-20 · unverdicted · none · ref 20
A decoupled memory branch with hybrid cues, cross-attention, and gating improves spatial consistency and data efficiency in long-horizon camera-trajectory video generation.
Lyra 2.0: Explorable Generative 3D Worlds cs.CV · 2026-04-14 · unverdicted · none · ref 33
Lyra 2.0 produces persistent 3D-consistent video sequences for large explorable worlds by using per-frame geometry for information routing and self-augmented training to correct temporal drift.
Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation cs.CV · 2026-04-11 · conditional · none · ref 14
Hybrid Forcing combines linear temporal attention for long-range retention, block-sparse attention for efficiency, and decoupled distillation to achieve real-time unbounded 832x480 streaming video generation at 29.5 FPS.
Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion cs.CV · 2026-02-08 · unverdicted · none · ref 37
Rolling Sink is a training-free cache adjustment technique that maintains visual consistency in autoregressive video diffusion models for ultra-long open-ended generation beyond training horizons.
Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization cs.LG · 2026-02-03 · unverdicted · none · ref 5
Quant VideoGen reduces KV cache memory by up to 7 times in autoregressive video diffusion models via semantic aware smoothing and progressive residual quantization, achieving better quality than baselines with under 4% latency overhead.
SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer cs.CV · 2026-05-14 · unverdicted · none · ref 40
SANA-WM is a 2.6B-parameter efficient world model that synthesizes minute-scale 720p videos with 6-DoF camera control, trained on 213K public clips in 15 days on 64 H100s and runnable on single GPUs at 36x higher throughput than prior open baselines.
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory cs.CV · 2026-04-10 · unverdicted · none · ref 16
Matrix-Game 3.0 delivers 720p real-time video generation at 40 FPS with minute-scale memory consistency by combining residual self-correction training, camera-aware memory injection, and DMD-based autoregressive distillation on a 5B model.
Advancing Open-source World Models cs.CV · 2026-01-28 · unverdicted · none · ref 28
LingBot-World is presented as an open-source world model that delivers high-fidelity simulation, minute-level contextual consistency, and real-time interactivity under one second latency.
Evolution of Video Generative Foundations cs.CV · 2026-04-07 · unverdicted · none · ref 293
This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.
Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation cs.CV · 2026-02-02 · unreviewed · ref 15 · 2 links

Relic: Interactive video world model with long-horizon memory.arXiv preprint arXiv:2512.04040

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer