Mixed citations
Memflow: Flowing adaptive memory for consistent and efficient long video narratives
Mixed citation behavior. Most common role is background (67%).
Years: 2026 (12)
Representative citing papers
- Goodbye Drift: Anchored Tree Sampling for Long-Horizon Video-to-Video Generation
  Anchored Tree Sampling converts horizon-compounding drift into anchor-bounded drift by organizing video generation as a sparse-to-dense tree of imputations instead of left-to-right autoregressive rollout.
- KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration
  KVPO aligns streaming autoregressive video generators with human preferences via ODE-native GRPO, using KV cache for semantic exploration and TVE for velocity-based policy modeling, yielding gains in quality and alignment.
- CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives
  CausalCine enables real-time causal autoregressive multi-shot video generation via multi-shot training, content-aware memory routing for coherence, and distillation to few-step inference.
- Efficient Video Diffusion Models: Advancements and Challenges
  A survey that groups efficient video diffusion methods into four paradigms (step distillation, efficient attention, model compression, and cache/trajectory optimization) and outlines open challenges for practical use.
- StreamGVE: Training-Free Video Editing via Few-Step Streaming Video Generation
  StreamGVE enables high-quality training-free video editing by converting the task to noise-to-data streaming generation with dual-branch fast sampling, self-attention bridges, cross-attention grounding, source-oriented guidance, and visual prompting.
- DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation
  DySink uses adaptive retrieval of relevant historical frames plus a sink anomaly gate to improve dynamic degree and temporal quality in minute-long autoregressive video generation.
- Advancing Narrative Long Video Generation via Training-Free Identity-Aware Memory
  IAMFlow is a training-free identity-aware memory system that tracks entities via LLM global ID assignment and VLM frame verification to reduce identity drift in narrative long video generation from shifting prompts.
- Head Forcing: Long Autoregressive Video Generation via Head Heterogeneity
  Head Forcing assigns tailored KV cache strategies to local, anchor, and memory attention heads plus head-wise RoPE re-encoding to extend autoregressive video generation from seconds to minutes without training.
- SWIFT: Prompt-Adaptive Memory for Efficient Interactive Long Video Generation
  SWIFT introduces a semantic injection cache with head-wise updates and an adaptive dynamic window plus segment anchors to achieve efficient multi-prompt long video generation at 22.6 FPS while preserving quality in causal diffusion models.
- Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation
  Hybrid Forcing combines linear temporal attention for long-range retention, block-sparse attention for efficiency, and decoupled distillation to achieve real-time unbounded 832x480 streaming video generation at 29.5 FPS.
- Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms
  Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.
- Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion
  Rolling Sink is a training-free cache adjustment technique that maintains visual consistency in autoregressive video diffusion models for ultra-long open-ended generation beyond training horizons.