hub Canonical reference

Infinity-rope: Action-controllable infinite video generation emerges from autoregressive self-rollout

Hidir Yesiltepe, Tuna Han Salih Meral, Adil Kaan Akan, Kaan Oktay, Pinar Yanardag · 2025 · arXiv 2511.20649

Canonical reference. 86% of citing Pith papers cite this work as background.

19 Pith papers citing it

Background 86% of classified citations

read on arXiv browse 19 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 6 baseline 1

citation-polarity summary

background 6 baseline 1

representative citing papers

Q-ARVD: Quantizing Autoregressive Video Diffusion Models

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

Q-ARVD introduces final-quality-aware frame weighting and outlier-aware adaptive dual-scale quantization to enable accurate low-bit inference for autoregressive video diffusion models.

Echo-Forcing: A Scene Memory Framework for Interactive Long Video Generation

cs.CV · 2026-05-15 · unverdicted · novelty 7.0

Echo-Forcing decouples stable anchors, compressed history, and recent dynamics in video diffusion KV caches using hierarchical memory, scene recall frames, and difference-aware decay to support interactive long video generation under bounded cache.

EntityBench: Towards Entity-Consistent Long-Range Multi-Shot Video Generation

cs.CV · 2026-05-14 · conditional · novelty 7.0

EntityBench is a new benchmark with detailed per-shot entity schedules from real media, and the EntityMem baseline using persistent per-entity memory achieves the highest character fidelity with Cohen's d of +2.33.

CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

CausalCine enables real-time causal autoregressive multi-shot video generation via multi-shot training, content-aware memory routing for coherence, and distillation to few-step inference.

Efficient Video Diffusion Models: Advancements and Challenges

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.

Grounded Forcing: Bridging Time-Independent Semantics and Proximal Dynamics in Autoregressive Video Synthesis

cs.CV · 2026-04-08 · unverdicted · novelty 7.0

Grounded Forcing introduces dual memory caching, reference-based positional embeddings, and proximity-weighted recaching to bridge stable semantics with local dynamics, improving long-range consistency in autoregressive video synthesis.

FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

FlowLong generates videos several times longer than native model windows by blending adjacent predictions with Tweedie matching to enforce manifold and temporal consistency while using stochastic noise injection early and deterministic sampling later.

World-Ego Modeling for Long-Horizon Evolution in Hybrid Embodied Tasks

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

Proposes World-Ego Modeling with WEM using CP-MoE diffusion and a new HTEWorld benchmark, claiming SOTA on hybrid navigation-manipulation tasks.

Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

MIGA introduces two-stage alignment to close train-inference gaps and dual consistency enhancement via self-reflection and long-range guidance to achieve SOTA temporal consistency in infinite-frame video generation on VBench and NarrLV.

FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

FashionChameleon achieves interactive multi-garment video customization in real time by training a teacher model with in-context learning on single-garment pairs, applying streaming distillation, and using training-free KV cache rescheduling.

Head Forcing: Long Autoregressive Video Generation via Head Heterogeneity

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

Head Forcing assigns tailored KV cache strategies to local, anchor, and memory attention heads plus head-wise RoPE re-encoding to extend autoregressive video generation from seconds to minutes without training.

Pyramid Forcing: Head-Aware Pyramid KV Cache Policy for High-Quality Long Video Generation

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

Pyramid Forcing classifies attention heads into Anchor, Wave, and Veil types and applies type-specific KV cache policies to improve long-horizon autoregressive video generation quality.

RealCam: Real-Time Novel-View Video Generation with Interactive Camera Control

cs.CV · 2026-05-07 · unverdicted · novelty 6.0

RealCam is a causal autoregressive model for real-time camera-controlled video-to-video generation, using cross-frame in-context teacher distillation and loop-closed data augmentation to achieve high fidelity and consistency.

Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation

cs.CV · 2026-04-11 · conditional · novelty 6.0

Hybrid Forcing combines linear temporal attention for long-range retention, block-sparse attention for efficiency, and decoupled distillation to achieve real-time unbounded 832x480 streaming video generation at 29.5 FPS.

Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

Salt improves low-step video generation quality by adding endpoint-consistent regularization to distribution matching distillation and using cache-conditioned feature alignment for autoregressive models.

Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms

eess.IV · 2026-03-30 · unverdicted · novelty 6.0

Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.

Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion

cs.CV · 2026-02-08 · unverdicted · novelty 6.0

Rolling Sink is a training-free cache adjustment technique that maintains visual consistency in autoregressive video diffusion models for ultra-long open-ended generation beyond training horizons.

Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation

cs.CV · 2025-12-04 · conditional · novelty 6.0

Reward Forcing combines EMA-Sink tokens and Rewarded Distribution Matching Distillation to deliver state-of-the-art streaming video generation at 23.1 FPS without copying initial frames.

One-Forcing: Towards Stable One-Step Autoregressive Video Generation

cs.CV · 2026-05-22 · unverdicted · novelty 5.0

One-Forcing augments DMD with a GAN loss to enable stable one-step causal autoregressive video generation, reporting a VBench score of 83.76 as SOTA among one-step methods.

citing papers explorer

Showing 19 of 19 citing papers.

Q-ARVD: Quantizing Autoregressive Video Diffusion Models cs.CV · 2026-05-20 · unverdicted · none · ref 24
Q-ARVD introduces final-quality-aware frame weighting and outlier-aware adaptive dual-scale quantization to enable accurate low-bit inference for autoregressive video diffusion models.
Echo-Forcing: A Scene Memory Framework for Interactive Long Video Generation cs.CV · 2026-05-15 · unverdicted · none · ref 20
Echo-Forcing decouples stable anchors, compressed history, and recent dynamics in video diffusion KV caches using hierarchical memory, scene recall frames, and difference-aware decay to support interactive long video generation under bounded cache.
EntityBench: Towards Entity-Consistent Long-Range Multi-Shot Video Generation cs.CV · 2026-05-14 · conditional · none · ref 22
EntityBench is a new benchmark with detailed per-shot entity schedules from real media, and the EntityMem baseline using persistent per-entity memory achieves the highest character fidelity with Cohen's d of +2.33.
CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives cs.CV · 2026-05-12 · unverdicted · none · ref 52
CausalCine enables real-time causal autoregressive multi-shot video generation via multi-shot training, content-aware memory routing for coherence, and distillation to few-step inference.
Efficient Video Diffusion Models: Advancements and Challenges cs.CV · 2026-04-17 · unverdicted · none · ref 161
A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.
Grounded Forcing: Bridging Time-Independent Semantics and Proximal Dynamics in Autoregressive Video Synthesis cs.CV · 2026-04-08 · unverdicted · none · ref 27
Grounded Forcing introduces dual memory caching, reference-based positional embeddings, and proximity-weighted recaching to bridge stable semantics with local dynamics, improving long-range consistency in autoregressive video synthesis.
FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching cs.CV · 2026-05-20 · unverdicted · none · ref 30
FlowLong generates videos several times longer than native model windows by blending adjacent predictions with Tweedie matching to enforce manifold and temporal consistency while using stochastic noise injection early and deterministic sampling later.
World-Ego Modeling for Long-Horizon Evolution in Hybrid Embodied Tasks cs.CV · 2026-05-19 · unverdicted · none · ref 35
Proposes World-Ego Modeling with WEM using CP-MoE diffusion and a new HTEWorld benchmark, claiming SOTA on hybrid navigation-manipulation tasks.
Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos cs.CV · 2026-05-18 · unverdicted · none · ref 30
MIGA introduces two-stage alignment to close train-inference gaps and dual consistency enhancement via self-reflection and long-range guidance to achieve SOTA temporal consistency in infinite-frame video generation on VBench and NarrLV.
FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization cs.CV · 2026-05-15 · unverdicted · none · ref 40
FashionChameleon achieves interactive multi-garment video customization in real time by training a teacher model with in-context learning on single-garment pairs, applying streaming distillation, and using training-free KV cache rescheduling.
Head Forcing: Long Autoregressive Video Generation via Head Heterogeneity cs.CV · 2026-05-14 · unverdicted · none · ref 60
Head Forcing assigns tailored KV cache strategies to local, anchor, and memory attention heads plus head-wise RoPE re-encoding to extend autoregressive video generation from seconds to minutes without training.
Pyramid Forcing: Head-Aware Pyramid KV Cache Policy for High-Quality Long Video Generation cs.CV · 2026-05-13 · unverdicted · none · ref 19
Pyramid Forcing classifies attention heads into Anchor, Wave, and Veil types and applies type-specific KV cache policies to improve long-horizon autoregressive video generation quality.
RealCam: Real-Time Novel-View Video Generation with Interactive Camera Control cs.CV · 2026-05-07 · unverdicted · none · ref 39
RealCam is a causal autoregressive model for real-time camera-controlled video-to-video generation, using cross-frame in-context teacher distillation and loop-closed data augmentation to achieve high fidelity and consistency.
Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation cs.CV · 2026-04-11 · conditional · none · ref 51
Hybrid Forcing combines linear temporal attention for long-range retention, block-sparse attention for efficiency, and decoupled distillation to achieve real-time unbounded 832x480 streaming video generation at 29.5 FPS.
Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation cs.CV · 2026-04-03 · unverdicted · none · ref 40
Salt improves low-step video generation quality by adding endpoint-consistent regularization to distribution matching distillation and using cache-conditioned feature alignment for autoregressive models.
Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms eess.IV · 2026-03-30 · unverdicted · none · ref 137
Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.
Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion cs.CV · 2026-02-08 · unverdicted · none · ref 101
Rolling Sink is a training-free cache adjustment technique that maintains visual consistency in autoregressive video diffusion models for ultra-long open-ended generation beyond training horizons.
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation cs.CV · 2025-12-04 · conditional · none · ref 84
Reward Forcing combines EMA-Sink tokens and Rewarded Distribution Matching Distillation to deliver state-of-the-art streaming video generation at 23.1 FPS without copying initial frames.
One-Forcing: Towards Stable One-Step Autoregressive Video Generation cs.CV · 2026-05-22 · unverdicted · none · ref 28
One-Forcing augments DMD with a GAN loss to enable stable one-step causal autoregressive video generation, reporting a VBench score of 83.76 as SOTA among one-step methods.

Infinity-rope: Action-controllable infinite video generation emerges from autoregressive self-rollout

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer