ISPA reduces KV cache size by up to 50% in AR video models by transitioning layers to local attention and applying instance-specific least-squares weight modulation to compensate for lost history.
hub
Stable video infinity: Infinite-length video generation with error recycling.arXiv preprint arXiv:2510.09212
16 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
fields
cs.CV 16representative citing papers
MemLearner introduces a learning-based adaptive context query method using query tokens in video world models to improve long-term scene consistency over rule-based retrieval.
LongLive-2.0 delivers an NVFP4 parallel infrastructure that enables direct training of long multi-shot autoregressive diffusion video models and achieves up to 2.15x training and 1.84x inference speedups on Blackwell and other GPUs.
Social-Mamba introduces a Cycle Mamba block and social triplet factorization to achieve state-of-the-art trajectory forecasting accuracy with linear-time social interaction modeling on five benchmarks.
Stream-R1 improves distillation of autoregressive streaming video diffusion models by adaptively weighting supervision with a reward model at both rollout and per-pixel levels.
A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.
MetaWorld scales multi-agent video world models from single-view videos using monocular decomposition into ego-motion and trajectories, subject-aware generation, and cross-attention alignment for consistency.
Robust Dreamer uses Latent Gaussian Memory anchored to diffusion latents and Deviation Learning with a Dynamic Deviation Archive to reduce drift in long-horizon action-controlled image-to-video generation, reporting SOTA results on ScanNet, DL3DV, and OmniWorldGame.
StreamEdit enables high-quality training-free video editing by adapting streaming video generation models with dual-branch fast sampling, self-attention bridge, cross-attention grounding, source-oriented guidance, and visual prompting, outperforming prior methods in few-step regimes.
EverAnimate restores drifted latent flow trajectories in chunked video generation via persistent latent propagation and restorative flow matching, achieving measurable gains in PSNR, SSIM, LPIPS, and FID over prior long-animation methods with only LoRA tuning.
HorizonDrive is a new anti-drifting autoregressive training and distillation method that enables minute-scale stable driving video rollouts by making the teacher model rollout-capable via scheduled rollout recovery and teacher rollout DMD.
Reward Forcing combines EMA-Sink tokens and Rewarded Distribution Matching Distillation to deliver state-of-the-art streaming video generation at 23.1 FPS without copying initial frames.
A generative navigation world model that uses sparse anchored rollout with epipolar constraints to reduce perceptual and geometric drift.
Matrix-Game 3.0 delivers 720p real-time video generation at 40 FPS with minute-scale memory consistency by combining residual self-correction training, camera-aware memory injection, and DMD-based autoregressive distillation on a 5B model.
EchoTorrent combines multi-teacher distillation, adaptive CFG calibration, hybrid long-tail forcing, and VAE decoder refinement to enable few-pass autoregressive streaming video generation with improved temporal consistency and audio-lip sync.
This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.
citing papers explorer
-
Towards Memory-Efficient Autoregressive Video Generation via Instance-Specific Parametric Absorption
ISPA reduces KV cache size by up to 50% in AR video models by transitioning layers to local attention and applying instance-specific least-squares weight modulation to compensate for lost history.
-
MemLearner: Learning to Query Context memory for Video World Models
MemLearner introduces a learning-based adaptive context query method using query tokens in video world models to improve long-term scene consistency over rule-based retrieval.
-
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation
LongLive-2.0 delivers an NVFP4 parallel infrastructure that enables direct training of long multi-shot autoregressive diffusion video models and achieves up to 2.15x training and 1.84x inference speedups on Blackwell and other GPUs.
-
Social-Mamba: Socially-Aware Trajectory Forecasting with State-Space Models
Social-Mamba introduces a Cycle Mamba block and social triplet factorization to achieve state-of-the-art trajectory forecasting accuracy with linear-time social interaction modeling on five benchmarks.
-
Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation
Stream-R1 improves distillation of autoregressive streaming video diffusion models by adaptively weighting supervision with a reward model at both rollout and per-pixel levels.
-
Efficient Video Diffusion Models: Advancements and Challenges
A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.
-
MetaWorld: Scaling Multi-Agent Video World Model from Single-view Video Data
MetaWorld scales multi-agent video world models from single-view videos using monocular decomposition into ego-motion and trajectories, subject-aware generation, and cross-attention alignment for consistency.
-
Robust Dreamer: Deviation-Aware Latent Gaussian Memory for Action-Controlled AR Video Generation
Robust Dreamer uses Latent Gaussian Memory anchored to diffusion latents and Deviation Learning with a Dynamic Deviation Archive to reduce drift in long-horizon action-controlled image-to-video generation, reporting SOTA results on ScanNet, DL3DV, and OmniWorldGame.
-
StreamEdit: Training-Free Video Editing via Few-Step Streaming Video Generation
StreamEdit enables high-quality training-free video editing by adapting streaming video generation models with dual-branch fast sampling, self-attention bridge, cross-attention grounding, source-oriented guidance, and visual prompting, outperforming prior methods in few-step regimes.
-
EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration
EverAnimate restores drifted latent flow trajectories in chunked video generation via persistent latent propagation and restorative flow matching, achieving measurable gains in PSNR, SSIM, LPIPS, and FID over prior long-animation methods with only LoRA tuning.
-
HorizonDrive: Self-Corrective Autoregressive World Model for Long-horizon Driving Simulation
HorizonDrive is a new anti-drifting autoregressive training and distillation method that enables minute-scale stable driving video rollouts by making the teacher model rollout-capable via scheduled rollout recovery and teacher rollout DMD.
-
Drift-Resistant Navigation World Model with Anchored Epipolar Guidance
A generative navigation world model that uses sparse anchored rollout with epipolar constraints to reduce perceptual and geometric drift.
-
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
Matrix-Game 3.0 delivers 720p real-time video generation at 40 FPS with minute-scale memory consistency by combining residual self-correction training, camera-aware memory injection, and DMD-based autoregressive distillation on a 5B model.
-
EchoTorrent: Towards Swift, Sustained, and Streaming Multi-Modal Video Generation
EchoTorrent combines multi-teacher distillation, adaptive CFG calibration, hybrid long-tail forcing, and VAE decoder refinement to enable few-pass autoregressive streaming video generation with improved temporal consistency and audio-lip sync.
-
Evolution of Video Generative Foundations
This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.