LongLive-2.0 delivers an NVFP4 parallel infrastructure that enables direct training of long multi-shot autoregressive diffusion video models and achieves up to 2.15x training and 1.84x inference speedups on Blackwell and other GPUs.
hub Canonical reference
Autoregressive adversarial post- training for real-time interactive video generation
Canonical reference. 100% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
roles
background 5polarities
background 5representative citing papers
Stream-R1 improves distillation of autoregressive streaming video diffusion models by adaptively weighting supervision with a reward model at both rollout and per-pixel levels.
Continuous adversarial flow models replace MSE in flow matching with adversarial training via a discriminator, improving guidance-free FID on ImageNet from 8.26 to 3.63 for SiT and similar gains for JiT and text-to-image benchmarks.
Salt improves low-step video generation quality by adding endpoint-consistent regularization to distribution matching distillation and using cache-conditioned feature alignment for autoregressive models.
Rolling Sink is a training-free cache adjustment technique that maintains visual consistency in autoregressive video diffusion models for ultra-long open-ended generation beyond training horizons.
Reward Forcing combines EMA-Sink tokens and Rewarded Distribution Matching Distillation to deliver state-of-the-art streaming video generation at 23.1 FPS without copying initial frames.
Self-Forcing++ scales autoregressive video diffusion to over 4 minutes by using self-generated segments for guidance, reducing error accumulation and outperforming baselines in fidelity and consistency.
Rolling Forcing generates multi-minute videos in real time by jointly denoising frames at increasing noise levels, anchoring attention to early frames, and using windowed distillation to limit error accumulation.
TurboTalk uses progressive distillation from 4 steps to 1 step with distribution matching and adversarial training to achieve 120x faster single-step audio-driven talking avatar video generation.
Matrix-Game 2.0 introduces a scalable data pipeline, action-injection module, and few-step distillation to enable real-time streaming video generation at 25 FPS from game-engine interactions, with open-sourced weights and code.
LingBot-World is presented as an open-source world model that delivers high-fidelity simulation, minute-level contextual consistency, and real-time interactivity under one second latency.
This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.
citing papers explorer
-
Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation
Stream-R1 improves distillation of autoregressive streaming video diffusion models by adaptively weighting supervision with a reward model at both rollout and per-pixel levels.
-
Continuous Adversarial Flow Models
Continuous adversarial flow models replace MSE in flow matching with adversarial training via a discriminator, improving guidance-free FID on ImageNet from 8.26 to 3.63 for SiT and similar gains for JiT and text-to-image benchmarks.
-
Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation
Salt improves low-step video generation quality by adding endpoint-consistent regularization to distribution matching distillation and using cache-conditioned feature alignment for autoregressive models.
-
Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion
Rolling Sink is a training-free cache adjustment technique that maintains visual consistency in autoregressive video diffusion models for ultra-long open-ended generation beyond training horizons.
-
Evolution of Video Generative Foundations
This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.