
arXiv preprint arXiv:2507.03745 (2025)

Akio Kodaira, Tingbo Hou, Ji Hou, Markos Georgopoulos, Felix Juefei-Xu, Masayoshi Tomizuka, Yue Zhao · 2025 · arXiv 2507.03745

10 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

StreamingEffect: Real-Time Human-Centric Video Effect Generation

cs.CV · 2026-05-16 · unverdicted · novelty 7.0

StreamingEffect enables real-time 720p human-centric video effect generation on one GPU via teacher-student distillation, keyframe control, and a new 130K video dataset.

OphEdit: Training-Free Text-Guided Editing of Ophthalmic Surgical Videos

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

OphEdit enables text-guided editing of eye surgery videos without training by injecting preserved attention value tensors into the diffusion denoising process to maintain anatomical structure.

Sparse Forcing: Native Trainable Sparse Attention for Real-time Autoregressive Diffusion Video Generation

cs.CV · 2026-04-23 · unverdicted · novelty 7.0

Sparse Forcing adds a native trainable sparsity mechanism and PBSA kernel to autoregressive diffusion video models, yielding higher VBench scores and 1.1-1.27x speedups on 5s to 1min generations.

Training Agents Inside of Scalable World Models

cs.AI · 2025-09-29 · conditional · novelty 7.0

Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.

Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation

cs.CV · 2026-04-11 · conditional · novelty 6.0

Hybrid Forcing combines linear temporal attention for long-range retention, block-sparse attention for efficiency, and decoupled distillation to achieve real-time unbounded 832x480 streaming video generation at 29.5 FPS.

Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

cs.CV · 2025-12-04 · conditional · novelty 6.0

Live Avatar enables 45 FPS real-time streaming infinite-length audio-driven avatar generation from a 14B diffusion model via distillation and timestep-forcing pipeline parallelism.

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation

cs.CV · 2025-10-02 · conditional · novelty 6.0

Self-Forcing++ scales autoregressive video diffusion to over 4 minutes by using self-generated segments for guidance, reducing error accumulation and outperforming baselines in fidelity and consistency.

Rolling Forcing: Autoregressive Long Video Diffusion in Real Time

cs.CV · 2025-09-29 · unverdicted · novelty 6.0

Rolling Forcing generates multi-minute videos in real time by jointly denoising frames at increasing noise levels, anchoring attention to early frames, and using windowed distillation to limit error accumulation.

LongLive: Real-time Interactive Long Video Generation

cs.CV · 2025-09-26 · conditional · novelty 6.0

LongLive is a causal autoregressive video generator that produces up to 240-second interactive videos at 20.7 FPS on one H100 GPU after 32 GPU-days of fine-tuning from a 1.3B short-clip model.

Reward-Forcing: Autoregressive Video Generation with Reward Feedback

cs.CV · 2026-01-23 · unverdicted · novelty 5.0

Reward-Forcing guides autoregressive video generation with reward feedback to achieve performance comparable to teacher-dependent methods on benchmarks like VBench without relying on distillation.

citing papers explorer

Showing 10 of 10 citing papers.

StreamingEffect: Real-Time Human-Centric Video Effect Generation
cs.CV · 2026-05-16 · unverdicted · none · ref 28

OphEdit: Training-Free Text-Guided Editing of Ophthalmic Surgical Videos
cs.CV · 2026-05-08 · unverdicted · none · ref 5

Sparse Forcing: Native Trainable Sparse Attention for Real-time Autoregressive Diffusion Video Generation
cs.CV · 2026-04-23 · unverdicted · none · ref 10

Training Agents Inside of Scalable World Models
cs.AI · 2025-09-29 · conditional · none · ref 76

Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation
cs.CV · 2026-04-11 · conditional · none · ref 20

Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
cs.CV · 2025-12-04 · conditional · none · ref 21

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
cs.CV · 2025-10-02 · conditional · none · ref 28

Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
cs.CV · 2025-09-29 · unverdicted · none · ref 74

LongLive: Real-time Interactive Long Video Generation
cs.CV · 2025-09-26 · conditional · none · ref 19

Reward-Forcing: Autoregressive Video Generation with Reward Feedback
cs.CV · 2026-01-23 · unverdicted · none · ref 10
