Vbench: Comprehensive benchmark suite for video generative models

Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, et al · 2024

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

browse 8 citing papers

citation-role summary

dataset 4

citation-polarity summary

use dataset 4

representative citing papers

Do generative video models understand physical principles?

cs.CV · 2025-01-14 · unverdicted · novelty 8.0

Physics-IQ benchmark reveals that generative video models exhibit limited physical understanding unrelated to their visual quality.

Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation

cs.CV · 2026-05-05 · unverdicted · novelty 7.0

Stream-R1 improves distillation of autoregressive streaming video diffusion models by adaptively weighting supervision with a reward model at both rollout and per-pixel levels.

Lance: Unified Multimodal Modeling by Multi-Task Synergy

cs.CV · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

Lance presents a dual-stream mixture-of-experts model with modality-aware positional encoding and staged multi-task training that outperforms prior open-source unified models on image and video generation while keeping strong understanding performance.

Stream-T1: Test-Time Scaling for Streaming Video Generation

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

Stream-T1 is a test-time scaling framework for streaming video generation using scaled noise propagation from history, reward pruning across short and long windows, and feedback-guided memory sinking to improve temporal consistency and visual quality.

Motion-Aware Caching for Efficient Autoregressive Video Generation

cs.CV · 2026-05-03 · conditional · novelty 6.0 · 2 refs

MotionCache accelerates autoregressive video generation up to 6.28x by motion-weighted cache reuse based on inter-frame differences, with negligible quality loss on SkyReels-V2 and MAGI-1.

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation

cs.CV · 2025-10-02 · conditional · novelty 6.0

Self-Forcing++ scales autoregressive video diffusion to over 4 minutes by using self-generated segments for guidance, reducing error accumulation and outperforming baselines in fidelity and consistency.

Emu3: Next-Token Prediction is All You Need

cs.CV · 2024-09-27 · unverdicted · novelty 6.0

Emu3 shows that next-token prediction on a unified discrete token space for text, images, and video lets a single transformer outperform task-specific models such as SDXL and LLaVA-1.6 in multimodal generation and perception.

Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization

cs.CV · 2026-05-15

citing papers explorer

Showing 8 of 8 citing papers.

Do generative video models understand physical principles? cs.CV · 2025-01-14 · unverdicted · none · ref 47
Physics-IQ benchmark reveals that generative video models exhibit limited physical understanding unrelated to their visual quality.
Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation cs.CV · 2026-05-05 · unverdicted · none · ref 14
Stream-R1 improves distillation of autoregressive streaming video diffusion models by adaptively weighting supervision with a reward model at both rollout and per-pixel levels.
Lance: Unified Multimodal Modeling by Multi-Task Synergy cs.CV · 2026-05-18 · unverdicted · none · ref 46 · 2 links
Lance presents a dual-stream mixture-of-experts model with modality-aware positional encoding and staged multi-task training that outperforms prior open-source unified models on image and video generation while keeping strong understanding performance.
Stream-T1: Test-Time Scaling for Streaming Video Generation cs.CV · 2026-05-06 · unverdicted · none · ref 11
Stream-T1 is a test-time scaling framework for streaming video generation using scaled noise propagation from history, reward pruning across short and long windows, and feedback-guided memory sinking to improve temporal consistency and visual quality.
Motion-Aware Caching for Efficient Autoregressive Video Generation cs.CV · 2026-05-03 · conditional · none · ref 14 · 2 links
MotionCache accelerates autoregressive video generation up to 6.28x by motion-weighted cache reuse based on inter-frame differences, with negligible quality loss on SkyReels-V2 and MAGI-1.
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation cs.CV · 2025-10-02 · conditional · none · ref 22
Self-Forcing++ scales autoregressive video diffusion to over 4 minutes by using self-generated segments for guidance, reducing error accumulation and outperforming baselines in fidelity and consistency.
Emu3: Next-Token Prediction is All You Need cs.CV · 2024-09-27 · unverdicted · none · ref 33
Emu3 shows that next-token prediction on a unified discrete token space for text, images, and video lets a single transformer outperform task-specific models such as SDXL and LLaVA-1.6 in multimodal generation and perception.
Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization cs.CV · 2026-05-15 · unreviewed · ref 8

Vbench: Comprehensive benchmark suite for video generative models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer