Mobile Video Diffusion

Amir Ghodrati; Amirhossein Habibian; Denis Korzhenkov; Haitam Ben Yahia; Ioannis Lelekas

arxiv: 2412.07583 · v1 · pith:XBVWV4ECnew · submitted 2024-12-10 · 💻 cs.CV · cs.AI

Haitam Ben Yahia, Denis Korzhenkov, Ioannis Lelekas, Amir Ghodrati, Amirhossein Habibian

keywords: diffusion, video, reduce, computational, mobile, model, temporal, achieved

Video diffusion models have achieved impressive realism and controllability but are limited by high computational demands, restricting their use on mobile devices. This paper introduces the first mobile-optimized video diffusion model. Starting from the spatio-temporal UNet of Stable Video Diffusion (SVD), we reduce memory and computational cost by lowering the frame resolution, incorporating multi-scale temporal representations, and introducing two novel pruning schemes that reduce the number of channels and temporal blocks. Furthermore, we employ adversarial finetuning to reduce the denoising to a single step. Our model, dubbed MobileVD, is 523x more efficient (1817.2 vs. 4.34 TFLOPs) with a slight quality drop (FVD 149 vs. 171), generating latents for a 14x512x256 px clip in 1.7 seconds on a Xiaomi-14 Pro. Our results are available at https://qualcomm-ai-research.github.io/mobile-video-diffusion/
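To give a rough sense of what "generating latents" for a 14x512x256 px clip involves, the sketch below computes a plausible latent tensor shape. It assumes an 8x spatial downsampling factor and 4 latent channels, as is typical for Stable Diffusion-style autoencoders; neither value is stated in the abstract, and the function name `latent_shape` is hypothetical. The abstract also does not say which of 512 and 256 is the width, so the assignment below is arbitrary.

```python
def latent_shape(frames, height, width, downsample=8, channels=4):
    """Return (frames, channels, latent_height, latent_width).

    downsample=8 and channels=4 are assumptions (typical for Stable
    Diffusion-style VAEs); they are not taken from the abstract.
    """
    if height % downsample or width % downsample:
        raise ValueError("spatial dimensions must be divisible by the downsample factor")
    return (frames, channels, height // downsample, width // downsample)


# 14 frames; treating 256 as height and 512 as width is an arbitrary choice.
shape = latent_shape(14, 256, 512)

# Total number of latent values the denoiser would produce under these assumptions.
num_values = 1
for dim in shape:
    num_values *= dim

print(shape, num_values)
```

Under these assumptions the denoiser outputs a small tensor compared with the pixel-space clip, which is one reason latent diffusion is attractive for on-device generation.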

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SGMD: Score Gradient Matching Distillation for Few-Step Video Diffusion Distillation
cs.CV 2026-05 unverdicted novelty 6.0

SGMD uses fake-score optimization toward the teacher with stop-gradient Fisher objective and NR/RC dual potentials to deliver ~3x training speedup and better motion dynamics in 4-step video diffusion models.