LUVE : Latent-Cascaded Ultra-High-Resolution Video Generation with Dual Frequency Experts

Chen Zhao; Hongyu Li; Jian Yang; Jiawei Chen; Kai Zhang; Shilin Lu; Xiaoming Wei; Ying Tai; Zhuoliang Kang

arxiv: 2602.11564 · v2 · pith:SQQI7IUBnew · submitted 2026-02-12 · 💻 cs.CV

LUVE : Latent-Cascaded Ultra-High-Resolution Video Generation with Dual Frequency Experts

Chen Zhao , Jiawei Chen , Hongyu Li , Zhuoliang Kang , Shilin Lu , Xiaoming Wei , Kai Zhang , Jian Yang

show 1 more author

Ying Tai

This is my paper

classification 💻 cs.CV

keywords generationluvetextbfvideolatentcontentdetaildual

0 comments

read the original abstract

Recent advances in video diffusion models have significantly improved visual quality, yet ultra-high-resolution (UHR) video generation remains a formidable challenge due to the compounded difficulties of motion modeling, semantic planning, and detail synthesis. To address these limitations, we propose \textbf{LUVE}, a \textbf{L}atent-cascaded \textbf{U}HR \textbf{V}ideo generation framework built upon dual frequency \textbf{E}xperts. LUVE employs a three-stage architecture comprising low-resolution motion generation for motion-consistent latent synthesis, video latent upsampling that performs resolution upsampling directly in the latent space to mitigate memory and computational overhead, and high-resolution content refinement that integrates low-frequency and high-frequency experts to jointly enhance semantic coherence and fine-grained detail generation. Extensive experiments demonstrate that our LUVE achieves superior photorealism and content fidelity in UHR video generation, and comprehensive ablation studies further validate the effectiveness of each component. The project is available at \href{https://unicornanrocinu.github.io/LUVE_web/}{https://github.io/LUVE/}.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

GS-STVSR: Ultra-Efficient Continuous Spatio-Temporal Video Super-Resolution via 2D Gaussian Splatting
cs.CV 2026-04 unverdicted novelty 7.0

GS-STVSR achieves state-of-the-art continuous spatio-temporal video super-resolution quality with nearly constant inference time at standard scales and over 3x speedup at extreme scales using 2D Gaussian Splatting.
From Zero to Detail: A Progressive Spectral Decoupling Paradigm for UHD Image Restoration with New Benchmark
cs.CV 2026-04 unverdicted novelty 7.0

A new framework called ERR decomposes UHD image restoration into three frequency stages with specialized sub-networks and introduces the LSUHDIR benchmark dataset of over 82,000 images.
RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details
cs.CV 2026-04 unverdicted novelty 7.0

RefineAnything is a multimodal diffusion model using Focus-and-Refine crop-and-resize with blended paste-back to achieve high-fidelity local image refinement and near-perfect background preservation.
AtlasVid: Efficient Ultra-High-Resolution Long Video Generation via Decoupled Global-Local Modeling
cs.CV 2026-05 unverdicted novelty 6.0

AtlasVid proposes a decoupled global-local diffusion framework that trains at low resolution with LoRA and generalizes to ultra-high-resolution long video synthesis via semantic proxy guidance and locality-preserving ...
Why Do DiT Editors Drift? Plug-and-Play Low Frequency Alignment in VAE Latent Space
cs.CV 2026-05 unverdicted novelty 4.0

VAE-LFA suppresses semantic drift in multi-turn DiT image editing by low-pass filtering latent discrepancies and aligning low-frequency components to an EMA of previous rounds in VAE space.