InfVSR: Breaking length limits of generic video super-resolution

Ziqing Zhang, Kai Liu, Zheng Chen, Xi Li, Yucong Chen, Bingnan Duan, Linghe Kong, Yulun Zhang · 2025 · cs.CV · arXiv 2510.00948

5 Pith papers cite this work. Polarity classification is still indexing.

abstract

Real-world videos often extend over thousands of frames. Existing generative video super-resolution (VSR) approaches, however, face two persistent challenges when processing long sequences: (1) inefficiency, due to the heavy cost of multi-step denoising over full-length sequences; and (2) poor consistency, because temporal decomposition causes artifacts and discontinuities. To break these limits, we propose InfVSR, which reformulates VSR as an autoregressive-one-step-diffusion paradigm and enables streaming inference with video diffusion priors. First, we adapt the pretrained DiT into a causal structure, maintaining both local and global coherence via rolling KV-cache and joint visual guidance. Second, we efficiently distill the diffusion process into a single step, with patch-wise pixel supervision and cross-chunk distribution matching. To fill the gap in long-form video evaluation, we build a new benchmark tailored for extended sequences and further introduce semantic-level metrics to comprehensively assess temporal consistency. Our method pushes the frontier of long-form VSR, achieves state-of-the-art quality with enhanced semantic consistency, and delivers up to 58x speed-up over existing methods such as MGLD-VSR. Our code and models are available at https://github.com/Kai-Liu001/InfVSR.

citation-role summary

other 1

citation-polarity summary

unclear 1

representative citing papers

GS-STVSR: Ultra-Efficient Continuous Spatio-Temporal Video Super-Resolution via 2D Gaussian Splatting

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

GS-STVSR achieves state-of-the-art continuous spatio-temporal video super-resolution quality with nearly constant inference time at standard scales and over 3x speedup at extreme scales using 2D Gaussian Splatting.

Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

cs.CV · 2025-12-29 · conditional · novelty 7.0

Stream-DiffVSR enables practical low-latency video super-resolution by combining a four-step distilled denoiser, auto-regressive temporal guidance, and a temporal processor in a strictly causal pipeline.

DiffST: Spatiotemporal-Aware Diffusion for Real-World Space-Time Video Super-Resolution

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

DiffST delivers state-of-the-art real-world space-time video super-resolution with 17x faster inference than prior diffusion methods by using one-step sampling, cross-frame context aggregation, and video representation guidance.

DVFace: Spatio-Temporal Dual-Prior Diffusion for Video Face Restoration

cs.CV · 2026-04-16 · unverdicted · novelty 6.0

DVFace uses a spatio-temporal dual-codebook and asymmetric fusion in a one-step diffusion model to deliver better video face restoration quality, temporal consistency, and identity preservation than recent methods.

TIGER: Taming Identity, Geometry, and Generative Priors for High-Quality Face Video Restoration

cs.CV · 2026-06-23 · unverdicted · novelty 5.0

TIGER is a tri-prior fusion method for face video restoration using identity, geometry, and generative priors with progressive training to achieve SOTA identity fidelity and temporal stability on a new large-scale dataset.

citing papers explorer

Showing 4 of 4 citing papers after filters.

GS-STVSR: Ultra-Efficient Continuous Spatio-Temporal Video Super-Resolution via 2D Gaussian Splatting

cs.CV · 2026-04-20 · unverdicted · none · ref 159 · internal anchor

DiffST: Spatiotemporal-Aware Diffusion for Real-World Space-Time Video Super-Resolution

cs.CV · 2026-05-13 · unverdicted · none · ref 61 · internal anchor

DVFace: Spatio-Temporal Dual-Prior Diffusion for Video Face Restoration

cs.CV · 2026-04-16 · unverdicted · none · ref 58 · internal anchor

TIGER: Taming Identity, Geometry, and Generative Priors for High-Quality Face Video Restoration

cs.CV · 2026-06-23 · unverdicted · none · ref 28 · internal anchor
