Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment

· 2025 · cs.CV · arXiv 2505.18600

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open full Pith review browse 2 citing papers arXiv PDF

abstract

Modern single-image super-resolution (SISR) models deliver photo-realistic results at the scale factors on which they are trained, but collapse when asked to magnify far beyond that regime. We address this scalability bottleneck with Chain-of-Zoom (CoZ), a model-agnostic framework that factorizes SISR into an autoregressive chain of intermediate scale-states with multi-scale-aware prompts. CoZ repeatedly re-uses a backbone SR model, decomposing the conditional probability into tractable sub-problems to achieve extreme resolutions without additional training. Because visual cues diminish at high magnifications, we augment each zoom step with multi-scale-aware text prompts generated by a vision-language model (VLM). The prompt extractor itself is fine-tuned using Generalized Reward Policy Optimization (GRPO) with a critic VLM, aligning text guidance towards human preference. Experiments show that a standard 4x diffusion SR model wrapped in CoZ attains beyond 256x enlargement with high perceptual quality and fidelity. Project Page: https://bryanswkim.github.io/chain-of-zoom/.

representative citing papers

Tiled Prompts: Overcoming Prompt Misguidance in Image and Video Super-Resolution

cs.CV · 2026-02-03 · unverdicted · novelty 7.0

Tiled Prompts generates tile-specific text prompts for each latent tile in diffusion super-resolution to reduce errors from global prompts and improve perceptual quality.

GaussianZoom: Progressive Zoom-in Generative 3D Gaussian Splatting with Geometric and Semantic Guidance

cs.CV · 2026-05-18 · unverdicted · novelty 5.0

GaussianZoom enables high-fidelity extreme zoom-in 3D rendering from low-res inputs via an iterative framework combining geometry-consistent modeling, depth-based super-resolution, VLM detail synthesis, and an expandable continuous Level-of-Detail hierarchy.

citing papers explorer

Showing 2 of 2 citing papers.

Tiled Prompts: Overcoming Prompt Misguidance in Image and Video Super-Resolution cs.CV · 2026-02-03 · unverdicted · none · ref 22 · internal anchor
Tiled Prompts generates tile-specific text prompts for each latent tile in diffusion super-resolution to reduce errors from global prompts and improve perceptual quality.
GaussianZoom: Progressive Zoom-in Generative 3D Gaussian Splatting with Geometric and Semantic Guidance cs.CV · 2026-05-18 · unverdicted · none · ref 12 · internal anchor
GaussianZoom enables high-fidelity extreme zoom-in 3D rendering from low-res inputs via an iterative framework combining geometry-consistent modeling, depth-based super-resolution, VLM detail synthesis, and an expandable continuous Level-of-Detail hierarchy.

Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment

fields

years

verdicts

representative citing papers

citing papers explorer