
arxiv: 2605.18252 · v1 · pith:F6EIV2ETnew · submitted 2026-05-18 · 💻 cs.CV

GaussianZoom: Progressive Zoom-in Generative 3D Gaussian Splatting with Geometric and Semantic Guidance

Pith reviewed 2026-05-20 10:42 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D Gaussian Splatting, Zoom-in Reconstruction, Super-Resolution, Level-of-Detail, Generative 3D Modeling, Multi-view Consistency, Semantic Guidance

The pith

GaussianZoom enables high-fidelity extreme zoom-in rendering of 3D scenes from low-resolution inputs using progressive Gaussian splatting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GaussianZoom as a system for generating detailed close-up views of 3D scenes that exceed the resolution of the original input images. It builds an iterative process that refines the scene model step by step, combining geometric consistency with semantic understanding to add plausible fine details. A dedicated super-resolution step uses depth information to align features across views and vision-language guidance to synthesize new textures. A continuous level-of-detail structure keeps the representation efficient and smooth as the magnification increases. If the approach holds, it would let users explore reconstructed environments at arbitrary scales without needing higher-resolution source data.
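As a rough illustration of what refining "step by step" could look like, the sketch below lists the intermediate magnifications visited on the way to a target zoom. The function name `zoom_schedule` and the default per-step factor of 2 are assumptions made for illustration; this page does not state the schedule the paper actually uses.

```python
def zoom_schedule(target, step=2.0):
    """Return the magnifications visited when zooming from 1x up to `target`.

    Hypothetical helper: each stage multiplies the previous magnification
    by `step`, and the final stage lands exactly on `target`.
    """
    if target < 1.0 or step <= 1.0:
        raise ValueError("target must be >= 1 and step must be > 1")
    levels = []
    current = 1.0
    # Add full steps while they stay strictly below the target.
    while current * step < target:
        current *= step
        levels.append(current)
    # Finish on the requested magnification (nothing to do at 1x).
    if target > 1.0:
        levels.append(float(target))
    return levels
```

Under these assumptions, a 16x target with the default step passes through 2x, 4x and 8x before the final stage, so each refinement only has to bridge a modest resolution gap.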

Core claim

GaussianZoom is an iterative progressive framework for generative zoom-in 3D reconstruction that integrates geometry-consistent scene modeling and multi-scale semantic reasoning. It introduces a multi-view consistent super-resolution module that applies depth-based feature warping and VLM-driven detail synthesis to enrich appearance beyond the observed resolution while preserving correspondence. An expandable continuous Level-of-Detail hierarchy dynamically adjusts Gaussian visibility to support alias-free rendering across large magnification ranges. On Mip-NeRF360 and Tanks&Temples, the method reports better perceptual quality, multi-view consistency, and stability under extreme zoom.
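One way to picture a continuous Level-of-Detail hierarchy is a per-Gaussian weight that fades in smoothly as the magnification approaches the level at which that Gaussian was added. The sketch below is an assumption-laden illustration, not the paper's formulation: the names `lod_visibility`, `gaussian_level` and `fade_width`, the log2 zoom scale, and the smoothstep blend are all hypothetical choices.

```python
import math


def smoothstep(t):
    """Clamp t to [0, 1] and apply the cubic smoothstep 3t^2 - 2t^3."""
    t = min(1.0, max(0.0, t))
    return t * t * (3.0 - 2.0 * t)


def lod_visibility(gaussian_level, magnification, fade_width=1.0):
    """Visibility weight in [0, 1] for a Gaussian assigned to `gaussian_level`.

    Assumption: levels live on a log2 scale, so level 0 is the coarse model
    and level 3 corresponds to 8x magnification. A Gaussian starts fading in
    `fade_width` levels before its own level and is fully visible at it.
    """
    zoom_level = math.log2(magnification)
    return smoothstep((zoom_level - gaussian_level) / fade_width + 1.0)
```

Because the weight changes continuously with magnification, fine Gaussians never switch on abruptly, which is the kind of behaviour that would avoid popping during a continuous zoom.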

What carries the argument

Two components carry the argument. The first is the multi-view consistent super-resolution module, which uses depth-based feature warping and VLM-driven detail synthesis to enrich fine-scale appearance while keeping cross-view alignment. The second is the expandable continuous Level-of-Detail hierarchy, which modulates Gaussian visibility for smooth scaling.
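Depth-based warping, in its generic textbook form, back-projects a pixel with its depth into 3D, moves the point into another camera's frame, and projects it again. The sketch below shows that geometry for a single pixel. It assumes a pinhole camera whose intrinsics are shared by both views, and the function name `warp_pixel` and its argument layout are hypothetical; the paper's module warps features, which would be sampled at the coordinates computed this way.

```python
def warp_pixel(u, v, depth, intrinsics, rotation, translation):
    """Map pixel (u, v) with known depth from a source view into a target view.

    intrinsics: (fx, fy, cx, cy) of a pinhole camera shared by both views
        (an assumption of this sketch).
    rotation, translation: source-to-target rigid transform, given as a
        3x3 nested list and a 3-vector.
    Returns (u', v') in the target image, or None if the point falls
    behind the target camera.
    """
    fx, fy, cx, cy = intrinsics
    # Back-project the pixel into the source camera frame.
    point = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Move the 3D point into the target camera frame.
    x, y, z = (
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if z <= 0.0:
        return None
    # Project onto the target image plane.
    return (fx * x / z + cx, fy * y / z + cy)
```

With an identity transform the pixel maps to itself, and a sideways camera shift moves it by an amount inversely proportional to depth, which is why accurate depth matters for correspondence.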

If this is right

  • Achieves higher perceptual quality in zoomed renderings compared with prior 3D Gaussian methods.
  • Preserves multi-view consistency even when magnification exceeds the input resolution by large factors.
  • Remains stable without aliasing or popping when the viewer moves continuously across wide scale ranges.
  • Provides a working baseline that later methods for generative zoom-in reconstruction can improve upon.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same progressive refinement loop could be adapted to add user-specified details or correct specific regions after initial reconstruction.
  • Combining the level-of-detail hierarchy with real-time rendering engines might support interactive exploration in virtual environments from casual photo sets.
  • Extending the semantic guidance to handle time-varying scenes could open applications in video-based 3D zoom-in.

Load-bearing premise

The multi-view consistent super-resolution module with depth-based feature warping and VLM-driven detail synthesis can accurately enrich fine-scale appearance beyond the observed resolution while maintaining multi-view correspondence.

What would settle it

Ground-truth high-resolution images captured at extreme magnification on the same scenes. If the synthesized details misalign across views, or introduce visible artifacts absent from the real data, the claim fails.
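If such ground-truth captures existed, one simple first check would be a pixel-wise comparison against the zoomed rendering. The sketch below computes PSNR, one of the metrics the rebuttal names, for two grayscale images stored as nested lists. The function name `psnr` and the list-based image format are assumptions for illustration, and PSNR alone does not measure cross-view alignment.

```python
import math


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images.

    Images are nested lists of floats in [0, max_value] (an assumption of
    this sketch). Identical images yield infinity.
    """
    same_rows = len(rendered) == len(reference)
    if not same_rows or any(len(a) != len(b) for a, b in zip(rendered, reference)):
        raise ValueError("images must have the same shape")
    count = sum(len(row) for row in rendered)
    if count == 0:
        raise ValueError("images must not be empty")
    squared_error = sum(
        (a - b) ** 2
        for row_a, row_b in zip(rendered, reference)
        for a, b in zip(row_a, row_b)
    )
    mse = squared_error / count
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A uniform error of 0.1 on a unit-range image gives a mean squared error of 0.01 and therefore about 20 dB, which offers a feel for the scale of the numbers in such comparisons.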

Figures

Figures reproduced from arXiv: 2605.18252 by Hujun Bao, Jiale Shi, Jiarui Hu, Kaixuan Luan, Zesong Yang, Zhaopeng Cui.

Figure 1: GaussianZoom progressively magnifies 3D scenes from low-resolution inputs, reconstructing them into multi-view consistent and detail-rich representations. The expandable continuous Level-of-Detail hierarchy organizes primitives across scales, enabling smooth and alias-free rendering throughout the zoom-in process. Please refer to the supp. material for more vivid video demonstrations.
Figure 2: Comparison between flow-based and depth-based warp…
Figure 3: Method overview. Our framework jointly leverages geometry-aware alignment, semantic priors, and a continuous Level-of-Detail (LoD) representation to perform generative zoom-in reconstruction. Starting from a coarse 3D Gaussian Splatting model, we derive per-view depth maps that enable depth-based feature warping, providing accurate multi-view correspondence. In parallel, coarse and zoomed-in renderings are…
Figure 4: Qualitative comparison of 4× super-resolution results. Mip-Splatting reduces aliasing but lacks fine details; SuperGaussian, SRGS and Sequence Matters produce blurry textures; our method reconstructs sharper textures, cleaner edges, and more coherent structures across views, closely approaching the ground truth.
Adjacent table (truncated in the source): methods, starting with 3DGS [10], compared on Mip-NeRF360 and Tanks&Temples by PSNR↑, SSIM↑, LPIPS↓ and FID↓.
Figure 5: Qualitative comparison under extreme zoom-in across multiple focal levels and viewpoints. Competing methods exhibit blurry, …
Figure 6: Effectiveness of VLM guidance in detail synthesis. With…
Figure 7: Effectiveness of continuous LoD. Without LoD, opti…
Original abstract

We introduce GaussianZoom, a generative zoom-in 3D reconstruction system with an iterative progressive framework that combines geometry-consistent scene modeling and multi-scale semantic reasoning to enable high-fidelity extreme zoom-in rendering from low-resolution inputs. To achieve this, we develop a novel multi-view consistent super-resolution module with depth-based feature warping and VLM-driven detail synthesis, ensuring accurate multi-view correspondence while enriching fine-scale appearance beyond the observed resolution. To support zooming across large magnification ranges, we further introduce a new expandable continuous Level-of-Detail hierarchy that dynamically modulates Gaussian visibility for smooth, alias-free cross-scale rendering. Experiments on Mip-NeRF360 and Tanks&Temples demonstrate that GaussianZoom achieves superior perceptual quality, multi-view consistency, and robustness under extreme magnification, establishing a strong baseline for generative zoom-in 3D scene reconstruction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces GaussianZoom, a progressive zoom-in generative 3D Gaussian Splatting framework that combines geometry-consistent scene modeling with multi-scale semantic reasoning. It proposes a multi-view consistent super-resolution module using depth-based feature warping and VLM-driven detail synthesis to enrich fine-scale appearance, plus an expandable continuous Level-of-Detail hierarchy for alias-free cross-scale rendering. Experiments on Mip-NeRF360 and Tanks&Temples are claimed to show superior perceptual quality, multi-view consistency, and robustness under extreme magnification.

Significance. If the central claims hold, the work would provide a useful baseline for generative zoom-in 3D reconstruction, particularly by integrating geometric guidance with VLM-based semantic detail synthesis and addressing multi-scale rendering via the proposed LoD hierarchy. This could advance applications in high-fidelity rendering from low-resolution inputs where extreme magnification is required.

major comments (2)
  1. [Abstract] Abstract: The manuscript asserts superior performance on Mip-NeRF360 and Tanks&Temples benchmarks with respect to perceptual quality, multi-view consistency, and robustness under extreme magnification, yet supplies no quantitative results, error bars, ablation studies, or specific metrics in the provided text. This absence directly weakens evaluation of the load-bearing claims.
  2. [Multi-view consistent super-resolution module] Multi-view consistent super-resolution module: The VLM-driven detail synthesis is presented as the mechanism for enriching fine-scale appearance beyond observed resolution while preserving correspondence via depth-based feature warping. However, depth warping supplies only coarse alignment and does not address potential semantic or textural hallucinations produced by VLMs at 8-16x zoom factors; without explicit consistency metrics or ground-truth high-frequency validation, this undermines the multi-view consistency and robustness assertions.
minor comments (2)
  1. The description of the expandable continuous Level-of-Detail hierarchy would benefit from a clearer statement of how Gaussian visibility is modulated and any associated computational overhead.
  2. Notation for the progressive iterative framework could be standardized earlier to improve readability of the geometric and semantic guidance components.
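To give a sense of scale for the referee's concern about 8-16x zoom factors: at a uniform zoom factor z, each input pixel spans z × z output pixels, so only about 1/z² of the output samples correspond directly to an observed value. The short sketch below spells out that arithmetic; the helper name `observed_fraction` is hypothetical and the calculation ignores any overlap between input views.

```python
def observed_fraction(zoom):
    """Fraction of output pixels backed by one input sample at a uniform zoom.

    Simplifying assumption: a single input image and square pixels, so one
    input pixel covers zoom * zoom output pixels.
    """
    if zoom < 1.0:
        raise ValueError("zoom must be >= 1")
    return 1.0 / (zoom * zoom)
```

Under this simplification the fraction is 1/64 at 8x and 1/256 at 16x, so the large majority of the rendered detail at those factors has to be synthesized rather than observed.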

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, providing clarifications from the full manuscript and indicating revisions where they strengthen the presentation of results and technical details.

  1. Referee: [Abstract] Abstract: The manuscript asserts superior performance on Mip-NeRF360 and Tanks&Temples benchmarks with respect to perceptual quality, multi-view consistency, and robustness under extreme magnification, yet supplies no quantitative results, error bars, ablation studies, or specific metrics in the provided text. This absence directly weakens evaluation of the load-bearing claims.

    Authors: The abstract serves as a concise summary and conventionally omits specific numerical values, which are instead reported in full in the Experiments section. There we present quantitative comparisons on Mip-NeRF360 and Tanks&Temples using standard metrics (PSNR, SSIM, LPIPS) for perceptual quality, dedicated multi-view consistency scores, and robustness measures under extreme magnification, together with ablation studies and error bars on the reported plots. To make the abstract claims more self-contained, we will revise it to briefly reference these quantitative evaluations and direct readers to the detailed tables and figures. revision: yes

  2. Referee: [Multi-view consistent super-resolution module] Multi-view consistent super-resolution module: The VLM-driven detail synthesis is presented as the mechanism for enriching fine-scale appearance beyond observed resolution while preserving correspondence via depth-based feature warping. However, depth warping supplies only coarse alignment and does not address potential semantic or textural hallucinations produced by VLMs at 8-16x zoom factors; without explicit consistency metrics or ground-truth high-frequency validation, this undermines the multi-view consistency and robustness assertions.

    Authors: We agree that depth-based warping alone supplies only coarse geometric alignment. Multi-view consistency in our framework is additionally enforced by the geometry-consistent scene modeling, iterative progressive optimization across views, and the continuous LOD hierarchy. We evaluate this using explicit multi-view consistency metrics (cross-view feature similarity and perceptual consistency scores) reported in the experiments. The VLM synthesis is constrained by both geometric and semantic guidance to limit hallucinations. We acknowledge that ground-truth high-frequency references at 8-16x magnification are unavailable in the benchmarks, making direct validation difficult; we therefore rely on perceptual user studies and cross-method comparisons. We will expand the manuscript with a dedicated paragraph on the consistency metrics and a limitations discussion of potential VLM-induced artifacts. revision: partial
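The rebuttal mentions cross-view feature similarity as a consistency metric without defining it on this page. One plausible reading, sketched below purely as an assumption, is the mean cosine similarity between feature vectors sampled at corresponding locations in two views. The names `cosine_similarity` and `cross_view_consistency` are hypothetical, and the matching of locations (for example by depth-based warping) is taken as given.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cross_view_consistency(features_a, features_b):
    """Mean cosine similarity over already-matched feature pairs from two views.

    features_a[i] and features_b[i] are assumed to describe the same scene
    point as seen from view A and view B.
    """
    if len(features_a) != len(features_b) or not features_a:
        raise ValueError("need two non-empty lists of matched features")
    total = sum(cosine_similarity(a, b) for a, b in zip(features_a, features_b))
    return total / len(features_a)
```

A score near 1 would indicate that synthesized detail looks alike from both viewpoints, while hallucinated textures that differ between views would pull the score down.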

Circularity Check

0 steps flagged

No significant circularity detected


The paper presents a novel technical framework combining progressive Gaussian Splatting, depth-based feature warping, VLM-driven detail synthesis, and a continuous LOD hierarchy for zoom-in rendering. All load-bearing components are introduced as new modules whose behavior is defined by explicit algorithmic choices rather than by fitting parameters to the target metrics or by reducing to self-citations. Experiments on external datasets (Mip-NeRF360, Tanks&Temples) provide independent evaluation; no derivation step equates a claimed prediction or uniqueness result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review is based solely on the abstract; no explicit free parameters, axioms, or invented entities are detailed in the available text.

pith-pipeline@v0.9.0 · 5691 in / 1170 out tokens · 37467 ms · 2026-05-20T10:42:05.367582+00:00 · methodology



Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 3 internal anchors

  1. Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. Qwen2.5-vl technical report. arXiv preprint arXiv:2502.13923, 2025.

  2. Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5470–5479, 2022.

  3. Kelvin CK Chan, Xintao Wang, Ke Yu, Chao Dong, and Chen Change Loy. Basicvsr: The search for essential components in video super-resolution and beyond. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4947–4956, 2021.

  4. Kelvin CK Chan, Shangchen Zhou, Xiangyu Xu, and Chen Change Loy. Basicvsr++: Improving video super-resolution with enhanced propagation and alignment. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5972–5981, 2022.

  5. Yi-Ting Chen, Ting-Hsuan Liao, Pengsheng Guo, Alexander Schwing, and Jia-Bin Huang. Bridging diffusion models and 3d representations: A 3d consistent super-resolution framework. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 13481–13490, 2025.

  6. Xiang Feng, Yongbo He, Yubo Wang, Yan Yang, Wen Li, Yifei Chen, Zhenzhong Kuang, Jianping Fan, Yu Jun, et al. Srgs: Super-resolution 3d gaussian splatting. arXiv preprint arXiv:2404.10318, 2024.

  7. Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems, 30, 2017.

  8. Quan Huynh-Thu and Mohammed Ghanbari. Scope of validity of psnr in image/video quality assessment. Electronics letters, 44(13):800–801, 2008.

  9. Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, and Feng Yang. Musiq: Multi-scale image quality transformer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 5148–5157, 2021.

  10. Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4):139–1,

  11. Bernhard Kerbl, Andreas Meuleman, Georgios Kopanas, Michael Wimmer, Alexandre Lanvin, and George Drettakis. A hierarchical 3d gaussian representation for real-time rendering of very large datasets. ACM Transactions on Graphics (TOG), 43(4):1–15, 2024.

  12. Bryan Sangwoo Kim, Jeongsol Kim, and Jong Chul Ye. Chain-of-zoom: Extreme super-resolution via scale autoregression and preference alignment. arXiv preprint arXiv:2505.18600, 2025.

  13. Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics (ToG), 36(4):1–13, 2017.

  14. Hyun-kyu Ko, Dongheok Park, Youngin Park, Byeonghyeon Lee, Juhee Han, and Eunbyung Park. Sequence matters: Harnessing video models in 3d super-resolution. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 4356–4364, 2025.

  15. Jonas Kulhanek, Marie-Julie Rakotosaona, Fabian Manhardt, Christina Tsalicoglou, Michael Niemeyer, Torsten Sattler, Songyou Peng, and Federico Tombari. Lodge: Level-of-detail large-scale gaussian splatting with efficient rendering. arXiv preprint arXiv:2505.23158, 2025.

  16. Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4681–4690,

  17. Jie Long Lee, Chen Li, and Gim Hee Lee. Disr-nerf: Diffusion-guided view-consistent super-resolution nerf. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20561–20570, 2024.

  18. Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. Swinir: Image restoration using swin transformer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 1833–1844,

  19. Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 136–144, 2017.

  20. Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. Making a “completely blind” image quality analyzer. IEEE Signal processing letters, 20(3):209–212, 2012.

  21. Anurag Ranjan and Michael J Black. Optical flow estimation using a spatial pyramid network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4161–4170, 2017.

  22. Kerui Ren, Lihan Jiang, Tao Lu, Mulin Yu, Linning Xu, Zhangkai Ni, and Bo Dai. Octree-gs: Towards consistent real-time rendering with lod-structured 3d gaussians. arXiv preprint arXiv:2403.17898, 2024.

  23. Yunji Seo, Young Sun Choi, Hyun Seung Son, and Youngjung Uh. Flod: Integrating flexible level of detail into 3d gaussian splatting for customizable rendering. arXiv preprint arXiv:2408.12894, 2024.

  24. Yuan Shen, Duygu Ceylan, Paul Guerrero, Zexiang Xu, Niloy J Mitra, Shenlong Wang, and Anna Frühstück. Supergaussian: Repurposing video models for 3d super resolution. In European Conference on Computer Vision, pages 215–233. Springer, 2024.

  25. Shuwei Shi, Jinjin Gu, Liangbin Xie, Xintao Wang, Yujiu Yang, and Chao Dong. Rethinking alignment in video super-resolution transformers. Advances in Neural Information Processing Systems, 35:36081–36093, 2022.

  26. Yujing Sun, Lingchen Sun, Shuaizheng Liu, Rongyuan Wu, Zhengqiang Zhang, and Lei Zhang. One-step diffusion for detail-rich and temporally consistent video super-resolution. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.

  27. Thomas Unterthiner, Sjoerd Van Steenkiste, Karol Kurach, Raphael Marinier, Marcin Michalski, and Sylvain Gelly. Towards accurate generative models of video: A new metric & challenges. arXiv preprint arXiv:1812.01717, 2018.

  28. Jianyi Wang, Kelvin CK Chan, and Chen Change Loy. Exploring clip for assessing the look and feel of images. In Proceedings of the AAAI conference on artificial intelligence, pages 2555–2563, 2023.

  29. Jianyi Wang, Zongsheng Yue, Shangchen Zhou, Kelvin CK Chan, and Chen Change Loy. Exploiting diffusion prior for real-world image super-resolution. International Journal of Computer Vision, 132(12):5929–5949, 2024.

  30. Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Yu Qiao, and Chen Change Loy. Esrgan: Enhanced super-resolution generative adversarial networks. In Proceedings of the European conference on computer vision (ECCV) workshops, pages 0–0, 2018.

  31. Xintao Wang, Kelvin CK Chan, Ke Yu, Chao Dong, and Chen Change Loy. Edvr: Video restoration with enhanced deformable convolutional networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pages 0–0, 2019.

  32. Xintao Wang, Liangbin Xie, Chao Dong, and Ying Shan. Real-esrgan: Training real-world blind super-resolution with pure synthetic data. In Proceedings of the IEEE/CVF international conference on computer vision, pages 1905–1914,

  33. Xiaojuan Wang, Janne Kontkanen, Brian Curless, Steven M Seitz, Ira Kemelmacher-Shlizerman, Ben Mildenhall, Pratul Srinivasan, Dor Verbin, and Aleksander Holynski. Generative powers of ten. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7173–7182, 2024.

  34. Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004.

  35. Rongyuan Wu, Lingchen Sun, Zhiyuan Ma, and Lei Zhang. One-step effective diffusion network for real-world image super-resolution. Advances in Neural Information Processing Systems, 37:92529–92553, 2024.

  36. Shiyun Xie, Zhiru Wang, Yinghao Zhu, and Chengwei Pan. Supergs: Super-resolution 3d gaussian splatting via latent feature field and gradient-guided splitting. arXiv preprint arXiv:2410.02571, 1, 2024.

  37. Yiran Xu, Taesung Park, Richard Zhang, Yang Zhou, Eli Shechtman, Feng Liu, Jia-Bin Huang, and Difan Liu. Videogigagan: Towards detail-rich video super-resolution. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 2139–2149, 2025.

  38. Xiqian Yu, Hanxin Zhu, Tianyu He, and Zhibo Chen. Gaussiansr: 3d gaussian super-resolution with 2d diffusion priors. arXiv preprint arXiv:2406.10111, 2024.

  39. Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-splatting: Alias-free 3d gaussian splatting. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19447–19456,

  40. Zongsheng Yue, Jianyi Wang, and Chen Change Loy. Resshift: Efficient diffusion model for image super-resolution by residual shifting. Advances in Neural Information Processing Systems, 36:13294–13307, 2023.

  41. Baowen Zhang, Chuan Fang, Rakesh Shrestha, Yixun Liang, Xiaoxiao Long, and Ping Tan. Rade-gs: Rasterizing depth in gaussian splatting. arXiv preprint arXiv:2406.01467, 2024.

  42. Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018.

  43. Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European conference on computer vision (ECCV), pages 286–301, 2018.

  44. Shangchen Zhou, Peiqing Yang, Jianyi Wang, Yihang Luo, and Chen Change Loy. Upscale-a-video: Temporal-consistent diffusion model for real-world video super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2535–2545, 2024.