PR-IQA: Partial-Reference Image Quality Assessment for Diffusion-Based Novel View Synthesis
Pith reviewed 2026-05-10 19:02 UTC · model grok-4.3
The pith
PR-IQA evaluates diffusion-generated novel views with full-reference accuracy using only partial references from other poses.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PR-IQA computes a geometrically consistent partial quality map in overlapping regions, then applies cross-attention over reference-view context to inpaint a dense quality map. This map identifies reliable regions for supervision in diffusion-augmented 3D Gaussian Splatting, allowing the pipeline to avoid propagating inconsistencies from the synthesized views and thereby produce more accurate 3D reconstructions and novel-view results.
What carries the argument
Cross-attention completion of a partial quality map that incorporates reference-view context to enforce cross-view consistency across the full image.
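A toy sketch of the completion idea, under assumptions that are not taken from the paper: single-head scaled dot-product attention, one feature vector per patch, and missing scores filled by an attention-weighted average of the known scores. The name `complete_quality` and the rule of keeping observed scores unchanged are hypothetical; the actual model is a learned network conditioned on reference-view features.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def complete_quality(features, partial_quality):
    """Fill missing (None) quality scores by attending over patches with known scores.

    features: list of equal-length feature vectors, one per patch (assumed input).
    partial_quality: list of floats, or None where no overlap score exists.
    """
    known = [i for i, q in enumerate(partial_quality) if q is not None]
    if not known:
        raise ValueError("need at least one known quality score")
    scale = 1.0 / math.sqrt(len(features[0]))
    dense = []
    for i, q in enumerate(partial_quality):
        if q is not None:
            dense.append(q)  # keep observed scores unchanged (illustrative choice)
            continue
        # Scaled dot-product logits: this patch is the query, known patches are keys.
        logits = [
            scale * sum(a * b for a, b in zip(features[i], features[j]))
            for j in known
        ]
        weights = softmax(logits)
        # The attended values are the known quality scores.
        dense.append(sum(w * partial_quality[j] for w, j in zip(weights, known)))
    return dense
```

In this sketch a patch with no overlap inherits its score mostly from the known patches whose features it resembles, which is the behaviour the load-bearing premise depends on.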
If this is right
- PR-IQA reaches accuracy levels comparable to full-reference IQA methods while requiring no ground-truth images.
- Restricting 3DGS supervision to PR-IQA high-confidence regions reduces the impact of photometric and geometric inconsistencies.
- The resulting 3D reconstructions and novel-view renderings outperform those produced when supervision uses unfiltered diffusion outputs or earlier IQA methods.
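The restriction of supervision to high-confidence regions can be sketched as a masked photometric loss. This is an illustration only: the hard threshold, its default value of 0.5, the L1 form of the loss, and the name `masked_l1_loss` are assumptions, and the paper may use a different weighting scheme.

```python
def masked_l1_loss(rendered, pseudo_gt, quality, threshold=0.5):
    """Mean absolute error over pixels whose quality score is at least `threshold`.

    All inputs are flat lists of equal length. `threshold` is a hypothetical
    hyperparameter, not a value reported in the source.
    """
    kept = [
        abs(r - p)
        for r, p, q in zip(rendered, pseudo_gt, quality)
        if q >= threshold
    ]
    if not kept:
        return 0.0  # no reliable pixels: this view contributes no supervision
    return sum(kept) / len(kept)
```

Pixels that the quality map marks as unreliable drop out of the average, so an inconsistent region of a synthesized view cannot pull the reconstruction toward it.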
Where Pith is reading between the lines
- The same partial-to-dense completion pattern could be tested on other generative models that produce multi-view content when only sparse references are available.
- If the cross-attention reliably transfers quality signals, the method may reduce the number of reference views needed for stable supervision in sparse-view pipelines.
- The approach opens a route to quality-aware training loops that adaptively weight generated views without ever needing full ground truth.
Load-bearing premise
The cross-attention step can accurately extend the partial quality map without introducing new errors in non-overlapping regions.
What would settle it
On a dataset where ground-truth images exist, measure how closely the completed PR-IQA maps match full-reference quality maps in regions outside the original overlaps; large systematic differences would indicate failure of the completion step.
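The check described above could be run along the following lines. This is a minimal sketch assuming flattened per-pixel maps and a boolean overlap mask; the function names are hypothetical, and PLCC and SRCC are the standard Pearson and Spearman correlations.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation coefficient (SRCC)."""
    return pearson(ranks(x), ranks(y))


def correlations_outside_overlap(predicted, oracle, overlap_mask):
    """PLCC and SRCC computed only where overlap_mask is False."""
    pred = [p for p, m in zip(predicted, overlap_mask) if not m]
    ref = [o for o, m in zip(oracle, overlap_mask) if not m]
    return pearson(pred, ref), spearman(pred, ref)
```

Low correlations in the non-overlapping pixels, alongside high correlations inside the overlap, would point at the completion step rather than the partial map.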
Original abstract
Diffusion models are promising for sparse-view novel view synthesis (NVS), as they can generate pseudo-ground-truth views to aid 3D reconstruction pipelines like 3D Gaussian Splatting (3DGS). However, these synthesized images often contain photometric and geometric inconsistencies, and their direct use for supervision can impair reconstruction. To address this, we propose Partial-Reference Image Quality Assessment (PR-IQA), a framework that evaluates diffusion-generated views using reference images from different poses, eliminating the need for ground truth. PR-IQA first computes a geometrically consistent partial quality map in overlapping regions. It then performs quality completion to inpaint this partial map into a dense, full-image map. This completion is achieved via a cross-attention mechanism that incorporates reference-view context, ensuring cross-view consistency and enabling thorough quality assessment. When integrated into a diffusion-augmented 3DGS pipeline, PR-IQA restricts supervision to high-confidence regions identified by its quality maps. Experiments demonstrate that PR-IQA outperforms existing IQA methods, achieving full-reference-level accuracy without ground-truth supervision. Thus, our quality-aware 3DGS approach more effectively filters inconsistencies, producing superior 3D reconstructions and NVS results. The project page is available at https://kakaomacao.github.io/pr-iqa-project-page/.
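The first stage, the partial quality map, can be illustrated with a small sketch. A passage quoted further down this page mentions cosine similarity; everything else here is assumed: that features are compared per location, that reference features have already been warped into the generated view, and the names `partial_quality_map` and `warped_ref_feats`.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def partial_quality_map(generated_feats, warped_ref_feats, overlap_mask):
    """Score each location by feature cosine similarity; None outside the overlap.

    warped_ref_feats is assumed to be aligned with generated_feats already;
    the geometric warping itself is not shown.
    """
    return [
        cosine_similarity(g, r) if m else None
        for g, r, m in zip(generated_feats, warped_ref_feats, overlap_mask)
    ]
```

The `None` entries are exactly the locations that the quality completion step then has to fill in.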
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes PR-IQA, a partial-reference image quality assessment framework for diffusion-generated views in sparse-view novel view synthesis. It first derives geometrically consistent partial quality maps from overlapping regions across reference views, then applies a cross-attention mechanism to complete these into dense full-image quality maps. The completed maps identify high-confidence regions for selective supervision in a diffusion-augmented 3D Gaussian Splatting pipeline, with the central claim that PR-IQA outperforms existing IQA methods and reaches full-reference-level accuracy without requiring ground-truth images.
Significance. If the core claims hold, the work offers a practical advance for reliable use of diffusion models in 3D reconstruction pipelines by mitigating photometric and geometric inconsistencies without ground truth. The partial-reference strategy and cross-attention completion could improve filtering in 3DGS and similar methods, leading to better NVS results in sparse-view settings. The approach is grounded in a concrete application rather than purely theoretical.
major comments (2)
- [Quality Completion / Cross-Attention Module] The quality completion step (cross-attention inpainting of partial maps) is load-bearing for the claim of full-reference-level accuracy without GT supervision. The manuscript provides no direct pixel-level or region-level validation, such as PLCC/SRCC or error maps comparing the completed PR-IQA maps against oracle FR-IQA maps (e.g., LPIPS or SSIM) computed on held-out ground-truth views for non-overlapping regions. End-to-end 3D reconstruction metrics alone cannot confirm that the attention mechanism correctly identifies inconsistencies rather than introducing new artifacts.
- [Experiments] Experiments section: the claim that PR-IQA 'outperforms existing IQA methods' and achieves 'full-reference-level accuracy' requires explicit quantitative support. Tables should report direct comparisons (e.g., correlation coefficients with FR-IQA baselines on standard NVS datasets) with statistical significance or error bars; without these, the superiority and accuracy assertions remain under-supported relative to the central claim.
minor comments (2)
- [Method] Notation for the partial quality map and cross-attention inputs should be defined more explicitly (e.g., symbols for overlap masks and reference features) to improve readability.
- [Abstract / Experiments] Ensure the project page includes the full implementation details, code, and any pre-trained models referenced in the experiments for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have prepared revisions to provide stronger direct validation for the quality completion module and the performance claims.
- Referee: The quality completion step (cross-attention inpainting of partial maps) is load-bearing for the claim of full-reference-level accuracy without GT supervision. The manuscript provides no direct pixel-level or region-level validation, such as PLCC/SRCC or error maps comparing the completed PR-IQA maps against oracle FR-IQA maps (e.g., LPIPS or SSIM) computed on held-out ground-truth views for non-overlapping regions. End-to-end 3D reconstruction metrics alone cannot confirm that the attention mechanism correctly identifies inconsistencies rather than introducing new artifacts.
  Authors: We agree that direct validation of the cross-attention completion is essential to substantiate the full-reference accuracy claim. In the revised manuscript, we have added a dedicated analysis section with quantitative comparisons: PLCC and SRCC between completed PR-IQA maps and oracle FR-IQA maps (LPIPS and SSIM) computed on held-out ground-truth views, restricted to non-overlapping regions. We also include qualitative error maps demonstrating that the mechanism recovers inconsistencies without introducing artifacts. These results confirm the completion step's fidelity and address the concern that end-to-end metrics alone are insufficient.
  Revision: yes
- Referee: Experiments section: the claim that PR-IQA 'outperforms existing IQA methods' and achieves 'full-reference-level accuracy' requires explicit quantitative support. Tables should report direct comparisons (e.g., correlation coefficients with FR-IQA baselines on standard NVS datasets) with statistical significance or error bars; without these, the superiority and accuracy assertions remain under-supported relative to the central claim.
  Authors: We acknowledge that more explicit quantitative tables are needed to support the superiority and accuracy claims. We have expanded the Experiments section with new tables reporting PLCC and SRCC correlations of PR-IQA against FR-IQA baselines (LPIPS, SSIM, PSNR) on standard NVS datasets including LLFF and DTU. These tables now incorporate error bars from multiple runs and statistical significance tests (p-values). The added results show PR-IQA outperforming other IQA methods while approaching full-reference performance levels, directly bolstering the central claims.
  Revision: yes
Circularity Check
No significant circularity; method is self-contained with external validation
The paper introduces PR-IQA as a novel framework that first computes a geometrically consistent partial quality map from overlapping reference regions and then completes it to a dense map via cross-attention incorporating reference-view context. No equations, derivations, or self-citations are shown that reduce the claimed full-reference-level accuracy to fitted inputs, self-definitions, or prior author results by construction. The core claim rests on the design of the partial-map + cross-attention pipeline and its empirical performance on downstream 3DGS reconstruction metrics, which are independent of the method's internal definitions. This is the common case of an honest proposal whose correctness can be externally tested rather than a tautological reduction.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Passage: PR-IQA first computes a geometrically consistent partial quality map in overlapping regions... via cosine similarity... then performs quality completion... via a reference-conditioned cross-attention mechanism
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, theorem embed_injective (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Passage: We train our model to predict a quality map Q that approximates a GT map Q*... using DINOv2 feature-similarity map or SSIM
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Eduardo Arnold, Jamie Wynn, Sara Vicente, Guillermo Garcia-Hernando, Aron Monszpart, Victor Prisacariu, Daniyar Turmukhambetov, and Eric Brachmann. Map-free visual relocalization: Metric pose relative to a single image. In ECCV, pages 690–708, 2022.
- [2] Mohammad Asim, Christopher Wewer, Thomas Wimmer, Bernt Schiele, and Jan Eric Lenssen. MET3R: Measuring multi-view consistency in generated images. In CVPR, pages 6034–6044, 2025.
- [3] Chong Bao, Xiyu Zhang, Zehao Yu, Jiale Shi, Guofeng Zhang, Songyou Peng, and Zhaopeng Cui. Free360: Layered gaussian splatting for unbounded 360-degree view synthesis from extremely sparse and unposed views. In CVPR, pages 16377–16387, 2025.
- [4] Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. In CVPR, pages 5470–5479, 2022.
- [5] Xudong Cai, Yongcai Wang, Zhaoxin Fan, Deng Haoran, Shuo Wang, Wanting Li, Deying Li, Lun Luo, Minhang Wang, and Jintao Xu. Dust to tower: Coarse-to-fine photo-realistic scene reconstruction from sparse uncalibrated images. arXiv preprint arXiv:2412.19518, 2024.
- [6] Weihan Cao, Yifan Zhang, Jianfei Gao, Anda Cheng, Ke Cheng, and Jian Cheng. PKD: general distillation framework for object detectors via pearson correlation coefficient. In NeurIPS, 2022.
- [7] Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. In ICCV, pages 9650–9660, 2021.
- [8] Antonio Criminisi, Patrick Pérez, and Kentaro Toyama. Region filling and object removal by exemplar-based image inpainting. IEEE TIP, 13(9):1200–1212, 2004.
- [9] Erik Englesson and Hossein Azizpour. Generalized jensen-shannon divergence loss for learning with noisy labels. In NeurIPS, pages 30284–30297, 2021.
- [10] Stephanie Fu, Mark Hamilton, Laura Brandt, Axel Feldman, Zhoutong Zhang, and William T. Freeman. Featup: A model-agnostic framework for features at any resolution. In ICLR, 2024.
- [11] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR,
- [12] Dan Hendrycks and Kevin Gimpel. Gaussian error linear units (gelus), 2016.
- [13] Nicolai Hermann, Jorge Condor, and Piotr Didyk. Puzzle similarity: A perceptually-guided cross-reference metric for artifact detection in 3d scene reconstructions. In ICCV, pages 28881–28891, 2025.
- [14] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, pages 6840–6851,
- [15] Haiwen Huang, Anpei Chen, Volodymyr Havrylov, Andreas Geiger, and Dan Zhang. LoftUp: Learning a coordinate-based feature upsampler for vision foundation models. In ICCV, 2025.
- [16] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D gaussian splatting for real-time radiance field rendering. ACM TOG, 42(4):139–1, 2023.
- [17] Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. Pick-a-pic: An open dataset of user preferences for text-to-image generation. In NeurIPS, 2023.
- [18] Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM TOG, 36(4), 2017.
- [19] Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, and Carl Vondrick. Zero-1-to-3: Zero-shot one image to 3D object. In ICCV, pages 9298–9309, 2023.
- [20] Xinhang Liu, Jiaben Chen, Shiu-Hong Kao, Yu-Wing Tai, and Chi-Keung Tang. Deceptive-NeRF/3DGS: Diffusion-generated pseudo-observations for high-quality sparse-view reconstruction. In ECCV, pages 337–355. Springer, 2024.
- [21] Yuxin Liu, Minshan Xie, Hanyuan Liu, and Tien-Tsin Wong. Text-guided texturing by synchronized multi-view diffusion. In SIGGRAPH Asia 2024 Conference Papers, pages 1–11. ACM, 2024.
- [22] Ilya Loshchilov and Frank Hutter. SGDR: stochastic gradient descent with warm restarts. In ICLR, 2017.
- [23] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. In ECCV, pages 405–421. Springer, 2020.
- [24] Anish Mittal, Anush Krishna Moorthy, and Alan Conrad Bovik. No-reference image quality assessment in the spatial domain. IEEE TIP, 21(12):4695–4708, 2012.
- [25] Michael Niemeyer, Jonathan T. Barron, Ben Mildenhall, Mehdi S. M. Sajjadi, Andreas Geiger, and Noha Radwan. RegNeRF: Regularizing neural radiance fields for view synthesis from sparse inputs. In CVPR, pages 5480–5490, 2022.
- [26] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jégou, Julien Mairal, P...
- [27] Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, and Alexei A. Efros. Context encoders: Feature learning by inpainting. In CVPR, pages 2536–2544, 2016.
- [28] Xuanchi Ren, Tianchang Shen, Jiahui Huang, Huan Ling, Yifan Lu, Merlin Nimier-David, Thomas Müller, Alexander Keller, Sanja Fidler, and Jun Gao. Gen3C: 3d-informed world-consistent video generation with precise camera control. In CVPR, pages 6121–6132, 2025.
- [29] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, pages 10684–10695, 2022.
- [30] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In MICCAI, 2015.
- [31] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In ICLR, 2021.
- [32] Shitao Tang, Fuyang Zhang, Jiacheng Chen, Peng Wang, and Yasutaka Furukawa. MVDiffusion: Enabling holistic multi-view image generation with correspondence-aware diffusion. In NeurIPS, 2023.
- [33] Shitao Tang, Jiacheng Chen, Dilin Wang, Chengzhou Tang, Fuyang Zhang, Yuchen Fan, Vikas Chandra, Yasutaka Furukawa, and Rakesh Ranjan. MVDiffusion++: a dense high-resolution multi-view diffusion model for single or sparse-view 3D object reconstruction. In ECCV, pages 175–191. Springer, 2024.
- [34] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, pages 5998–6008, 2017.
- [35] Narasimhan Venkatanath, D. Praneeth, S. Channappayya Sumohana, S. Medasani Swarup, et al. Blind image quality evaluation using perception based features. In 2015 Twenty First National Conference on Communications (NCC), pages 1–6. IEEE, 2015.
- [36] Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. VGGT: Visual geometry grounded transformer. In CVPR, pages 5294–5306, 2025.
- [37] Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. DUSt3R: Geometric 3D vision made easy. In CVPR, pages 20697–20709, 2024.
- [38] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE TIP, 13(4):600–612, 2004.
- [39] Zirui Wang, Wenjing Bian, and Victor Adrian Prisacariu. CrossScore: Towards multi-view image evaluation and scoring. In ECCV, pages 492–510, 2024.
- [40] Zirui Wang, Yash Bhalgat, Ruining Li, and Victor Adrian Prisacariu. Active view selector: Fast and accurate active view selection with cross reference image quality assessment. arXiv preprint arXiv:2506.19844, 2025.
- [41] Daniel Watson, William Chan, Ricardo Martin-Brualla, Jonathan Ho, Andrea Tagliasacchi, and Mohammad Norouzi. Novel view synthesis with diffusion models. In ICLR, 2023.
- [42] Sanghyun Woo, Jongchan Park, Joon-Young Lee, and In So Kweon. CBAM: convolutional block attention module. In ECCV, pages 3–19, 2018.
- [43] Jamie Wynn and Daniyar Turmukhambetov. DiffusioNeRF: Regularizing neural radiance fields with denoising diffusion models. In CVPR, pages 4180–4189, 2023.
- [44] Zhenqiang Ying, Haoran Niu, Praful Gupta, Dhruv Mahajan, Deepti Ghadiyaram, and Alan Bovik. From patches to pictures (PaQ-2-PiQ): Mapping the perceptual space of picture quality. In CVPR, pages 3575–3585, 2020.
- [45] Hong-Xing Yu, Haoyi Duan, Charles Herrmann, William T. Freeman, and Jiajun Wu. WonderWorld: Interactive 3d scene generation from a single image. In CVPR, pages 5916–5926,
- [46] Wangbo Yu, Jinbo Xing, Li Yuan, Wenbo Hu, Xiaoyu Li, Zhipeng Huang, Xiangjun Gao, Tien-Tsin Wong, Ying Shan, and Yonghong Tian. ViewCrafter: Taming video diffusion models for high-fidelity novel view synthesis. IEEE TPAMI, pages 1–18, 2025.
- [47] Lingzhi Zhang, Zhengjie Xu, Connelly Barnes, Yuqian Zhou, Qing Liu, He Zhang, Sohrab Amirghodsi, Zhe Lin, Eli Shechtman, and Jianbo Shi. Perceptual artifacts localization for image synthesis tasks. In ICCV, pages 7579–7590,
- [48] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018.
- [49] Jensen Zhou, Hang Gao, Vikram Voleti, Aaryaman Vasishta, Chun-Han Yao, Mark Boss, Philip Torr, Christian Rupprecht, and Varun Jampani. Stable virtual camera: Generative view synthesis with diffusion models, 2025.
- [50] Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, and Noah Snavely. Stereo magnification: Learning view synthesis using multiplane images. ACM TOG, 37(4), 2018.
Supplementary Material
This supplementary material complements the main paper by providing co...
7. Method Details
7.1. Architecture Details
As illustrated in Fig. 5, our architecture adopts a U-Net-like [30] encoder-decoder design, leveraging DINOv2 [26] as the feature backbone. The network utilizes GELU [12] as the activation function throughout all layers. Detailed specifications, including resolution, channel dimensions, and the number of blocks ...
8. Experimental Details
8.1. Training Data Generation
Frame Sampling. We utilize the Map-free Visual Relocalization (MFR) dataset [1] as our primary source. For each scene, we uniformly sample 200 frames along the camera trajectory, explicitly including the start and end frames.
Table 5. List of evaluation scenes. We enumerate the specific scenes and sequ...
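The frame sampling step (200 uniformly spaced frames per scene, with the start and end frames always included) can be sketched as follows. The function name, the rounding rule, and the behaviour for short sequences are assumptions made for illustration; the source does not describe them.

```python
def sample_frame_indices(num_frames, num_samples=200):
    """Evenly spaced frame indices that always include the first and last frame.

    If the sequence has no more frames than requested, every frame is returned
    (an assumed fallback, not stated in the source).
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if num_samples >= num_frames:
        return list(range(num_frames))
    if num_samples == 1:
        return [0]
    # Spacing between consecutive samples, measured in frames (greater than 1 here).
    step = (num_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]
```

For example, `sample_frame_indices(1000)` returns 200 distinct indices starting at 0 and ending at 999.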
9. More Experimental Results
9.1. Evaluation on Alternative FR-IQA Targets
Although our Partial-Reference (PR-IQA) framework is trained to optimize DINOv2-SIM and SSIM maps, we extend our evaluation to alternative FR-IQA targets, specifically PSNR and LPIPS, to assess the generalization capability of our predicted quality maps. Table 6 summarizes the P...
10. More Ablation Studies on IQA
10.1. Impact of the Number of Reference Images
We conducted an ablation study to analyze the sensitivity of our PR-IQA framework to the number of available reference images N_ref. In this experiment, we varied N_ref from 1 to 10 by selecting reference views at regular intervals from the corresponding image sequence. Fig. 6 illus...
11. More Ablation Studies on 3DGS
11.1. Effectiveness of DINOv2 Feature Similarity
We validate the rationale behind selecting DINOv2 feature similarity (i.e., DINOv2-SIM) as our primary optimization target by comparing its effectiveness against standard FR-IQA metrics: PSNR, SSIM, and LPIPS. To ensure a fair comparison, we integrated these metrics into the “...
12. More Qualitative Results
12.1. More Qualitative Results for Quality Map
We provide extensive qualitative comparisons on scenes not featured in the main manuscript. Figs. 10, 11, and 12 illustrate results across the Mip-NeRF 360, Tanks and Temples, and RealEstate10K datasets, respectively. As shown in these figures, our PR-IQA generates quality maps th...
13. Limitations and Discussion
While PR-IQA achieves state-of-the-art performance in CR-IQA and significantly enhances sparse-view 3DGS reconstruction, we acknowledge several limitations and outline avenues for future research. First, PR-IQA is currently trained using pseudo-GT quality maps derived from FR metrics, specifically DINOv2 feature similarity o...