UniFixer: A Universal Reference-Guided Fixer for Diffusion-Based View Synthesis
Pith reviewed 2026-05-13 07:03 UTC · model grok-4.3
The pith
UniFixer repairs diffusion degradations in view synthesis with a reference-guided coarse-to-fine refiner.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
UniFixer is a universal reference-guided framework that fixes diverse diffusion degradations via a coarse-to-fine strategy. A reference pre-alignment module first performs coarse alignment between the reference view and the degraded novel view. A global structure anchoring mechanism then rectifies geometric distortions, followed by a local detail injection module that recovers fine-grained texture details. This enables plug-and-play zero-shot fixing across diffusion backbones and achieves state-of-the-art performance on novel view synthesis and stereo conversion.
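The three-stage pipeline maps naturally onto a residual refiner. Below is a minimal PyTorch-style sketch of that control flow; every module internal is a hypothetical placeholder (the abstract does not specify architectures), so this is an illustration of the coarse-to-fine wiring, not the paper's implementation.

```python
# Minimal sketch of the coarse-to-fine refiner described above.
# All module internals are hypothetical placeholders; the abstract does not
# specify the actual architectures.
import torch
import torch.nn as nn

class UniFixerSketch(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        # Stage 1: coarse alignment of the reference view to the degraded
        # novel view (e.g., flow- or attention-based warping; a plain conv
        # stands in here).
        self.pre_align = nn.Conv2d(2 * 3, channels, 3, padding=1)
        # Stage 2: global structure anchoring to rectify geometric distortions.
        self.structure_anchor = nn.Conv2d(channels, channels, 3, padding=1)
        # Stage 3: local detail injection to recover fine-grained texture.
        self.detail_inject = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, degraded: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
        # Coarse-to-fine: align, anchor structure, then inject detail.
        aligned = self.pre_align(torch.cat([degraded, reference], dim=1))
        anchored = torch.relu(self.structure_anchor(aligned))
        residual = self.detail_inject(anchored)
        # Plug-and-play refinement: the output is a corrected input, so the
        # module can sit after any diffusion backbone without retraining it.
        return degraded + residual

# Usage: refine one degraded novel view given its reference view.
fixer = UniFixerSketch()
degraded = torch.rand(1, 3, 256, 256)
reference = torch.rand(1, 3, 256, 256)
fixed = fixer(degraded, reference)  # shape (1, 3, 256, 256)
```

The residual formulation reflects the plug-and-play claim: the refiner perturbs the backbone's output rather than regenerating it.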
What carries the argument
The coarse-to-fine refiner consisting of reference pre-alignment, global structure anchoring, and local detail injection modules that use a single reference view to correct degradations.
If this is right
- Achieves state-of-the-art results on novel view synthesis benchmarks without task-specific retraining.
- Extends directly to stereo conversion with the same reference-guided modules.
- Operates zero-shot across different diffusion model architectures and scenes.
- Reduces blurred details and geometric distortions while preserving structural fidelity.
Where Pith is reading between the lines
- The same modules could be tested on diffusion-based video generation to enforce temporal consistency across frames.
- Performance may degrade when the reference view comes from an extreme viewpoint angle not covered in training.
- Chaining multiple reference views could further reduce residual artifacts in complex scenes.
Load-bearing premise
A single reference view always supplies enough undistorted information to correct all three degradation types without introducing new artifacts, and the modules generalize zero-shot to unseen diffusion backbones and scenes.
What would settle it
Applying UniFixer to a new diffusion backbone on a scene where the reference view lacks key details and checking whether output artifacts persist or new distortions appear.
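That test can be phrased as a small harness: render novel views with an unseen backbone, refine them zero-shot, and compare artifact metrics before and after. A hedged sketch follows; `unifixer`, `render_novel_view`, and `load_scene_pairs` are hypothetical stand-ins, and only the scikit-image metric call is a real API.

```python
# Sketch of the settling experiment: refine outputs of an unseen diffusion
# backbone and check whether artifacts persist or new distortions appear.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio

def evaluate(unifixer, render_novel_view, load_scene_pairs):
    deltas = []
    for reference, ground_truth, camera in load_scene_pairs():
        degraded = render_novel_view(camera)   # unseen diffusion backbone
        fixed = unifixer(degraded, reference)  # zero-shot refinement
        # PSNR before vs. after refinement; a negative delta flags that the
        # refiner introduced new distortions instead of fixing old ones.
        before = peak_signal_noise_ratio(ground_truth, degraded, data_range=1.0)
        after = peak_signal_noise_ratio(ground_truth, fixed, data_range=1.0)
        deltas.append(after - before)
    deltas = np.asarray(deltas)
    # Persistently non-positive deltas on reference-poor scenes would
    # falsify the load-bearing premise above.
    return deltas.mean(), deltas.std()
```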
Original abstract
With the recent surge of generative models, diffusion-based approaches have become mainstream for view synthesis tasks, either in an explicit depth-warp-inpaint or in an implicit end-to-end manner. Despite their success, both paradigms often suffer from noticeable quality degradation, e.g., blurred details and distorted structures, caused by pixel-to-latent compression and diffusion hallucination. In this paper, we investigate diffusion degradation from three key dimensions (i.e., spatial, temporal, and backbone-related) and propose UniFixer, a universal reference-guided framework that fixes diverse degradation artifacts via a coarse-to-fine strategy. Specifically, a reference pre-alignment module is first designed to perform coarse alignment between the reference view and the degraded novel view. A global structure anchoring mechanism then rectifies geometric distortions to ensure structural fidelity, followed by a local detail injection module that recovers fine-grained texture details for high-quality view synthesis. Our UniFixer serves as a plug-and-play refiner that achieves zero-shot fixing across different types of diffusion degradation, and extensive experiments verify our state-of-the-art performance on novel view synthesis and stereo conversion.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes UniFixer, a universal reference-guided framework for correcting degradation artifacts in diffusion-based view synthesis. It decomposes degradation into spatial, temporal, and backbone-related dimensions and employs a coarse-to-fine pipeline consisting of a reference pre-alignment module, a global structure anchoring mechanism, and a local detail injection module. The central claim is that this architecture acts as a plug-and-play, zero-shot refiner that achieves state-of-the-art performance on novel view synthesis and stereo conversion tasks.
Significance. If the zero-shot and SOTA claims are substantiated, the work would offer a practical, model-agnostic post-processing tool that mitigates common diffusion artifacts (blurring, geometric distortion) without retraining the underlying generative backbone. This could be broadly useful given the prevalence of diffusion models in view synthesis pipelines.
major comments (2)
- [Abstract] The assertion of 'state-of-the-art performance' and 'extensive experiments' is unsupported by quantitative metrics, tables, error bars, ablation studies, or dataset descriptions, rendering the central empirical claim unverifiable from the provided text.
- [Abstract] The zero-shot generalization claim for the coarse-to-fine modules across unseen diffusion backbones rests on an untested assumption that degradation statistics are sufficiently similar; no evidence is supplied that the reference pre-alignment, global structure anchoring, or local detail injection steps avoid introducing new geometric or texture artifacts when latent spaces or sampling schedules differ.
minor comments (1)
- [Abstract] The three degradation dimensions (spatial, temporal, backbone-related) are introduced without a supporting citation or prior reference to establish the taxonomy.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the concerns about the abstract's empirical claims by clarifying the supporting evidence in the full manuscript and committing to revisions for better verifiability. Point-by-point responses follow.
Point-by-point responses
- Referee: [Abstract] The assertion of 'state-of-the-art performance' and 'extensive experiments' is unsupported by quantitative metrics, tables, error bars, ablation studies, or dataset descriptions, rendering the central empirical claim unverifiable from the provided text.
Authors: We acknowledge that the abstract, as a concise summary, omits detailed metrics. The full manuscript's Experiments section contains quantitative tables with PSNR/SSIM/LPIPS metrics and error bars, ablation studies, and dataset descriptions for novel view synthesis and stereo conversion; a sketch of such a metric computation appears after these responses. To improve verifiability, we will revise the abstract to include key performance highlights substantiating the SOTA claim. Revision: yes.
- Referee: [Abstract] The zero-shot generalization claim for the coarse-to-fine modules across unseen diffusion backbones rests on an untested assumption that degradation statistics are sufficiently similar; no evidence is supplied that the reference pre-alignment, global structure anchoring, or local detail injection steps avoid introducing new geometric or texture artifacts when latent spaces or sampling schedules differ.
Authors: Our experiments evaluate UniFixer on multiple diffusion-based view synthesis pipelines with different backbones and sampling schedules, and both quantitative and qualitative results show consistent zero-shot improvements without new artifacts. The reference-guided design aims to be backbone-agnostic. We agree that explicit tests on additional unseen backbones would strengthen the claim, and we will add further analysis or experiments in the revision. Revision: partial.
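The rebuttal leans on PSNR/SSIM/LPIPS with error bars. For concreteness, here is a minimal sketch of how such per-image scores and bootstrap error bars are typically computed; it assumes float images in [0, 1] of shape (H, W, 3) and uses the real scikit-image and `lpips` package APIs, but the surrounding evaluation loop is left to the reader.

```python
# Sketch of the per-image metrics cited in the rebuttal, with bootstrap
# error bars. Assumes images are float arrays in [0, 1], shape (H, W, 3).
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net='alex')  # perceptual distance network

def to_lpips_tensor(img: np.ndarray) -> torch.Tensor:
    # LPIPS expects (N, 3, H, W) tensors scaled to [-1, 1].
    return torch.from_numpy(img).permute(2, 0, 1)[None].float() * 2 - 1

def score_pair(pred: np.ndarray, gt: np.ndarray) -> dict:
    return {
        "psnr": peak_signal_noise_ratio(gt, pred, data_range=1.0),
        "ssim": structural_similarity(gt, pred, channel_axis=2, data_range=1.0),
        "lpips": lpips_fn(to_lpips_tensor(pred), to_lpips_tensor(gt)).item(),
    }

def bootstrap_mean(values: np.ndarray, n_boot: int = 1000, seed: int = 0):
    # Error bar as the standard deviation of bootstrap-resampled means.
    rng = np.random.default_rng(seed)
    means = [rng.choice(values, size=len(values)).mean() for _ in range(n_boot)]
    return float(values.mean()), float(np.std(means))
```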
Circularity Check
No circularity: the three modules are independently designed architectural components, and no derivation reduces the claim to its own inputs.
Full rationale
The paper introduces UniFixer as a plug-and-play refiner consisting of three explicitly designed modules (reference pre-alignment, global structure anchoring, local detail injection) that operate via a coarse-to-fine strategy on diffusion degradations. No equations, fitted parameters, predictions, or first-principles derivations appear in the abstract or the described method. The central claim rests on the novelty of the reference-guided framework and its zero-shot applicability, verified by experiments rather than by any self-referential reduction. No load-bearing self-citation, ansatz smuggling, or renaming of known results is present; the contribution is an engineering architecture independent of its own inputs.
Axiom & Free-Parameter Ledger
invented entities (3)
- reference pre-alignment module: no independent evidence
- global structure anchoring mechanism: no independent evidence
- local detail injection module: no independent evidence