Feed-Forward Gaussian Splatting from Sparse Aerial Views
Pith reviewed 2026-05-20 06:47 UTC · model grok-4.3
The pith
AnyCity reconstructs coherent 3D Gaussian urban scenes from sparse aerial views in one feed-forward pass.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AnyCity first predicts an observation-supported geometry latent to anchor reliable structures from the sparse inputs. It then uses scaffold-conditioned aerial completion tokens to predict a gated residual update for weakly constrained content before Gaussian decoding. Training combines dense-to-sparse distillation to transfer structural cues with an aerial-adapted video diffusion prior that supplies fine-grained appearance through gated token conditioning, while observation-preserving objectives ensure the refined representation remains consistent with input-supported geometry. At inference the model produces the final 3D Gaussian scene in a single forward pass.
What carries the argument
Observation-supported geometry latent followed by scaffold-conditioned gated residual update before 3D Gaussian decoding.
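The gated residual update named above can be sketched in a few lines. This is a minimal illustration and not the paper's implementation: the function name `gated_residual_update`, the list-of-floats token layout, and the use of one scalar gate per token are all assumptions made for the example.

```python
from typing import List

Token = List[float]


def gated_residual_update(tokens: List[Token],
                          residuals: List[Token],
                          gates: List[float]) -> List[Token]:
    """Return tokens[i] + gates[i] * residuals[i] for every token.

    Hypothetical helper: a gate of 0.0 leaves a token untouched,
    a gate of 1.0 applies the full residual.
    """
    if not (len(tokens) == len(residuals) == len(gates)):
        raise ValueError("tokens, residuals and gates must have equal length")
    updated = []
    for tok, res, g in zip(tokens, residuals, gates):
        if not 0.0 <= g <= 1.0:
            raise ValueError("gate values are assumed to lie in [0, 1]")
        updated.append([t + g * r for t, r in zip(tok, res)])
    return updated


# Two toy tokens: the first is treated as well observed (gate 0.0),
# the second as weakly constrained (gate 1.0).
scaffold = [[1.0, 2.0], [0.0, 0.0]]
residual = [[5.0, 5.0], [0.5, -0.5]]
refined = gated_residual_update(scaffold, residual, gates=[0.0, 1.0])
```

With these toy values the first token passes through unchanged and only the second receives the residual, which is the behavior the argument depends on.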
If this is right
- Reconstruction completes in seconds rather than minutes or hours for large urban scenes.
- Novel views remain coherent with the input geometry and avoid the ghosting and stretched textures seen in direct-regression baselines.
- The same pipeline works across synthetic, real aerial, UAV-textured, and ground-level scenes without per-scene optimization.
- Observation-preserving losses keep generated content from drifting away from measurable input evidence.
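The last point above can be made concrete with a toy objective. The paper's exact observation-preserving losses are not given here, so the following is only one plausible form: a support-weighted mean squared deviation between the observed and refined tokens. The function name, the per-token `support` weights, and the token layout are assumptions for illustration.

```python
from typing import List


def observation_preserving_loss(observed: List[List[float]],
                                refined: List[List[float]],
                                support: List[float]) -> float:
    """Support-weighted mean squared deviation between refined and observed tokens.

    Hypothetical loss: tokens with support 0.0 contribute nothing, so the
    refinement is free to change them; well-supported tokens are penalized
    for any drift.
    """
    total_weight = sum(support)
    if total_weight == 0.0:
        return 0.0
    weighted = 0.0
    for obs, ref, w in zip(observed, refined, support):
        mean_sq = sum((o - r) ** 2 for o, r in zip(obs, ref)) / len(obs)
        weighted += w * mean_sq
    return weighted / total_weight


observed = [[1.0, 2.0], [0.0, 0.0]]
support = [1.0, 0.0]  # first token well observed, second unobserved

# Drift on the supported token is penalized ...
loss_supported = observation_preserving_loss(
    observed, [[1.5, 2.0], [0.0, 0.0]], support)
# ... while drift on the unsupported token is not.
loss_unsupported = observation_preserving_loss(
    observed, [[1.0, 2.0], [3.0, 3.0]], support)
```

Under this assumed form, generated content can only move freely where the inputs provide no evidence.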
Where Pith is reading between the lines
- The same geometry-first then gated-completion pattern could be tested on other sparse multi-view settings such as street-level or indoor captures.
- Because inference is feed-forward, the model could support real-time rendering loops once the Gaussians are decoded.
- The gated token design might allow targeted style or time-of-day adjustments by swapping the diffusion prior without retraining the geometry stage.
Load-bearing premise
Dense-to-sparse distillation and the gated aerial video diffusion prior can supply missing cues without creating geometry or appearance inconsistencies with the parts directly supported by the input views.
What would settle it
Generate novel views from the output Gaussians and inspect them for floating facades, texture stretching on building sides, or visible seams that contradict the original sparse photos; persistent artifacts would show the separation of observed and generated content has failed.
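Visual inspection could be paired with a number if held-out photos are available. The sketch below computes PSNR, one of the metrics the simulated rebuttal mentions, between a rendered view and a reference image. It assumes grayscale images stored as nested lists with values in [0, 1]; the function name and data layout are illustrative, and PSNR is a coarse proxy that will not isolate floating facades or seams by itself.

```python
import math
from typing import List


def psnr(rendered: List[List[float]],
         reference: List[List[float]],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    count = 0
    squared_error = 0.0
    for row_rendered, row_reference in zip(rendered, reference):
        for a, b in zip(row_rendered, row_reference):
            squared_error += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must be non-empty")
    mse = squared_error / count
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


reference_view = [[0.0, 1.0], [1.0, 0.0]]
rendered_view = [[0.1, 1.0], [1.0, 0.0]]  # one pixel off by 0.1
score_db = psnr(rendered_view, reference_view)
```

Comparing such scores separately on facade regions and roof regions, where masks exist, would speak more directly to the evidence-imbalance concern than a whole-image average.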
Original abstract
Reconstructing large-scale urban scenes from sparse aerial views is a crucial yet challenging task. Due to biased top-down and shallow-oblique camera poses, sparse aerial captures exhibit strong evidence imbalance: roofs and open regions are repeatedly observed, while facades, distant buildings, and occluded structures receive little multi-view support. Existing feed-forward 3D Gaussian Splatting methods directly regress a deterministic representation from sparse inputs, but this often leads to ghosting, melted facades, and stretched textures. Recent pseudo-view and video-based generative reconstruction methods use additional supervision or generative priors. However, they often lack a clear separation between observed geometry and prior-driven content, which can lead to plausible but inconsistent structures. We propose AnyCity, an observation-grounded generative reconstruction framework for sparse aerial urban scenes. AnyCity first predicts an observation-supported geometry latent to anchor reliable structures, and then uses scaffold-conditioned aerial completion tokens to predict a gated residual update for weakly constrained content before Gaussian decoding. During training, dense-to-sparse distillation transfers structural cues from dense-view reconstruction, while an aerial-adapted video diffusion prior provides fine-grained urban appearance cues through gated token conditioning. Observation-preserving objectives keep the refined representation consistent with input-supported geometry. At inference time, AnyCity reconstructs the final 3D Gaussian scene from sparse aerial views in a single feed-forward pass, achieving coherent urban novel-view synthesis with second-level inference. Experiments on synthetic, aerial-domain, UAV-textured, and real-world scenes show consistent improvements over feed-forward baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes AnyCity, an observation-grounded generative reconstruction framework for sparse aerial urban scenes using 3D Gaussian Splatting. It first predicts an observation-supported geometry latent to anchor reliable structures from sparse aerial views, then uses scaffold-conditioned aerial completion tokens to predict a gated residual update for weakly constrained content before Gaussian decoding. Training uses dense-to-sparse distillation to transfer structural cues and an aerial-adapted video diffusion prior for fine-grained appearance through gated token conditioning, with observation-preserving objectives to maintain consistency with input-supported geometry. The authors claim reconstruction in a single feed-forward pass with second-level inference, and report consistent improvements over feed-forward baselines on synthetic, aerial-domain, UAV-textured, and real-world scenes.
Significance. If the proposed separation between observation-supported geometry and prior-driven content holds without leakage, the work could be significant for practical applications in large-scale urban 3D reconstruction from sparse aerial captures, addressing the evidence imbalance issue that leads to artifacts in existing methods. The combination of distillation and generative priors in a feed-forward setting is a promising direction, and the fast inference time is a practical strength.
major comments (2)
- Abstract: The abstract asserts 'consistent improvements over feed-forward baselines' on multiple scene types but supplies no quantitative metrics, error analysis, ablation details, or specific numerical comparisons. Without these, the central claims of coherent urban novel-view synthesis and avoidance of ghosting or melted facades cannot be verified.
- Training description paragraph: The framework relies on 'gated token conditioning' from the aerial-adapted video diffusion prior to supply a 'gated residual update' while claiming that 'observation-preserving objectives keep the refined representation consistent with input-supported geometry.' It is not specified whether the gate is hard or learned, nor whether the update is applied at the token level before Gaussian decoding. If the learned gate allows the prior to influence the observation-supported geometry latent, structural changes could propagate to roof and open-region Gaussians despite multi-view support, violating the evidence-imbalance premise.
minor comments (1)
- The abstract refers to 'second-level inference' without specifying the hardware, resolution, or exact timing measurement used to support this claim.
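The measurement this comment asks for could be reported with a small harness along the following lines. It is a sketch under stated assumptions: `time_inference` and `dummy_forward_pass` are hypothetical names, the dummy workload merely stands in for a real forward pass, and on a GPU the timed callable would additionally have to wait for the device to finish before returning.

```python
import statistics
import time
from typing import Callable, Dict


def time_inference(run_once: Callable[[], object],
                   warmup: int = 2,
                   repeats: int = 5) -> Dict[str, float]:
    """Time a callable with warm-up runs and report wall-clock statistics in seconds."""
    for _ in range(warmup):
        run_once()  # warm-up runs are executed but not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
        "repeats": float(repeats),
    }


def dummy_forward_pass() -> int:
    """Stand-in workload; a real benchmark would call the model here."""
    return sum(i * i for i in range(10_000))


report = time_inference(dummy_forward_pass)
```

Reporting the median alongside the hardware, input resolution, and number of views would make a "second-level inference" claim checkable.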
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the practical strengths of our approach for large-scale urban reconstruction. We provide point-by-point responses to the major comments below.
- Referee: Abstract: The abstract asserts 'consistent improvements over feed-forward baselines' on multiple scene types but supplies no quantitative metrics, error analysis, ablation details, or specific numerical comparisons. Without these, the central claims of coherent urban novel-view synthesis and avoidance of ghosting or melted facades cannot be verified.
  Authors: We agree that the abstract, as a concise summary, would be strengthened by including a small number of key quantitative results. In the revised manuscript we will add brief numerical comparisons (e.g., average PSNR/SSIM gains on the synthetic and real-world test sets) while remaining within the abstract length limit.
  Revision: yes
- Referee: Training description paragraph: The framework relies on 'gated token conditioning' from the aerial-adapted video diffusion prior to supply a 'gated residual update' while claiming that 'observation-preserving objectives keep the refined representation consistent with input-supported geometry.' It is not specified whether the gate is hard or learned, nor whether the update is applied at the token level before Gaussian decoding. If the learned gate allows the prior to influence the observation-supported geometry latent, structural changes could propagate to roof and open-region Gaussians despite multi-view support, violating the evidence-imbalance premise.
  Authors: The gate is a learned soft gate realized by a small MLP followed by a sigmoid activation; it modulates only the residual tokens produced for the aerial completion branch. The observation-supported geometry latent is generated in an earlier stage and is held fixed; the residual update is added exclusively to the completion tokens before they enter the Gaussian decoder. Observation-preserving losses further penalize any deviation in well-supported regions. We will expand the methods section with an explicit description of the gate architecture, a diagram illustrating the latent separation, and quantitative gate-activation statistics showing near-zero influence on supported geometry.
  Revision: yes
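The gate described in the second response, a small MLP followed by a sigmoid, and the promised gate-activation statistics can be illustrated as follows. Everything concrete here is assumed for the example: the one-hidden-layer ReLU MLP, the single "support" feature per token, and the hand-set weights, which stand in for trained parameters and are chosen only so that high support yields a low gate.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mlp_gate(token: List[float],
             w1: List[List[float]], b1: List[float],
             w2: List[float], b2: float) -> float:
    """One-hidden-layer ReLU MLP followed by a sigmoid; returns a gate in (0, 1)."""
    hidden = [max(0.0, sum(w * t for w, t in zip(row, token)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return sigmoid(logit)


def mean_gate(gates: List[float], mask: List[bool]) -> float:
    """Average gate value over the tokens selected by mask."""
    selected = [g for g, m in zip(gates, mask) if m]
    return sum(selected) / len(selected) if selected else float("nan")


# Hand-set toy weights (NOT learned): one input feature, one hidden unit.
W1, B1, W2, B2 = [[-8.0]], [4.0], [2.0], -4.0

# Each token carries a single assumed feature: its multi-view support in [0, 1].
tokens = [[1.0], [0.9], [0.1], [0.0]]
supported = [True, True, False, False]

gates = [mlp_gate(t, W1, B1, W2, B2) for t in tokens]
mean_supported = mean_gate(gates, supported)
mean_unsupported = mean_gate(gates, [not s for s in supported])
```

A table of such averages over real scenes, split by how well each region is observed, is the kind of evidence that would back the "near-zero influence on supported geometry" statement.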
Circularity Check
No significant circularity; new architecture and training objectives are independent
The paper introduces AnyCity as a feed-forward framework that first predicts an observation-supported geometry latent from sparse aerial views, then applies scaffold-conditioned aerial completion tokens for a gated residual update before Gaussian decoding. Training uses dense-to-sparse distillation to transfer structural cues and an aerial-adapted video diffusion prior for appearance via gated token conditioning, with observation-preserving objectives to maintain consistency. No derivation step reduces by construction to its own inputs, no fitted parameters are relabeled as predictions, and no load-bearing claims rest on self-citations or imported uniqueness theorems. The central claims rest on explicitly described novel components and objectives that are not equivalent to the inputs by definition, making the chain self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Dense-to-sparse distillation transfers structural cues from dense-view reconstruction to sparse inputs.
- domain assumption: An aerial-adapted video diffusion prior can supply fine-grained urban appearance cues through gated token conditioning while preserving consistency with observed geometry.
invented entities (2)
- observation-supported geometry latent: no independent evidence
- scaffold-conditioned aerial completion tokens: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: AnyCity first predicts an observation-supported geometry latent to anchor reliable structures, and then uses scaffold-conditioned aerial completion tokens to predict a gated residual update for weakly constrained content before Gaussian decoding.
- IndisputableMonolith/Foundation/AlexanderDuality.lean, theorem alexander_duality_circle_linking
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: During training, dense-to-sparse distillation transfers structural cues from dense-view reconstruction, while an aerial-adapted video diffusion prior provides fine-grained urban appearance cues through gated token conditioning.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.