ForeSplat: Optimization-Aware Foresight for Feed-Forward 3D Gaussian Splatting

Cheng Zhang; Haoyu Wu; Jiadi Cui; Jingyi Yu; Junran Ding; Weihang Liu; Xin Lou; Yuefeng Zhang; Yujiao Shi; Yuke Li; Zixuan Wang

Review: 2 major objections · 1 minor · 67 references

Reviewed by Pith at T0; open to challenge.

T0 means a machine referee read the full paper against a public rubric. The mark states how deep the mechanical check went, never who wrote it.

T0 review · grok-4.3

ForeSplat trains feed-forward 3D Gaussian Splatting models to output initializations that converge faster and reach higher quality under subsequent optimization.

2026-05-25 06:10 UTC pith:7JV6YE4O

Load-bearing objection: MetaGrad gives a practical way to train feed-forward 3DGS predictors as better starting points for refinement, but the abstract supplies zero numbers so the key claims stay untested.

arxiv 2605.22020 v2 pith:7JV6YE4O submitted 2026-05-21 cs.CV

Yuke Li, Weihang Liu, Cheng Zhang, Yuefeng Zhang, Jiadi Cui, Zixuan Wang, Junran Ding, Haoyu Wu, Yujiao Shi, Jingyi Yu, Xin Lou

classification cs.CV

keywords: feed-forward 3D Gaussian Splatting; optimization-aware training; MetaGrad; predict-then-refine; initialization for optimization; 3D reconstruction; meta-gradient

verification ladder: T0 review · T1 audit · T2 compute · T3 formal · T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Feed-forward 3DGS models produce fast single-pass reconstructions but fall short of per-scene optimization quality because they are trained only to minimize immediate rendering error. ForeSplat adds an optimization-aware training stage that explicitly prepares the network output to serve as a good starting point for the downstream 3DGS optimizer. The key device is MetaGrad, which runs a short inner refinement loop, samples several anchor states along that trajectory, and sends aggregated first-order gradients back to the prediction network. This signal lets the feed-forward model offload part of the scene-modeling work to the optimizer, so even compact networks can produce usable high-fidelity results after only a few refinement steps. Experiments across multiple backbones show the ForeSplat initialization requires fewer refinement iterations and attains higher final quality than a conventionally trained counterpart, even when the latter is allowed to run to full convergence.

Core claim

ForeSplat equips feed-forward 3DGS models with an optimization-aware training signal via MetaGrad. MetaGrad unrolls a short inner-loop refinement trajectory, samples anchor states along that trajectory, and back-propagates aggregated first-order gradients to the prediction head as a surrogate signal. The resulting initializations converge in fewer refinement steps and reach higher peak reconstruction quality than vanilla training, even when the vanilla model is allowed to converge fully. The fine-tuning adds no cost at inference time.
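As a rough illustration of the mechanism described above, the following Python sketch runs a short inner refinement loop, picks a few anchor states, and averages their first-order gradients. It is a toy under stated assumptions, not the paper's implementation: the quadratic loss, the plain gradient-descent inner optimizer, the uniform averaging, the anchor indices, and all function names are hypothetical, and a free parameter vector stands in for the prediction head's output.

```python
# Toy sketch of a multi-anchor, first-order meta-gradient update.
# ASSUMPTIONS (not from the paper): quadratic loss, gradient descent as the
# inner optimizer, uniform averaging over anchors, hypothetical names.


def loss(theta, target, curvature):
    """Quadratic stand-in for a rendering loss: 0.5 * sum(c * (theta - target)^2)."""
    return 0.5 * sum(c * (th - tg) ** 2 for th, tg, c in zip(theta, target, curvature))


def loss_grad(theta, target, curvature):
    """Gradient of the quadratic loss with respect to theta."""
    return [c * (th - tg) for th, tg, c in zip(theta, target, curvature)]


def refine(theta0, target, curvature, steps, lr):
    """Run the inner optimizer and return all states theta_0 .. theta_steps."""
    states = [list(theta0)]
    for _ in range(steps):
        grad = loss_grad(states[-1], target, curvature)
        states.append([th - lr * g for th, g in zip(states[-1], grad)])
    return states


def multi_anchor_first_order_grad(theta0, target, curvature, steps, lr, anchors):
    """Average the loss gradients evaluated at the anchor states.

    First-order surrogate: d(theta_k)/d(theta_0) is treated as the identity,
    so nothing is back-propagated through the inner steps.
    """
    states = refine(theta0, target, curvature, steps, lr)
    grads = [loss_grad(states[k], target, curvature) for k in anchors]
    return [sum(column) / len(anchors) for column in zip(*grads)]


def train_prediction(theta0, target, curvature, outer_steps=50, outer_lr=0.5,
                     inner_steps=4, inner_lr=0.1, anchors=(1, 2, 4)):
    """Outer loop: move the predicted initialization along the surrogate gradient."""
    theta = list(theta0)
    for _ in range(outer_steps):
        g = multi_anchor_first_order_grad(
            theta, target, curvature, inner_steps, inner_lr, anchors)
        theta = [th - outer_lr * gi for th, gi in zip(theta, g)]
    return theta
```

In this toy, `train_prediction` moves the starting point so that the states visited during refinement have low loss. The anchor set `(1, 2, 4)` and the step sizes are arbitrary choices for the example; the text above does not state which anchors or weights the paper uses.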

What carries the argument

MetaGrad, a multi-anchor meta-gradient rule that unrolls a short inner refinement trajectory, samples anchor states, and aggregates first-order gradients to supply an optimization-aware training signal to the feed-forward prediction head.

Load-bearing premise

Sampling states along a short inner refinement trajectory and back-propagating aggregated first-order gradients produces a training signal that improves the quality of the predicted initialization for the downstream optimizer.

What would settle it

An experiment that applies the same number of refinement steps to both ForeSplat and vanilla initializations and finds no consistent advantage in convergence speed or final PSNR/SSIM would falsify the central claim.
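One way to set up such a check is sketched below: record PSNR after every refinement step for both initializations under an identical step budget, then compare steps-to-threshold and final PSNR. The function names and the threshold parameter are hypothetical, and the PSNR formula assumes images scaled to a known peak value.

```python
import math


def psnr_from_mse(mse, peak=1.0):
    """PSNR in dB for a given mean squared error and peak signal value."""
    return 10.0 * math.log10(peak * peak / mse)


def steps_to_reach(psnr_curve, threshold_db):
    """Index of the first refinement step whose PSNR meets the threshold, or None."""
    for step, value in enumerate(psnr_curve):
        if value >= threshold_db:
            return step
    return None


def compare_initializations(curve_a, curve_b, threshold_db):
    """Compare two PSNR-vs-step curves recorded under the same step budget."""
    if len(curve_a) != len(curve_b):
        raise ValueError("curves must cover the same number of refinement steps")
    return {
        "steps_a": steps_to_reach(curve_a, threshold_db),
        "steps_b": steps_to_reach(curve_b, threshold_db),
        "final_gain_db": curve_a[-1] - curve_b[-1],
    }
```

Under this setup, the central claim would be in trouble if `final_gain_db` showed no consistent sign across scenes and `steps_a` was not reliably smaller than `steps_b`.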

If this is right

A ForeSplat-trained initialization converges in fewer refinement steps than its vanilla counterpart.
It reaches higher final reconstruction quality than a vanilla model even after the latter has converged fully.
Compact networks can still deliver high-fidelity results because part of the modeling burden is shifted to the optimizer.
The added training stage incurs no extra cost at inference time.
The same framework can be applied to multiple different feed-forward backbones without changing their inference architecture.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same surrogate-gradient idea could be tested on other amortized reconstruction pipelines that are followed by a test-time optimizer.
If the short-trajectory assumption holds more broadly, similar meta-gradient rules might reduce the data requirements for training feed-forward models in other inverse problems.
Edge-deployed distilled variants become more practical because the method explicitly tolerates smaller network capacity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Desk Editor's Note

MetaGrad gives a practical way to train feed-forward 3DGS predictors as better starting points for refinement, but the abstract supplies zero numbers so the key claims stay untested.

The new piece is MetaGrad: it unrolls a short inner-loop trajectory on the 3DGS optimizer, samples a few anchor states, and back-propagates aggregated first-order gradients to the prediction head. This avoids expensive higher-order derivatives while giving the feed-forward model an explicit signal to produce inits that the optimizer likes. The rest of ForeSplat is mostly the usual predict-then-refine setup applied to AnySplat, Pi3X, and a distilled edge variant.

The practical upside is real: no extra inference cost, and the model can be smaller because some modeling work is handed to the optimizer. That matches a genuine bottleneck when large 3D annotations are scarce. The abstract also claims the ForeSplat inits reach higher final quality than vanilla ones even after full convergence, which would be the interesting result if it holds.

The soft spot is obvious and large: the abstract contains no PSNR, SSIM, step counts, or ablation numbers at all. Without those, there is no way to judge whether the short-horizon surrogate actually steers toward better attractors or merely accelerates the first 50-100 iterations. The stress-test worry about curvature that only appears later in the trajectory is therefore still live; the paper would need to show that the final converged quality improves, not just early progress. The method description itself looks internally consistent and does not rely on circular fitting.

This is aimed at people already working on feed-forward 3D reconstruction who want to close the gap with per-scene optimization without blowing up model size. A reader who cares about hybrid amortized-plus-refinement pipelines could get something usable from it if the experiments check out. It deserves a serious referee to look at the full results and the exact inner-loop length and anchor choices.

Referee Report

2 major / 1 minor

Summary. The manuscript presents ForeSplat, an optimization-aware training framework for feed-forward 3D Gaussian Splatting models. It introduces MetaGrad, a lightweight meta-gradient rule that unrolls a short inner-loop refinement trajectory, samples anchor states along it, and back-propagates aggregated first-order gradients to the feed-forward prediction head. This produces initializations explicitly suited for downstream per-scene optimization. The central claim is that ForeSplat-trained initializations converge in fewer steps and reach higher final reconstruction quality than vanilla counterparts across tested backbones (AnySplat, Pi3X, distilled edge variant), even after full convergence of the identical 3DGS optimizer.

Significance. If the empirical claims hold, the work would meaningfully advance practical 3D reconstruction by allowing compact feed-forward networks to offload capacity demands to a few refinement steps while still matching or exceeding per-scene optimization quality. The avoidance of higher-order differentiation via first-order aggregation is a pragmatic engineering contribution that could enable faster iteration in real-time or resource-constrained settings.

major comments (2)

[Abstract / MetaGrad description] The claim that ForeSplat initializations reach strictly higher peak quality 'even fully converged' is load-bearing for the paper's contribution, yet rests on the unexamined assumption that aggregated first-order gradients from a short unroll length constitute a faithful proxy for the long-horizon basin of the 3DGS optimizer. No analysis, ablation, or diagnostic is supplied showing that curvature or saddle structure visible only after many more steps does not cause the surrogate to optimize early-trajectory speed at the expense of the final attractor.
[Experimental validation] The abstract asserts superiority 'across all tested architectures' with faster convergence and higher final quality, but supplies no quantitative results, tables, figures, error bars, or specific metrics (PSNR/SSIM deltas after full convergence, step counts, statistical tests). Without these, the central claim cannot be evaluated for effect size or robustness.

minor comments (1)

[Abstract] The abstract is dense; separating the problem statement, MetaGrad mechanism, and empirical claims into shorter sentences would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below and indicate planned revisions to strengthen the manuscript.

Referee: [Abstract / MetaGrad description] the claim that ForeSplat initializations reach strictly higher peak quality 'even fully converged' is load-bearing for the paper's contribution, yet rests on the unexamined assumption that aggregated first-order gradients from a short unroll length constitute a faithful proxy for the long-horizon basin of the 3DGS optimizer. No analysis, ablation, or diagnostic is supplied showing that curvature or saddle structure visible only after many more steps does not cause the surrogate to optimize early-trajectory speed at the expense of the final attractor.

Authors: We acknowledge that the manuscript does not provide an explicit diagnostic or ablation examining long-horizon curvature or saddle points beyond the chosen unroll length. The central evidence remains the empirical observation that ForeSplat initializations reach higher final quality after identical full convergence of the 3DGS optimizer. In revision we will add a dedicated paragraph discussing the rationale for the short unroll, its practical limitations as a surrogate, and any observed sensitivity to unroll length.
Revision: partial
Referee: [Experimental validation] the abstract asserts superiority 'across all tested architectures' with faster convergence and higher final quality, but supplies no quantitative results, tables, figures, error bars, or specific metrics (PSNR/SSIM deltas after full convergence, step counts, statistical tests). Without these, the central claim cannot be evaluated for effect size or robustness.

Authors: The full manuscript contains the requested quantitative results, tables, convergence plots, and per-architecture metrics in the experimental section. The abstract, however, summarizes these findings without specific numbers. We will revise the abstract to report key effect sizes (e.g., average PSNR/SSIM deltas at convergence and step-count reductions) together with references to the corresponding tables and figures.
Revision: yes
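The effect sizes mentioned in this response could be computed per scene along the following lines. This sketch assumes one final PSNR value per evaluation scene for each model, and reports the mean gain, the median gain, and the fraction of scenes with a positive gain (the same three statistics named in the Figure 4 caption). Function and key names are hypothetical.

```python
from statistics import mean, median


def summarize_psnr_gains(psnr_method, psnr_baseline):
    """Summarize per-scene PSNR differences (method minus baseline), in dB."""
    if len(psnr_method) != len(psnr_baseline) or not psnr_method:
        raise ValueError("need two non-empty lists with one PSNR value per scene")
    gains = [m - b for m, b in zip(psnr_method, psnr_baseline)]
    return {
        "mean_gain_db": mean(gains),
        "median_gain_db": median(gains),
        # Share of scenes where the method beats the baseline.
        "positive_scene_ratio": sum(g > 0 for g in gains) / len(gains),
    }
```

Reporting the median and the positive-scene ratio alongside the mean guards against a few outlier scenes dominating the average.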

Circularity Check

0 steps flagged

No circularity; MetaGrad surrogate is an independent training signal

The paper's core contribution is MetaGrad, which constructs a training signal for the feed-forward head by unrolling a short inner-loop trajectory of the 3DGS optimizer, sampling anchors, and aggregating first-order gradients. This is a deliberate methodological choice that does not reduce to a self-definition, a fitted parameter renamed as prediction, or a self-citation chain. The claimed benefit (faster convergence and higher final quality) is presented as an empirical outcome of this surrogate rather than being true by construction of the inputs. No load-bearing self-citations or ansatz smuggling appear in the provided description. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no concrete free parameters, axioms, or invented entities can be extracted or verified.

pith-pipeline@v0.9.0 · 5882 in / 1096 out tokens · 48876 ms · 2026-05-25T06:10:17.948172+00:00 · methodology

Original abstract

Feed-forward 3D Gaussian Splatting models offer fast single-pass reconstruction, but scaling them to match per-scene optimization quality is fundamentally hindered by the scarcity of large-scale 3D annotations. A practical compromise is predict-then-refine, where post-prediction optimization compensates for the limited capacity of the feed-forward network. However, standard feed-forward 3DGS is trained solely for zero-step rendering error, ignoring whether its output constitutes a good initialization for the downstream optimizer. We present ForeSplat, an optimization-aware training framework that equips feed-forward 3DGS models to produce initializations explicitly designed for rapid, effective refinement. By offloading part of the scene-modeling burden to the optimizer, ForeSplat substantially reduces the capacity pressure on the feed-forward model, making high-quality reconstruction feasible even with compact networks. At its core is MetaGrad, a lightweight multi-anchor meta-gradient training rule that bypasses costly higher-order differentiation through the 3DGS optimizer. MetaGrad unrolls a short inner-loop refinement trajectory, samples anchor states, and back-propagates aggregated first-order gradients to the prediction head as a surrogate optimization-aware signal. This fine-tuning adds no inference cost and enables high-quality reconstruction within seconds after a few refinement steps. We instantiate ForeSplat on diverse backbones, including AnySplat, Pi3X, and a distilled variant tailored for edge deployment. Across all tested architectures, a ForeSplat-trained initialization converges in fewer refinement steps and reaches a higher peak reconstruction quality than its vanilla counterpart, even fully converged. The framework consistently bridges the gap between amortized prediction and per-scene optimization, establishing a practical path toward lightweight, high-fidelity 3D reconstruction.

Figures

Figures reproduced from arXiv: 2605.22020 by Cheng Zhang, Haoyu Wu, Jiadi Cui, Jingyi Yu, Junran Ding, Weihang Liu, Xin Lou, Yuefeng Zhang, Yujiao Shi, Yuke Li, Zixuan Wang.

**Figure 1.** ForeSplat equips feed-forward 3D Gaussian Splatting with an optimization-aware training objective, making its predictions …

**Figure 2.** Overview of ForeSplat. Given a set of uncalibrated input images, a feed-forward 3DGS model predicts initial Gaussians.

**Figure 3.** Comparison between native K-step unrolling and MetaGrad. Top: native K-step unrolling for second-order meta-learning differentiates through the full refinement trajectory, requiring sequential back-propagation across all inner steps and accumulating ill-conditioned gradients. Bottom: MetaGrad samples sparse anchor states along the trajectory and routes their first-order surrogate gradients directly back…

**Figure 4.** Per-scene PSNR gain at 2,000 post-optimization steps. For each backbone, each point denotes one evaluation scene and the vertical axis reports PSNR_MetaGrad − PSNR_vanilla at 2,000 post-optimization steps. Box plots summarize the scene-level distribution, and the inset statistics report the mean, median, and positive-scene ratio. … in feed-forward architecture, parameter count, and training data, the direction …

**Figure 7.** A lightweight capture-and-refine camera that produces …

**Figure 5.** Qualitative comparison of vanilla and MetaGrad before and after post-optimization. MetaGrad starts from weaker zero-step …

**Figure 6.** Qualitative comparison of 3D reconstruction quality between AnySplat and our ForeSplat (ours). The upper block shows two …

**Figure 8.** Post-optimization trajectories of the three backbones. (a) AnySplat, (b) Pi3X, and (c) Distill Pi3X. Within each row, we report …

**Figure 9.** Effect of λ on the three backbones. (a) AnySplat, (b) Pi3X, and (c) Distill Pi3X. Within each row, we report PSNR, SSIM, and LPIPS measured at 2,000 post-optimization steps as a function of the loss-balancing coefficient λ ∈ {0.0, 0.25, 0.5, 0.75, 1.0}. The two endpoints correspond to vanilla supervised fine-tuning (λ = 1.0) and a pure-meta variant (λ = 0.0), respectively. Across all three backbones, the b…
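
The Figure 9 caption fixes only the two endpoints of the loss-balancing coefficient: λ = 1.0 is vanilla supervised fine-tuning and λ = 0.0 is the pure-meta variant. A convex combination is one form consistent with those endpoints. The expression below is an assumption for illustration, and the symbols $\mathcal{L}_{\text{zero-step}}$ and $\mathcal{L}_{\text{meta}}$ are placeholders that are not taken from the paper.

```latex
\mathcal{L}_{\text{total}}(\lambda)
  = \lambda \, \mathcal{L}_{\text{zero-step}}
  + (1 - \lambda) \, \mathcal{L}_{\text{meta}},
  \qquad \lambda \in [0, 1]
```

Setting λ = 1 leaves only the zero-step rendering term, and setting λ = 0 leaves only the optimization-aware term, which matches the endpoints described in the caption.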

Review history (2 revisions)

Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages · 4 internal anchors

[1]

Learning to learn by gradient descent by gradient descent

Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, and Nando De Freitas. Learning to learn by gradient descent by gradient descent. 2016. 3

work page 2016
[2]

How to train your MAML

Antreas Antoniou, Harrison Edwards, and Amos Storkey. How to train your MAML.arXiv preprint arXiv:1810.09502,

work page internal anchor Pith review Pith/arXiv arXiv
[3]

Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields

Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields. InProc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 5855–5864, 2022. 3

work page 2022
[4]

ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data

Gilad Baruch, Zhuoyuan Chen, Afshin Dehghan, Tal Dimry, Yuri Feigin, Peter Fu, Thomas Gebauer, Brandon Joffe, Daniel Kurz, Arik Schwartz, et al. ARKitScenes: A Diverse Real-World Dataset for 3D Indoor Scene Understanding Us- ing Mobile RGB-D Data.arXiv preprint arXiv:2111.08897,

work page internal anchor Pith review Pith/arXiv arXiv
[5]

pixelSplat: 3D Gaussian Splats from Im- age Pairs for Scalable Generalizable 3D Reconstruction

David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelSplat: 3D Gaussian Splats from Im- age Pairs for Scalable Generalizable 3D Reconstruction. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 19457–19467, 2024. 2, 3

work page 2024
[6]

MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images. InProc. Eur. Conf. Comput. Vis., 2024. 2, 3

work page 2024
[7]

InstantSplat: Sparse-view Gaussian Splatting in Seconds

Zhiwen Fan, Wenyan Cong, Kairun Wen, Kevin Wang, Jian Zhang, Xinghao Ding, Danfei Xu, Boris Ivanovic, Marco Pavone, Georgios Pavlakos, et al. InstantSplat: Sparse-view gaussian splatting in seconds.arXiv preprint arXiv:2403.20309, 2024. 2, 3, 7

work page Pith review arXiv 2024
[8]

Model- Agnostic Meta-Learning for Fast Adaptation of Deep Net- works

Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model- Agnostic Meta-Learning for Fast Adaptation of Deep Net- works. InProc. Int. Conf. Mach. Learn., pages 1126–1135,

work page
[9]

Yang Fu, Xiaolong Wang, Sifei Liu, Amey Kulkarni, Jan Kautz, and Alexei A. Efros. COLMAP-Free 3D Gaussian Splatting. InProc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 20796–20805, 2024. 3

work page 2024
[10]

PanoVGGT: Feed-Forward 3D Reconstruction from Panoramic Imagery

Yijing Guo, Mengjun Chao, Luo Wang, Tianyang Zhao, Haizhao Dai, Yingliang Zhang, Jingyi Yu, and Yujiao Shi. PanoVGGT: Feed-Forward 3D Reconstruction from Panoramic Imagery. InProc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2026. 3

work page 2026
[11]

Robust Stochastically-Descending Unrolled Net- works.IEEE Trans

Samar Hadou, Navid NaderiAlizadeh, and Alejandro Ribeiro. Robust Stochastically-Descending Unrolled Net- works.IEEE Trans. Image Process., 72:5484–5499, 2024. 2

work page 2024
[12]

A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases

James Harrison, Luke Metz, and Jascha Sohl-Dickstein. A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases. InProc. Adv. Neural Inform. Process. Syst., 2022. 2

work page 2022
[13]

LRM: Large Reconstruction Model for Single Image to 3D, 2024

Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. LRM: Large Reconstruction Model for Single Image to 3D, 2024. 3

work page 2024
[14]

MegaSynth: Scaling Up 3D Scene Reconstruction with Syn- thesized Data

Hanwen Jiang, Zexiang Xu, Desai Xie, Ziwen Chen, Haian Jin, Fujun Luan, Zhixin Shu, Kai Zhang, Sai Bi, Xin Sun, Ji- uxiang Gu, Qixing Huang, Georgios Pavlakos, and Hao Tan. MegaSynth: Scaling Up 3D Scene Reconstruction with Syn- thesized Data. InProc. IEEE/CVF Conf. Comput. Vis. Pat- tern Recog., 2025. 2

work page 2025
[15]

AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views.ACM Trans

Lihan Jiang, Yucheng Mao, Linning Xu, Tao Lu, Kerui Ren, Yichen Jin, Xudong Xu, Mulin Yu, Jiangmiao Pang, Feng Zhao, et al. AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views.ACM Trans. Graph., 44(6):1–16,

work page
[16]

3D Gaussian Splatting for Real-Time Radiance Field Rendering.ACM Trans

Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3D Gaussian Splatting for Real-Time Radiance Field Rendering.ACM Trans. Graph., 42(4), 2023. 3, 4

work page 2023
[17]

Adam: A Method for Stochastic Optimization

Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization.CoRR, abs/1412.6980, 2014. 3

work page internal anchor Pith review Pith/arXiv arXiv 2014
[18]

Ground- ing Image Matching in 3D with MASt3R

Vincent Leroy, Yohann Cabon, and Jerome Revaud. Ground- ing Image Matching in 3D with MASt3R. InProc. Eur. Conf. Comput. Vis., 2024. 3

work page 2024
[19]

MatrixCity: A Large- Scale City Dataset for City-Scale Neural Rendering and Be- yond

Yixuan Li, Lihan Jiang, Linning Xu, Yuanbo Xiangli, Zhen- zhi Wang, Dahua Lin, and Bo Dai. MatrixCity: A Large- Scale City Dataset for City-Scale Neural Rendering and Be- yond. InProc. IEEE Int. Conf. Comput. Vis., pages 3205– 3215, 2023. 7

work page 2023
[20]

BARF: Bundle-Adjusting Neural Radiance Fields

Chen-Hsuan Lin, Wei-Chiu Ma, Antonio Torralba, and Si- mon Lucey. BARF: Bundle-Adjusting Neural Radiance Fields. InProc. IEEE Int. Conf. Comput. Vis., 2021. 3

work page 2021
[21]

DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision

Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, et al. DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision. InProc. IEEE/CVF Conf. Com- put. Vis. Pattern Recog., pages 22160–22169, 2024. 7

work page 2024
[22]

MVSGaussian: Fast Generalizable Gaussian Splatting Re- construction from Multi-View Stereo

Tianqi Liu, Guangcong Wang, Shoukang Hu, Liao Shen, Xinyi Ye, Yuhang Zang, Zhiguo Cao, Wei Li, and Ziwei Liu. MVSGaussian: Fast Generalizable Gaussian Splatting Re- construction from Multi-View Stereo. InProc. Eur. Conf. Comput. Vis., 2024. 2, 3

work page 2024
[23]

Content-Aware Radiance Fields: Aligning Model Complex- ity with Scene Intricacy Through Learned Bitwidth Quanti- zation

Weihang Liu, Xue Xian Zheng, Jingyi Yu, and Xin Lou. Content-Aware Radiance Fields: Aligning Model Complex- ity with Scene Intricacy Through Learned Bitwidth Quanti- zation. InProc. Eur. Conf. Comput. Vis., 2024. 3

work page 2024
[24]

Al- Naffouri, Jingyi Yu, and Xin Lou

Weihang Liu, Xue Xian Zheng, Yuke Li, Tareq Y . Al- Naffouri, Jingyi Yu, and Xin Lou. CoARF++: Content- Aware Radiance Field Aligning Model Complexity With Scene Intricacy.IEEE Trans. Vis. Comput. Graph., pages 1–14, 2025. 3

work page 2025
[25]

CityGo: Lightweight Urban Model- ing and Rendering with Proxy Buildings and Residual Gaus- sians

Weihang Liu, Yuhui Zhong, Yuke Li, Xi Chen, Jiadi Cui, Honglong Zhang, Lan Xu, Xin Lou, Yujiao Shi, Jingyi Yu, and Yingliang Zhang. CityGo: Lightweight Urban Model- ing and Rendering with Proxy Buildings and Residual Gaus- sians. InProc. ACM SIGGRAPH Asia, 2025. 3

work page 2025
[26]

Duplex-GS: Proxy-Guided Weighted Blending for Real-Time Order-Independent Gaussian Splatting.IEEE Trans

Weihang Liu, Yuke Li, Yuxuan Li, Jingyi Yu, and Xin Lou. Duplex-GS: Proxy-Guided Weighted Blending for Real-Time Order-Independent Gaussian Splatting.IEEE Trans. Circuits Syst. Video Technol., 2026. 3

work page 2026
[27]

Srinivasan, Matthew Tancik, Jonathan T

Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. InProc. Eur. Conf. Comput. Vis., 2020. 3

work page 2020
[28]

Instant Neural Graphics Primitives with a Mul- tiresolution Hash Encoding.ACM Trans

Thomas M ¨uller, Alex Evans, Christoph Schied, and Alexan- der Keller. Instant Neural Graphics Primitives with a Mul- tiresolution Hash Encoding.ACM Trans. Graph., 41(4),

work page
[29]

On First-Order Meta-Learning Algorithms

Alex Nichol, Joshua Achiam, and John Schulman. On first-order meta-learning algorithms.arXiv preprint arXiv:1803.02999, 2018. 3, 5

work page internal anchor Pith review Pith/arXiv arXiv 2018
[30]

EcoSplat: Efficiency-controllable Feed-forward 3D Gaus- sian Splatting from Multi-view Images

Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonza- lez Bello, Jaeho Moon, Jihyong Oh, and Munchurl Kim. EcoSplat: Efficiency-controllable Feed-forward 3D Gaus- sian Splatting from Multi-view Images. InProc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2025. 3

work page 2025
[31]

Meta-learning with implicit gradients

Aravind Rajeswaran, Chelsea Finn, Sham M Kakade, and Sergey Levine. Meta-learning with implicit gradients. In Proc. Adv. Neural Inform. Process. Syst., 2019. 3

work page 2019
[32]

Vi- sion Transformers for Dense Prediction.ArXiv preprint,

Ren ´e Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vi- sion Transformers for Dense Prediction.ArXiv preprint,

work page
[33]

Com- mon Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction

Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, and David Novotny. Com- mon Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction. InProc. IEEE Int. Conf. Comput. Vis., pages 10901–10911, 2021. 7

work page 2021
[34]

Sch ¨onberger and Jan-Michae Frahm

Johannes L. Sch ¨onberger and Jan-Michae Frahm. Structure- from-Motion Revisited. InProc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 4104–4113, 2016. 3

work page 2016
[35]

MetaSDF: Meta-learning Signed Distance Functions.Proc

Vincent Sitzmann, Eric Chan, Richard Tucker, Noah Snavely, and Gordon Wetzstein. MetaSDF: Meta-learning Signed Distance Functions.Proc. Adv. Neural Inform. Pro- cess. Syst., 33:10136–10147, 2020. 3

work page 2020
[36]

FewShotNeRF: Meta-Learning- based Novel View Synthesis for Rapid Scene-Specific Adap- tation, 2024

Piraveen Sivakumar, Paul Janson, Jathushan Rajasegaran, and Thanuja Ambegoda. FewShotNeRF: Meta-Learning- based Novel View Synthesis for Rapid Scene-Specific Adap- tation, 2024. 3

work page 2024
[37]

Splatt3R: Zero-shot Gaussian Splat- ting from Uncalibrated Image Pairs

Brandon Smart, Chuanxia Zheng, Iro Laina, and Vic- tor Adrian Prisacariu. Splatt3R: Zero-shot Gaussian Splat- ting from Uncalibrated Image Pairs. 2024. 3

work page 2024
[38]

Seitz, and Richard Szeliski

Noah Snavely, Steven M. Seitz, and Richard Szeliski. Skele- tal graphs for efficient structure from motion. InProc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 1–8,

work page
[39]

Splatter Image: Ultra-Fast Single-View 3D Recon- struction

Stanislaw Szymanowicz, Christian Rupprecht, and Andrea Vedaldi. Splatter Image: Ultra-Fast Single-View 3D Recon- struction. InProc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2024. 3

work page 2024
[40]

Henriques, Christian Rup- precht, and Andrea Vedaldi

Stanislaw Szymanowicz, Eldar Insafutdinov, Chuanxia Zheng, Dylan Campbell, Jo ˜ao F. Henriques, Christian Rup- precht, and Andrea Vedaldi. Flash3D: Feed-Forward Gener- alisable 3D Scene Reconstruction from a Single Image. In 2025 International Conference on 3D Vision (3DV), 2025. 3

work page 2025
[41]

Learned Initializations for Optimizing Coordinate- Based Neural Representations

Matthew Tancik, Ben Mildenhall, Terrance Wang, Divi Schmidt, Pratul P Srinivasan, Jonathan T Barron, and Ren Ng. Learned Initializations for Optimizing Coordinate- Based Neural Representations. InProc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 2846–2855, 2021. 3

work page 2021
[42]

LGM: Large Multi-View Gaus- sian Model for High-Resolution 3D Content Creation

Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. LGM: Large Multi-View Gaus- sian Model for High-Resolution 3D Content Creation. In Proc. Eur. Conf. Comput. Vis., 2024. 3

work page 2024
[43]

VGGT: Visual Geometry Grounded Transformer

Jianyuan Wang, Minghao Chen, Nikita Karaev, An- drea Vedaldi, Christian Rupprecht, and David Novotny. VGGT: Visual Geometry Grounded Transformer. InProc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2025. 3

work page 2025
[44]

Flow-motion and depth network for monocular stereo and beyond.IEEE Robotics and Automation Letters, 5(2):3307–3314, 2020

Kaixuan Wang and Shaojie Shen. Flow-motion and depth network for monocular stereo and beyond.IEEE Robotics and Automation Letters, 5(2):3307–3314, 2020. 7

work page 2020
[45]

DUSt3R: Geometric 3D Vision Made Easy

Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. DUSt3R: Geometric 3D Vision Made Easy. InProc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2024. 3

work page 2024
[46]

TartanAir: A Dataset to Push the Limits of Visual SLAM

Wenshan Wang, Delong Zhu, Xiangwei Wang, Yaoyu Hu, Yuheng Qiu, Chen Wang, Yafei Hu, Ashish Kapoor, and Se- bastian Scherer. TartanAir: A Dataset to Push the Limits of Visual SLAM. In2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4909–4916. IEEE, 2020. 7

work page 2020
[47] Yunsong Wang, Tianxin Huang, Hanlin Chen, and Gim Hee Lee. FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes. In Proc. Adv. Neural Inform. Process. Syst., 2024.
[48] Yifan Wang, Jianjun Zhou, Haoyi Zhu, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Jiangmiao Pang, Chunhua Shen, and Tong He. YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting. In Proc. Int. Conf. Learn. Represent., 2026.
[49] Yifan Wang, Jianjun Zhou, Haoyi Zhu, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Jiangmiao Pang, Chunhua Shen, and Tong He. π³: Permutation-Equivariant Visual Geometry Learning. In Int. Conf. Learn. Represent., 2026.
[50] Christopher Wewer, Kevin Raj, Eddy Ilg, Bernt Schiele, and Jan Eric Lenssen. latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction. In Proc. Eur. Conf. Comput. Vis., 2024.
[51] Hongchi Xia, Yang Fu, Sifei Liu, and Xiaolong Wang. RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22378–22389, 2024.
[52] Bingyu Xin, Meng Ye, Leon Axel, and Dimitris N. Metaxas. Rethinking Deep Unrolled Model for Accelerated MRI Reconstruction. In Proc. Eur. Conf. Comput. Vis., 2024.
[53] Dejia Xu, Ye Yuan, Morteza Mardani, Sifei Liu, Jiaming Song, Zhangyang Wang, and Arash Vahdat. AGG: Amortized Generative 3D Gaussians for Single Image to 3D. arXiv preprint 2401.04099, 2024.
[54] Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, and Marc Pollefeys. DepthSplat: Connecting Gaussian Splatting and Depth. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2025.
[55] Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, and Matt Feiszli. Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2025.
[56] Yao Yao, Zixin Luo, Shiwei Li, Jingyang Zhang, Yufan Ren, Lei Zhou, Tian Fang, and Long Quan. BlendedMVS: A Large-Scale Dataset for Generalized Multi-View Stereo Networks. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 1790–1799, 2020.
[57] Botao Ye, Sifei Liu, Haofei Xu, Li Xueting, Marc Pollefeys, Ming-Hsuan Yang, and Peng Songyou. No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images. In Proc. Int. Conf. Learn. Represent.,
[58] Chandan Yeshwanth, Yueh-Cheng Liu, Matthias Nießner, and Angela Dai. ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes. In Proc. IEEE Int. Conf. Comput. Vis., pages 12–22, 2023.
[59] Xu Yinghao, Shi Zifan, Yifan Wang, Chen Hansheng, Yang Ceyuan, Peng Sida, Shen Yujun, and Wetzstein Gordon. GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation. In Proc. Eur. Conf. Comput. Vis., 2024.
[60] Alex Yu, Vickie Ye, Matthew Tancik, and Angjoo Kanazawa. pixelNeRF: Neural Radiance Fields from One or Few Images. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog.,
[61] Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, and Zexiang Xu. GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting. In Proc. Eur. Conf. Comput. Vis., 2024.
[62] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In CVPR, 2018.
[63] Chen Ziwen, Hao Tan, Kai Zhang, Sai Bi, Fujun Luan, Yicong Hong, Li Fuxin, and Zexiang Xu. Long-LRM: Long-Sequence Large Reconstruction Model for Wide-Coverage Gaussian Splats. In Proc. IEEE Int. Conf. Comput. Vis., pages 4349–4359, 2025.
[Figure: qualitative comparison of AnySplat (vanilla) and AnySplat + MetaGrad; panels: Input, GT, Novel View (step 0, step 2000), Detail.]
[64] MetaGrad Pseudocode. Algorithm 1 summarizes one iteration of the MetaGrad training rule within the ForeSplat framework on a single training tuple I. The notation matches Section 3.4. Algorithm 1: MetaGrad training rule within ForeSplat (one training iteration). Data: tuple I; weights Θ of the FF-3DGS network f_Θ; host loss L_A; maximum post-optimization step K_max; anchor stride Δ; i...
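The MetaGrad excerpt above is cut off before the body of Algorithm 1, so the following is only a toy sketch of the general idea described for it: refine a predicted initialization for a few steps, take loss gradients at anchor states spaced by a stride, and pass their aggregate back to the predictor with a first-order approximation. The quadratic loss, the linear predictor `theta @ x`, the plain gradient-descent refiner, the mean aggregation, and every function name are assumptions made for illustration; none of them is taken from the paper.

```python
import numpy as np

def inner_loss_grad(g, A, y):
    """Gradient of the toy refinement loss 0.5 * ||A g - y||^2 with respect to g."""
    return A.T @ (A @ g - y)

def first_order_anchor_gradient(g0, A, y, k_max, stride, lr):
    """Run plain gradient steps from g0 and average the loss gradients taken
    at every `stride`-th state up to step k_max (anchors, including step 0).

    First-order assumption: d g_k / d g0 is treated as the identity, so each
    anchor gradient is passed back to g0 unchanged. The mean is a hypothetical
    aggregation choice.
    """
    g = g0.copy()
    anchor_grads = []
    for k in range(k_max + 1):
        grad = inner_loss_grad(g, A, y)
        if k % stride == 0:
            anchor_grads.append(grad)
        g = g - lr * grad  # stand-in for the downstream optimizer step
    return np.mean(anchor_grads, axis=0), len(anchor_grads)

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
y = rng.normal(size=6)
x = rng.normal(size=3)           # stand-in for features of the input tuple
theta = rng.normal(size=(4, 3))  # toy linear predictor: g0 = theta @ x

g0 = theta @ x
agg_grad, n_anchors = first_order_anchor_gradient(g0, A, y, k_max=8, stride=4, lr=0.01)
theta_grad = np.outer(agg_grad, x)  # chain rule through g0 = theta @ x
```

With `k_max=8` and `stride=4` the anchors fall at steps 0, 4, and 8. How the actual method weights anchors or combines this signal with the host loss L_A is not recoverable from the truncated excerpt, so this sketch leaves it out.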
[65] Pi3X Gaussian Head: Architecture and Training Protocol. This section details the construction and pre-training of the Gaussian head attached to the Pi3X backbone, which turns Pi3X into the FF-3DGS network f_Θ used throughout Section 3.4. Architecture. The Gaussian head is a lightweight DPT-style [32] decoder grafted onto the frozen Pi3X transformer. It ta...
[66] Distill Pi3X: Architecture and Training Protocol. This section details the construction of Distill Pi3X, the lightweight backbone introduced in Section 4.1. Architecture. Distill Pi3X is obtained by distilling Pi3X, which couples a DINOv2 Large encoder with a 36-layer Transformer decoder, into a student that pairs a DINOv2 Base encoder with a 24-layer decod...
[67] Continuous Post-Optimization Trajectories. This section complements Sections 4.2 and 4.3 by reporting the underlying post-optimization trajectories at a finer step resolution. The figures visualize the evolution of the metrics over the same 2,000-step window summarized in Section 4.2, and provide the complete λ sweep trajectories on all three backbones. Ful...

[1] Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, and Nando De Freitas. Learning to learn by gradient descent by gradient descent. 2016.

[2] Antreas Antoniou, Harrison Edwards, and Amos Storkey. How to train your MAML. arXiv preprint arXiv:1810.09502,

[3] Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 5855–5864, 2022.

[4] Gilad Baruch, Zhuoyuan Chen, Afshin Dehghan, Tal Dimry, Yuri Feigin, Peter Fu, Thomas Gebauer, Brandon Joffe, Daniel Kurz, Arik Schwartz, et al. ARKitScenes: A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data. arXiv preprint arXiv:2111.08897,

[5] David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 19457–19467, 2024.

[6] Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images. In Proc. Eur. Conf. Comput. Vis., 2024.

[7] Zhiwen Fan, Wenyan Cong, Kairun Wen, Kevin Wang, Jian Zhang, Xinghao Ding, Danfei Xu, Boris Ivanovic, Marco Pavone, Georgios Pavlakos, et al. InstantSplat: Sparse-view Gaussian Splatting in Seconds. arXiv preprint arXiv:2403.20309, 2024.

[8] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In Proc. Int. Conf. Mach. Learn., pages 1126–1135,

[9] Yang Fu, Xiaolong Wang, Sifei Liu, Amey Kulkarni, Jan Kautz, and Alexei A. Efros. COLMAP-Free 3D Gaussian Splatting. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 20796–20805, 2024.

[10] Yijing Guo, Mengjun Chao, Luo Wang, Tianyang Zhao, Haizhao Dai, Yingliang Zhang, Jingyi Yu, and Yujiao Shi. PanoVGGT: Feed-Forward 3D Reconstruction from Panoramic Imagery. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2026.

[11] Samar Hadou, Navid NaderiAlizadeh, and Alejandro Ribeiro. Robust Stochastically-Descending Unrolled Networks. IEEE Trans. Image Process., 72:5484–5499, 2024.

[12] James Harrison, Luke Metz, and Jascha Sohl-Dickstein. A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases. In Proc. Adv. Neural Inform. Process. Syst., 2022.

[13] Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. LRM: Large Reconstruction Model for Single Image to 3D, 2024.

[14] Hanwen Jiang, Zexiang Xu, Desai Xie, Ziwen Chen, Haian Jin, Fujun Luan, Zhixin Shu, Kai Zhang, Sai Bi, Xin Sun, Jiuxiang Gu, Qixing Huang, Georgios Pavlakos, and Hao Tan. MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2025.

[15] Lihan Jiang, Yucheng Mao, Linning Xu, Tao Lu, Kerui Ren, Yichen Jin, Xudong Xu, Mulin Yu, Jiangmiao Pang, Feng Zhao, et al. AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views. ACM Trans. Graph., 44(6):1–16,

[16] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Trans. Graph., 42(4), 2023.

[17] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. CoRR, abs/1412.6980, 2014.

[18] Vincent Leroy, Yohann Cabon, and Jerome Revaud. Grounding Image Matching in 3D with MASt3R. In Proc. Eur. Conf. Comput. Vis., 2024.

[19] Yixuan Li, Lihan Jiang, Linning Xu, Yuanbo Xiangli, Zhenzhi Wang, Dahua Lin, and Bo Dai. MatrixCity: A Large-Scale City Dataset for City-Scale Neural Rendering and Beyond. In Proc. IEEE Int. Conf. Comput. Vis., pages 3205–3215, 2023.

[20] Chen-Hsuan Lin, Wei-Chiu Ma, Antonio Torralba, and Simon Lucey. BARF: Bundle-Adjusting Neural Radiance Fields. In Proc. IEEE Int. Conf. Comput. Vis., 2021.

[21] Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, et al. DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 22160–22169, 2024.

[22] Tianqi Liu, Guangcong Wang, Shoukang Hu, Liao Shen, Xinyi Ye, Yuhang Zang, Zhiguo Cao, Wei Li, and Ziwei Liu. MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo. In Proc. Eur. Conf. Comput. Vis., 2024.

[23] Weihang Liu, Xue Xian Zheng, Jingyi Yu, and Xin Lou. Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization. In Proc. Eur. Conf. Comput. Vis., 2024.

[24] Weihang Liu, Xue Xian Zheng, Yuke Li, Tareq Y. Al-Naffouri, Jingyi Yu, and Xin Lou. CoARF++: Content-Aware Radiance Field Aligning Model Complexity With Scene Intricacy. IEEE Trans. Vis. Comput. Graph., pages 1–14, 2025.

[25] Weihang Liu, Yuhui Zhong, Yuke Li, Xi Chen, Jiadi Cui, Honglong Zhang, Lan Xu, Xin Lou, Yujiao Shi, Jingyi Yu, and Yingliang Zhang. CityGo: Lightweight Urban Modeling and Rendering with Proxy Buildings and Residual Gaussians. In Proc. ACM SIGGRAPH Asia, 2025.

[26] Weihang Liu, Yuke Li, Yuxuan Li, Jingyi Yu, and Xin Lou. Duplex-GS: Proxy-Guided Weighted Blending for Real-Time Order-Independent Gaussian Splatting. IEEE Trans. Circuits Syst. Video Technol., 2026.

[27] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In Proc. Eur. Conf. Comput. Vis., 2020.

[28] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant Neural Graphics Primitives with a Multiresolution Hash Encoding. ACM Trans. Graph., 41(4),

[29] Alex Nichol, Joshua Achiam, and John Schulman. On First-Order Meta-Learning Algorithms. arXiv preprint arXiv:1803.02999, 2018.

[30] Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello, Jaeho Moon, Jihyong Oh, and Munchurl Kim. EcoSplat: Efficiency-controllable Feed-forward 3D Gaussian Splatting from Multi-view Images. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2025.

[31] Aravind Rajeswaran, Chelsea Finn, Sham M Kakade, and Sergey Levine. Meta-learning with implicit gradients. In Proc. Adv. Neural Inform. Process. Syst., 2019.

[32] René Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vision Transformers for Dense Prediction. arXiv preprint,

[33] Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, and David Novotny. Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction. In Proc. IEEE Int. Conf. Comput. Vis., pages 10901–10911, 2021.

[34] Johannes L. Schönberger and Jan-Michael Frahm. Structure-from-Motion Revisited. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 4104–4113, 2016.

[35] Vincent Sitzmann, Eric Chan, Richard Tucker, Noah Snavely, and Gordon Wetzstein. MetaSDF: Meta-learning Signed Distance Functions. Proc. Adv. Neural Inform. Process. Syst., 33:10136–10147, 2020.

[36] Piraveen Sivakumar, Paul Janson, Jathushan Rajasegaran, and Thanuja Ambegoda. FewShotNeRF: Meta-Learning-based Novel View Synthesis for Rapid Scene-Specific Adaptation, 2024.

[37] Brandon Smart, Chuanxia Zheng, Iro Laina, and Victor Adrian Prisacariu. Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs. 2024.

[38] Noah Snavely, Steven M. Seitz, and Richard Szeliski. Skeletal graphs for efficient structure from motion. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 1–8,

[39] Stanislaw Szymanowicz, Christian Rupprecht, and Andrea Vedaldi. Splatter Image: Ultra-Fast Single-View 3D Reconstruction. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2024.

[40] Stanislaw Szymanowicz, Eldar Insafutdinov, Chuanxia Zheng, Dylan Campbell, João F. Henriques, Christian Rupprecht, and Andrea Vedaldi. Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image. In 2025 International Conference on 3D Vision (3DV), 2025.
