arxiv: 2605.22020 · v2 · pith:7JV6YE4Onew · submitted 2026-05-21 · 💻 cs.CV

ForeSplat: Optimization-Aware Foresight for Feed-Forward 3D Gaussian Splatting

Pith reviewed 2026-05-25 06:10 UTC · model grok-4.3

classification 💻 cs.CV
keywords feed-forward 3D Gaussian Splatting · optimization-aware training · MetaGrad · predict-then-refine · initialization for optimization · 3D reconstruction · meta-gradient

The pith

ForeSplat trains feed-forward 3D Gaussian Splatting models to output initializations that converge faster and reach higher quality under subsequent optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Feed-forward 3DGS models produce fast single-pass reconstructions but fall short of per-scene optimization quality because they are trained only to minimize immediate rendering error. ForeSplat adds an optimization-aware training stage that explicitly prepares the network output to serve as a good starting point for the downstream 3DGS optimizer. The key device is MetaGrad, which runs a short inner refinement loop, samples several anchor states along that trajectory, and sends aggregated first-order gradients back to the prediction network. This signal lets the feed-forward model offload part of the scene-modeling work to the optimizer, so even compact networks can produce usable high-fidelity results after only a few refinement steps. Experiments across multiple backbones show the ForeSplat initialization requires fewer refinement iterations and attains higher final quality than a conventionally trained counterpart, even when the latter is allowed to run to full convergence.

Core claim

ForeSplat equips feed-forward 3DGS models with an optimization-aware training signal via MetaGrad. MetaGrad unrolls a short inner-loop refinement trajectory, samples anchor states along that trajectory, and back-propagates aggregated first-order gradients to the prediction head as a surrogate signal. The resulting initializations converge in fewer refinement steps and reach higher peak reconstruction quality than vanilla training, even when the vanilla model is allowed to converge fully. The fine-tuning adds no cost at inference time.
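The mechanism described above can be sketched on a toy problem. The snippet below is a minimal illustration, not the paper's implementation: it assumes a linear "prediction head", a quadratic loss in place of the rendering loss, plain gradient descent in place of the 3DGS optimizer, and uniform averaging over anchors. The names `metagrad_surrogate`, `train_toy_head`, `predict`, `max_steps`, and `stride` are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def metagrad_surrogate(
    g0: Vector,
    grad_fn: Callable[[Vector], Vector],
    inner_lr: float,
    max_steps: int,
    stride: int,
) -> Vector:
    """Average first-order gradients at anchors along a short refinement run."""
    if stride < 1 or max_steps < stride:
        raise ValueError("need at least one anchor: 1 <= stride <= max_steps")
    g = list(g0)
    anchor_grads: List[Vector] = []
    for step in range(1, max_steps + 1):
        # One plain gradient-descent step (assumed stand-in for the 3DGS optimizer).
        grad = grad_fn(g)
        g = [gi - inner_lr * di for gi, di in zip(g, grad)]
        if step % stride == 0:
            # Anchor state: its gradient is recorded as a constant, so nothing
            # is differentiated through the refinement steps themselves.
            anchor_grads.append(grad_fn(g))
    n = len(anchor_grads)
    return [sum(col) / n for col in zip(*anchor_grads)]


def predict(W: List[Vector], x: Vector) -> Vector:
    """Toy linear head: g0 = W x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def train_toy_head(x: Vector, target: Vector,
                   outer_steps: int = 50, outer_lr: float = 0.5) -> List[Vector]:
    """Train W so that W x is a good starting point for the inner refinement."""
    W = [[0.0] * len(x) for _ in range(len(target))]

    def grad_fn(g: Vector) -> Vector:
        # Gradient of the toy loss 0.5 * ||g - target||^2.
        return [gi - ti for gi, ti in zip(g, target)]

    for _ in range(outer_steps):
        g0 = predict(W, x)
        s = metagrad_surrogate(g0, grad_fn, inner_lr=0.1, max_steps=6, stride=2)
        # Treat s as dL/dg0 and back-propagate through the linear head: dL/dW = s x^T.
        for i in range(len(W)):
            for j in range(len(x)):
                W[i][j] -= outer_lr * s[i] * x[j]
    return W
```

On this convex toy loss the surrogate is a positive multiple of the zero-step gradient, so the example only shows the data flow (refine, sample anchors, aggregate, route back to the head). It says nothing about behaviour on real 3DGS scenes.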

What carries the argument

MetaGrad, a multi-anchor meta-gradient rule that unrolls a short inner refinement trajectory, samples anchor states, and aggregates first-order gradients to supply an optimization-aware training signal to the feed-forward prediction head.
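One way to write this rule down, as a sketch under stated assumptions rather than the paper's own formulation: let $f_\Theta$ be the feed-forward network, $I$ the input tuple, $G_k$ the Gaussians after $k$ refinement steps, $L$ the rendering loss, and $\mathcal{A}$ the set of sampled anchor steps. The plain gradient-descent inner update, the uniform anchor weights, and the convex combination with $\lambda$ are assumptions; the last is chosen only to match the endpoints named in the Figure 9 caption ($\lambda = 1$ vanilla, $\lambda = 0$ pure meta).

```latex
\begin{aligned}
G_0 &= f_\Theta(I), \qquad
G_{k+1} = G_k - \eta \, \nabla_G L(G_k), \quad k = 0, \dots, K-1, \\
\bar g &= \frac{1}{|\mathcal{A}|} \sum_{k \in \mathcal{A}} \nabla_G L(G_k),
\qquad \mathcal{A} \subseteq \{1, \dots, K\}, \\
\hat g_\lambda &= \lambda \, \nabla_G L(G_0) + (1 - \lambda) \, \bar g, \\
\nabla_\Theta &\approx \left( \frac{\partial G_0}{\partial \Theta} \right)^{\!\top} \hat g_\lambda .
\end{aligned}
```

The point of the last line is that each anchor gradient is treated as if it were a gradient with respect to $G_0$, so no Jacobian $\partial G_k / \partial G_0$ through the optimizer is ever formed.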

If this is right

  • A ForeSplat-trained initialization converges in fewer refinement steps than its vanilla counterpart.
  • It reaches higher final reconstruction quality than a vanilla model even after the latter has converged fully.
  • Compact networks can still deliver high-fidelity results because part of the modeling burden is shifted to the optimizer.
  • The added training stage incurs no extra cost at inference time.
  • The same framework can be applied to multiple different feed-forward backbones without changing their inference architecture.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same surrogate-gradient idea could be tested on other amortized reconstruction pipelines that are followed by a test-time optimizer.
  • If the short-trajectory assumption holds more broadly, similar meta-gradient rules might reduce the data requirements for training feed-forward models in other inverse problems.
  • Edge-deployed distilled variants become more practical because the method explicitly tolerates smaller network capacity.

Load-bearing premise

Sampling states along a short inner refinement trajectory and back-propagating aggregated first-order gradients produces a training signal that improves the quality of the predicted initialization for the downstream optimizer.

What would settle it

An experiment that applies the same number of refinement steps to both ForeSplat and vanilla initializations and finds no consistent advantage in convergence speed or final PSNR/SSIM would falsify the central claim.
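Such a comparison could be summarized per scene with the statistics named in the Figure 4 caption (mean, median, and positive-scene ratio of the PSNR gain at a fixed step budget). The helper below is a minimal sketch; `psnr_gain_summary` is a hypothetical name, and the numbers in the usage lines are placeholders for illustration, not results from the paper.

```python
from statistics import mean, median
from typing import Dict, Sequence


def psnr_gain_summary(
    psnr_metagrad: Sequence[float],
    psnr_vanilla: Sequence[float],
) -> Dict[str, float]:
    """Summarize per-scene PSNR gains measured at the same refinement step."""
    if len(psnr_metagrad) != len(psnr_vanilla) or not psnr_metagrad:
        raise ValueError("need two non-empty, equally long per-scene lists")
    # One gain per scene: MetaGrad-trained minus vanilla, same step budget.
    gains = [m - v for m, v in zip(psnr_metagrad, psnr_vanilla)]
    return {
        "mean": mean(gains),
        "median": median(gains),
        # Fraction of scenes where the MetaGrad-trained start ends up ahead.
        "positive_ratio": sum(g > 0 for g in gains) / len(gains),
    }


# Placeholder values only (NOT measurements from the paper).
example = psnr_gain_summary(
    psnr_metagrad=[25.0, 26.0, 23.5, 28.5],
    psnr_vanilla=[24.0, 26.5, 22.0, 28.0],
)
```

A mean and median near zero with a positive-scene ratio near one half would be the "no consistent advantage" outcome described above.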

Figures

Figures reproduced from arXiv: 2605.22020 by Cheng Zhang, Haoyu Wu, Jiadi Cui, Jingyi Yu, Junran Ding, Weihang Liu, Xin Lou, Yuefeng Zhang, Yujiao Shi, Yuke Li, Zixuan Wang.

Figure 1. ForeSplat equips feed-forward 3D Gaussian Splatting with an optimization-aware training objective, making its predictions …
Figure 2. Overview of ForeSplat. Given a set of uncalibrated input images, a feed-forward 3DGS model predicts initial Gaussians. …
Figure 3. Comparison between native K-step unrolling and MetaGrad. Top: native K-step unrolling for second-order meta-learning differentiates through the full refinement trajectory, requiring sequential back-propagation across all inner steps and accumulating ill-conditioned gradients. Bottom: MetaGrad samples sparse anchor states along the trajectory and routes their first-order surrogate gradients directly back …
Figure 4. Per-scene PSNR gain at 2,000 post-optimization steps. For each backbone, each point denotes one evaluation scene and the vertical axis reports PSNR_MetaGrad − PSNR_vanilla at 2,000 post-optimization steps. Box plots summarize the scene-level distribution, and the inset statistics report the mean, median, and positive-scene ratio.
Figure 7. A lightweight capture-and-refine camera that produces …
Figure 5. Qualitative comparison of vanilla and MetaGrad before and after post-optimization. MetaGrad starts from weaker zero-step …
Figure 6. Qualitative comparison of 3D reconstruction quality between AnySplat and our ForeSplat (ours). The upper block shows two …
Figure 8. Post-optimization trajectories of the three backbones. (a) AnySplat, (b) Pi3X, and (c) Distill Pi3X. Within each row, we report …
Figure 9. Effect of λ on the three backbones. (a) AnySplat, (b) Pi3X, and (c) Distill Pi3X. Within each row, we report PSNR, SSIM, and LPIPS measured at 2,000 post-optimization steps as a function of the loss-balancing coefficient λ ∈ {0.0, 0.25, 0.5, 0.75, 1.0}. The two endpoints correspond to vanilla supervised fine-tuning (λ = 1.0) and a pure-meta variant (λ = 0.0), respectively. Across all three backbones, the b…
Original abstract

Feed-forward 3D Gaussian Splatting models offer fast single-pass reconstruction, but scaling them to match per-scene optimization quality is fundamentally hindered by the scarcity of large-scale 3D annotations. A practical compromise is predict-then-refine, where post-prediction optimization compensates for the limited capacity of the feed-forward network. However, standard feed-forward 3DGS is trained solely for zero-step rendering error, ignoring whether its output constitutes a good initialization for the downstream optimizer. We present ForeSplat, an optimization-aware training framework that equips feed-forward 3DGS models to produce initializations explicitly designed for rapid, effective refinement. By offloading part of the scene-modeling burden to the optimizer, ForeSplat substantially reduces the capacity pressure on the feed-forward model, making high-quality reconstruction feasible even with compact networks. At its core is MetaGrad, a lightweight multi-anchor meta-gradient training rule that bypasses costly higher-order differentiation through the 3DGS optimizer. MetaGrad unrolls a short inner-loop refinement trajectory, samples anchor states, and back-propagates aggregated first-order gradients to the prediction head as a surrogate optimization-aware signal. This fine-tuning adds no inference cost and enables high-quality reconstruction within seconds after a few refinement steps. We instantiate ForeSplat on diverse backbones, including AnySplat, Pi3X, and a distilled variant tailored for edge deployment. Across all tested architectures, a ForeSplat-trained initialization converges in fewer refinement steps and reaches a higher peak reconstruction quality than its vanilla counterpart, even fully converged. The framework consistently bridges the gap between amortized prediction and per-scene optimization, establishing a practical path toward lightweight, high-fidelity 3D reconstruction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents ForeSplat, an optimization-aware training framework for feed-forward 3D Gaussian Splatting models. It introduces MetaGrad, a lightweight meta-gradient rule that unrolls a short inner-loop refinement trajectory, samples anchor states along it, and back-propagates aggregated first-order gradients to the feed-forward prediction head. This produces initializations explicitly suited for downstream per-scene optimization. The central claim is that ForeSplat-trained initializations converge in fewer steps and reach higher final reconstruction quality than vanilla counterparts across tested backbones (AnySplat, Pi3X, distilled edge variant), even after full convergence of the identical 3DGS optimizer.

Significance. If the empirical claims hold, the work would meaningfully advance practical 3D reconstruction by allowing compact feed-forward networks to offload capacity demands to a few refinement steps while still matching or exceeding per-scene optimization quality. The avoidance of higher-order differentiation via first-order aggregation is a pragmatic engineering contribution that could enable faster iteration in real-time or resource-constrained settings.

major comments (2)
  1. [Abstract / MetaGrad description] The claim that ForeSplat initializations reach strictly higher peak quality 'even fully converged' is load-bearing for the paper's contribution, yet rests on the unexamined assumption that aggregated first-order gradients from a short unroll length constitute a faithful proxy for the long-horizon basin of the 3DGS optimizer. No analysis, ablation, or diagnostic is supplied showing that curvature or saddle structure visible only after many more steps does not cause the surrogate to optimize early-trajectory speed at the expense of the final attractor.
  2. [Experimental validation] The abstract asserts superiority 'across all tested architectures' with faster convergence and higher final quality, but supplies no quantitative results, tables, figures, error bars, or specific metrics (PSNR/SSIM deltas after full convergence, step counts, statistical tests). Without these, the central claim cannot be evaluated for effect size or robustness.
minor comments (1)
  1. [Abstract] The abstract is dense; separating the problem statement, MetaGrad mechanism, and empirical claims into shorter sentences would improve readability.
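
To make the first major comment concrete, the gap between a first-order surrogate and the exact unrolled gradient can be worked out in closed form on a one-dimensional quadratic. This is a toy sketch with assumed names (`surrogate_vs_unrolled`, curvature `a`, step size `eta`); it uses plain gradient descent and is not a model of the 3DGS optimizer or of the paper's experiments.

```python
from typing import Tuple


def surrogate_vs_unrolled(
    a: float, eta: float, g0: float, target: float, steps: int
) -> Tuple[float, float]:
    """Compare first-order and exact gradients of L(g_K) with respect to g_0.

    Toy loss: L(g) = 0.5 * a * (g - target)**2, refined by gradient descent.
    """
    g = g0
    for _ in range(steps):
        g = g - eta * a * (g - target)
    # First-order surrogate: dL/dg_K is used as if it were dL/dg_0.
    first_order = a * (g - target)
    # Exact chain rule: each step contributes a factor d g_{k+1} / d g_k = 1 - eta * a.
    jacobian = (1.0 - eta * a) ** steps
    exact = first_order * jacobian
    return first_order, exact
```

For `0 < eta * a < 1` the two gradients share a sign and differ only by the factor `(1 - eta * a) ** steps`, so the surrogate points the same way on this convex toy. The referee's concern is about non-convex landscapes, where no such guarantee follows from this calculation.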

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below and indicate planned revisions to strengthen the manuscript.

  1. Referee: [Abstract / MetaGrad description] the claim that ForeSplat initializations reach strictly higher peak quality 'even fully converged' is load-bearing for the paper's contribution, yet rests on the unexamined assumption that aggregated first-order gradients from a short unroll length constitute a faithful proxy for the long-horizon basin of the 3DGS optimizer. No analysis, ablation, or diagnostic is supplied showing that curvature or saddle structure visible only after many more steps does not cause the surrogate to optimize early-trajectory speed at the expense of the final attractor.

    Authors: We acknowledge that the manuscript does not provide an explicit diagnostic or ablation examining long-horizon curvature or saddle points beyond the chosen unroll length. The central evidence remains the empirical observation that ForeSplat initializations reach higher final quality after identical full convergence of the 3DGS optimizer. In revision we will add a dedicated paragraph discussing the rationale for the short unroll, its practical limitations as a surrogate, and any observed sensitivity to unroll length. revision: partial

  2. Referee: [Experimental validation] the abstract asserts superiority 'across all tested architectures' with faster convergence and higher final quality, but supplies no quantitative results, tables, figures, error bars, or specific metrics (PSNR/SSIM deltas after full convergence, step counts, statistical tests). Without these, the central claim cannot be evaluated for effect size or robustness.

    Authors: The full manuscript contains the requested quantitative results, tables, convergence plots, and per-architecture metrics in the experimental section. The abstract, however, summarizes these findings without specific numbers. We will revise the abstract to report key effect sizes (e.g., average PSNR/SSIM deltas at convergence and step-count reductions) together with references to the corresponding tables and figures. revision: yes

Circularity Check

0 steps flagged

No circularity; MetaGrad surrogate is an independent training signal

The paper's core contribution is MetaGrad, which constructs a training signal for the feed-forward head by unrolling a short inner-loop trajectory of the 3DGS optimizer, sampling anchors, and aggregating first-order gradients. This is a deliberate methodological choice that does not reduce to a self-definition, a fitted parameter renamed as prediction, or a self-citation chain. The claimed benefit (faster convergence and higher final quality) is presented as an empirical outcome of this surrogate rather than being true by construction of the inputs. No load-bearing self-citations or ansatz smuggling appear in the provided description. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no concrete free parameters, axioms, or invented entities can be extracted or verified.
