ForeSplat: Optimization-Aware Foresight for Feed-Forward 3D Gaussian Splatting
Pith reviewed 2026-05-25 06:10 UTC · model grok-4.3
The pith
ForeSplat trains feed-forward 3D Gaussian Splatting models to output initializations that converge faster and reach higher quality under subsequent optimization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ForeSplat equips feed-forward 3DGS models with an optimization-aware training signal via MetaGrad. MetaGrad unrolls a short inner-loop refinement trajectory, samples anchor states along that trajectory, and back-propagates aggregated first-order gradients to the prediction head as a surrogate signal. The resulting initializations converge in fewer refinement steps and reach higher peak reconstruction quality than vanilla training, even when the vanilla model is allowed to converge fully. The fine-tuning adds no cost at inference time.
What carries the argument
MetaGrad, a multi-anchor meta-gradient rule that unrolls a short inner refinement trajectory, samples anchor states, and aggregates first-order gradients to supply an optimization-aware training signal to the feed-forward prediction head.
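The rule described above can be illustrated on a toy problem. What follows is a minimal sketch, not the authors' implementation: it assumes a separable quadratic loss as a stand-in for the rendering loss, plain gradient descent as the inner optimizer, uniform averaging over anchors, and a toy linear head. All function names and default values (`steps`, `stride`, `lr`) are hypothetical.

```python
def grad(theta, target, curv):
    """Gradient of the toy loss 0.5 * sum(c * (th - t)**2).

    Assumption: this quadratic stands in for a 3DGS rendering loss.
    """
    return [c * (th - t) for th, t, c in zip(theta, target, curv)]


def unroll(theta0, target, curv, steps, lr):
    """Run `steps` plain gradient-descent updates; return every visited state."""
    states = [list(theta0)]
    for _ in range(steps):
        g = grad(states[-1], target, curv)
        states.append([th - lr * gi for th, gi in zip(states[-1], g)])
    return states


def metagrad_signal(theta0, target, curv, steps=8, stride=2, lr=0.1):
    """Average the first-order gradients at anchors taken every `stride` steps.

    Each anchor is treated as if d(theta_k)/d(theta_0) were the identity,
    so no second-order terms are computed. Uniform weighting is an assumption.
    """
    anchors = unroll(theta0, target, curv, steps, lr)[::stride]
    grads = [grad(a, target, curv) for a in anchors]
    return [sum(col) / len(grads) for col in zip(*grads)]


def head_gradient(feature, signal):
    """Chain rule through a hypothetical linear head theta0[i] = w[i] * feature."""
    return [s * feature for s in signal]


# Usage: one surrogate update of the toy head.
weights, feature = [0.0, 0.0], 1.0
theta0 = [w * feature for w in weights]
signal = metagrad_signal(theta0, target=[1.0, -2.0], curv=[1.0, 4.0])
weights = [w - 0.5 * g for w, g in zip(weights, head_gradient(feature, signal))]
```

When `stride` exceeds `steps`, the only anchor is the initial state and the signal reduces to the ordinary zero-step gradient, which corresponds to the vanilla training objective in this toy setting.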
If this is right
- A ForeSplat-trained initialization converges in fewer refinement steps than its vanilla counterpart.
- It reaches higher final reconstruction quality than a vanilla model even after the latter has converged fully.
- Compact networks can still deliver high-fidelity results because part of the modeling burden is shifted to the optimizer.
- The added training stage incurs no extra cost at inference time.
- The same framework can be applied to multiple different feed-forward backbones without changing their inference architecture.
Where Pith is reading between the lines
- The same surrogate-gradient idea could be tested on other amortized reconstruction pipelines that are followed by a test-time optimizer.
- If the short-trajectory assumption holds more broadly, similar meta-gradient rules might reduce the data requirements for training feed-forward models in other inverse problems.
- Edge-deployed distilled variants become more practical because the method explicitly tolerates smaller network capacity.
Load-bearing premise
Sampling states along a short inner refinement trajectory and back-propagating aggregated first-order gradients produces a training signal that improves the quality of the predicted initialization for the downstream optimizer.
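One way to write this premise down, as a sketch under stated assumptions rather than the paper's own formulation: the inner optimizer is taken to be plain gradient descent with step size $\alpha$, the anchor set $\mathcal{A}$ and weights $w_k$ are placeholders, and $\mathrm{Id}$ denotes the identity matrix. The symbols $f_\Theta$, $\mathcal{I}$, and $K$ follow the notation quoted from the paper's algorithm summary.

```latex
\begin{aligned}
\theta_0 &= f_\Theta(\mathcal{I}), \qquad
\theta_{k+1} = \theta_k - \alpha \nabla_\theta \mathcal{L}(\theta_k),
\quad k = 0, \dots, K-1, \\
\frac{\partial \theta_k}{\partial \theta_0}
 &= \bigl(\mathrm{Id} - \alpha \nabla_\theta^2 \mathcal{L}(\theta_{k-1})\bigr)
    \cdots
    \bigl(\mathrm{Id} - \alpha \nabla_\theta^2 \mathcal{L}(\theta_0)\bigr)
 \quad \text{(exact Jacobian, needs Hessians)}, \\
g &= \sum_{k \in \mathcal{A}} w_k \, \nabla_\theta \mathcal{L}(\theta_k)
 \quad \text{(first-order surrogate: } \partial \theta_k / \partial \theta_0 \approx \mathrm{Id}\text{)}, \\
\nabla_\Theta &\approx
 \Bigl(\frac{\partial f_\Theta(\mathcal{I})}{\partial \Theta}\Bigr)^{\!\top} g .
\end{aligned}
```

The premise holds to the extent that replacing the exact Jacobian with the identity still yields a direction that improves the post-refinement loss; the referee's first major comment questions exactly this substitution for long horizons.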
What would settle it
An experiment that applies the same number of refinement steps to both ForeSplat and vanilla initializations and finds no consistent advantage in convergence speed or final PSNR/SSIM would falsify the central claim.
Original abstract
Feed-forward 3D Gaussian Splatting models offer fast single-pass reconstruction, but scaling them to match per-scene optimization quality is fundamentally hindered by the scarcity of large-scale 3D annotations. A practical compromise is predict-then-refine, where post-prediction optimization compensates for the limited capacity of the feed-forward network. However, standard feed-forward 3DGS is trained solely for zero-step rendering error, ignoring whether its output constitutes a good initialization for the downstream optimizer. We present ForeSplat, an optimization-aware training framework that equips feed-forward 3DGS models to produce initializations explicitly designed for rapid, effective refinement. By offloading part of the scene-modeling burden to the optimizer, ForeSplat substantially reduces the capacity pressure on the feed-forward model, making high-quality reconstruction feasible even with compact networks. At its core is MetaGrad, a lightweight multi-anchor meta-gradient training rule that bypasses costly higher-order differentiation through the 3DGS optimizer. MetaGrad unrolls a short inner-loop refinement trajectory, samples anchor states, and back-propagates aggregated first-order gradients to the prediction head as a surrogate optimization-aware signal. This fine-tuning adds no inference cost and enables high-quality reconstruction within seconds after a few refinement steps. We instantiate ForeSplat on diverse backbones, including AnySplat, Pi3X, and a distilled variant tailored for edge deployment. Across all tested architectures, a ForeSplat-trained initialization converges in fewer refinement steps and reaches a higher peak reconstruction quality than its vanilla counterpart, even fully converged. The framework consistently bridges the gap between amortized prediction and per-scene optimization, establishing a practical path toward lightweight, high-fidelity 3D reconstruction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents ForeSplat, an optimization-aware training framework for feed-forward 3D Gaussian Splatting models. It introduces MetaGrad, a lightweight meta-gradient rule that unrolls a short inner-loop refinement trajectory, samples anchor states along it, and back-propagates aggregated first-order gradients to the feed-forward prediction head. This produces initializations explicitly suited for downstream per-scene optimization. The central claim is that ForeSplat-trained initializations converge in fewer steps and reach higher final reconstruction quality than vanilla counterparts across tested backbones (AnySplat, Pi3X, distilled edge variant), even after full convergence of the identical 3DGS optimizer.
Significance. If the empirical claims hold, the work would meaningfully advance practical 3D reconstruction by allowing compact feed-forward networks to offload capacity demands to a few refinement steps while still matching or exceeding per-scene optimization quality. The avoidance of higher-order differentiation via first-order aggregation is a pragmatic engineering contribution that could enable faster iteration in real-time or resource-constrained settings.
major comments (2)
- [Abstract / MetaGrad description] The claim that ForeSplat initializations reach strictly higher peak quality 'even fully converged' is load-bearing for the paper's contribution, yet it rests on the unexamined assumption that aggregated first-order gradients from a short unroll are a faithful proxy for the long-horizon basin of the 3DGS optimizer. No analysis, ablation, or diagnostic is supplied showing that curvature or saddle structure visible only after many more steps does not cause the surrogate to optimize early-trajectory speed at the expense of the final attractor.
- [Experimental validation] The abstract asserts superiority 'across all tested architectures' with faster convergence and higher final quality, but supplies no quantitative results, tables, figures, error bars, or specific metrics (PSNR/SSIM deltas after full convergence, step counts, statistical tests). Without these, the central claim cannot be evaluated for effect size or robustness.
minor comments (1)
- [Abstract] The abstract is dense; separating the problem statement, MetaGrad mechanism, and empirical claims into shorter sentences would improve readability.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major point below and indicate planned revisions to strengthen the manuscript.
- Referee: [Abstract / MetaGrad description] the claim that ForeSplat initializations reach strictly higher peak quality 'even fully converged' is load-bearing for the paper's contribution, yet rests on the unexamined assumption that aggregated first-order gradients from a short unroll length constitute a faithful proxy for the long-horizon basin of the 3DGS optimizer. No analysis, ablation, or diagnostic is supplied showing that curvature or saddle structure visible only after many more steps does not cause the surrogate to optimize early-trajectory speed at the expense of the final attractor.
Authors: We acknowledge that the manuscript does not provide an explicit diagnostic or ablation examining long-horizon curvature or saddle points beyond the chosen unroll length. The central evidence remains the empirical observation that ForeSplat initializations reach higher final quality after identical full convergence of the 3DGS optimizer. In revision we will add a dedicated paragraph discussing the rationale for the short unroll, its practical limitations as a surrogate, and any observed sensitivity to unroll length. revision: partial
- Referee: [Experimental validation] the abstract asserts superiority 'across all tested architectures' with faster convergence and higher final quality, but supplies no quantitative results, tables, figures, error bars, or specific metrics (PSNR/SSIM deltas after full convergence, step counts, statistical tests). Without these, the central claim cannot be evaluated for effect size or robustness.
Authors: The full manuscript contains the requested quantitative results, tables, convergence plots, and per-architecture metrics in the experimental section. The abstract, however, summarizes these findings without specific numbers. We will revise the abstract to report key effect sizes (e.g., average PSNR/SSIM deltas at convergence and step-count reductions) together with references to the corresponding tables and figures. revision: yes
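The effect sizes and error bars discussed in this exchange could be computed along the following lines. This is only a sketch: it assumes one converged PSNR value per scene for each method, uses a percentile bootstrap as one possible choice of interval, and every name and default here is hypothetical rather than drawn from the manuscript.

```python
import random


def paired_deltas(scores_a, scores_b):
    """Per-scene differences between two methods evaluated on the same scenes."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both methods must be scored on the same scenes")
    return [a - b for a, b in zip(scores_a, scores_b)]


def bootstrap_ci(deltas, n_resamples=2000, alpha=0.05, seed=0):
    """Mean delta with a percentile-bootstrap confidence interval.

    Returns (mean, lower, upper). Resampling is over scenes, with replacement.
    """
    if not deltas:
        raise ValueError("need at least one paired delta")
    rng = random.Random(seed)
    n = len(deltas)
    means = sorted(
        sum(rng.choice(deltas) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * (n_resamples - 1))]
    upper = means[int((1 - alpha / 2) * (n_resamples - 1))]
    return sum(deltas) / n, lower, upper
```

An interval that excludes zero for each backbone would support the "across all tested architectures" wording; an interval that straddles zero would not.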
Circularity Check
No circularity; MetaGrad surrogate is an independent training signal
The paper's core contribution is MetaGrad, which constructs a training signal for the feed-forward head by unrolling a short inner-loop trajectory of the 3DGS optimizer, sampling anchors, and aggregating first-order gradients. This is a deliberate methodological choice that does not reduce to a self-definition, a fitted parameter renamed as prediction, or a self-citation chain. The claimed benefit (faster convergence and higher final quality) is presented as an empirical outcome of this surrogate rather than being true by construction of the inputs. No load-bearing self-citations or ansatz smuggling appear in the provided description. The derivation chain is therefore self-contained against external benchmarks.
Reference graph
Works this paper leans on
- [1] Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, and Nando De Freitas. Learning to learn by gradient descent by gradient descent. 2016.
- [2] Antreas Antoniou, Harrison Edwards, and Amos Storkey. How to train your MAML. arXiv preprint arXiv:1810.09502.
- [3] Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 5855–5864, 2022.
- [4] Gilad Baruch, Zhuoyuan Chen, Afshin Dehghan, Tal Dimry, Yuri Feigin, Peter Fu, Thomas Gebauer, Brandon Joffe, Daniel Kurz, Arik Schwartz, et al. ARKitScenes: A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data. arXiv preprint arXiv:2111.08897.
- [5] David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 19457–19467, 2024.
- [6] Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images. In Proc. Eur. Conf. Comput. Vis., 2024.
- [7] Zhiwen Fan, Wenyan Cong, Kairun Wen, Kevin Wang, Jian Zhang, Xinghao Ding, Danfei Xu, Boris Ivanovic, Marco Pavone, Georgios Pavlakos, et al. InstantSplat: Sparse-view gaussian splatting in seconds. arXiv preprint arXiv:2403.20309, 2024.
- [8] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In Proc. Int. Conf. Mach. Learn., pages 1126–1135.
- [9] Yang Fu, Xiaolong Wang, Sifei Liu, Amey Kulkarni, Jan Kautz, and Alexei A. Efros. COLMAP-Free 3D Gaussian Splatting. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 20796–20805, 2024.
- [10] Yijing Guo, Mengjun Chao, Luo Wang, Tianyang Zhao, Haizhao Dai, Yingliang Zhang, Jingyi Yu, and Yujiao Shi. PanoVGGT: Feed-Forward 3D Reconstruction from Panoramic Imagery. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2026.
- [11] Samar Hadou, Navid NaderiAlizadeh, and Alejandro Ribeiro. Robust Stochastically-Descending Unrolled Networks. IEEE Trans. Image Process., 72:5484–5499, 2024.
- [12] James Harrison, Luke Metz, and Jascha Sohl-Dickstein. A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases. In Proc. Adv. Neural Inform. Process. Syst., 2022.
- [13] Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. LRM: Large Reconstruction Model for Single Image to 3D, 2024.
- [14] Hanwen Jiang, Zexiang Xu, Desai Xie, Ziwen Chen, Haian Jin, Fujun Luan, Zhixin Shu, Kai Zhang, Sai Bi, Xin Sun, Jiuxiang Gu, Qixing Huang, Georgios Pavlakos, and Hao Tan. MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2025.
- [15] Lihan Jiang, Yucheng Mao, Linning Xu, Tao Lu, Kerui Ren, Yichen Jin, Xudong Xu, Mulin Yu, Jiangmiao Pang, Feng Zhao, et al. AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views. ACM Trans. Graph., 44(6):1–16.
- [16] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Trans. Graph., 42(4), 2023.
- [17] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. CoRR, abs/1412.6980, 2014.
- [18] Vincent Leroy, Yohann Cabon, and Jerome Revaud. Grounding Image Matching in 3D with MASt3R. In Proc. Eur. Conf. Comput. Vis., 2024.
- [19] Yixuan Li, Lihan Jiang, Linning Xu, Yuanbo Xiangli, Zhenzhi Wang, Dahua Lin, and Bo Dai. MatrixCity: A Large-Scale City Dataset for City-Scale Neural Rendering and Beyond. In Proc. IEEE Int. Conf. Comput. Vis., pages 3205–3215, 2023.
- [20] Chen-Hsuan Lin, Wei-Chiu Ma, Antonio Torralba, and Simon Lucey. BARF: Bundle-Adjusting Neural Radiance Fields. In Proc. IEEE Int. Conf. Comput. Vis., 2021.
- [21] Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, et al. DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 22160–22169, 2024.
- [22] Tianqi Liu, Guangcong Wang, Shoukang Hu, Liao Shen, Xinyi Ye, Yuhang Zang, Zhiguo Cao, Wei Li, and Ziwei Liu. MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo. In Proc. Eur. Conf. Comput. Vis., 2024.
- [23] Weihang Liu, Xue Xian Zheng, Jingyi Yu, and Xin Lou. Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization. In Proc. Eur. Conf. Comput. Vis., 2024.
- [24] Weihang Liu, Xue Xian Zheng, Yuke Li, Tareq Y. Al-Naffouri, Jingyi Yu, and Xin Lou. CoARF++: Content-Aware Radiance Field Aligning Model Complexity With Scene Intricacy. IEEE Trans. Vis. Comput. Graph., pages 1–14, 2025.
- [25] Weihang Liu, Yuhui Zhong, Yuke Li, Xi Chen, Jiadi Cui, Honglong Zhang, Lan Xu, Xin Lou, Yujiao Shi, Jingyi Yu, and Yingliang Zhang. CityGo: Lightweight Urban Modeling and Rendering with Proxy Buildings and Residual Gaussians. In Proc. ACM SIGGRAPH Asia, 2025.
- [26] Weihang Liu, Yuke Li, Yuxuan Li, Jingyi Yu, and Xin Lou. Duplex-GS: Proxy-Guided Weighted Blending for Real-Time Order-Independent Gaussian Splatting. IEEE Trans. Circuits Syst. Video Technol., 2026.
- [27] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In Proc. Eur. Conf. Comput. Vis., 2020.
- [28] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant Neural Graphics Primitives with a Multiresolution Hash Encoding. ACM Trans. Graph., 41(4).
- [29] Alex Nichol, Joshua Achiam, and John Schulman. On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999, 2018.
- [30] Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello, Jaeho Moon, Jihyong Oh, and Munchurl Kim. EcoSplat: Efficiency-controllable Feed-forward 3D Gaussian Splatting from Multi-view Images. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2025.
- [31] Aravind Rajeswaran, Chelsea Finn, Sham M Kakade, and Sergey Levine. Meta-learning with implicit gradients. In Proc. Adv. Neural Inform. Process. Syst., 2019.
- [32] René Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vision Transformers for Dense Prediction. ArXiv preprint.
- [33] Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, and David Novotny. Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction. In Proc. IEEE Int. Conf. Comput. Vis., pages 10901–10911, 2021.
- [34] Johannes L. Schönberger and Jan-Michael Frahm. Structure-from-Motion Revisited. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 4104–4113, 2016.
- [35] Vincent Sitzmann, Eric Chan, Richard Tucker, Noah Snavely, and Gordon Wetzstein. MetaSDF: Meta-learning Signed Distance Functions. Proc. Adv. Neural Inform. Process. Syst., 33:10136–10147, 2020.
- [36] Piraveen Sivakumar, Paul Janson, Jathushan Rajasegaran, and Thanuja Ambegoda. FewShotNeRF: Meta-Learning-based Novel View Synthesis for Rapid Scene-Specific Adaptation, 2024.
- [37] Brandon Smart, Chuanxia Zheng, Iro Laina, and Victor Adrian Prisacariu. Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs. 2024.
- [38] Noah Snavely, Steven M. Seitz, and Richard Szeliski. Skeletal graphs for efficient structure from motion. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 1–8.
- [39] Stanislaw Szymanowicz, Christian Rupprecht, and Andrea Vedaldi. Splatter Image: Ultra-Fast Single-View 3D Reconstruction. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2024.
- [40] Stanislaw Szymanowicz, Eldar Insafutdinov, Chuanxia Zheng, Dylan Campbell, João F. Henriques, Christian Rupprecht, and Andrea Vedaldi. Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image. In 2025 International Conference on 3D Vision (3DV), 2025.
- [41] Matthew Tancik, Ben Mildenhall, Terrance Wang, Divi Schmidt, Pratul P Srinivasan, Jonathan T Barron, and Ren Ng. Learned Initializations for Optimizing Coordinate-Based Neural Representations. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 2846–2855, 2021.
- [42] Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation. In Proc. Eur. Conf. Comput. Vis., 2024.
- [43] Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. VGGT: Visual Geometry Grounded Transformer. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2025.
- [44] Kaixuan Wang and Shaojie Shen. Flow-motion and depth network for monocular stereo and beyond. IEEE Robotics and Automation Letters, 5(2):3307–3314, 2020.
- [45] Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. DUSt3R: Geometric 3D Vision Made Easy. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2024.
- [46] Wenshan Wang, Delong Zhu, Xiangwei Wang, Yaoyu Hu, Yuheng Qiu, Chen Wang, Yafei Hu, Ashish Kapoor, and Sebastian Scherer. TartanAir: A Dataset to Push the Limits of Visual SLAM. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4909–4916. IEEE, 2020.
- [47] Yunsong Wang, Tianxin Huang, Hanlin Chen, and Gim Hee Lee. FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes. In Proc. Adv. Neural Inform. Process. Syst., 2024.
- [48] Yifan Wang, Jianjun Zhou, Haoyi Zhu, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Jiangmiao Pang, Chunhua Shen, and Tong He. YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting. In Proc. Int. Conf. Learn. Represent., 2026.
- [49] Yifan Wang, Jianjun Zhou, Haoyi Zhu, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Jiangmiao Pang, Chunhua Shen, and Tong He. π³: Permutation-Equivariant Visual Geometry Learning. In Int. Conf. Learn. Represent., 2026.
- [50] Christopher Wewer, Kevin Raj, Eddy Ilg, Bernt Schiele, and Jan Eric Lenssen. latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction. In Proc. Eur. Conf. Comput. Vis., 2024.
- [51] Hongchi Xia, Yang Fu, Sifei Liu, and Xiaolong Wang. RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22378–22389, 2024.
- [52] Bingyu Xin, Meng Ye, Leon Axel, and Dimitris N. Metaxas. Rethinking Deep Unrolled Model for Accelerated MRI Reconstruction. In Proc. Eur. Conf. Comput. Vis., 2024.
- [53] Dejia Xu, Ye Yuan, Morteza Mardani, Sifei Liu, Jiaming Song, Zhangyang Wang, and Arash Vahdat. AGG: Amortized Generative 3D Gaussians for Single Image to 3D. arXiv preprint 2401.04099, 2024.
- [54] Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, and Marc Pollefeys. DepthSplat: Connecting Gaussian Splatting and Depth. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2025.
- [55] Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, and Matt Feiszli. Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2025.
- [56] Yao Yao, Zixin Luo, Shiwei Li, Jingyang Zhang, Yufan Ren, Lei Zhou, Tian Fang, and Long Quan. BlendedMVS: A Large-Scale Dataset for Generalized Multi-View Stereo Networks. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 1790–1799, 2020.
- [57] Botao Ye, Sifei Liu, Haofei Xu, Li Xueting, Marc Pollefeys, Ming-Hsuan Yang, and Peng Songyou. No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images. In Proc. Int. Conf. Learn. Represent.
- [58] Chandan Yeshwanth, Yueh-Cheng Liu, Matthias Nießner, and Angela Dai. ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes. In Proc. IEEE Int. Conf. Comput. Vis., pages 12–22, 2023.
- [59] Xu Yinghao, Shi Zifan, Yifan Wang, Chen Hansheng, Yang Ceyuan, Peng Sida, Shen Yujun, and Wetzstein Gordon. GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation. In Proc. Eur. Conf. Comput. Vis., 2024.
- [60] Alex Yu, Vickie Ye, Matthew Tancik, and Angjoo Kanazawa. pixelNeRF: Neural Radiance Fields from One or Few Images. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog.
- [61] Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, and Zexiang Xu. GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting. In Proc. Eur. Conf. Comput. Vis., 2024.
- [62] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In CVPR, 2018.
- [63] Chen Ziwen, Hao Tan, Kai Zhang, Sai Bi, Fujun Luan, Yicong Hong, Li Fuxin, and Zexiang Xu. Long-LRM: Long-Sequence Large Reconstruction Model for Wide-Coverage Gaussian Splats. In Proc. IEEE Int. Conf. Comput. Vis., pages 4349–4359, 2025.
[Figure residue: qualitative comparison with panels labeled Input, GT, Novel View (step 0, step 2000), AnySplat (vanilla), AnySplat + MetaGrad, Detail...]
- [64] MetaGrad Pseudocode. Algorithm 1 summarizes one iteration of the MetaGrad training rule within the ForeSplat framework on a single training tuple I. The notation matches Section 3.4. Algorithm 1: MetaGrad training rule within ForeSplat, one training iteration. Data: tuple I; weights Θ of the FF-3DGS network f_Θ; host loss L_A; max post-opt step K_max; anchor stride Δ; i...
- [65] Pi3X Gaussian Head: Architecture and Training Protocol. This section details the construction and pre-training of the Gaussian head attached to the Pi3X backbone, which turns Pi3X into the FF-3DGS network f_Θ used throughout Section 3.4. Architecture. The Gaussian head is a lightweight DPT-style [32] decoder grafted onto the frozen Pi3X transformer. It ta...
- [66] Distill Pi3X: Architecture and Training Protocol. This section details the construction of Distill Pi3X, the lightweight backbone introduced in Section 4.1. Architecture. Distill Pi3X is obtained by distilling Pi3X (which couples a DINOv2 Large encoder with a 36-layer Transformer decoder) into a student that pairs a DINOv2 Base encoder with a 24-layer decod...
- [67] Continuous Post-Optimization Trajectories. This section complements Sections 4.2 and 4.3 by reporting the underlying post-optimization trajectories at a finer step resolution. The figures visualize the evolution of the metrics over the same 2,000-step window summarized in Section 4.2, and provide the complete λ sweep trajectories on all three backbones. Ful...