Recognition: 2 theorem links
· Lean TheoremLearning 3D Reconstruction with Priors in Test Time
Pith reviewed 2026-05-13 16:43 UTC · model grok-4.3
The pith
Test-time optimization lets pre-trained multiview transformers use priors to improve 3D reconstruction without retraining.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Casting priors as soft constraints and jointly minimizing them with a multi-view compatibility objective inside a frozen multiview transformer at test time produces predictions that are markedly more accurate than the network's original feed-forward output, without any change to its weights or architecture.
What carries the argument
Test-time constrained optimization (TCO) that minimizes a composite loss of self-supervised multi-view photometric or geometric consistency plus explicit penalty terms derived from any supplied priors.
If this is right
- Point-map distance error drops by more than half on ETH3D, 7-Scenes, and NRGBD compared with the base multiview transformer.
- The same procedure improves camera-pose estimation accuracy on the same datasets.
- The optimized outputs beat those of prior-aware feed-forward networks that were retrained from scratch.
- No architectural modification or offline retraining is required to incorporate new priors at inference.
Where Pith is reading between the lines
- The approach suggests that view-consistency losses can act as a generic interface for injecting external measurements into any frozen multi-view network.
- Similar test-time refinement could extend to other modalities where priors arrive only after the initial training phase.
- The framework may allow rapid adaptation to new camera rigs or lighting conditions without collecting additional labeled data.
Load-bearing premise
The combined self-supervised and prior-based loss can be optimized reliably at test time for new inputs without divergence, excessive compute, or per-scene hyperparameter tuning.
What would settle it
Applying the test-time optimization to the ETH3D benchmark and finding no reduction, or an increase, in point-map distance error relative to the base image-only model would falsify the claimed performance gains.
Figures
read the original abstract
We introduce a test-time framework for multiview Transformers (MVTs) that incorporates priors (e.g., camera poses, intrinsics, and depth) to improve 3D tasks without retraining or modifying pre-trained image-only networks. Rather than feeding priors into the architecture, we cast them as constraints on the predictions and optimize the network at inference time. The optimization loss consists of a self-supervised objective and prior penalty terms. The self-supervised objective captures the compatibility among multi-view predictions and is implemented using photometric or geometric loss between renderings from other views and each view itself. Any available priors are converted into penalty terms on the corresponding output modalities. Across a series of 3D vision benchmarks, including point map estimation and camera pose estimation, our method consistently improves performance over base MVTs by a large margin. On the ETH3D, 7-Scenes, and NRGBD datasets, our method reduces the point-map distance error by more than half compared with the base image-only models. Our method also outperforms retrained prior-aware feed-forward methods, demonstrating the effectiveness of our test-time constrained optimization (TCO) framework for incorporating priors into 3D vision tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a test-time constrained optimization (TCO) framework for multiview Transformers (MVTs) in 3D tasks. Priors (camera poses, intrinsics, depth) are cast as penalty terms rather than architectural inputs; at inference the frozen network is optimized using a self-supervised multi-view compatibility loss (photometric or geometric) plus the prior penalties. The central claim is that this yields large gains over base image-only MVTs and over retrained prior-aware feed-forward models, including >50% reduction in point-map distance error on ETH3D, 7-Scenes and NRGBD.
Significance. If the optimization procedure is shown to be stable and the gains reproducible, the approach would be significant: it decouples prior incorporation from network architecture and training, offering a practical route to exploit available geometric priors in existing 3D vision models. The comparison to retrained feed-forward baselines is a positive strength.
major comments (3)
- [Abstract] Abstract: the claim that point-map distance error is reduced by more than half on ETH3D, 7-Scenes and NRGBD is presented without any description of the test-time optimization procedure (optimizer, iteration count, learning-rate schedule, convergence criterion, or per-scene hyper-parameter policy). Because the entire method rests on reliable convergence of the joint self-supervised + prior objective, this omission is load-bearing for the central claim.
- [Method] Method description: the self-supervised multi-view compatibility objective is stated only at the level of 'photometric or geometric loss between renderings from other views and each view itself.' No explicit loss equation, weighting schedule between terms, or handling of occlusions / visibility is supplied, preventing assessment of whether the objective is well-behaved or prone to the local minima warned about in the stress-test note.
- [Experiments] Experiments: no ablation isolating the contribution of the self-supervised term versus the prior penalties, no error bars, and no stability analysis across scenes or inputs are reported. Without these, it is impossible to determine whether the reported gains are robust or the result of per-dataset tuning, directly undermining the generalizability asserted in the abstract.
minor comments (2)
- [Abstract] The acronym TCO is used in the abstract before being defined.
- [Method] Notation for the output modalities (point maps, poses) is not introduced consistently before the loss terms are discussed.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We will revise the manuscript to incorporate additional details on the optimization procedure, explicit loss formulation, and experimental ablations to address the concerns raised.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claim that point-map distance error is reduced by more than half on ETH3D, 7-Scenes and NRGBD is presented without any description of the test-time optimization procedure (optimizer, iteration count, learning-rate schedule, convergence criterion, or per-scene hyper-parameter policy). Because the entire method rests on reliable convergence of the joint self-supervised + prior objective, this omission is load-bearing for the central claim.
Authors: We agree that the abstract would benefit from a concise description of the test-time optimization to support the central claim. In the revised version, we will add a brief clause noting the use of Adam optimization over a fixed number of iterations (typically 100-200) with a standard learning-rate schedule and convergence based on loss stabilization. Full per-scene hyper-parameter details remain in the supplementary material, but this addition will make the abstract self-contained while preserving its length. The reported gains are based on consistent convergence observed across all evaluated scenes. revision: yes
-
Referee: [Method] Method description: the self-supervised multi-view compatibility objective is stated only at the level of 'photometric or geometric loss between renderings from other views and each view itself.' No explicit loss equation, weighting schedule between terms, or handling of occlusions / visibility is supplied, preventing assessment of whether the objective is well-behaved or prone to the local minima warned about in the stress-test note.
Authors: We acknowledge that an explicit equation and implementation details would improve clarity and allow better assessment of behavior. In the revision, we will insert the full mathematical formulation of the multi-view compatibility loss (photometric L1 + geometric consistency terms), specify the weighting schedule (equal weights with a small regularization term), and describe occlusion handling via rendered depth visibility masks. We will also expand the stress-test discussion to explicitly address local-minima risks and mitigation via the prior penalties. revision: yes
-
Referee: [Experiments] Experiments: no ablation isolating the contribution of the self-supervised term versus the prior penalties, no error bars, and no stability analysis across scenes or inputs are reported. Without these, it is impossible to determine whether the reported gains are robust or the result of per-dataset tuning, directly undermining the generalizability asserted in the abstract.
Authors: We agree that these analyses are important for demonstrating robustness. In the revised manuscript, we will add an ablation study isolating the self-supervised term from the prior penalties, include error bars computed over multiple random seeds, and provide a stability analysis across scenes and input variations (e.g., different numbers of views). These additions will directly support the generalizability claims without altering the core results. revision: yes
Circularity Check
No significant circularity in the test-time optimization framework
full rationale
The paper presents an empirical test-time optimization method (TCO) that refines pre-trained MVT outputs by minimizing a combination of self-supervised multi-view compatibility losses (photometric/geometric) and prior penalty terms at inference. Performance gains on ETH3D, 7-Scenes, and NRGBD are reported as measured outcomes of this optimization rather than as closed-form predictions derived from the inputs. No equations reduce to their own definitions by construction, no fitted parameters are relabeled as independent predictions, and no load-bearing self-citations or uniqueness theorems are invoked to justify the core claims. The framework is self-contained as a practical optimization procedure whose validity rests on external benchmark measurements.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
The optimization loss consists of a self-supervised objective and prior penalty terms. The self-supervised objective captures the compatibility among multi-view predictions and is implemented using photometric or geometric loss between renderings from other views and each view itself.
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We use 2DGS as a differentiable renderer to project the source RGB images and depth maps to the target views using the predicted source and target camera poses and intrinsics.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ah- mad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report.arXiv preprint arXiv:2303.08774,
work page internal anchor Pith review Pith/arXiv arXiv
-
[2]
Neural rgb-d surface reconstruction
Dejan Azinovi ´c, Ricardo Martin-Brualla, Dan B Goldman, Matthias Nießner, and Justus Thies. Neural rgb-d surface reconstruction. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6290– 6301, 2022. 1, 5
work page 2022
-
[3]
Must3r: Multi-view network for stereo 3d recon- struction
Yohann Cabon, Lucas Stoffl, Leonid Antsfeld, Gabriela Csurka, Boris Chidlovskii, Jerome Revaud, and Vincent Leroy. Must3r: Multi-view network for stereo 3d recon- struction. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 1050–1060, 2025. 1, 2
work page 2025
-
[4]
Ttt3r: 3d reconstruction as test-time training
Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, and Anpei Chen. Ttt3r: 3d reconstruction as test-time training. arXiv preprint arXiv:2509.26645, 2025. 2
-
[5]
Superpoint: Self-supervised interest point detection and description
Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabi- novich. Superpoint: Self-supervised interest point detection and description. InProceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 224–236, 2018. 1
work page 2018
-
[6]
Learning iterative reasoning through energy mini- mization
Yilun Du, Shuang Li, Joshua Tenenbaum, and Igor Mor- datch. Learning iterative reasoning through energy mini- mization. InInternational Conference on Machine Learning, pages 5570–5582. PMLR, 2022. 2
work page 2022
-
[7]
Learning iterative reasoning through energy diffusion
Yilun Du, Jiayuan Mao, and Joshua B Tenenbaum. Learning iterative reasoning through energy diffusion.arXiv preprint arXiv:2406.11179, 2024. 2
-
[8]
Roma: Robust dense fea- ture matching
Johan Edstedt, Qiyu Sun, Georg B ¨okman, M ˚arten Wadenb¨ack, and Michael Felsberg. Roma: Robust dense fea- ture matching. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19790– 19800, 2024. 1
work page 2024
-
[9]
How the Tesla AI / FSD system learns to drive – an inside look from Tesla VP of AI / Autopilot
Ashok (@aelluswamy) Elluswamy. How the Tesla AI / FSD system learns to drive – an inside look from Tesla VP of AI / Autopilot. X (formerly Twitter), 2025. Accessed: 2025-11-
work page 2025
-
[10]
Project Aria: A New Tool for Egocentric Multi-Modal AI Research
Jakob Engel, Kiran Somasundaram, Michael Goesele, Albert Sun, Alexander Gamino, Andrew Turner, Arjang Talattof, Arnie Yuan, Bilal Souti, Brighid Meredith, et al. Project aria: A new tool for egocentric multi-modal ai research.arXiv preprint arXiv:2308.13561, 2023. 1
work page internal anchor Pith review arXiv 2023
-
[11]
Yossi Gandelsman, Yu Sun, Xinlei Chen, and Alexei Efros. Test-time training with masked autoencoders.Advances in Neural Information Processing Systems, 35:29374–29385,
-
[12]
Energy-based trans- formers are scalable learners and thinkers.arXiv preprint arXiv:2507.02092, 2025
Alexi Gladstone, Ganesh Nanduru, Md Mofijul Islam, Peix- uan Han, Hyeonjeong Ha, Aman Chadha, Yilun Du, Heng Ji, Jundong Li, and Tariq Iqbal. Energy-based trans- formers are scalable learners and thinkers.arXiv preprint arXiv:2507.02092, 2025. 2
-
[13]
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025. 2
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[14]
Training Large Language Models to Reason in a Continuous Latent Space
Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, and Yuandong Tian. Training large lan- guage models to reason in a continuous latent space.arXiv preprint arXiv:2412.06769, 2024. 2
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[15]
Richard Hartley and Andrew Zisserman.Multiple view ge- ometry in computer vision. Cambridge university press,
-
[16]
Lora: Low-rank adaptation of large language models.ICLR, 1(2):3, 2022
Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen- Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models.ICLR, 1(2):3, 2022. 1, 5
work page 2022
-
[17]
2d gaussian splatting for geometrically ac- curate radiance fields
Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically ac- curate radiance fields. InACM SIGGRAPH 2024 conference papers, pages 1–11, 2024. 4
work page 2024
-
[18]
Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richard- son, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, et al. Openai o1 system card.arXiv preprint arXiv:2412.16720, 2024. 2
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[19]
Pow3r: Empowering un- constrained 3d reconstruction with camera and scene priors
Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy, Lour- des Agapito, and Jerome Revaud. Pow3r: Empowering un- constrained 3d reconstruction with camera and scene priors. InProceedings of the Computer Vision and Pattern Recogni- tion Conference, pages 1071–1081, 2025. 1, 2, 3, 6, 7
work page 2025
-
[20]
Large scale multi-view stereopsis eval- uation
Rasmus Jensen, Anders Dahl, George V ogiatzis, Engil Tola, and Henrik Aanæs. Large scale multi-view stereopsis eval- uation. In2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 406–413. IEEE, 2014. 1, 5
work page 2014
-
[21]
Thinking, fast and slow.Farrar , Straus and Giroux, 2011
Daniel Kahneman. Thinking, fast and slow.Farrar , Straus and Giroux, 2011. 2
work page 2011
-
[22]
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Nikhil Keetha, Norman M ¨uller, Johannes Sch ¨onberger, Lorenzo Porzi, Yuchen Zhang, Tobias Fischer, Arno Knapitsch, Duncan Zauss, Ethan Weber, Nelson Antunes, et al. Mapanything: Universal feed-forward metric 3d re- construction.arXiv preprint arXiv:2509.13414, 2025. 1, 2, 3, 6
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[23]
3d gaussian splatting for real-time radiance field rendering.ACM Trans
Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Trans. Graph., 42(4):139–1,
-
[24]
Adam: A Method for Stochastic Optimization
Diederik P Kingma. Adam: A method for stochastic opti- mization.arXiv preprint arXiv:1412.6980, 2014. 5
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[25]
Ground- ing image matching in 3d with mast3r
Vincent Leroy, Yohann Cabon, and J´erˆome Revaud. Ground- ing image matching in 3d with mast3r. InEuropean Confer- ence on Computer Vision, pages 71–91. Springer, 2024. 1, 2
work page 2024
-
[26]
Lightglue: Local feature matching at light speed
Philipp Lindenberger, Paul-Edouard Sarlin, and Marc Polle- feys. Lightglue: Local feature matching at light speed. In Proceedings of the IEEE/CVF international conference on computer vision, pages 17627–17638, 2023. 1 9
work page 2023
-
[27]
Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. Deepseek-v3 technical report. arXiv preprint arXiv:2412.19437, 2024. 2
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[28]
David G Lowe. Distinctive image features from scale- invariant keypoints.International journal of computer vi- sion, 60(2):91–110, 2004. 1
work page 2004
-
[29]
Align3r: Aligned monocular depth estimation for dynamic videos
Jiahao Lu, Tianyu Huang, Peng Li, Zhiyang Dou, Cheng Lin, Zhiming Cui, Zhen Dong, Sai-Kit Yeung, Wenping Wang, and Yuan Liu. Align3r: Aligned monocular depth estimation for dynamic videos. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 22820–22830,
-
[30]
Jorge Nocedal and Stephen J Wright.Numerical optimiza- tion. Springer, 2006. 3
work page 2006
-
[31]
Global structure-from-motion revisited
Linfei Pan, D ´aniel Bar´ath, Marc Pollefeys, and Johannes L Sch¨onberger. Global structure-from-motion revisited. In European Conference on Computer Vision, pages 58–77. Springer, 2024. 1, 2
work page 2024
-
[32]
Surfels: Surface elements as rendering primi- tives
Hanspeter Pfister, Matthias Zwicker, Jeroen Van Baar, and Markus Gross. Surfels: Surface elements as rendering primi- tives. InProceedings of the 27th annual conference on Com- puter graphics and interactive techniques, pages 335–342,
-
[33]
Unidepth: Universal monocular metric depth estimation
Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis, Mattia Segu, Siyuan Li, Luc Van Gool, and Fisher Yu. Unidepth: Universal monocular metric depth estimation. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10106–10116, 2024. 2
work page 2024
-
[34]
3d-mvp: 3d multi- view pretraining for robotic manipulation.arXiv preprint arXiv:2406.18158, 2024
Shengyi Qian, Kaichun Mo, Valts Blukis, David F Fouhey, Dieter Fox, and Ankit Goyal. 3d-mvp: 3d multi- view pretraining for robotic manipulation.arXiv preprint arXiv:2406.18158, 2024. 1
-
[35]
Superglue: Learning feature matching with graph neural networks
Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superglue: Learning feature matching with graph neural networks. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4938–4947, 2020. 1
work page 2020
-
[36]
Structure- from-motion revisited
Johannes L Schonberger and Jan-Michael Frahm. Structure- from-motion revisited. InProceedings of the IEEE con- ference on computer vision and pattern recognition, pages 4104–4113, 2016. 1, 2
work page 2016
-
[37]
A multi-view stereo benchmark with high- resolution images and multi-camera videos
Thomas Schops, Johannes L Schonberger, Silvano Galliani, Torsten Sattler, Konrad Schindler, Marc Pollefeys, and An- dreas Geiger. A multi-view stereo benchmark with high- resolution images and multi-camera videos. InProceed- ings of the IEEE conference on computer vision and pattern recognition, pages 3260–3269, 2017. 1, 5
work page 2017
-
[38]
Scene co- ordinate regression forests for camera relocalization in rgb-d images
Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, and Andrew Fitzgibbon. Scene co- ordinate regression forests for camera relocalization in rgb-d images. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 2930–2937, 2013. 1, 5
work page 2013
-
[39]
Loftr: Detector-free local feature matching with transformers
Jiaming Sun, Zehong Shen, Yuang Wang, Hujun Bao, and Xiaowei Zhou. Loftr: Detector-free local feature matching with transformers. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 8922–8931, 2021. 1
work page 2021
-
[40]
Richard Szeliski.Computer vision: algorithms and applica- tions. Springer Nature, 2022. 1, 2
work page 2022
-
[41]
Bundle adjustment—a modern synthe- sis
Bill Triggs, Philip F McLauchlan, Richard I Hartley, and An- drew W Fitzgibbon. Bundle adjustment—a modern synthe- sis. InInternational workshop on vision algorithms, pages 298–372. Springer, 1999. 1
work page 1999
-
[42]
Attention is all you need.Advances in neural information processing systems, 30, 2017
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszko- reit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017. 2
work page 2017
-
[43]
Vggt: Vi- sual geometry grounded transformer
Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. Vggt: Vi- sual geometry grounded transformer. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 5294–5306, 2025. 1, 2, 3, 5, 6, 7
work page 2025
-
[44]
Continuous 3d per- ception model with persistent state
Qianqian Wang, Yifei Zhang, Aleksander Holynski, Alexei A Efros, and Angjoo Kanazawa. Continuous 3d per- ception model with persistent state. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 10510–10522, 2025. 2, 6, 7
work page 2025
-
[45]
Dust3r: Geometric 3d vi- sion made easy
Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and J´erˆome Revaud. Dust3r: Geometric 3d vi- sion made easy. 2024 ieee. InCVF Conference on Com- puter Vision and Pattern Recognition (CVPR), pages 20697– 20709, 2023. 1, 2
work page 2024
-
[46]
Yifan Wang, Jianjun Zhou, Haoyi Zhu, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Jiangmiao Pang, Chunhua Shen, and Tong He.π 3: Scalable permutation-equivariant visual geometry learning.arXiv e-prints, pages arXiv–2507,
-
[47]
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large lan- guage models.Advances in neural information processing systems, 35:24824–24837, 2022. 2
work page 2022
-
[48]
Fast3r: Towards 3d reconstruction of 1000+ images in one forward pass
Jianing Yang, Alexander Sax, Kevin J Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, and Matt Feiszli. Fast3r: Towards 3d reconstruction of 1000+ images in one forward pass. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 21924–21935,
-
[49]
Vickie Ye, Ruilong Li, Justin Kerr, Matias Turkulainen, Brent Yi, Zhuoyang Pan, Otto Seiskari, Jianbo Ye, Jeffrey Hu, Matthew Tancik, et al. gsplat: An open-source library for gaussian splatting.Journal of Machine Learning Research, 26(34):1–17, 2025. 5
work page 2025
-
[50]
Relative pose estimation through affine cor- rections of monocular depth priors
Yifan Yu, Shaohui Liu, R ´emi Pautrat, Marc Pollefeys, and Viktor Larsson. Relative pose estimation through affine cor- rections of monocular depth priors. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 16706–16716, 2025. 2
work page 2025
-
[51]
Test3r: Learning to reconstruct 3d at test time
Yuheng Yuan, Qiuhong Shen, Shizun Wang, Xingyi Yang, and Xinchao Wang. Test3r: Learning to reconstruct 3d at test time. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. 2
work page 2025
-
[52]
Junyi Zhang, Charles Herrmann, Junhwa Hur, Varun Jam- pani, Trevor Darrell, Forrester Cole, Deqing Sun, and Ming- 10 Hsuan Yang. Monst3r: A simple approach for estimat- ing geometry in the presence of motion.arXiv preprint arXiv:2410.03825, 2024. 7
-
[53]
Flare: Feed-forward geometry, appearance and camera estimation from uncalibrated sparse views
Shangzhan Zhang, Jianyuan Wang, Yinghao Xu, Nan Xue, Christian Rupprecht, Xiaowei Zhou, Yujun Shen, and Gor- don Wetzstein. Flare: Feed-forward geometry, appearance and camera estimation from uncalibrated sparse views. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 21936–21947, 2025. 6
work page 2025
-
[54]
Fast-livo: Fast and tightly- coupled sparse-direct lidar-inertial-visual odometry
Chunran Zheng, Qingyan Zhu, Wei Xu, Xiyuan Liu, Qizhi Guo, and Fu Zhang. Fast-livo: Fast and tightly- coupled sparse-direct lidar-inertial-visual odometry. In2022 IEEE/RSJ international conference on intelligent robots and systems (IROS), pages 4003–4009. IEEE, 2022. 1
work page 2022
-
[55]
Fast-livo2: Fast, direct lidar-inertial- visual odometry.IEEE Transactions on Robotics, 2024
Chunran Zheng, Wei Xu, Zuhao Zou, Tong Hua, Chongjian Yuan, Dongjiao He, Bingyang Zhou, Zheng Liu, Jiarong Lin, Fangcheng Zhu, et al. Fast-livo2: Fast, direct lidar-inertial- visual odometry.IEEE Transactions on Robotics, 2024. 1 11 Learning 3D Reconstruction with Priors in Test Time Supplementary Material
work page 2024
-
[56]
More Ablation Studies 7.1. Prediction Compatibility Objective We ablate on the heuristic rules designed for the predic- tion compatibility objective, i.e., the rendering loss im- plemented with 2DGS rasterization. First, we try differ- ent scale factorsαbetween the final 2DGS radius and the point map gradient magnitude, i.e.,r i,x,y =α|n i,x,y[z]| · [∥∇xp...
-
[57]
Implementation Details For reconstruction tasks, we only use photometric loss to realize the prediction compatibility objective. We set rota- tion loss weightµ 1 = 1.0, translation loss weightµ 2 = 2, and focal length loss weightµ 3 = 0.01. For ETH3D and 7-Scenes datasets, we use the a weaker photometric loss weight, i.e.,λ 1 = 0.2. For DTU and NRGBD data...
-
[58]
Robustness to the Prior Noise We test the robustness of our method to camera pose and intrinsic noise. We perturb the camera pose and intrinsic parameters by adding a small random perturbation to the ground truth values. We report the results in Tab. 8. Al- though the performance deteriorates as the perturbation in- creases, our method still outperforms t...
work page 2013
-
[59]
Test-time Inference Time and Limitations In this section, we report the test-time inference time of our method on the ETH3D dataset under different settings. As shown in Tab. 9, inference time is a limitation of our method. By trading off some efficiency, we improve the performance of our method on a series of benchmarks. We 1 Input imagesTCO-VGGT MapAnyt...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.