Free Geometry: Refining 3D Reconstruction from Longer Versions of Itself

Xingyi Yang; Yuhang Dai

arxiv: 2604.14048 · v1 · submitted 2026-04-15 · 💻 cs.CV


Pith reviewed 2026-05-10 12:53 UTC · model grok-4.3

classification 💻 cs.CV

keywords 3D reconstruction · test-time adaptation · self-supervised learning · feed-forward models · camera pose · point maps · LoRA updates · multi-view consistency


The pith

Feed-forward 3D reconstruction models can refine their own outputs at test time by enforcing consistency between full sequences and masked-frame subsets without any ground truth.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that lets otherwise rigid feed-forward 3D models adapt to individual test scenes by treating longer view sequences as a source of self-supervision. It rests on the observation that adding more input views yields reconstructions that are more reliable across viewpoints. The method therefore masks some frames, compares the model's representations from the complete and the reduced inputs, and enforces matching features plus preserved pairwise geometry. Lightweight LoRA updates then recalibrate the model in under two minutes per dataset on a single GPU, as reported in the abstract. This yields measurable gains in camera pose accuracy and point map quality on four benchmark datasets for models such as Depth Anything 3 and VGGT.

Core claim

Free Geometry constructs a self-supervised task from a testing sequence by masking a subset of frames, then enforces cross-view feature consistency between the representations produced from the full observation and the partial observation while also maintaining the pairwise relations implied by the held-out frames; these signals drive fast LoRA-based recalibration that improves the base model's accuracy on the same scene.
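The two signals in the core claim can be illustrated with a small NumPy sketch. It is not the authors' implementation: the exact loss form is not given on this page, so the mean-squared feature gap, the distance-matrix stand-in for "pairwise relations", the per-view pooled feature shape `(views, dim)`, and every function name below are assumptions made for illustration.

```python
import numpy as np

def split_views(num_views, keep_ratio=0.5, seed=0):
    """Pick which view indices stay visible; the remaining ones are masked (held out)."""
    rng = np.random.default_rng(seed)
    num_keep = max(2, int(round(num_views * keep_ratio)))
    kept = np.sort(rng.choice(num_views, size=num_keep, replace=False))
    held_out = np.setdiff1d(np.arange(num_views), kept)
    return kept, held_out

def feature_consistency(full_feats, partial_feats, kept):
    """Mean squared gap between the two branches on the views they share.

    full_feats:    (V, D) features from the pass over all V views (the pseudo-target).
    partial_feats: (len(kept), D) features from the pass over the kept views only.
    """
    target = full_feats[kept]
    return float(np.mean((partial_feats - target) ** 2))

def pairwise_distances(feats):
    """Euclidean distance between every pair of per-view feature vectors."""
    diff = feats[:, None, :] - feats[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def relation_consistency(full_feats, partial_feats, kept):
    """Assumed stand-in for relation preservation: match pairwise distance matrices."""
    d_full = pairwise_distances(full_feats[kept])
    d_part = pairwise_distances(partial_feats)
    return float(np.mean((d_full - d_part) ** 2))
```

With 8 views and `keep_ratio=0.5`, `split_views` keeps 4 views and holds out 4, matching the example configuration in the Figure 3 caption. In an actual adaptation loop the full-branch features would be treated as fixed targets and only the adapter parameters would receive gradients; that part is omitted here.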

What carries the argument

The masked-frame consistency task that compares full-sequence and partial-sequence representations while preserving implied pairwise geometry, used to generate a self-supervised training signal for LoRA updates.
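For readers unfamiliar with LoRA, the sketch below shows the standard low-rank update from Hu et al. [5] on a single linear layer: the pretrained weight stays frozen and only two small matrices are trained. The class name, the rank and alpha values, and the choice of layer are illustrative assumptions; this page does not state which layers or hyperparameters the paper uses.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank correction (bias omitted)."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                   # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(rank, in_dim))   # trainable, small random init
        self.B = np.zeros((out_dim, rank))                     # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # Output = x W^T + (alpha / rank) * x A^T B^T
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        return self.A.size + self.B.size
```

Because `B` starts at zero, the adapted layer initially reproduces the base model exactly, so test-time updates begin from the pretrained behavior. For a 32 x 64 weight and rank 4, the adapter has 384 trainable values against 2048 frozen ones, which is why such updates are cheap.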

If this is right

Camera pose accuracy rises by an average of 3.73 percent across four benchmark datasets.
Point map prediction accuracy rises by an average of 2.88 percent on the same datasets.
The same procedure works on top of existing foundation models including Depth Anything 3 and VGGT.
Adaptation completes in less than two minutes per dataset on a single GPU.
The gains appear in scenes containing occlusions, specular surfaces, and ambiguous visual cues.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same masking-and-consistency principle might be tested on other feed-forward geometric tasks such as surface normal estimation or novel-view synthesis.
If longer sequences continue to supply stronger signals, the method could be iterated multiple times on a single scene to produce further incremental gains.
The approach suggests a general route for turning extra test-time observations into supervision for any model whose output quality scales with input length.

Load-bearing premise

More input views always produce more reliable and view-consistent reconstructions than fewer views, allowing masked subsets to serve as a trustworthy self-supervised signal.

What would settle it

Applying the masking-and-consistency procedure to a new test sequence and observing no improvement or a drop in camera-pose or point-map accuracy on held-out frames would show the self-supervision signal is not reliable.
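A check of this kind could be summarized per sequence as follows. This is a hypothetical helper, not part of the paper or its released code; the function name and the idea of reporting the fraction of sequences that improve are assumptions for illustration.

```python
import numpy as np

def summarize_adaptation(before, after, higher_is_better=True):
    """Compare a per-sequence metric before and after test-time adaptation."""
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    # Positive delta always means "adaptation helped", whatever the metric direction.
    delta = after - before if higher_is_better else before - after
    return {
        "mean_gain": float(delta.mean()),
        "std_gain": float(delta.std(ddof=1)) if delta.size > 1 else 0.0,
        "frac_improved": float((delta > 0).mean()),
    }
```

A mean gain near zero or below it, or a low fraction of improved sequences, would be the kind of outcome that counts against the self-supervision signal.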

Figures

Figures reproduced from arXiv: 2604.14048 by Xingyi Yang, Yuhang Dai.

**Figure 1.** Free Geometry enables feed-forward 3D reconstruction models to self-evolve at test time without any 3D ground truth and generalize on models and datasets.

**Figure 2.** Long Sequence Provides Better Reconstruction Geometry.

**Figure 3.** Architecture of Free Geometry. The test sequence is processed in two configurations. Top: the full observation (all views, e.g. 8 views) passes through the Image Patch Embedding (e.g. DINOv2 [9]), the Multi-view Transformer, a randomized camera token, and encodes the views into feature representations. All encoders are frozen (gray). Bottom: the partial observation (half of views masked, e.g. 4 views) pass…

**Figure 4.** Self-Supervised Geometric Losses of Free Geometry.

**Figure 5.** Qualitative Results On Multi-view Depth.

**Figure 6.** Qualitative Results on 3D Reconstruction.

**Figure 1.** Qualitative Results on 3D Reconstruction.

**Figure 2.** Qualitative Results on Multi-view Depth.

Original abstract

Feed-forward 3D reconstruction models are efficient but rigid: once trained, they perform inference in a zero-shot manner and cannot adapt to the test scene. As a result, visually plausible reconstructions often contain errors, particularly under occlusions, specularities, and ambiguous cues. To address this, we introduce Free Geometry, a framework that enables feed-forward 3D reconstruction models to self-evolve at test time without any 3D ground truth. Our key insight is that, when the model receives more views, it produces more reliable and view-consistent reconstructions. Leveraging this property, given a testing sequence, we mask a subset of frames to construct a self-supervised task. Free Geometry enforces cross-view feature consistency between representations from full and partial observations, while maintaining the pairwise relations implied by the held-out frames. This self-supervision allows for fast recalibration via lightweight LoRA updates, taking less than 2 minutes per dataset on a single GPU. Our approach consistently improves state-of-the-art foundation models, including Depth Anything 3 and VGGT, across 4 benchmark datasets, yielding an average improvement of 3.73% in camera pose accuracy and 2.88% in point map prediction. Code is available at https://github.com/hiteacherIamhumble/Free-Geometry.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives feed-forward 3D models a workable self-supervised test-time tweak via view masking and LoRA, but the bet that longer sequences always produce better targets is a real soft spot.


The main takeaway is a test-time adaptation scheme for 3D reconstruction that masks some frames in a test sequence, treats the full-sequence output as a pseudo-target for consistency, and applies lightweight LoRA updates. No labels or external data are needed, and the abstract reports that the whole process runs in under two minutes per dataset on one GPU. The authors apply it to models like Depth Anything 3 and VGGT and report average gains of 3.73 percent on camera pose and 2.88 percent on point maps across four benchmarks, with code released for inspection. That combination of masking-based self-supervision and parameter-efficient recalibration is the concrete new piece; it is not just another fine-tuning trick but one built around the model's own multi-view behavior. The practical framing and quick runtime are clear strengths, and releasing the code lets others verify the numbers without guesswork.

The central assumption still needs scrutiny. The method counts on full sequences producing reliably more consistent and accurate outputs than masked subsets, so that the consistency loss pulls the model toward better geometry. In scenes with specularities, occlusions, or textureless patches, extra views can add new inconsistencies instead of resolving them, which would turn the self-supervision into error reinforcement rather than correction. The abstract states the gains but does not include ablations that isolate when the assumption fails or direct comparisons against other adaptation baselines, so the reported improvements rest on limited evidence for now.

This work is aimed at people who deploy 3D foundation models in robotics or spatial AI and need fast scene-specific boosts without retraining. A reader already working on test-time methods or multi-view consistency would get immediate value from the experiments and the released implementation.

It deserves peer review because the idea is straightforward, the runtime numbers are usable, and the quantitative claims are specific enough to be checked, even though revisions would likely need to address the robustness of the core consistency premise.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Free Geometry, a test-time adaptation framework for feed-forward 3D reconstruction models. Given a test sequence, it masks a subset of frames to create a self-supervised task that enforces cross-view feature consistency between full-sequence and partial reconstructions while preserving pairwise relations from the held-out frames. Lightweight LoRA updates are then applied to refine models such as Depth Anything 3 and VGGT. The paper reports average improvements of 3.73% in camera pose accuracy and 2.88% in point map prediction across four benchmark datasets, with the process taking less than 2 minutes per dataset on a single GPU and no 3D ground truth required.

Significance. If the central claims hold, the work would provide a practical, efficient mechanism for adapting rigid zero-shot 3D foundation models to individual test scenes via internal consistency signals. This addresses a key limitation of current feed-forward approaches in handling ambiguities like occlusions and specularities. The reported gains on standard benchmarks and the emphasis on reproducibility (code release) would make it a useful contribution to test-time adaptation in 3D vision, provided the self-supervision mechanism is shown to be robust rather than merely self-reinforcing.

major comments (2)

[Method] Method section (description of the self-supervised consistency loss): The framework rests on the unvalidated premise that reconstructions from the full sequence are reliably more accurate and view-consistent than those from masked subsets, allowing the former to serve as pseudo-targets. No analysis, failure-case experiments, or quantitative comparison to ground truth is provided to show when this holds (e.g., under persistent ambiguities such as textureless regions or specularities). If the premise fails, the loss simply aligns the model to its own errors, directly undermining the claimed improvements.
[Experiments] Experiments section (quantitative results and ablations): The reported average gains of 3.73% pose and 2.88% point-map accuracy are presented without ablation studies isolating the contributions of cross-view feature consistency versus pairwise relation preservation, without details on masking ratios or LoRA hyperparameters, and without analysis of variance across sequences. This makes it impossible to determine whether the gains are robust or sensitive to the specific self-supervision construction.

minor comments (2)

[Abstract and Method] The abstract and method descriptions would benefit from explicit notation for the masking operation and the exact form of the consistency loss (e.g., whether it is L2 on features or a different metric).
[Figures] Figure captions and the framework diagram should more clearly distinguish the full-sequence path from the masked-subset path to aid reader comprehension.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments on our manuscript. We address each major point below and outline the revisions we will make to strengthen the paper.


Referee: [Method] Method section (description of the self-supervised consistency loss): The framework rests on the unvalidated premise that reconstructions from the full sequence are reliably more accurate and view-consistent than those from masked subsets, allowing the former to serve as pseudo-targets. No analysis, failure-case experiments, or quantitative comparison to ground truth is provided to show when this holds (e.g., under persistent ambiguities such as textureless regions or specularities). If the premise fails, the loss simply aligns the model to its own errors, directly undermining the claimed improvements.

Authors: We acknowledge that the original manuscript does not provide direct quantitative comparisons to ground truth or failure-case analyses specifically validating that full-sequence reconstructions are superior to masked ones. The reported improvements on standard benchmarks provide indirect evidence of the method's effectiveness. To address this concern rigorously, we will add in the revised manuscript: (1) quantitative comparisons of full vs. masked reconstruction accuracy against ground truth on a subset of sequences, (2) failure case studies highlighting scenarios with textureless regions and specularities, and (3) discussion of conditions under which the premise holds. This will clarify the robustness of the self-supervision signal.

Revision: yes

Referee: [Experiments] Experiments section (quantitative results and ablations): The reported average gains of 3.73% pose and 2.88% point-map accuracy are presented without ablation studies isolating the contributions of cross-view feature consistency versus pairwise relation preservation, without details on masking ratios or LoRA hyperparameters, and without analysis of variance across sequences. This makes it impossible to determine whether the gains are robust or sensitive to the specific self-supervision construction.

Authors: We agree that additional details and ablations are necessary to demonstrate the robustness of the results. The original submission focused on overall performance but omitted component-wise ablations, specific hyperparameter values, and per-sequence variance. In the revision, we will include: ablations separating the effects of cross-view consistency and pairwise preservation, tables detailing masking ratios (e.g., 20-50%) and LoRA configurations (rank, alpha), and standard deviation or per-dataset variance analysis for the reported metrics. These additions will allow readers to assess sensitivity and reproducibility.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; self-supervision uses held-out frames with external benchmark validation


The paper's core mechanism generates a self-supervised consistency loss by comparing the model's output on a full test sequence against its output on a masked subset of the same sequence, then applies LoRA updates. This does not reduce to a tautology by construction because the full-sequence output is not mathematically forced to equal the masked output; the loss is minimized through parameter updates whose effect is measured on independent ground-truth benchmarks (camera pose and point map accuracy). No equations are presented that equate the target to the input by definition, no parameters are fitted on a subset and then renamed as a prediction of the same quantity, and no load-bearing uniqueness theorem or ansatz is imported via self-citation. The derivation therefore remains self-contained against external evaluation rather than self-referential.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that additional views improve reconstruction consistency, which is turned into a self-supervised objective; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

domain assumption: When the model receives more views, it produces more reliable and view-consistent reconstructions.
This property is directly invoked to justify masking frames and using the resulting consistency as supervision.



Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 1 internal anchor

[1] Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: ICCV (2021)
[2] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)
[3] Grill, J.B., Strub, F., Altché, F., Tallec, C., Richemond, P.H., Buchatskaya, E., Doersch, C., Pires, B.A., Guo, Z.D., Azar, M.G., Piot, B., Kavukcuoglu, K., Munos, R., Valko, M.: Bootstrap your own latent: A new approach to self-supervised learning. In: NeurIPS (2020)
[4] Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
[5] Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W.: LoRA: Low-rank adaptation of large language models. In: International Conference on Learning Representations (2022)
[6] Lin, H., Chen, S., Liew, J.H., Chen, D.Y., Li, Z., Zhao, Y., Peng, S., Guo, H., Zhou, X., Shi, G., Feng, J., Kang, B.: Depth anything 3: Recovering the visual space from any views. In: The Fourteenth International Conference on Learning Representations (2026)
[7] Liu, Y., Kothari, P., van Delft, B.G., Bellot-Gurlet, B., Mordan, T., Alahi, A.: TTT++: When does self-supervised test-time training fail or thrive? In: Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems (2021)
[8] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
[9] Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., et al.: DINOv2: Learning robust visual features without supervision. TMLR (2024)
[10] Park, W., Kim, D., Lu, Y., Cho, M.: Relational knowledge distillation. In: CVPR (2019)
[11] Romero, A., Ballas, N., Kahou, S.E., Chassang, A., Gatta, C., Bengio, Y.: FitNets: Hints for thin deep nets. In: ICLR (2015)
[12] Schönberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: Conference on Computer Vision and Pattern Recognition (2016)
[13] Schöps, T., Schönberger, J.L., Galliani, S., Sattler, T., Schindler, K., Pollefeys, M., Geiger, A.: A multi-view stereo benchmark with high-resolution images and multi-camera videos. In: CVPR (2017)
[14] Shotton, J., Glocker, B., Zach, C., Izadi, S., Criminisi, A., Fitzgibbon, A.: Scene coordinate regression forests for camera relocalization in RGB-D images. In: CVPR (2013)
[15] Sun, Y., Wang, X., Zhuang, L., Miller, J., Hardt, M., Efros, A.A.: Test-time training with self-supervision for generalization under distribution shifts. In: ICML (2020)
[16] Umeyama, S.: Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 13(4), 376–380 (1991)
[17] Wang, D., Shelhamer, E., Liu, S., Olshausen, B., Darrell, T.: Tent: Fully test-time adaptation by entropy minimization. In: International Conference on Learning Representations (2021)
[18] Wang, J., Chen, M., Karaev, N., Vedaldi, A., Rupprecht, C., Novotny, D.: Vggt: Visual geometry grounded transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2025)
[19] Wang, S., Leroy, V., Cabon, Y., Chidlovskii, B., Revaud, J.: Dust3r: Geometric 3d vision made easy. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2024)
[20] Yao, Y., Luo, Z., Li, S., Fang, T., Quan, L.: Mvsnet: Depth inference for unstructured multi-view stereo. European Conference on Computer Vision (2018)
[21] Yeshwanth, C., Liu, Y.C., Nießner, M., Dai, A.: ScanNet++: A high-fidelity dataset of 3d indoor scenes. In: ICCV (2023)
[22] Yuan, Y., Shen, Q., Wang, S., Yang, X., Wang, X.: Test3r: Learning to reconstruct 3d at test time. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025)
[23] Zhang, M., Levine, S., Finn, C.: MEMO: Test time robustness via adaptation and augmentation. In: NeurIPS (2022)

Supplementary Material, 1 Method Details, 1.1 Free Geometry Self-Supervised Geometric Losses: Free Geometry performs test-time adaptation through a self-supervised geometric objective defined between two branches of the same scene...
