EPO: Boosting 3D Foundation Models with Edge-based Pose Optimization

Christian Sormann; Friedrich Fraundorfer; Mattia D'Urso; Mattia Rossi

arxiv: 2607.00579 · v1 · pith:OLYHCAJUnew · submitted 2026-07-01 · 💻 cs.CV

EPO: Boosting 3D Foundation Models with Edge-based Pose Optimization

Mattia D'Urso , Christian Sormann , Mattia Rossi , Friedrich Fraundorfer This is my paper

Pith reviewed 2026-07-02 14:53 UTC · model grok-4.3

classification 💻 cs.CV

keywords edge-based pose optimizationstructure from motion3D foundation modelsgeometric refinementdifferentiable optimizationbundle adjustment alternative

0 comments

The pith

Edge map alignment refines camera poses in 3D reconstructions without building feature tracks or point correspondences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EPO as a way to improve the geometric accuracy of 3D foundation models that produce fast but imprecise Structure-from-Motion results. It replaces the usual requirement for explicit feature tracks with a differentiable process that aligns edge maps extracted from the input images. This proxy signal drives pose and geometry optimization directly. If the approach holds, it would let users refine foundation-model outputs at much lower cost in time and memory than traditional bundle-adjustment pipelines, including on ordinary hardware.

Core claim

EPO is a fully differentiable, trackless framework that treats edge-map alignment as a proxy for geometric optimization, recovering camera poses and 3D structure from 3D foundation model outputs while avoiding feature extraction and track construction entirely; evaluations show it matches or exceeds the accuracy of Bundle Adjustment-style methods at substantially lower runtime and memory footprint.

What carries the argument

Edge map alignment, which supplies the optimization objective by measuring consistency of edge maps across views to adjust poses and points without explicit multi-view correspondences.

If this is right

Refinement becomes possible immediately after a foundation-model forward pass without any intermediate feature detection or matching.
Memory consumption drops enough for the method to execute on consumer-grade GPUs where competing refinement techniques run out of memory.
Runtime per scene decreases while accuracy on standard SfM benchmarks stays comparable to or better than track-based methods.
The same edge-alignment objective can be applied across multiple downstream 3D tasks that start from foundation-model reconstructions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be inserted into any pipeline that already produces rough camera estimates, lowering the barrier to high-quality 3D output on limited hardware.
Because the signal comes from edges rather than points, the method might tolerate scenes where repeatable point features are scarce but edge structure remains visible.
An end-to-end differentiable chain from image to refined 3D model becomes feasible if foundation-model components are also made differentiable with respect to the same edge objective.

Load-bearing premise

Edge maps extracted from the input images supply a sufficiently complete and unbiased geometric signal to recover accurate poses and structure without tracked point features.

What would settle it

A controlled test on a dataset with known ground-truth poses where EPO converges to solutions whose reprojection error or pose error exceeds that of a standard bundle-adjustment baseline by a clear margin.

Figures

Figures reproduced from arXiv: 2607.00579 by Christian Sormann, Friedrich Fraundorfer, Mattia D'Urso, Mattia Rossi.

**Figure 1.** Figure 1: Visualization of three stages of our pose optimization method and the ground truth sparse point cloud for the Graz Town Hall scene (TerraSky3D). Starting from the initial optimization stage (a), provided by the VGGT output, we illustrate an intermediate state (b) and the final refined poses (c). The ground truth poses are shown in green and the optimized ones in red . Abstract. We introduce Edge-based Pose… view at source ↗

**Figure 2.** Figure 2: Example of input RGB image, VGGT raw depth, and the corresponding DTF [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 4.** Figure 4: Qualitative Novel View Synthesis Results. [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗

**Figure 5.** Figure 5: Absolute and relative inference time breakdown for the optimization methods evaluated in Tab. 3. † indicates runs on an NVIDIA H200. 6 Discussion and Limitations 6.1 Boosting Different 3D Foundation Models We conducted the same experiments described in Sec. 5.1 using various 3DFMs reconstructions as a starting point to demonstrate our method’s generalization across different models. Aside from the RGB imag… view at source ↗

read the original abstract

We introduce \textbf{Edge-based Pose Optimization (EPO)}, a trackless geometric optimization framework specifically designed to boost the Structure-from-Motion reconstructions generated by 3D Foundation Models. These models achieve rapid inference by bypassing the time-consuming feature extraction and matching stages of traditional pipelines, where explicit correspondences between each 3D point and multiple images, referred to as tracks, are established. However, their geometric accuracy currently falls short of traditional pipelines. While this can be addressed in a post-processing step via Bundle Adjustment-like refinement, doing so requires extracting feature tracks, thus defeating the original speed advantage. Instead, our fully differentiable framework uses edge map alignment as a proxy for geometric optimization, avoiding feature extraction and track construction entirely. Through extensive evaluation across multiple datasets and tasks, we demonstrate that EPO matches or outperforms Bundle Adjustment-like methods while requiring significantly lower runtime and memory. Notably, its reduced memory footprint makes EPO suitable for consumer-grade hardware, where competing refinement methods cannot run.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

EPO replaces feature tracks with differentiable edge-map alignment for pose refinement in 3D foundation models, but the abstract gives no evidence that this actually recovers accurate geometry without the usual multi-view constraints.

read the letter

The core move here is to drop track construction altogether and optimize camera poses by aligning edge maps extracted from the input images in a fully differentiable setup. That directly targets the speed-accuracy tradeoff the authors identify in current 3D foundation models: they skip matching for speed but then lose geometric quality, and standard bundle-adjustment fixes reintroduce the very cost they avoided.

What the paper actually shows is a clean framing of the problem and a plausible alternative signal. Edge maps are cheap to compute and the alignment loss can be made differentiable, which in principle lets the optimization run on consumer hardware with lower memory than track-based methods. If the experiments hold up, this would be useful for anyone running SfM on limited devices.

The soft spot is the weakest assumption the abstract itself flags: that edge consistency across views supplies enough unbiased constraint to match or beat explicit 3D-2D correspondences. Edges suffer from the aperture problem, viewpoint-dependent fragmentation, and ambiguity in textureless or repetitive areas. Nothing in the provided text explains how the optimizer avoids local minima or systematic bias that triangulation from point tracks normally suppresses. The claim of matching BA performance rests on “extensive evaluation,” but without loss details, dataset stats, or ablations visible here, it is impossible to tell whether the result is robust or tuned to favorable cases.

This is for readers working on practical, memory-constrained 3D pipelines who already use foundation models and need a lightweight refinement step. It deserves a serious referee because the idea is technically distinct and the motivation is real, even though the current evidence is thin and the central assumption needs direct testing against track-based baselines.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Edge-based Pose Optimization (EPO), a fully differentiable, trackless framework for refining camera poses and 3D geometry from 3D foundation model outputs. It replaces explicit feature tracks and Bundle Adjustment with edge-map alignment as a geometric proxy, claiming to achieve comparable or superior accuracy to BA-like methods while using substantially lower runtime and memory, thereby enabling refinement on consumer-grade hardware.

Significance. If the performance claims hold under rigorous verification, EPO would represent a meaningful advance in efficient post-processing for foundation-model SfM pipelines by eliminating the need for feature extraction and multi-view track construction. The reduced memory footprint is a practical strength for deployment scenarios where traditional refinement is infeasible.

major comments (2)

[Abstract] Abstract: the central performance claim ('matches or outperforms Bundle Adjustment-like methods') is presented without any quantitative metrics, tables, datasets, or ablation results. Because the soundness of the method rests on this empirical demonstration, the absence of supporting evidence in the manuscript is load-bearing.
[Abstract] Abstract: the claim that edge-map alignment supplies a sufficiently constraining and unbiased signal for pose recovery is not accompanied by analysis of known limitations (aperture problem, viewpoint-dependent edge fragmentation, low-texture regions). Without explicit handling or empirical mitigation of these issues, the substitution for explicit multi-view tracks remains unverified.

minor comments (1)

[Abstract] Abstract: the phrase 'Bundle Adjustment-like methods' is imprecise; the manuscript should name the specific baselines (e.g., COLMAP BA, certain differentiable renderers) used in the claimed comparisons.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment point-by-point below.

read point-by-point responses

Referee: [Abstract] Abstract: the central performance claim ('matches or outperforms Bundle Adjustment-like methods') is presented without any quantitative metrics, tables, datasets, or ablation results. Because the soundness of the method rests on this empirical demonstration, the absence of supporting evidence in the manuscript is load-bearing.

Authors: The abstract is a concise summary. The full manuscript provides the required empirical support through quantitative metrics, tables, datasets, and ablations in the Experiments section, demonstrating that EPO matches or outperforms BA-like methods with lower runtime and memory. No change to the abstract is needed as the evidence is already present in the body of the paper. revision: no
Referee: [Abstract] Abstract: the claim that edge-map alignment supplies a sufficiently constraining and unbiased signal for pose recovery is not accompanied by analysis of known limitations (aperture problem, viewpoint-dependent edge fragmentation, low-texture regions). Without explicit handling or empirical mitigation of these issues, the substitution for explicit multi-view tracks remains unverified.

Authors: We agree that explicit discussion of these limitations strengthens the paper. In the revised manuscript we will add a dedicated analysis subsection addressing the aperture problem, edge fragmentation, and low-texture regions, including how EPO's differentiable multi-view alignment provides mitigation. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The provided abstract and context describe EPO as a differentiable edge-map alignment proxy for pose optimization that avoids explicit tracks and feature extraction. No equations, fitted parameters renamed as predictions, self-citations used as load-bearing uniqueness theorems, or ansatzes smuggled via prior work are present in the given material. Performance claims rest on external dataset evaluations rather than any reduction of outputs to inputs by construction. The framework is therefore self-contained against the benchmarks it reports.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no concrete free parameters, axioms, or invented entities; the method itself is the contribution rather than any new postulated object.

pith-pipeline@v0.9.1-grok · 5708 in / 1161 out tokens · 35419 ms · 2026-07-02T14:53:15.443712+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

64 extracted references · 4 canonical work pages · 2 internal anchors

[1]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: Mip- nerf 360: Unbounded anti-aliased neural radiance fields. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 5470–5479 (2022)

2022
[2]

In: European Conference on Com- puter Vision

Brachmann, E., Wynn, J., Chen, S., Cavallari, T., Monszpart, A., Turmukhambe- tov, D., Prisacariu, V.A.: Scene coordinate reconstruction: Posing of image collec- tions via incremental learning of a relocalizer. In: European Conference on Com- puter Vision. pp. 421–440. Springer (2024)

2024
[3]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Cabon, Y., Stoffl, L., Antsfeld, L., Csurka, G., Chidlovskii, B., Revaud, J., Leroy, V.: Must3r: Multi-view network for stereo 3d reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1050– 1060 (2025)

2025
[4]

IEEE Transactions on Pattern Analysis and Machine IntelligenceP AMI-8(6), 679–698 (1986)

Canny, J.: A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine IntelligenceP AMI-8(6), 679–698 (1986)

1986
[5]

In: Proceedings of the IEEE conference on com- puter vision and pattern recognition workshops

DeTone, D., Malisiewicz, T., Rabinovich, A.: Superpoint: Self-supervised interest point detection and description. In: Proceedings of the IEEE conference on com- puter vision and pattern recognition workshops. pp. 224–236 (2018)

2018
[6]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Doersch, C., Yang, Y., Vecerik, M., Gokay, D., Gupta, A., Aytar, Y., Carreira, J., Zisserman, A.: Tapir: Tracking any point with per-frame initialization and tem- poral refinement. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 10061–10072 (2023)

2023
[7]

In: 2025 International Conference on 3D Vision (3DV)

Duisterhof, B.P., Zust, L., Weinzaepfel, P., Leroy, V., Cabon, Y., Revaud, J.: Mast3r-sfm: a fully-integrated solution for unconstrained structure-from-motion. In: 2025 International Conference on 3D Vision (3DV). pp. 1–10. IEEE (2025)

2025
[8]

In: 2026 Inter- national Conference on 3D Vision (3DV)

D’Urso, M., Santellani, E., Sormann, C., Rossi, M., Kuhn, A., Fraundorfer, F.: A streamlined attention-based network for descriptor extraction. In: 2026 Inter- national Conference on 3D Vision (3DV). pp. 247–256. IEEE Computer Society (2026)

2026
[9]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2026)

D’Urso, M., Sormann, C., Rossi, M., Fraundorfer, F.: Multi-view reconstructions of european landmarks in 4k. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2026)

2026
[10]

D2-Net: A Trainable CNN for Joint Detection and Description of Local Features

Dusmanu, M., Rocco, I., Pajdla, T., Pollefeys, M., Sivic, J., Torii, A., Sattler, T.: D2-net: A trainable cnn for joint detection and description of local features. arXiv preprint arXiv:1905.03561 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 1905
[11]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Edstedt, J., Athanasiadis, I., Wadenbäck, M., Felsberg, M.: Dkm: Dense kernel- ized feature matching for geometry estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 17765–17775 (2023)

2023
[12]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Edstedt, J., Sun, Q., Bökman, G., Wadenbäck, M., Felsberg, M.: Roma: Robust dense feature matching. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 19790–19800 (2024)

2024
[13]

In: Proceedings of the Computer Vision and Pattern Recognition Conference

Elflein, S., Zhou, Q., Leal-Taixé, L.: Light3r-sfm: Towards feed-forward structure- from-motion. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 16774–16784 (2025)

2025
[14]

Theory of computing8(1), 415–428 (2012)

Felzenszwalb, P.F., Huttenlocher, D.P.: Distance transforms of sampled functions. Theory of computing8(1), 415–428 (2012)

2012
[15]

In: European Conference on Computer Vision

Harley, A.W., Fang, Z., Fragkiadaki, K.: Particle video revisited: Tracking through occlusions using point trajectories. In: European Conference on Computer Vision. pp. 59–75. Springer (2022) 10 M. D’Urso et al

2022
[16]

In: Proceedings of the IEEE interna- tional conference on computer vision

He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human- level performance on imagenet classification. In: Proceedings of the IEEE interna- tional conference on computer vision. pp. 1026–1034 (2015)

2015
[17]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Karaev, N., Makarov, Y., Wang, J., Neverova, N., Vedaldi, A., Rupprecht, C.: Cotracker3: Simpler and better point tracking by pseudo-labelling real videos. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 6013–6022 (2025)

2025
[18]

In: European conference on computer vision

Karaev, N., Rocco, I., Graham, B., Neverova, N., Vedaldi, A., Rupprecht, C.: Cotracker: It is better to track together. In: European conference on computer vision. pp. 18–35. Springer (2024)

2024
[19]

Keetha, N., Müller, N., Schönberger, J., Porzi, L., Zhang, Y., Fischer, T., Knapitsch, A., Zauss, D., Weber, E., Antunes, N., Luiten, J., Lopez-Antequera, M., Rota Bulò, S., Richardt, C., Ramanan, D., Scherer, S., Kontschieder, P.: Ma- pAnything:Universalfeed-forwardmetric3Dreconstruction.In:2026International Conference on 3D Vision (3DV). pp. 499–509. IE...

2026
[20]

ACM Trans

Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph.42(4), 139–1 (2023)

2023
[21]

In: 2016 IEEE international conference on robotics and automation (ICRA)

Kuse, M., Shen, S.: Robust camera motion estimation using direct edge alignment and sub-gradient method. In: 2016 IEEE international conference on robotics and automation (ICRA). pp. 573–579. IEEE (2016)

2016
[22]

In: European Conference on Computer Vision

Leroy, V., Cabon, Y., Revaud, J.: Grounding image matching in 3d with mast3r. In: European Conference on Computer Vision. pp. 71–91. Springer (2024)

2024
[23]

In: 2026 International Conference on 3D Vision (3DV)

Li, J., Wang, H., Irshad, M.Z., Vasiljevic, I., Walter, M.R., Guizilini, V.C., Shakhnarovich, G.: FastMap: Revisiting structure from motion through first-order optimization. In: 2026 International Conference on 3D Vision (3DV). pp. 29–39. IEEE Computer Society (2026)

2026
[24]

In: International Conference on Learning Representations (ICLR) (2026),https://openreview.net/forum?id=yirunib8l8, oral

Lin, H., Chen, S., Liew, J.H., Chen, D.Y., Li, Z., Zhao, Y., Peng, S., Guo, H., Zhou, X., Shi, G., Feng, J., Kang, B.: Depth anything 3: Recovering the visual space from any views. In: International Conference on Learning Representations (ICLR) (2026),https://openreview.net/forum?id=yirunib8l8, oral

2026
[25]

In: Proceedings of the IEEE/CVF international conference on com- puter vision

Lindenberger, P., Sarlin, P.E., Pollefeys, M.: Lightglue: Local feature matching at light speed. In: Proceedings of the IEEE/CVF international conference on com- puter vision. pp. 17627–17638 (2023)

2023
[26]

In: European Conference on Computer Vision

Liu, S., Gao, Y., Zhang, T., Pautrat, R., Schönberger, J.L., Larsson, V., Pollefeys, M.: Robust incremental structure-from-motion with hybrid features. In: European Conference on Computer Vision. pp. 249–269. Springer (2024)

2024
[27]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Liu, S., Yu, Y., Pautrat, R., Pollefeys, M., Larsson, V.: 3d line mapping revisited. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 21445–21455 (2023)

2023
[28]

In: 7th Interna- tional Conference on Learning Representations (2019)

Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: 7th Interna- tional Conference on Learning Representations (2019)

2019
[29]

Interna- tional journal of computer vision60(2), 91–110 (2004)

Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Interna- tional journal of computer vision60(2), 91–110 (2004)

2004
[30]

Commu- nications of the ACM65(1), 99–106 (2021)

Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. Commu- nications of the ACM65(1), 99–106 (2021)

2021
[31]

In: International Workshop on Reproducible Research in Pattern Recog- nition

Moulon, P., Monasse, P., Perrot, R., Marlet, R.: Openmvg: Open multiple view geometry. In: International Workshop on Reproducible Research in Pattern Recog- nition. pp. 60–74. Springer (2016) Supplementary Material: Boosting 3D Foundation Models... 11

2016
[32]

DINOv2: Learning Robust Visual Features without Supervision

Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., et al.: Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[33]

In: European Conference on Computer Vision

Pan, L., Baráth, D., Pollefeys, M., Schönberger, J.L.: Global structure-from-motion revisited. In: European Conference on Computer Vision. pp. 58–77. Springer (2024)

2024
[34]

Advances in neural information processing sys- tems32(2019)

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: Pytorch: An imperative style, high- performance deep learning library. Advances in neural information processing sys- tems32(2019)

2019
[35]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pautrat, R., Barath, D., Larsson, V., Oswald, M.R., Pollefeys, M.: Deeplsd: Line segment detection and refinement withdeep image gradients. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 17327– 17336 (2023)

2023
[36]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Pautrat, R., Suárez, I., Yu, Y., Pollefeys, M., Larsson, V.: Gluestick: Robust image matching by sticking points and lines together. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9706–9716 (2023)

2023
[37]

In: Proceedings of the IEEE/CVF international conference on computer vision

Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 12179–12188 (2021)

2021
[38]

Advances in neural information processing sys- tems32(2019)

Revaud, J., De Souza, C., Humenberger, M., Weinzaepfel, P.: R2d2: Reliable and repeatable detector and descriptor. Advances in neural information processing sys- tems32(2019)

2019
[39]

In: 2022 26th International conference on pattern recognition (ICPR)

Santellani, E., Sormann, C., Rossi, M., Kuhn, A., Fraundorfer, F.: Md-net: Multi- detector for local feature extraction. In: 2022 26th International conference on pattern recognition (ICPR). pp. 3944–3951. IEEE (2022)

2022
[40]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Sarlin, P.E., DeTone, D., Malisiewicz, T., Rabinovich, A.: Superglue: Learning feature matching with graph neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 4938–4947 (2020)

2020
[41]

In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Schenk, F., Fraundorfer, F.: Robust edge-based visual odometry using machine- learned edges. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 1297–1304. IEEE (2017)

2017
[42]

In: 2019 International Conference on Robotics and Automation (ICRA)

Schenk, F., Fraundorfer, F.: Reslam: A real-time robust edge-based slam system. In: 2019 International Conference on Robotics and Automation (ICRA). pp. 154–

2019
[43]

In: Proceedings of the IEEE conference on computer vision and pattern recognition

Schonberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 4104–4113 (2016)

2016
[44]

International journal of computer vision80(2), 189–210 (2008)

Snavely, N., Seitz, S.M., Szeliski, R.: Modeling the world from internet photo col- lections. International journal of computer vision80(2), 189–210 (2008)

2008
[45]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Sun, J., Shen, Z., Wang, Y., Bao, H., Zhou, X.: Loftr: Detector-free local fea- ture matching with transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 8922–8931 (2021)

2021
[46]

In: Proceedings of the 23rd ACM international conference on Mul- timedia

Sweeney, C., Hollerer, T., Turk, M.: Theia: A fast and scalable structure-from- motion library. In: Proceedings of the 23rd ACM international conference on Mul- timedia. pp. 693–696 (2015)

2015
[47]

In: Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages

Tillet, P., Kung, H.T., Cox, D.: Triton: an intermediate language and compiler for tiled neural network computations. In: Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages. pp. 10–19 (2019) 12 M. D’Urso et al

2019
[48]

In: International workshop on vision algorithms

Triggs, B., McLauchlan, P.F., Hartley, R.I., Fitzgibbon, A.W.: Bundle adjust- ment—a modern synthesis. In: International workshop on vision algorithms. pp. 298–372. Springer (1999)

1999
[49]

Advances in neural information processing systems33, 14254–14265 (2020)

Tyszkiewicz, M., Fua, P., Trulls, E.: Disk: Learning local features with policy gra- dient. Advances in neural information processing systems33, 14254–14265 (2020)

2020
[50]

ACM Trans- actions on Graphics (TOG)36(1), 1–11 (2017)

Waechter, M., Beljan, M., Fuhrmann, S., Moehrle, N., Kopf, J., Goesele, M.: Vir- tual rephotography: Novel view prediction error for 3d reconstruction. ACM Trans- actions on Graphics (TOG)36(1), 1–11 (2017)

2017
[51]

In: 2025 Interna- tional Conference on 3D Vision (3DV)

Wang, H., Agapito, L.: 3d reconstruction with spatial memory. In: 2025 Interna- tional Conference on 3D Vision (3DV). pp. 78–89. IEEE (2025)

2025
[52]

In: Proceedings of the Computer Vision and Pattern Recognition Conference

Wang, J., Chen, M., Karaev, N., Vedaldi, A., Rupprecht, C., Novotny, D.: Vggt: Visual geometry grounded transformer. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 5294–5306 (2025)

2025
[53]

In: Proceedings of the IEEE/CVF con- ference on computer vision and pattern recognition

Wang, J., Karaev, N., Rupprecht, C., Novotny, D.: Vggsfm: Visual geometry grounded deep structure from motion. In: Proceedings of the IEEE/CVF con- ference on computer vision and pattern recognition. pp. 21686–21697 (2024)

2024
[54]

In: Proceedings of the Computer Vision and Pattern Recognition Conference

Wang, Q., Zhang, Y., Holynski, A., Efros, A.A., Kanazawa, A.: Continuous 3d perception model with persistent state. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 10510–10522 (2025)

2025
[55]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Wang, S., Leroy, V., Cabon, Y., Chidlovskii, B., Revaud, J.: Dust3r: Geometric 3d vision made easy. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20697–20709 (2024)

2024
[56]

In: International Conference on Learning Representations (ICLR) (2026),https://openreview

Wang, Y., Zhou, J., Zhu, H., Chang, W., Zhou, Y., Li, Z., Chen, J., Pang, J., Shen, C., He, T.:π3: Permutation-equivariant visual geometry learning. In: International Conference on Learning Representations (ICLR) (2026),https://openreview. net/forum?id=DTQIjngDta

2026
[57]

In: Proceedings of the Computer Vision and Pattern Recognition Conference

Yang, J., Sax, A., Liang, K.J., Henaff, M., Tang, H., Cao, A., Chai, J., Meier, F., Feiszli, M.: Fast3r: Towards 3d reconstruction of 1000+ images in one forward pass. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 21924–21935 (2025)

2025
[58]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Yeshwanth, C., Liu, Y.C., Nießner, M., Dai, A.: Scannet++: A high-fidelity dataset of 3d indoor scenes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 12–22 (2023)

2023
[59]

In: Proceedings of the IEEE conference on computer vision and pattern recognition

Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 586–595 (2018)

2018
[60]

arXiv preprint arXiv:2302.09493 (2023)

Zhao, H., Shang, J., Liu, K., Chen, C., Gu, F.: Edgevo: An efficient and accurate edge-based visual odometry. arXiv preprint arXiv:2302.09493 (2023)

work page arXiv 2023
[61]

IEEE Transac- tions on Instrumentation and Measurement72, 1–16 (2023)

Zhao, X., Wu, X., Chen, W., Chen, P.C., Xu, Q., Li, Z.: Aliked: A lighter keypoint and descriptor extraction network via deformable transformation. IEEE Transac- tions on Instrumentation and Measurement72, 1–16 (2023)

2023
[62]

arXiv preprint arXiv:2510.13310 (2025)

Zhong, J., Zhan, Z., Gao, Q., Chen, Z., Lou, H., Mao, J., Neumann, U., Wang, C., Wang, Y.: InstantSfM: Towards GPU-native SfM for the deep learning era. arXiv preprint arXiv:2510.13310 (2025)

work page arXiv 2025
[63]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Zhou, Y., Barnes, C., Lu, J., Yang, J., Li, H.: On the continuity of rotation rep- resentations in neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 5745–5753 (2019)

2019
[64]

IEEE Transactions on Robotics35(1), 184– 199 (2018)

Zhou, Y., Li, H., Kneip, L.: Canny-vo: Visual odometry with rgb-d cameras based on geometric 3-d–2-d edge alignment. IEEE Transactions on Robotics35(1), 184– 199 (2018)

2018

[1] [1]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: Mip- nerf 360: Unbounded anti-aliased neural radiance fields. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 5470–5479 (2022)

2022

[2] [2]

In: European Conference on Com- puter Vision

Brachmann, E., Wynn, J., Chen, S., Cavallari, T., Monszpart, A., Turmukhambe- tov, D., Prisacariu, V.A.: Scene coordinate reconstruction: Posing of image collec- tions via incremental learning of a relocalizer. In: European Conference on Com- puter Vision. pp. 421–440. Springer (2024)

2024

[3] [3]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Cabon, Y., Stoffl, L., Antsfeld, L., Csurka, G., Chidlovskii, B., Revaud, J., Leroy, V.: Must3r: Multi-view network for stereo 3d reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1050– 1060 (2025)

2025

[4] [4]

IEEE Transactions on Pattern Analysis and Machine IntelligenceP AMI-8(6), 679–698 (1986)

Canny, J.: A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine IntelligenceP AMI-8(6), 679–698 (1986)

1986

[5] [5]

In: Proceedings of the IEEE conference on com- puter vision and pattern recognition workshops

DeTone, D., Malisiewicz, T., Rabinovich, A.: Superpoint: Self-supervised interest point detection and description. In: Proceedings of the IEEE conference on com- puter vision and pattern recognition workshops. pp. 224–236 (2018)

2018

[6] [6]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Doersch, C., Yang, Y., Vecerik, M., Gokay, D., Gupta, A., Aytar, Y., Carreira, J., Zisserman, A.: Tapir: Tracking any point with per-frame initialization and tem- poral refinement. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 10061–10072 (2023)

2023

[7] [7]

In: 2025 International Conference on 3D Vision (3DV)

Duisterhof, B.P., Zust, L., Weinzaepfel, P., Leroy, V., Cabon, Y., Revaud, J.: Mast3r-sfm: a fully-integrated solution for unconstrained structure-from-motion. In: 2025 International Conference on 3D Vision (3DV). pp. 1–10. IEEE (2025)

2025

[8] [8]

In: 2026 Inter- national Conference on 3D Vision (3DV)

D’Urso, M., Santellani, E., Sormann, C., Rossi, M., Kuhn, A., Fraundorfer, F.: A streamlined attention-based network for descriptor extraction. In: 2026 Inter- national Conference on 3D Vision (3DV). pp. 247–256. IEEE Computer Society (2026)

2026

[9] [9]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2026)

D’Urso, M., Sormann, C., Rossi, M., Fraundorfer, F.: Multi-view reconstructions of european landmarks in 4k. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2026)

2026

[10] [10]

D2-Net: A Trainable CNN for Joint Detection and Description of Local Features

Dusmanu, M., Rocco, I., Pajdla, T., Pollefeys, M., Sivic, J., Torii, A., Sattler, T.: D2-net: A trainable cnn for joint detection and description of local features. arXiv preprint arXiv:1905.03561 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 1905

[11] [11]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Edstedt, J., Athanasiadis, I., Wadenbäck, M., Felsberg, M.: Dkm: Dense kernel- ized feature matching for geometry estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 17765–17775 (2023)

2023

[12] [12]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Edstedt, J., Sun, Q., Bökman, G., Wadenbäck, M., Felsberg, M.: Roma: Robust dense feature matching. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 19790–19800 (2024)

2024

[13] [13]

In: Proceedings of the Computer Vision and Pattern Recognition Conference

Elflein, S., Zhou, Q., Leal-Taixé, L.: Light3r-sfm: Towards feed-forward structure- from-motion. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 16774–16784 (2025)

2025

[14] [14]

Theory of computing8(1), 415–428 (2012)

Felzenszwalb, P.F., Huttenlocher, D.P.: Distance transforms of sampled functions. Theory of computing8(1), 415–428 (2012)

2012

[15] [15]

In: European Conference on Computer Vision

Harley, A.W., Fang, Z., Fragkiadaki, K.: Particle video revisited: Tracking through occlusions using point trajectories. In: European Conference on Computer Vision. pp. 59–75. Springer (2022) 10 M. D’Urso et al

2022

[16] [16]

In: Proceedings of the IEEE interna- tional conference on computer vision

He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human- level performance on imagenet classification. In: Proceedings of the IEEE interna- tional conference on computer vision. pp. 1026–1034 (2015)

2015

[17] [17]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Karaev, N., Makarov, Y., Wang, J., Neverova, N., Vedaldi, A., Rupprecht, C.: Cotracker3: Simpler and better point tracking by pseudo-labelling real videos. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 6013–6022 (2025)

2025

[18] [18]

In: European conference on computer vision

Karaev, N., Rocco, I., Graham, B., Neverova, N., Vedaldi, A., Rupprecht, C.: Cotracker: It is better to track together. In: European conference on computer vision. pp. 18–35. Springer (2024)

2024

[19] [19]

Keetha, N., Müller, N., Schönberger, J., Porzi, L., Zhang, Y., Fischer, T., Knapitsch, A., Zauss, D., Weber, E., Antunes, N., Luiten, J., Lopez-Antequera, M., Rota Bulò, S., Richardt, C., Ramanan, D., Scherer, S., Kontschieder, P.: Ma- pAnything:Universalfeed-forwardmetric3Dreconstruction.In:2026International Conference on 3D Vision (3DV). pp. 499–509. IE...

2026

[20] [20]

ACM Trans

Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph.42(4), 139–1 (2023)

2023

[21] [21]

In: 2016 IEEE international conference on robotics and automation (ICRA)

Kuse, M., Shen, S.: Robust camera motion estimation using direct edge alignment and sub-gradient method. In: 2016 IEEE international conference on robotics and automation (ICRA). pp. 573–579. IEEE (2016)

2016

[22] [22]

In: European Conference on Computer Vision

Leroy, V., Cabon, Y., Revaud, J.: Grounding image matching in 3d with mast3r. In: European Conference on Computer Vision. pp. 71–91. Springer (2024)

2024

[23] [23]

In: 2026 International Conference on 3D Vision (3DV)

Li, J., Wang, H., Irshad, M.Z., Vasiljevic, I., Walter, M.R., Guizilini, V.C., Shakhnarovich, G.: FastMap: Revisiting structure from motion through first-order optimization. In: 2026 International Conference on 3D Vision (3DV). pp. 29–39. IEEE Computer Society (2026)

2026

[24] [24]

In: International Conference on Learning Representations (ICLR) (2026),https://openreview.net/forum?id=yirunib8l8, oral

Lin, H., Chen, S., Liew, J.H., Chen, D.Y., Li, Z., Zhao, Y., Peng, S., Guo, H., Zhou, X., Shi, G., Feng, J., Kang, B.: Depth anything 3: Recovering the visual space from any views. In: International Conference on Learning Representations (ICLR) (2026),https://openreview.net/forum?id=yirunib8l8, oral

2026

[25] [25]

In: Proceedings of the IEEE/CVF international conference on com- puter vision

Lindenberger, P., Sarlin, P.E., Pollefeys, M.: Lightglue: Local feature matching at light speed. In: Proceedings of the IEEE/CVF international conference on com- puter vision. pp. 17627–17638 (2023)

2023

[26] [26]

In: European Conference on Computer Vision

Liu, S., Gao, Y., Zhang, T., Pautrat, R., Schönberger, J.L., Larsson, V., Pollefeys, M.: Robust incremental structure-from-motion with hybrid features. In: European Conference on Computer Vision. pp. 249–269. Springer (2024)

2024

[27] [27]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Liu, S., Yu, Y., Pautrat, R., Pollefeys, M., Larsson, V.: 3d line mapping revisited. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 21445–21455 (2023)

2023

[28] [28]

In: 7th Interna- tional Conference on Learning Representations (2019)

Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: 7th Interna- tional Conference on Learning Representations (2019)

2019

[29] [29]

Interna- tional journal of computer vision60(2), 91–110 (2004)

Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Interna- tional journal of computer vision60(2), 91–110 (2004)

2004

[30] [30]

Commu- nications of the ACM65(1), 99–106 (2021)

Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. Commu- nications of the ACM65(1), 99–106 (2021)

2021

[31] [31]

In: International Workshop on Reproducible Research in Pattern Recog- nition

Moulon, P., Monasse, P., Perrot, R., Marlet, R.: Openmvg: Open multiple view geometry. In: International Workshop on Reproducible Research in Pattern Recog- nition. pp. 60–74. Springer (2016) Supplementary Material: Boosting 3D Foundation Models... 11

2016

[32] [32]

DINOv2: Learning Robust Visual Features without Supervision

Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., et al.: Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023

[33] [33]

In: European Conference on Computer Vision

Pan, L., Baráth, D., Pollefeys, M., Schönberger, J.L.: Global structure-from-motion revisited. In: European Conference on Computer Vision. pp. 58–77. Springer (2024)

2024

[34] [34]

Advances in neural information processing sys- tems32(2019)

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: Pytorch: An imperative style, high- performance deep learning library. Advances in neural information processing sys- tems32(2019)

2019

[35] [35]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pautrat, R., Barath, D., Larsson, V., Oswald, M.R., Pollefeys, M.: Deeplsd: Line segment detection and refinement withdeep image gradients. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 17327– 17336 (2023)

2023

[36] [36]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Pautrat, R., Suárez, I., Yu, Y., Pollefeys, M., Larsson, V.: Gluestick: Robust image matching by sticking points and lines together. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9706–9716 (2023)

2023

[37] [37]

In: Proceedings of the IEEE/CVF international conference on computer vision

Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 12179–12188 (2021)

2021

[38] [38]

Advances in neural information processing sys- tems32(2019)

Revaud, J., De Souza, C., Humenberger, M., Weinzaepfel, P.: R2d2: Reliable and repeatable detector and descriptor. Advances in neural information processing sys- tems32(2019)

2019

[39] [39]

In: 2022 26th International conference on pattern recognition (ICPR)

Santellani, E., Sormann, C., Rossi, M., Kuhn, A., Fraundorfer, F.: Md-net: Multi- detector for local feature extraction. In: 2022 26th International conference on pattern recognition (ICPR). pp. 3944–3951. IEEE (2022)

2022

[40] [40]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Sarlin, P.E., DeTone, D., Malisiewicz, T., Rabinovich, A.: Superglue: Learning feature matching with graph neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 4938–4947 (2020)

2020

[41] [41]

In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Schenk, F., Fraundorfer, F.: Robust edge-based visual odometry using machine- learned edges. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 1297–1304. IEEE (2017)

2017

[42] [42]

In: 2019 International Conference on Robotics and Automation (ICRA)

Schenk, F., Fraundorfer, F.: Reslam: A real-time robust edge-based slam system. In: 2019 International Conference on Robotics and Automation (ICRA). pp. 154–

2019

[43] [43]

In: Proceedings of the IEEE conference on computer vision and pattern recognition

Schonberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 4104–4113 (2016)

2016

[44] [44]

International journal of computer vision80(2), 189–210 (2008)

Snavely, N., Seitz, S.M., Szeliski, R.: Modeling the world from internet photo col- lections. International journal of computer vision80(2), 189–210 (2008)

2008

[45] [45]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Sun, J., Shen, Z., Wang, Y., Bao, H., Zhou, X.: Loftr: Detector-free local fea- ture matching with transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 8922–8931 (2021)

2021

[46] [46]

In: Proceedings of the 23rd ACM international conference on Mul- timedia

Sweeney, C., Hollerer, T., Turk, M.: Theia: A fast and scalable structure-from- motion library. In: Proceedings of the 23rd ACM international conference on Mul- timedia. pp. 693–696 (2015)

2015

[47] [47]

In: Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages

Tillet, P., Kung, H.T., Cox, D.: Triton: an intermediate language and compiler for tiled neural network computations. In: Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages. pp. 10–19 (2019) 12 M. D’Urso et al

2019

[48] [48]

In: International workshop on vision algorithms

Triggs, B., McLauchlan, P.F., Hartley, R.I., Fitzgibbon, A.W.: Bundle adjust- ment—a modern synthesis. In: International workshop on vision algorithms. pp. 298–372. Springer (1999)

1999

[49] [49]

Advances in neural information processing systems33, 14254–14265 (2020)

Tyszkiewicz, M., Fua, P., Trulls, E.: Disk: Learning local features with policy gra- dient. Advances in neural information processing systems33, 14254–14265 (2020)

2020

[50] [50]

ACM Trans- actions on Graphics (TOG)36(1), 1–11 (2017)

Waechter, M., Beljan, M., Fuhrmann, S., Moehrle, N., Kopf, J., Goesele, M.: Vir- tual rephotography: Novel view prediction error for 3d reconstruction. ACM Trans- actions on Graphics (TOG)36(1), 1–11 (2017)

2017

[51] [51]

In: 2025 Interna- tional Conference on 3D Vision (3DV)

Wang, H., Agapito, L.: 3d reconstruction with spatial memory. In: 2025 Interna- tional Conference on 3D Vision (3DV). pp. 78–89. IEEE (2025)

2025

[52] [52]

In: Proceedings of the Computer Vision and Pattern Recognition Conference

Wang, J., Chen, M., Karaev, N., Vedaldi, A., Rupprecht, C., Novotny, D.: Vggt: Visual geometry grounded transformer. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 5294–5306 (2025)

2025

[53] [53]

In: Proceedings of the IEEE/CVF con- ference on computer vision and pattern recognition

Wang, J., Karaev, N., Rupprecht, C., Novotny, D.: Vggsfm: Visual geometry grounded deep structure from motion. In: Proceedings of the IEEE/CVF con- ference on computer vision and pattern recognition. pp. 21686–21697 (2024)

2024

[54] [54]

In: Proceedings of the Computer Vision and Pattern Recognition Conference

Wang, Q., Zhang, Y., Holynski, A., Efros, A.A., Kanazawa, A.: Continuous 3d perception model with persistent state. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 10510–10522 (2025)

2025

[55] [55]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Wang, S., Leroy, V., Cabon, Y., Chidlovskii, B., Revaud, J.: Dust3r: Geometric 3d vision made easy. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20697–20709 (2024)

2024

[56] [56]

In: International Conference on Learning Representations (ICLR) (2026),https://openreview

Wang, Y., Zhou, J., Zhu, H., Chang, W., Zhou, Y., Li, Z., Chen, J., Pang, J., Shen, C., He, T.:π3: Permutation-equivariant visual geometry learning. In: International Conference on Learning Representations (ICLR) (2026),https://openreview. net/forum?id=DTQIjngDta

2026

[57] [57]

In: Proceedings of the Computer Vision and Pattern Recognition Conference

Yang, J., Sax, A., Liang, K.J., Henaff, M., Tang, H., Cao, A., Chai, J., Meier, F., Feiszli, M.: Fast3r: Towards 3d reconstruction of 1000+ images in one forward pass. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 21924–21935 (2025)

2025

[58] [58]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Yeshwanth, C., Liu, Y.C., Nießner, M., Dai, A.: Scannet++: A high-fidelity dataset of 3d indoor scenes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 12–22 (2023)

2023

[59] [59]

In: Proceedings of the IEEE conference on computer vision and pattern recognition

Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 586–595 (2018)

2018

[60] [60]

arXiv preprint arXiv:2302.09493 (2023)

Zhao, H., Shang, J., Liu, K., Chen, C., Gu, F.: Edgevo: An efficient and accurate edge-based visual odometry. arXiv preprint arXiv:2302.09493 (2023)

work page arXiv 2023

[61] [61]

IEEE Transac- tions on Instrumentation and Measurement72, 1–16 (2023)

Zhao, X., Wu, X., Chen, W., Chen, P.C., Xu, Q., Li, Z.: Aliked: A lighter keypoint and descriptor extraction network via deformable transformation. IEEE Transac- tions on Instrumentation and Measurement72, 1–16 (2023)

2023

[62] [62]

arXiv preprint arXiv:2510.13310 (2025)

Zhong, J., Zhan, Z., Gao, Q., Chen, Z., Lou, H., Mao, J., Neumann, U., Wang, C., Wang, Y.: InstantSfM: Towards GPU-native SfM for the deep learning era. arXiv preprint arXiv:2510.13310 (2025)

work page arXiv 2025

[63] [63]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Zhou, Y., Barnes, C., Lu, J., Yang, J., Li, H.: On the continuity of rotation rep- resentations in neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 5745–5753 (2019)

2019

[64] [64]

IEEE Transactions on Robotics35(1), 184– 199 (2018)

Zhou, Y., Li, H., Kneip, L.: Canny-vo: Visual odometry with rgb-d cameras based on geometric 3-d–2-d edge alignment. IEEE Transactions on Robotics35(1), 184– 199 (2018)

2018