arxiv: 2409.12190 · v4 · pith:GY62R43Dnew · submitted 2024-09-18 · 💻 cs.RO · cs.CV

Bundle Adjustment in the Eager Mode

Pith reviewed 2026-05-23 20:51 UTC · model grok-4.3

classification 💻 cs.RO cs.CV
keywords bundle adjustment · PyTorch · GPU acceleration · sparse auto-differentiation · second-order optimization · SLAM · eager mode · robotics

The pith

A PyTorch eager-mode bundle adjustment library achieves average speedups of 18.5× to 23× on GPU over GTSAM, g²o, and Ceres.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a bundle adjustment implementation that runs directly inside PyTorch using eager execution. It relies on a sparsity-aware auto-differentiation design paired with GPU-accelerated sparse matrix operations to support second-order optimization. Existing C++ solvers lack this native connection to deep learning tools, which restricts their use in perception pipelines that mix learned models with geometric optimization. A reader would care because the new library removes the need to switch between frameworks when building robotic systems such as SLAM or augmented reality.

Core claim

Our eager-mode BA on GPU demonstrates substantial runtime efficiency, achieving an average speedup of 18.5×, 22×, and 23× across all benchmarks compared to GTSAM, g²o, and Ceres, respectively, by means of a sparsity-aware auto-differentiation design and GPU-accelerated sparse operations designed for 2nd-order optimization.

What carries the argument

Sparsity-aware auto-differentiation design combined with GPU-accelerated sparse operations for second-order optimization inside PyTorch.
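
A minimal sketch of why the sparsity matters, written in plain Python rather than the library's PyTorch code: each reprojection residual depends on exactly one camera and one point, so each row block of the Jacobian has only two nonzero column blocks. The names `camera_indices` and `point_indices` follow the paper's Figure 4 caption; the helper `jacobian_block_pattern`, the column layout (cameras first, then points), and the toy sizes are assumptions made for illustration.

```python
# Illustrative only: plain Python, not the library's PyTorch implementation.
# Observation k links camera camera_indices[k] with point point_indices[k].
camera_indices = [0, 0, 1, 1]
point_indices = [0, 1, 0, 1]
num_cameras, num_points = 2, 2


def jacobian_block_pattern(camera_indices, point_indices, num_cameras):
    """Return (row_block, col_block) pairs for the nonzero Jacobian blocks.

    Assumed layout: column blocks 0..num_cameras-1 hold camera parameters,
    and point j sits at column block num_cameras + j.
    """
    pattern = []
    for k, (i, j) in enumerate(zip(camera_indices, point_indices)):
        pattern.append((k, i))                # block d r_k / d camera_i
        pattern.append((k, num_cameras + j))  # block d r_k / d point_j
    return pattern


pattern = jacobian_block_pattern(camera_indices, point_indices, num_cameras)

# A dense Jacobian would store one block per (residual, parameter) pair.
dense_blocks = len(camera_indices) * (num_cameras + num_points)
fill_ratio = len(pattern) / dense_blocks
```

With this layout the fill ratio is 2 / (num_cameras + num_points), so it shrinks as the problem grows; storing and differentiating only the two blocks per residual is the kind of saving a sparsity-aware design can exploit.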

If this is right

  • Bundle adjustment can be called directly from PyTorch code without data transfer to external C++ libraries.
  • Robotic perception systems gain faster second-order optimization on GPU hardware for tasks such as SLAM.
  • Implementation and debugging of bundle adjustment become simpler within the PyTorch ecosystem.
  • Second-order methods for camera pose and landmark estimation become more accessible to deep learning workflows.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The integration opens the possibility of embedding bundle adjustment as a differentiable layer inside larger neural networks for joint training.
  • Similar sparsity handling could be applied to other second-order problems that arise in robotics beyond pure bundle adjustment.
  • The reported speedups suggest that real-time bundle adjustment on embedded GPU devices becomes more feasible when paired with learned components.

Load-bearing premise

A sparsity-aware auto-differentiation design realized in PyTorch can deliver the reported GPU speedups while preserving the same numerical correctness and convergence behavior as the C++ solvers.

What would settle it

Running identical benchmark problems on the same GPU hardware and finding that the PyTorch version either converges to different parameter values or takes longer wall-clock time than GTSAM, g²o, or Ceres.
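
A check of this kind could be organized as a small harness. The sketch below is a hypothetical outline in plain Python: `compare_solvers`, the tolerance, and the two stand-in solver functions are all invented for illustration and do not call GTSAM, g²o, Ceres, or the paper's library.

```python
import time


def compare_solvers(solve_reference, solve_candidate, problem, atol=1e-6):
    """Run two solvers on the same problem; report agreement and speedup."""
    start = time.perf_counter()
    ref = solve_reference(problem)
    t_ref = time.perf_counter() - start

    start = time.perf_counter()
    cand = solve_candidate(problem)
    t_cand = time.perf_counter() - start

    # Largest element-wise gap between the two parameter vectors.
    max_abs_diff = max(abs(r - c) for r, c in zip(ref, cand))
    return {
        "max_abs_diff": max_abs_diff,
        "same_solution": max_abs_diff <= atol,
        "speedup": t_ref / t_cand if t_cand > 0 else float("inf"),
    }


# Stand-in "solvers": both return the least-squares estimate of a constant,
# which is the mean of the observations, computed in two different ways.
def solve_reference(observations):
    return [sum(observations) / len(observations)]


def solve_candidate(observations):
    estimate = 0.0
    for count, value in enumerate(observations, start=1):
        estimate += (value - estimate) / count  # running mean
    return [estimate]


report = compare_solvers(solve_reference, solve_candidate, [1.0, 2.0, 3.0, 4.0])
```

Two caveats apply to a real comparison. GPU kernels are launched asynchronously, so the device has to be synchronized before the clock is read. Bundle adjustment solutions are also only determined up to a gauge (a global similarity transform) unless that freedom is fixed, so comparing the final reprojection cost may be more robust than comparing raw parameter values.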

Figures

Figures reproduced from arXiv: 2409.12190 by Chen Wang, Huan Xu, Xinpeng Wei, Yaoyu Hu, Zihang Fang, Zitong Zhan.

Figure 1: Comparison of runtime and implementational efficiency across …
Figure 2: Qualitative results on the BAL dataset. Our method successfully recovered the 3D geometry in the scene. Best viewed digitally. To achieve end-to-end differentiability under an eager mode interface and for a simpler implementation, PyPose [23] provides a variety of non-linear solvers, including Gauss-Newton and Levenberg-Marquardt (LM), entirely in PyTorch. gradSLAM [24] is a SLAM demo project that include…
Figure 3: Overview of the Bundle Adjustment (BA) process. The optimization pipeline starts from the input of camera poses and landmarks, followed by camera reprojection and residual computation. Jacobian matrix stores the gradient of each residual w.r.t. camera pose and 3D point parameter, showing the contributions of parameters to the Jacobian blocks. The final optimization step iteratively refines camera poses and …
Figure 4: Illustration of the sparsity-aware Jacobian construction in BA. The forward pass is shown by the arrows moving rightward. Each residual r_ij is calculated by parameters of a camera pose ζ_i and a 3D point p_j. Since each camera and point contribute to multiple reprojections, the parameters are replicated to match the residuals using camera_indices and point_indices, respectively. The backward pass, shown by …
Figure 5: Sparse Levenberg-Marquardt Optimization in Eager-Mode Bundle Adjustment. The diagram illustrates the core computation steps in the optimization. It starts by forming the normal equations via sparse Jacobian multiplication, followed by computing the damped system using diagonal clamping. The system is then solved via sparse linear solvers, and convergence is checked to determine whether to update parameter…
Figure 6: Speedup of our BA relative to other frameworks exponentially …
Figure 7: (a)-(c): Convergence curves (MSE vs. time) on the BAL dataset. (d): …
Figure 8: Qualitative 3D reconstruction results on diverse indoor environments …
Figure 9: Qualitative 3D reconstruction results on outdoor scenes …
Figure 10: Zero-shot qualitative evaluation of the iDKM model on the Waymo Open Dataset (row 1) and diverse real-world scenarios (rows 2–4). Our approach …
Figure 11: Qualitative Comparison of Pose Optimization Results. Top row: Trajectory visualization of camera poses from the parking garage dataset. Bottom row: Pose graph optimization on the synthetic "sphere" dataset. Each column shows (left) the initial poses before optimization, (middle) results from Ceres Solver, and (right) results from our proposed eager-mode BA framework. Both optimization methods successfully…
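
The loop summarized in the Figure 5 caption (form the normal equations, damp the diagonal with clamping, solve, then accept or reject the step) can be sketched on a toy problem. The code below is a plain-Python illustration under stated assumptions, not the paper's sparse GPU implementation: the two-parameter exponential model, the data, the function names, the clamp bounds, and the factor-of-ten damping schedule are all made up for this example.

```python
import math

# Toy data generated from y = 2 * exp(0.5 * x); model and data are invented.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]


def residuals_and_jacobian(a, b):
    """Residuals r_k = a * exp(b * x_k) - y_k and their 2-column Jacobian."""
    r = [a * math.exp(b * x) - y for x, y in zip(xs, ys)]
    J = [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]
    return r, J


def cost_of(r):
    return sum(v * v for v in r)


def lm_fit(a, b, lam=1e-3, clamp_min=1e-6, clamp_max=1e32, max_iters=100):
    r, J = residuals_and_jacobian(a, b)
    cost = cost_of(r)
    for _ in range(max_iters):
        if cost < 1e-20:
            break
        # Normal equations: H = J^T J, g = J^T r.
        h00 = sum(j[0] * j[0] for j in J)
        h01 = sum(j[0] * j[1] for j in J)
        h11 = sum(j[1] * j[1] for j in J)
        g0 = sum(j[0] * v for j, v in zip(J, r))
        g1 = sum(j[1] * v for j, v in zip(J, r))
        # Damped system: add lam times the clamped diagonal of H.
        d00 = h00 + lam * min(max(h00, clamp_min), clamp_max)
        d11 = h11 + lam * min(max(h11, clamp_min), clamp_max)
        # Solve the 2x2 system (H + damping) * delta = -g in closed form.
        det = d00 * d11 - h01 * h01
        da = (-g0 * d11 + g1 * h01) / det
        db = (-g1 * d00 + g0 * h01) / det
        # Accept the step only if it lowers the cost; otherwise damp harder.
        r_new, J_new = residuals_and_jacobian(a + da, b + db)
        cost_new = cost_of(r_new)
        if cost_new < cost:
            a, b, r, J, cost = a + da, b + db, r_new, J_new, cost_new
            lam = max(lam / 10.0, 1e-12)
        else:
            lam *= 10.0
    return a, b, cost


a_hat, b_hat, final_cost = lm_fit(1.0, 0.0)
```

In bundle adjustment the same steps apply, but the matrix of normal equations is large and block-sparse, so the closed-form 2x2 solve here would be replaced by a sparse linear solver such as the ones the caption refers to.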
Abstract

Bundle adjustment (BA) is a critical technique in various robotic applications such as simultaneous localization and mapping (SLAM), augmented reality (AR), and photogrammetry. BA optimizes parameters such as camera poses and 3D landmarks to align them with observations. With the growing importance of deep learning in perception systems, there is an increasing need to integrate BA with deep learning frameworks for enhanced reliability and performance. However, widely-used C++-based BA libraries, such as GTSAM, g$^2$o, and Ceres Solver, lack native integration with modern deep learning libraries like PyTorch. This limitation affects their flexibility, ease of debugging, and overall implementation efficiency. To address this gap, we introduce an eager-mode BA library seamlessly integrated with PyTorch with high efficiency. Our approach includes a sparsity-aware auto-differentiation design and GPU-accelerated sparse operations designed for 2nd-order optimization. Our eager-mode BA on GPU demonstrates substantial runtime efficiency, achieving an average speedup of 18.5$\times$, 22$\times$, and 23$\times$ across all benchmarks compared to GTSAM, g$^2$o, and Ceres, respectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces an eager-mode bundle adjustment (BA) library integrated with PyTorch. It employs a sparsity-aware auto-differentiation design and GPU-accelerated sparse operations for second-order optimization, claiming average speedups of 18.5× versus GTSAM, 22× versus g²o, and 23× versus Ceres across all benchmarks.

Significance. If the reported speedups are achieved while preserving numerical equivalence and convergence behavior to the C++ baselines, the work would enable tighter integration of BA with deep-learning pipelines in robotics and vision, improving flexibility for end-to-end systems in SLAM and AR.

major comments (2)
  1. [Abstract] Abstract: the central speedup claims (18.5×, 22×, 23×) are presented without any accompanying implementation details, benchmark descriptions, accuracy metrics, error analysis, or convergence criteria, so it is impossible to determine whether the data or design supports the numbers.
  2. [Approach (inferred from abstract claims)] The manuscript assumes a sparsity-aware autodiff design in eager-mode PyTorch combined with GPU sparse linear algebra can realize the claimed wall-clock gains and numerically equivalent iterates; no description of the custom kernels, Schur-complement handling, or precision safeguards is supplied to substantiate this premise.
minor comments (1)
  1. [Abstract] Notation for the g²o library is inconsistent (g$^2$o in the abstract).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments below and will revise the manuscript to improve clarity on the abstract claims and technical details of the approach.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central speedup claims (18.5×, 22×, 23×) are presented without any accompanying implementation details, benchmark descriptions, accuracy metrics, error analysis, or convergence criteria, so it is impossible to determine whether the data or design supports the numbers.

    Authors: We agree the abstract is concise and omits these supporting elements due to length limits. The manuscript body (Experiments section) contains the benchmark descriptions, accuracy metrics, error analysis, and convergence criteria, along with evidence of numerical equivalence to the C++ baselines. We will revise the abstract to briefly reference the benchmark setup and equivalence results. revision: yes

  2. Referee: [Approach (inferred from abstract claims)] The manuscript assumes a sparsity-aware autodiff design in eager-mode PyTorch combined with GPU sparse linear algebra can realize the claimed wall-clock gains and numerically equivalent iterates; no description of the custom kernels, Schur-complement handling, or precision safeguards is supplied to substantiate this premise.

    Authors: Section 3 of the manuscript outlines the sparsity-aware autodiff design and GPU sparse operations. To directly address the concern, we will expand this section in the revision with additional details on the custom kernels, Schur-complement implementation, and precision safeguards that maintain numerical equivalence and convergence behavior. revision: yes

Circularity Check

0 steps flagged

No circularity: engineering implementation with external benchmarks

Full rationale

The paper presents a PyTorch-based eager-mode BA implementation using sparsity-aware autodiff and GPU sparse ops, with performance claims benchmarked against independent external solvers (GTSAM, g²o, Ceres). No equations, derivations, or fitted parameters are present that reduce to self-definitions or self-citations. The central claims rest on reported wall-clock measurements rather than any load-bearing mathematical chain, satisfying the self-contained criterion.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work is an implementation contribution that rests on standard assumptions about PyTorch autograd behavior and GPU sparse linear algebra rather than new free parameters, axioms, or invented entities.

axioms (1)
  • domain assumption: PyTorch autograd and GPU sparse matrix operations can be extended to support efficient second-order bundle adjustment without loss of correctness.
    The design choices for sparsity-aware autodiff and GPU acceleration presuppose this engineering feasibility.

pith-pipeline@v0.9.0 · 5746 in / 1160 out tokens · 37935 ms · 2026-05-23T20:51:53.405742+00:00 · methodology

Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · 2 internal anchors

  1. [1]

    Vr-gs: A physical dynamics-aware interactive gaussian splatting system in virtual reality,

    Y. Jiang, C. Yu, T. Xie, X. Li, Y. Feng, H. Wang, M. Li, H. Lau, F. Gao, Y. Yang, and C. Jiang, “VR-GS: A physical dynamics-aware interactive gaussian splatting system in virtual reality,” arXiv preprint arXiv:2401.16663, 2024

  2. [2]

    Detector-free structure from motion,

    X. He, J. Sun, Y. Wang, S. Peng, Q. Huang, H. Bao, and X. Zhou, “Detector-free structure from motion,” CVPR, 2024

  3. [3]

    AirSLAM: An efficient and illumination-robust point-line visual slam system,

    K. Xu, Y. Hao, S. Yuan, C. Wang, and L. Xie, “AirSLAM: An efficient and illumination-robust point-line visual slam system,” IEEE Transactions on Robotics (T-RO), 2025. [Online]. Available: https://arxiv.org/abs/2408.03520

  4. [4]

    iMatching: Imperative correspondence learning,

    Z. Zhan, D. Gao, Y.-J. Lin, Y. Xia, and C. Wang, “iMatching: Imperative correspondence learning,” in European Conference on Computer Vision (ECCV), 2024. [Online]. Available: https://arxiv.org/abs/2312.02141

  5. [5]

    Droid-slam: Deep visual slam for monocular, stereo, and rgb-d cameras,

    Z. Teed and J. Deng, “Droid-slam: Deep visual slam for monocular, stereo, and rgb-d cameras,” Advances in Neural Information Processing Systems, vol. 34, pp. 16558–16569, 2021

  6. [6]

    Imperative learning: A self-supervised neuro-symbolic learning framework for robot autonomy,

    C. Wang, K. Ji, J. Geng, Z. Ren, T. Fu, F. Yang, Y. Guo, H. He, X. Chen, Z. Zhan, Q. Du, S. Su, B. Li, Y. Qiu, Y. Du, Q. Li, Y. Yang, X. Lin, and Z. Zhao, “Imperative learning: A self-supervised neuro-symbolic learning framework for robot autonomy,” The International Journal of Robotics Research (IJRR), 2025. [Online]. Available: https://arxiv.org/a...

  7. [7]

    iSLAM: Imperative SLAM,

    T. Fu, S. Su, Y. Lu, and C. Wang, “iSLAM: Imperative SLAM,” IEEE Robotics and Automation Letters (RA-L), 2024. [Online]. Available: https://arxiv.org/abs/2306.07894

  8. [8]

    Vggt: Visual geometry grounded transformer,

    J. Wang, M. Chen, N. Karaev, A. Vedaldi, C. Rupprecht, and D. Novotny, “Vggt: Visual geometry grounded transformer,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

  9. [9]

    PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation

    J. Ansel, E. Yang, H. He, N. Gimelshein, A. Jain, M. Voznesensky, B. Bao, P. Bell, D. Berard, E. Burovski, G. Chauhan, A. Chourdia, W. Constable, A. Desmaison, Z. DeVito, E. Ellison, W. Feng, J. Gong, M. Gschwind, B. Hirsh, S. Huang, K. Kalambarkar, L. Kirsch, M. Lazos, M. Lezcano, Y. Liang, J. Liang, Y. Lu, C. K. Luk, B. Maher, Y. Pan, C. Puhrsch, M....

  10. [10]

    PyTorch: An Imperative Style, High-Performance Deep Learning Library,

    A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “PyTorch: An Imperative Style, High-Performance Deep Learning Library,” in Advances in Neural Information Processing S...

  11. [11]

    TensorFlow: A system for large-scale machine learning

    M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. A. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zhang, “Tensorflow: A system for large-scale machine learning,” CoRR, vol. abs/1605.08695, 2016. [Online]. Available: h...

  12. [12]

    The state of machine learning frameworks in 2019,

    H. He, “The state of machine learning frameworks in 2019,” The Gradient, 2019

  13. [13]

    borglab/gtsam,

    F. Dellaert and Contributors, “borglab/gtsam,” May 2022. [Online]. Available: https://github.com/borglab/gtsam

  14. [14]

    Ceres Solver,

    S. Agarwal, K. Mierle, and The Ceres Solver Team, “Ceres Solver,” Oct

  15. [15]

    Available: https://github.com/ceres-solver/ceres-solver

    [Online]. Available: https://github.com/ceres-solver/ceres-solver

  16. [16]

    G2o: A general framework for graph optimization,

    R. Kümmerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard, “G2o: A general framework for graph optimization,” in IEEE Int. Conf. on Robotics and Automation (ICRA), 06 2011, pp. 3607–3613

  17. [17]

    Theseus: A Library for Differentiable Nonlinear Optimization,

    L. Pineda, T. Fan, M. Monge, S. Venkataraman, P. Sodhi, R. T. Chen, J. Ortiz, D. DeTone, A. Wang, S. Anderson, J. Dong, B. Amos, and M. Mukadam, “Theseus: A Library for Differentiable Nonlinear Optimization,” Advances in Neural Information Processing Systems, 2022

  18. [18]

    Robust bundle adjustment revisited,

    C. Zach, “Robust bundle adjustment revisited,” in Computer Vision – ECCV 2014, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds. Cham: Springer International Publishing, 2014, pp. 772–787

  19. [19]

    A micro lie theory for state estimation in robotics,

    J. Solà, J. Deray, and D. Atchuthan, “A micro lie theory for state estimation in robotics,” 2021. [Online]. Available: https://arxiv.org/abs/1812.01537

  20. [20]

    Bundle adjustment in the large,

    S. Agarwal, N. Snavely, S. M. Seitz, and R. Szeliski, “Bundle adjustment in the large,” in European Conference on Computer Vision (ECCV). Springer, 2010, pp. 29–42

  21. [21]

    PyPose v0.6: The imperative programming interface for robotics,

    Z. Zhan, X. Li, Q. Li, H. He, A. Pandey, H. Xiao, Y. Xu, X. Chen, K. Xu, K. Cao, Z. Zhao, Z. Wang, H. Xu, Z. Fang, Y. Chen, W. Wang, X. Fang, Y. Du, T. Wu, X. Lin, Y. Qiu, F. Yang, J. Shi, S. Su, Y. Lu, T. Fu, K. Dantu, J. Wu, L. Xie, M. Hutter, L. Carlone, S. Scherer, D. Huang, Y. Hu, J. Geng, and C. Wang, “PyPose v0.6: The imperative programming i...

  22. [22]

    Tensorflow: A system for large-scale machine learning,

    M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng, “Tensorflow: A system for large-scale machine learning,” in Proceedings of the 12th USENIX Symposium on Operating S...

  23. [23]

    TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning

    A. Agrawal, A. N. Modi, A. Passos, A. Lavoie, A. Agarwal, A. Shankar, I. Ganichev, J. Levenberg, M. Hong, R. Monga, and S. Cai, “Tensorflow eager: A multi-stage, python-embedded dsl for machine learning,” 2019. [Online]. Available: https://arxiv.org/abs/1903.01855

  24. [24]

    PyPose: A library for robot learning with physics-based optimization,

    C. Wang, D. Gao, K. Xu, J. Geng, Y. Hu, Y. Qiu, B. Li, F. Yang, B. Moon, A. Pandey, Aryan, J. Xu, T. Wu, H. He, D. Huang, Z. Ren, S. Zhao, T. Fu, P. Reddy, X. Lin, W. Wang, J. Shi, R. Talak, K. Cao, Y. Du, H. Wang, H. Yu, S. Wang, S. Chen, A. Kashyap, R. Bandaru, K. Dantu, J. Wu, L. Xie, L. Carlone, M. Hutter, and S. Scherer, “PyPose: A library for rob...

  25. [25]

    ∇slam: Dense slam meets automatic differentiation,

    K. M. Jatavallabhula, G. Iyer, and L. Paull, “∇slam: Dense slam meets automatic differentiation,” in 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 2130–2137.

  26. [26]

    Deeplm: Large-scale nonlinear least squares on deep learning frameworks using stochastic domain decomposition,

    J. Huang, S. Huang, and M. Sun, “Deeplm: Large-scale nonlinear least squares on deep learning frameworks using stochastic domain decomposition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 10308–10317

  27. [27]

    OpenMP application program interface version 3.0,

    OpenMP Architecture Review Board, “OpenMP application program interface version 3.0,” May 2008. [Online]. Available: http://www.openmp.org/mp-documents/spec30.pdf

  28. [28]

    Multicore bundle adjustment,

    C. Wu, S. Agarwal, B. Curless, and S. M. Seitz, “Multicore bundle adjustment,” in Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, ser. CVPR ’11. USA: IEEE Computer Society, 2011, p. 3057–3064. [Online]. Available: https://doi.org/10.1109/CVPR.2011.5995552

  29. [29]

    Decentralization and acceleration enables large-scale bundle adjustment,

    T. Fan, J. Ortiz, M. Hsiao, M. Monge, J. Dong, T. Murphey, and M. Mukadam, “Decentralization and acceleration enables large-scale bundle adjustment,” arXiv:2305.07026, 2023

  30. [30]

    C. Wang, K. M. Jatavallabhula, and M. Mukadam, Differentiable Optimization. Cambridge University Press, 2025. [Online]. Available: https://github.com/SLAM-Handbook-contributors/slam-handbook-public-release/

  31. [31]

    pypose.optim.levenbergmarquardt,

    “pypose.optim.levenbergmarquardt,” https://pypose.org/docs/main/generated/pypose.optim.LevenbergMarquardt/

  32. [32]

    Distributed bundle adjustment with block-based sparse matrix compression for super large scale datasets,

    M. Zheng, N. Chen, J. Zhu, X. Zeng, H. Qiu, Y. Jiang, X. Lu, and H. Qu, “Distributed bundle adjustment with block-based sparse matrix compression for super large scale datasets,” in IEEE/CVF International Conference on Computer Vision (ICCV), 2023. [Online]. Available: https://arxiv.org/abs/2307.08383

  33. [33]

    PyTorch sparse bsr tensor documentation,

    “PyTorch sparse bsr tensor documentation,” https://pytorch.org/docs/stable/sparse.html#sparse-bsr-tensor, 2024, accessed: 2024-09-12

  34. [34]

    Sparse Matrix - Coordinate List (COO),

    “Sparse Matrix - Coordinate List (COO),” https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO), 2024, accessed: 2024-09-12

  35. [35]

    Tensor Indexing API,

    “Tensor Indexing API,” https://pytorch.org/cppdocs/notes/tensor_indexing.html, 2024, accessed: 2024-09-13

  36. [36]

    Very high-speed computing systems,

    M. Flynn, “Very high-speed computing systems,” Proceedings of the IEEE, vol. 54, no. 12, pp. 1901–1909, 1966

  37. [37]

    Automatic differentiation with torch.autograd,

    “Automatic differentiation with torch.autograd,” https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html, accessed: 2024-09-12

  38. [38]

    PyTorch Sparse CSR Tensor documentation,

    “PyTorch Sparse CSR Tensor documentation,” https://pytorch.org/docs/stable/sparse.html#sparse-csr-tensor, 2024, accessed: 2024-09-12

  39. [39]

    Two fast algorithms for sparse matrices: Multiplication and permuted transposition,

    F. G. Gustavson, “Two fast algorithms for sparse matrices: Multiplication and permuted transposition,” ACM Trans. Math. Softw., vol. 4, no. 3, p. 250–269, sep 1978. [Online]. Available: https://doi.org/10.1145/355791.355796

  40. [40]

    Optimizing sparse matrix–matrix multiplication for the gpu,

    S. Dalton, N. Bell, and L. N. Olson, “Optimizing sparse matrix–matrix multiplication for the gpu,” ACM Transactions on Mathematical Software, vol. 41, no. 4, pp. 1–20, 2015

  41. [41]

    cusparse library,

    NVIDIA Corporation, “cusparse library,” https://docs.nvidia.com/cuda/cusparse/index.html, 2024, accessed: 2024-09-13

  42. [42]

    Warp: A high-performance python framework for gpu simulation and graphics,

    M. Macklin, “Warp: A high-performance python framework for gpu simulation and graphics,” https://github.com/nvidia/warp, March 2022, NVIDIA GPU Technology Conference (GTC)

  43. [43]

    J. H. Wilkinson and C. B. Moler, Matrix computations. GBR: John Wiley and Sons Ltd., 2003, p. 1103–1109

  44. [44]

    Triton language and compiler,

    Triton Contributors, “Triton language and compiler,” https://github.com/triton-lang/triton, 2024, accessed: 2024-09-13

  45. [45]

    A.-L. Cholesky, “Note sur une méthode de résolution des équations normales provenant de l’application de la méthode des moindres carrés a un système d’équations linéaires en nombre inférieur a celui des inconnues. —application de la méthode a la résolution d’un système defini d’équations linéaires,” Bulletin géodésique, vol. 2...

  46. [46]

    cudss: A high-performance direct linear solver library,

    NVIDIA Corporation, “cudss: A high-performance direct linear solver library,” 2025, accessed: 2025-06-15. [Online]. Available: https://docs.nvidia.com/cuda/cudss/

  47. [47]

    Block preconditioning for the conjugate gradient method,

    P. Concus, G. Golub, and G. Meurant, “Block preconditioning for the conjugate gradient method,” LBL Publications, no. LBL-14856, 1982. [Online]. Available: https://escholarship.org/uc/item/0j60b61v

  48. [48]

    Cuda graphs,

    NVIDIA Corporation, “Cuda graphs,” 2025, accessed: 2025-06-15. [Online]. Available: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-graphs

  49. [49]

    Pypose linear solver,

    “Pypose linear solver,” https://pypose.org/docs/main/generated/pypose.optim.solver.PINV/

  50. [50]

    Newton’s method with a model trust-region modification,

    D. C. Sorensen, “Newton’s method with a model trust-region modification,” University of North Texas Libraries, UNT Digital Library, Tech. Rep., September 1980, accessed: September 13, 2024. [Online]. Available: https://digital.library.unt.edu/ark:/67531/metadc283479/

  51. [51]

    Robust global translations with 1dsfm,

    K. Wilson and N. Snavely, “Robust global translations with 1dsfm,” in Proceedings of the European Conference on Computer Vision (ECCV), 2014

  52. [52]

    Common Objects in 3D: Large-scale learning and evaluation of real-life 3D category reconstruction,

    J. Reizenstein, R. Shapovalov, P. Henzler, L. Sbordone, P. Labatut, and D. Novotny, “Common Objects in 3D: Large-scale learning and evaluation of real-life 3D category reconstruction,” in Proc. ICCV, 2021

  53. [53]

    oneapi threading building blocks (onetbb),

    Intel Corporation, “oneapi threading building blocks (onetbb),” https://www.intel.com/content/www/us/en/developer/tools/oneapi/onetbb.html, 2021, version 2021.5

  54. [54]

    Superglue: Learning feature matching with graph neural networks,

    P.-E. Sarlin, D. DeTone, T. Malisiewicz, and A. Rabinovich, “Superglue: Learning feature matching with graph neural networks,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 4938–4947

  55. [55]

    Pixel-perfect structure-from-motion with featuremetric refinement,

    P. Lindenberger, P. Sarlin, V. Larsson, and M. Pollefeys, “Pixel-perfect structure-from-motion with featuremetric refinement,” arXiv.cs, vol. abs/2108.08291, 2021

  56. [56]

    PoseDiffusion: solving pose estimation via diffusion-aided bundle adjustment,

    J. Wang, C. Rupprecht, and D. Novotny, “PoseDiffusion: solving pose estimation via diffusion-aided bundle adjustment,” in Proc. ICCV, 2023

  57. [57]

    DUSt3R: Geometric 3D vision made easy,

    S. Wang, V. Leroy, Y. Cabon, B. Chidlovskii, and J. Revaud, “DUSt3R: Geometric 3D vision made easy,” in Proc. CVPR, 2024

  58. [58]

    Grounding image matching in 3d with mast3r, 2024

    V. Leroy, Y. Cabon, and J. Revaud, “Grounding image matching in 3d with mast3r,” arXiv preprint arXiv:2406.09756, 2024

  59. [59]

    VGGSfM: visual geometry grounded deep structure from motion,

    J. Wang, N. Karaev, C. Rupprecht, and D. Novotny, “VGGSfM: visual geometry grounded deep structure from motion,” in Proc. CVPR, 2024

  60. [60]

    Mv-dust3r+: Single-stage scene reconstruction from sparse views in 2 seconds,

    Z. Tang, Y. Fan, D. Wang, H. Xu, R. Ranjan, A. Schwing, and Z. Yan, “Mv-dust3r+: Single-stage scene reconstruction from sparse views in 2 seconds,” arXiv preprint arXiv:2412.06974, 2024

  61. [61]

    Continuous 3d perception model with persistent state,

    Q. Wang, Y. Zhang, A. Holynski, A. A. Efros, and A. Kanazawa, “Continuous 3d perception model with persistent state,” 2025

  62. [62]

    Flare: Feed-forward geometry, appearance and camera estimation from uncalibrated sparse views, 2026

    S. Zhang, J. Wang, Y. Xu, N. Xue, C. Rupprecht, X. Zhou, Y. Shen, and G. Wetzstein, “Flare: Feed-forward geometry, appearance and camera estimation from uncalibrated sparse views,” 2025. [Online]. Available: https://arxiv.org/abs/2502.12138

  63. [63]

    arXiv preprint arXiv:2501.13928 (2025)

    J. Yang, A. Sax, K. J. Liang, M. Henaff, H. Tang, A. Cao, J. Chai, F. Meier, and M. Feiszli, “Fast3r: Towards 3d reconstruction of 1000+ images in one forward pass,” arXiv preprint arXiv:2501.13928, 2025

  64. [64]

    Baspacho: Direct solver for sparse spd matrices for nonlinear optimization,

    Facebook Research, “Baspacho: Direct solver for sparse spd matrices for nonlinear optimization,” https://github.com/facebookresearch/baspacho, 2025, accessed: February 19, 2025

  65. [65]

    Scannet: Richly-annotated 3d reconstructions of indoor scenes,

    A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner, “Scannet: Richly-annotated 3d reconstructions of indoor scenes,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 5828–5839

  66. [66]

    A large-scale outdoor multi-modal dataset and benchmark for novel view synthesis and implicit scene reconstruction,

    C. Lu, F. Yin, X. Chen, T. Chen, G. Yu, and J. Fan, “A large-scale outdoor multi-modal dataset and benchmark for novel view synthesis and implicit scene reconstruction,” arXiv preprint arXiv:2301.06782, 2023

  67. [67]

    KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d,

    Y. Liao, J. Xie, and A. Geiger, “KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d,” Pattern Analysis and Machine Intelligence (PAMI), 2022

  68. [68]

    BAD SLAM: Bundle adjusted direct RGB-D SLAM,

    T. Schöps, T. Sattler, and M. Pollefeys, “BAD SLAM: Bundle adjusted direct RGB-D SLAM,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2019

  69. [69]

    Learning feature descriptors using camera pose supervision,

    Q. Wang, X. Zhou, B. Hariharan, and N. Snavely, “Learning feature descriptors using camera pose supervision,” in European Conference on Computer Vision. Springer, 2020, pp. 757–774

  70. [70]

    Aspanformer: Detector-free image matching with adaptive span transformer,

    H. Chen, Z. Luo, L. Zhou, Y. Tian, M. Zhen, T. Fang, D. McKinnon, Y. Tsin, and L. Quan, “Aspanformer: Detector-free image matching with adaptive span transformer,” in Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXII. Springer, 2022, pp. 20–36

  71. [71]

    DKM: Dense kernelized feature matching for geometry estimation,

    J. Edstedt, I. Athanasiadis, M. Wadenbäck, and M. Felsberg, “DKM: Dense kernelized feature matching for geometry estimation,” in IEEE Conference on Computer Vision and Pattern Recognition, 2023

  72. [72]

    Self-supervised geometric perception,

    H. Yang, W. Dong, L. Carlone, and V. Koltun, “Self-supervised geometric perception,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 14350–14361

  73. [73]

    Scalability in perception for autonomous driving: Waymo open dataset,

    P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine, V. Vasudevan, W. Han, J. Ngiam, H. Zhao, A. Timofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi, Y. Zhang, J. Shlens, Z. Chen, and D. Anguelov, “Scalability in perception for autonomous driving: Waymo open dataset,” in Proceedings of the IEEE/C...

  74. [74]

    VERTIGO: Versatile Extensions for Robust Inference using Graphical Odometry,

    OpenSLAM-org, “VERTIGO: Versatile Extensions for Robust Inference using Graphical Odometry,” https://openslam-org.github.io/vertigo.html, accessed: 2025-04-11