Accelerating Transformer-Based Monocular SLAM via Geometric Utility Scoring
Pith reviewed 2026-05-10 17:38 UTC · model grok-4.3
The pith
A lightweight predictor scores each frame's mapping value before expensive geometric decoding, letting monocular SLAM skip most redundant frames.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LeanGate is a lightweight feed-forward network that outputs a geometric utility score for each incoming video frame. The score is computed before any dense feature extraction or matching by the Geometric Foundation Model (GFM), so frames judged low-value are dropped without performing the expensive steps. On standard SLAM benchmarks this gating cuts tracking floating-point operations by more than 85 percent, raises end-to-end throughput by a factor of five, and preserves the tracking and mapping accuracy of the original dense system.
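To make the claimed control flow concrete, here is a minimal sketch of pre-decoding gating, assuming hypothetical `score_utility` and `dense_gfm_track` callables and an illustrative threshold `tau`; none of these names come from the paper.

```python
def gated_slam(frames, score_utility, dense_gfm_track, tau=0.5):
    """Run the expensive dense stage only on frames the gate keeps.

    score_utility:   cheap predictor, raw frame -> utility score in [0, 1]
    dense_gfm_track: dense GFM feature extraction + matching + pose update
    Both callables are hypothetical stand-ins for the paper's components.
    """
    poses, kept = [], 0
    for frame in frames:
        u = score_utility(frame)   # cheap: runs before any dense decoding
        if u < tau:
            continue               # early rejection: the dense stage never executes
        poses.append(dense_gfm_track(frame))
        kept += 1
    print(f"processed {kept}/{len(frames)} frames")
    return poses
```

The point the sketch makes is ordering: the only per-frame cost for a rejected frame is one forward pass of the small predictor, which is why the reported FLOPs reduction can track the rejection rate.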
What carries the argument
LeanGate, a lightweight feed-forward frame-gating network that predicts a geometric utility score to decide a frame's mapping value before heavy GFM feature extraction and matching.
If this is right
- More than 90 percent of incoming frames can be rejected without ever invoking dense GFM decoding.
- Tracking floating-point operations drop by over 85 percent on standard benchmarks.
- End-to-end throughput rises by a factor of five with no drop in accuracy.
- The gating module works as a plug-and-play addition to existing GFM-based SLAM pipelines.
- Keyframe selection happens before rather than after the costly geometric stage.
Where Pith is reading between the lines
- The same early-rejection idea could apply to other dense transformer pipelines where most inputs add little new information.
- Running the utility predictor on-device could let SLAM operate longer on battery-powered robots without custom accelerators.
- If the predictor is trained jointly with the downstream SLAM loss, its decisions might become even more aligned with final map quality (see the training sketch after this list).
- Extending the score to also estimate expected tracking drift could further reduce unnecessary bundle-adjustment steps.
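As a sketch of the joint-training idea in the third bullet above, here is one minimal form it could take, assuming per-frame "mapping gain" labels distilled from the dense pipeline; the network, labels, and loss are our assumptions, not the paper's training recipe.

```python
import torch
import torch.nn as nn

class UtilityGate(nn.Module):
    """Tiny convolutional gate mapping a frame to a scalar utility logit (illustrative)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

gate = UtilityGate()
frames = torch.randn(8, 3, 64, 64)   # toy batch of frames
# Hypothetical supervision: 1 if including the frame improved downstream map
# quality in the dense pipeline, else 0. Distilling such labels is our
# speculation about how "training against the SLAM loss" could be realized.
mapping_gain = (torch.rand(8) > 0.5).float()
loss = nn.functional.binary_cross_entropy_with_logits(gate(frames), mapping_gain)
loss.backward()
```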
Load-bearing premise
The lightweight network can correctly judge a frame's future mapping usefulness from its raw pixels without running the full expensive geometric decoding first.
What would settle it
Apply LeanGate to a long video sequence containing subtle but critical new geometry in frames the predictor rejects, then measure whether the final map completeness or tracking error falls below the dense baseline.
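One way that test could operationalize "map completeness" is the common nearest-neighbour definition with an illustrative 5 cm threshold; the metric choice and threshold are our assumptions, not the paper's.

```python
import numpy as np
from scipy.spatial import cKDTree

def map_completeness(gt_points, recon_points, dist_thresh=0.05):
    """Fraction of ground-truth points with a reconstructed point within
    dist_thresh metres of them (a standard completeness proxy)."""
    dists, _ = cKDTree(recon_points).query(gt_points, k=1)
    return float(np.mean(dists <= dist_thresh))

# The proposed falsification test, in these terms: on sequences whose novel
# geometry appears only in frames the gate rejects, check whether
# map_completeness(gt, gated_map) drops below map_completeness(gt, dense_map).
```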
Original abstract
Geometric Foundation Models (GFMs) have recently advanced monocular SLAM by providing robust, calibration-free 3D priors. However, deploying these models on dense video streams introduces significant computational redundancy. Current GFM-based SLAM systems typically rely on post hoc keyframe selection. Because of this, they must perform expensive dense geometric decoding simply to determine whether a frame contains novel geometry, resulting in late rejection and wasted computation. To mitigate this inefficiency, we propose LeanGate, a lightweight feed-forward frame-gating network. LeanGate predicts a geometric utility score to assess a frame's mapping value prior to the heavy GFM feature extraction and matching stages. As a predictive plug-and-play module, our approach bypasses over 90% of redundant frames. Evaluations on standard SLAM benchmarks demonstrate that LeanGate reduces tracking FLOPs by more than 85% and achieves a 5x end-to-end throughput speedup. Furthermore, it maintains the tracking and mapping accuracy of dense baselines. Project page: https://lean-gate.github.io/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes LeanGate, a lightweight feed-forward neural network that predicts a geometric utility score for input frames in GFM-based monocular SLAM. This score is used to gate frames before performing expensive dense geometric feature extraction and matching, thereby avoiding redundant computation on frames with low mapping value. The method is presented as a plug-and-play module that can be inserted into existing pipelines; the abstract reports that it bypasses over 90% of redundant frames, reduces tracking FLOPs by more than 85%, delivers a 5x end-to-end throughput increase, and preserves the tracking and mapping accuracy of dense baselines on standard SLAM benchmarks.
Significance. If the reported speedups and accuracy preservation hold under rigorous verification, the work addresses a practical deployment bottleneck in recent GFM-SLAM systems by moving keyframe selection upstream of the heavy transformer stages. The plug-and-play design and substantial reported FLOPs/throughput gains could improve real-time viability on resource-limited hardware without requiring changes to the underlying GFM or SLAM backend. The empirical nature of the contribution (trained predictor rather than analytic derivation) makes the strength of the result dependent on the quality and breadth of the experimental validation.
major comments (2)
- [§4] (Experimental Setup): the abstract claims maintenance of 'tracking and mapping accuracy of dense baselines,' but without reported error bars, number of runs, or per-sequence breakdowns it is impossible to assess whether the observed differences are statistically significant or merely within the variance of the dense baseline itself.
- [§3.2] (LeanGate Architecture): the claim that the lightweight predictor accurately forecasts mapping value 'prior to the heavy GFM feature extraction' is load-bearing for the entire speedup argument; the manuscript must demonstrate that the predictor's false-negative rate on high-utility frames does not degrade downstream mapping quality beyond the reported tolerance.
minor comments (3)
- [Abstract] The abstract states 'bypasses over 90% of redundant frames' while the results claim 'more than 85% FLOPs reduction'; these two figures should be reconciled with a precise definition of 'redundant' and an explicit mapping from bypassed frames to measured FLOPs savings.
- [§3] Notation for the utility score (e.g., how it is normalized or thresholded) is introduced without a dedicated equation; adding a short definition in §3 would improve clarity. An illustrative form of such a definition appears after this list.
- [Abstract] The project page URL is given but no supplementary material link or code repository is mentioned; providing at least a pointer to the trained model weights would aid reproducibility.
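For concreteness, one illustrative form such a §3 definition could take, with a sigmoid-normalized score and a fixed gating threshold; the notation is ours, not the manuscript's.

```latex
% Illustrative gating rule (our notation, not the paper's):
% f_theta is the lightweight predictor, I_t the frame at time t.
\begin{align}
  u_t &= \sigma\bigl(f_\theta(I_t)\bigr) \in [0, 1], \\
  g_t &= \mathbb{1}\bigl[\, u_t \ge \tau \,\bigr],
\end{align}
% g_t = 1: frame I_t enters dense GFM decoding; g_t = 0: frame is skipped.
```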
Simulated Author's Rebuttal
We thank the referee for the constructive review and the recommendation of minor revision. We address each major comment below with clarifications and commit to targeted revisions that will improve the statistical rigor and validation of our claims.
Point-by-point responses
- Referee: [§4] (Experimental Setup): the abstract claims maintenance of 'tracking and mapping accuracy of dense baselines,' but without reported error bars, number of runs, or per-sequence breakdowns it is impossible to assess whether the observed differences are statistically significant or merely within the variance of the dense baseline itself.
Authors: We agree that the absence of error bars and run statistics limits the ability to evaluate statistical significance. In the revised manuscript we will expand Section 4 to report error bars computed over five independent runs (different random seeds for both training and evaluation) on the TUM RGB-D and EuRoC sequences. We will also add per-sequence tables for ATE, RPE, and mapping completeness, allowing direct comparison of observed differences against the variance of the dense baseline. We expect these additions to confirm that any small deviations remain within the baseline variance. Revision: yes.
- Referee: [§3.2] (LeanGate Architecture): the claim that the lightweight predictor accurately forecasts mapping value 'prior to the heavy GFM feature extraction' is load-bearing for the entire speedup argument; the manuscript must demonstrate that the predictor's false-negative rate on high-utility frames does not degrade downstream mapping quality beyond the reported tolerance.
Authors: We acknowledge that an explicit quantification of the false-negative rate is required to fully support the upstream gating argument. In the revised Section 3.2 we will insert a dedicated analysis that measures the false-negative rate on held-out validation sequences and correlates it with downstream mapping quality (reconstruction density and tracking drift). We will further include a threshold-sensitivity ablation showing that the resulting quality degradation stays within the tolerance already demonstrated by the main accuracy tables. These additions will directly substantiate that the predictor preserves mapping fidelity prior to GFM extraction. A sketch of both promised analyses follows these responses. Revision: yes.
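A minimal sketch of the two promised analyses, assuming boolean high-utility labels derived from the dense pipeline and per-seed ATE values; every name and signature here is illustrative rather than taken from the paper.

```python
import numpy as np

def false_negative_rate(is_high_utility, gate_scores, tau):
    """Share of truly high-utility frames the gate rejects. Labels are assumed
    to come from running the dense pipeline, as the rebuttal proposes."""
    high = np.asarray(is_high_utility, dtype=bool)
    rejected = np.asarray(gate_scores) < tau
    return float((high & rejected).sum() / max(int(high.sum()), 1))

def ate_error_bars(ate_per_run):
    """Mean and sample std of ATE over independent seeded runs (five in the rebuttal)."""
    a = np.asarray(ate_per_run, dtype=float)
    return a.mean(), a.std(ddof=1)

# Threshold-sensitivity ablation: sweep tau, record FN rate alongside accuracy.
# for tau in np.linspace(0.1, 0.9, 9):
#     fn = false_negative_rate(labels, scores, tau)
```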
Circularity Check
No significant circularity; empirical gating validated by benchmarks
Full rationale
The paper presents LeanGate as a lightweight feed-forward network trained to predict per-frame geometric utility scores, which are then used to gate expensive GFM processing in monocular SLAM. The central claims (85%+ FLOP reduction, 5x throughput, preserved accuracy) are framed as empirical outcomes measured on standard SLAM benchmarks rather than derived from first-principles equations or self-referential definitions. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked to force the results; the predictor is an independent learned component whose accuracy is externally falsifiable. The derivation chain therefore remains self-contained and non-circular.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: A lightweight neural network can be trained to predict the geometric utility of frames for SLAM mapping.
invented entities (1)
- LeanGate: no independent evidence