Efficient Feature-Free Initialization for Monocular Visual-Inertial Systems Using a Feed-Forward 3D Model
Pith reviewed 2026-05-20 12:58 UTC · model grok-4.3
The pith
A feed-forward 3D model lets monocular visual-inertial systems initialize without tracking any visual features.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that a feature-free initialization procedure, built on up-to-scale point clouds from a single-image feed-forward 3D model and aligned with inertial measurements, can estimate the initial metric scale, velocity, and gravity vector more reliably and with far less data than methods that depend on visual feature tracking and correspondence.
What carries the argument
A feed-forward 3D model outputs up-to-scale point clouds from individual images. These clouds are registered to short IMU sequences to jointly estimate scale, velocity, and gravity direction.
If this is right
- Initialization succeeds in more than 90 percent of trials on standard benchmarks.
- Required sensor duration drops to typically less than 1.2 seconds.
- Performance holds across indoor and outdoor scenes, including those with visual degradation.
- System design simplifies by dropping all visual feature extraction and matching steps.
Where Pith is reading between the lines
- The same point-cloud predictions could support quick metric recovery in other monocular robotic tasks that currently rely on structure-from-motion bootstrapping.
- Because the method tolerates short data windows, it may allow repeated re-initialization during long missions when tracking is lost.
- Combining the predicted clouds with additional depth priors from the same model family could further tighten the scale estimate without extra sensors.
Load-bearing premise
The geometric structure in the predicted point clouds remains accurate enough, despite unknown absolute scale, for inertial fusion to recover reliable initial state estimates.
What would settle it
A dataset sequence where the 3D model produces point clouds whose relative geometry deviates substantially from ground truth, causing the fused initialization to produce scale or gravity errors larger than those of feature-based baselines.
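One way to make "relative geometry deviates substantially from ground truth" measurable is to remove the unknown global scale first and look at what error remains. The sketch below is illustrative only and is not taken from the paper: the function name `scale_aligned_rmse` is hypothetical, and it assumes that predicted and ground-truth points are already in correspondence and expressed in the same camera frame.

```python
import numpy as np

def scale_aligned_rmse(pred, gt):
    """Return (best scale, RMSE) between an up-to-scale cloud and ground truth.

    Assumption for illustration: `pred` and `gt` are N x 3 arrays of
    corresponding points in the same camera frame.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Closed-form least-squares scale: argmin_s || s * pred - gt ||^2
    scale = np.sum(pred * gt) / np.sum(pred * pred)
    residual = scale * pred - gt
    rmse = np.sqrt(np.mean(np.sum(residual ** 2, axis=1)))
    return scale, rmse

# Synthetic data: a cloud that is only mis-scaled versus one that is also warped.
rng = np.random.default_rng(0)
gt = rng.uniform(1.0, 5.0, size=(500, 3))
clean = gt / 3.0                            # correct shape, wrong scale
warped = clean * np.array([1.0, 1.0, 1.3])  # depth axis stretched by 30 percent

print(scale_aligned_rmse(clean, gt))
print(scale_aligned_rmse(warped, gt))
```

A purely mis-scaled cloud leaves essentially no residual, while a warped cloud leaves an error that no single scale can absorb. That second kind of error is the one the settling experiment would need to correlate with initialization failures.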
Original abstract
Fast and reliable initialization is critical for monocular visual-inertial navigation systems (VINS), as it establishes the starting conditions for subsequent state estimation. Despite steady progress, most existing methods heavily rely on visual feature correspondences and require 3-4 seconds of sensory data for successful initialization, which limits their applicability and efficiency. With the advent of feed-forward 3D models that can directly predict point clouds from images, we revisit the visual-inertial initialization problem from a concise perspective. In this work, we propose a feature-free initialization framework that leverages up-to-scale point clouds predicted by a feed-forward 3D model, thereby obviating the need for visual feature tracking and estimation. This design substantially reduces system complexity and improves the reliability of initialization. Experiments on public datasets demonstrate that the proposed feature-free initialization method achieves the highest success rate, exceeding 90%, and significantly reduces the data duration required for successful initialization, typically to under 1.2 s. We further validate our method on a self-collected dataset covering various indoor and outdoor scenarios, demonstrating robust performance, particularly in visually degraded environments where existing methods often fail. The code and dataset are available at https://github.com/Yuantai-Z/FF-VIO-Init.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a feature-free initialization method for monocular visual-inertial navigation systems (VINS) that replaces visual feature tracking with up-to-scale point clouds predicted by a pre-trained feed-forward 3D model. The approach jointly estimates initial scale, velocity, and gravity direction from short IMU sequences fused with these point clouds. Experiments on public datasets report success rates exceeding 90% with typical initialization times under 1.2 s, and additional validation on a self-collected dataset shows robustness in visually degraded indoor and outdoor scenes.
Significance. If the performance claims hold under closer scrutiny, the method could meaningfully simplify VINS pipelines by eliminating feature correspondence requirements and shortening the data window needed for reliable initialization. The shift to an external feed-forward 3D model is a notable departure from conventional feature-based or optimization-heavy initialization strategies and may prove useful in real-time or resource-constrained settings.
major comments (2)
- [Experiments] The central performance claims (success rate >90 %, initialization <1.2 s, robustness in degraded scenes) rest on the assumption that the feed-forward model's up-to-scale point clouds supply sufficient metric geometry for joint scale-velocity-gravity recovery. However, the experimental section provides only aggregate success rates without ablation studies that replace the predicted depths with ground-truth depths or report per-sequence depth-error statistics; this omission leaves the contribution of point-cloud accuracy unisolated and the headline claims only partially supported.
- [Method] The manuscript does not detail how scale ambiguity is resolved when fusing the up-to-scale point clouds with inertial measurements, nor does it quantify error propagation from depth prediction noise into the optimization; given the deliberately feature-free design, any systematic bias in the 3D model directly affects the recovered metric quantities and should be analyzed explicitly.
minor comments (2)
- [Abstract and Experiments] The abstract and experimental results mention quantitative comparisons but do not list the exact baseline methods, their reported success rates, or the precise success criteria used; adding a table with these values would improve clarity.
- [Experiments] No error bars or statistical significance tests accompany the reported success rates and timing figures; including these would strengthen the presentation of the quantitative results.
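As an illustration of the kind of error bar the referee asks for, a success rate over a finite number of trials can be reported with a binomial confidence interval. The sketch below uses the Wilson score interval. The function name `wilson_interval` and the trial counts are hypothetical and are not taken from the paper.

```python
import math

def wilson_interval(successes, trials, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (default z gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials)) / denom
    return center - half, center + half

# Hypothetical counts for illustration: 46 successful initializations in 50 trials.
low, high = wilson_interval(46, 50)
print(f"success rate 0.92, 95% interval [{low:.3f}, {high:.3f}]")
```

The Wilson interval is used here instead of the simpler normal approximation because it stays within [0, 1] and behaves sensibly when the rate is close to 100 percent, which is the regime the headline claim sits in.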
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important opportunities to strengthen the experimental support and clarify the methodological details. We address each major comment below and will revise the manuscript accordingly to improve rigor and transparency.
- Referee: [Experiments] The central performance claims (success rate >90 %, initialization <1.2 s, robustness in degraded scenes) rest on the assumption that the feed-forward model's up-to-scale point clouds supply sufficient metric geometry for joint scale-velocity-gravity recovery. However, the experimental section provides only aggregate success rates without ablation studies that replace the predicted depths with ground-truth depths or report per-sequence depth-error statistics; this omission leaves the contribution of point-cloud accuracy unisolated and the headline claims only partially supported.
  Authors: We agree that isolating the contribution of the predicted point-cloud accuracy would strengthen the claims. In the revised manuscript we will add ablation experiments on the public datasets (where ground-truth depths are available) that directly compare initialization performance using the feed-forward predictions versus ground-truth depths. We will also report per-sequence depth-prediction error statistics together with their correlation to initialization success and failure cases.
  Revision: yes
- Referee: [Method] The manuscript does not detail how scale ambiguity is resolved when fusing the up-to-scale point clouds with inertial measurements, nor does it quantify error propagation from depth prediction noise into the optimization; given the deliberately feature-free design, any systematic bias in the 3D model directly affects the recovered metric quantities and should be analyzed explicitly.
  Authors: We acknowledge that the current description of scale recovery and noise propagation is insufficiently detailed. The scale factor is recovered jointly with velocity and gravity direction inside a single least-squares optimization that aligns the up-to-scale point clouds with IMU-predicted motion over the short initialization window. In the revision we will expand Section 3 with the complete optimization objective, the explicit scale parameterization, and a dedicated sensitivity analysis (including both analytic propagation bounds and empirical results) that quantifies how depth-prediction noise affects the recovered metric quantities.
  Revision: yes
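The joint recovery described in the response above can be illustrated with a minimal linear least-squares sketch. This is not the authors' objective, which the page does not reproduce. It is a textbook-style formulation under stated assumptions: the point-cloud registration has already produced up-to-scale positions `p_bar[k]` of the sensor relative to the first frame, camera and IMU coincide, gyroscope and accelerometer biases are ignored, and `alpha[k]` is the IMU-preintegrated position term such that the metric position is `v0 * t + 0.5 * g * t**2 + alpha[k]`. All names are hypothetical.

```python
import numpy as np

def solve_scale_velocity_gravity(p_bar, alpha, t):
    """Solve s * p_bar[k] - v0 * t[k] - 0.5 * g * t[k]**2 = alpha[k] for (s, v0, g).

    p_bar: K x 3 up-to-scale positions (assumed given by cloud registration)
    alpha: K x 3 preintegrated IMU position terms (assumed bias-free)
    t:     K timestamps in seconds, measured from the first frame
    """
    p_bar = np.asarray(p_bar, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    t = np.asarray(t, dtype=float)
    n = len(t)
    A = np.zeros((3 * n, 7))
    b = np.zeros(3 * n)
    for k in range(n):
        rows = slice(3 * k, 3 * k + 3)
        A[rows, 0] = p_bar[k]                        # coefficient of scale s
        A[rows, 1:4] = -t[k] * np.eye(3)             # coefficient of velocity v0
        A[rows, 4:7] = -0.5 * t[k] ** 2 * np.eye(3)  # coefficient of gravity g
        b[rows] = alpha[k]
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[0], x[1:4], x[4:7]

# Synthetic check with made-up values over a 1.2 s window.
rng = np.random.default_rng(0)
t = np.linspace(0.1, 1.2, 12)
s_true = 2.5
v_true = np.array([0.3, -0.1, 0.05])
g_true = np.array([0.0, 0.0, -9.81])
alpha = rng.normal(scale=0.5, size=(12, 3))
p_metric = v_true * t[:, None] + 0.5 * g_true * t[:, None] ** 2 + alpha
p_bar = p_metric / s_true

s_est, v_est, g_est = solve_scale_velocity_gravity(p_bar, alpha, t)
print(s_est, v_est, g_est)
```

In this simplified form, the right-hand side vanishes if the preintegrated terms are all zero, so the scale cannot be determined without some acceleration excitation. It also shows why the referee's concern matters: any systematic distortion in `p_bar` enters the scale column of the system directly.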
Circularity Check
No significant circularity; method relies on external pre-trained model and empirical validation
The paper presents a practical initialization framework that feeds up-to-scale point clouds from an external feed-forward 3D model into an IMU-fusion optimizer for scale, velocity and gravity. All performance claims (>90 % success, <1.2 s duration, robustness in degraded scenes) are supported by direct experimental results on public benchmarks and a self-collected dataset rather than by any internal derivation, fitted parameter, or self-citation chain. No equation or algorithmic step reduces to a quantity defined by the same step; the 3D model itself is treated as a black-box input whose accuracy is an independent assumption, not a quantity derived inside the paper.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Standard VINS assumptions of rigid body motion, known camera intrinsics, and inertial sensor bias models hold during the short initialization window.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  "We propose a feature-free initialization framework that leverages up-to-scale point clouds predicted by a feed-forward 3D model... closed-form linear system $\bar{A}(\cdot)\,x = \bar{b}(\cdot)$ ... state vector contains only scale, initial velocity, and gravity: $x = [\, s \;\; {}^{I_0}v_{I_0}^\top \;\; {}^{I_0}g^\top \,]^\top$"
- IndisputableMonolith/Foundation/DimensionForcing.lean, alexander_duality_circle_linking (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  "Experiments on public datasets demonstrate that the proposed feature-free initialization method achieves the highest success rate, exceeding 90%, and significantly reduces the data duration required for successful initialization, typically to under 1.2 s."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.