arxiv: 2606.19067 · v1 · pith:F2VXH25Hnew · submitted 2026-06-17 · 💻 cs.RO · cs.CV

Sensor Configuration Matters: A Systematic Evaluation of Multimodal SLAM on Quadruped Robots

Pith reviewed 2026-06-26 20:56 UTC · model grok-4.3

classification 💻 cs.RO cs.CV
keywords SLAM evaluation · quadruped robots · sensor configuration · visual-inertial odometry · legged locomotion · stereo vision · global shutter · localization robustness

The pith

Sensor choices in cameras and inertial units shape SLAM resilience on quadruped robots under aggressive motion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates multiple state-of-the-art SLAM methods on data recorded from an ANYmal D quadruped to isolate the effects of different sensor setups. It shows that stereo camera configurations deliver better accuracy and fewer failures than monocular or RGB-D options, global-shutter cameras reduce motion blur problems compared with rolling-shutter ones, and adding standard inertial measurements can lower performance in vision-heavy pipelines. These results matter because quadrupeds produce foot impacts, vibrations, and fast rotations that standard perception systems were not designed to handle. If the patterns hold, robot builders gain concrete rules for picking hardware that improves navigation reliability rather than relying only on software changes.

Core claim

Through controlled tests on the GrandTour dataset, stereo configurations consistently outperform monocular and RGB-D modalities, global shutter cameras reduce motion-induced tracking failures relative to rolling shutter cameras, and standard inertial integration can degrade the performance of primarily vision-based frameworks under the harsh dynamics of legged locomotion.

What carries the argument

Systematic comparison that isolates camera modality, shutter type, and inertial sensor tier while measuring localization accuracy, robustness, and compute use on the GrandTour quadruped recordings.
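
The accuracy side of such a comparison is commonly summarized with absolute trajectory error (ATE), the metric named in the Figure 4 caption. The following is a minimal sketch, assuming the estimated and ground-truth positions are already time-associated and expressed in the same frame (a full evaluation would first align the two trajectories). The function name `ate_rmse` and the sample points are hypothetical and are not taken from the paper.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of position errors between two pre-aligned, time-associated trajectories.

    Both arguments are sequences of (x, y, z) positions in metres.
    Assumption: alignment and timestamp association happened upstream.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy data: the estimate drifts sideways by 0.1 m per step.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.2, 0.0)]
ate = ate_rmse(est, gt)
print(f"ATE RMSE: {ate:.4f} m")
```

A single RMSE value hides tracking failures, so a robustness comparison would also need a per-run success flag alongside this number.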

If this is right

  • Stereo camera pairs should be selected over monocular or RGB-D for higher tracking success rates during foot impacts and rotations.
  • Global shutter cameras reduce the frequency of motion-induced tracking losses compared with rolling shutter alternatives.
  • Vision-dominant SLAM pipelines can achieve higher resilience when standard inertial measurements are omitted under legged conditions.
  • Sensor payload design can be guided by explicit trade-offs in accuracy, robustness, and resource use rather than defaulting to full multimodal stacks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same isolation approach could be applied to other dynamic platforms such as bipeds or aerial vehicles to check whether the same hardware rankings appear.
  • Algorithm developers might add explicit models of shutter timing or vibration spectra to compensate for the sensor weaknesses identified here.
  • Procurement decisions for field robots could shift priority toward camera hardware specifications before adding extra sensor layers.

Load-bearing premise

The GrandTour recordings and the selected SLAM algorithms are representative of typical legged locomotion challenges and do not favor particular sensor effects.

What would settle it

A repeat of the same evaluation on a new quadruped dataset or with additional SLAM methods that finds no consistent performance gaps across stereo versus monocular, global versus rolling shutter, or vision-only versus vision-inertial setups.
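
One way to make "no consistent performance gap" checkable is a paired sign test over per-mission errors for two configurations. The sketch below is an illustration only: the function name `sign_test`, the choice of test, and every number in the example are assumptions for demonstration, not values or methods reported by the paper.

```python
from math import comb


def sign_test(errors_a, errors_b):
    """Two-sided exact sign test on paired per-mission errors.

    Returns (wins_a, wins_b, p_value), where a "win" means a lower error
    on that mission. Tied missions are dropped from the test.
    """
    wins_a = sum(a < b for a, b in zip(errors_a, errors_b))
    wins_b = sum(b < a for a, b in zip(errors_a, errors_b))
    n = wins_a + wins_b
    if n == 0:
        return wins_a, wins_b, 1.0
    k = min(wins_a, wins_b)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins_a, wins_b, min(1.0, 2 * tail)


# Placeholder ATE values in metres for six missions (made up for illustration).
config_a = [0.21, 0.35, 0.18, 0.40, 0.27, 0.30]
config_b = [0.33, 0.52, 0.25, 0.61, 0.29, 0.44]
wins_a, wins_b, p_value = sign_test(config_a, config_b)
print(wins_a, wins_b, p_value)
```

A large p-value on a new dataset would point toward the "no consistent gap" outcome described above, although a sign test ignores the size of each difference and says nothing about failure rates.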

Figures

Figures reproduced from arXiv: 2606.19067 by Abhinav Valada, Arne Roennau, Fabian Schmidt, Markus Enzweiler, Nils Seibert, Roberto Corlito.

Figure 1: Example images from the selected missions reflecting diverse environments. From left to right: M10, M13, M19,
Figure 2: ORB feature matching during high-velocity rotation
Figure 3: Estimated trajectories for mission M24 illustrating that global shutter prevents terminal tracking failures during rapid turning maneuvers and reduces drift across all evaluated SLAM frameworks.
Figure 4: Mean ATE for top-performing configurations.
Abstract

Autonomous navigation of quadrupedal robots in diverse environments fundamentally relies on resilient Simultaneous Localization and Mapping (SLAM). While visual-inertial SLAM has matured across wheeled, handheld, and aerial platforms, a critical evaluation gap remains regarding how hardware-level sensor configurations affect performance under the aggressive dynamics of legged locomotion. Quadrupeds introduce distinct embodiment-induced sensory challenges, including foot-impact shocks, high-frequency mechanical vibrations, and rapid angular rotations, which degrade standard perception pipelines. To address this gap, we present a systematic evaluation of state-of-the-art visual, visual-inertial, and LiDAR-visual-inertial SLAM methods using the GrandTour dataset recorded on an ANYmal D quadruped. We isolate and quantify the impacts of camera modalities, shutter techniques, and inertial sensor tiers, analyzing their trade-offs across localization accuracy, algorithmic robustness, and computational resource utilization. Our empirical findings demonstrate that hardware selection has substantial influence on system resilience: stereo configurations consistently outperform monocular and RGB-D modalities, global shutter cameras significantly mitigate motion-induced tracking failures compared to rolling shutter cameras, and, crucially, standard inertial integration can degrade the performance of primarily vision-based frameworks under harsh legged locomotion. These insights additionally offer concrete design guidelines for tailoring custom sensor payloads to achieve dependable perception on agile legged systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper conducts a systematic empirical evaluation of state-of-the-art visual, visual-inertial, and LiDAR-visual-inertial SLAM algorithms on the GrandTour dataset collected with an ANYmal D quadruped. It isolates effects of camera modality (mono/stereo/RGB-D), shutter type (global/rolling), and inertial integration tier, reporting impacts on localization accuracy, robustness to tracking failures, and compute usage. The central empirical claims are that stereo configurations outperform monocular and RGB-D, global-shutter cameras reduce motion-induced failures relative to rolling-shutter, and standard IMU fusion can degrade primarily vision-based pipelines under the shock and vibration of legged locomotion; these observations are used to derive sensor-payload design guidelines.

Significance. If the quantified differences hold under the stated conditions, the work supplies actionable hardware-selection evidence for perception stacks on dynamic legged platforms, an area where most prior SLAM benchmarks have used wheeled or handheld data. The systematic isolation of sensor parameters across multiple SOTA methods is a clear strength of the benchmarking design.

major comments (1)
  1. [Abstract / Evaluation setup] The central claim that 'hardware selection has substantial influence on system resilience' and that 'standard inertial integration can degrade' vision-based performance rests on the representativeness of the GrandTour dataset and the chosen SOTA method implementations. No cross-dataset validation, motion-profile statistics (shock spectra, rotation rates), or explicit checks for post-hoc sequence/method selection are described; if the dataset motion statistics or implementation details systematically favor stereo/global-shutter or penalize IMU fusion for reasons unrelated to the claimed hardware effects, the reported resilience gaps do not generalize to typical legged locomotion.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our systematic evaluation. We address the major comment point by point below.

Point-by-point responses
  1. Referee: [Abstract / Evaluation setup] The central claim that 'hardware selection has substantial influence on system resilience' and that 'standard inertial integration can degrade' vision-based performance rests on the representativeness of the GrandTour dataset and the chosen SOTA method implementations. No cross-dataset validation, motion-profile statistics (shock spectra, rotation rates), or explicit checks for post-hoc sequence/method selection are described; if the dataset motion statistics or implementation details systematically favor stereo/global-shutter or penalize IMU fusion for reasons unrelated to the claimed hardware effects, the reported resilience gaps do not generalize to typical legged locomotion.

    Authors: We appreciate the referee's emphasis on generalizability. The GrandTour dataset was collected on an ANYmal D quadruped traversing varied indoor/outdoor terrains to reflect typical legged dynamics including shocks and vibrations. We will add explicit motion-profile statistics (e.g., shock spectra, rotation-rate histograms) and a clearer description of sequence selection criteria in the revision to demonstrate that sequences were not post-hoc filtered to favor particular modalities. While cross-dataset validation lies outside the scope of this single-platform study, the trends hold consistently across multiple independent SOTA implementations, which mitigates concerns about implementation-specific bias. These changes will be incorporated.

    revision: partial
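
The rotation-rate histogram mentioned in this exchange could be produced along the following lines. This is a hedged sketch under stated assumptions: gyroscope samples are taken to be (wx, wy, wz) tuples in rad/s, and the function name `rotation_rate_histogram`, the bin edges, and the sample values are all hypothetical. It does not reflect how the authors or the GrandTour tooling compute motion statistics.

```python
import math


def rotation_rate_histogram(gyro_samples, bin_edges):
    """Count angular-speed magnitudes per half-open bin [edge_i, edge_{i+1}).

    gyro_samples: iterable of (wx, wy, wz) in rad/s (assumed format).
    bin_edges: ascending edges in rad/s; values outside them are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for wx, wy, wz in gyro_samples:
        speed = math.sqrt(wx * wx + wy * wy + wz * wz)
        for i in range(len(counts)):
            if bin_edges[i] <= speed < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Made-up samples: two slow motions, one moderate turn, one fast turn.
samples = [(0.1, 0.0, 0.0), (0.3, 0.3, 0.0), (0.0, 0.6, 0.8), (1.0, 2.0, 2.0)]
edges = [0.0, 0.5, 1.5, 4.0]
hist = rotation_rate_histogram(samples, edges)
print(hist)
```

Comparing such histograms between sequences with and without tracking failures would show whether the failures concentrate in the high-rate bins.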

Circularity Check

0 steps flagged

No circularity: pure empirical benchmarking of existing SLAM methods

The paper conducts a systematic empirical evaluation of off-the-shelf SLAM algorithms on the GrandTour dataset recorded on an ANYmal D quadruped. No derivations, fitted parameters, predictions, or uniqueness theorems appear; all claims rest on direct performance measurements across sensor modalities. The central findings (stereo outperforming monocular/RGB-D, global shutter benefits, IMU degradation under legged motion) are obtained by running published methods on fixed data and are externally falsifiable by replication, with no self-citation chains or self-definitional reductions. Dataset representativeness is an assumption about coverage, not a circularity in any derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities; the paper performs empirical comparison of existing SLAM pipelines rather than introducing new models or derivations.

pith-pipeline@v0.9.1-grok · 5779 in / 1176 out tokens · 28632 ms · 2026-06-26T20:56:29.410807+00:00 · methodology
