X-IONet: Cross-Platform Inertial Odometry Network for Pedestrian and Legged Robot
Pith reviewed 2026-05-17 23:50 UTC · model grok-4.3
The pith
X-IONet uses a single IMU with rule-based expert selection and dual-stage attention to deliver accurate odometry for both pedestrians and legged robots.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
X-IONet incorporates a rule-based expert selection module to classify motion platforms and route IMU sequences to platform-specific expert networks. The displacement prediction network features a dual-stage attention architecture that jointly models long-range temporal dependencies and inter-axis correlations, enabling accurate motion representation. It outputs both displacement and associated uncertainty, which are further fused through an Extended Kalman Filter (EKF) for robust state estimation.
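The abstract does not specify the layers of the dual-stage attention architecture. As a rough illustration of the general idea only, the sketch below applies self-attention first across time steps and then across IMU axes. It assumes a single head with identity query/key/value projections (no learned weights), and the function names are hypothetical, not taken from the paper.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V (an assumption)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def dual_stage_attention(window):
    """window: T rows (time steps) x C columns (IMU axes)."""
    temporal = self_attention(window)                  # stage 1: tokens are time steps
    axis_mixed = self_attention(transpose(temporal))   # stage 2: tokens are IMU axes
    return transpose(axis_mixed)                       # back to T x C

# Toy 8-step window with 6 channels (3 accelerometer + 3 gyroscope axes).
window = [[0.1 * t, 0.0, 9.8, 0.01, 0.0, 0.02 * t] for t in range(8)]
features = dual_stage_attention(window)
print(len(features), len(features[0]))
```

A trained network would add learned projections, multiple heads, and residual connections; the point here is only the ordering of the two attention passes.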
What carries the argument
A rule-based expert selection module classifies IMU data as pedestrian or legged-robot motion and routes it to dedicated expert networks equipped with dual-stage attention.
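The manuscript's actual rules are not given on this page. A minimal sketch of what such a router could look like follows, assuming it uses acceleration variance and dominant frequency (the features named as examples in the simulated rebuttal). Both thresholds, their direction, and every function name are placeholders for illustration.

```python
import math
from statistics import pvariance

def dominant_frequency(signal, sample_rate_hz):
    """Frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    best_bin, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * sample_rate_hz / n

def select_expert(accel_norm, sample_rate_hz, var_threshold=4.0, freq_threshold_hz=3.0):
    """Hypothetical rule: thresholds and their direction are NOT from the paper."""
    variance = pvariance(accel_norm)
    freq = dominant_frequency(accel_norm, sample_rate_hz)
    if variance > var_threshold or freq > freq_threshold_hz:
        return "quadruped"
    return "pedestrian"

# Synthetic acceleration-magnitude windows sampled at 100 Hz.
slow = [9.8 + math.sin(2 * math.pi * 2.0 * i / 100) for i in range(200)]
fast = [9.8 + math.sin(2 * math.pi * 6.0 * i / 100) for i in range(200)]
print(select_expert(slow, 100), select_expert(fast, 100))
```

A hard threshold like this has no notion of confidence, which is why behavior at gait transitions and on out-of-distribution segments matters for the cross-platform claim.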
If this is right
- A single IMU-based system can replace separate pedestrian and robot navigation pipelines.
- Uncertainty estimates from the attention network directly improve the stability of EKF-based state tracking.
- Error reductions observed on RoNIN, GrandTour, and Go2 datasets follow from the platform-specific routing.
- The same architecture supports deployment in environments that mix human and quadruped motion.
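The second bullet above rests on how a Kalman-style update weights a measurement by its variance. The sketch below is a scalar simplification, assuming one displacement component and a linear measurement, so it is not the paper's EKF state or measurement model. It shows that a larger network-predicted variance yields a smaller gain.

```python
def kalman_update(pred, pred_var, meas, meas_var):
    """Scalar Kalman measurement update (a simplification, not the paper's EKF)."""
    gain = pred_var / (pred_var + meas_var)   # trust in the measurement
    fused = pred + gain * (meas - pred)       # corrected estimate
    fused_var = (1.0 - gain) * pred_var       # reduced uncertainty
    return fused, fused_var, gain

# Filter prediction: 1.0 m with variance 0.04 (hypothetical numbers).
# Network displacement: 1.2 m, reported with low and then high variance.
confident = kalman_update(1.0, 0.04, 1.2, 0.04)
uncertain = kalman_update(1.0, 0.04, 1.2, 0.36)
print(confident)
print(uncertain)
```

When the network reports high uncertainty, the fused estimate stays close to the filter's prediction, which is the mechanism by which calibrated uncertainty can stabilize tracking.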
Where Pith is reading between the lines
- Extending the classification rules to wheeled platforms or mixed teams of humans and robots would test the framework's broader applicability.
- The dual-stage attention may capture general motion invariants that could reduce reliance on hand-crafted motion models in other sensor fusion tasks.
- Collecting a larger set of edge-case IMU sequences could reveal whether the expert networks truly generalize beyond the current training distributions.
Load-bearing premise
The rule-based expert selection module can reliably classify IMU sequences as pedestrian or legged-robot motion, and the platform-specific expert networks generalize to new motion patterns not seen during training.
What would settle it
Performance collapse or frequent misclassification when the system encounters unseen gaits, speeds, or surface conditions not represented in the training sets.
Original abstract
Learning-based inertial odometry has achieved remarkable progress in pedestrian navigation. However, extending these methods to quadruped robots remains challenging due to their distinct and highly dynamic motion patterns. Models that perform well on pedestrian data often experience severe degradation when deployed on legged platforms. To tackle this challenge, we introduce X-IONet, a cross-platform inertial odometry framework that operates solely using a single Inertial Measurement Unit (IMU). X-IONet incorporates a rule-based expert selection module to classify motion platforms and route IMU sequences to platform-specific expert networks. The displacement prediction network features a dual-stage attention architecture that jointly models long-range temporal dependencies and inter-axis correlations, enabling accurate motion representation. It outputs both displacement and associated uncertainty, which are further fused through an Extended Kalman Filter (EKF) for robust state estimation. Extensive experiments on the public RoNIN pedestrian dataset, the GrandTour quadruped dataset, and a self-collected Go2 quadruped dataset demonstrate that X-IONet achieves state-of-the-art performance, reducing ATE and RTE by 14.3% and 11.4% on RoNIN, 11.8% and 9.7% on GrandTour, and 52.8% and 41.3% on Go2. These results highlight X-IONet's effectiveness for accurate and robust inertial navigation across both human and legged robot platforms.
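The abstract reports ATE and RTE without defining them. The sketch below assumes the definitions commonly used with the RoNIN benchmark: ATE as the RMSE of position error over the whole trajectory, and RTE as the RMSE of displacement error over a fixed lag. The 2D toy trajectories are hypothetical.

```python
import math

def ate(est, gt):
    """Absolute trajectory error: RMSE of position error over all samples."""
    sq = [(ex - gx) ** 2 + (ey - gy) ** 2 for (ex, ey), (gx, gy) in zip(est, gt)]
    return math.sqrt(sum(sq) / len(sq))

def rte(est, gt, delta):
    """Relative trajectory error: RMSE of displacement error over `delta` samples."""
    sq = []
    for i in range(len(est) - delta):
        err_x = (est[i + delta][0] - est[i][0]) - (gt[i + delta][0] - gt[i][0])
        err_y = (est[i + delta][1] - est[i][1]) - (gt[i + delta][1] - gt[i][1])
        sq.append(err_x ** 2 + err_y ** 2)
    return math.sqrt(sum(sq) / len(sq))

gt = [(float(i), 0.0) for i in range(10)]        # straight-line ground truth
offset = [(x + 0.5, y) for x, y in gt]           # constant bias, no drift
drift = [(1.1 * x, y) for x, y in gt]            # 10% scale drift

print(ate(offset, gt), rte(offset, gt, 5))
print(ate(drift, gt), rte(drift, gt, 5))
```

A constant offset shows up in ATE but not in RTE, whereas drift affects both, which is why the two metrics are usually reported together.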
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces X-IONet, a cross-platform inertial odometry framework using only a single IMU. It employs a rule-based expert selection module to classify motion platforms (pedestrian vs. legged robot) and route sequences to platform-specific expert networks. These networks use a dual-stage attention architecture to jointly model temporal dependencies and inter-axis correlations, predicting displacement and uncertainty that are fused via an Extended Kalman Filter. Experiments on the RoNIN pedestrian dataset, GrandTour quadruped dataset, and a self-collected Go2 quadruped dataset report state-of-the-art results with ATE/RTE reductions of 14.3%/11.4%, 11.8%/9.7%, and 52.8%/41.3% respectively.
Significance. If the performance gains are shown to be robust and not attributable to platform-specific routing artifacts or implementation details, the work would meaningfully extend learning-based inertial odometry to legged robots by addressing the domain gap in motion dynamics between humans and quadrupeds.
major comments (2)
- [Abstract and Method Description] The rule-based expert selection module is load-bearing for the cross-platform claim, yet the manuscript supplies no explicit classification rules, thresholds, accuracy metrics, or ablation on misclassification rates (e.g., on gait transitions or out-of-distribution IMU patterns). Without these, the reported ATE/RTE improvements cannot be confidently attributed to the dual-stage attention architecture rather than correct expert routing.
- [Experiments] The quantitative SOTA claims lack supporting details on baseline implementations, data splits, statistical testing, or controls for post-hoc tuning. This leaves the central performance improvements only partially supported, as the abstract provides no variance, absolute error values, or comparison methodology.
minor comments (1)
- [Abstract] The abstract reports percentage reductions without accompanying absolute ATE/RTE values or standard deviations, which would aid assessment of practical impact.
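To illustrate the minor comment, the short sketch below shows that one relative reduction is compatible with very different absolute gains. The baseline and improved errors are hypothetical values chosen to reproduce the 52.8% figure, not numbers from the paper.

```python
def relative_reduction(baseline, improved):
    """Fractional error reduction relative to the baseline."""
    return (baseline - improved) / baseline

# Two hypothetical ATE pairs (meters) with the same relative reduction.
large_scale = (10.0, 4.72)
small_scale = (1.0, 0.472)

for baseline, improved in (large_scale, small_scale):
    pct = 100.0 * relative_reduction(baseline, improved)
    print(f"{pct:.1f}% reduction, absolute gain {baseline - improved:.3f} m")
```

Both pairs yield the same percentage, yet the absolute improvement differs by a factor of ten.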
Simulated Author's Rebuttal
Thank you for the constructive feedback on our manuscript. We appreciate the referee's careful reading and address each major comment below, outlining the revisions we will incorporate to improve clarity and rigor.
- Referee: [Abstract and Method Description] The rule-based expert selection module is load-bearing for the cross-platform claim, yet the manuscript supplies no explicit classification rules, thresholds, accuracy metrics, or ablation on misclassification rates (e.g., on gait transitions or out-of-distribution IMU patterns). Without these, the reported ATE/RTE improvements cannot be confidently attributed to the dual-stage attention architecture rather than correct expert routing.
  Authors: We agree that the rule-based expert selection module is central to the cross-platform contribution and that its details must be made explicit. In the revised manuscript, we will add a dedicated subsection describing the classification rules, including the specific IMU-derived features (e.g., acceleration variance thresholds and dominant frequency bands) and decision thresholds used to route sequences to the pedestrian or quadruped expert. We will also report classification accuracy on all three evaluation datasets and include an ablation that quantifies the effect of misclassification rates, with particular attention to gait transitions and out-of-distribution IMU segments. These additions will allow readers to separate the contributions of the routing mechanism from those of the dual-stage attention architecture.
  Revision: yes
- Referee: [Experiments] The quantitative SOTA claims lack supporting details on baseline implementations, data splits, statistical testing, or controls for post-hoc tuning. This leaves the central performance improvements only partially supported, as the abstract provides no variance, absolute error values, or comparison methodology.
  Authors: We acknowledge that the experimental section requires greater transparency to support the reported improvements. The revised manuscript will expand the experimental protocol with full specifications of all baseline implementations (including code references or hyperparameter settings), the exact train/validation/test splits for RoNIN, GrandTour, and the Go2 dataset, and absolute ATE/RTE values accompanied by standard deviations across repeated runs. We will add statistical significance testing (e.g., paired Wilcoxon tests) and explicitly state that no post-hoc tuning was performed on held-out test sequences. These changes will provide a clearer and more reproducible basis for the quantitative claims.
  Revision: yes
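The rebuttal names paired Wilcoxon tests as one option for significance testing. The sketch below is a hand-rolled exact signed-rank test that enumerates all sign assignments, so it is practical only for a small number of sequences. The per-sequence errors are hypothetical, and in practice a library routine such as SciPy's `scipy.stats.wilcoxon` would normally be used instead.

```python
from itertools import product

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def wilcoxon_signed_rank(baseline, proposed):
    """Return (W+, exact two-sided p-value); zero differences are dropped."""
    diffs = [b - p for b, p in zip(baseline, proposed) if b != p]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2.0
    observed = abs(w_plus - center)
    extreme = 0
    # Under the null hypothesis every sign pattern is equally likely.
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)

# Hypothetical per-sequence ATE values (meters) for a baseline and a proposed method.
baseline_ate = [1.0, 1.2, 0.9, 1.5, 1.1, 1.3]
proposed_ate = [0.8, 1.0, 0.85, 1.1, 0.9, 1.25]
w_plus, p_value = wilcoxon_signed_rank(baseline_ate, proposed_ate)
print(w_plus, p_value)
```

With only six paired sequences, the smallest achievable two-sided p-value is 2/64, so the number of test sequences bounds how strong a significance claim can be.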
Circularity Check
No circularity: empirical results from trained network on public datasets
The paper presents an ML architecture (rule-based expert selection routing to platform-specific networks with dual-stage attention, plus EKF fusion) whose performance claims are measured via standard ATE/RTE metrics on held-out sequences from RoNIN, GrandTour, and Go2. No equations or derivations are supplied that reduce the reported gains to quantities defined by the authors' own fitted parameters or self-citations; the method description relies on conventional attention blocks and filtering without self-referential definitions or fitted-input-as-prediction patterns. The central claims therefore remain independent of the inputs by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- rule-based classification thresholds
axioms (1)
- Domain assumption: IMU measurements provide sufficient information to distinguish motion platform type and to predict displacement.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean: LogicNat embedding and orbit structure
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Paper passage: "The displacement prediction network features a dual-stage attention architecture that jointly models long-range temporal dependencies and inter-axis correlations"
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: absolute_floor_iff_bare_distinguishability
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Paper passage: "rule-based expert selection module to classify motion platforms"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.