MARIO: Motion-Augmented Real-Time Multi-Sensor Inertial Odometry
Pith reviewed 2026-06-28 10:12 UTC · model grok-4.3
The pith
A learned IMU-inferred pose prior enforces human motion constraints to reduce inertial odometry drift by up to 36%.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Grounding inertial odometry in human kinematics through a learned IMU-inferred pose prior that promotes physically consistent motion constraints, then integrating this prior into existing IO architectures, reduces positional drift by up to 36 percent on the Nymeria dataset. A sensor-fusion framework that further incorporates auxiliary signals from magnetometers, barometers, and secondary IMUs reduces drift by up to 42 percent and improves robustness and generalization across diverse motion conditions.
What carries the argument
A learned IMU-inferred pose prior that enforces physically consistent human motion constraints within the odometry estimation pipeline.
If this is right
- Positional drift is reduced by up to 36 percent when the pose prior is integrated into existing IO architectures on the Nymeria dataset.
- A sensor-fusion framework using magnetometers, barometers, and secondary IMUs reduces positional drift by up to 42 percent.
- The fusion strategy improves robustness and generalization across diverse motion conditions.
- The combined approach unifies human motion kinematics with multimodal sensing to set a new benchmark for camera-less human tracking.
Where Pith is reading between the lines
- The same pose-prior construction could be tested on other large human-motion datasets to check whether the 36 percent drift reduction holds beyond Nymeria.
- The multi-sensor fusion layer might be extended to additional lightweight signals such as heart-rate or GPS when they become available on future AR hardware.
- Longer tracking sessions without drift accumulation could support continuous applications such as indoor navigation or rehabilitation monitoring.
Load-bearing premise
The learned IMU-inferred pose prior accurately captures and enforces human motion dynamics without introducing new errors or biases into the odometry estimates.
What would settle it
Running the baseline IO architecture with and without the learned pose prior on the full Nymeria dataset and finding equal or higher average positional drift when the prior is included would falsify the central claim.
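The comparison described above can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the function names, the choice of end-point error as the drift metric, and the toy numbers are all assumptions made here for clarity.

```python
import math

def final_drift(pred_xyz, gt_xyz):
    """Distance between the predicted and ground-truth end points (assumed drift metric)."""
    return math.dist(pred_xyz[-1], gt_xyz[-1])

def relative_drift_reduction(baseline_drifts, prior_drifts):
    """Fractional change in mean drift; positive means the prior reduced drift."""
    base = sum(baseline_drifts) / len(baseline_drifts)
    with_prior = sum(prior_drifts) / len(prior_drifts)
    return (base - with_prior) / base

def claim_falsified(baseline_drifts, prior_drifts):
    """True when drift with the prior is equal to or higher than the baseline."""
    return relative_drift_reduction(baseline_drifts, prior_drifts) <= 0.0

# Made-up per-sequence drifts in metres, for illustration only.
baseline = [2.0, 4.0]
with_prior = [1.5, 3.0]
reduction = relative_drift_reduction(baseline, with_prior)
```

With these toy values the mean drift falls from 3.0 to 2.25, so `reduction` is 0.25 and `claim_falsified` returns `False`. A result at or below zero on the full dataset would be the falsifying outcome.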
Original abstract
Inertial odometry (IO) using only Inertial Measurement Units (IMUs) provides a lightweight solution for human motion tracking in augmented reality (AR) and wearable devices. Recent learning-based IO methods have improved the generalizability of inertial localization through large-scale pretraining on human motion datasets. However, these approaches remain prone to drift and noise because they do not explicitly capture human motion dynamics, especially on daily activity datasets such as Nymeria. In this work, we propose to ground inertial odometry in human kinematics through a learned IMU-inferred pose prior, which promotes physically consistent motion constraints. We integrate this pose prior into existing IO architectures and reduce positional drift by up to 36% on the challenging Nymeria dataset, which is 5x larger than datasets used in prior work. We further improve long-term performance with a sensor-fusion framework that incorporates auxiliary signals from lightweight sensors already available on commercial AR glasses, including magnetometers, barometers, and secondary IMUs. With this fusion strategy, positional drift is reduced by up to 42%, improving robustness and generalization across diverse motion conditions. Together, our results introduce a new paradigm for inertial and lightweight odometry by unifying human motion kinematics with multimodal sensing, setting a new benchmark for accurate and robust camera-less human tracking. Our website is available at https://spice-lab.org/projects/MARIO/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents MARIO, a learning-based inertial odometry framework that augments existing IO architectures with a learned IMU-inferred pose prior derived from human motion data to enforce physically consistent kinematic constraints. It reports up to 36% reduction in positional drift on the large-scale Nymeria dataset (5x larger than prior benchmarks) and up to 42% reduction when auxiliary sensors available on commercial AR glasses (magnetometers, barometers, secondary IMUs) are fused in, claiming a new paradigm for camera-less, robust human tracking.
Significance. If the pose prior demonstrably supplies independent kinematic constraints rather than additional supervised capacity, the work would advance lightweight, drift-resistant odometry for AR/wearables by scaling to daily activities on substantially larger datasets and integrating readily available multimodal signals. The reported gains on Nymeria would be notable if supported by ablations isolating the prior's contribution.
Major comments (2)
- [Abstract and §4 (method description)] The central claim that the IMU-inferred pose prior 'promotes physically consistent motion constraints' (abstract) lacks supporting evidence: no evaluation shows that output trajectories satisfy independent kinematic invariants (e.g., near-zero foot-contact velocity, pelvis height bounds, or joint-angle limits) at higher rates than the baseline IO method, nor any ablation that isolates the prior from network capacity or from the auxiliary-sensor fusion module.
- [Abstract and experimental results section] The reported 36% and 42% positional-drift reductions are presented without error bars, statistical significance tests, or details on data exclusion criteria and train/test splits on Nymeria; this makes it impossible to determine whether the gains are robust or could be explained by dataset-specific correlations rather than the kinematic prior.
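The kinematic-invariant evaluation requested in the first major comment could look roughly like the sketch below. It assumes per-frame foot positions, a foot-contact mask, and pelvis heights are available for each trajectory; the function names, the 0.05 m/s slip threshold, and the height bounds are hypothetical choices, not values from the paper.

```python
import math

def foot_slip_rate(foot_xyz, in_contact, dt, max_speed=0.05):
    """Fraction of contact frames where the foot moves faster than max_speed (m/s)."""
    flagged = 0
    total = 0
    for i in range(1, len(foot_xyz)):
        if not in_contact[i]:
            continue  # only frames labelled as ground contact are checked
        speed = math.dist(foot_xyz[i], foot_xyz[i - 1]) / dt
        total += 1
        if speed > max_speed:
            flagged += 1
    return flagged / total if total else 0.0

def height_violation_rate(pelvis_z, z_min, z_max):
    """Fraction of frames where the pelvis height leaves the [z_min, z_max] band."""
    outside = sum(1 for z in pelvis_z if z < z_min or z > z_max)
    return outside / len(pelvis_z)
```

Reporting both rates for the baseline and for the prior-augmented model would show whether the prior yields fewer violations, which is the evidence the comment asks for.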
Minor comments (2)
- [Abstract] The abstract states Nymeria is '5x larger than datasets used in prior work' but does not name the prior datasets or provide size comparisons in a table.
- [Methods] Notation for the pose prior (e.g., how it is integrated as a loss term or constraint into the base IO architecture) should be formalized with an equation in the methods section for reproducibility.
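As an illustration of the kind of equation the second minor comment asks for, one generic way a pose prior can enter training is as a weighted auxiliary loss term. The form below is purely a sketch: the symbols, the additive structure, and the weight \lambda are assumptions made here and are not taken from the paper, which may instead use a constraint or a different coupling.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{IO}}\big(\hat{\mathbf{p}}, \mathbf{p}\big)
  + \lambda \, \mathcal{L}_{\text{pose}}\big(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta}\big)
```

Here \hat{\mathbf{p}} and \mathbf{p} stand for predicted and reference positions, \hat{\boldsymbol{\theta}} and \boldsymbol{\theta} for the IMU-inferred and reference body pose, and \lambda \ge 0 for a hypothetical weighting hyperparameter.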
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline the revisions we will make.
- Referee: [Abstract and §4 (method description)] The central claim that the IMU-inferred pose prior 'promotes physically consistent motion constraints' (abstract) lacks supporting evidence: no evaluation shows that output trajectories satisfy independent kinematic invariants (e.g., near-zero foot-contact velocity, pelvis height bounds, or joint-angle limits) at higher rates than the baseline IO method, nor any ablation that isolates the prior from network capacity or from the auxiliary-sensor fusion module.
  Authors: We agree that the current manuscript does not include direct evaluations of kinematic invariants (such as foot-contact velocity or pelvis height bounds) or ablations that isolate the pose prior from network capacity and the auxiliary-sensor fusion module. In the revised version, we will add quantitative comparisons of these kinematic metrics against baselines and controlled ablations that vary network capacity while holding other components fixed to isolate the prior's contribution.
  revision: yes
- Referee: [Abstract and experimental results section] The reported 36% and 42% positional-drift reductions are presented without error bars, statistical significance tests, or details on data exclusion criteria and train/test splits on Nymeria; this makes it impossible to determine whether the gains are robust or could be explained by dataset-specific correlations rather than the kinematic prior.
  Authors: We acknowledge that the reported improvements lack error bars, statistical significance tests, and explicit details on Nymeria data splits and exclusion criteria. In the revision, we will include error bars or confidence intervals on all reported metrics, conduct and report statistical significance tests, and provide full documentation of the train/test splits along with any exclusion criteria applied to the dataset.
  revision: yes
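The confidence intervals promised in the second response could be obtained with a paired bootstrap over test sequences, sketched below. This is one common option chosen here for illustration; the rebuttal does not say which procedure the authors will use, and the function name, defaults, and percentile method are assumptions. Per-sequence drifts are assumed to be positive.

```python
import random

def paired_bootstrap_ci(baseline, with_prior, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile CI for the relative reduction in mean drift.

    baseline and with_prior hold per-sequence drifts for the same sequences,
    in the same order, so each resample keeps the pairing intact.
    """
    if not baseline or len(baseline) != len(with_prior):
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)
    n = len(baseline)
    samples = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample sequences with replacement
        base = sum(baseline[i] for i in idx)
        prior = sum(with_prior[i] for i in idx)
        samples.append((base - prior) / base)
    samples.sort()
    lo = samples[int((alpha / 2) * (n_boot - 1))]
    hi = samples[int((1 - alpha / 2) * (n_boot - 1))]
    point = (sum(baseline) - sum(with_prior)) / sum(baseline)
    return point, lo, hi
```

An interval that excludes zero would indicate that the reduction is unlikely to be a resampling artifact; it would not by itself rule out dataset-specific correlations, which still requires the split documentation the referee requests.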
Circularity Check
No circularity: empirical integration of learned prior with external validation
The paper presents a learned IMU-inferred pose prior integrated into existing IO architectures, with reported positional drift reductions (36%/42%) on the external Nymeria dataset. No equations, self-citations, or parameter-fitting steps are exhibited that reduce the central claims to inputs by construction. The derivation chain consists of architectural integration and multimodal fusion whose outputs are evaluated against independent benchmarks rather than being definitionally equivalent to the training data or prior results.